Unsupervised Learning

No Output (Label): The algorithm works solely with input data without any labeled output.

Creates a New Data Representation:

  • Transforms the data into a more interpretable form compared to its original representation.

Aim: Discovering Hidden Structures

  • The primary goal is to uncover underlying structures or distributions within the data.

Tasks:

  • Clustering:

    • Divides the data into distinct groups of similar items.

  • Dimension Reduction:

    • Reduces the number of features while preserving the essential characteristics of the data.

  • Outlier Detection:

    • Identifies data points that deviate significantly from the rest of the data.
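The three tasks above can be sketched on the Iris data with scikit-learn. This is a minimal illustration, not a recommended setup: the specific estimators (`KMeans`, `PCA`, `IsolationForest`) and the settings `n_clusters=3` and `n_components=2` are assumptions chosen for the example, and the labels are deliberately thrown away.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

# Keep only the inputs; the labels are discarded on purpose
X, _ = load_iris(return_X_y=True)

# Clustering: assign every sample to one of 3 groups (3 is an assumed choice)
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimension reduction: project the 4 original features onto 2 components
X_2d = PCA(n_components=2).fit_transform(X)

# Outlier detection: -1 marks samples flagged as outliers, 1 marks inliers
outlier_flags = IsolationForest(random_state=0).fit_predict(X)

print(cluster_ids.shape, X_2d.shape, outlier_flags.shape)
```

Each call receives only `X`, which is what makes all three tasks unsupervised.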

Examples:

  • Topic Extraction: Identifying themes or topics from a collection of text documents (e.g., reviews).

  • Organization of Pictures: Grouping images based on similarities.

DISADVANTAGE: No Success Metric

  • Since there are no labeled outputs, there is no ground truth to compare against, so it’s hard to measure the algorithm’s accuracy or to judge whether the structure it finds is meaningful.
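Without labels, accuracy cannot be computed, but label-free quality measures do exist. The sketch below uses the silhouette score as one such measure. It is not mentioned in the notes above, and the range of cluster counts tried is an assumption. The score only describes how compact and well separated the clusters are; it cannot confirm that they match any real categories.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score

# Keep only the inputs; no labels are used anywhere below
X, _ = load_iris(return_X_y=True)

# Score several candidate cluster counts (the range 2..5 is an assumed choice)
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette lies in [-1, 1]; higher means tighter, better separated clusters
    scores[k] = silhouette_score(X, labels)
    print(k, round(scores[k], 3))
```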

APPLICATIONS

  • Exploratory Data Analysis: Often used to gain a deeper understanding of the data.

  • Preprocessing: Serves as a preprocessing step for supervised learning algorithms.
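As a sketch of the preprocessing use case, an unsupervised step can be chained in front of a supervised model. The pipeline below is one assumed arrangement (standardize, reduce to two principal components, then fit a logistic regression), not a setup taken from these notes.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# PCA never sees y: it is the unsupervised step feeding the classifier
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Accuracy can be measured here because the final step is supervised
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```

Wrapping the steps in one pipeline means PCA is fitted on the training split only, so no information from the test split leaks into the representation.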

Iris Data

  • Consider only the first two features for visualization purposes.

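from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load features and labels, then hold out a test set (default 25%) with a fixed seed
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Names of the four measured features
f_names = load_iris().feature_names
f_names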
['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']
# Names of the three species (the class labels)
t_names = load_iris().target_names
t_names
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
# Iris in Supervised Learning
import matplotlib.pyplot as plt
import seaborn as sns
plt.title('Iris in Supervised Learning')
sns.scatterplot( x=X_train[:,0], y=X_train[:,1], hue=y_train, palette='bright' )
plt.xlabel(f_names[0])
plt.ylabel(f_names[1]);
[Figure: "Iris in Supervised Learning", scatter plot of sepal width (cm) against sepal length (cm) for the training set, points colored by class label]
# Iris in Unsupervised Learning
plt.title('Iris in Unsupervised Learning')
# Species names for the training labels (not used in this plot, which ignores labels)
train_targets_iris = [t_names[i] for i in y_train]
sns.scatterplot( x=X_train[:,0], y=X_train[:,1], color='black')
plt.xlabel(f_names[0])
plt.ylabel(f_names[1]);
[Figure: "Iris in Unsupervised Learning", scatter plot of sepal width (cm) against sepal length (cm) for the training set, all points in black]
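To show what an algorithm might recover from the unlabeled view, the sketch below runs k-means on the same two features and colors the points by the cluster it assigns. The choice of k-means and of `n_clusters=3` is an assumption made for illustration; nothing in the unlabeled data states that three groups is correct.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, _, _, _ = train_test_split(X, y, random_state=42)

# Keep only sepal length and sepal width, as in the plots above
X_2f = X_train[:, :2]

# Fit k-means without labels; 3 clusters is an assumed choice
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_2f)

plt.title('Iris clustered without labels (k-means, k=3)')
# Color each point by its assigned cluster, then mark the cluster centers
plt.scatter(X_2f[:, 0], X_2f[:, 1], c=kmeans.labels_)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            marker='x', color='red')
plt.xlabel('sepal length (cm)')
plt.ylabel('sepal width (cm)');
```

The cluster numbers are arbitrary identifiers, so they need not line up with the species codes in `y_train`.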