1.1 Fundamentals
1. Elbow Method
The elbow method is a heuristic technique used to determine the optimal number of clusters (k) in k-means clustering.
How it works:
- Run k-means clustering for different values of k (e.g., k = 1 to 10).
- For each k, calculate the within-cluster sum of squares (WCSS), also called inertia: the total squared distance between each point and the centroid of its cluster.
- Plot k on the x-axis and WCSS on the y-axis.
Interpretation:
- As k increases, WCSS decreases because clusters become smaller.
- The "elbow point" is where the rate of WCSS reduction slows sharply, like the bend in an arm.
- This elbow is taken as the optimal k because it balances compact clusters with model simplicity.
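The procedure is easy to sketch with scikit-learn. A minimal example, assuming synthetic blob data and a candidate range of k = 1 to 10 (both illustrative choices, not part of the method itself):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data: three well-separated Gaussian blobs.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit k-means for each candidate k and record WCSS (sklearn calls it inertia_).
ks = range(1, 11)
wcss = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
        for k in ks]

# Plot k vs. WCSS; the bend ("elbow") suggests the optimal k (here, 3).
plt.plot(ks, wcss, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("WCSS (inertia)")
plt.show()
```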
2. k-means with Outliers?
Effect of outliers:
- Outliers can pull centroids toward themselves, distorting cluster assignments.
- Can lead to poor cluster quality, especially if outliers lie far from actual data clusters.
Why this happens:
- k-means minimizes squared distances, which exaggerates the impact of points far from the centroid.
Mitigations:
- Use k-medoids or DBSCAN, which are more robust to outliers.
- Preprocess the data with outlier detection and removal.
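A small sketch of the distortion, using made-up data: two tight clusters plus one extreme point. Because the squared-distance objective is dominated by the far point, k-means typically spends a whole centroid on it:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two tight clusters centered at (0, 0) and (5, 5).
clean = np.vstack([rng.normal(0, 0.3, (50, 2)),
                   rng.normal(5, 0.3, (50, 2))])
# The same data with a single extreme outlier appended.
dirty = np.vstack([clean, [[100.0, 100.0]]])

for name, X in [("without outlier", clean), ("with outlier", dirty)]:
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(name, "->", km.cluster_centers_.round(2).tolist())
# Without the outlier, the centroids sit near (0, 0) and (5, 5); with it,
# one centroid chases the lone far point and the two real clusters merge.
```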
3. Kernel PCA for Non-linear Dimensionality Reduction
Kernel PCA (Principal Component Analysis) extends PCA to non-linear relationships by using the kernel trick.
Key ideas:
- Standard PCA projects data onto the directions of maximum variance, which assumes linear structure.
- Kernel PCA:
- First maps data into a higher-dimensional feature space via a non-linear kernel function (e.g., RBF, polynomial).
- Performs PCA in this new space without computing the coordinates explicitly — this is the kernel trick.
Steps:
- Choose a kernel function k(x, y).
- Compute the kernel matrix K, where K_ij = k(x_i, x_j).
- Center K, then compute its eigenvalues and eigenvectors.
- Project the data onto the top eigenvectors (principal components) of K.
Use case: Ideal when data lies on a non-linear manifold, e.g., a spiral or concentric circles.
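A minimal sketch of the concentric-circles case, using scikit-learn's KernelPCA with an RBF kernel (gamma = 10 is a hand-tuned value for this toy data, not a general default):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: non-linear structure that linear PCA cannot separate.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Kernel PCA with an RBF kernel; the kernel trick performs PCA in the implicit
# high-dimensional feature space without ever computing its coordinates.
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

# Plain PCA for comparison: only a rotation here, so the circles stay entangled.
X_pca = PCA(n_components=2).fit_transform(X)
```

In the kernel-PCA projection the two circles become nearly linearly separable along the first component, which is exactly what the implicit non-linear map buys you.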
4. Compare and Contrast PCA and t-SNE
| Feature | PCA | t-SNE |
| --- | --- | --- |
| Type | Linear | Non-linear |
| Goal | Maximize global variance | Preserve local structure (similarity) |
| Distance metric | Euclidean | Conditional probabilities (based on similarity) |
| Interpretability | High (components are linear combinations of features) | Low (axes have no clear meaning) |
| Scalability | Fast, scalable to large datasets | Slow, computationally expensive |
| Output | Any number of dimensions | Typically 2 or 3 dimensions |
| Use case | Preprocessing, compression, noise reduction | Data visualization (e.g., clusters in high-dimensional data) |
Summary:
- Use PCA when your goal is compression or noise reduction.
- Use t-SNE when your goal is visualizing clusters or manifolds in high-dimensional datasets.
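A sketch contrasting the two on the same data. The digits dataset and the perplexity value are illustrative choices; note that t-SNE is stochastic, so random_state matters for reproducibility:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 features

# PCA: linear, fast, deterministic; useful for compression/preprocessing.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: non-linear, slow, stochastic; useful for visualizing local structure.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape)  # (1797, 2) (1797, 2)
```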