1.1 Fundamentals
1. Elbow Method
The elbow method is a heuristic technique used to determine the optimal number of clusters ($k$) in $k$-means clustering.
How it works:
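- Run $k$-means clustering for a range of values of $k$ (e.g., $k = 1$ up to some chosen maximum).
- For each $k$, calculate the within-cluster sum of squares (WCSS), also called inertia. It measures the total squared distance between each point and the centroid of its cluster: $\text{WCSS} = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2$, where $C_j$ is the $j$-th cluster and $\mu_j$ its centroid.
- Plot $k$ on the x-axis and WCSS on the y-axis.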
Interpretation:
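- As $k$ increases, WCSS generally decreases, because each point ends up closer to its nearest centroid; it reaches 0 when $k$ equals the number of points.
- The “elbow point” is where the rate of WCSS reduction sharply slows, like the bend in an arm.
- The $k$ at the elbow is taken as the optimal $k$: beyond it, extra clusters yield only marginal reductions in WCSS, so it balances compact clusters against model simplicity.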
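The procedure above can be sketched in plain Python. This is a minimal illustration, not a production implementation: it uses a hand-written Lloyd's algorithm with a deterministic farthest-first initialization (an assumption chosen so the result is reproducible), and it locates the elbow as the $k$ with the largest second difference of the WCSS curve, which is only one of several ways to find the bend. All function names and the toy data are hypothetical. In practice one would typically use a library implementation such as scikit-learn's `KMeans`, whose `inertia_` attribute holds the WCSS.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_init(points, k):
    """Deterministic init: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns the final centroids."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step: nearest centroid
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        new_centroids = [
            [sum(coord) / len(members) for coord in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def wcss(points, centroids):
    """Within-cluster sum of squares (inertia)."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def elbow_k(k_values, scores):
    """Pick the k where the WCSS curve bends most (largest second difference)."""
    best_i = max(
        range(1, len(scores) - 1),
        key=lambda i: scores[i - 1] - 2 * scores[i] + scores[i + 1],
    )
    return k_values[best_i]


# Toy data: three well-separated blobs of five points each.
offsets = [(0, 0), (0.5, 0), (0, 0.5), (-0.5, 0), (0, -0.5)]
centers = [(0, 0), (10, 0), (0, 10)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

k_values = list(range(1, 7))
scores = [wcss(points, kmeans(points, k)) for k in k_values]
print(elbow_k(k_values, scores))  # 3
```

The second-difference rule is cheap and works well when the bend is sharp, as here; on real data the curve is often smooth and the elbow remains a judgment call.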
2. Impact of Outliers on $k$-Means
Effect of outliers:
- Outliers can pull centroids toward themselves, distorting cluster assignments.
- This can lead to poor cluster quality, especially if the outliers lie far from the actual data clusters.
Why this happens:
- $k$-means minimizes squared distances, which exaggerates the impact of points far from the centroid: a point 10 times farther away contributes 100 times as much to the objective.
Mitigations:
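- Use $k$-medoids or DBSCAN, which are more robust: $k$-medoids restricts cluster centers to actual data points and usually minimizes unsquared distances, while DBSCAN labels isolated points as noise instead of forcing them into a cluster.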
- Preprocess data with outlier detection and removal.
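A toy computation makes the squared-distance argument concrete. The sketch below assumes a tight 2-D cluster plus one far-away point (made-up data) and compares the mean, which $k$-means would use as the centroid, with a medoid chosen by minimizing the sum of unsquared Euclidean distances, as $k$-medoids typically does. Function names are hypothetical.

```python
import math


def mean_point(points):
    """Coordinate-wise mean: the centroid k-means would use for this cluster."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def medoid(points):
    """Data point with the smallest total (unsquared) Euclidean distance to the others."""
    return min(points, key=lambda m: sum(math.dist(m, p) for p in points))


def squared_share(points, center, target):
    """Fraction of the cluster's sum of squared distances contributed by `target`."""
    total = sum(math.dist(p, center) ** 2 for p in points)
    return math.dist(target, center) ** 2 / total


cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (1.5, 1.5)]
outlier = (20.0, 20.0)
points = cluster + [outlier]

print(mean_point(cluster))  # (1.5, 1.5)
print(mean_point(points))   # dragged toward the outlier, roughly (4.58, 4.58)
print(medoid(points))       # (1.5, 1.5): unaffected by the outlier
print(round(squared_share(points, mean_point(points), outlier), 2))  # 0.83
```

A single point out of six accounts for over 80% of the squared-error objective, which is why it can drag the mean so far.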
3. Kernel PCA for Non-linear Dimensionality Reduction
Kernel PCA (Principal Component Analysis) extends PCA to non-linear relationships by using the kernel trick.
Key ideas:
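- Standard PCA projects data onto the directions of maximum variance, which assumes the structure of interest is linear.
- Kernel PCA first maps the data into a higher-dimensional feature space via a non-linear kernel function (e.g., RBF, polynomial).
- It then performs PCA in this new space without ever computing the feature-space coordinates explicitly: PCA can be expressed entirely through inner products between points, and the kernel function supplies those inner products directly. This is the kernel trick.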
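The kernel trick can be checked numerically for the degree-2 polynomial kernel $\kappa(x, y) = (x \cdot y)^2$ on 2-D inputs, whose explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The sketch below (function names hypothetical) shows that evaluating the kernel in the input space gives the same number as building $\phi$ and taking a dot product in the 3-D feature space.

```python
import math


def poly2_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel, computed in the input space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def poly2_features(x):
    """Explicit feature map phi(x) for the same kernel (2-D -> 3-D)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, y))                         # 121.0
print(dot(poly2_features(x), poly2_features(y)))  # 121.0 up to floating-point rounding
```

For this kernel the saving is small, but for the RBF kernel the implicit feature space is infinite-dimensional, so the explicit route is not available at all.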
 
Steps:
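- Choose a kernel function $\kappa(x, y)$, e.g. the RBF kernel $\kappa(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$.
- Compute the $n \times n$ kernel matrix $K_{ij} = \kappa(x_i, x_j)$.
- Center $K$ so that the implicit feature vectors have zero mean, $\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n$, where $\mathbf{1}_n$ is the $n \times n$ matrix with every entry equal to $1/n$; then compute the eigenvalues and eigenvectors of $\tilde{K}$.
- Project the data onto the top eigenvectors (principal components) of $\tilde{K}$: for a training point $x_i$, its coordinate on component $j$ is $\sqrt{\lambda_j}\, v_{j,i}$, where $v_j$ is the unit eigenvector with eigenvalue $\lambda_j$.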
Use case: Useful when the data lies on a non-linear manifold, e.g., a spiral or concentric circles.
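The four steps above can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions: an RBF kernel with a hand-picked `gamma`, a small hand-written cyclic Jacobi eigensolver standing in for a numerical library (in practice one would use `numpy.linalg.eigh` or scikit-learn's `KernelPCA`), and made-up concentric-circle data. Function names are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """Step 1: RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def center_kernel(K):
    """Step 3a: double-centering, K - 1_n K - K 1_n + 1_n K 1_n."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]


def jacobi_eigh(A, max_sweeps=50, tol=1e-20):
    """Step 3b: eigenpairs of a symmetric matrix via cyclic Jacobi rotations.
    Returns (eigenvalues, eigenvectors); eigenvectors[i] is a unit vector."""
    n = len(A)
    a = [row[:] for row in A]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-14:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], [[v[k][i] for k in range(n)] for i in range(n)]


def kernel_pca(X, n_components=2, gamma=1.0):
    """Steps 1-4: returns the top eigenvalues and the projected training points."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]  # Step 2
    values, vectors = jacobi_eigh(center_kernel(K))
    top = sorted(range(n), key=lambda i: values[i], reverse=True)[:n_components]
    lambdas = [values[i] for i in top]
    # Step 4: training point i projects to sqrt(lambda_j) * v_j[i] on component j.
    Z = [[math.sqrt(max(lambdas[j], 0.0)) * vectors[top[j]][i] for j in range(n_components)]
         for i in range(n)]
    return lambdas, Z


# Two concentric circles (8 points each): not linearly separable in 2-D.
X = [(r * math.cos(2 * math.pi * i / 8), r * math.sin(2 * math.pi * i / 8))
     for r in (1.0, 3.0) for i in range(8)]
lambdas, Z = kernel_pca(X, n_components=2, gamma=0.5)
print(lambdas)
```

The projection formula avoids a second pass over the kernel matrix for training points; projecting new points would instead require evaluating the (centered) kernel between each new point and every training point.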
4. Compare and Contrast PCA and t-SNE
| Feature | PCA | t-SNE | 
|---|---|---|
| Type | Linear | Non-linear | 
| Goal | Maximize global variance | Preserve local structure (similarity) | 
| Distance / similarity | Euclidean | Conditional probabilities derived from pairwise distances (Gaussian in the input space, Student-t in the embedding) |
| Interpretability | High (components are linear combinations of features) | Low (axes have no clear meaning) |
| Scalability | Fast, scalable to large datasets | Slow, computationally expensive (the exact version is $O(n^2)$ in the number of points) |
| Output | Any number of components (up to the number of features) | Typically 2 or 3 dimensions |
| Use case | Preprocessing, compression, noise reduction | Data visualization (e.g., clusters in high-dimensional data) | 
Summary:
- Use PCA when your goal is compression or noise reduction.
- Use t-SNE when your goal is visualizing clusters or manifolds in high-dimensional datasets.
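Plain PCA reduces to an eigen-decomposition of the covariance matrix, which makes the "linear combinations of features" and "compression" entries in the table easy to see. The sketch below is a dependency-free illustration, assuming power iteration with deflation as the eigensolver (a library class such as scikit-learn's `PCA`, which exposes `components_` and `explained_variance_ratio_`, would normally be used) and made-up 2-D data lying close to the line $y = 2x$. Function names are hypothetical.

```python
import math
import random


def pca(X, n_components=1, n_iter=200, seed=0):
    """PCA via power iteration on the sample covariance matrix (assumes non-constant data).
    Returns (means, components, explained_variance_ratios, scores)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    total_var = sum(C[j][j] for j in range(d))
    rng = random.Random(seed)
    components, ratios = [], []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random unit start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in any direction
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))  # Rayleigh quotient
        components.append(v)
        ratios.append(lam / total_var)
        # Deflate: remove the found direction before searching for the next one.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    scores = [[sum(r[j] * comp[j] for j in range(d)) for comp in components] for r in Xc]
    return means, components, ratios, scores


def reconstruct(means, components, score):
    """Map a low-dimensional score vector back to the original feature space."""
    return [m + sum(s * comp[j] for s, comp in zip(score, components))
            for j, m in enumerate(means)]


# Made-up data: five points close to the line y = 2x.
X = [(-2.0, -3.9), (-1.0, -2.1), (0.0, 0.0), (1.0, 2.1), (2.0, 3.9)]
means, comps, ratios, scores = pca(X, n_components=1)
print(comps[0])   # roughly +/-(0.45, 0.89): a linear combination of x and y
print(ratios[0])  # above 0.999: one component keeps almost all the variance
approx = [reconstruct(means, comps, s) for s in scores]
```

Storing one score per point plus the mean and one component vector is the compression; the small reconstruction error is the discarded (mostly noise) variance.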
 
 