3.2 Kernel PCA
One of the powerful ideas behind Kernel PCA is that you do not need to explicitly map data to a high-dimensional space. This is where the kernel trick comes in — it allows us to compute dot products in the high-dimensional space without ever computing the coordinates in that space.
🔑 Key Idea
Instead of mapping data points $x_i$ to $\phi(x_i)$ explicitly, we work only with the kernel function $k(x_i, x_j) = \phi(x_i)^\top \phi(x_j)$, which returns the dot product in the feature space directly from the original inputs.
1. 🧠 Example: Kernel PCA with RBF Kernel
```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
import matplotlib.pyplot as plt

# 1. Create nonlinear data (concentric circles)
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05)

# 2. Apply kernel PCA with RBF kernel
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
X_kpca = kpca.fit_transform(X)

# 3. Plot results
plt.figure(figsize=(12, 5))

# Original data
plt.subplot(1, 2, 1)
plt.title("Original Data")
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')

# After Kernel PCA
plt.subplot(1, 2, 2)
plt.title("Kernel PCA (RBF Kernel)")
plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=y, cmap='viridis')

plt.show()
```
🔍 What’s Happening Here?
- The original data lies in $\mathbb{R}^2$ but is not linearly separable.
- Kernel PCA with an RBF (Gaussian) kernel acts as if it had mapped the data into a higher-dimensional space, where linear separation becomes possible.
- The `gamma` parameter controls the non-linearity: a small `gamma` gives each point a broader influence, while a large `gamma` gives a tighter, more local influence.
- You never see the high-dimensional space. All computations use the kernel matrix $K$, whose entries are $K_{ij} = k(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^2)$ (see the short sketch below).
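To make that last point concrete, here is a short sketch (reusing the `X` and `gamma` from the example above) that builds the RBF kernel matrix by hand and checks it against scikit-learn's `rbf_kernel` helper:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

gamma = 15

# Pairwise squared Euclidean distances between all samples
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

# RBF (Gaussian) kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
K_manual = np.exp(-gamma * sq_dists)

# Same matrix via scikit-learn's helper
K_sklearn = rbf_kernel(X, X, gamma=gamma)

print(np.allclose(K_manual, K_sklearn))  # True
print(K_manual.shape)                    # (400, 400): one entry per pair of samples
```

This $400 \times 400$ matrix is the only object Kernel PCA ever works with; the implicit high-dimensional coordinates are never materialized.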
1.1 🤔 Can I Manually Map to Higher Dimensions?
Yes, for educational purposes, you could manually define a nonlinear mapping (e.g., $\phi(x_1, x_2) = (x_1^2,\ x_2^2,\ \sqrt{2}\,x_1 x_2)$, which reproduces the degree-2 polynomial kernel $k(x, y) = (x^\top y)^2$), apply it to the data, and then run ordinary linear PCA in the mapped space. The result matches Kernel PCA with the corresponding kernel, but the explicit approach quickly becomes impractical as the feature dimension grows.
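To see this concretely, here is a sketch (the dataset and the degree-2 polynomial kernel are chosen purely for illustration) that applies the explicit map above, runs linear PCA on the mapped features, and compares the result with `KernelPCA` using the matching polynomial kernel:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

X, _ = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Explicit degree-2 feature map for 2-D inputs:
# phi(x1, x2) = (x1^2, x2^2, sqrt(2)*x1*x2), so that phi(x) . phi(y) = (x . y)^2
Phi = np.column_stack([X[:, 0] ** 2, X[:, 1] ** 2, np.sqrt(2) * X[:, 0] * X[:, 1]])

# Ordinary linear PCA in the explicitly mapped space
Z_explicit = PCA(n_components=2).fit_transform(Phi)

# Kernel PCA with the matching polynomial kernel; Phi is never formed
kpca = KernelPCA(n_components=2, kernel='poly', degree=2, gamma=1, coef0=0)
Z_kernel = kpca.fit_transform(X)

# The embeddings agree up to the arbitrary sign of each component
print(np.allclose(np.abs(Z_explicit), np.abs(Z_kernel), atol=1e-6))  # expected: True
```

For an RBF kernel, however, no finite-dimensional feature map exists, so the explicit route only works for kernels like the polynomial one.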
1.2 Test Sample Projection
Projecting test samples in Kernel PCA is a subtle point, and it works differently than in regular PCA.
🧠 Why It’s Tricky
In regular PCA, you project test samples using:

$$z = W^\top (x_{\text{new}} - \mu)$$

where $W$ holds the principal components (eigenvectors of the covariance matrix) and $\mu$ is the training mean.

In Kernel PCA, you don’t have an explicit transformation matrix $W$: the principal axes live in the (possibly infinite-dimensional) feature space and are only represented implicitly through the training samples.
✅ How to Project Test Samples in Kernel PCA
Let’s say:

- You trained Kernel PCA on training samples $x_1, \dots, x_n$
- You have a test sample $x_{\text{new}}$ you want to project

The projection of $x_{\text{new}}$ onto the $k$-th component is:

$$z_k(x_{\text{new}}) = \sum_{i=1}^{n} \alpha_i^{(k)} \, \tilde{k}(x_{\text{new}}, x_i)$$

Where:

- $\alpha^{(k)}$ is the $k$-th eigenvector of the centered kernel matrix
- $k(x_{\text{new}}, x_i)$ is the kernel between the test sample and the training data
- $\tilde{k}(x_{\text{new}}, x_i)$ handles kernel centering for test data

A minimal NumPy sketch of this projection follows.
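The sketch below is not scikit-learn's internal code: it builds the centered training kernel, extracts and rescales the eigenvectors, and then projects a held-out test set by hand (the dataset, `gamma`, and the use of an RBF kernel are just assumptions for the illustration).

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel

gamma, n_components = 15, 2
X_train, _ = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_test, _ = make_circles(n_samples=50, factor=0.3, noise=0.05, random_state=1)
n = X_train.shape[0]

# Training kernel matrix and its centered version
K = rbf_kernel(X_train, X_train, gamma=gamma)
one_n = np.full((n, n), 1.0 / n)
K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n

# Eigenvectors of the centered kernel matrix, largest eigenvalues first
eigvals, eigvecs = np.linalg.eigh(K_c)
eigvals, eigvecs = eigvals[::-1][:n_components], eigvecs[:, ::-1][:, :n_components]

# Rescale so the principal axes have unit length in feature space: alpha_k = u_k / sqrt(lambda_k)
alphas = eigvecs / np.sqrt(eigvals)

# Kernel between each test sample and all training samples, with the centering correction
K_test = rbf_kernel(X_test, X_train, gamma=gamma)
one_m = np.full((X_test.shape[0], n), 1.0 / n)
K_test_c = K_test - one_m @ K - K_test @ one_n + one_m @ K @ one_n

# Projection: dot product of each centered test kernel row with alpha_k
Z_test = K_test_c @ alphas
print(Z_test.shape)  # (50, 2)
```

Up to an arbitrary sign per component, `Z_test` should match what `KernelPCA(n_components=2, kernel='rbf', gamma=15).fit(X_train).transform(X_test)` returns.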
💡 In Practice: Use `transform()` in Scikit-learn

If you’re using `sklearn`, you don’t need to code this manually. You can simply call `.transform()`:
```python
from sklearn.decomposition import KernelPCA

# Fit kernel PCA on training data
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
X_train_kpca = kpca.fit_transform(X_train)

# Project test samples
X_test_kpca = kpca.transform(X_test)
```
This automatically:

- Computes $k(x_{\text{new}}, x_i)$ between each test and training sample
- Applies kernel centering
- Projects using the learned components
2. Maths Behind Kernel PCA
Here is a step-by-step breakdown of the math behind Kernel PCA’s eigenvalue problem, focusing on clarity and insight into how it works under the hood.
🔢 1. Goal of Kernel PCA
Just like classical PCA, the goal is to find the directions (principal components) that maximize variance. But in Kernel PCA, we do this in an implicit high-dimensional feature space $\mathcal{F}$.

We use a nonlinear mapping:

$$\phi: \mathbb{R}^d \to \mathcal{F}, \qquad x \mapsto \phi(x)$$

…but we never compute $\phi(x)$ explicitly. We only ever need the inner products $k(x_i, x_j) = \phi(x_i)^\top \phi(x_j)$.
🧮 2. Covariance Matrix in Feature Space
In PCA, you compute the covariance matrix (assuming centered data):

$$C = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top$$

In Kernel PCA, define the covariance matrix in feature space (assuming the $\phi(x_i)$ are centered; centering is handled in step 5 below):

$$\bar{C} = \frac{1}{n} \sum_{i=1}^{n} \phi(x_i)\, \phi(x_i)^\top$$

We want to solve:

$$\bar{C}\, v = \lambda v$$

This is infeasible to solve directly because $\phi(x_i)$ may live in a very high-dimensional (or even infinite-dimensional) space, so $\bar{C}$ can never be formed explicitly.
🎯 3. Representing $v$ in Terms of Training Data

We assume that the solution vector $v$ lies in the span of the mapped training points:

$$v = \sum_{i=1}^{n} \alpha_i\, \phi(x_i)$$

So the eigenproblem becomes (projecting both sides onto each $\phi(x_k)$):

$$\phi(x_k)^\top \bar{C}\, v = \lambda\, \phi(x_k)^\top v \qquad \text{for } k = 1, \dots, n$$

Plug in the expression for $\bar{C}$ and $v$:

$$\frac{1}{n} \sum_{j=1}^{n} \phi(x_k)^\top \phi(x_j)\, \phi(x_j)^\top \sum_{i=1}^{n} \alpha_i\, \phi(x_i) = \lambda \sum_{i=1}^{n} \alpha_i\, \phi(x_k)^\top \phi(x_i)$$

Interchange sums:

$$\frac{1}{n} \sum_{i=1}^{n} \alpha_i \sum_{j=1}^{n} k(x_k, x_j)\, k(x_j, x_i) = \lambda \sum_{i=1}^{n} \alpha_i\, k(x_k, x_i)$$

Now define the kernel matrix:

$$K_{ij} = k(x_i, x_j) = \phi(x_i)^\top \phi(x_j)$$

So you get, in matrix form:

$$\frac{1}{n} K^2 \alpha = \lambda\, K \alpha$$

Now since $K$ appears on both sides and is positive semi-definite, we can drop one factor of $K$ (this only discards solutions with zero eigenvalue, which do not contribute to the projections).
✅ 4. Final Eigenvalue Problem
You solve:

$$K \alpha = n\lambda\, \alpha$$

This is an eigenvalue problem involving the $n \times n$ kernel matrix $K$, which we can always compute, instead of the covariance matrix in the (possibly infinite-dimensional) feature space.
🔄 5. Centering the Kernel Matrix
Just like data needs to be zero-centered in PCA, the kernel matrix needs to be centered:

$$\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n$$

where $\mathbf{1}_n$ is the $n \times n$ matrix with every entry equal to $\frac{1}{n}$.
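As a quick sanity check, here is a self-contained sketch (the random data and `gamma` value are arbitrary choices) that applies the centering formula in NumPy and compares it with scikit-learn's `KernelCenterer`:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import KernelCenterer

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
K = rbf_kernel(X, X, gamma=15)

n = K.shape[0]
one_n = np.full((n, n), 1.0 / n)  # every entry equal to 1/n

# K_tilde = K - 1_n K - K 1_n + 1_n K 1_n
K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

# scikit-learn's KernelCenterer applies the same transformation
print(np.allclose(K_centered, KernelCenterer().fit_transform(K)))  # True
```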
📈 6. Projecting New Data Points
To project a new point $x_{\text{new}}$, compute its kernel values against the training samples and take the dot product with each eigenvector:

$$z_k(x_{\text{new}}) = \sum_{i=1}^{n} \alpha_i^{(k)}\, k(x_{\text{new}}, x_i)$$

(after the same centering adjustment used for the training kernel matrix)
🧠 Summary of Steps
- Compute the kernel matrix $K$ using a kernel like RBF, polynomial, etc.
- Center $K$ to obtain $\tilde{K}$.
- Solve the eigenvalue problem $\tilde{K} \alpha = \tilde{\lambda} \alpha$.
- Normalize the $\alpha^{(k)}$ so that each eigenvector has unit length in feature space.
- To project a point, use the dot product of its (centered) kernel vector with $\alpha^{(k)}$.
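Putting the summary together, here is a compact from-scratch sketch of the whole pipeline (an RBF kernel and the circles dataset are assumed for concreteness); it should reproduce scikit-learn's `KernelPCA` embedding up to an arbitrary sign per component:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import rbf_kernel


def kernel_pca_rbf(X, n_components=2, gamma=15):
    """Kernel PCA with an RBF kernel, following the steps summarized above."""
    n = X.shape[0]

    # 1. Kernel matrix
    K = rbf_kernel(X, X, gamma=gamma)

    # 2. Center the kernel matrix
    one_n = np.full((n, n), 1.0 / n)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # 3. Eigenvalue problem: K_c alpha = lambda alpha (keep the largest eigenvalues)
    eigvals, eigvecs = np.linalg.eigh(K_c)
    eigvals = eigvals[::-1][:n_components]
    eigvecs = eigvecs[:, ::-1][:, :n_components]

    # 4.-5. Normalize in feature space and project the training data:
    #       z_k(x_j) = (K_c alpha_k)_j = sqrt(lambda_k) * u_jk
    return eigvecs * np.sqrt(eigvals)


X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
Z_manual = kernel_pca_rbf(X)

# eigen_solver='dense' so sklearn uses the same dense eigendecomposition as the sketch
Z_sklearn = KernelPCA(n_components=2, kernel='rbf', gamma=15,
                      eigen_solver='dense').fit_transform(X)

# Agreement up to the arbitrary sign of each component
print(np.allclose(np.abs(Z_manual), np.abs(Z_sklearn), atol=1e-6))  # expected: True
```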