
3.2 Kernel PCA

One of the powerful ideas behind Kernel PCA is that you do not need to explicitly map data to a high-dimensional space. This is where the kernel trick comes in — it allows us to compute dot products in the high-dimensional space without ever computing the coordinates in that space.

🔑 Key Idea

Instead of mapping data points $x$ into a higher-dimensional feature space $\mathcal{F}$ via $\phi : \mathbb{R}^d \to \mathcal{F}$, we define a kernel function $k(x_i, x_j) = \phi(x_i)^\top \phi(x_j)$. Kernel PCA works directly with the kernel matrix $K$, avoiding the need to know $\phi$ explicitly.
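
As a quick sanity check of this idea, consider the degree-2 polynomial kernel $k(x, z) = (x^\top z)^2$ in two dimensions: its explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, and a few lines of numpy (a minimal illustration, not part of the example below) confirm that the kernel value equals the dot product of the mapped vectors.

import numpy as np
# Degree-2 polynomial kernel k(x, z) = (x . z)^2 and its explicit feature map
def phi(v):
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])
x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print((x @ z) ** 2)        # kernel value, computed without any mapping
print(phi(x) @ phi(z))     # same value via the explicit feature map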

1. 🧠 Example: Kernel PCA with RBF Kernel

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
import matplotlib.pyplot as plt
# 1. Create nonlinear data (concentric circles)
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05)
# 2. Apply kernel PCA with RBF kernel
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
X_kpca = kpca.fit_transform(X)
# 3. Plot results
plt.figure(figsize=(12, 5))
# Original data
plt.subplot(1, 2, 1)
plt.title("Original Data")
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
# After Kernel PCA
plt.subplot(1, 2, 2)
plt.title("Kernel PCA (RBF Kernel)")
plt.scatter(X_kpca[:, 0], X_kpca[:, 1], c=y, cmap='viridis')
plt.show()

🔍 What’s Happening Here?

  • The original data lies in $\mathbb{R}^2$ but is not linearly separable.
  • Kernel PCA with an RBF (Gaussian) kernel acts as if it had mapped the data into a higher-dimensional space where linear separation becomes possible.
  • The gamma parameter controls the non-linearity: a small gamma gives each point broad influence, while a large gamma gives tight, local influence.

You never see the high-dimensional space. All computations use the kernel matrix $K$, where $K_{ij} = k(x_i, x_j)$.
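
The short sketch below makes both points concrete on the same concentric-circles data: it prints the shape of the $n \times n$ RBF kernel matrix and compares two gamma values (1 and 50, chosen purely for illustration).

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.metrics.pairwise import rbf_kernel
import matplotlib.pyplot as plt
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05)
# All computations go through the n x n kernel matrix
K = rbf_kernel(X, gamma=15)
print(K.shape)  # (400, 400)
# Effect of gamma: broad influence (small) vs. tight, local influence (large)
plt.figure(figsize=(12, 5))
for i, gamma in enumerate([1, 50]):
    X_g = KernelPCA(n_components=2, kernel='rbf', gamma=gamma).fit_transform(X)
    plt.subplot(1, 2, i + 1)
    plt.title(f"Kernel PCA (RBF), gamma = {gamma}")
    plt.scatter(X_g[:, 0], X_g[:, 1], c=y, cmap='viridis')
plt.show()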

1.1 🤔 Can I Manually Map to Higher Dimensions?

Yes, for educational purposes you could manually define a nonlinear mapping (e.g., $\phi(x_1, x_2) = (x_1, x_2, x_1^2 + x_2^2)$), but this doesn't scale to real problems or to complex kernels such as the RBF kernel, whose feature space is infinite-dimensional.
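
As a small illustration of such a hand-crafted mapping, the sketch below appends the squared radius $x_1^2 + x_2^2$ to the concentric-circles data; in the mapped space the two classes can be separated by a straight line along the new coordinate.

import numpy as np
from sklearn.datasets import make_circles
import matplotlib.pyplot as plt
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05)
# Hand-crafted mapping to 3D: keep (x1, x2) and append the squared radius,
# which takes clearly different values on the inner and outer circle
X_mapped = np.column_stack([X, X[:, 0]**2 + X[:, 1]**2])
# In the mapped space a horizontal line separates the two classes
plt.scatter(X_mapped[:, 0], X_mapped[:, 2], c=y, cmap='viridis')
plt.xlabel("x1")
plt.ylabel("x1^2 + x2^2")
plt.show()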

1.2 Test Sample Projection

Projecting test samples in Kernel PCA is a subtle point and different from regular PCA.

🧠 Why It’s Tricky

In regular PCA, you project test samples using:

$$X_{\text{test,proj}} = (X_{\text{test}} - \mu) \, W$$

where $W$ is the matrix of principal components and $\mu$ is the mean of the training data.
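
A quick way to verify this formula is to reproduce pca.transform() by hand; the train/test split below is just an illustrative setup.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05)
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)
pca = PCA(n_components=2).fit(X_train)
# W is stored row-wise in pca.components_, and the training mean is subtracted first
manual = (X_test - pca.mean_) @ pca.components_.T
print(np.allclose(manual, pca.transform(X_test)))  # True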

In Kernel PCA, you don't have an explicit transformation matrix, so you can't directly project new samples unless you use the kernel trick again.

✅ How to Project Test Samples in Kernel PCA

Let’s say:

  • You trained Kernel PCA on $n$ training samples $x_1, \dots, x_n$
  • You have a test sample $x$ that you want to project

The projection of $\phi(x)$ onto the $k$-th principal component is:

$$y_k(x) = \sum_{i=1}^{n} \alpha_i^{(k)} \, \tilde{k}(x, x_i)$$

Where:

  • $\alpha^{(k)}$ is the $k$-th eigenvector of the centered kernel matrix $\tilde{K}$
  • $k(x, x_i)$ is the kernel between the test sample and each training sample
  • $\tilde{k}(x, x_i)$ handles kernel centering for the test data, as sketched in the code below
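
Here is a minimal numpy sketch of this projection, assuming the concentric-circles data from earlier, a train/test split, and the same RBF kernel with gamma=15; it illustrates the formula rather than scikit-learn's internal code.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05)
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)
gamma, n = 15, X_train.shape[0]
# 1. Kernel matrix of the training data, centered
K = rbf_kernel(X_train, gamma=gamma)                 # K_ij = k(x_i, x_j)
one_n = np.ones((n, n)) / n
K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
# 2. Eigenvectors alpha^(k) of the centered kernel matrix, scaled so the
#    corresponding feature-space directions have unit length
eigvals, eigvecs = np.linalg.eigh(K_c)               # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # largest first
alphas = eigvecs[:, :2] / np.sqrt(eigvals[:2])
# 3. Kernel between each test sample and the training data, centered
#    consistently with the training kernel matrix
K_test = rbf_kernel(X_test, X_train, gamma=gamma)    # shape (m, n)
one_m = np.ones((K_test.shape[0], n)) / n
K_test_c = K_test - one_m @ K - K_test @ one_n + one_m @ K @ one_n
# 4. y_k(x) = sum_i alpha_i^(k) * k~(x, x_i), for the first two components
X_test_proj = K_test_c @ alphas

Up to per-component sign flips (eigenvectors are only defined up to sign), X_test_proj should agree with what kpca.transform(X_test) returns for the same kernel and split.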

💡 In Practice: Use transform() in Scikit-learn

If you’re using sklearn, you don’t need to code this manually. You can simply call .transform():

from sklearn.decomposition import KernelPCA
# X_train, X_test: a train/test split of the data
# (e.g., obtained with sklearn.model_selection.train_test_split)
# Fit kernel PCA on training data
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=15)
X_train_kpca = kpca.fit_transform(X_train)
# Project test samples with the same fitted model
X_test_kpca = kpca.transform(X_test)

This automatically:

  • Computes $k(x, x_i)$ between each test sample and every training sample
  • Applies kernel centering
  • Projects using the learned components

2. Maths Behind Kernel PCA

A step-by-step breakdown of the math behind Kernel PCA's eigenvalue problem, focusing on clarity and insight into how it works under the hood.

🔢 1. Goal of Kernel PCA

Just like classical PCA, the goal is to find the directions (principal components) that maximize variance. But in Kernel PCA, we do this in an implicit high-dimensional space where linear separation might be possible.

We use a nonlinear mapping:

$$\phi : \mathbb{R}^d \to \mathcal{F}, \qquad x \mapsto \phi(x)$$

…but we never compute $\phi(x)$ directly. Instead, we rely on the kernel trick.

🧮 2. Covariance Matrix in Feature Space

In PCA, you compute the covariance matrix of the (centered) data:

$$C = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top$$

In Kernel PCA, define the covariance matrix in feature space (assuming the $\phi(x_i)$ are centered):

$$\bar{C} = \frac{1}{n} \sum_{i=1}^{n} \phi(x_i) \, \phi(x_i)^\top$$

We want to solve the eigenvalue problem:

$$\bar{C} \, v = \lambda v$$

This is infeasible to solve directly because $\mathcal{F}$ may be very high-dimensional or even infinite-dimensional.

🎯 3. Representing $v$ in Terms of Training Data

We assume that the solution vector $v$ lies in the span of the mapped training points $\phi(x_1), \dots, \phi(x_n)$:

$$v = \sum_{j=1}^{n} \alpha_j \, \phi(x_j)$$

So the eigenproblem $\bar{C} v = \lambda v$ becomes:

$$\bar{C} \sum_{j=1}^{n} \alpha_j \, \phi(x_j) = \lambda \sum_{j=1}^{n} \alpha_j \, \phi(x_j)$$

Plug in the expression for $\bar{C}$:

$$\frac{1}{n} \sum_{i=1}^{n} \phi(x_i) \, \phi(x_i)^\top \sum_{j=1}^{n} \alpha_j \, \phi(x_j) = \lambda \sum_{j=1}^{n} \alpha_j \, \phi(x_j)$$

Interchange sums:

$$\frac{1}{n} \sum_{i=1}^{n} \phi(x_i) \sum_{j=1}^{n} \alpha_j \, \phi(x_i)^\top \phi(x_j) = \lambda \sum_{j=1}^{n} \alpha_j \, \phi(x_j)$$

Now define the kernel matrix:

$$K_{ij} = \phi(x_i)^\top \phi(x_j) = k(x_i, x_j)$$

So you get:

$$\frac{1}{n} \sum_{i=1}^{n} (K\alpha)_i \, \phi(x_i) = \lambda \sum_{i=1}^{n} \alpha_i \, \phi(x_i)$$

Now, if the $\phi(x_i)$ are linearly independent, we can equate coefficients:

$$\frac{1}{n} (K\alpha)_i = \lambda \, \alpha_i \qquad \text{for } i = 1, \dots, n$$

✅ 4. Final Eigenvalue Problem

You solve:

$$K \alpha = n \lambda \, \alpha$$

This is an eigenvalue problem involving only the $n \times n$ kernel matrix $K$; its eigenvectors $\alpha^{(k)}$ give the expansion coefficients of the principal components in feature space.
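
As a numerical sanity check of this reduction, the sketch below uses the degree-2 polynomial kernel, whose feature map is explicit and three-dimensional, and confirms that the nonzero eigenvalues of the feature-space covariance matrix (built from centered features, matching the kernel centering described in the next step) coincide with the eigenvalues of the centered kernel matrix divided by $n$; the data here is random and purely illustrative.

import numpy as np
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
n = X.shape[0]
# Explicit feature map of the degree-2 polynomial kernel k(x, z) = (x . z)^2
Phi = np.column_stack([X[:, 0]**2, np.sqrt(2) * X[:, 0] * X[:, 1], X[:, 1]**2])
Phi_c = Phi - Phi.mean(axis=0)                     # center in feature space
# Eigenvalues of the feature-space covariance matrix C
C = Phi_c.T @ Phi_c / n
cov_eig = np.sort(np.linalg.eigvalsh(C))[::-1]
# Eigenvalues of the centered kernel matrix, divided by n
K = (X @ X.T) ** 2
one_n = np.ones((n, n)) / n
K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
ker_eig = np.sort(np.linalg.eigvalsh(K_c))[::-1][:3] / n
print(np.allclose(cov_eig, ker_eig))               # True: nonzero eigenvalues agree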

🔄 5. Centering the Kernel Matrix

Just like the data needs to be zero-centered in PCA, the kernel matrix $K$ needs to be centered:

$$\tilde{K} = K - \mathbf{1}_n K - K \mathbf{1}_n + \mathbf{1}_n K \mathbf{1}_n$$

where $\mathbf{1}_n$ is an $n \times n$ matrix with all entries equal to $1/n$.
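
In code, this centering is a couple of matrix products, and it agrees with scikit-learn's KernelCenterer; the RBF kernel on the circles data is used here just as an example.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import KernelCenterer
X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05)
K = rbf_kernel(X, gamma=15)
n = K.shape[0]
# 1_n is the n x n matrix with every entry equal to 1/n
one_n = np.ones((n, n)) / n
K_tilde = K - one_n @ K - K @ one_n + one_n @ K @ one_n
print(np.allclose(K_tilde, KernelCenterer().fit_transform(K)))  # True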

📈 6. Projecting New Data Points

To project a new point $x$ onto the $k$-th principal component:

$$y_k(x) = \sum_{i=1}^{n} \alpha_i^{(k)} \, k(x, x_i)$$

(after the centering adjustment is applied to both $K$ and the test kernel values)

🧠 Summary of Steps

  1. Compute the kernel matrix $K$ using a kernel like RBF, polynomial, etc.
  2. Center $K$ to obtain $\tilde{K}$.
  3. Solve the eigenvalue problem $\tilde{K} \alpha = n \lambda \, \alpha$.
  4. Normalize each $\alpha^{(k)}$ so that the corresponding eigenvector has unit length in feature space.
  5. To project a point, take the dot product of its (centered) kernel vector with $\alpha^{(k)}$.