kpca

fun <T> kpca(data: DataFrame, kernel: MercerKernel<DoubleArray>, k: Int, threshold: Double = 1.0E-4): KernelPCA

Kernel principal component analysis. Kernel PCA extends principal component analysis (PCA) with kernel methods: the linear operations of PCA are carried out in a reproducing kernel Hilbert space, into which the data are implicitly mapped by the non-linear feature map that the kernel defines.
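
As a rough illustration of the first step of this procedure, the sketch below builds a kernel (Gram) matrix with a Gaussian kernel and centers it in feature space. It uses only the Kotlin standard library; the function names `gaussianKernel`, `gramMatrix`, and `centerGram` are hypothetical helpers for this example and are not part of the library documented here.

```kotlin
import kotlin.math.exp

// Gaussian (RBF) kernel: K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
fun gaussianKernel(x: DoubleArray, y: DoubleArray, sigma: Double): Double {
    var squaredDistance = 0.0
    for (i in x.indices) {
        val d = x[i] - y[i]
        squaredDistance += d * d
    }
    return exp(-squaredDistance / (2.0 * sigma * sigma))
}

// Gram matrix with entries K[i][j] = kernel(data[i], data[j]).
fun gramMatrix(
    data: Array<DoubleArray>,
    kernel: (DoubleArray, DoubleArray) -> Double
): Array<DoubleArray> =
    Array(data.size) { i -> DoubleArray(data.size) { j -> kernel(data[i], data[j]) } }

// Double centering: K'[i][j] = K[i][j] - rowMean[i] - rowMean[j] + totalMean.
// This corresponds to subtracting the mean of the mapped points in feature space.
// The kernel matrix is assumed symmetric, so column means equal row means.
fun centerGram(k: Array<DoubleArray>): Array<DoubleArray> {
    val n = k.size
    val rowMeans = DoubleArray(n) { i -> k[i].sum() / n }
    val totalMean = rowMeans.sum() / n
    return Array(n) { i ->
        DoubleArray(n) { j -> k[i][j] - rowMeans[i] - rowMeans[j] + totalMean }
    }
}
```

The principal components are then obtained from the eigenvectors of the centered matrix. After centering, every row and column of the matrix sums to zero, which is a quick sanity check.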

In practice, a data set of n samples yields an n-by-n kernel (Gram) matrix K, so storing K may become a problem when n is large. One way to deal with this is to cluster the data set and build the kernel matrix from the cluster means instead of the individual samples. Since even this may yield a relatively large K, it is common to compute only the top P eigenvalues and eigenvectors of K.
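
One simple way to obtain only the leading eigenpairs is power iteration with deflation, sketched below. This is a swapped-in textbook technique chosen for brevity, not necessarily what the library does internally; the names `multiply`, `norm`, and `topEigenpairs`, the start vector, and the iteration count are all assumptions of this example. The sketch also assumes a symmetric positive semi-definite matrix (as a centered kernel matrix is) with distinct leading eigenvalues.

```kotlin
import kotlin.math.sqrt

// Multiply a square matrix by a vector.
fun multiply(a: Array<DoubleArray>, v: DoubleArray): DoubleArray {
    val out = DoubleArray(a.size)
    for (i in a.indices) {
        var s = 0.0
        for (j in v.indices) s += a[i][j] * v[j]
        out[i] = s
    }
    return out
}

// Euclidean norm of a vector.
fun norm(v: DoubleArray): Double {
    var s = 0.0
    for (x in v) s += x * x
    return sqrt(s)
}

// Top p (eigenvalue, eigenvector) pairs of a symmetric PSD matrix,
// in decreasing order of eigenvalue.
fun topEigenpairs(
    matrix: Array<DoubleArray>,
    p: Int,
    iterations: Int = 500
): List<Pair<Double, DoubleArray>> {
    val n = matrix.size
    val a = Array(n) { i -> matrix[i].copyOf() } // working copy that gets deflated
    val pairs = mutableListOf<Pair<Double, DoubleArray>>()
    for (component in 0 until p) {
        // Deterministic start vector (an arbitrary choice for this sketch).
        val v = DoubleArray(n) { i -> 1.0 + i + component }
        val length = norm(v)
        for (i in 0 until n) v[i] /= length
        var eigenvalue = 0.0
        for (step in 0 until iterations) {
            val w = multiply(a, v)
            eigenvalue = norm(w) // for unit v and PSD a, converges to the largest eigenvalue
            if (eigenvalue == 0.0) break
            for (i in 0 until n) v[i] = w[i] / eigenvalue
        }
        pairs.add(eigenvalue to v)
        // Deflation: remove the component just found so the next pass finds the next one.
        for (i in 0 until n) {
            for (j in 0 until n) a[i][j] -= eigenvalue * v[i] * v[j]
        }
    }
    return pairs
}
```

Each pass costs a handful of matrix-vector products, so only P passes are needed rather than a full eigendecomposition. Production code would normally use a Lanczos-type solver with a convergence test instead of a fixed iteration count.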

Kernel PCA with an isotropic kernel function is closely related to metric MDS. Carrying out metric MDS on the kernel matrix K produces a configuration of points equivalent to the one obtained from the distances (2(1 - K(x_i, x_j)))^{1/2} computed in feature space.
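
The distance above follows from the general identity ||phi(x) - phi(y)||^2 = K(x, x) + K(y, y) - 2 K(x, y), which reduces to 2(1 - K(x, y)) when K(x, x) = 1 for every x, as is the case for the Gaussian kernel. The sketch below checks this numerically; `rbf`, `featureSpaceDistance`, and `isotropicDistance` are hypothetical names introduced for this example only.

```kotlin
import kotlin.math.exp
import kotlin.math.sqrt

// Gaussian kernel, an isotropic kernel with K(x, x) = 1.
fun rbf(x: DoubleArray, y: DoubleArray, sigma: Double): Double {
    var squaredDistance = 0.0
    for (i in x.indices) {
        val d = x[i] - y[i]
        squaredDistance += d * d
    }
    return exp(-squaredDistance / (2.0 * sigma * sigma))
}

// General feature-space distance: sqrt(K(x, x) + K(y, y) - 2 K(x, y)).
// maxOf guards against a tiny negative value caused by rounding.
fun featureSpaceDistance(
    x: DoubleArray,
    y: DoubleArray,
    kernel: (DoubleArray, DoubleArray) -> Double
): Double = sqrt(maxOf(0.0, kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)))

// Shortcut valid only when K(x, x) = 1 for all x: sqrt(2 (1 - K(x, y))).
fun isotropicDistance(kxy: Double): Double = sqrt(maxOf(0.0, 2.0 * (1.0 - kxy)))
```

For a kernel normalized in this way the two functions agree, so the kernel matrix alone determines all pairwise feature-space distances.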

Kernel PCA also has close connections with Isomap, LLE, and Laplacian eigenmaps.

References

  • Bernhard Schölkopf, Alexander Smola, and Klaus-Robert Müller. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 1998.

Parameters

data

training data.

kernel

the Mercer kernel used to compute the kernel matrix.

k

the number of top principal components to use for projection.

threshold

only principal components whose eigenvalues are larger than this threshold are kept. Defaults to 1.0E-4.