Class RandomProjection
- All Implemented Interfaces:
Serializable, Function<Tuple,Tuple>, Transform
For the particular case of mixtures of Gaussians, random projection can reduce the dimension of the data drastically. In fact, the data can be mapped into just d = O(log k) dimensions, where k is the number of Gaussians, so the amount of data needed is only polynomial in k. This projected dimension is independent of both the number of data points and their original dimension. Experiments show that a value of log k works nicely.
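As a rough illustration of how the projected dimension scales, the following sketch picks d from k alone. The class name TargetDimension, the constant c, and the use of the natural logarithm are all assumptions made for this example; the text above only states d = O(log k).

```java
/** Illustrative helper: a projected dimension that depends on k only. */
class TargetDimension {
    /**
     * Returns ceil(c * ln k), but never less than 1.
     * Both the constant c and the logarithm base are assumptions.
     */
    static int of(int k, double c) {
        if (k < 1 || c <= 0.0) {
            throw new IllegalArgumentException("k must be >= 1 and c must be > 0");
        }
        return Math.max(1, (int) Math.ceil(c * Math.log(k)));
    }

    public static void main(String[] args) {
        // Neither the number of samples nor the original dimension appears here.
        for (int k : new int[] {2, 10, 100, 1000}) {
            System.out.println("k = " + k + " -> d = " + of(k, 1.0));
        }
    }
}
```

Because d grows only logarithmically, going from 10 to 1000 mixture components changes the suggested dimension by a handful of coordinates.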
Moreover, even if the original clusters are highly eccentric (that is, far from spherical), random projection makes them more spherical. Eccentric clusters are problematic for the EM algorithm because intermediate covariance matrices may become singular or nearly singular. Also note that in sufficiently high dimension, almost the entire mass of a Gaussian distribution lies in a thin shell.
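The thin-shell remark can be checked numerically. The sketch below is a self-contained illustration, not part of the library; the class name ThinShell and its method are hypothetical. It draws samples from a standard Gaussian in n dimensions and reports the mean and standard deviation of the norm divided by sqrt(n).

```java
import java.util.Random;

/** Illustrative check that Gaussian samples concentrate near radius sqrt(n). */
class ThinShell {
    /**
     * Returns {mean, standard deviation} of ||x|| / sqrt(n)
     * over the given number of draws x ~ N(0, I_n).
     */
    static double[] normalizedNormStats(int n, int samples, long seed) {
        Random rng = new Random(seed);
        double sum = 0.0;
        double sumSq = 0.0;
        for (int s = 0; s < samples; s++) {
            double sq = 0.0;
            for (int i = 0; i < n; i++) {
                double g = rng.nextGaussian();
                sq += g * g;
            }
            double r = Math.sqrt(sq / n); // norm rescaled by sqrt(n)
            sum += r;
            sumSq += r * r;
        }
        double mean = sum / samples;
        double var = Math.max(0.0, sumSq / samples - mean * mean);
        return new double[] {mean, Math.sqrt(var)};
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 50, 1000}) {
            double[] stats = normalizedNormStats(n, 2000, 42L);
            System.out.println("n = " + n + ": mean = " + stats[0] + ", sd = " + stats[1]);
        }
    }
}
```

As n grows, the rescaled norm stays near 1 while its spread shrinks, which is what "thin shell" means here.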
References
- S. Dasgupta. Experiments with random projection. UAI, 2000.
- D. Achlioptas. Database-friendly random projections. 2001.
- C. Hegde, M. Wakin, and R. Baraniuk. Random projections for manifold learning. NIPS, 2007.
Field Summary
Fields inherited from class smile.feature.extraction.Projection
columns, projection, schema
Constructor Summary
- RandomProjection: Constructor.
Method Summary
- static RandomProjection of: Generates a non-sparse random projection.
- static RandomProjection sparse: Generates a sparse random projection.
Methods inherited from class smile.feature.extraction.Projection
apply, apply, apply, apply, postprocess, preprocess
Constructor Details
RandomProjection
Constructor.
- Parameters:
projection - the projection matrix.
columns - the columns to transform when applied on Tuple/DataFrame.
Method Details
of
Generates a non-sparse random projection.
- Parameters:
n - the dimension of input space.
p - the dimension of feature space.
columns - the columns to transform when applied on Tuple/DataFrame.
- Returns:
the model.
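To make the idea of a non-sparse projection concrete, here is a minimal standalone sketch. It is not this class's implementation: the class DenseProjectionSketch, its methods, and the choice of i.i.d. N(0, 1/p) entries are assumptions made for illustration, and the library may construct or post-process its matrix differently.

```java
import java.util.Random;

/** Illustrative dense random projection; not the library's implementation. */
class DenseProjectionSketch {
    /** Builds a p-by-n matrix whose entries are i.i.d. N(0, 1/p). */
    static double[][] gaussian(int n, int p, long seed) {
        Random rng = new Random(seed);
        double scale = 1.0 / Math.sqrt(p);
        double[][] w = new double[p][n];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < n; j++) {
                w[i][j] = scale * rng.nextGaussian();
            }
        }
        return w;
    }

    /** Maps x from the n-dimensional input space to the p-dimensional feature space. */
    static double[] apply(double[][] w, double[] x) {
        double[] y = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double s = 0.0;
            for (int j = 0; j < x.length; j++) {
                s += w[i][j] * x[j];
            }
            y[i] = s;
        }
        return y;
    }

    /** Euclidean norm. */
    static double norm(double[] v) {
        double s = 0.0;
        for (double a : v) {
            s += a * a;
        }
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        int n = 1000;
        int p = 200;
        double[][] w = gaussian(n, p, 42L);
        Random rng = new Random(1L);
        double[] x = new double[n];
        for (int j = 0; j < n; j++) {
            x[j] = rng.nextGaussian();
        }
        double[] y = apply(w, x);
        // With N(0, 1/p) entries the squared norm is preserved in expectation,
        // so this ratio should be close to 1 (the exact value depends on the seed).
        System.out.println("norm ratio = " + norm(y) / norm(x));
    }
}
```

Here n and p play the same roles as the parameters above: n is the input dimension and p is the dimension of the projected feature space.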
sparse
Generates a sparse random projection.
- Parameters:
n - the dimension of input space.
p - the dimension of feature space.
columns - the columns to transform when applied on Tuple/DataFrame.
- Returns:
the model.
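One well-known way to build a sparse projection is the scheme from the Achlioptas reference listed above, where each entry is +s, 0, or -s with probabilities 1/6, 2/3, and 1/6. The sketch below illustrates that scheme in plain Java. Whether this method uses exactly these probabilities and this scaling is an assumption, and the class SparseProjectionSketch and the scale s = sqrt(3/p) are introduced only for this example.

```java
import java.util.Random;

/** Illustrative Achlioptas-style sparse projection; not the library's implementation. */
class SparseProjectionSketch {
    /** Scale that gives each entry variance 1/p. */
    static double scale(int p) {
        return Math.sqrt(3.0 / p);
    }

    /** Builds a p-by-n matrix with entries +s (1/6), -s (1/6), or 0 (2/3). */
    static double[][] sparse(int n, int p, long seed) {
        Random rng = new Random(seed);
        double s = scale(p);
        double[][] w = new double[p][n];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < n; j++) {
                double u = rng.nextDouble();
                if (u < 1.0 / 6.0) {
                    w[i][j] = s;
                } else if (u < 2.0 / 6.0) {
                    w[i][j] = -s;
                } // otherwise the entry stays 0.0
            }
        }
        return w;
    }

    /** Projects x, skipping zero entries of the matrix. */
    static double[] apply(double[][] w, double[] x) {
        double[] y = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < x.length; j++) {
                if (w[i][j] != 0.0) {
                    sum += w[i][j] * x[j];
                }
            }
            y[i] = sum;
        }
        return y;
    }

    /** Fraction of entries that are exactly zero. */
    static double zeroFraction(double[][] w) {
        long zeros = 0;
        long total = 0;
        for (double[] row : w) {
            for (double v : row) {
                if (v == 0.0) {
                    zeros++;
                }
                total++;
            }
        }
        return (double) zeros / total;
    }

    public static void main(String[] args) {
        double[][] w = sparse(1000, 200, 7L);
        // Roughly two thirds of the entries should be zero.
        System.out.println("zero fraction = " + zeroFraction(w));
    }
}
```

Since about two thirds of the entries are zero and the rest share one magnitude, applying the projection needs only additions and subtractions followed by a single scaling, which is the "database-friendly" property the reference title alludes to.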