Package smile.clustering
Class XMeans
java.lang.Object
smile.clustering.PartitionClustering
smile.clustering.CentroidClustering<double[],double[]>
smile.clustering.XMeans
- All Implemented Interfaces:
Serializable, Comparable<CentroidClustering<double[],double[]>>
X-Means clustering algorithm, an extension of K-Means that tries to
determine the number of clusters automatically based on BIC scores.
Starting with a single cluster, the X-Means algorithm steps in
after each run of K-Means and makes local decisions about which of the
current centroids should split in order to fit the data better.
Each splitting decision is made by computing the Bayesian Information
Criterion (BIC).
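To make the splitting decision concrete, the following is a minimal sketch of a BIC score for a set of centroids under a spherical Gaussian model, in the spirit of the Pelleg and Moore paper. It is not Smile's internal code: the class name `BicSketch`, the method names, and the exact form of the likelihood (one shared variance, estimated as the squared error divided by n - k) are assumptions made for illustration.

```java
// Illustrative sketch only; BicSketch is a hypothetical class, not part of Smile.
final class BicSketch {
    /** Squared Euclidean distance between two points of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * BIC = logLikelihood - (p / 2) * log(n); a larger value is better.
     * labels[i] is the index of the centroid assigned to data[i].
     * Assumes n > k and a non-zero squared error.
     */
    static double bic(double[][] data, double[][] centroids, int[] labels) {
        int n = data.length;
        int d = data[0].length;
        int k = centroids.length;

        int[] size = new int[k];
        double sse = 0.0;
        for (int i = 0; i < n; i++) {
            size[labels[i]]++;
            sse += squaredDistance(data[i], centroids[labels[i]]);
        }

        // Single variance shared by all clusters (assumed model).
        double variance = sse / (n - k);

        // Gaussian term plus the mixing-weight term size[c] * log(size[c] / n).
        double logLik = -0.5 * n * d * Math.log(2.0 * Math.PI * variance) - 0.5 * (n - k);
        for (int c = 0; c < k; c++) {
            if (size[c] > 0) {
                logLik += size[c] * Math.log((double) size[c] / n);
            }
        }

        // Free parameters: k - 1 mixing weights, k * d coordinates, 1 variance.
        double p = (k - 1) + (double) k * d + 1;
        return logLik - 0.5 * p * Math.log(n);
    }

    public static void main(String[] args) {
        double[][] data = {{0.0}, {0.1}, {-0.1}, {10.0}, {10.1}, {9.9}};
        double parent = bic(data, new double[][] {{5.0}}, new int[] {0, 0, 0, 0, 0, 0});
        double children = bic(data, new double[][] {{0.0}, {10.0}}, new int[] {0, 0, 0, 1, 1, 1});
        System.out.println("BIC with one centroid:  " + parent);
        System.out.println("BIC with two centroids: " + children);
        System.out.println(children > parent ? "split" : "keep");
    }
}
```

In this sketch a centroid would be split when the two-centroid model scores higher than the one-centroid model on the same points. The penalty term grows with the number of parameters, which is what discourages splitting clusters that are already well described by a single centroid.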
References
- Dan Pelleg and Andrew Moore. X-means: Extending K-means with Efficient Estimation of the Number of Clusters. ICML, 2000.
-
Field Summary
Fields inherited from class smile.clustering.CentroidClustering
centroids, distortion
Fields inherited from class smile.clustering.PartitionClustering
k, OUTLIER, size, y
-
Constructor Summary
Constructor | Description
XMeans(double distortion, double[][] centroids, int[] y) | Constructor.
-
Method Summary
Modifier and Type | Method | Description
protected double | distance(double[] x, double[] y) | The distance function.
static XMeans | fit(double[][] data, int kmax) | Clusters the data, with the number of clusters determined automatically by the X-Means algorithm.
static XMeans | fit(double[][] data, int kmax, int maxIter, double tol) | Clusters the data, with the number of clusters determined automatically by the X-Means algorithm.
Methods inherited from class smile.clustering.CentroidClustering
compareTo, predict, toString
Methods inherited from class smile.clustering.PartitionClustering
run, seed
-
Constructor Details
-
XMeans
public XMeans(double distortion, double[][] centroids, int[] y)
Constructor.
- Parameters:
distortion - the total distortion.
centroids - the centroids of each cluster.
y - the cluster labels.
-
-
Method Details
-
distance
protected double distance(double[] x, double[] y)
Description copied from class: CentroidClustering
The distance function.
- Specified by:
distance in class CentroidClustering<double[],double[]>
- Parameters:
x - an observation.
y - the other observation.
- Returns:
the distance.
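As an illustration of how a distance function over double[] observations is typically used in centroid-based clustering, here is a small sketch that assigns an observation to its nearest centroid. It assumes squared Euclidean distance, which is the usual choice for K-Means; this page does not state which metric XMeans uses, and the class `NearestCentroidSketch` and its methods are hypothetical names.

```java
// Illustrative sketch only; NearestCentroidSketch is a hypothetical class, not part of Smile.
final class NearestCentroidSketch {
    /** Squared Euclidean distance (an assumed metric, see the lead-in). */
    static double distance(double[] x, double[] y) {
        double sum = 0.0;
        for (int j = 0; j < x.length; j++) {
            double diff = x[j] - y[j];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns the index of the centroid closest to x; ties go to the lowest index. */
    static int nearest(double[][] centroids, double[] x) {
        int best = 0;
        double bestDistance = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = distance(centroids[c], x);
            if (dist < bestDistance) {
                bestDistance = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        System.out.println(nearest(centroids, new double[] {1.0, 2.0}));
        System.out.println(nearest(centroids, new double[] {9.0, 8.0}));
    }
}
```

Because only the ordering of distances matters when picking the nearest centroid, skipping the square root does not change the assignment.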
-
fit
public static XMeans fit(double[][] data, int kmax)
Clusters the data, with the number of clusters determined automatically by the X-Means algorithm.
- Parameters:
data - the input data, of which each row is an observation.
kmax - the maximum number of clusters.
- Returns:
the model.
-
fit
public static XMeans fit(double[][] data, int kmax, int maxIter, double tol)
Clusters the data, with the number of clusters determined automatically by the X-Means algorithm.
- Parameters:
data - the input data, of which each row is an observation.
kmax - the maximum number of clusters.
maxIter - the maximum number of iterations for k-means.
tol - the tolerance of the k-means convergence test.
- Returns:
the model.
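The roles of maxIter and tol can be sketched with a plain K-Means (Lloyd) loop. The version below is an assumption about how such parameters are commonly applied, not a copy of Smile's implementation: it stops when the distortion improves by no more than tol between iterations, or when maxIter iterations have run. The class `KMeansLoopSketch` and the method `lloyd` are hypothetical names.

```java
// Illustrative sketch only; KMeansLoopSketch is a hypothetical class, not part of Smile.
final class KMeansLoopSketch {
    /**
     * Runs Lloyd iterations starting from the given centroids, which are
     * updated in place. Returns the distortion (sum of squared distances)
     * measured at the last assignment step.
     */
    static double lloyd(double[][] data, double[][] centroids, int maxIter, double tol) {
        int n = data.length;
        int d = data[0].length;
        int k = centroids.length;
        int[] y = new int[n];
        double distortion = Double.MAX_VALUE;

        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            double current = 0.0;
            for (int i = 0; i < n; i++) {
                double best = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dist = 0.0;
                    for (int j = 0; j < d; j++) {
                        double diff = data[i][j] - centroids[c][j];
                        dist += diff * diff;
                    }
                    if (dist < best) {
                        best = dist;
                        y[i] = c;
                    }
                }
                current += best;
            }

            // Update step: move each non-empty centroid to the mean of its points.
            double[][] sum = new double[k][d];
            int[] size = new int[k];
            for (int i = 0; i < n; i++) {
                size[y[i]]++;
                for (int j = 0; j < d; j++) {
                    sum[y[i]][j] += data[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (size[c] > 0) {
                    for (int j = 0; j < d; j++) {
                        centroids[c][j] = sum[c][j] / size[c];
                    }
                }
            }

            // Convergence test: stop once the improvement is within tol.
            boolean converged = distortion - current <= tol;
            distortion = current;
            if (converged) {
                break;
            }
        }
        return distortion;
    }

    public static void main(String[] args) {
        double[][] data = {{0.0}, {1.0}, {9.0}, {10.0}};
        double[][] centroids = {{0.0}, {1.0}};
        double distortion = lloyd(data, centroids, 100, 1e-4);
        System.out.println("distortion = " + distortion);
        System.out.println("centroids  = " + centroids[0][0] + ", " + centroids[1][0]);
    }
}
```

A smaller tol or a larger maxIter lets each K-Means run refine the centroids further at the cost of more passes over the data; in X-Means that cost is paid again after every round of splits, up to the kmax limit.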
-