Package smile.clustering
Class PartitionClustering
java.lang.Object
smile.clustering.PartitionClustering
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
CentroidClustering, DBSCAN, DENCLUE, MEC, SpectralClustering
Partition clustering. Partition methods classify the observations
into distinct non-overlapping groups.
Field Summary
- static final int OUTLIER: Cluster label for outliers or noises.
- final int k: The number of clusters.
- final int[] y: The cluster labels of data.
- final int[] size: The number of observations in each cluster.
Constructor Summary
- PartitionClustering(int k, int[] y): Constructor.
Method Summary
- static <T extends PartitionClustering & Comparable<? super T>> T
run(int runs, Supplier<T> clustering): Runs a clustering algorithm multiple times and returns the best one (e.g. smallest distortion).
- static <T> double[]
seed(T[] data, T[] medoids, int[] y, ToDoubleBiFunction<T,T> distance): Initializes the cluster membership of the input objects with the K-Means++ algorithm.
- String
toString()
Field Details

OUTLIER
public static final int OUTLIER
Cluster label for outliers or noises.
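For instance, density-based subclasses such as DBSCAN assign this label to noise points. A minimal sketch, assuming DBSCAN.fit(double[][] data, int minPts, double radius) as in Smile 2.x; the data and parameter values are made up:

import smile.clustering.DBSCAN;
import smile.clustering.PartitionClustering;

public class OutlierExample {
    public static void main(String[] args) {
        double[][] x = {
            {0.0, 0.0}, {0.1, 0.1}, {0.2, 0.0},
            {5.0, 5.0}, {5.1, 4.9}, {20.0, 20.0}   // the last point is isolated
        };
        DBSCAN<double[]> model = DBSCAN.fit(x, 3, 1.0);
        for (int i = 0; i < model.y.length; i++) {
            if (model.y[i] == PartitionClustering.OUTLIER) {
                System.out.println("observation " + i + " is noise");
            }
        }
    }
}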
k
public final int k
The number of clusters.

y
public final int[] y
The cluster labels of data.

size
public final int[] size
The number of observations in each cluster.
Constructor Details

PartitionClustering
public PartitionClustering(int k, int[] y)
Constructor.
- Parameters:
k - the number of clusters.
y - the cluster labels.
Method Details

toString
public String toString()

seed
public static <T> double[] seed(T[] data, T[] medoids, int[] y, ToDoubleBiFunction<T,T> distance)
Initializes the cluster membership of the input objects with the K-Means++ algorithm. Many clustering methods, e.g. k-means, need an initial clustering configuration as a seed. K-Means++ is based on the intuition of spreading the k initial cluster centers away from each other. The first cluster center is chosen uniformly at random from the data points being clustered; each subsequent center is then chosen from the remaining data points with probability proportional to the squared distance between the point and its closest existing cluster center.
The exact algorithm is as follows:
- Choose one center uniformly at random from among the data points.
- For each data point x, compute D(x), the distance between x and the nearest center that has already been chosen.
- Choose one new data point at random as a new center, using a weighted probability distribution where a point x is chosen with probability proportional to D²(x).
- Repeat Steps 2 and 3 until k centers have been chosen.
- Now that the initial centers have been chosen, proceed using standard k-means clustering.
References:
- D. Arthur and S. Vassilvitskii. "K-means++: The Advantages of Careful Seeding". ACM-SIAM Symposium on Discrete Algorithms, 1027-1035, 2007.
- A. D. Peterson, A. P. Ghosh, and R. Maitra. "A Systematic Evaluation of Different Methods for Initializing the K-means Clustering Algorithm". 2010.
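To make the D²(x) weighting in the steps above concrete, here is an illustrative standalone sketch of the seeding loop. It is not the library's implementation; Euclidean distance, the class and method names, and returning center indices are all choices made for this example:

import java.util.Arrays;
import java.util.Random;

public class KMeansPlusPlusSketch {
    // Returns the indices of k centers chosen by k-means++ seeding.
    public static int[] seedIndices(double[][] data, int k, Random rng) {
        int n = data.length;
        int[] centers = new int[k];
        double[] d2 = new double[n];               // D(x)^2: squared distance to the nearest chosen center
        Arrays.fill(d2, Double.POSITIVE_INFINITY);
        centers[0] = rng.nextInt(n);               // step 1: first center uniformly at random

        for (int c = 1; c < k; c++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {          // step 2: refresh D(x)^2 against the latest center
                double dist2 = 0.0;
                for (int j = 0; j < data[i].length; j++) {
                    double diff = data[i][j] - data[centers[c - 1]][j];
                    dist2 += diff * diff;
                }
                d2[i] = Math.min(d2[i], dist2);
                sum += d2[i];
            }
            double r = rng.nextDouble() * sum;     // step 3: sample proportional to D(x)^2
            int next = 0;
            double acc = d2[0];
            while (acc <= r && next < n - 1) {
                next++;
                acc += d2[next];
            }
            centers[c] = next;                     // already-chosen points have weight 0
        }
        return centers;                            // step 4 is the surrounding loop; step 5 is ordinary k-means
    }
}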
- Type Parameters:
T - the type of input object.
- Parameters:
data - the data objects, an array of size n.
medoids - an array of size k to store the cluster medoids on output.
y - an array of size n to store the cluster labels on output.
distance - the distance function.
- Returns:
an array of size n storing the distance of each observation to its nearest medoid.
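A minimal usage sketch of seed, with made-up data and a hand-rolled Euclidean distance lambda:

import java.util.Arrays;
import java.util.function.ToDoubleBiFunction;
import smile.clustering.PartitionClustering;

public class SeedExample {
    public static void main(String[] args) {
        double[][] data = {
            {0.0, 0.1}, {0.2, 0.0}, {9.0, 9.1},
            {9.2, 8.9}, {5.0, 5.1}, {4.9, 5.2}
        };
        int k = 3;
        double[][] medoids = new double[k][];   // filled with the chosen medoids on output
        int[] y = new int[data.length];         // filled with the cluster labels on output
        ToDoubleBiFunction<double[], double[]> distance = (a, b) -> {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += d * d;
            }
            return Math.sqrt(sum);
        };
        double[] nearest = PartitionClustering.seed(data, medoids, y, distance);
        System.out.println("labels:    " + Arrays.toString(y));
        System.out.println("distances: " + Arrays.toString(nearest));
    }
}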
run
public static <T extends PartitionClustering & Comparable<? super T>> T run(int runs, Supplier<T> clustering)
Runs a clustering algorithm multiple times and returns the best one (e.g. the one with the smallest distortion).
- Type Parameters:
T - the data type.
- Parameters:
runs - the number of runs.
clustering - the clustering algorithm.
- Returns:
the best model.
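A usage sketch, assuming KMeans.fit(double[][] data, int k) as in Smile 2.x; the data and the number of runs are made up:

import java.util.Arrays;
import smile.clustering.KMeans;
import smile.clustering.PartitionClustering;

public class RunExample {
    public static void main(String[] args) {
        double[][] x = {
            {0.0, 0.1}, {0.2, 0.0}, {9.0, 9.1},
            {9.2, 8.9}, {5.0, 5.1}, {4.9, 5.2}
        };
        // Try 10 random restarts and keep the model with the smallest distortion.
        KMeans model = PartitionClustering.run(10, () -> KMeans.fit(x, 2));
        System.out.println("k = " + model.k);
        System.out.println("cluster sizes = " + Arrays.toString(model.size));
        System.out.println("labels = " + Arrays.toString(model.y));
    }
}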