smile.clustering.MEC<T>

Type Parameters: T - the data type of model input objects.

All Implemented Interfaces: Serializable, Comparable<MEC<T>>

public class MEC<T> extends Partitioning implements Comparable<MEC<T>>

Nonparametric Minimum Conditional Entropy (MEC) clustering. The method works well, especially when the exact number of clusters is unknown, and it can reveal the structure of the data while identifying outliers at the same time.

The clustering criterion is the conditional entropy H(C | X), where C is the cluster label and X is an observation. By Fano's inequality, C can be estimated from X with a low probability of error only if H(C | X) is small. MEC also generalizes the criterion by replacing Shannon's entropy with Havrda-Charvat's structural α-entropy. Interestingly, for α = 2 the minimum entropy criterion based on structural α-entropy equals the (asymptotic) probability of error of the nearest-neighbor rule. To estimate p(C | x), MEC uses Parzen density estimation, a nonparametric approach.
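To make the criterion concrete, here is a minimal Java sketch that estimates p(C | x) with a uniform Parzen window (every point within distance r of x gets equal weight) and averages the resulting entropy over the data. The Euclidean distance, the uniform kernel, and the class ConditionalEntropySketch with its methods are assumptions for illustration; this is not Smile's implementation. The quadratic variant 1 - Σ p(c | x)^2 is the α = 2 case written with the normalization under which it matches the nearest-neighbor error term; other normalizations of the Havrda-Charvat entropy differ from it by a constant factor.

```java
/**
 * Sketch: average conditional entropy H(C | X) of a labeling, with p(c | x)
 * estimated by a uniform Parzen window of radius r. Euclidean distance and all
 * names here are assumptions for illustration, not Smile's implementation.
 */
public class ConditionalEntropySketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Parzen estimate of p(c | x_i): share of points within radius r (x_i included) labeled c. */
    static double[] posterior(double[][] x, int[] y, int k, int i, double r) {
        double[] p = new double[k];
        int count = 0;
        for (int j = 0; j < x.length; j++) {
            if (distance(x[i], x[j]) <= r) {
                p[y[j]]++;
                count++;
            }
        }
        // count >= 1 because x_i is always within distance 0 of itself
        for (int c = 0; c < k; c++) {
            p[c] /= count;
        }
        return p;
    }

    /** Shannon entropy -sum p log p, using the natural logarithm. */
    static double shannon(double[] p) {
        double h = 0.0;
        for (double pc : p) {
            if (pc > 0) {
                h -= pc * Math.log(pc);
            }
        }
        return h;
    }

    /** Quadratic (alpha = 2) entropy 1 - sum p^2, the nearest-neighbor error term. */
    static double quadratic(double[] p) {
        double s = 0.0;
        for (double pc : p) {
            s += pc * pc;
        }
        return 1.0 - s;
    }

    /** Average of the per-point entropy of p(C | x_i) over all observations. */
    static double conditionalEntropy(double[][] x, int[] y, int k, double r, boolean useShannon) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            double[] p = posterior(x, y, k, i, r);
            total += useShannon ? shannon(p) : quadratic(p);
        }
        return total / x.length;
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        int[] separated = {0, 0, 0, 1, 1, 1}; // labels follow the two groups
        int[] mixed = {0, 1, 0, 1, 0, 1};     // labels cut across the groups
        System.out.println(conditionalEntropy(x, separated, 2, 2.0, true)); // 0.0
        System.out.println(conditionalEntropy(x, mixed, 2, 2.0, true));     // a positive value
    }
}
```

A labeling that follows the two well-separated groups leaves every neighborhood pure, so the estimated H(C | X) is zero; a labeling that cuts across the groups makes every neighborhood mixed and the entropy rises.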

MEC is an iterative algorithm that starts from an initial partition produced by another clustering method, e.g. k-means, CLARANS, or hierarchical clustering. Note that random initialization is NOT appropriate.
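The iterative scheme can be sketched as a greedy local search: starting from a given partition, each point is moved to another cluster only when the move lowers the average conditional entropy, and sweeps repeat until nothing changes. This is a simplified illustration under assumed choices (Euclidean distance, uniform Parzen window of radius r, Shannon entropy, full recomputation of the criterion for every candidate move); the class GreedyEntropyRefinement and its methods are hypothetical, and it does not reproduce Smile's actual update rules.

```java
import java.util.Arrays;

/**
 * Sketch of MEC-style refinement: move a point to another cluster only if
 * that lowers the average conditional entropy. Simplified and hypothetical;
 * not Smile's implementation.
 */
public class GreedyEntropyRefinement {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Average Shannon conditional entropy with a uniform window of radius r. */
    static double entropy(double[][] x, int[] y, int k, double r) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            double[] counts = new double[k];
            int n = 0;
            for (int j = 0; j < x.length; j++) {
                if (distance(x[i], x[j]) <= r) {
                    counts[y[j]]++;
                    n++;
                }
            }
            for (double cnt : counts) {
                if (cnt > 0) {
                    double q = cnt / n; // Parzen estimate of p(c | x_i)
                    total -= q * Math.log(q);
                }
            }
        }
        return total / x.length;
    }

    /** Greedy sweeps until no single-point move lowers the entropy, or maxIter is hit. */
    static int[] refine(double[][] x, int[] initial, int k, double r, int maxIter) {
        int[] y = initial.clone();
        double h = entropy(x, y, k, r);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean moved = false;
            for (int i = 0; i < x.length; i++) {
                int original = y[i];
                int best = original;
                double bestH = h;
                for (int c = 0; c < k; c++) {
                    if (c == original) continue;
                    y[i] = c; // tentatively move point i to cluster c
                    double hc = entropy(x, y, k, r);
                    if (hc < bestH - 1e-12) { // small tolerance against rounding noise
                        best = c;
                        bestH = hc;
                    }
                }
                y[i] = best; // keep the best label found (possibly the original)
                if (best != original) {
                    h = bestH;
                    moved = true;
                }
            }
            if (!moved) break;
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        int[] initial = {0, 0, 1, 1, 1, 1}; // one point starts in the wrong cluster
        int[] y = refine(x, initial, 2, 2.0, 10);
        System.out.println(Arrays.toString(y)); // [0, 0, 0, 1, 1, 1]
    }
}
```

In the example, the mislabeled point is corrected in the first sweep. Under this criterion the trivial labeling that puts every point in one cluster also has zero conditional entropy, so the outcome of such a local search depends heavily on where it starts, which is in line with the advice to initialize from another clustering method.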

Nested Class Summary

Nested Classes

Modifier and Type | Class | Description
static final record | MEC.Options | MEC hyperparameters.

Constructor Summary

Constructors

Constructor | Description
MEC(int k, int[] group, double entropy, double radius, RNNSearch<T,T> nns) | Constructor.

Method Summary

Modifier and Type | Method | Description
int | compareTo(MEC<T> o) |
double | entropy() | Returns the conditional entropy of clusters.
static <T> MEC<T> | fit(T[] data, Distance<T> distance, int k, double radius) | Clusters the data.
static <T> MEC<T> | fit(T[] data, RNNSearch<T,T> nns, int[] group, MEC.Options options) | Clusters the data.
int | predict(T x) | Clusters a new instance.
double | radius() | Returns the neighborhood radius.
String | toString() |

Methods inherited from class smile.clustering.Partitioning
group, group, k, size, size

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Constructor Details
- MEC
  
  public MEC(int k, int[] group, double entropy, double radius, RNNSearch<T,T> nns)
  
  Constructor.
  
  Parameters:
  
  k - the number of clusters.
  
  group - the cluster labels.
  
  entropy - the conditional entropy of clusters.
  
  radius - the neighborhood radius.
  
  nns - the data structure for neighborhood search.
Method Details
- entropy
  
  public double entropy()
  
  Returns the conditional entropy of clusters.
  
  Returns:
  
  the conditional entropy of clusters.
- radius
  
  public double radius()
  
  Returns the neighborhood radius.
  
  Returns:
  
  the neighborhood radius.
- compareTo
  
  public int compareTo(MEC<T> o)
  
  Specified by:
  
  compareTo in interface Comparable<MEC<T>>
- fit
  
  public static <T> MEC<T> fit(T[] data, Distance<T> distance, int k, double radius)
  
  Clusters the data.
  
  Type Parameters:
  
  T - the data type.
  
  Parameters:
  
  data - the observations.
  
  distance - the distance function.
  
  k - the number of clusters. Note that this is just a hint; the final number of clusters may be fewer.
  
  radius - the neighborhood radius.
  
  Returns:
  
  the model.
- fit
  
  public static <T> MEC<T> fit(T[] data, RNNSearch<T,T> nns, int[] group, MEC.Options options)
  
  Clusters the data.
  
  Type Parameters:
  
  T - the data type.
  
  Parameters:
  
  data - the observations.
  
  nns - the neighborhood search data structure.
  
  group - the initial clustering assignment.
  
  options - the hyperparameters.
  
  Returns:
  
  the model.
- predict
  
  public int predict(T x)
  
  Clusters a new instance.
  
  Parameters:
  
  x - a new instance.
  
  Returns:
  
  the cluster label. Note that it may be Clustering.OUTLIER.
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Partitioning
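The predict contract documented above (a cluster label that may be Clustering.OUTLIER) fits a radius-based assignment rule. The following is a minimal sketch, assuming Euclidean distance and a majority vote over the labeled training points within the neighborhood radius; the rule itself, the class RadiusPredictSketch, and its OUTLIER constant are assumptions for illustration, not Smile's code.

```java
/**
 * Sketch: assign a new point by majority vote among training points within
 * the neighborhood radius, or return an outlier marker if there are none.
 * The rule, the Euclidean distance and all names are assumptions.
 */
public class RadiusPredictSketch {

    /** Hypothetical stand-in for Clustering.OUTLIER; the value is arbitrary here. */
    static final int OUTLIER = Integer.MAX_VALUE;

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Majority label among training points within radius of x, or OUTLIER. */
    static int predict(double[][] data, int[] y, int k, double radius, double[] x) {
        int[] votes = new int[k];
        int found = 0;
        for (int j = 0; j < data.length; j++) {
            if (distance(x, data[j]) <= radius) {
                votes[y[j]]++;
                found++;
            }
        }
        if (found == 0) {
            return OUTLIER; // no training point in the neighborhood
        }
        int best = 0;
        for (int c = 1; c < k; c++) {
            if (votes[c] > votes[best]) {
                best = c;
            }
        }
        return best; // ties go to the smaller label
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        int[] y = {0, 0, 0, 1, 1, 1};
        System.out.println(predict(data, y, 2, 2.0, new double[]{0.5, 0.5}));         // 0
        System.out.println(predict(data, y, 2, 2.0, new double[]{10.5, 10.5}));       // 1
        System.out.println(predict(data, y, 2, 2.0, new double[]{5, 5}) == OUTLIER);  // true
    }
}
```

The vote only consults training points inside the radius, so a point far from every cluster receives no label rather than being forced into the nearest one; this is one way a radius-based method can flag outliers.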
