Package smile.neighbor
Class MPLSH<E>
java.lang.Object
smile.neighbor.LSH<E>
smile.neighbor.MPLSH<E>
- Type Parameters:
E
- the type of data objects in the hash table.
- All Implemented Interfaces:
Serializable
,KNNSearch<double[],
,E> RNNSearch<double[],
E>
Multi-Probe Locality-Sensitive Hashing. LSH is an efficient algorithm for
approximate nearest neighbor search in high dimensional spaces
by performing probabilistic dimension reduction of data. The basic idea
is to hash the input items so that similar items are mapped to the same
buckets with high probability (the number of buckets being much smaller
than the universe of possible input items). A drawback of LSH is the
requirement for a large number of hash tables in order to achieve good
search quality. Multi-probe LSH is designed to overcome this drawback.
Multi-probe LSH intelligently probes multiple buckets that are likely to
contain query results in a hash table.
By default, the query object (reference equality) is excluded from the neighborhood.
References
- Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. Multi-probe LSH: efficient indexing for high-dimensional similarity search. VLDB, 2007.
- Alexis Joly and Olivier Buisson. A posteriori multi-probe locality sensitive hashing. ACM international conference on Multimedia, 2008.
- See Also:
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionvoid
Fits the posteriori multiple probe algorithm.void
Fits the posteriori multiple probe algorithm.void
Train the posteriori multiple probe algorithm.protected void
initHashTable
(int d, int L, int k, double w, int H) Initialize the hash tables.nearest
(double[] q) Returns the nearest neighbor.nearest
(double[] q, double recall, int T) Returns the approximate nearest neighbor.void
Retrieves the neighbors in a fixed radius of query object, i.e.void
Search the neighbors in the given radius of query object, i.e.search
(double[] q, int k) Retrieves the k nearest neighbors to the query key.search
(double[] q, int k, double recall, int T) Returns the approximate k-nearest neighbors.toString()
-
Constructor Details
-
MPLSH
public MPLSH(int d, int L, int k, double w) Constructor.- Parameters:
d
- the dimensionality of data.L
- the number of hash tables.k
- the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w
- the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.
-
MPLSH
public MPLSH(int d, int L, int k, double w, int H) Constructor.- Parameters:
d
- the dimensionality of data.L
- the number of hash tables.k
- the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w
- the width of random projections. It should be sufficiently away from 0. But we should not choose an r value that is too large, which will increase the query time.H
- the number of buckets of hash tables.
-
-
Method Details
-
initHashTable
protected void initHashTable(int d, int L, int k, double w, int H) Description copied from class:LSH
Initialize the hash tables.- Overrides:
initHashTable
in classLSH<E>
- Parameters:
d
- the dimensionality of data.L
- the number of hash tables.k
- the number of random projection hash functions, which is usually set to log(N) where N is the dataset size.w
- the width of random projections. It should be sufficiently away from 0. But we should not choose a w value that is too large, which will increase the query time.H
- the size of universal hash tables.
-
toString
-
fit
Fits the posteriori multiple probe algorithm.- Parameters:
range
- the neighborhood search data structure.samples
- the training samples.radius
- the radius for range search.
-
fit
Fits the posteriori multiple probe algorithm.- Parameters:
range
- the neighborhood search data structure.samples
- the training samples.radius
- the radius for range search.Nz
- the number of quantized values.
-
fit
public void fit(RNNSearch<double[], double[]> range, double[][] samples, double radius, int Nz, double sigma) Train the posteriori multiple probe algorithm.- Parameters:
range
- the neighborhood search data structure.samples
- the training samples.radius
- the radius for range search.Nz
- the number of quantized values.sigma
- the Parzen window width.
-
nearest
Description copied from interface:KNNSearch
Returns the nearest neighbor. In machine learning, we often build a nearest neighbor search data structure, and then search with object in the same dataset. The object itself is of course the nearest one with distance 0. Since this is generally useless, we check the reference during the search and excludes the query object from the results. -
nearest
Returns the approximate nearest neighbor. A posteriori multiple probe model has to be trained already.- Parameters:
q
- the query object.recall
- the expected recall rate.T
- the maximum number of probes.- Returns:
- the approximate nearest neighbor.
-
search
Description copied from interface:KNNSearch
Retrieves the k nearest neighbors to the query key. -
search
Returns the approximate k-nearest neighbors. A posteriori multiple probe model has to be trained already.- Parameters:
q
- the query object.k
- the number of nearest neighbors to search for.recall
- the expected recall rate.T
- the maximum number of probes.- Returns:
- the approximate k-nearest neighbors.
-
search
Description copied from interface:RNNSearch
Retrieves the neighbors in a fixed radius of query object, i.e.d(q, v) <= radius
. -
search
public void search(double[] q, double radius, List<Neighbor<double[], E>> neighbors, double recall, int T) Search the neighbors in the given radius of query object, i.e.d(q, v) <= radius
.- Parameters:
q
- the query object.radius
- the radius of search range.neighbors
- the list to store found neighbors in the given range on output.recall
- the expected recall rate.T
- the maximum number of probes.
-