Package smile.neighbor
Class SNLSH<K,V>
java.lang.Object
smile.neighbor.SNLSH<K,V>
- Type Parameters:
K
- the type of keys.V
- the type of associated objects.
- All Implemented Interfaces:
Serializable
,RNNSearch<K,
V>
Locality-Sensitive Hashing for Signatures.
LSH is an efficient algorithm for approximate nearest neighbor search
in high dimensional spaces by performing probabilistic dimension reduction
of data. The basic idea is to hash the input items so that similar items
are mapped to the same buckets with high probability (the number of buckets
being much smaller than the universe of possible input items).
If we are given signatures for the sets, we may divide them into bands,
and only measure the similarity of a pair of sets if they are identical
in at least one band. By choosing the size of bands appropriately, we can
eliminate most of the pairs that do not meet our threshold of similarity.
By default, the query object (reference equality) is excluded from the
neighborhood. Note that you may observe weird behavior with String objects.
JVM will pool the string literal objects. So the below variables
String a = "ABC";
String b = "ABC";
String c = "AB" + "C";
are actually equal in reference test a == b == c
. With toy data
that you type explicitly in the code, this will cause problems. Fortunately,
the data would be generally read from secondary storage in production.
References
- Moses S. Charikar. Similarity Estimation Techniques from Rounding Algorithms
- See Also:
-
Constructor Summary
-
Method Summary
-
Constructor Details
-
SNLSH
Constructor.- Parameters:
L
- the number of bands/hash tables.hash
- simhash function.
-
-
Method Details
-
put
Adds a new item.- Parameters:
key
- the key.value
- the value.
-
search
Description copied from interface:RNNSearch
Retrieves the neighbors in a fixed radius of query object, i.e.d(q, v) <= radius
.
-