Package smile.hash
Interface SimHash<T>
- Type Parameters:
T
- the data type of set objects.
public interface SimHash<T>
SimHash is a technique for quickly estimating how similar two sets are.
The algorithm is used by the Google Crawler to find near-duplicate pages.
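The idea can be illustrated with a minimal, self-contained sketch. It is not this library's implementation: the class name `SimHashSketch`, the choice of FNV-1a as the per-token hash, and the 64-bit fingerprint width are all assumptions made for illustration. Each token's hash votes +1 or -1 on every bit position, and the sign of the tally decides the corresponding fingerprint bit, so sets that share most of their tokens tend to agree on most bits.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the smile.hash implementation.
final class SimHashSketch {
    // 64-bit FNV-1a hash of a token (assumed choice; any well-mixed 64-bit hash would do).
    static long fnv1a64(String token) {
        long h = 0xcbf29ce484222325L;           // FNV offset basis
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;                // FNV prime
        }
        return h;
    }

    // Each token votes +1 or -1 on every bit position;
    // a bit is set in the fingerprint when its tally is positive.
    static long fingerprint(String[] tokens) {
        int[] votes = new int[64];
        for (String token : tokens) {
            long h = fnv1a64(token);
            for (int i = 0; i < 64; i++) {
                votes[i] += ((h >>> i) & 1L) == 1L ? 1 : -1;
            }
        }
        long bits = 0L;
        for (int i = 0; i < 64; i++) {
            if (votes[i] > 0) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        String[] a = "the quick brown fox jumps over the lazy dog".split(" ");
        String[] b = "the quick brown fox jumped over the lazy dog".split(" ");
        System.out.println(Long.toHexString(fingerprint(a)));
        System.out.println(Long.toHexString(fingerprint(b)));
    }
}
```

Because the fingerprint depends only on the vote tallies, it does not change when the tokens are reordered.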
Method Details
-
hash
Returns the hash code.
- Parameters:
x
- the object.
- Returns:
- the hash code.
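Hash codes of this kind are typically compared by counting differing bits. The sketch below assumes the hash code is a 64-bit `long`, which the text above does not state; the class `HashCodeComparison`, its methods, and the `maxBits` threshold are hypothetical names introduced for illustration.

```java
// Hypothetical helper; assumes hash codes are 64-bit longs.
final class HashCodeComparison {
    // Number of bit positions in which the two hash codes differ.
    static int hammingDistance(long h1, long h2) {
        return Long.bitCount(h1 ^ h2);
    }

    // Treats two objects as near duplicates when their codes
    // differ in at most maxBits bit positions.
    static boolean nearDuplicate(long h1, long h2, int maxBits) {
        return hammingDistance(h1, h2) <= maxBits;
    }
}
```

A small distance suggests that the underlying sets are similar. A suitable threshold depends on the data and has to be chosen empirically.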
-
of
Returns the SimHash for a set of generic features (represented as byte[]).
- Parameters:
features
- the generic features.
- Returns:
- the SimHash.
-
text
Returns the SimHash for string tokens.
- Returns:
- the SimHash for string tokens.
-