Package smile.nlp.collocation
Class Bigram
java.lang.Object
  smile.nlp.Bigram
    smile.nlp.collocation.Bigram
All Implemented Interfaces:
Comparable<Bigram>
Collocations are expressions of multiple words that commonly co-occur.
A bigram collocation is a pair of words w1 w2 that appear together with
statistical significance, that is, more often than would be expected if the two words occurred independently.
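This page does not spell out how the score is computed. One common way to rank bigrams, consistent with the page's mention of both a chi-square score and a likelihood ratio, is Dunning's log-likelihood ratio G², which is asymptotically chi-square distributed with one degree of freedom. The following is a minimal sketch under that assumption, not Smile's implementation; `BigramLlr` and `score` are hypothetical names, and the counts in `main` are made up.

```java
import java.util.Locale;

// Sketch of Dunning's log-likelihood ratio (G^2) for a bigram "w1 w2".
// Illustrative assumption only; not taken from Smile's source.
final class BigramLlr {
    // k * ln(k / e), taken as 0 when k == 0 (the limit of x ln x as x -> 0).
    private static double term(double k, double e) {
        return k == 0 ? 0.0 : k * Math.log(k / e);
    }

    /**
     * @param c12 occurrences of the bigram "w1 w2"
     * @param c1  bigrams whose first word is w1
     * @param c2  bigrams whose second word is w2
     * @param n   total number of bigrams in the corpus
     * @return G^2, approximately chi-square distributed with 1 degree of freedom
     */
    static double score(long c12, long c1, long c2, long n) {
        // Observed 2x2 contingency table.
        double k11 = c12;               // w1 followed by w2
        double k12 = c1 - c12;          // w1 followed by another word
        double k21 = c2 - c12;          // another word followed by w2
        double k22 = n - c1 - c2 + c12; // neither w1 first nor w2 second
        double row1 = k11 + k12, row2 = k21 + k22;
        double col1 = k11 + k21, col2 = k12 + k22;
        // Expected counts under independence: E_ij = row_i * col_j / n.
        double g = term(k11, row1 * col1 / n)
                 + term(k12, row1 * col2 / n)
                 + term(k21, row2 * col1 / n)
                 + term(k22, row2 * col2 / n);
        return 2.0 * g;
    }

    public static void main(String[] args) {
        // Hypothetical counts: the pair occurs 80 times; w1 starts 120 bigrams,
        // w2 ends 100 bigrams, out of 100,000 bigrams in total.
        System.out.printf(Locale.ROOT, "%.1f%n", score(80, 120, 100, 100_000));
    }
}
```

When the observed counts equal the counts expected under independence, every term is zero and the score is 0; the more the pair co-occurs beyond chance, the larger the score.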
Field Summary
Modifier and Type   Field   Description
final int           count   The frequency of the bigram in the corpus.
final double        score   The chi-square statistical score of the collocation.
Constructor Summary
Method Summary
Field Details
count
public final int count
The frequency of the bigram in the corpus.
score
public final double score
The chi-square statistical score of the collocation.
Constructor Details
Bigram
Constructor.
Parameters:
w1 - the first word of the bigram.
w2 - the second word of the bigram.
count - the frequency of the bigram in the corpus.
score - the chi-square statistical score of the collocation in the corpus.
Method Details
toString
compareTo
Specified by:
compareTo in interface Comparable<Bigram>
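A minimal sketch of how a `Comparable` bigram might order itself by score, assuming an ascending natural order so that reversing it yields the most significant bigrams first. `ScoredBigram`, `mostSignificantFirst`, and the `toString` format are hypothetical; only the `count` and `score` fields mirror this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a scored collocation. Field names mirror this page;
// the ordering and the toString format are assumptions.
final class ScoredBigram implements Comparable<ScoredBigram> {
    final String w1;
    final String w2;
    final int count;
    final double score;

    ScoredBigram(String w1, String w2, int count, double score) {
        this.w1 = w1;
        this.w2 = w2;
        this.count = count;
        this.score = score;
    }

    // Natural order: ascending by score.
    @Override
    public int compareTo(ScoredBigram o) {
        return Double.compare(score, o.score);
    }

    @Override
    public String toString() {
        return String.format(Locale.ROOT, "(%s %s, %d, %.2f)", w1, w2, count, score);
    }

    // Returns a copy ordered from most to least significant.
    static List<ScoredBigram> mostSignificantFirst(List<ScoredBigram> bigrams) {
        List<ScoredBigram> sorted = new ArrayList<>(bigrams);
        sorted.sort(Collections.reverseOrder());
        return sorted;
    }
}
```

Comparing on `score` with `Double.compare` avoids the overflow and NaN pitfalls of subtracting doubles and casting to int.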
of
Finds the top k bigram collocations in the given corpus.
Parameters:
corpus - the corpus.
k - the number of top bigrams to return.
minFrequency - the minimum frequency of a bigram in the corpus.
Returns:
the significant bigram collocations in descending order of likelihood ratio.
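Selecting the top k can be sketched with a size-k min-heap, assuming the candidate bigrams have already been counted and scored and that `minFrequency` is an inclusive lower bound. `TopKBigrams`, `Candidate`, and `topK` are hypothetical names, not Smile's API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch of top-k selection over already-scored candidate bigrams.
// All names here are hypothetical.
final class TopKBigrams {
    static final class Candidate {
        final String w1;
        final String w2;
        final int count;
        final double score;

        Candidate(String w1, String w2, int count, double score) {
            this.w1 = w1;
            this.w2 = w2;
            this.count = count;
            this.score = score;
        }
    }

    // Returns up to k candidates with count >= minFrequency, highest score first.
    static List<Candidate> topK(List<Candidate> candidates, int k, int minFrequency) {
        Comparator<Candidate> byScore = Comparator.comparingDouble(c -> c.score);
        List<Candidate> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Min-heap on score: the root is the weakest of the current top k.
        PriorityQueue<Candidate> heap = new PriorityQueue<>(byScore);
        for (Candidate c : candidates) {
            if (c.count < minFrequency) {
                continue; // too rare to be counted as a collocation
            }
            if (heap.size() < k) {
                heap.add(c);
            } else if (c.score > heap.peek().score) {
                heap.poll(); // evict the weakest
                heap.add(c);
            }
        }
        result.addAll(heap);
        result.sort(byScore.reversed());
        return result;
    }
}
```

The heap holds at most k entries, so memory stays O(k) and the pass costs O(n log k) for n candidates instead of sorting all n.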
of
Finds bigram collocations in the given corpus whose p-value is less than the given threshold.
Parameters:
corpus - the corpus.
p - the p-value threshold.
minFrequency - the minimum frequency of a bigram in the corpus.
Returns:
the significant bigram collocations in descending order of likelihood ratio.
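To apply a p-value threshold, the score has to be mapped to an upper-tail probability. Assuming the score follows a chi-square distribution with one degree of freedom (as a 2x2 likelihood-ratio statistic does asymptotically), the p-value of a score s is erfc(sqrt(s / 2)). Java has no built-in `erfc`, so this sketch uses the Numerical Recipes Chebyshev approximation. `PValueBigrams`, `Candidate`, `pValue`, and `significant` are hypothetical names, and `minFrequency` is assumed to be inclusive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of a p-value cut-off for chi-square scores with 1 degree of freedom.
// All names here are hypothetical.
final class PValueBigrams {
    static final class Candidate {
        final String w1;
        final String w2;
        final int count;
        final double score;

        Candidate(String w1, String w2, int count, double score) {
            this.w1 = w1;
            this.w2 = w2;
            this.count = count;
            this.score = score;
        }
    }

    // Complementary error function; Chebyshev approximation from Numerical Recipes
    // with fractional error below about 1.2e-7.
    static double erfc(double x) {
        double z = Math.abs(x);
        double t = 1.0 / (1.0 + 0.5 * z);
        double r = t * Math.exp(-z * z - 1.26551223 + t * (1.00002368 + t * (0.37409196
                + t * (0.09678418 + t * (-0.18628806 + t * (0.27886807 + t * (-1.13520398
                + t * (1.48851587 + t * (-0.82215223 + t * 0.17087277)))))))));
        return x >= 0 ? r : 2.0 - r;
    }

    // Upper-tail p-value: if X = Z^2 with Z standard normal,
    // P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    static double pValue(double statistic) {
        return statistic <= 0 ? 1.0 : erfc(Math.sqrt(statistic / 2.0));
    }

    // Keeps candidates with count >= minFrequency and p-value < p, highest score first.
    static List<Candidate> significant(List<Candidate> candidates, double p, int minFrequency) {
        List<Candidate> result = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.count >= minFrequency && pValue(c.score) < p) {
                result.add(c);
            }
        }
        result.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
        return result;
    }
}
```

Because the p-value decreases monotonically with the score, sorting by descending score is the same as sorting by ascending p-value. For reference, p = 0.05 corresponds to a score of about 3.84 and p = 0.01 to about 6.63.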