Package smile.nlp.tokenizer
Sentence splitter and word tokenizer.
Class descriptions:
A sentence splitter based on the java.text.BreakIterator, which supports multiple natural languages (selected by locale setting).
A word tokenizer based on the java.text.BreakIterator, which supports multiple natural languages (selected by locale setting).
A paragraph splitter segments text into paragraphs.
A word tokenizer that tokenizes English sentences using the conventions used by the Penn Treebank.
A sentence splitter segments text into sentences (a string of words satisfying the grammatical rules of a language).
This is a simple paragraph splitter.
This is a simple sentence splitter for English.
A word tokenizer that tokenizes English sentences with some differences from TreebankWordTokenizer, notably on handling not-contractions.
A token is a string of characters, categorized according to the rules as a symbol.
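
The locale-driven splitting that the BreakIterator-based classes describe can be illustrated with the JDK alone. The following is a minimal sketch, not this package's implementation: the class name BreakIteratorSketch and the helper methods sentences and words are hypothetical names introduced here, and only java.text.BreakIterator calls from the standard library are used.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; it does not use or mirror the smile API.
public class BreakIteratorSketch {

    /** Splits text into sentences using the sentence-boundary rules of the given locale. */
    public static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundaries; each [start, end) span is one sentence.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    /** Splits a sentence into tokens using the word-boundary rules of the given locale. */
    public static List<String> words(String sentence, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(sentence);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Whitespace runs are reported as their own spans; drop them.
            String w = sentence.substring(start, end).trim();
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you?";
        for (String s : sentences(text, Locale.US)) {
            System.out.println(words(s, Locale.US));
        }
    }
}
```

Passing a different Locale to either helper selects that language's boundary rules, which is the mechanism the "selected by locale setting" wording refers to. Treebank-style handling of contractions is a separate convention and is not reproduced by this sketch.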