Package smile.llm.tokenizer
Class SentencePiece
java.lang.Object
smile.llm.tokenizer.SentencePiece
- All Implemented Interfaces:
Tokenizer
SentencePiece is an unsupervised text tokenizer and detokenizer developed by Google.
It implements subword segmentation using byte-pair encoding (BPE) and the unigram language model.
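To make the BPE idea concrete, here is a minimal sketch of a single merge step: count adjacent symbol pairs, then fuse the most frequent pair into one symbol. It is a toy illustration only, not the SentencePiece or Smile implementation; the class `ToyBpe` and its methods are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of one byte-pair encoding (BPE) merge step.
// Hypothetical helper class; not part of Smile or SentencePiece.
class ToyBpe {
    /** Returns the most frequent adjacent pair, or null if there are fewer than two symbols. */
    static String[] mostFrequentPair(List<String> symbols) {
        Map<String, Integer> counts = new HashMap<>();
        String[] best = null;
        int bestCount = 0;
        for (int i = 0; i + 1 < symbols.size(); i++) {
            // Join the pair with a separator that cannot occur in the toy symbols.
            String key = symbols.get(i) + "\u0000" + symbols.get(i + 1);
            int count = counts.merge(key, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = new String[] {symbols.get(i), symbols.get(i + 1)};
            }
        }
        return best;
    }

    /** Replaces each occurrence of the adjacent pair (left, right) with the merged symbol. */
    static List<String> merge(List<String> symbols, String left, String right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < symbols.size()) {
            boolean match = i + 1 < symbols.size()
                    && symbols.get(i).equals(left)
                    && symbols.get(i + 1).equals(right);
            if (match) {
                out.add(left + right);
                i += 2;
            } else {
                out.add(symbols.get(i));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> symbols = List.of("a", "b", "a", "b", "c");
        String[] pair = mostFrequentPair(symbols);
        System.out.println(merge(symbols, pair[0], pair[1])); // prints [ab, ab, c]
    }
}
```

A real BPE trainer repeats this step over a whole corpus until the vocabulary reaches a target size. The unigram language model works differently: it starts from a large candidate vocabulary and prunes it.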
-
Constructor Details
-
SentencePiece
Constructor.
- Parameters:
path - The SentencePiece model file path.
- Throws:
IOException - if the model fails to load.
-
-
Method Details
-
encode
Description copied from interface: Tokenizer
Encodes a string into a list of token IDs.
-
encode
Description copied from interface: Tokenizer
Encodes a string into a list of token IDs.
-
decode
Description copied from interface: Tokenizer
Decodes a list of token IDs into a string.
-
tokenize
Description copied from interface: Tokenizer
Segments text into tokens.
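The relationship between encode, decode, and tokenize can be sketched with a toy whitespace tokenizer. This page lists only method names and descriptions, so the signatures below (`int[] encode(String)`, `String decode(int[])`, `String[] tokenize(String)`) are assumptions, and `ToyTokenizer` and `WhitespaceTokenizer` are hypothetical names rather than Smile classes. The second encode overload is omitted because its parameters are not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy whitespace tokenizer that assigns IDs in order of first appearance.
// Hypothetical class for illustration; it does not use a SentencePiece model.
class WhitespaceTokenizer implements ToyTokenizer {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> tokens = new ArrayList<>();

    @Override
    public String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    @Override
    public int[] encode(String text) {
        String[] pieces = tokenize(text);
        int[] out = new int[pieces.length];
        for (int i = 0; i < pieces.length; i++) {
            // Register unseen tokens and reuse the ID of known ones.
            out[i] = ids.computeIfAbsent(pieces[i], t -> {
                tokens.add(t);
                return tokens.size() - 1;
            });
        }
        return out;
    }

    @Override
    public String decode(int[] tokenIds) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokenIds.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(tokens.get(tokenIds[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
        int[] encoded = tokenizer.encode("to be or not to be");
        System.out.println(Arrays.toString(encoded));   // prints [0, 1, 2, 3, 0, 1]
        System.out.println(tokenizer.decode(encoded));  // prints to be or not to be
    }
}

// Assumed shape of the tokenizer contract described above; not Smile's actual interface.
interface ToyTokenizer {
    int[] encode(String text);
    String decode(int[] tokenIds);
    String[] tokenize(String text);
}
```

In this sketch, encode is tokenize followed by an ID lookup, and decode inverts that lookup. A SentencePiece model would produce subword pieces instead of whitespace-delimited words.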
-