Class LancasterStemmer

java.lang.Object
smile.nlp.stemmer.LancasterStemmer
All Implemented Interfaces:
Function<String,String>, Stemmer

public class LancasterStemmer extends Object implements Stemmer
The Paice/Husk Lancaster stemming algorithm. It is a conflation-based iterative stemmer that is efficient and easy to implement, but is known to be very strong and aggressive. The stemmer uses a single table of rules, each of which may specify the removal or replacement of an ending. For details, see the reference below.

References

  1. Paice, Another stemmer, SIGIR Forum, 24(3), 56-61, 1990.
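The single-table, iterative idea described above can be sketched as follows. This is a toy illustration only: the class name `ToyRuleStemmer`, the `Rule` fields, the five rules, and the minimum stem length are all assumptions made for this example, and none of them reflect the rule set or code that LancasterStemmer actually uses. Input is assumed to be lowercase.

```java
/** Toy single-table, iterative suffix stemmer. NOT Smile's rule set or code. */
final class ToyRuleStemmer {
    /** One table entry: an ending, its replacement, and whether to keep going. */
    static final class Rule {
        final String ending;
        final String replacement;
        final boolean continueStemming;

        Rule(String ending, String replacement, boolean continueStemming) {
            this.ending = ending;
            this.replacement = replacement;
            this.continueStemming = continueStemming;
        }
    }

    // Hypothetical rules, tried in order; the first matching rule wins.
    private static final Rule[] RULES = {
        new Rule("ational", "ate", true),
        new Rule("ing", "", true),
        new Rule("ies", "y", false),
        new Rule("ed", "", true),
        new Rule("s", "", true),
    };

    // Assumed guard so that very short words are left alone.
    private static final int MIN_STEM_LENGTH = 3;

    static String stem(String word) {
        String current = word;
        boolean again = true;
        while (again) {
            again = false;
            for (Rule rule : RULES) {
                if (!current.endsWith(rule.ending)) {
                    continue;
                }
                String candidate =
                    current.substring(0, current.length() - rule.ending.length())
                        + rule.replacement;
                if (candidate.length() < MIN_STEM_LENGTH) {
                    continue; // result would be too short; try the next rule
                }
                current = candidate;
                again = rule.continueStemming; // rescan the table or stop
                break;
            }
        }
        return current;
    }
}
```

With these toy rules, `ToyRuleStemmer.stem("meetings")` yields `meet` in two passes (first `s` is removed, then `ing`), while `ToyRuleStemmer.stem("ponies")` stops at `pony` because that rule is marked as terminal.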
  • Constructor Details

    • LancasterStemmer

      public LancasterStemmer()
      Constructor with default rules. By default, the stemmer does not strip prefixes from words.
    • LancasterStemmer

      public LancasterStemmer(boolean stripPrefix)
      Constructor with default rules.
      Parameters:
      stripPrefix - true if the stemmer should strip prefixes such as kilo, micro, milli, intra, ultra, mega, nano, pico, pseudo.
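To make the effect of stripPrefix concrete, here is a minimal sketch of prefix removal over the prefixes listed above. The class `PrefixStripSketch`, its method, and the rule that a bare prefix is left untouched are assumptions for illustration; how LancasterStemmer matches and removes prefixes internally is not documented here and may differ.

```java
/** Toy prefix stripper over the prefixes named in the documentation. */
final class PrefixStripSketch {
    private static final String[] PREFIXES = {
        "kilo", "micro", "milli", "intra", "ultra", "mega", "nano", "pico", "pseudo"
    };

    /** Removes the first matching prefix, unless the word is only the prefix. */
    static String stripPrefix(String word) {
        for (String prefix : PREFIXES) {
            if (word.startsWith(prefix) && word.length() > prefix.length()) {
                return word.substring(prefix.length());
            }
        }
        return word; // no listed prefix found
    }
}
```

Under these assumptions, `stripPrefix("kilometer")` returns `meter`, and a word without a listed prefix is returned unchanged.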
    • LancasterStemmer

      public LancasterStemmer(InputStream customizedRules) throws IOException
      Constructor with customized rules. By default, the stemmer does not strip prefixes from words.
      Parameters:
      customizedRules - an input stream from which to read the customized rules.
      Throws:
      IOException - if reading the rule file fails.
    • LancasterStemmer

      public LancasterStemmer(InputStream customizedRules, boolean stripPrefix) throws IOException
      Constructor with customized rules.
      Parameters:
      customizedRules - an input stream from which to read the customized rules.
      stripPrefix - true if the stemmer should strip prefixes such as kilo, micro, milli, intra, ultra, mega, nano, pico, pseudo.
      Throws:
      IOException - if reading the rule file fails.
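The constructors that take customized rules accept any InputStream, so the rules need not live in a file. The sketch below shows one way to build an in-memory stream and read it line by line. The class `RuleStreamSketch` and its helpers are hypothetical, and the rule text is treated as opaque lines because the rule syntax expected by LancasterStemmer is not described on this page.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrates supplying rule text through an InputStream (format is opaque here). */
final class RuleStreamSketch {
    /** Wraps rule text in an in-memory stream, e.g. for tests or embedded rules. */
    static InputStream inMemory(String ruleText) {
        return new ByteArrayInputStream(ruleText.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads non-blank, trimmed lines; the stream is closed when done. */
    static List<String> readRules(InputStream in) throws IOException {
        List<String> rules = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    rules.add(trimmed);
                }
            }
        }
        return rules;
    }
}
```

A stream built this way could be passed to `LancasterStemmer(InputStream)` in place of a file stream, provided its contents follow whatever rule syntax the library expects. The caller still has to handle or declare IOException.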
  • Method Details

    • stem

      public String stem(String word)
      Description copied from interface: Stemmer
      Transforms a word into its root form.
      Specified by:
      stem in interface Stemmer
      Parameters:
      word - the word to stem.
      Returns:
      the stem.
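Because the class is listed as implementing Function<String,String>, a stemmer instance can be passed wherever a function over strings is expected, such as Stream.map. The sketch below uses a hypothetical helper, `StemPipelineSketch.stemAll`, which accepts any such function. It is shown with a stand-in lambda rather than a real stemmer so that it runs without the library.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Applies any String-to-String function (for example a stemmer) to a word list. */
final class StemPipelineSketch {
    static List<String> stemAll(List<String> words, Function<String, String> stemmer) {
        return words.stream().map(stemmer).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for a real stemmer: drops one trailing "s".
        Function<String, String> dropPlural =
            w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
        System.out.println(stemAll(Arrays.asList("cats", "dog"), dropPlural));
    }
}
```

With the library on the classpath, a LancasterStemmer instance could presumably be passed as the second argument in place of the lambda.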