resources
Interfaces, Classes, Traits and Enums
- Tokenizer
- This class has a collection of methods for English-locale-specific tokenization. In particular, it has a stemmer, a stop word remover (mainly for use in word cloud creation), and a part-of-speech tagger (for question answering). The stemmer is my stab at implementing the Porter Stemmer algorithm presented at http://tartarus.org/~martin/PorterStemmer/def.txt. The code is based on the non-thread-safe C version given by Martin Porter.
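
To make the stop word removal and stemming steps described above more concrete, here is a minimal Python sketch. It is not the Tokenizer class's actual code or API: the function names `remove_stop_words` and `porter_step_1a` and the contents of `STOP_WORDS` are hypothetical, chosen only for illustration. The stemming part covers only Step 1a (plural suffixes) of the Porter algorithm as given in the definition linked above, not the full stemmer.

```python
# Hypothetical stop word list for illustration only; the list used by the
# real class is not shown in this document.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def porter_step_1a(word):
    """Apply only Step 1a of the Porter algorithm (plural suffixes).

    The rules are tried in order and the first matching suffix wins.
    """
    if word.endswith("sses"):
        return word[:-2]  # sses -> ss, e.g. caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ies -> i, e.g. ponies -> poni
    if word.endswith("ss"):
        return word       # ss -> ss, e.g. caress stays caress
    if word.endswith("s"):
        return word[:-1]  # s -> (nothing), e.g. cats -> cat
    return word


if __name__ == "__main__":
    terms = remove_stop_words("The ponies and the cats")
    print([porter_step_1a(term) for term in terms])
```

A full Porter stemmer applies several further steps after this one (for example, handling of -ed, -ing, and longer derivational suffixes), so output such as "poni" is an intermediate stem rather than a dictionary word.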