resources
Interfaces, Classes, Traits and Enums
- Tokenizer
- This class has a collection of methods for French locale-specific
tokenization. In particular, it has a stemmer and a stop word remover (the
latter used mainly in word cloud creation). The stemmer is my attempt at
re-implementing the stemming algorithm given at http://snowball.tartarus.org
and was inspired by http://snowball.tartarus.org/otherlangs/french_javascript.txt.
Here, given a word, its stem is the part of the word that
is common to all of its inflected variants. For example,
"tall" is common to "tall", "taller" and "tallest". A stemmer takes
a word and tries to produce its stem.
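The stop word remover mentioned above can be pictured with the following minimal sketch. It is not the class's actual code. The function name `remove_stop_words` and the tiny word list `FRENCH_STOP_WORDS` are hypothetical and only for illustration. The sketch lowercases the text, splits it into runs of letters, and drops any word found in the list. Because the split happens on apostrophes, elided forms such as "l'arbre" become "l" and "arbre", which is why "l" and "d" are included in the list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny French stop word list for illustration only;
# a real list would contain a few hundred entries.
FRENCH_STOP_WORDS = {
    "le", "la", "les", "l", "un", "une", "des", "de", "d", "du",
    "et", "ou", "à", "au", "aux", "en", "dans", "sur",
    "est", "sont", "que", "qui", "ce", "cette", "il", "elle",
}

def remove_stop_words(text: str) -> str:
    """Drop stop words from text, keeping the remaining words in order.

    Words are lowercased and split on anything that is not a letter, so an
    elided form like "l'arbre" yields "l" and "arbre".
    """
    # [^\W\d_] matches Unicode letters only (word chars minus digits and _).
    words = re.findall(r"[^\W\d_]+", text.lower())
    return " ".join(w for w in words if w not in FRENCH_STOP_WORDS)

# Word cloud use: count the words that survive stop word removal.
words = remove_stop_words("Le chat et le chien. Le chat dort.").split()
print(Counter(words).most_common(2))
```

Removing stop words before counting keeps very frequent function words like "le" and "et" from dominating a word cloud.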
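The definition of a stem given above, and the idea of a suffix-stripping stemmer, can be illustrated with the following sketch. It is a heavily simplified toy, not the Snowball French algorithm and not this class's implementation. `stem_by_definition` applies the definition literally: when inflection only adds endings, the part common to all variants is their longest common prefix. `toy_stem` works from a single word. It strips the longest suffix from a hypothetical, abridged list `SUFFIXES`, but only if the stripped part lies inside Snowball's RV region. RV starts after the third letter when the word begins with two vowels, and otherwise just after the first vowel that is not at the start of the word. The real algorithm adds several ordered steps, many more suffixes, and preprocessing that reclassifies some u, i and y as consonants. None of that is modeled here.

```python
import os

# Vowels as listed in the Snowball French stemmer description.
VOWELS = set("aeiouyâàëéêèïîôûù")

# Hypothetical, heavily abridged suffix list, tried longest first.
SUFFIXES = sorted(["ement", "ment", "euse", "eux", "es", "er", "ez", "s", "e"],
                  key=len, reverse=True)

def stem_by_definition(variants: list[str]) -> str:
    """Stem per the definition: the part common to all inflected variants.

    For purely suffix-based inflection this is the longest common prefix.
    """
    return os.path.commonprefix(variants)

def rv_start(word: str) -> int:
    """Start index of a simplified Snowball French RV region."""
    if len(word) >= 2 and word[0] in VOWELS and word[1] in VOWELS:
        return min(3, len(word))        # region after the third letter
    for i in range(1, len(word)):
        if word[i] in VOWELS:           # first vowel not at the beginning
            return i + 1
    return len(word)                    # no such vowel: RV is empty

def toy_stem(word: str) -> str:
    """Strip the longest listed suffix that lies entirely inside RV."""
    word = word.lower()
    rv = rv_start(word)
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= rv:
            return word[: -len(suffix)]
    return word

print(stem_by_definition(["tall", "taller", "tallest"]))
print([toy_stem(w) for w in ["grand", "grande", "grands", "grandes"]])
```

The RV check keeps short words such as "mer" intact, because stripping "er" would cut into the part of the word that precedes RV.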