resources
Interfaces, Classes, Traits and Enums
- Tokenizer
Hindi-specific tokenization code. In particular, it has a stemmer.
The stemmer is my stab at porting the Java stemming algorithm by Ljiljana Dolamic
(University of Neuchatel, www.unine.ch/info/clef/):
http://members.unine.ch/jacques.savoy/clef/HindiStemmerLight.java.txt
Here, given a word, its stem is the part of the word that
is common to all its inflected variants. For example,
"tall" is common to "tall", "taller", and "tallest". A stemmer takes
a word and tries to produce its stem.
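To make the idea of a stem concrete, here is a minimal sketch of a suffix-stripping stemmer applied to the English example above. It is only an illustration: the function name `toy_stem`, the suffix list, and the minimum stem length are all hypothetical, and none of it reflects the rules used by the Hindi stemmer in Tokenizer or by the Java code it was ported from.

```python
# Hypothetical suffix list for illustration only; these are NOT the
# rules of the Hindi stemmer described in this documentation.
SUFFIXES = ("est", "er")


def toy_stem(word, min_stem_len=3):
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    # Try longer suffixes first so "est" is preferred over a shorter match.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    # No suffix matched, so the word is treated as its own stem.
    return word


stems = [toy_stem(w) for w in ("tall", "taller", "tallest")]
print(stems)  # ['tall', 'tall', 'tall']
```

All three inflected variants reduce to the same stem, which is what lets a search index treat them as one term. The minimum-length check is an assumed safeguard so that short words are not stripped down to nothing.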