Interfaces, Classes, Traits and Enums
- Tokenizer
- Persian-specific tokenization code. In particular, it includes a stemmer.
The stemmer is a modified variant (it handles prefixes slightly differently)
of my attempt at porting Nick Patch's Perl port
(https://metacpan.org/pod/Lingua::Stem::UniNE::FA) of the
stemming algorithm by Ljiljana Dolamic and Jacques
Savoy of the University of Neuchâtel. A Java version of the algorithm is at
http://members.unine.ch/jacques.savoy/clef/persianStemmerUnicode.txt
(beware of how Java handles Unicode there).
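For readers unfamiliar with light stemmers, the sketch below shows the general shape of affix stripping for Persian: remove at most one prefix and one suffix, but only when a minimum-length stem remains. It is an illustration under stated assumptions, not the Dolamic/Savoy rule set and not this library's code. The affix lists, `MIN_STEM`, and `light_stem` are all hypothetical. It also drops the zero-width non-joiner (U+200C) that Persian text often places between a stem and its affix.

```python
# Hypothetical light Persian stemmer sketch.
# NOT the Dolamic/Savoy (UniNE) rules; affix lists are illustrative only.
ZWNJ = "\u200c"  # zero-width non-joiner, often written between stem and affix

PREFIXES = ["می"]                        # e.g. verbal prefix "mi-"
SUFFIXES = ["ترین", "تر", "ها", "ان", "ی"]  # superlative, comparative, plurals, -i
MIN_STEM = 2                             # assumed minimum stem length

def light_stem(word: str) -> str:
    # Strip at most one prefix, keeping at least MIN_STEM characters.
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
            word = word[len(p):].lstrip(ZWNJ)
            break
    # Try longer suffixes first so "ترین" wins over "تر".
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
            return word[: -len(s)].rstrip(ZWNJ)
    return word

print(light_stem("کتاب\u200cها"))  # "books" -> "کتاب"
```

Trying the longest suffix first and refusing to strip below a minimum length are the two usual guards against over-stemming in this style of stemmer. A real implementation would use a much more carefully chosen affix inventory.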