resources
Interfaces, Classes, Traits and Enums
- Tokenizer
- This class has a collection of methods for Russian locale-specific
tokenization. In particular, it has a stemmer and a stop word remover (for
use mainly in word cloud creation). The stemmer is a modification
(with bug fixes) of Dennis Kreminsky's stemmer from:
http://snowball.tartarus.org/otherlangs/russian_php5.txt
Here, the stem of a word is the part of the word that
is common to all its inflected variants. For example,
tall is common to tall, taller, and tallest. A stemmer takes
a word and tries to produce its stem.
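
The idea of stemming and stop word removal described above can be illustrated with a minimal sketch. This is not the Tokenizer class's implementation and not the Russian algorithm: it is a toy suffix stripper for the English tall/taller/tallest example, written in Python for brevity. The names toy_stem, remove_stop_words, SUFFIXES and STOP_WORDS, as well as the suffix and stop word lists themselves, are assumptions made up for this illustration.

```python
# Toy illustration only: hypothetical names and lists, not the Tokenizer's API.

# Assumed suffix list for the English example; longer suffixes are tried first.
SUFFIXES = ("est", "er")

# Assumed stop word list, used only for this sketch.
STOP_WORDS = {"the", "is", "a", "of"}


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


def remove_stop_words(words):
    """Drop words that appear in the stop word list (case-insensitive)."""
    return [w for w in words if w.lower() not in STOP_WORDS]


if __name__ == "__main__":
    print([toy_stem(w) for w in ("tall", "taller", "tallest")])
    print(remove_stop_words(["the", "tallest", "tree"]))
```

A real stemmer for Russian needs far more rules than this (for example, handling of several classes of endings), which is why the class builds on an existing Snowball-style stemmer instead of a plain suffix list.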