Yioop_V9.5_Source_Code

Tokenizer
in package Application

Japanese-specific tokenization code. Typically, tokenizer.php either contains a stemmer for the language in question or specifies how many characters a char gram contains.

$char_gram_len

The number of characters in a char gram for this locale.


    public static int $char_gram_len = 3
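
To illustrate what a char gram length of 3 means, here is a minimal sketch that splits a UTF-8 string into overlapping three-character grams. The function `charGrams` is a hypothetical helper written for this example; it is not part of Yioop, and the way short strings are handled here is an assumption rather than Yioop's documented behavior.

```php
<?php
/**
 * Hypothetical helper (not part of Yioop): splits a UTF-8 string into
 * overlapping character n-grams of length $len.
 */
function charGrams(string $text, int $len = 3): array
{
    // Split into Unicode code points rather than bytes, so that
    // multi-byte Japanese characters stay intact.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split fails on invalid UTF-8; return no grams in that case.
        return [];
    }
    $count = count($chars);
    if ($count <= $len) {
        // Assumption: text no longer than one gram is returned whole.
        return $count > 0 ? [$text] : [];
    }
    $grams = [];
    // Slide a window of $len characters across the text one step at a time.
    for ($i = 0; $i + $len <= $count; $i++) {
        $grams[] = implode('', array_slice($chars, $i, $len));
    }
    return $grams;
}

print_r(charGrams('日本語の文章'));
```

For the six-character input above, the sliding window yields four grams: 日本語, 本語の, 語の文, and の文章.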

$stop_words

A list of frequently occurring terms for this locale that should be excluded from certain kinds of queries. The list is also used for language detection.


    public static mixed $stop_words
     = ['ように', '私は', '彼の', 'その', '彼', 'た', 'ために', '上の', 'アール', 'とともに', '彼ら', 'ある', 'アット', '一つ', '持っている', 'この', 'から', 'バイ', 'ホット', '言葉', 'しかし', '何', 'いくつかの', 'です', 'それ', 'あなた', 'または', '持っていた', 'インクルード', 'の', 'へ', 'そして', 'は', 'で', '我々', '缶', 'アウト', 'その他', 'だった', 'これ', 'やる', 'それらの', '時間', 'もし', '意志', '方法', '前記', 'の', 'それぞれ', '言う', 'し', 'セット', '個', '欲しい', '空気', 'よく', 'また', '遊ぶ', '小さい', '終わり', '置く', 'ホーム', '読む', '手', 'ポート', '大きい', 'スペル', '加える', 'さらに', '土地', 'ここに', 'しなければならない', '大きい', '高い', 'そのような', '続く', '行為', 'なぜ', '頼む', '人々', '変更', '行ってきました', '光', '種類', 'オフ', '必要', '家', '絵', '試す', '私たち', '再び', '動物', 'ポイント', '母', '世界', '近く', 'ビルド', '自己', '地球', '父']

stopwordsRemover()

Removes the stop words from the page (used for word cloud generation and language detection).


    public static stopwordsRemover(mixed $data) : mixed

Parameters

$data : mixed: either a string or an array of strings to remove stop words from

Return values

mixed: $data with the stop words removed
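
The following is a rough sketch of how a stop word remover with this signature could behave. It is not Yioop's implementation. The function name `removeStopWords`, its second parameter, and the plain substring-matching strategy are all assumptions made for illustration. The sketch assumes PHP 8.0 or later (for union types) and relies on `preg_replace` accepting either a string or an array of strings as its subject.

```php
<?php
/**
 * Hypothetical stand-in for a stop word remover; not Yioop's code.
 * Removes every occurrence of each stop word from a string, or from
 * each string in an array.
 */
function removeStopWords(string|array $data, array $stop_words): string|array
{
    if (empty($stop_words)) {
        return $data;
    }
    // Try longer stop words first so that, for example, 'それらの' is
    // removed as a unit before the shorter 'それ' or 'の' can match.
    usort($stop_words, fn($a, $b) => strlen($b) <=> strlen($a));
    // Escape regex metacharacters, then build one alternation pattern.
    $quoted = array_map(fn($w) => preg_quote($w, '/'), $stop_words);
    $pattern = '/(?:' . implode('|', $quoted) . ')/u';
    // preg_replace returns a string for string input and an array for
    // array input; fall back to the input if the regex engine fails.
    return preg_replace($pattern, '', $data) ?? $data;
}

echo removeStopWords('私は本を読む', ['私は', '読む', 'の']), "\n";
print_r(removeStopWords(['彼の家', 'しかし世界'], ['彼の', '彼', 'しかし']));
```

Because Japanese text is not whitespace-delimited, this sketch matches stop words as plain substrings, so it can also strip a stop word that happens to occur inside a longer word. A production tokenizer would likely need a segmentation step to avoid that.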
