Yioop_V9.5_Source_Code

Tokenizer
in package Application

Bengali-specific tokenization code. Typically, tokenizer.php either contains a stemmer for the language in question or specifies how many characters are in a char gram.

$char_gram_len

The number of characters in a char gram for this locale.


    public static int $char_gram_len = 5
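
To make the role of a char gram length concrete, the following is a minimal sketch of splitting a term into overlapping character grams. The function name charGrams and its behavior for short terms are assumptions made for illustration; this is not Yioop's own char gramming code. It requires PHP's mbstring extension.

```php
<?php
/**
 * Hypothetical helper (not part of Yioop): returns all overlapping
 * character grams of length $len taken from $term.
 */
function charGrams(string $term, int $len = 5): array
{
    $num_chars = mb_strlen($term, 'UTF-8');
    // Assumption: terms no longer than the gram length are kept whole.
    if ($num_chars <= $len) {
        return [$term];
    }
    $grams = [];
    // Slide a window of $len characters across the term.
    for ($i = 0; $i + $len <= $num_chars; $i++) {
        $grams[] = mb_substr($term, $i, $len, 'UTF-8');
    }
    return $grams;
}

// Prints: token okeni keniz enize nizer
echo implode(' ', charGrams('tokenizer')), "\n";
```

Note that mb_substr counts Unicode code points, so in this sketch a Bengali consonant and its attached vowel sign can fall into different grams.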

$stop_words

A list of frequently occurring terms for this locale that should be excluded from certain kinds of queries. The list is also used for language detection.


    public static mixed $stop_words
     = ['হিসাবে', 'আমি', 'তার', 'যে', 'তিনি', 'ছিল', 'জন্য', 'উপর', 'হয়', 'সঙ্গে', 'তারা', 'হতে', 'এ', 'এক', 'আছে', 'এই', 'থেকে', 'দ্বারা', 'গরম', 'শব্দ', 'কিন্তু', 'কি', 'কিছু', 'হয়', 'এটা', 'আপনি', 'বা', 'ছিল', 'দী', 'এর', 'থেকে', 'এবং', 'একটি', 'মধ্যে', 'আমরা', 'করতে', 'পারেন', 'আউট', 'অন্যান্য', 'ছিল', 'যা', 'কি', 'তাদের', 'সময়', 'যদি', 'অভিলাষ', 'কিভাবে', 'তিনি বলেন,', 'একটি', 'প্রতিটি', 'বলুন', 'না', 'সেট', 'তিন', 'চান', 'বায়ু', 'ভাল', 'এছাড়াও', 'খেলা', 'ছোট', 'শেষ', 'করা', 'হোম', 'পড়া', 'হাত', 'পোর্ট', 'বড়', 'বানান', 'যোগ করা', 'এমনকি', 'জমি', 'এখানে', 'অবশ্যই', 'বড়', 'উচ্চ', 'এমন', 'অনুসরণ করা', 'আইন', 'কেন', 'জিজ্ঞাসা', 'পুরুষ', 'পরিবর্তন', 'গিয়েছিলাম', 'আলো', 'ধরনের', 'বন্ধ', 'প্রয়োজন', 'ঘর', 'ছবি', 'চেষ্টা', 'আমাদের', 'আবার', 'পশু', 'বিন্দু', 'মা', 'বিশ্বের', 'কাছাকাছি', 'নির্মাণ', 'স্ব', 'পৃথিবী', 'বাবা']
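
One way a stop word list could feed language detection is to measure how many tokens of a text appear in the list. The sketch below is an assumption about the general idea, not Yioop's detection code; the function name stopWordRatio and the whitespace tokenization are hypothetical.

```php
<?php
/**
 * Hypothetical scorer (not Yioop's detector): returns the fraction of
 * whitespace-separated tokens in $text that appear in $stop_words.
 */
function stopWordRatio(string $text, array $stop_words): float
{
    $tokens = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    if (empty($tokens)) {
        return 0.0;
    }
    // Flip the list so membership checks are hash lookups.
    $lookup = array_flip($stop_words);
    $hits = 0;
    foreach ($tokens as $token) {
        if (isset($lookup[$token])) {
            $hits++;
        }
    }
    return $hits / count($tokens);
}

// A small sample taken from the list above; two of three tokens match.
$sample_stop_words = ['আমি', 'এবং', 'এই'];
echo stopWordRatio('আমি এবং বাবা', $sample_stop_words), "\n";
```

A text would be scored against the list of each candidate locale, with a higher ratio suggesting a better match. Multi-word entries in the list are never matched by this token-based sketch.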

stopwordsRemover()

Removes the stop words from the page. This is used for word cloud generation and language detection.


    public static stopwordsRemover(mixed $data) : mixed

Parameters

$data : mixed: either a string or an array of strings from which to remove stop words

Return values

mixed —

$data with no stop words
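
The string-or-array contract described above can be sketched as follows. This is a minimal stand-in written for illustration, not the body of stopwordsRemover(); the function name removeStopWords, the explicit $stop_words parameter, and the whitespace tokenization are all assumptions.

```php
<?php
/**
 * Hypothetical stand-in for a stop word remover (not Yioop's code).
 * Accepts a string or an array of strings and returns the same shape.
 */
function removeStopWords($data, array $stop_words)
{
    if (is_array($data)) {
        // Apply the string case to each element, preserving keys.
        return array_map(function ($item) use ($stop_words) {
            return removeStopWords($item, $stop_words);
        }, $data);
    }
    $lookup = array_flip($stop_words);
    $tokens = preg_split('/\s+/u', trim((string) $data), -1,
        PREG_SPLIT_NO_EMPTY);
    // Keep only the tokens that are not in the stop word list.
    $kept = array_filter($tokens, function ($token) use ($lookup) {
        return !isset($lookup[$token]);
    });
    return implode(' ', $kept);
}

// Prints: বাবা মা
echo removeStopWords('আমি বাবা এবং মা', ['আমি', 'এবং']), "\n";
```

This sketch collapses runs of whitespace into single spaces and cannot match multi-word entries such as 'যোগ করা'; a pattern-based replacement over the whole string would be needed to handle those.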
