Adjust Bloom filter size calculation; slightly refactor IndexDictionary and IndexShard; improve guessLocaleFromString to handle Hindi; more work on nwordgram filters; simplify and speed up each summarizer, and refactor sentence extraction to be more accurate; improve the English tokenizer's parser for question answering; tweak some test cases, a=chris

Author Chris Pollett <chris@pollett.org>
Author date 2018-06-30, 21h
Author local date 2018-06-30, 14h -0700
Committer Chris Pollett <chris@pollett.org>
Committer date 2018-06-30, 21h
Committer local date 2018-06-30, 14h -0700
Commit df92a32b3913b23b7e9a7fe492836d04288f8f15
Tree fb0b09b20531e53be1aed1339c15eb938de9fe19
Parent 2256124124589641b0e8b7245c56e4fb44b5cfe6
Affected files:
src/configs/Config.php
src/configs/TokenTool.php
src/controllers/components/CrawlComponent.php
src/executables/ArcTool.php
src/executables/Fetcher.php
src/library/BloomFilterFile.php
src/library/IndexDictionary.php
src/library/IndexShard.php
src/library/LocaleFunctions.php
src/library/NWordGrams.php
src/library/PhraseParser.php
src/library/UrlParser.php
src/library/processors/HtmlProcessor.php
src/library/processors/TextProcessor.php
src/library/summarizers/CentroidSummarizer.php
src/library/summarizers/CentroidWeightedSummarizer.php
src/library/summarizers/GraphBasedSummarizer.php
src/library/summarizers/ScrapeSummarizer.php
src/library/summarizers/Summarizer.php
src/locale/ar/resources/Tokenizer.php
src/locale/de/resources/Tokenizer.php
src/locale/en_US/resources/Tokenizer.php
src/locale/en_US/resources/all_aux_grams.txt
src/locale/en_US/resources/all_word_grams.ftr
src/locale/es/resources/Tokenizer.php
src/locale/fa/resources/Tokenizer.php
src/locale/fr_FR/resources/Tokenizer.php
src/locale/hi/resources/Tokenizer.php
src/locale/hi/resources/all_aux_grams.txt
src/locale/hi/resources/all_word_grams.ftr
src/locale/it/resources/Tokenizer.php
src/locale/ru/resources/Tokenizer.php
tests/EnTokenizerTest.php
tests/IndexDictionaryTest.php
tests/IndexShardTest.php
tests/PhraseParserTest.php