Yioop_V9.5_Source_Code_Documentation

tests

Interfaces, Classes, Traits and Enums

BloomFilterFileTest
Used to test that the BloomFilterFile class provides the basic functionality of a persistent set, i.e., that we can insert things into it and do membership testing.
BmpProcessorTest
UnitTest for the BmpProcessor class. A BmpProcessor is used to process a .bmp file and extract a summary from it. This class tests the processing of a .bmp file.
BPlusTreeTest
Yioop B+-tree unit test class.
CrawlQueueBundleTest
UnitTest for the CrawlQueueBundle class.
DeTokenizerTest
Code used to test the German stemming algorithm.
DocxProcessorTest
UnitTest for the DocxProcessor class. A DocxProcessor is used to process docx files, which are a zip of an XML-based format.
ElTokenizerTest
Code used to test the Greek stemming algorithm.
EnTokenizerTest
Code used to test the English stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/porter/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/porter/output.txt. The code uses the original Porter stemmer, not Porter 2.
EpubProcessorTest
UnitTest for the EpubProcessor class. An EpubProcessor is used to process a .epub (ebook publishing standard) file and extract summary from it. This class tests the processing of an .epub file format by EpubProcessor.
EsTokenizerTest
Code used to test the Spanish stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/spanish/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/spanish/output.txt
FaTokenizerTest
Code used to test the Persian stemming algorithm. The inputs for the algorithm came from the sample text file for the Hamshahri Collection found at http://ece.ut.ac.ir/DBRG/Hamshahri/download.html. The stemmed results come from the Java program that the PHP stemmer is based on, at http://members.unine.ch/jacques.savoy/clef/persianStemmerArabic.txt
FetchUrlTest
Used to test auxiliary functions related to downloading pages with the FetchUrl class.
FrTokenizerTest
Code used to test the French stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/french/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/french/output.txt
HashTableTest
Used to test that the HashTable class properly stores key-value pairs and correctly handles inserts, deletes, and collisions. It should also detect when the table is full.
HiTokenizerTest
Code used to test the Hindi stemming algorithm. The inputs for the algorithm came from the sample text file for the The stemmed results come from the Java program that the PHP stemmer is based off of at http://members.unine.ch/jacques.savoy/clef/HindiStemmerLight.java.txt which has since been modified to try to improve accuracy
IconProcessorTest
UnitTest for the IconProcessor class. An IconProcessor is used to process a .ico file and extract a summary from it. This class tests the processing of a .ico file.
IndexDictionaryTest
Used to test that the IndexDictionary class can properly add shards and retrieve correct posting slice ranges in the shards.
IndexDocumentBundleTest
Used to test that the IndexDocumentBundle class can properly add and retrieve documents. Checks that its prepareMethod correctly deduplicates documents before inverted index creation. Tests inverted index creation and adding terms to IndexDocumentBundle's BPlusTree. Checks lookup of documents according to term.
IndexManagerTest
Used to run unit tests for the IndexManager class. IndexManager acts as a resource manager for the open indexes used to process a query.
IndexShardTest
Used to test that the IndexShard class can properly add new documents and retrieve those documents by word. Checks that doc offsets can be updated and that shards can be saved and reloaded.
ItTokenizerTest
Code used to test the Italian stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/italian/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/italian/output.txt
LinearHashTableTest
Used to test that the LinearHashTable class properly stores key-value pairs and correctly handles inserts, deletes, and retrievals.
NlTokenizerTest
Code used to test the Dutch stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/Dutch/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/Dutch/output.txt
PackedTableToolsTest
Used to test the PackedTableTools class. PackedTableTools are used for reading and storing rows with respect to some signature
PdfProcessorTest
UnitTest for the PdfProcessor class. A PdfProcessor is used to process a .pdf file and extract a summary from it. This class tests the processing of a .pdf file.
PhraseParserTest
Used to test the PhraseParser class. In particular, checks that bigram extraction works correctly.
PptxProcessorTest
UnitTest for the PptxProcessor class. A PptxProcessor is used to process pptx files, which are a zip of an XML-based format.
PriorityQueueTest
Used to test the PriorityQueue class that is used to figure out which URL to crawl next
PtTokenizerTest
Code used to test the Portuguese stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/porter/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/porter/output.txt. The code uses the original Porter stemmer, not Porter 2.
QueueServerTest
Used to test functions related to scheduling websites to crawl for a web crawl (the responsibility of a QueueServer)
RuTokenizerTest
Code used to test the Russian stemming algorithm. The inputs for the algorithm are words in http://snowball.tartarus.org/algorithms/russian/voc.txt and the resulting stems are compared with the stem words in http://snowball.tartarus.org/algorithms/russian/output.txt
ScraperManagerTest
Code used to test Web Scrapers.
Sha1JavascriptTest
Used to test the Javascript implementation of the sha1 function.
StringArrayTest
Used to test that the StringArray class properly stores/retrieves values, and can handle loading and saving
TrieTest
Used to test that the Trie class properly stores words that could be used for an autosuggest dictionary
UrlParserTest
Used to test the UrlParser class. For now, checks that the method canonicalLink works correctly and that isPathMemberRegexPaths (used in robot_processor.php) works.
UtilityTest
Used to test the various methods in utility, in particular, those related to posting lists and time.
VersionManagerTest
UnitTests for the VersionManager class.
WebArchiveTest
UnitTest for the WebArchive class. A web archive is used to store array-based objects persistently to a file. This class tests storing and retrieving from such an archive.
WikiParserTest
Tests the functionality of WikiParser used when processing Wikipedia dumps and used for Yioop's internal wiki infrastructure
WordIteratorTest
Tests the functionality of the WordIterator class used to iterate over documents in an IndexDocumentBundle containing a term.
XlsxProcessorTest
Used to test that the XlsxProcessor class provides the basic functionality of getting the title, description, languages and links.
ZhTokenizerTest
Used to test Named Entity Tagging and Part of Speech Tagging for the Chinese Language. Word segmentation is already tested in
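
Several of the tokenizer tests listed above (for example EnTokenizerTest, EsTokenizerTest and FrTokenizerTest) share one pattern: stem each word in a vocabulary file and compare the result with the matching line of an expected-output file. The following is a minimal sketch of that pattern in Python, not Yioop's actual PHP test code. The names find_stem_mismatches and toy_stem are hypothetical, and toy_stem is a stand-in that only strips a trailing "s"; it is not a Porter or Snowball stemmer.

```python
from typing import Callable, Iterable, List, Tuple


def find_stem_mismatches(
    words: Iterable[str],
    expected_stems: Iterable[str],
    stem: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (word, expected, actual) for each word whose stem differs.

    words and expected_stems are paired line by line, as with a
    voc.txt / output.txt pair of files.
    """
    mismatches = []
    for word, expected in zip(words, expected_stems):
        word = word.strip()
        expected = expected.strip()
        actual = stem(word)
        if actual != expected:
            mismatches.append((word, expected, actual))
    return mismatches


def toy_stem(word: str) -> str:
    # Hypothetical stand-in stemmer: drops one trailing "s".
    # A real test would call the tokenizer's stemmer here instead.
    if len(word) > 1 and word.endswith("s"):
        return word[:-1]
    return word


# In-memory lists stand in for the vocabulary and expected-output files.
vocabulary = ["cats", "dog", "houses"]
expected = ["cat", "dog", "house"]
mismatches = find_stem_mismatches(vocabulary, expected, toy_stem)
```

Returning the list of mismatches, rather than stopping at the first failure, lets a test report every word on which the stemmer disagrees with the reference output in a single run.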
