tests
Interfaces, Classes, Traits and Enums
- BloomFilterFileTest
- Used to test that the BloomFilterFile class provides the basic functionality
of a persistent set, i.e., that we can insert items into it and test them
for membership.
- BmpProcessorTest
- UnitTest for the BmpProcessor class. A BmpProcessor is used to process
a .bmp file and extract a summary from it. This
class tests the processing of a .bmp file.
- BPlusTreeTest
- Unit tests for Yioop's B+-tree class.
- CrawlQueueBundleTest
- UnitTest for the CrawlQueueBundle class.
- DeTokenizerTest
- Code used to test the German stemming algorithm.
- DocxProcessorTest
- UnitTest for the DocxProcessor class. It is used to process
docx files which are a zip of an xml-based format
- ElTokenizerTest
- Code used to test the Greek stemming algorithm.
- EnTokenizerTest
- Code used to test the English stemming algorithm. The inputs for the
algorithm are words in
http://snowball.tartarus.org/algorithms/porter/voc.txt and the resulting
stems are compared with the stem words in
http://snowball.tartarus.org/algorithms/porter/output.txt
Code uses original Porter stemmer, not Porter 2
- EpubProcessorTest
- UnitTest for the EpubProcessor class. An EpubProcessor is used to process
a .epub (ebook publishing standard) file and extract a summary from it. This
class tests the processing of the .epub file format by EpubProcessor.
- EsTokenizerTest
- Code used to test the Spanish stemming algorithm. The inputs for the
algorithm are words in
http://snowball.tartarus.org/algorithms/spanish/voc.txt and the resulting
stems are compared with the stem words in
http://snowball.tartarus.org/algorithms/spanish/output.txt
- FaTokenizerTest
- Code used to test the Persian stemming algorithm. The inputs for the
algorithm came from the sample text file for the Hamshahri Collection
found at http://ece.ut.ac.ir/DBRG/Hamshahri/download.html
The stemmed results come from the Java program on which the PHP stemmer is
based, found at
http://members.unine.ch/jacques.savoy/clef/persianStemmerArabic.txt
- FetchUrlTest
- Used to test auxiliary functions related to downloading pages with the
FetchUrl class.
- FrTokenizerTest
- Code used to test the French stemming algorithm. The inputs for the
algorithm are words in
http://snowball.tartarus.org/algorithms/french/voc.txt and the resulting
stems are compared with the stem words in
http://snowball.tartarus.org/algorithms/french/output.txt
- HashTableTest
- Used to test that the HashTable class properly stores key-value pairs and
correctly handles inserts, deletes, and collisions. It should also detect
when the table is full.
- HiTokenizerTest
- Code used to test the Hindi stemming algorithm. The inputs for the
algorithm came from the sample text file for the
The stemmed results come from the Java program on which the PHP stemmer is
based, found at
http://members.unine.ch/jacques.savoy/clef/HindiStemmerLight.java.txt
which has since been modified to try to improve accuracy
- IconProcessorTest
- UnitTest for the IconProcessor class. An IconProcessor is used to process
a .ico file and extract a summary from it. This
class tests the processing of a .ico file.
- IndexDictionaryTest
- Used to test that the IndexDictionary class can properly add shards
and retrieve correct posting slice ranges in the shards.
- IndexDocumentBundleTest
- Used to test that the IndexDocumentBundle class can properly add and
retrieve documents. Checks that its prepareMethod correctly deduplicates
documents before inverted index creation. Tests inverted index creation
and adding terms to IndexDocumentBundle's BPlusTree. Checks lookup of
documents according to term.
- IndexManagerTest
- Used to run unit tests for the IndexManager class. IndexManager acts as
a resource manager for the open indexes used to process a query.
- IndexShardTest
- Used to test that the IndexShard class can properly add new documents
and retrieve those documents by word. Checks that doc offsets can be
updated and that shards can be saved and reloaded.
- ItTokenizerTest
- Code used to test the Italian stemming algorithm. The inputs for the
algorithm are words in
http://snowball.tartarus.org/algorithms/italian/voc.txt and the resulting
stems are compared with the stem words in
http://snowball.tartarus.org/algorithms/italian/output.txt
- LinearHashTableTest
- Used to test that the LinearHashTable class properly stores key-value pairs
and correctly handles inserts, deletes, and retrievals.
- NlTokenizerTest
- Code used to test the Dutch stemming algorithm. The inputs for the
algorithm are words in
http://snowball.tartarus.org/algorithms/Dutch/voc.txt and the resulting
stems are compared with the stem words in
http://snowball.tartarus.org/algorithms/Dutch/output.txt
- PackedTableToolsTest
- Used to test the PackedTableTools class. PackedTableTools are used for
reading and storing rows with respect to some signature
- PdfProcessorTest
- UnitTest for the PdfProcessor class. A PdfProcessor is used to process
a .pdf file and extract a summary from it. This
class tests the processing of a .pdf file.
- PhraseParserTest
- Used to test the PhraseParser class. In particular, checks that bigram
extraction works correctly.
- PptxProcessorTest
- UnitTest for the PptxProcessor class. It is used to process
pptx files which are a zip of an xml-based format
- PriorityQueueTest
- Used to test the PriorityQueue class that is used to figure out which URL
to crawl next
- PtTokenizerTest
- Code used to test the Portuguese stemming algorithm. The inputs for the
algorithm are words in
http://snowball.tartarus.org/algorithms/porter/voc.txt and the resulting
stems are compared with the stem words in
http://snowball.tartarus.org/algorithms/porter/output.txt
Code uses original Porter stemmer, not Porter 2
- QueueServerTest
- Used to test functions related to scheduling websites to crawl for
a web crawl (the responsibility of a QueueServer)
- RuTokenizerTest
- Code used to test the Russian stemming algorithm. The inputs for the
algorithm are words in
http://snowball.tartarus.org/algorithms/russian/voc.txt and the resulting
stems are compared with the stem words in
http://snowball.tartarus.org/algorithms/russian/output.txt
- ScraperManagerTest
- Code used to test Web Scrapers.
- Sha1JavascriptTest
- Used to test the JavaScript implementation of the sha1 function.
- StringArrayTest
- Used to test that the StringArray class properly stores/retrieves values,
and can handle loading and saving
- TrieTest
- Used to test that the Trie class properly stores words that
could be used for an autosuggest dictionary
- UrlParserTest
- Used to test the UrlParser class. For now, checks that the
method canonicalLink works correctly and that
isPathMemberRegexPaths (used in robot_processor.php) works.
- UtilityTest
- Used to test the various methods in utility, in particular, those
related to posting lists and time.
- VersionManagerTest
- UnitTests for the VersionManager class.
- WebArchiveTest
- UnitTest for the WebArchive class. A web archive is used to store
array-based objects persistently to a file. This class tests storing and
retrieving from such an archive.
- WikiParserTest
- Tests the functionality of WikiParser used when processing Wikipedia dumps
and used for Yioop's internal wiki infrastructure
- WordIteratorTest
- Tests the functionality of the WordIterator class used to iterate over
documents in an IndexDocumentBundle containing a term.
- XlsxProcessorTest
- Used to test that the XlsxProcessor class provides the basic functionality
of getting the title, description, languages and links
- ZhTokenizerTest
- Used to test Named Entity Tagging and Part of Speech Tagging for the
Chinese Language. Word segmentation is already tested in