IndexDocumentBundleTest
extends UnitTest
in package
Used to test that the IndexDocumentBundle class can properly add and retrieve documents. Checks that its prepareIndexMap method correctly deduplicates documents before inverted index creation. Tests inverted index creation and the addition of terms to the IndexDocumentBundle's BPlusTree. Checks lookup of documents by term.
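As a rough illustration of the ideas these tests exercise (deduplication by doc_id, building an inverted index, and looking up documents by term), here is a toy sketch. The function names `dedupByDocId` and `buildInvertedIndex`, the page array layout, and the "keep the first page seen" rule are all assumptions made for illustration; none of them are taken from IndexDocumentBundle.

```php
<?php
// Toy illustration only: these helpers are hypothetical and are not part of
// IndexDocumentBundle or its test class.

/**
 * Keeps the first page seen for each doc_id (an assumed dedup rule).
 */
function dedupByDocId(array $pages): array
{
    $unique = [];
    foreach ($pages as $page) {
        // Only store a page if this doc_id has not been seen yet.
        $unique[$page['doc_id']] ??= $page;
    }
    return $unique;
}

/**
 * Maps each lowercased term to a posting list of doc_id => term positions.
 */
function buildInvertedIndex(array $pages): array
{
    $index = [];
    foreach ($pages as $doc_id => $page) {
        $terms = preg_split('/\W+/', strtolower($page['text']), -1,
            PREG_SPLIT_NO_EMPTY);
        foreach ($terms as $position => $term) {
            $index[$term][$doc_id][] = $position;
        }
    }
    return $index;
}

$pages = [
    ['doc_id' => 'doc1', 'text' => 'The quick brown fox'],
    ['doc_id' => 'doc2', 'text' => 'The lazy dog'],
    ['doc_id' => 'doc1', 'text' => 'The quick brown fox'], // duplicate doc_id
];
$index = buildInvertedIndex(dedupByDocId($pages));
```

Looking up a term is then a plain array access, for example `$index['the']`, which yields the posting list for that term keyed by doc_id.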
Table of Contents
- case_name = "TestCase"
- The suffix that all TestCase methods need to have to be called by run()
- TEST_DIR = __DIR__ . '/test_files/index_document_test'
- Prefix of folders for index document test
- $current_method : string
- Contains the value of the next test case to be run; can be used by setUp
- $index_archive : IndexDocumentBundle
- Holds the IndexDocumentBundle used for test purposes
- $test_case_results : array<string|int, mixed>
- Used to store the results for each test sub case
- $test_objects : array<string|int, mixed>
- Used to hold objects to be used in tests
- __construct() : mixed
- Constructor should be overridden to do any setup that occurs before any test cases
- addGetPagesTestCase() : mixed
- Tests that after adding pages to an IndexDocumentBundle, the page and its summary can be retrieved.
- addPartitionPostingsDictionaryTestCase() : mixed
- Tests the complete process of going from documents, through deduplication and building an inverted index, to adding the result to the IndexDocumentBundle's inverted index. To check this, after the above is done, it performs lookups of terms known to be in the indexed documents and checks the properties of the returned posting lists.
- assertEqual() : mixed
- Checks that $x and $y are the same. The result of the test is added to $this->test_case_results
- assertFalse() : mixed
- Checks that $x can be coerced to false. The result of the test is added to $this->test_case_results
- assertNotEqual() : mixed
- Checks that $x and $y are not the same. The result of the test is added to $this->test_case_results
- assertTrue() : mixed
- Checks that $x can be coerced to true. The result of the test is added to $this->test_case_results
- buildInvertedIndexPartitionTestCase() : mixed
- Tests the process of adding documents to the IndexDocumentBundle, then building an inverted index from them. To check this, after the above is done, it performs lookups of terms known to have posting lists and then checks the properties of the returned posting lists.
- prepareIndexTestCase() : mixed
- Tests the prepareIndexMap method, which is used to deduplicate pages before an inverted index of a partition is made. Tests adding pages with the same doc_id to make sure they get grouped together. Grouping also affects how documents are scored, so this is tested as well.
- run() : array<string|int, mixed>
- Execute each of the test cases of this unit test and return the results
- saveDescriptionTestCase() : mixed
- Checks that the constructor of the IndexDocumentBundle correctly saves constructor info such as the bundle description
- setUp() : mixed
- Sets up an array to keep track of what linear hash tables we've made so that we can delete them when a test is done.
- tearDown() : mixed
- Deletes all the Linear Hash tables in $this->table_dirs
- docidFromInt() : string
- Computes a 24-byte docid by left-padding an int with 0's
- docidFromIntKeys() : string
- Docids are typically made from three 8-byte strings. This function takes three ints, left-pads each with '0' (\x30), and concatenates them to make a 24-byte docid. As docids use their 8th byte to say whether the id is for a document (replaced with 'd') or a link (replaced with 'l'), this function uses the value of the $is_doc flag to determine which value to overwrite the 8th byte with.
Constants
case_name
The suffix that all TestCase methods need to have to be called by run()
public mixed case_name = "TestCase"
TEST_DIR
Prefix of folders for index document test
public mixed TEST_DIR = __DIR__ . '/test_files/index_document_test'
Properties
$current_method
Contains the value of the next test case to be run; can be used by setUp
public string $current_method
$index_archive
Holds the IndexDocumentBundle used for test purposes
public IndexDocumentBundle $index_archive
$test_case_results
Used to store the results for each test sub case
public array<string|int, mixed> $test_case_results
$test_objects
Used to hold objects to be used in tests
public array<string|int, mixed> $test_objects
Methods
__construct()
Constructor should be overridden to do any setup that occurs before any test cases
public __construct() : mixed
Return values
mixed
addGetPagesTestCase()
Tests that after adding pages to an IndexDocumentBundle, the page and its summary can be retrieved.
public addGetPagesTestCase() : mixed
Return values
mixed
addPartitionPostingsDictionaryTestCase()
Tests the complete process of going from documents, through deduplication and building an inverted index, to adding the result to the IndexDocumentBundle's inverted index. To check this, after the above is done, it performs lookups of terms known to be in the indexed documents and checks the properties of the returned posting lists.
public addPartitionPostingsDictionaryTestCase() : mixed
Return values
mixed
assertEqual()
Checks that $x and $y are the same. The result of the test is added to $this->test_case_results
public assertEqual(mixed $x, mixed $y[, string $description = "" ]) : mixed
Parameters
- $x : mixed
  a first item to compare
- $y : mixed
  a second item to compare
- $description : string = ""
  information about this test subcase
Return values
mixed
assertFalse()
Checks that $x can be coerced to false. The result of the test is added to $this->test_case_results
public assertFalse(mixed $x[, string $description = "" ]) : mixed
Parameters
- $x : mixed
  item to check
- $description : string = ""
  information about this test subcase
Return values
mixed
assertNotEqual()
Checks that $x and $y are not the same. The result of the test is added to $this->test_case_results
public assertNotEqual(mixed $x, mixed $y[, string $description = "" ]) : mixed
Parameters
- $x : mixed
  a first item to compare
- $y : mixed
  a second item to compare
- $description : string = ""
  information about this test subcase
Return values
mixed
assertTrue()
Checks that $x can be coerced to true. The result of the test is added to $this->test_case_results
public assertTrue(mixed $x[, string $description = "" ]) : mixed
Parameters
- $x : mixed
  item to check
- $description : string = ""
  information about this test subcase
Return values
mixed
buildInvertedIndexPartitionTestCase()
Tests the process of adding documents to the IndexDocumentBundle, then building an inverted index from them. To check this, after the above is done, it performs lookups of terms known to have posting lists and then checks the properties of the returned posting lists.
public buildInvertedIndexPartitionTestCase() : mixed
Return values
mixed
prepareIndexTestCase()
Tests the prepareIndexMap method, which is used to deduplicate pages before an inverted index of a partition is made. Tests adding pages with the same doc_id to make sure they get grouped together. Grouping also affects how documents are scored, so this is tested as well.
public prepareIndexTestCase() : mixed
Return values
mixed
run()
Execute each of the test cases of this unit test and return the results
public run([string $method = null ]) : array<string|int, mixed>
Parameters
- $method : string = null
  if not null then the method to run, else run all methods
Return values
array<string|int, mixed> —test case results
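The interplay between run(), the case_name suffix, setUp(), tearDown(), and the assert methods can be sketched with a small stand-in harness. This is a hypothetical reconstruction from the descriptions on this page, not the UnitTest source: the class names `MiniUnitTest` and `ArithmeticTest`, the shape of each result record, and the use of loose `==` comparison are all assumptions.

```php
<?php
// Hypothetical stand-in for the harness described on this page.
class MiniUnitTest
{
    const case_name = "TestCase";
    public $current_method = "";
    public $test_case_results = [];

    public function setUp() {}
    public function tearDown() {}

    public function assertEqual($x, $y, $description = "")
    {
        // Assumption: each result is a name/pass pair and "same" means ==.
        $this->test_case_results[] =
            ['name' => $description, 'pass' => ($x == $y)];
    }

    public function assertTrue($x, $description = "")
    {
        $this->test_case_results[] =
            ['name' => $description, 'pass' => (bool) $x];
    }

    public function run($method = null)
    {
        $results = [];
        $suffix_len = strlen(self::case_name);
        foreach (get_class_methods($this) as $candidate) {
            // Only methods whose names end in the suffix count as test cases.
            if (substr($candidate, -$suffix_len) !== self::case_name) {
                continue;
            }
            if ($method !== null && $candidate !== $method) {
                continue;
            }
            $this->current_method = $candidate;
            $this->test_case_results = [];
            $this->setUp();
            $this->$candidate();
            $this->tearDown();
            $results[$candidate] = $this->test_case_results;
        }
        return $results;
    }
}

class ArithmeticTest extends MiniUnitTest
{
    public function additionTestCase()
    {
        $this->assertEqual(1 + 1, 2, "one plus one is two");
        // In PHP only "" and "0" are falsy strings, so "0.0" coerces to true.
        $this->assertTrue("0.0", "non-empty string other than '0' is truthy");
    }

    public function helper() {} // no suffix, so run() skips it
}

$results = (new ArithmeticTest())->run();
```

In this sketch the results are keyed by test case method name, and passing a method name to run() restricts execution to that one test case.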
saveDescriptionTestCase()
Checks that the constructor of the IndexDocumentBundle correctly saves constructor info such as the bundle description
public saveDescriptionTestCase() : mixed
Return values
mixed
setUp()
Sets up an array to keep track of what linear hash tables we've made so that we can delete them when a test is done.
public setUp() : mixed
Return values
mixed
tearDown()
Deletes all the Linear Hash tables in $this->table_dirs
public tearDown() : mixed
Return values
mixed
docidFromInt()
Computes a 24-byte docid by left-padding an int with 0's
protected docidFromInt(int $i) : string
Parameters
- $i : int
  integer to make docid from
Return values
string —docid made by padding
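A minimal sketch of the padding described above, written as a standalone function rather than the protected method. It is based only on the description here, so the actual method body may differ.

```php
<?php
// Sketch only: mirrors the documented behaviour, not the real method body.
function docidFromInt(int $i): string
{
    // Left-pad the decimal form of $i with '0' up to 24 bytes.
    return str_pad((string) $i, 24, "0", STR_PAD_LEFT);
}

$docid = docidFromInt(42);
```

Note that str_pad() never truncates, so an int with more than 24 digits cannot occur here but a string longer than the target length would be returned unchanged.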
docidFromIntKeys()
Docids are typically made from three 8-byte strings. This function takes three ints, left-pads each with '0' (\x30), and concatenates them to make a 24-byte docid. As docids use their 8th byte to say whether the id is for a document (replaced with 'd') or a link (replaced with 'l'), this function uses the value of the $is_doc flag to determine which value to overwrite the 8th byte with.
protected docidFromIntKeys(int $i_hash_url, int $j_hash_page, int $k_hash_host[, bool $is_doc = true ]) : string
Parameters
- $i_hash_url : int
  an int for the first 8 bytes (in non-artificial docids this would be the crawlHash of the URL the document came from)
- $j_hash_page : int
  an int for the second 8 bytes (in non-artificial docids this would be the crawlHash of the document)
- $k_hash_host : int
  an int for the third 8 bytes (in non-artificial docids this would be the crawlHash of the hostname of the site the document came from)
- $is_doc : bool = true
  whether the hash is for a document or a link
Return values
string —24 byte docid.
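The construction described above can be sketched as a standalone function. One detail is a guess: the description does not say whether the flag byte is counted from zero or from one. This sketch assumes zero-based offset 8 (the first byte of the second block), because offset 7 would overwrite the last digit of the left-padded first key. The real method may use a different offset.

```php
<?php
// Sketch only: built from the description, not from the real method body.
function docidFromIntKeys(int $i_hash_url, int $j_hash_page,
    int $k_hash_host, bool $is_doc = true): string
{
    // Left-pad each key with '0' to 8 bytes and concatenate to 24 bytes.
    $id = str_pad((string) $i_hash_url, 8, "0", STR_PAD_LEFT)
        . str_pad((string) $j_hash_page, 8, "0", STR_PAD_LEFT)
        . str_pad((string) $k_hash_host, 8, "0", STR_PAD_LEFT);
    // ASSUMPTION: the flag lives at zero-based offset 8; see the note above.
    $id[8] = $is_doc ? 'd' : 'l';
    return $id;
}

$doc_id = docidFromIntKeys(1, 2, 3);
$link_id = docidFromIntKeys(1, 2, 3, false);
```

With small keys the overwritten byte is padding, so the three keys stay readable in the resulting docid while the flag distinguishes documents from links.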