Yioop_V9.5_Source_Code_Documentation

IndexDocumentBundleTest extends UnitTest
in package

Used to test that the IndexDocumentBundle class can properly add and retrieve documents. Checks that its prepareIndexMap method correctly deduplicates documents before inverted index creation. Tests inverted index creation and adding terms to the IndexDocumentBundle's BPlusTree. Checks lookup of documents by term.

Table of Contents

case_name  = "TestCase"
The suffix that all TestCase methods need to have to be called by run()
TEST_DIR  = __DIR__ . '/test_files/index_document_test'
Prefix of folders for index document test
$current_method  : string
Contains the value of the next test case to be run; can be used by setUp
$index_archive  : IndexDocumentBundle
Holds the IndexDocumentBundle used for test purposes
$test_case_results  : array<string|int, mixed>
Used to store the results for each test sub case
$test_objects  : array<string|int, mixed>
Used to hold objects to be used in tests
__construct()  : mixed
Constructor should be overridden to do any set up that occurs before any test cases
addGetPagesTestCase()  : mixed
Tests that after adding pages to an IndexArchiveBundle, the page and its summary can be retrieved.
addPartitionPostingsDictionaryTestCase()  : mixed
Tests the complete process of going from documents, through deduplication and building an inverted index, to adding the result to the IndexDocumentBundle's inverted index. To check this, after the above is done, performs lookups of terms known to be in the indexed documents and checks the properties of the returned posting lists.
assertEqual()  : mixed
Checks that $x and $y are the same; the result of the test is added to $this->test_case_results
assertFalse()  : mixed
Checks that $x can be coerced to false; the result of the test is added to $this->test_case_results
assertNotEqual()  : mixed
Checks that $x and $y are not the same; the result of the test is added to $this->test_case_results
assertTrue()  : mixed
Checks that $x can be coerced to true; the result of the test is added to $this->test_case_results
buildInvertedIndexPartitionTestCase()  : mixed
Tests the process of adding documents to the IndexDocumentBundle, then building an inverted index from them. To check this, after the above is done, performs lookups of terms known to have posting lists and then checks the properties of the returned posting lists.
prepareIndexTestCase()  : mixed
Tests the prepareIndexMap method, which is used to deduplicate pages before an inverted index of a partition is made. Tests adding pages with the same doc_id to make sure they get grouped together. Grouping also affects how documents are scored, so this is tested as well.
run()  : array<string|int, mixed>
Execute each of the test cases of this unit test and return the results
saveDescriptionTestCase()  : mixed
Checks if the constructor of the IndexDocumentBundle correctly saves the constructor info such as the bundle description
setUp()  : mixed
Sets up an array to keep track of what linear hash tables we've made so that we can delete them when done with a test.
tearDown()  : mixed
Deletes all the Linear Hash tables in $this->table_dirs
docidFromInt()  : string
Computes a 24 byte docId by padding an int to the left with 0's
docidFromIntKeys()  : string
docids are typically made from three 8 byte strings. This function takes three ints, left pads each with '0' (\x30), and concatenates them to make a 24 byte docid. As docids use their 8th byte to say whether the id is for a document (replace with 'd') or a link (replace with 'l'), this function uses the value of the $is_doc flag to determine which value to overwrite the 8th byte with.

Constants

case_name

The suffix that all TestCase methods need to have to be called by run()

public mixed case_name = "TestCase"

TEST_DIR

Prefix of folders for index document test

public mixed TEST_DIR = __DIR__ . '/test_files/index_document_test'

Properties

$current_method

Contains the value of the next test case to be run; can be used by setUp

public string $current_method

$test_case_results

Used to store the results for each test sub case

public array<string|int, mixed> $test_case_results

$test_objects

Used to hold objects to be used in tests

public array<string|int, mixed> $test_objects

Methods

__construct()

Constructor should be overridden to do any set up that occurs before any test cases

public __construct() : mixed
Return values
mixed

addGetPagesTestCase()

Tests that after adding pages to an IndexArchiveBundle, the page and its summary can be retrieved.

public addGetPagesTestCase() : mixed
Return values
mixed

addPartitionPostingsDictionaryTestCase()

Tests the complete process of going from documents, through deduplication and building an inverted index, to adding the result to the IndexDocumentBundle's inverted index. To check this, after the above is done, performs lookups of terms known to be in the indexed documents and checks the properties of the returned posting lists.

public addPartitionPostingsDictionaryTestCase() : mixed
Return values
mixed

assertEqual()

Checks that $x and $y are the same; the result of the test is added to $this->test_case_results

public assertEqual(mixed $x, mixed $y[, string $description = "" ]) : mixed
Parameters
$x : mixed

a first item to compare

$y : mixed

a second item to compare

$description : string = ""

information about this test subcase

Return values
mixed

assertFalse()

Checks that $x can be coerced to false; the result of the test is added to $this->test_case_results

public assertFalse(mixed $x[, string $description = "" ]) : mixed
Parameters
$x : mixed

item to check

$description : string = ""

information about this test subcase

Return values
mixed

assertNotEqual()

Checks that $x and $y are not the same; the result of the test is added to $this->test_case_results

public assertNotEqual(mixed $x, mixed $y[, string $description = "" ]) : mixed
Parameters
$x : mixed

a first item to compare

$y : mixed

a second item to compare

$description : string = ""

information about this test subcase

Return values
mixed

assertTrue()

Checks that $x can be coerced to true; the result of the test is added to $this->test_case_results

public assertTrue(mixed $x[, string $description = "" ]) : mixed
Parameters
$x : mixed

item to check

$description : string = ""

information about this test subcase

Return values
mixed
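
The four assert methods above share one pattern: evaluate a condition and append the outcome to $this->test_case_results. The following is a minimal sketch of that pattern, not the Yioop implementation. The class name SketchAssertions, the private record() helper, the 'pass' and 'description' keys, and the use of loose (==) comparison for "the same" are all assumptions made for illustration.

```php
<?php
/**
 * Hypothetical miniature of the assert methods described above.
 * The shape of each stored result is an assumption, not Yioop's format.
 */
class SketchAssertions
{
    /** @var array results of each test sub case, in call order */
    public $test_case_results = [];

    public function assertTrue($x, string $description = ""): void
    {
        // passes when $x coerces to true
        $this->record((bool)$x, $description);
    }

    public function assertFalse($x, string $description = ""): void
    {
        // passes when $x coerces to false
        $this->record(!(bool)$x, $description);
    }

    public function assertEqual($x, $y, string $description = ""): void
    {
        // loose comparison is assumed here; the real method may differ
        $this->record($x == $y, $description);
    }

    public function assertNotEqual($x, $y, string $description = ""): void
    {
        $this->record($x != $y, $description);
    }

    /** Hypothetical helper: appends one sub case outcome */
    private function record(bool $passed, string $description): void
    {
        $this->test_case_results[] = [
            'pass' => $passed,
            'description' => $description,
        ];
    }
}

$checks = new SketchAssertions();
$checks->assertTrue(1, "1 coerces to true");
$checks->assertFalse("", "empty string coerces to false");
$checks->assertEqual(2, 2, "2 equals 2");
$checks->assertNotEqual("a", "b", "a differs from b");
echo count($checks->test_case_results), " sub cases recorded\n";
```

Storing outcomes rather than throwing on the first failure lets a single test case report every sub case, which matches the "results for each test sub case" wording of $test_case_results.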

buildInvertedIndexPartitionTestCase()

Tests the process of adding documents to the IndexDocumentBundle, then building an inverted index from them. To check this, after the above is done, performs lookups of terms known to have posting lists and then checks the properties of the returned posting lists.

public buildInvertedIndexPartitionTestCase() : mixed
Return values
mixed

prepareIndexTestCase()

Tests the prepareIndexMap method, which is used to deduplicate pages before an inverted index of a partition is made. Tests adding pages with the same doc_id to make sure they get grouped together. Grouping also affects how documents are scored, so this is tested as well.

public prepareIndexTestCase() : mixed
Return values
mixed

run()

Execute each of the test cases of this unit test and return the results

public run([string $method = null ]) : array<string|int, mixed>
Parameters
$method : string = null

if not null then the method to run, else run all methods

Return values
array<string|int, mixed>

test case results
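
One way run() could pick out test cases by the case_name suffix is sketched below. This is an illustration under stated assumptions, not the Yioop source: the class names SketchUnitTest and SketchExampleTest are hypothetical, and the choices to call setUp() and tearDown() around each case and to key the returned array by method name are assumptions.

```php
<?php
/**
 * Hypothetical miniature of a suffix-based test runner.
 */
class SketchUnitTest
{
    const case_name = "TestCase";
    public $current_method = "";
    public $test_case_results = [];

    public function setUp() {}
    public function tearDown() {}

    /**
     * Runs every method whose name ends in case_name, or only $method
     * when it is not null, and returns the collected results.
     */
    public function run(?string $method = null): array
    {
        $results = [];
        $suffix_len = strlen(self::case_name);
        foreach (get_class_methods($this) as $candidate) {
            if (substr($candidate, -$suffix_len) !== self::case_name) {
                continue; // not a test case method
            }
            if ($method !== null && $method !== $candidate) {
                continue; // caller asked for a different method
            }
            $this->current_method = $candidate; // visible to setUp()
            $this->test_case_results = [];
            $this->setUp();
            $this->$candidate();
            $this->tearDown();
            $results[$candidate] = $this->test_case_results;
        }
        return $results;
    }
}

class SketchExampleTest extends SketchUnitTest
{
    public function additionTestCase()
    {
        $this->test_case_results[] = (1 + 1 === 2);
    }

    public function helper() {} // skipped: lacks the TestCase suffix
}

$results = (new SketchExampleTest())->run();
echo implode(", ", array_keys($results)), "\n";
```

Setting $current_method before setUp() is called is what would let setUp tailor its fixtures to the case about to run, as the property description suggests.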

saveDescriptionTestCase()

Checks if the constructor of the IndexDocumentBundle correctly saves the constructor info such as the bundle description

public saveDescriptionTestCase() : mixed
Return values
mixed

setUp()

Sets up an array to keep track of what linear hash tables we've made so that we can delete them when done with a test.

public setUp() : mixed
Return values
mixed

tearDown()

Deletes all the Linear Hash tables in $this->table_dirs

public tearDown() : mixed
Return values
mixed
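
The setUp() and tearDown() pair above describes a common fixture pattern: record every directory a test creates, then delete them all afterwards. A minimal sketch of that pattern follows. It is not the Yioop code: the class SketchFixture, the makeDir() and deleteDir() helpers, and the use of the system temporary directory are assumptions; only the $table_dirs name comes from the description above.

```php
<?php
/**
 * Hypothetical fixture that tracks created folders for later clean up.
 */
class SketchFixture
{
    /** @var array folders created during the current test */
    public $table_dirs = [];

    public function setUp(): void
    {
        $this->table_dirs = []; // start each test with nothing tracked
    }

    /** Hypothetical helper: creates a folder and remembers it */
    public function makeDir(string $name): string
    {
        $dir = sys_get_temp_dir() . "/" . $name . "_" . uniqid();
        mkdir($dir, 0777, true);
        $this->table_dirs[] = $dir;
        return $dir;
    }

    public function tearDown(): void
    {
        foreach ($this->table_dirs as $dir) {
            $this->deleteDir($dir);
        }
        $this->table_dirs = [];
    }

    /** Hypothetical helper: recursively removes a folder */
    private function deleteDir(string $dir): void
    {
        if (!is_dir($dir)) {
            return;
        }
        foreach (scandir($dir) as $entry) {
            if ($entry === "." || $entry === "..") {
                continue;
            }
            $path = $dir . "/" . $entry;
            if (is_dir($path) && !is_link($path)) {
                $this->deleteDir($path);
            } else {
                unlink($path);
            }
        }
        rmdir($dir);
    }
}

$fixture = new SketchFixture();
$fixture->setUp();
$made = $fixture->makeDir("sketch_table");
file_put_contents($made . "/data.txt", "example");
$fixture->tearDown();
echo is_dir($made) ? "still present\n" : "cleaned up\n";
```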

docidFromInt()

Computes a 24 byte docId by padding an int to the left with 0's

protected docidFromInt(int $i) : string
Parameters
$i : int

integer to make docId from

Return values
string

docid made by padding
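
The padding described for docidFromInt() can be sketched with PHP's built-in str_pad(). The function name sketchDocidFromInt is hypothetical and this is not copied from the Yioop source; it also assumes a non-negative input with at most 24 digits, since a minus sign would end up in the middle of the padded string.

```php
<?php
/**
 * Hypothetical stand-in for docidFromInt(): left pads an int with '0'
 * (\x30) characters until the string is 24 bytes long.
 */
function sketchDocidFromInt(int $i): string
{
    return str_pad((string)$i, 24, "0", STR_PAD_LEFT);
}

$doc_id = sketchDocidFromInt(42);
echo strlen($doc_id), "\n"; // prints 24
```

Fixed-width ids built this way sort lexicographically in the same order as the ints they came from, which can make expected lookup results easier to reason about in a test.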

docidFromIntKeys()

docids are typically made from three 8 byte strings. This function takes three ints, left pads each with '0' (\x30), and concatenates them to make a 24 byte docid. As docids use their 8th byte to say whether the id is for a document (replace with 'd') or a link (replace with 'l'), this function uses the value of the $is_doc flag to determine which value to overwrite the 8th byte with.

protected docidFromIntKeys(int $i_hash_url, int $j_hash_page, int $k_hash_host[, bool $is_doc = true ]) : string
Parameters
$i_hash_url : int

an int for the first 8 bytes (in non-artificial docids this would be the crawlHash of the url the document came from)

$j_hash_page : int

an int for the second 8 bytes (in non-artificial docids this would be the crawlHash of the document)

$k_hash_host : int

an int for the last 8 bytes (in non-artificial docids this would be the crawlHash of the hostname of the site the document came from)

$is_doc : bool = true

whether the hash is for a document or a link

Return values
string

24 byte docid.


        
