Yioop_V9.5_Source_Code_Documentation

IndexManager
in package
implements CrawlConstants

Class used to manage open IndexArchiveBundles while performing a query. It provides a single place to obtain references to these bundles and ensures that only one object per bundle is instantiated, in a Singleton-like way.

Tags
author

Chris Pollett

Interfaces, Classes, Traits and Enums

CrawlConstants
Shared constants and enums used by components that are involved in the crawling process

Table of Contents

INDEX_CACHE_SIZE  = 1000
Max number of IndexArchiveBundles that can be cached
$index_times  : array<string|int, mixed>
List of entries of the form name of bundle => time when cached
$indexes  : array<string|int, mixed>
Open IndexArchiveBundles managed by this manager
clearCache()  : mixed
Clears the static variables in which caches of read-in indexes and dictionary info are stored.
discountedNumDocsTerm()  : int
Returns the number of documents in which a given term or phrase appears in the given index, where later generations (those with lower document rank) are discounted more
getIndex()  : object
Returns a reference to the managed copy of an IndexArchiveBundle object with a given timestamp or feed (for handling media feeds)
getVersion()  : int
Returns the version of the index, so that Yioop can determine how to do word lookup. The only major change to the format was when word_ids went from 8 to 20 bytes, which happened around Unix time 1369754208.
getWordInfo()  : array<string|int, mixed>
Gets an array of posting list positions for each shard in the bundle $index_name for the word id $term_id

Constants

INDEX_CACHE_SIZE

Max number of IndexArchiveBundles that can be cached

public mixed INDEX_CACHE_SIZE = 1000

Properties

$index_times

List of entries of the form name of bundle => time when cached

public static array<string|int, mixed> $index_times = []

$indexes

Open IndexArchiveBundles managed by this manager

public static array<string|int, mixed> $indexes = []

Methods

clearCache()

Clears the static variables in which caches of read-in indexes and dictionary info are stored.

public static clearCache() : mixed
Return values
mixed

discountedNumDocsTerm()

Returns the number of documents in which a given term or phrase appears in the given index, where later generations (those with lower document rank) are discounted more

public static discountedNumDocsTerm(string $term, string $index_name) : int
Parameters
$term : string

what to look up in the index's dictionary; no mask is used for this lookup

$index_name : string

index to look up term or phrase in

Return values
int

number of documents

getIndex()

Returns a reference to the managed copy of an IndexArchiveBundle object with a given timestamp or feed (for handling media feeds)

public static getIndex(string $index_name) : object
Parameters
$index_name : string

timestamp of desired IndexArchiveBundle

Return values
object

the desired IndexArchiveBundle reference
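
The one-object-per-bundle behaviour described for getIndex() and clearCache() can be pictured with a minimal, self-contained sketch. It does not use Yioop's code: `FakeBundle` and `SketchIndexManager` are hypothetical stand-ins, the cache size is shrunk to 2 so eviction is visible, and the evict-the-oldest policy is an assumption, since the documentation above does not say what happens when the cache is full.

```php
<?php
// Hypothetical stand-in for IndexArchiveBundle; the real class opens an
// on-disk index, which this sketch does not attempt to do.
class FakeBundle
{
    public string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

// Hypothetical manager illustrating the "one object per bundle" idea.
class SketchIndexManager
{
    // Kept tiny so eviction is easy to observe; the documented value is 1000.
    const INDEX_CACHE_SIZE = 2;

    public static array $indexes = [];     // bundle name => bundle object
    public static array $index_times = []; // bundle name => time when cached

    public static function getIndex(string $index_name): object
    {
        if (!isset(self::$indexes[$index_name])) {
            if (count(self::$indexes) >= self::INDEX_CACHE_SIZE) {
                // Assumed policy: drop the entry that was cached first.
                $oldest = array_key_first(self::$indexes);
                unset(self::$indexes[$oldest], self::$index_times[$oldest]);
            }
            self::$indexes[$index_name] = new FakeBundle($index_name);
            self::$index_times[$index_name] = time();
        }
        return self::$indexes[$index_name];
    }

    public static function clearCache(): void
    {
        self::$indexes = [];
        self::$index_times = [];
    }
}

$a = SketchIndexManager::getIndex("1700000000");
$b = SketchIndexManager::getIndex("1700000000");
var_dump($a === $b); // the second call returns the cached object
```

Because callers always go through the static method, two lookups of the same name share one object rather than opening the bundle twice.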

getVersion()

Returns the version of the index, so that Yioop can determine how to do word lookup. The only major change to the format was when word_ids went from 8 to 20 bytes, which happened around Unix time 1369754208.

public static getVersion(string $index_name) : int
Parameters
$index_name : string

unix timestamp of index

Return values
int

0 if the index uses the original Yioop index format; 1 if it uses the 20-byte word_id format
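
As a rough illustration of how a timestamp-named index could map to a format version, the sketch below compares the index name against the cutoff quoted above. The helper `sketchVersionFromTimestamp` and the constant name are hypothetical, and treating the "around" timestamp as an exact boundary is an assumption; the real getVersion() may consult other information about the bundle.

```php
<?php
// Cutoff quoted in the description above; treating it as an exact
// boundary is an assumption of this sketch.
const WORD_ID_FORMAT_CHANGE_TIME = 1369754208;

// Hypothetical helper, not part of IndexManager.
function sketchVersionFromTimestamp(string $index_name): int
{
    // Index names are Unix timestamps given as strings.
    return (intval($index_name) < WORD_ID_FORMAT_CHANGE_TIME) ? 0 : 1;
}

echo sketchVersionFromTimestamp("1300000000"), "\n"; // before the cutoff
echo sketchVersionFromTimestamp("1400000000"), "\n"; // after the cutoff
```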

getWordInfo()

Gets an array of posting list positions for each shard in the bundle $index_name for the word id $term_id

public static getWordInfo(string $index_name, string $term_id[, int $threshold = -1 ][, int $start_generation = -1 ][, int $num_distinct_generations = -1 ][, bool $with_remaining_total = false ]) : array<string|int, mixed>
Parameters
$index_name : string

bundle to look for $term_id in

$term_id : string

id of phrase or word to look up in bundle dictionary

$threshold : int = -1

once the number of results exceeds this amount, stop looking for more dictionary entries

$start_generation : int = -1

generation in the index from which to start looking for occurrences of the phrase

$num_distinct_generations : int = -1

how many generations to search forward from $start_generation

$with_remaining_total : bool = false

whether to also return the total number of postings found

Return values
array<string|int, mixed>

either [total, sequence of 4-tuples] or a sequence of 4-tuples: (index_shard generation, posting_list_offset, length, exact id that matches $term_id)
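
Since the return shape depends on $with_remaining_total, a caller may want to normalize it before iterating. The sketch below is one way to do that, assuming that "[total, sequence]" means a two-element array. `sketchSplitWordInfo` is a hypothetical helper and the sample tuples are made up for illustration; nothing here calls Yioop itself.

```php
<?php
// Hypothetical helper: splits either documented return shape into
// [total or null, list of 4-tuples].
function sketchSplitWordInfo(array $word_info, bool $with_remaining_total): array
{
    if ($with_remaining_total) {
        // Assumed shape: [total, [tuple, tuple, ...]]
        [$total, $tuples] = $word_info;
        return [$total, $tuples];
    }
    // Otherwise the array is already the list of tuples.
    return [null, $word_info];
}

// Made-up sample data in the documented tuple order:
// (generation, posting_list_offset, length, exact id)
$tuples = [
    [0, 128, 40, "id_a"],
    [1, 512, 12, "id_a"],
];

[$total, $rows] = sketchSplitWordInfo([52, $tuples], true);
foreach ($rows as [$generation, $offset, $length, $id]) {
    echo "generation $generation: offset $offset, length $length\n";
}
```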


        
