Yioop_V9.5_Source_Code

FeedDocumentBundle extends IndexDocumentBundle in package Application

Subclass of IndexDocumentBundle that uses Bloom filters to make it easy to check whether a news feed item has already been added to the bundle before adding it.

ARCHIVE_INFO_FILE

Name of the file, within the IndexDocumentBundle's folder, used to store parameter/configuration information about the bundle.


    public
        mixed
    ARCHIVE_INFO_FILE
    = "archive_info.txt"

DEFAULT_PARAMETERS

Default values for the configuration parameters of an IndexDocumentBundle


    public
        mixed
    DEFAULT_PARAMETERS
    = ["DESCRIPTION" => "", "VERSION" => self::DEFAULT_VERSION]

DEFAULT_VERSION

The version of this IndexDocumentBundle. The lowest format number is 3.0, as prior inverted index/document stores used IndexArchiveBundles.


    public
        mixed
    DEFAULT_VERSION
    = "3.2"

DICTIONARY_FOLDER

Subfolder of the IndexDocumentBundle used to store the B+-tree with term => posting list information (i.e., the inverted index).


    public
        mixed
    DICTIONARY_FOLDER
    = "dictionary"

DOC_MAP_FILENAME

Partition i in an IndexDocumentBundle has a subfolder i within self::POSITIONS_DOC_MAP_FOLDER. Within this subfolder, self::DOC_MAP_FILENAME is the name of the file used to store the document map for the partition. The document map consists of a sequence of records associated with each doc_id of a document stored in the partition. The first record is ["POS" => $num_words, "SCORE" => floatval($global_score_for_document)]. The second record is ["POS" => $length_of_title_of_document, "SCORE" => floatval($num_description_scores)], where a description score is a score of the importance of a section of the document. Subsequent records list ["POS" => the length of the jth section of the document, "SCORE" => its score].


    public
        mixed
    DOC_MAP_FILENAME
    = "doc_map"
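A minimal sketch of the logical record sequence described for DOC_MAP_FILENAME, for a single document, assuming each scored section is given as a [length, score] pair. The function name buildDocMapRecords() is hypothetical, and the sketch does not reproduce the packed binary form that Yioop actually writes to the doc_map file.

```php
<?php
// Logical layout of one document's doc map records, following the
// DOC_MAP_FILENAME description. buildDocMapRecords() is hypothetical, and
// this is not the packed on-disk format written by Yioop.
function buildDocMapRecords(int $num_words, float $global_score,
    int $title_length, array $sections): array
{
    // First record: number of words and the document's global score.
    $records = [["POS" => $num_words, "SCORE" => floatval($global_score)]];
    // Second record: title length and how many description scores follow.
    $records[] = ["POS" => $title_length,
        "SCORE" => floatval(count($sections))];
    // One [length, score] record per scored section of the document.
    foreach ($sections as [$section_length, $section_score]) {
        $records[] = ["POS" => $section_length,
            "SCORE" => floatval($section_score)];
    }
    return $records;
}

$records = buildDocMapRecords(120, 0.75, 8, [[40, 0.5], [72, 0.25]]);
echo count($records), "\n"; // 4
```

The second record's SCORE doubles as a count, which is what lets a reader know how many section records follow before the next document's records begin.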

DOCID_LEN

Length of DocIds used by this IndexDocumentBundle


    public
        mixed
    DOCID_LEN
    = 24

DOCID_PART_LEN

DocIds are made of three parts: hash of url, hash of document, hash of url hostname. Each of these hashes is DOCID_PART_LEN long


    public
        mixed
    DOCID_PART_LEN
    = 8
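As an illustration of how DOCID_LEN = 3 * DOCID_PART_LEN fits together, the sketch below concatenates three 8-byte hashes (url, document, hostname) into a 24-byte doc id and splits it back apart. It assumes each part is the first 8 bytes of a raw md5 digest; Yioop's real computeDocId() may use a different hash and may store extra type information inside the id. The helpers partHash(), sketchDocId() and docIdParts() are hypothetical.

```php
<?php
// ASSUMPTION: each doc id part is the first 8 bytes of a raw md5 digest.
// Yioop's actual hashing, and any type marker inside a doc id, may differ.
const DOCID_PART_LEN = 8;
const DOCID_LEN = 3 * DOCID_PART_LEN;

function partHash(string $value): string
{
    return substr(md5($value, true), 0, DOCID_PART_LEN);
}

// Hypothetical: hash of url . hash of document . hash of url hostname
function sketchDocId(string $url, string $document): string
{
    $host = (string)parse_url($url, PHP_URL_HOST);
    return partHash($url) . partHash($document) . partHash($host);
}

// Splits a doc id into its three fixed-length parts.
function docIdParts(string $doc_id): array
{
    return [
        "hash_url" => substr($doc_id, 0, DOCID_PART_LEN),
        "hash_doc" => substr($doc_id, DOCID_PART_LEN, DOCID_PART_LEN),
        "hash_host" => substr($doc_id, 2 * DOCID_PART_LEN, DOCID_PART_LEN),
    ];
}

$a = sketchDocId("https://www.example.com/a", "<html>A</html>");
$b = sketchDocId("https://www.example.com/b", "<html>B</html>");
var_dump(strlen($a) === DOCID_LEN); // bool(true)
var_dump(docIdParts($a)["hash_host"] === docIdParts($b)["hash_host"]); // bool(true)
```

Because every part has a fixed length, the host of a document can be compared by slicing the id, without looking up the url.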

DOCUMENTS_FOLDER

Folder used to store the partition data of this IndexDocumentBundle. This consists of .txt.gz files for each partition, which store summaries of documents and the actual documents (web pages), and .ix files, which store each doc_id and the offsets of its summary and actual document within the .txt.gz file.


    public
        mixed
    DOCUMENTS_FOLDER
    = "documents"

LAST_ENTRIES_FILENAME

Name of the last entries file, used to help compute difference lists for the doc_map_index and the position list offsets used in the partition's postings. This file is also used to track the total number of occurrences of each term in a partition.


    public
        mixed
    LAST_ENTRIES_FILENAME
    = "last_entries"
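A difference list stores each value as its gap from the previous value, which keeps the stored numbers small. The sketch below shows why a "last entry" must be remembered: passing the previous value by reference lets a later batch continue the same list. deltaEncode() is a hypothetical helper, not Yioop's implementation.

```php
<?php
// Difference (delta) lists: each value is stored as the gap from the previous
// value. $last plays the role of a last_entries value, so a later batch can
// continue the same list. deltaEncode() is a hypothetical helper.
function deltaEncode(array $values, int &$last): array
{
    $deltas = [];
    foreach ($values as $value) {
        $deltas[] = $value - $last; // store the gap, not the value
        $last = $value;             // remember it for the next gap
    }
    return $deltas;
}

$last_doc_map_index = 0;
$first = deltaEncode([3, 7, 8], $last_doc_map_index);  // [3, 4, 1]
$second = deltaEncode([15, 20], $last_doc_map_index);  // [7, 5]
```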

NEXT_PARTITION_FILE

Name of the file that stores the index of the next partition whose documents can be added to this IndexDocumentBundle's dictionary. It should always hold that next_partition <= save_partition.


    public
        mixed
    NEXT_PARTITION_FILE
    = "next_partition.txt"

OLD_ITEM_TIME

How long, in seconds, before a feed item expires (four weeks).


    public
        mixed
    OLD_ITEM_TIME
    = 4 * \seekquarry\yioop\configs\ONE_WEEK
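The arithmetic behind OLD_ITEM_TIME, as a small sketch. It assumes ONE_WEEK is 7 * 24 * 60 * 60 seconds and that an item counts as old once strictly more than OLD_ITEM_TIME seconds have passed since its publication date; isOldItem() is hypothetical, and Yioop's exact comparison may differ.

```php
<?php
// ASSUMPTION: ONE_WEEK is a week in seconds; isOldItem() is hypothetical.
const ONE_WEEK = 7 * 24 * 60 * 60;   // 604800 seconds
const OLD_ITEM_TIME = 4 * ONE_WEEK;  // 2419200 seconds

// True when the item was published more than OLD_ITEM_TIME seconds ago.
function isOldItem(int $pubdate, int $now): bool
{
    return $now - $pubdate > OLD_ITEM_TIME;
}

$now = time();
var_dump(isOldItem($now - 5 * ONE_WEEK, $now)); // bool(true)
```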

PARTITION_FILENAMES

Names for the files which appear within a partition sub-folder


    public
        mixed
    PARTITION_FILENAMES
    = [self::DOC_MAP_FILENAME, self::LAST_ENTRIES_FILENAME, self::POSITIONS_FILENAME, self::POSTINGS_FILENAME]

POSITIONS_DOC_MAP_FOLDER

Name of the folder used to hold position lists and document maps. Within this folder there is a subfolder for each partition, which contains a doc_map file, a postings file for the documents within the partition, a position lists file for those postings, and a last_entries file used in computing the difference lists for doc_map_index and position list offsets, as well as the number of occurrences of terms.


    public
        mixed
    POSITIONS_DOC_MAP_FOLDER
    = "positions_doc_maps"
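A sketch of the on-disk layout implied by POSITIONS_DOC_MAP_FOLDER and PARTITION_FILENAMES: {bundle folder}/positions_doc_maps/{partition}/{file}. The function partitionFilePaths() and the example folder name are hypothetical, and the real getPartitionBaseFolder() may assemble the path differently.

```php
<?php
// Values copied from the constants documented here; partitionFilePaths()
// is a hypothetical helper for illustration only.
const POSITIONS_DOC_MAP_FOLDER = "positions_doc_maps";
const PARTITION_FILENAMES = ["doc_map", "last_entries", "positions", "postings"];

function partitionFilePaths(string $dir_name, int $partition): array
{
    // One subfolder per partition inside the positions/doc map folder.
    $base = $dir_name . "/" . POSITIONS_DOC_MAP_FOLDER . "/" . $partition;
    $paths = [];
    foreach (PARTITION_FILENAMES as $filename) {
        $paths[$filename] = "$base/$filename";
    }
    return $paths;
}

echo partitionFilePaths("feed_bundle", 3)["postings"], "\n";
// feed_bundle/positions_doc_maps/3/postings
```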

POSITIONS_FILENAME

Name of the file within a partition's positions_doc_maps subfolder that contains the position lists for all terms in the partition.


    public
        mixed
    POSITIONS_FILENAME
    = "positions"

POSTINGS_BUFFER_SIZE

How many bytes of postings to buffer before writing them out when running addPartitionPostingsDictionary().


    public
        mixed
    POSTINGS_BUFFER_SIZE
    = 1000000

POSTINGS_FILENAME

Name of the file within a partition's positions_doc_maps folder with posting information for all terms in that partition. This consists of key value pairs term_id => posting records for all documents with that term.


    public
        mixed
    POSTINGS_FILENAME
    = "postings"

TEMP_POSTINGS_FILENAME

Temporary name for postings from a POSTINGS_FILENAME file while they are being compressed.


    public
        mixed
    TEMP_POSTINGS_FILENAME
    = "temp_postings"

TERMID_LEN

Length of TermIds used by this IndexDocumentBundle


    public
        mixed
    TERMID_LEN
    = 16

$archive_info

Holds property value pairs concerning the configuration of the current IndexDocumentBundle


    public
        array<string|int, mixed>
    $archive_info

$db

Reference to a DatasourceManager to communicate with the database to get a list of search sources (news feeds) associated with this feed bundle


    public
        DatasourceManager
    $db

$description

A short text name for this IndexDocumentBundle


    public
        string
    $description

$dictionary

IndexDictionary for all shards in the IndexArchiveBundle. This contains entries of the form (word, number of shards with the word, posting list info for the 0th shard containing the word, posting list info for the 1st shard containing the word, ...).


    public
        object
    $dictionary

$dir_name

Folder name to use for this IndexDocumentBundle


    public
        string
    $dir_name

$doc_map

Associative array of docid=>doc_record pairs


    public
        array<string|int, mixed>
    $doc_map

$doc_map_counter

Keeps track of the number of documents present in the current partition


    public
        int
    $doc_map_counter

$doc_map_tools

Used to read and write data to the $doc_map array


    public
        PackedTableTools
    $doc_map_tools

$documents

PartitionDocumentBundle for web page documents


    public
        object
    $documents

$extract_phrase_time

Holds the total time needed to extract phrases (sequences of adjacent words) from site descriptions for a partition


    public
        int
    $extract_phrase_time

$feeds

Array of information about the search sources (news feeds) that were used to collect news items stored in this bundle


    public
        array<string|int, mixed>
    $feeds

$filter_a

Used to store unique identifiers of feed items that have been stored in this FeedDocumentBundle. filter_a is used to check whether items are already in the archive. Once it holds URL_FILTER_SIZE/2 items, new items are added to filter_b as well as to filter_a. When filter_a reaches URL_FILTER_SIZE items, it is deleted, filter_b is renamed to filter_a, and the process repeats.


    public
        BloomFilterFile
    $filter_a

$filter_b

Auxiliary BloomFilterFile used in checking if feed items are in this archive or not.


    public
        BloomFilterFile
    $filter_b

@see $filter_a

$last_entries

Used to keep track of the previous values of posting quantities so that difference lists can be computed, for example, the previous $doc_map_index and the previous position list offset. It also tracks the total number of occurrences of a term within a partition.


    public
        array<string|int, mixed>
    $last_entries

$last_entries_tools

Used to read and write data to the $last_entries array


    public
        PackedTableTools
    $last_entries_tools

$next_partition_to_add

Structure containing information about the current partition.


    public
        array<string|int, mixed>
    $next_partition_to_add

$positions

A string consisting of the concatenated term position information for each document in turn and, within each document, for each term in that document.


    public
        string
    $positions

$postings

Associative array $term_id => posting list records for that term in the partition.


    public
        array<string|int, mixed>
    $postings

$postings_tools

Used to read and write data to the $postings array


    public
        PackedTableTools
    $postings_tools

$unpack_len_map

Array of the string lengths consumed by each of the $unpack_map codes.


    public
        array<string|int, mixed>
    $unpack_len_map

$unpack_map

Map from int -> three character unpack string used to unpack posting info


    public
        array<string|int, mixed>
    $unpack_map

__construct()

Makes or initializes a FeedDocumentBundle with the provided parameters.


    public
                    __construct(string $dir_name, mixed $db[, bool $read_only_archive = true ][, string $description = null ][, int $num_docs_per_partition = CNUM_DOCS_PER_PARTITION ]) : mixed

Parameters

$dir_name : string: folder name to store this bundle
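$db : mixed: DatasourceManager used to get the list of search sources (news feeds) associated with this feed bundle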
$read_only_archive : bool = true: whether to open archive only for reading or reading and writing
$description : string = null: a text name/serialized info about this IndexDocumentBundle
$num_docs_per_partition : int = CNUM_DOCS_PER_PARTITION: the number of pages to be stored in a single shard

Return values

mixed —

addFilters()

Adds the key (often a GUID) of a feed item to the Bloom filter pair associated with this archive. The key is always added to filter a; if filter a is more than half full, it is also added to filter b. If filter a is full, it is deleted, filter b is renamed filter a, and the process continues, with a new filter b created when this becomes half full.


    public
                    addFilters(string $key) : mixed

Parameters

$key : string: unique identifier of a feed item

Return values

mixed —
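The rotating pair of Bloom filters described for addFilters() and $filter_a can be sketched in memory as follows. This is an illustration under assumptions, not Yioop's BloomFilterFile: bit positions are derived from 32-bit slices of an md5 digest, $capacity stands in for URL_FILTER_SIZE, and the class names SketchBloomFilter and SketchFeedFilters are hypothetical.

```php
<?php
// In-memory stand-in for BloomFilterFile: a bit string plus k bit positions
// taken from 32-bit slices of an md5 digest (an assumption, not Yioop's scheme).
final class SketchBloomFilter
{
    public int $count = 0;
    private string $bits;
    private int $num_bits;
    private int $num_hashes;

    public function __construct(int $num_bits, int $num_hashes = 3)
    {
        $this->num_bits = $num_bits;
        $this->num_hashes = min($num_hashes, 4); // md5 gives four 32-bit slices
        $this->bits = str_repeat("\0", intdiv($num_bits + 7, 8));
    }

    private function positions(string $key): array
    {
        $digest = md5($key, true);
        $positions = [];
        for ($i = 0; $i < $this->num_hashes; $i++) {
            $slice = unpack("N", substr($digest, 4 * $i, 4))[1];
            $positions[] = $slice % $this->num_bits;
        }
        return $positions;
    }

    public function add(string $key): void
    {
        foreach ($this->positions($key) as $pos) {
            $byte = intdiv($pos, 8);
            $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($pos % 8)));
        }
        $this->count++;
    }

    public function contains(string $key): bool
    {
        foreach ($this->positions($key) as $pos) {
            if ((ord($this->bits[intdiv($pos, 8)]) & (1 << ($pos % 8))) === 0) {
                return false; // a clear bit means the key was never added
            }
        }
        return true; // possibly present (Bloom filters allow false positives)
    }
}

// The filter_a / filter_b rotation; $capacity stands in for URL_FILTER_SIZE.
final class SketchFeedFilters
{
    private SketchBloomFilter $filter_a;
    private SketchBloomFilter $filter_b;
    private int $capacity;
    private int $num_bits;

    public function __construct(int $capacity, int $num_bits = 8192)
    {
        $this->capacity = $capacity;
        $this->num_bits = $num_bits;
        $this->filter_a = new SketchBloomFilter($num_bits);
        $this->filter_b = new SketchBloomFilter($num_bits);
    }

    public function addFilters(string $key): void
    {
        $this->filter_a->add($key); // filter a always gets the key
        if ($this->filter_a->count > intdiv($this->capacity, 2)) {
            $this->filter_b->add($key); // past half full: also fill filter b
        }
        if ($this->filter_a->count >= $this->capacity) {
            // Full: drop filter a, promote filter b, start an empty filter b.
            $this->filter_a = $this->filter_b;
            $this->filter_b = new SketchBloomFilter($this->num_bits);
        }
    }

    public function contains(string $key): bool
    {
        return $this->filter_a->contains($key); // only the active filter
    }
}
```

The design choice this illustrates: a plain Bloom filter cannot forget items, so rotating two filters gives a bounded-size "recently seen" set in which the newest half of the keys always survives a rotation.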

addPages()

Adds the array of $pages to the documents PartitionDocumentBundle.


    public
                    addPages(array<string|int, mixed> $pages, int $visited_urls_count) : bool

Parameters

$pages : array<string|int, mixed>: data to store
$visited_urls_count : int: number to add to the count of visited urls (visited urls is a smaller number than the total count of objects stored in the index).

Return values

bool —

success or failure of adding the pages

addPagesAndSeenKeys()

Adds pages of feed items to the document bundle and adds their unique hashes (GUIDs) to the Bloom filters so they are not reindexed.


    public
                    addPagesAndSeenKeys(array<string|int, mixed> $pages, int $visited_urls_count) : bool

Parameters

$pages : array<string|int, mixed>: array of feed items
$visited_urls_count : int: number of feed items

Return values

bool —

whether or not succeeded in adding pages

addPartitionPostingsDictionary()

Adds the previously constructed inverted index of partition $partition to the inverted index of the whole bundle.


    public
                    addPartitionPostingsDictionary([int $partition = -1 ][, string $taking_too_long_touch = null ]) : mixed

Parameters

$partition : int = -1: which partition's inverted index to add; by default, the current save partition
$taking_too_long_touch : string = null: a filename of a file to touch so its last modified time becomes the current time. In a typical Yioop crawl this is done for the CrawlConstants::crawl_status_file file to prevent Yioop's web interface from stopping the crawl because it has seen no recent progress activity on a crawl.

Return values

mixed —
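The $taking_too_long_touch idea is a heartbeat: during long work, the status file's modification time is refreshed so a watchdog that checks it sees recent progress. A minimal sketch, assuming work is done item by item; processWithHeartbeat() and the $interval parameter are hypothetical, while touch() is PHP's standard function for setting a file's modification time.

```php
<?php
// Hypothetical heartbeat loop: touch the status file at most every $interval
// seconds so its mtime shows recent activity while the work continues.
function processWithHeartbeat(array $items, callable $work,
    ?string $taking_too_long_touch, int $interval = 30): int
{
    $last_touch = time();
    $done = 0;
    foreach ($items as $item) {
        $work($item);
        $done++;
        if ($taking_too_long_touch !== null &&
            time() - $last_touch >= $interval) {
            touch($taking_too_long_touch); // sets the file's mtime to now
            $last_touch = time();
        }
    }
    return $done;
}
```

Touching on a time interval rather than on every item keeps the file system calls cheap when items are processed quickly.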

addScoresDocMap()

Used to add a doc_id => doc_record to the current partition's document map ($this->doc_map). A doc record records the number of words in the document, an overall length of the document, the length of its title, scores for each of the sentences included in the summary for the document, and classifier scores for each classifier that was used by the crawl.


    public
                    addScoresDocMap(string $doc_id, int $num_words, float $score, int $host_keywords_end_pos, int $title_end_pos, int $path_keywords_end_pos, array<string|int, mixed> $description_scores, array<string|int, mixed> $user_ranks) : mixed

Parameters

$doc_id : string: new document id to add a record for
$num_words : int: number of terms in the document associated with the doc-id
$score : float: overall score for the importance of this document
$host_keywords_end_pos : int: end of the portion of the document summary containing terms coming from the hostname
$title_end_pos : int: end of the portion of the document summary containing terms in the title
$path_keywords_end_pos : int: length of the portion of the document summary containing terms in the url path
$description_scores : array<string|int, mixed>: pairs of the form (length of summary portion, score for that portion)
$user_ranks : array<string|int, mixed>: for each user defined classifier for this crawl the float score of the classifier on this document

Return values

mixed —

addTermCountsTrendingTable()

Updates TRENDING_TERM, hourly, daily, and weekly top term occurrences.


    public
                    addTermCountsTrendingTable(array<string|int, mixed> $term_counts) : mixed

Removes entries older than a week

Parameters

$term_counts : array<string|int, mixed>: for the most recent update of the feed index, it should be an array [$lang => [$term => $occurrences]] for the top NUM_TRENDING terms per language

Return values

mixed —
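The $term_counts argument is described as [$lang => [$term => $occurrences]] restricted to the top NUM_TRENDING terms per language. A sketch of producing that shape from full counts follows; topTrendingTerms() is hypothetical, the NUM_TRENDING value of 3 is an assumption, and pruning of entries older than a week is not shown.

```php
<?php
// ASSUMPTION: NUM_TRENDING's value; topTrendingTerms() is hypothetical.
const NUM_TRENDING = 3;

function topTrendingTerms(array $term_counts, int $limit = NUM_TRENDING): array
{
    $top = [];
    foreach ($term_counts as $lang => $counts) {
        arsort($counts); // most frequent terms first, keys kept
        $top[$lang] = array_slice($counts, 0, $limit, true); // preserve terms
    }
    return $top;
}

print_r(topTrendingTerms(["en-US" => ["a" => 5, "b" => 9, "c" => 1, "d" => 7]]));
```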

addTermPostingLists()

Adds posting records associated with a document to the posting lists for a partition.


    public
                    addTermPostingLists(int $position_offset, int $doc_length, array<string|int, mixed> $word_lists, array<string|int, mixed> $meta_ids, int $doc_map_index) : mixed

Parameters

$position_offset : int: number of header bytes that might be used before any position data in the file where positions will eventually be stored.
$doc_length : int: length, in terms, of the document whose posting data we are adding.
$word_lists : array<string|int, mixed>: term => positions within the current document of that term, for the document whose posting data we are adding
$meta_ids : array<string|int, mixed>: meta terms associated with the document we are adding. For example, a meta term might be "media:news"
$doc_map_index : int: which document within the partition is the one we are adding. I.e., 5 would mean there were 5 earlier documents whose postings we have already added.

Return values

mixed —
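To show what "adding posting records for a document" means, here is a sketch that appends one posting per term, recording the document's index in the partition, the term's frequency, and where its positions start in a shared positions list. The posting keys DOC_MAP_INDEX, FREQUENCY and POSITION_OFFSET, the flat positions array, and addDocumentPostings() are all assumptions; Yioop packs this data with PackedTableTools, and meta terms are not handled here.

```php
<?php
// ASSUMPTIONS: plain-array postings with hypothetical keys, and a flat
// positions array in place of Yioop's packed positions string.
function addDocumentPostings(array &$postings, array &$positions,
    array $word_lists, int $doc_map_index): void
{
    foreach ($word_lists as $term => $term_positions) {
        $postings[$term][] = [
            "DOC_MAP_INDEX" => $doc_map_index,
            "FREQUENCY" => count($term_positions),  // occurrences in this doc
            "POSITION_OFFSET" => count($positions), // where its list starts
        ];
        foreach ($term_positions as $position) {
            $positions[] = $position; // append this term's position list
        }
    }
}

$postings = [];
$positions = [];
addDocumentPostings($postings, $positions, ["cat" => [0, 4], "sat" => [1]], 0);
addDocumentPostings($postings, $positions, ["cat" => [2]], 1);
echo count($postings["cat"]), "\n"; // 2
```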

buildInvertedIndexPartition()

Copies all feed items newer than $age to a new shard, then deletes the old index shard and database entries older than $age. Finally, sets the copied shard to be active. If this method is going to take more than max_execution_time/2, it returns false so that an additional job can be scheduled; otherwise, it returns true.


    public
                    buildInvertedIndexPartition([int $partition = -1 ][, string $taking_too_long_touch = null ][, bool $just_stats = false ]) : mixed

Parameters

$partition : int = -1: bundle partition to build inverted index for
$taking_too_long_touch : string = null: name of a file to touch if building the inverted index takes too long. Whether SCHEDULES_DIR . "/{$this->channel}-" . CrawlConstants::crawl_status_file has been recently modified is used during crawling to determine whether the crawl has run out of new data and can be stopped.
$just_stats : bool = false: whether to just compute stats on the inverted index or to actually save the results

Return values

mixed —

whether job executed to completion (true or false) if !$just_stats, otherwise, an array with NUM_DOCS, NUM_LINKS, and TERM_STATISTICS (the latter having term frequency info)

calculateMetas()

Used to calculate the meta words for RSS feed items


    public
                    calculateMetas(string $lang, int $pubdate, string $source_name, string $guid[, string $media_category = "news" ]) : array<string|int, mixed>

Parameters

$lang : string: the locale_tag of the feed item
$pubdate : int: UNIX timestamp publication date of item
$source_name : string: the name of the feed
$guid : string: the guid of the item
$media_category : string = "news": determines what media: metas to inject. Default is news.

Return values

array<string|int, mixed> —

$meta_ids meta words found

computeDocId()

Given a $site array of information about a web page/document, uses the CrawlConstants::URL and CrawlConstants::HASH fields to compute a unique doc id for the array.


    public static
                    computeDocId(array<string|int, mixed> $site) : string

Parameters

$site : array<string|int, mixed>: site to compute doc_id for

Return values

string —

doc_id

contains()

Returns whether the active filter for this feed contains the feed item with the supplied key.


    public
                    contains(string $key) : bool

Parameters

$key : string: the feed item id to check if in archive

Return values

bool —

true if it is in the archive, false otherwise

deDeltaPostingsSumFrequencies()

Within postings, DOC_MAP_INDEX values and POSITION_OFFSETS into position lists are stored as delta lists (differences from previous values). This method undoes the delta encoding to restore the actual DOC_MAP_INDEX and POSITION_OFFSETS values. It also computes the sum of the frequencies of the items within the list of postings. This method is currently only used for the active partition in an index (the one whose terms haven't yet been added to the B+-tree).


    public
                    deDeltaPostingsSumFrequencies(array<string|int, mixed> &$postings) : int

Parameters

$postings : array<string|int, mixed>: a reference to an array of posting lists for a term (this will be changed by this method)

Return values

int —

sum of the frequencies of term occurrences as given by the above postings
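A sketch of the de-delta step described above: running sums restore the absolute values, and the frequencies are totalled in the same pass. It assumes each posting is a plain array with DOC_MAP_INDEX, POSITION_OFFSET and FREQUENCY keys whose first two values are stored as gaps; those key names and sketchDeDeltaSumFrequencies() are hypothetical.

```php
<?php
// ASSUMPTION: plain-array postings with hypothetical key names; the first two
// fields hold gaps from the previous posting.
function sketchDeDeltaSumFrequencies(array &$postings): int
{
    $doc_map_index = 0;
    $position_offset = 0;
    $total = 0;
    foreach ($postings as &$posting) {
        $doc_map_index += $posting["DOC_MAP_INDEX"];     // running sum
        $position_offset += $posting["POSITION_OFFSET"]; // running sum
        $posting["DOC_MAP_INDEX"] = $doc_map_index;
        $posting["POSITION_OFFSET"] = $position_offset;
        $total += $posting["FREQUENCY"];
    }
    unset($posting); // break the reference left behind by foreach
    return $total;
}

$postings = [
    ["DOC_MAP_INDEX" => 2, "POSITION_OFFSET" => 0, "FREQUENCY" => 3],
    ["DOC_MAP_INDEX" => 3, "POSITION_OFFSET" => 5, "FREQUENCY" => 1],
    ["DOC_MAP_INDEX" => 1, "POSITION_OFFSET" => 2, "FREQUENCY" => 2],
];
echo sketchDeDeltaSumFrequencies($postings), "\n"; // 6
```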

findNumSlashes()

Finds the number of '/' characters after the hostname in the url represented by doc_id $key.


    public static
                    findNumSlashes(string $key) : mixed

Parameters

$key : string: to find '/' count

Return values

mixed —

forceSave()

Forces the current shard to be saved


    public
                    forceSave() : mixed

Return values

mixed —

getArchiveInfo()

Gets the description, count of documents, and number of partitions of the documents stored in the supplied directory. If the file arc_description.txt exists, this is viewed as a dummy index archive for the sole purpose of allowing conversions of downloaded data such as arc files into Yioop! format.


    public static
                    getArchiveInfo(string $dir_name) : array<string|int, mixed>

Parameters

$dir_name : string: path to a directory containing a documents IndexDocumentBundle

Return values

array<string|int, mixed> —

summary of the given archive

getCachePage()

Given the $doc_id of a document and a $partition to look for it in, returns the cached page of the document if present and [] otherwise.


    public
                    getCachePage(string $doc_id, int $partition) : array<string|int, mixed>

Parameters

$doc_id : string: of document to look up
$partition : int: to look for document in

Return values

array<string|int, mixed> —

desired page cache or [] if look up failed

getParamModifiedTime()

Returns the last time the archive info of the bundle was modified.


    public static
                    getParamModifiedTime(string $dir_name) : mixed

Parameters

$dir_name : string: folder with archive bundle

Return values

mixed —

getPartitionBaseFolder()

Gets the file path corresponding to the partition with index $partition


    public
                    getPartitionBaseFolder(int $partition) : string

Parameters

$partition : int: desired partition index

Return values

string —

file path to where this partition's index data is stored (not the original documents, which are stored in the PartitionDocumentBundle)

getPostingsString()

Gets the postings stored in a partition's postings file from $offset to $offset + $len and removes the 255 encoding.


    public
                    getPostingsString(int $partition, int $offset, int $len) : string

Parameters

$partition : int: partition to retrieve posting from
$offset : int: byte offset into the partition's postings file at which to look for them
$len : int: length of the posting list to retrieve.

Return values

string —

encoded posting list data: a vbyte-encoded number of postings, followed by the posting data in PackedTableTools format
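Variable-byte (vbyte) coding stores an integer in as few bytes as possible, 7 data bits per byte. The sketch below uses the common little-endian variant in which the high bit is set on every byte except the last, and shows reading the count prefix of a postings string. This variant is an assumption: Yioop's exact vbyte byte order and flag convention may differ, and vbyteEncode()/vbyteDecode() are hypothetical helpers for non-negative integers.

```php
<?php
// ASSUMPTION: little-endian 7-bit groups, high bit set when more bytes follow.
function vbyteEncode(int $n): string
{
    $out = "";
    while ($n >= 0x80) {
        $out .= chr(($n & 0x7F) | 0x80); // low 7 bits, more bytes follow
        $n >>= 7;
    }
    return $out . chr($n);               // final byte has high bit clear
}

// Returns [decoded value, number of bytes consumed] starting at $offset.
function vbyteDecode(string $bytes, int $offset = 0): array
{
    $value = 0;
    $shift = 0;
    $pos = $offset;
    do {
        $byte = ord($bytes[$pos++]);
        $value |= ($byte & 0x7F) << $shift;
        $shift += 7;
    } while ($byte & 0x80);
    return [$value, $pos - $offset];
}

// A postings string would then be: vbyte count . packed posting records.
[$num_postings, $header_len] = vbyteDecode(vbyteEncode(300) . "<packed postings>");
echo "$num_postings $header_len\n"; // 300 2
```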

getSummary()

Given the $doc_id of a document and a $partition to look for it in, returns the document summary info if present and [] otherwise.


    public
                    getSummary(string $doc_id, int $partition) : array<string|int, mixed>

Parameters

$doc_id : string: of document to look up
$partition : int: to look for document in

Return values

array<string|int, mixed> —

desired summary or [] if look up failed

getWordInfo()

Gets an array of posting list positions for each shard in the bundle $index_name for the word id $term_id


    public
                    getWordInfo(string $term_id[, int $threshold = -1 ], mixed $offset[, mixed $num_partitions = -1 ][, bool $with_remaining_total = false ]) : array<string|int, mixed>

Parameters

$term_id : string: id of phrase or word to look up in bundle dictionary
$threshold : int = -1: after the number of results exceeds this amount stop looking for more dictionary entries.
$offset : mixed
$num_partitions : mixed = -1
$with_remaining_total : bool = false: whether to also return the total number of postings found

Return values

array<string|int, mixed> —

either [total, sequence of four tuples] or sequence of four tuples: (index_shard generation, posting_list_offset, length, exact id that match $term_id)

invertOneSite()

Used to create inverted index for one site and add its information to the current partition.


    public
                    invertOneSite(array<string|int, mixed> $site, array<string|int, mixed> $url_info, int &$link_cnt) : string

Parameters

$site : array<string|int, mixed>: site to invert
$url_info : array<string|int, mixed>: collection of urls and hashes of documents which map to the same document
$link_cnt : int: current count of number of links discovered so far

Return values

string —

$site_url canonical url for site

isACldDocId()

Checks if a doc_id $key is that of a Company level domain (cld) or www.cld.


    public static
                    isACldDocId(string $key) : mixed

I.e., a url https://yahoo.com/ or https://www.yahoo.com/ as opposed to https://foo.yahoo.com/

Parameters

$key : string: to check if doc or not

Return values

mixed —

isAHostDocId()

Checks if a doc_id $key is that of a host url.


    public static
                    isAHostDocId(string $key) : mixed

I.e., a url https://www.yahoo.com/ as opposed to https://www.yahoo.com/foo

Parameters

$key : string: to check if doc or not

Return values

mixed —

isAWikipediaPage()

Checks if a doc_id $key is that of a Wikipedia page.


    public static
                    isAWikipediaPage(string $key) : mixed

Parameters

$key : string: to check if Wikipedia page or not

Return values

mixed —

isType()

Checks if a doc_id corresponds to a particular large scale type among external_link, internal_link, link (union of previous two), binary, feed, image, text, video, document (union of previous five)


    public static
                    isType(string $key, mixed $types) : bool

Parameters

$key : string: to check if doc or not
$types : mixed

Return values

bool —

true if a document

prepareIndexMap()

As pre-step to calculating the inverted index information for a partition this method groups documents and links to documents into single objects.


    public
                    prepareIndexMap(int $partition[, array<string|int, mixed> $test_index = [] ]) : array<string|int, mixed>

It also does simple deduplication of documents that have the same hash. It then returns an array of the grouped document data. Grouping is done by giving a score to each document based on (number of docs in the index - order in which the doc was added). For two entries with the same hash_url, a document will be chosen over a link as the representative; otherwise, the one with the higher score will be chosen as the representative. The representative document is given the sum of the scores of its constituents. A second phase, where documents are grouped by hash of the text body, is also done. Finally, the returned documents are sorted by their scores, so the order of documents from this process is roughly in order of importance.

Parameters

$partition : int: index of partition to do deduplication for in the case that test index is empty
$test_index : array<string|int, mixed> = []: is non-empty only when testing what this method does, in which case it should consist of an array of $doc_id => string representing a possible record for that doc. As deduplication is done entirely based on components of the doc_id (hash_url, doc_type, hash_doc, hash_host), the string doesn't matter too much.

Return values

array<string|int, mixed> —

groups doc_id => records associated with that doc_id
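The first grouping phase of prepareIndexMap() and its final sort can be sketched as below. Assumptions: entries arrive in the order they were added, each is an array with "hash_url" and "is_link" keys (rather than fields decoded from a doc_id), and the second grouping phase by hash of the text body is left out. groupByHashUrl() is hypothetical.

```php
<?php
// Hypothetical sketch: score = (number of entries - order added); a document
// beats a link as representative; each group keeps the sum of its scores.
function groupByHashUrl(array $entries): array
{
    $num_docs = count($entries);
    $groups = [];
    foreach (array_values($entries) as $order => $entry) {
        $score = $num_docs - $order; // earlier additions score higher
        $key = $entry["hash_url"];
        if (!isset($groups[$key])) {
            $groups[$key] = ["rep" => $entry, "rep_score" => $score,
                "score" => $score, "members" => [$entry]];
            continue;
        }
        $groups[$key]["members"][] = $entry;
        $groups[$key]["score"] += $score; // the group keeps the summed score
        $rep_is_link = $groups[$key]["rep"]["is_link"];
        // A document beats a link; between equals the higher score wins.
        if (($rep_is_link && !$entry["is_link"]) ||
            ($rep_is_link === $entry["is_link"] &&
            $score > $groups[$key]["rep_score"])) {
            $groups[$key]["rep"] = $entry;
            $groups[$key]["rep_score"] = $score;
        }
    }
    // Highest total score first, so roughly most important first.
    uasort($groups, fn($a, $b) => $b["score"] <=> $a["score"]);
    return $groups;
}

$entries = [
    ["id" => "a-link", "hash_url" => "A", "is_link" => true],
    ["id" => "b-doc", "hash_url" => "B", "is_link" => false],
    ["id" => "a-doc", "hash_url" => "A", "is_link" => false],
    ["id" => "c-doc", "hash_url" => "C", "is_link" => false],
];
$groups = groupByHashUrl($entries);
echo implode(",", array_keys($groups)), "\n"; // A,B,C
echo $groups["A"]["rep"]["id"], "\n";         // a-doc
```

Summing scores means a url that many links point to rises in the ordering even if it was added late.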

setArchiveInfo()

Sets the archive info struct for the web archive bundle associated with this bundle. This struct has fields like: DESCRIPTION (serialized store of global parameters of the crawl like seed sites, timestamp, etc).


    public static
                    setArchiveInfo(string $dir_name, array<string|int, mixed> $update_info) : mixed

Parameters

$dir_name : string: folder with archive bundle
$update_info : array<string|int, mixed>: struct with above fields

Return values

mixed —

stopIndexing()

Used when a crawl stops to perform final dictionary operations to produce a working stand-alone index.


    public
                    stopIndexing() : mixed

Return values

mixed —

unpackPostings()

Given the postings as a string for a partition for a term, unpacks them into an array of postings, doing de-delta of doc_map_indices and de-delta of positions. Each posting represents the occurrences of a term in a document, so the frequency component is the number of occurrences of the term in the document. This method also computes the sum of these frequencies over all postings in the partition.


    public
                    unpackPostings(string $postings_string) : array<string|int, mixed>

Parameters

$postings_string : string: compressed string representation of a set of postings for a term

Return values

array<string|int, mixed> —

a pair [array of unpacked postings, sum of frequencies of all the postings]

updateDictionary()

For every partition between next partition and save partition, adds the posting list information to the dictionary BPlusTree. At the end of this process next partition and save partition should be the same


    public
                    updateDictionary([string $taking_too_long_touch = null ][, bool $till_equal = true ]) : mixed

Parameters

$taking_too_long_touch : string = null: a filename of a file to touch so its last modified time becomes the current time. In a typical Yioop crawl this is done for the CrawlConstants::crawl_status_file file to prevent Yioop's web interface from stopping the crawl because it has seen no recent progress activity on a crawl.
$till_equal : bool = true: if set to true, keeps adding each partition up to the save partition; if set to false, only adds one partition

Return values

mixed —
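The control flow of updateDictionary() can be sketched as a loop over the half-open range [next partition, save partition): the save partition itself is the active one still being filled, so it is not added. In this sketch, $add_partition stands in for addPartitionPostingsDictionary(), and sketchUpdateDictionary() is hypothetical.

```php
<?php
// Hypothetical sketch: advance $next_partition toward $save_partition,
// adding each finished partition's postings to the dictionary.
function sketchUpdateDictionary(int &$next_partition, int $save_partition,
    callable $add_partition, bool $till_equal = true): void
{
    while ($next_partition < $save_partition) {
        $add_partition($next_partition); // merge into the B+-tree dictionary
        $next_partition++;
        if (!$till_equal) {
            break; // only one partition per call
        }
    }
}

$next_partition = 2;
$added = [];
sketchUpdateDictionary($next_partition, 5, function (int $p) use (&$added) {
    $added[] = $p;
});
echo implode(",", $added), "\n"; // 2,3,4
```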

updateTrendingTermCounts()

Updates trending term counts based on the string from the current feed item.


    public
                    updateTrendingTermCounts(array<string|int, mixed> &$term_counts, string $source_phrase, array<string|int, mixed> $word_or_phrase_list, string $media_category, string $source_name, string $lang, int $pubdate[, string $source_stop_regex = "" ]) : mixed

Parameters

$term_counts : array<string|int, mixed>: lang => [term => occurrences]
$source_phrase : string: original non-stemmed phrase from feed item to adjust $term_counts with. Used to remember non-stemmed terms. We assume we have already extracted position lists from
$word_or_phrase_list : array<string|int, mixed>: associate array of stemmed_word_or_phrase => positions in feed item of where occurs
$media_category : string: of the feed source the item came from. Trending counts are grouped by media category
$source_name : string: of the feed source the item came from. The name of the feed source is excluded from the counts
$lang : string: locale_tag for this feed item
$pubdate : int: timestamp when string was published (used in weighting)
$source_stop_regex : string = "": a regex to remove terms which occur frequently for this particular source

Return values

mixed —
