DictionaryUpdater
implements CrawlConstants
Interfaces, Classes, Traits and Enums
- CrawlConstants
  Shared constants and enums used by components involved in the crawling process.
Table of Contents
- getArchiveKind() : string
  Given a folder name, determines the kind of bundle (if any) it holds.
- run() : mixed
  The main code for the dictionary updater. Updates the dictionary for the IndexDocumentBundle at $bundle_path, running on channel $channel, from its current next_partition (the next partition to process) up to the current save partition. Partitions are groups of documents that have been downloaded but whose words have not necessarily been added to the dictionary for the bundle.
Methods
getArchiveKind()
Given a folder name, determines the kind of bundle (if any) it holds.
public
static getArchiveKind(string $archive_path) : string
It does this based on the expected location of the description.txt file, or of arc_description.ini in the case of a non-Yioop archive.
Parameters
- $archive_path : string
  the path to the archive folder
Return values
string: the archive bundle type, either WebArchiveBundle or IndexArchiveBundle
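To illustrate the detection strategy described above, here is a minimal PHP sketch. The concrete on-disk layout it checks is an assumption, not Yioop's confirmed format: it supposes that a top-level description.txt marks a WebArchiveBundle, that a description.txt inside a summaries/ subfolder marks an IndexArchiveBundle, and that a non-Yioop archive's arc_description.ini carries a hypothetical arc_type key naming its kind. The function name getArchiveKindSketch is likewise hypothetical, and it returns an empty string when no description file is found.

```php
<?php
// Hypothetical sketch of bundle-kind detection. The file locations and the
// arc_type key below are assumptions, not Yioop's confirmed archive layout.
function getArchiveKindSketch(string $archive_path): string
{
    // Assumption: a top-level description.txt marks a WebArchiveBundle.
    if (file_exists("$archive_path/description.txt")) {
        return "WebArchiveBundle";
    }
    // Assumption: a description.txt under summaries/ marks an IndexArchiveBundle.
    if (file_exists("$archive_path/summaries/description.txt")) {
        return "IndexArchiveBundle";
    }
    // Assumption: a non-Yioop archive ships arc_description.ini whose
    // (hypothetical) arc_type entry names the kind of bundle.
    $ini_path = "$archive_path/arc_description.ini";
    if (file_exists($ini_path)) {
        $desc = parse_ini_file($ini_path);
        if (is_array($desc) && isset($desc["arc_type"])) {
            return (string)$desc["arc_type"];
        }
    }
    // No recognizable description file: the folder holds no known bundle.
    return "";
}
```

Checking the most specific Yioop locations first and falling back to the .ini file mirrors the order in which the description above mentions the two files.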
run()
The main code for the dictionary updater. Updates the dictionary for the IndexDocumentBundle at $bundle_path, running on channel $channel, from its current next_partition (the next partition to process) up to the current save partition. Partitions are groups of documents that have been downloaded but whose words have not necessarily been added to the dictionary for the bundle.
public
static run(int $channel, string $bundle_path) : mixed
Parameters
- $channel : int
  the channel the crawl is running on; used in naming lock files
- $bundle_path : string
  the path to the IndexDocumentBundle or FeedDocumentBundle we are adding dictionary info for
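The partition loop that run() describes can be sketched as follows. This is a minimal illustration under stated assumptions, not Yioop's implementation: the function runDictionaryUpdaterSketch, the lock file name dictionary_updater_<channel>.lock, the $add_partition callback standing in for the real dictionary update, and treating the save partition as an exclusive upper bound are all hypothetical. It takes a non-blocking exclusive flock() on a per-channel lock file so that two updaters on the same channel do not process partitions concurrently, and it returns the advanced next_partition as an int, whereas the documented run() returns mixed.

```php
<?php
// Hypothetical sketch of run(): the lock file name, the $add_partition
// callback and the exclusive save-partition bound are all assumptions.
function runDictionaryUpdaterSketch(int $channel, string $lock_dir,
    int $next_partition, int $save_partition, callable $add_partition): int
{
    // One lock file per channel (hypothetical naming scheme), so crawls
    // on different channels do not block each other.
    $lock_path = "$lock_dir/dictionary_updater_$channel.lock";
    $lock = fopen($lock_path, "c");
    if ($lock === false) {
        return $next_partition;
    }
    // Non-blocking exclusive lock: if another updater holds it, do nothing.
    if (!flock($lock, LOCK_EX | LOCK_NB)) {
        fclose($lock);
        return $next_partition;
    }
    try {
        // Add each pending partition's words, stopping before the
        // partition still being saved (assumed exclusive bound).
        while ($next_partition < $save_partition) {
            $add_partition($next_partition);
            $next_partition++;
        }
    } finally {
        // Always release the channel lock, even if a partition fails.
        flock($lock, LOCK_UN);
        fclose($lock);
    }
    return $next_partition;
}
```

Returning the advanced counter, rather than storing it, keeps the sketch free of any assumption about where the bundle records next_partition.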