DescriptionUpdateJob
extends MediaJob
A media job to periodically update descriptions of Wiki resources using Description Search Sources
Table of Contents
- NEEDS_DESCRIPTION_FILE = \seekquarry\yioop\configs\APP_DIR . "/resources/needs_descriptions.txt"
- File to tell DescriptionUpdateJob that a wiki resource needs a description
- $controller : object
- If MediaJob was instantiated in the web app, the controller that instantiated it
- $db : object
- Datasource object used to run db queries related to fes items
- $media_updater : object
- If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater
- $name_server_does_client_tasks : bool
- Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks
- $name_server_does_client_tasks_only : bool
- Whether this MediaJob performs name server only tasks
- $page_id_thumb_folder_paths : array<string|int, mixed>
- Resource and thumb folders for all the wiki pages that have resources that need descriptions
- $tasks : array<string|int, mixed>
- The tasks for this MediaJob most recently received from the name server
- $update_time : int
- Time in the current epoch when descriptions were last updated
- __construct() : mixed
- Instantiates the MediaJob with a reference to the object that instantiated it
- checkPrerequisites() : bool
- Only update if it has been more than a day since the last update and there are resources requiring a description update
- doTasks() : mixed
- For each resource requiring description update, use the description search sources to find information
- execNameServer() : array<string|int, mixed>
- Executes a method on the name server's JobController.
- finishTasks() : mixed
- This method is called on the name server to finish processing any data returned by MediaUpdater clients.
- getCurrentMachine() : string
- Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request
- getDetails() : string
- Fetches the details on the url page using the XPath values configured in the search source
- getJobName() : string
- Gets the class name (less namespace and the word Job ) of the current MediaJob
- getTasks() : array<string|int, mixed>
- Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.
- init() : mixed
- Initializes the last update time to far in the past so that descriptions get updated immediately. Sets up a connection to the DB to fetch description search sources
- matchResourceSourcePathTerms() : bool
- Checks whether the terms in a wiki page name followed by a path to a wiki resource contain the terms in a description search source string that would trigger that search source to be used
- nondistributedTasks() : mixed
- Get the description search sources from the local database and use those to run the same task as in the distributed setting
- parseDescriptionAuxInfo() : mixed
- Parses out the components of the auxiliary field of a description source.
- prepareTasks() : mixed
- This method is called on the name server to prepare data for any MediaUpdater clients.
- processItem() : array<string|int, mixed>
- Processes $item, a DOMElement representing a search result for a description for the wiki resource $name, extracting a title and url. From the title a match score with $name is obtained. This score and url, as well as log messages when in test mode, are returned.
- putTasks() : array<string|int, mixed>
- After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The JobController of the name server's web app then calls this method to receive the data on the name server.
- run() : mixed
- Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.
- updateResourcesDescription() : string
- Updates/finds descriptions for resources listed in a needs_description.txt in a wiki page's thumb subfolder.
Constants
NEEDS_DESCRIPTION_FILE
File to tell DescriptionUpdateJob that a wiki resource needs a description
public
mixed
NEEDS_DESCRIPTION_FILE
= \seekquarry\yioop\configs\APP_DIR . "/resources/needs_descriptions.txt"
Properties
$controller
If MediaJob was instantiated in the web app, the controller that instantiated it
public
object
$controller
$db
Datasource object used to run db queries related to fes items
public
object
$db
$media_updater
If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater
public
object
$media_updater
$name_server_does_client_tasks
Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks
public
bool
$name_server_does_client_tasks
$name_server_does_client_tasks_only
Whether this MediaJob performs name server only tasks
public
bool
$name_server_does_client_tasks_only
$page_id_thumb_folder_paths
Resource and thumb folders for all the wiki pages that have resources that need descriptions
public
array<string|int, mixed>
$page_id_thumb_folder_paths
$tasks
The tasks for this MediaJob most recently received from the name server
public
array<string|int, mixed>
$tasks
$update_time
Time in the current epoch when descriptions were last updated
public
int
$update_time
Methods
__construct()
Instantiates the MediaJob with a reference to the object that instantiated it
public
__construct([object $media_updater = null ][, object $controller = null ]) : mixed
Parameters
- $media_updater : object = null
-
a reference to the media updater that instantiated this object (if being run in MediaUpdater)
- $controller : object = null
-
a reference to the controller that instantiated this object (if being run in the web app)
Return values
mixed
checkPrerequisites()
Only update if it has been more than a day since the last update and there are resources requiring a description update
public
checkPrerequisites() : bool
Return values
bool —whether it has been a day since the last update
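The two conditions in this check can be illustrated with a small standalone sketch. It is not the Yioop implementation: the function name `shouldUpdateDescriptions`, the constant `ONE_DAY_SECONDS`, and the idea that "resources requiring a description update" is detected by the existence of a marker file (such as NEEDS_DESCRIPTION_FILE) are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch of the prerequisite check; names are illustrative only.
const ONE_DAY_SECONDS = 86400;

/**
 * Returns true when more than a day has passed since $update_time and
 * the marker file $needs_description_file exists (assumed signal that
 * some wiki resource still needs a description).
 */
function shouldUpdateDescriptions(
    int $update_time,
    string $needs_description_file,
    ?int $now = null
): bool {
    $now = $now ?? time();
    $day_elapsed = ($now - $update_time) > ONE_DAY_SECONDS;
    return $day_elapsed && file_exists($needs_description_file);
}
```

Passing `$now` explicitly makes the time comparison easy to exercise in tests without waiting a day.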
doTasks()
For each resource requiring description update, use the description search sources to find information
public
doTasks(array<string|int, mixed> $tasks) : mixed
Parameters
- $tasks : array<string|int, mixed>
-
array of description sources
Return values
mixed —the result of carrying out that processing
execNameServer()
Executes a method on the name server's JobController.
public
static execNameServer(string $command[, string $args = null ]) : array<string|int, mixed>
It will typically execute either getTask or putTask for a specific MediaJob, or getUpdateProperties to find out how the current MediaUpdater should be configured.
Parameters
- $command : string
-
the method to invoke on the name server
- $args : string = null
-
additional arguments to be passed to the name server
Return values
array<string|int, mixed> —data returned by the name server.
finishTasks()
This method is called on the name server to finish processing any data returned by MediaUpdater clients.
public
finishTasks() : mixed
Return values
mixed
getCurrentMachine()
Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request
public
static getCurrentMachine() : string
Return values
string —hash of current machine url
getDetails()
Fetches the details on the url page using the XPath values configured in the search source
public
getDetails(string $page, array<string|int, mixed> $source[, mixed $test_mode = false ]) : string
Parameters
- $page : string
-
the html string of the details page
- $source : array<string|int, mixed>
-
search source details
- $test_mode : mixed = false
Return values
string —details found using the XPaths
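As a rough illustration of XPath-based extraction from a details page, the sketch below uses PHP's DOM extension. The function name `extractDetails` and the choice to pass the XPath expressions as a plain list are assumptions; how a real search source stores its XPaths is not described here.

```php
<?php
/**
 * Hypothetical sketch: collects the trimmed text of every node matched
 * by each XPath expression in $xpaths and joins the pieces with newlines.
 */
function extractDetails(string $page, array $xpaths): string
{
    if (trim($page) === "") {
        return "";
    }
    $dom = new DOMDocument();
    // Real-world HTML is often malformed, so parser warnings are suppressed.
    @$dom->loadHTML($page);
    $xpath = new DOMXPath($dom);
    $parts = [];
    foreach ($xpaths as $query) {
        $nodes = @$xpath->query($query);
        if ($nodes === false) {
            continue; // skip expressions that are not valid XPath
        }
        foreach ($nodes as $node) {
            $text = trim($node->textContent);
            if ($text !== "") {
                $parts[] = $text;
            }
        }
    }
    return implode("\n", $parts);
}
```

For example, calling it with a page containing `<div class="plot"> A story. </div>` and the expression `//div[@class="plot"]` yields the trimmed text of that element.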
getJobName()
Gets the class name (less namespace and the word Job ) of the current MediaJob
public
static getJobName() : string
Return values
string —name of the current job
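One way to derive such a name from a fully qualified class name is sketched below. The helper `jobNameFromClass` and the namespace `example\jobs` are hypothetical, and the sketch assumes the word Job appears as a suffix of the class name; the actual method presumably works from the calling class rather than a string argument.

```php
<?php
/**
 * Hypothetical sketch: strips the namespace and a trailing "Job" from a
 * fully qualified class name.
 */
function jobNameFromClass(string $class_name): string
{
    $pos = strrpos($class_name, "\\");
    $short_name = ($pos === false) ? $class_name :
        substr($class_name, $pos + 1);
    return (string) preg_replace('/Job$/', '', $short_name);
}

// The namespace below is made up for the example.
echo jobNameFromClass('example\\jobs\\DescriptionUpdateJob'), "\n";
// prints "DescriptionUpdate"
```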
getTasks()
Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.
public
getTasks(int $machine_id[, array<string|int, mixed> $data = null ]) : array<string|int, mixed>
Parameters
- $machine_id : int
-
id of client requesting data
- $data : array<string|int, mixed> = null
-
any additional info about data being requested
Return values
array<string|int, mixed> —work for the client to process
init()
Initializes the last update time to far in the past so that descriptions get updated immediately. Sets up a connection to the DB to fetch description search sources
public
init() : mixed
Return values
mixed
matchResourceSourcePathTerms()
Checks whether the terms in a wiki page name followed by a path to a wiki resource contain the terms in a description search source string that would trigger that search source to be used
public
matchResourceSourcePathTerms(mixed $page_name_resource_path, string $source_term_string) : bool
Parameters
- $page_name_resource_path : mixed
- $source_term_string : string
-
a comma separated list of terms used by a description source to see if it can supply a description of the given resource.
Return values
bool —whether the path contained any of the source trigger terms
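The matching of a resource path against a comma separated trigger list could look roughly like the following. This is a sketch under assumptions: the function name `pathMatchesSourceTerms` is invented, and the case-insensitive substring comparison and the treatment of an empty term list as "no match" are guesses rather than documented behavior.

```php
<?php
/**
 * Hypothetical sketch: returns true if any comma separated term in
 * $source_term_string occurs (case-insensitively) in the page name and
 * resource path string.
 */
function pathMatchesSourceTerms(
    string $page_name_resource_path,
    string $source_term_string
): bool {
    $path = strtolower($page_name_resource_path);
    foreach (explode(",", $source_term_string) as $term) {
        $term = strtolower(trim($term));
        if ($term !== "" && strpos($path, $term) !== false) {
            return true;
        }
    }
    return false;
}
```

With this sketch, a path such as `Movies/classics/casablanca.mp4` matches the term list `movies, film`, while `Recipes/soup.jpg` does not.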
nondistributedTasks()
Get the description search sources from the local database and use those to run the same task as in the distributed setting
public
nondistributedTasks() : mixed
Return values
mixed
parseDescriptionAuxInfo()
Parses out the components of the auxiliary field of a description source.
public
static parseDescriptionAuxInfo(mixed &$source) : mixed
Parameters
- $source : mixed
Return values
mixed
prepareTasks()
This method is called on the name server to prepare data for any MediaUpdater clients.
public
prepareTasks() : mixed
Return values
mixed
processItem()
Processes $item, a DOMElement representing a search result for a description for the wiki resource $name, extracting a title and url. From the title a match score with $name is obtained. This score and url, as well as log messages when in test mode, are returned.
public
processItem( $item, $name, $source, $dom[, mixed $test_mode = false ]) : array<string|int, mixed>
Parameters
- $item :
-
DOMNode representing one possible description search result
- $name :
-
the wiki resource name we are trying to get a description of
- $source :
-
the source associative array with information about how to extract description from the current dom document and dom node.
- $dom :
-
DOMDocument of the whole document that the node is from, used in creating the DOMXPath object for querying $item.
- $test_mode : mixed = false
Return values
array<string|int, mixed> —[$score, $url, $test_results]: $score of $item as a likely source for a description of the wiki resource $name, $url that $item points to with more information, and $test_results log messages if in test mode.
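The shape of this computation can be sketched with DOMXPath queries relative to the result node. Everything specific here is an assumption: the function name `scoreSearchItem`, the source keys `TITLE_XPATH` and `URL_XPATH`, and the use of PHP's `similar_text` percentage as the match score are stand-ins, since the actual scoring method and source layout are not described on this page.

```php
<?php
/**
 * Hypothetical sketch: extracts a title and url from $item and scores
 * how closely the title matches $name, on a scale from 0 to 1.
 */
function scoreSearchItem(
    DOMNode $item,
    string $name,
    array $source,
    DOMDocument $dom,
    bool $test_mode = false
): array {
    $xpath = new DOMXPath($dom);
    // Queries are evaluated relative to $item (assumed source keys).
    $title_nodes = $xpath->query($source['TITLE_XPATH'], $item);
    $url_nodes = $xpath->query($source['URL_XPATH'], $item);
    $title = ($title_nodes && $title_nodes->length > 0) ?
        trim($title_nodes->item(0)->textContent) : "";
    $url = ($url_nodes && $url_nodes->length > 0) ?
        trim($url_nodes->item(0)->nodeValue) : "";
    // similar_text stores a similarity percentage in $percent.
    similar_text(strtolower($name), strtolower($title), $percent);
    $score = $percent / 100;
    $test_results = $test_mode ? "title: $title score: $score\n" : "";
    return [$score, $url, $test_results];
}
```

A caller would typically run this over every candidate node in a search results page and keep the url with the highest score.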
putTasks()
After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The JobController of the name server's web app then calls this method to receive the data on the name server.
public
putTasks(int $machine_id, mixed $data) : array<string|int, mixed>
Parameters
- $machine_id : int
-
id of client that is sending data to name server
- $data : mixed
-
results of computation done by client
Return values
array<string|int, mixed> —any response information to send back to the client
run()
Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.
public
run() : mixed
Return values
mixed
updateResourcesDescription()
Updates/finds descriptions for resources listed in a needs_description.txt in a wiki page's thumb subfolder.
public
updateResourcesDescription(array<string|int, mixed> $sources[, mixed $page_id_thumb_folder_path = "" ][, bool $test_mode = false ]) : string
It does this by iterating over all configured description search sources until a match is found. It then saves the description in a file at the given resource thumb folder path.
Parameters
- $sources : array<string|int, mixed>
-
associative array containing details of all search sources
- $page_id_thumb_folder_path : mixed = ""
- $test_mode : bool = false
-
if true, a string is returned (test mode)
Return values
string —returned if $test_mode is true