Yioop_V9.5_Source_Code_Documentation

DescriptionUpdateJob extends MediaJob
in package

A media job to periodically update descriptions of Wiki resources using Description Search Sources

Table of Contents

NEEDS_DESCRIPTION_FILE  = \seekquarry\yioop\configs\APP_DIR . "/resources/needs_descriptions.txt"
File to tell DescriptionUpdateJob that a wiki resource needs a description
$controller  : object
If MediaJob was instantiated in the web app, the controller that instantiated it
$db  : object
Datasource object used to run db queries related to fes items
$media_updater  : object
If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater
$name_server_does_client_tasks  : bool
Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks
$name_server_does_client_tasks_only  : bool
Whether this MediaJob performs name server only tasks
$page_id_thumb_folder_paths  : array<string|int, mixed>
Resource and thumb folders for all the wiki pages that have resources that need descriptions
$tasks  : array<string|int, mixed>
The tasks for this MediaJob most recently received from the name server
$update_time  : int
Time, in seconds since the epoch, when descriptions were last updated
__construct()  : mixed
Instantiates the MediaJob with a reference to the object that instantiated it
checkPrerequisites()  : bool
Only update if it has been more than a day since the last update and there are resources requiring a description update
doTasks()  : mixed
For each resource requiring description update, use the description search sources to find information
execNameServer()  : array<string|int, mixed>
Executes a method on the name server's JobController.
finishTasks()  : mixed
This method is called on the name server to finish processing any data returned by MediaUpdater clients.
getCurrentMachine()  : string
Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request
getDetails()  : string
Fetches the details from the page at a URL using the XPath values configured in the search source
getJobName()  : string
Gets the class name (less the namespace and the word Job) of the current MediaJob
getTasks()  : array<string|int, mixed>
Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.
init()  : mixed
Initializes the last update time to far in the past, so descriptions will get updated immediately. Sets up the connection to the DB to fetch description search sources
matchResourceSourcePathTerms()  : bool
Checks whether a wiki page name followed by a path to a wiki resource contains any of the terms in a description search source's term string, which would trigger that search source to be used
nondistributedTasks()  : mixed
Get the description search sources from the local database and use those to run the same task as in the distributed setting
parseDescriptionAuxInfo()  : mixed
Parses out the components of the auxiliary field of a description source.
prepareTasks()  : mixed
This method is called on the name server to prepare data for any MediaUpdater clients.
processItem()  : array<string|int, mixed>
Processes $item, a DOMElement representing a search result for a description of the wiki resource $name, extracting a title and URL. From the title, a match score with $name is obtained. This score and URL, as well as log messages if in test mode, are returned.
putTasks()  : array<string|int, mixed>
After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The JobController of the name server's web app then calls this method to receive the data on the name server
run()  : mixed
Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.
updateResourcesDescription()  : string
Updates/finds descriptions for resources listed in a needs_description.txt in a wiki page's thumb subfolder.

Constants

NEEDS_DESCRIPTION_FILE

File to tell DescriptionUpdateJob that a wiki resource needs a description

public mixed NEEDS_DESCRIPTION_FILE = \seekquarry\yioop\configs\APP_DIR . "/resources/needs_descriptions.txt"

Properties

$controller

If MediaJob was instantiated in the web app, the controller that instantiated it

public object $controller

$media_updater

If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater

public object $media_updater

$name_server_does_client_tasks

Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks

public bool $name_server_does_client_tasks

$name_server_does_client_tasks_only

Whether this MediaJob performs name server only tasks

public bool $name_server_does_client_tasks_only

$page_id_thumb_folder_paths

Resource and thumb folders for all the wiki pages that have resources that need descriptions

public array<string|int, mixed> $page_id_thumb_folder_paths

$tasks

The tasks for this MediaJob most recently received from the name server

public array<string|int, mixed> $tasks

$update_time

Time, in seconds since the epoch, when descriptions were last updated

public int $update_time

Methods

__construct()

Instantiates the MediaJob with a reference to the object that instantiated it

public __construct([object $media_updater = null ][, object $controller = null ]) : mixed
Parameters
$media_updater : object = null

a reference to the media updater that instantiated this object (if being run in MediaUpdater)

$controller : object = null

a reference to the controller that instantiated this object (if being run in the web app)

Return values
mixed

checkPrerequisites()

Only update if it has been more than a day since the last update and there are resources requiring a description update

public checkPrerequisites() : bool
Return values
bool

whether it has been more than a day since the last update
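
The check described above can be sketched as a small standalone function. This is only an illustration, not Yioop's actual code: the function name shouldUpdateDescriptions, the constant ONE_DAY_IN_SECONDS, and the optional $now parameter are hypothetical, and it assumes that the existence of a marker file such as NEEDS_DESCRIPTION_FILE is what signals that resources still need descriptions.

```php
<?php
// Hypothetical constant: number of seconds in one day.
const ONE_DAY_IN_SECONDS = 86400;

/**
 * Returns true only when more than a day has passed since $update_time
 * and the marker file $needs_file exists (assumed here to be the signal
 * that some wiki resource still needs a description).
 */
function shouldUpdateDescriptions(int $update_time, string $needs_file,
    ?int $now = null): bool
{
    // Allow the current time to be injected so the check is easy to test.
    $now = $now ?? time();
    $day_has_passed = ($now - $update_time) > ONE_DAY_IN_SECONDS;
    return $day_has_passed && file_exists($needs_file);
}
```

Initializing the last update time to a value far in the past, as init() is documented to do, makes the first part of such a check succeed on the first run.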

doTasks()

For each resource requiring description update, use the description search sources to find information

public doTasks(array<string|int, mixed> $tasks) : mixed
Parameters
$tasks : array<string|int, mixed>

array of description sources

Return values
mixed

the result of carrying out that processing

execNameServer()

Executes a method on the name server's JobController.

public static execNameServer(string $command[, string $args = null ]) : array<string|int, mixed>

It will typically execute either getTask or putTask for a specific MediaJob, or getUpdateProperties to find out how the current MediaUpdater should be configured.

Parameters
$command : string

the method to invoke on the name server

$args : string = null

additional arguments to be passed to the name server

Return values
array<string|int, mixed>

data returned by the name server.

finishTasks()

This method is called on the name server to finish processing any data returned by MediaUpdater clients.

public finishTasks() : mixed
Return values
mixed

getCurrentMachine()

Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request

public static getCurrentMachine() : string
Return values
string

hash of current machine url

getDetails()

Fetches the details from the page at a URL using the XPath values configured in the search source

public getDetails(string $page, array<string|int, mixed> $source[, mixed $test_mode = false ]) : string
Parameters
$page : string

the HTML string of the details page

$source : array<string|int, mixed>

search source details

$test_mode : mixed = false
Return values
string

details found using the XPath expressions
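
As a rough illustration of XPath-based extraction of this kind, the sketch below uses PHP's standard DOMDocument and DOMXPath classes. It is not the actual getDetails() implementation: the function name extractDetails is hypothetical, and passing the XPath expressions as a plain list of strings is an assumption, since the layout of the real $source array is not shown here.

```php
<?php
/**
 * Collects the trimmed text of every node matched by each XPath
 * expression in $xpaths and joins the pieces with newlines.
 */
function extractDetails(string $page, array $xpaths): string
{
    if (trim($page) === "") {
        return ""; // nothing to parse
    }
    $dom = new DOMDocument();
    // Real-world HTML is often malformed; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($page);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($dom);
    $parts = [];
    foreach ($xpaths as $query) {
        $nodes = $xpath->query($query);
        if ($nodes === false) {
            continue; // skip an expression that could not be evaluated
        }
        foreach ($nodes as $node) {
            $text = trim($node->textContent);
            if ($text !== "") {
                $parts[] = $text;
            }
        }
    }
    return implode("\n", $parts);
}
```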

getJobName()

Gets the class name (less the namespace and the word Job) of the current MediaJob

public static getJobName() : string
Return values
string

name of the current job
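
A minimal sketch of this kind of name derivation follows. The helper jobNameFromClass is hypothetical and takes the class name as a string argument, and it assumes the word Job appears as a suffix of the class name; the real method is static and works from the current class.

```php
<?php
/**
 * Strips the namespace prefix and a trailing "Job" from a
 * fully qualified class name.
 */
function jobNameFromClass(string $class_name): string
{
    $pos = strrpos($class_name, "\\");
    // Keep only the part after the last namespace separator, if any.
    $short_name = ($pos === false) ? $class_name :
        substr($class_name, $pos + 1);
    return preg_replace('/Job$/', '', $short_name);
}
```

Under these assumptions, a class named DescriptionUpdateJob in any namespace would yield the job name DescriptionUpdate.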

getTasks()

Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.

public getTasks(int $machine_id[, array<string|int, mixed> $data = null ]) : array<string|int, mixed>
Parameters
$machine_id : int

id of client requesting data

$data : array<string|int, mixed> = null

any additional info about data being requested

Return values
array<string|int, mixed>

work for the client to process

init()

Initializes the last update time to far in the past, so descriptions will get updated immediately. Sets up the connection to the DB to fetch description search sources

public init() : mixed
Return values
mixed

matchResourceSourcePathTerms()

Checks whether a wiki page name followed by a path to a wiki resource contains any of the terms in a description search source's term string, which would trigger that search source to be used

public matchResourceSourcePathTerms(mixed $page_name_resource_path, string $source_term_string) : bool
Parameters
$page_name_resource_path : mixed
$source_term_string : string

a comma separated list of terms used by a description source to see if it can supply a description of the given resource.

Return values
bool

whether the path contained any of the source trigger terms
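
The term matching described above might look roughly like the following. This is a sketch under stated assumptions rather than the library's code: the function name pathMatchesSourceTerms is hypothetical, and treating a match as a case-insensitive substring test is an assumption, since the documentation only says the path must contain one of the comma separated terms.

```php
<?php
/**
 * Returns true if $page_name_resource_path contains at least one of
 * the comma separated terms in $source_term_string.
 */
function pathMatchesSourceTerms(string $page_name_resource_path,
    string $source_term_string): bool
{
    foreach (explode(",", $source_term_string) as $term) {
        $term = trim($term);
        // Ignore empty terms, e.g. from an empty string or a stray comma.
        if ($term !== "" &&
            stripos($page_name_resource_path, $term) !== false) {
            return true;
        }
    }
    return false;
}
```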

nondistributedTasks()

Get the description search sources from the local database and use those to run the same task as in the distributed setting

public nondistributedTasks() : mixed
Return values
mixed

parseDescriptionAuxInfo()

Parses out the components of the auxiliary field of a description source.

public static parseDescriptionAuxInfo(mixed &$source) : mixed
Parameters
$source : mixed
Return values
mixed

prepareTasks()

This method is called on the name server to prepare data for any MediaUpdater clients.

public prepareTasks() : mixed
Return values
mixed

processItem()

Processes $item, a DOMElement representing a search result for a description of the wiki resource $name, extracting a title and URL. From the title, a match score with $name is obtained. This score and URL, as well as log messages if in test mode, are returned.

public processItem( $item,  $name,  $source,  $dom[, mixed $test_mode = false ]) : array<string|int, mixed>
Parameters
$item :

DOMNode representing one possible description search result

$name :

the wiki resource name we are trying to get a description of

$source :

the source associative array with information about how to extract description from the current dom document and dom node.

$dom :

DOMDocument of the whole document that the node is from, used in creating the DOMXPath object for querying $item.

$test_mode : mixed = false
Return values
array<string|int, mixed>

$score, $url, $test_results: the $score of $item as a likely source for a description of the wiki resource $name, the $url that $item points to with more information, and $test_results log messages if in test mode.
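
The documentation does not say how the match score between a result title and $name is computed. Purely as an illustration, the sketch below scores a title against a resource file name with PHP's built-in similar_text(); the helper names normalizeForMatch and titleMatchScore, the normalization steps, and the 0 to 100 scale are all assumptions, not the method used by processItem().

```php
<?php
/**
 * Lowercases $text and replaces runs of non-alphanumeric characters
 * with single spaces.
 */
function normalizeForMatch(string $text): string
{
    return trim(preg_replace('/[^a-z0-9]+/', ' ', strtolower($text)));
}

/**
 * Returns a similarity percentage between a search result title and a
 * resource name, ignoring the resource's file extension.
 */
function titleMatchScore(string $title, string $resource_name): float
{
    $clean_title = normalizeForMatch($title);
    $clean_name = normalizeForMatch(
        pathinfo($resource_name, PATHINFO_FILENAME));
    if ($clean_title === "" || $clean_name === "") {
        return 0.0;
    }
    // similar_text() writes the similarity percentage into $percent.
    similar_text($clean_title, $clean_name, $percent);
    return (float) $percent;
}
```

With a score of this kind, the candidate whose title scores highest against $name would be the natural one to follow for more details.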

putTasks()

After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The JobController of the name server's web app then calls this method to receive the data on the name server

public putTasks(int $machine_id, mixed $data) : array<string|int, mixed>
Parameters
$machine_id : int

id of client that is sending data to name server

$data : mixed

results of computation done by client

Return values
array<string|int, mixed>

any response information to send back to the client

run()

Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.

public run() : mixed
Return values
mixed

updateResourcesDescription()

Updates/finds descriptions for resources listed in a needs_description.txt in a wiki page's thumb subfolder.

public updateResourcesDescription(array<string|int, mixed> $sources[, mixed $page_id_thumb_folder_path = "" ][, bool $test_mode = false ]) : string

It does this by iterating over all configured description search sources until a match is found. It then saves the description in a file at the given resource thumb folder path

Parameters
$sources : array<string|int, mixed>

associative array containing details of all search sources

$page_id_thumb_folder_path : mixed = ""
$test_mode : bool = false

used to return string in test mode

Return values
string

if $test_mode is true
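
The "iterate over sources until a match is found" strategy can be sketched in isolation as follows. The function name firstMatchingDescription and the $lookup callback are hypothetical stand-ins for whatever work the real method performs against each search source; saving the result to the thumb folder is left out.

```php
<?php
/**
 * Tries each source in order and returns the first non-empty
 * description produced by $lookup, or null if no source matches.
 */
function firstMatchingDescription(array $sources, string $resource_name,
    callable $lookup): ?string
{
    foreach ($sources as $source) {
        $description = $lookup($source, $resource_name);
        if (is_string($description) && $description !== "") {
            return $description; // stop at the first match
        }
    }
    return null;
}
```

Stopping at the first match means the order in which sources are configured decides which one wins when several could describe the same resource.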


        
