Yioop_V9.5_Source_Code_Documentation

PodcastDownloadJob extends MediaJob

A media job to periodically download Podcasts and store them as resources of a Wiki Page

Table of Contents

ITEM_EXPIRES_TIME  = \seekquarry\yioop\configs\ONE_WEEK
how long in seconds before a podcast item expires
MAX_PODCASTS_ONE_GO  = 100
Maximum number of feeds to download in one try
$controller  : object
If MediaJob was instantiated in the web app, the controller that instantiated it
$db  : object
Datasource object used to run db queries related to feed items (for storing and updating them)
$group_model  : object
Instance of group model used to get a list of podcasts
$media_updater  : object
If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater
$name_server_does_client_tasks  : bool
Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks
$name_server_does_client_tasks_only  : bool
Whether this MediaJob performs name server only tasks
$tasks  : array<string|int, mixed>
The tasks for this MediaJob most recently received from the name server
$update_time  : int
Time in the current epoch when feeds were last updated
__construct()  : mixed
Instantiates the MediaJob with a reference to the object that instantiated it
checkPrerequisites()  : bool
Only update if it's been more than an hour since the last update
doTasks()  : mixed
For each podcast source, downloads the podcast web file, checks which podcast items are not in the database, and adds them.
downloadPodcastItem()  : string
Helper method for downloadPodcastItemIfNew, called when it is known that a podcast item should be downloaded. It downloads a podcast item.
downloadPodcastItemIfNew()  : bool
Given a podcast item from a podcast feed page, determines whether it has already been downloaded and, if not, whether it is recent enough to download. If it is recent enough, it scrapes the file to download, downloads any intermediate files needed to find that file, and then downloads the podcast item itself. If the podcast item is built out of multiple videos, it concatenates them into a single video. It then moves the podcast item to the appropriate wiki folder.
execNameServer()  : array<string|int, mixed>
Executes a method on the name server's JobController.
finishTasks()  : mixed
This method is called on the name server to finish processing any data returned by MediaUpdater clients.
getCurrentMachine()  : string
Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request
getJobName()  : string
Gets the class name (less the namespace and the word Job) of the current MediaJob
getLinkFromQueryPage()  : string
Used to extract a URL from a page, given either as a string or in DOM form, and to canonicalize it based on a starting url.
getTasks()  : array<string|int, mixed>
Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.
init()  : mixed
Initializes the last update time to far in the past so that feeds will be updated immediately. Sets up the connection to the DB used to store feed items, and makes it so the same media job runs on both the name server and client Media Updaters
nondistributedTasks()  : mixed
Gets the media sources from the local database and uses those to run the same task as in the distributed setting
parsePodcastAuxInfo()  : mixed
Used to fill in details for an associative array containing the details of a Wiki feed and scrape podcast which should be examined to see if new items should be downloaded to wiki pages. As part of processing, expired feed items for the given wiki might be deleted.
prepareTasks()  : mixed
This method is called on the name server to prepare data for any MediaUpdater clients.
processFeedPodcast()  : mixed
Processes the page contents of one podcast feed. Determines which podcast files on that page are fresh and, if a podcast is fresh, downloads it and moves it to the appropriate wiki folder.
processHtmlPodcast()  : array<string|int, mixed>
Used to download the media item associated with an HTML scrape podcast
putTasks()  : array<string|int, mixed>
After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The name server's web app's JobController then calls this method to receive the data on the name server
run()  : mixed
Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.
updatePodcastsOneGo()  : mixed
For each of a supplied list of podcast associative arrays, downloads the non-expired media for that podcast to the wiki folder specified.
getPage()  : string
Downloads the internet page with the given url.
makeFileNamePattern()  : string
Used to construct a filename for a downloaded podcast item suitable to be used when stored in a wiki page's resource folder
makeFolder()  : bool
Makes a directory in a way compatible with yioop's error handling.

Constants

ITEM_EXPIRES_TIME

how long in seconds before a podcast item expires

public mixed ITEM_EXPIRES_TIME = \seekquarry\yioop\configs\ONE_WEEK

MAX_PODCASTS_ONE_GO

Maximum number of feeds to download in one try

public mixed MAX_PODCASTS_ONE_GO = 100
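
One way a per-try limit like this could be applied is to split the full list of podcasts into batches no larger than the limit. The sketch below is only an illustration under that assumption: the function `batchPodcasts` is hypothetical and is not part of this class, and the documentation does not state how the job actually enforces the limit.

```php
<?php
// Mirrors the documented constant value; defined here only so the
// sketch is self-contained.
const MAX_PODCASTS_ONE_GO = 100;

/**
 * Hypothetical helper: splits a list of podcasts into batches of at
 * most $batch_size entries each, preserving order.
 */
function batchPodcasts(array $podcasts,
    int $batch_size = MAX_PODCASTS_ONE_GO): array
{
    return array_chunk($podcasts, $batch_size);
}
```

With 250 podcasts this yields three batches, the last holding the remaining 50.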

Properties

$controller

If MediaJob was instantiated in the web app, the controller that instantiated it

public object $controller

$db

Datasource object used to run db queries related to feed items (for storing and updating them)

public object $db

$group_model

Instance of group model used to get a list of podcasts

public object $group_model

$media_updater

If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater

public object $media_updater

$name_server_does_client_tasks

Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks

public bool $name_server_does_client_tasks

$name_server_does_client_tasks_only

Whether this MediaJob performs name server only tasks

public bool $name_server_does_client_tasks_only

$tasks

The tasks for this MediaJob most recently received from the name server

public array<string|int, mixed> $tasks

$update_time

Time in the current epoch when feeds were last updated

public int $update_time

Methods

__construct()

Instantiates the MediaJob with a reference to the object that instantiated it

public __construct([object $media_updater = null ][, object $controller = null ]) : mixed
Parameters
$media_updater : object = null

a reference to the media updater that instantiated this object (if being run in MediaUpdater)

$controller : object = null

a reference to the controller that instantiated this object (if being run in the web app)

Return values
mixed

checkPrerequisites()

Only update if it's been more than an hour since the last update

public checkPrerequisites() : bool
Return values
bool

whether it's been more than an hour since the last update
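
The hourly gate described above can be sketched as a simple timestamp comparison. This is a minimal illustration, not the Yioop implementation: the function `hasHourElapsed`, the constant `SECONDS_PER_HOUR`, and the explicit `$now` parameter are assumptions introduced so the check can be exercised in isolation. The documented method takes no arguments and presumably reads the job's own `$update_time`.

```php
<?php
// Illustrative constant; not taken from the Yioop source.
const SECONDS_PER_HOUR = 3600;

/**
 * Hypothetical helper: returns true when more than an hour separates
 * $update_time from $now (both Unix timestamps in seconds).
 */
function hasHourElapsed(int $update_time, int $now): bool
{
    return ($now - $update_time) > SECONDS_PER_HOUR;
}
```

Because init() sets the last update time far in the past, a check of this shape passes on the first run.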

doTasks()

For each podcast source, downloads the podcast web file, checks which podcast items are not in the database, and adds them.

public doTasks(array<string|int, mixed> $tasks) : mixed
Parameters
$tasks : array<string|int, mixed>

array of feed info (url to download, paths to extract etc)

Return values
mixed

the result of carrying out that processing

downloadPodcastItem()

Helper method for downloadPodcastItemIfNew, called when it is known that a podcast item should be downloaded. It downloads a podcast item.

public downloadPodcastItem(string $url[, string $type = "mp4" ][, array<string|int, mixed> $audiolist_urls = [] ]) : string

If the podcast item is an intermediate file pointing to several items to download, such as videos, it downloads these and concatenates them into a single video.

Parameters
$url : string

url of the podcast item to download

$type : string = "mp4"

file type of podcast item

$audiolist_urls : array<string|int, mixed> = []

an array of audio urls to download if this has already been obtained

Return values
string

with podcast item if successful or false otherwise

downloadPodcastItemIfNew()

Given a podcast item from a podcast feed page, determines whether it has already been downloaded and, if not, whether it is recent enough to download. If it is recent enough, it scrapes the file to download, downloads any intermediate files needed to find that file, and then downloads the podcast item itself. If the podcast item is built out of multiple videos, it concatenates them into a single video. It then moves the podcast item to the appropriate wiki folder.

public downloadPodcastItemIfNew(array<string|int, mixed> $item, array<string|int, mixed> &$podcast, int $age) : bool
Parameters
$item : array<string|int, mixed>

an associative array about one item on a podcast feed page

$podcast : array<string|int, mixed>

a reference to an associative array of the podcast feed the item is from. This is used for the language etc. of the item and is also used to store updates to what podcasts have already been downloaded

$age : int

how many seconds ago is still considered a recent enough podcast to process

Return values
bool

whether downloaded or not.
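
The first two decisions this method makes (has the item been downloaded already, and is it recent enough) can be sketched as below. This is a hedged illustration only: the function `shouldDownload`, the item keys `guid` and `pubdate`, and the explicit `$now` parameter are assumptions, since the documentation does not list the fields of `$item` or how previously downloaded items are recorded.

```php
<?php
/**
 * Hypothetical helper: decides whether a feed item is worth fetching.
 *
 * @param array $item assumed to hold 'guid' (string id) and
 *      'pubdate' (Unix timestamp); both key names are illustrative
 * @param array $previously_downloaded ids of items already stored
 * @param int $age how many seconds ago still counts as recent
 * @param int $now current Unix timestamp
 */
function shouldDownload(array $item, array $previously_downloaded,
    int $age, int $now): bool
{
    if (in_array($item['guid'], $previously_downloaded, true)) {
        return false; // already stored, nothing to do
    }
    // Recent enough when published no more than $age seconds ago.
    return ($now - $item['pubdate']) <= $age;
}
```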

execNameServer()

Executes a method on the name server's JobController.

public static execNameServer(string $command[, string $args = null ]) : array<string|int, mixed>

It will typically execute either getTask or putTask for a specific MediaJob, or getUpdateProperties to find out how the current MediaUpdater should be configured.

Parameters
$command : string

the method to invoke on the name server

$args : string = null

additional arguments to be passed to the name server

Return values
array<string|int, mixed>

data returned by the name server.

finishTasks()

This method is called on the name server to finish processing any data returned by MediaUpdater clients.

public finishTasks() : mixed
Return values
mixed

getCurrentMachine()

Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request

public static getCurrentMachine() : string
Return values
string

hash of current machine url

getJobName()

Gets the class name (less the namespace and the word Job) of the current MediaJob

public static getJobName() : string
Return values
string

name of the current job
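
The transformation described here (strip the namespace, then the word Job) can be sketched on a plain class-name string. The function `jobNameFromClass` and the example namespace are hypothetical; the documented method is static, takes no arguments, and works from the current class.

```php
<?php
/**
 * Hypothetical helper: derives a job name from a fully qualified
 * class name by removing the namespace and a trailing "Job".
 */
function jobNameFromClass(string $class_name): string
{
    // Keep only the part after the last namespace separator.
    $pos = strrpos($class_name, "\\");
    $short = ($pos === false) ? $class_name :
        substr($class_name, $pos + 1);
    // Remove the "Job" suffix if the class name ends with it.
    return preg_replace('/Job$/', '', $short);
}
```

For example, `jobNameFromClass('example\\ns\\PodcastDownloadJob')` returns `PodcastDownload`.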

getLinkFromQueryPage()

Used to extract a URL from a page, given either as a string or in DOM form, and to canonicalize it based on a starting url.

public getLinkFromQueryPage(string $xpath, string $page, string $dom, string $source_url) : string
Parameters
$xpath : string

either an xpath to look into a dom object or a regex to search a page as a string

$page : string

source page to search in as a string

$dom : string

source page as a dom object

$source_url : string

url to use to canonicalize an incomplete url if the extraction only produces part of a url

Return values
string

desired url link
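
The regex branch of this extraction, followed by canonicalization against the source url, might look like the sketch below. It is a simplified illustration rather than the method's actual logic: the function `extractLink` is hypothetical, the xpath/DOM branch is omitted, and only absolute, root-relative, and directory-relative links are handled (ports, query strings in the source url, and `../` segments are ignored).

```php
<?php
/**
 * Hypothetical helper: returns the first capture group of $regex
 * found in $page, resolved against $source_url, or "" if no match.
 */
function extractLink(string $regex, string $page,
    string $source_url): string
{
    if (!preg_match($regex, $page, $matches) || empty($matches[1])) {
        return "";
    }
    $link = $matches[1];
    if (preg_match('/^https?:\/\//i', $link)) {
        return $link; // already a complete url
    }
    $parts = parse_url($source_url);
    $origin = $parts['scheme'] . "://" . $parts['host'];
    if ($link[0] === "/") {
        return $origin . $link; // root-relative link
    }
    // Directory-relative: keep the source path up to its last slash.
    $path = $parts['path'] ?? "/";
    $dir = substr($path, 0, strrpos($path, "/") + 1);
    return $origin . $dir . $link;
}
```

Given the page `<a href="ep1.mp3">` and the source url `https://example.com/shows/index.html`, the pattern `/href="([^"]+)"/` resolves to `https://example.com/shows/ep1.mp3`.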

getTasks()

Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.

public getTasks(int $machine_id[, array<string|int, mixed> $data = null ]) : array<string|int, mixed>
Parameters
$machine_id : int

id of client requesting data

$data : array<string|int, mixed> = null

any additional info about data being requested

Return values
array<string|int, mixed>

work for the client to process

init()

Initializes the last update time to far in the past so that feeds will be updated immediately. Sets up the connection to the DB used to store feed items, and makes it so the same media job runs on both the name server and client Media Updaters

public init() : mixed
Return values
mixed

nondistributedTasks()

Gets the media sources from the local database and uses those to run the same task as in the distributed setting

public nondistributedTasks() : mixed
Return values
mixed

parsePodcastAuxInfo()

Used to fill in details for an associative array containing the details of a Wiki feed and scrape podcast which should be examined to see if new items should be downloaded to wiki pages. As part of processing, expired feed items for the given wiki might be deleted.

public parsePodcastAuxInfo(array<string|int, mixed> &$podcast[, bool $test_mode = false ]) : mixed
Parameters
$podcast : array<string|int, mixed>

after running will contain an associative array of details about a particular podcast. The input podcast is assumed to have at least the NAME, WIKI_PAGE, AUX_PATH, and CATEGORY fields filled in, the last of these holding the time in seconds until an item expires. If successful, the MAX_AGE (which is essentially the value of the CATEGORY field), WIKI_FILE_PATTERN, WIKI_PAGE_FOLDERS, and PREVIOUSLY_DOWNLOADED fields will be filled in.

$test_mode : bool = false

if true, does not cull expired feed items from disk, but returns the previously downloaded items as if it had.

Return values
mixed

prepareTasks()

This method is called on the name server to prepare data for any MediaUpdater clients.

public prepareTasks() : mixed
Return values
mixed

processFeedPodcast()

Processes the page contents of one podcast feed. Determines which podcast files on that page are fresh and, if a podcast is fresh, downloads it and moves it to the appropriate wiki folder.

public processFeedPodcast(array<string|int, mixed> &$podcast, int $age[, bool $test_mode = false ]) : mixed
Parameters
$podcast : array<string|int, mixed>

associative array containing page data for a podcast feed page (not the video or audio files of a particular podcast on that page) together with rules for how to process it

$age : int

how many seconds ago is still considered a recent enough podcast to process

$test_mode : bool = false

if true, then rather than updating items in the wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast

Return values
mixed

either true, or if $test_mode is true then the results as a string of the operations involved in downloading the podcasts

processHtmlPodcast()

Used to download the media item associated with an HTML scrape podcast

public processHtmlPodcast(array<string|int, mixed> &$podcast, int $age[, bool $test_mode = false ]) : array<string|int, mixed>
Parameters
$podcast : array<string|int, mixed>

associative array containing info about the location, how to handle, and where to download the podcast

$age : int

max age of the media item to be considered for download

$test_mode : bool = false

if true, then rather than updating items in the wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast

Return values
array<string|int, mixed>

[whether item downloaded, test_mode_info_string if applicable or "" otherwise]

putTasks()

After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The name server's web app's JobController then calls this method to receive the data on the name server

public putTasks(int $machine_id, mixed $data) : array<string|int, mixed>
Parameters
$machine_id : int

id of client that is sending data to name server

$data : mixed

results of computation done by client

Return values
array<string|int, mixed>

any response information to send back to the client

run()

Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.

public run() : mixed
Return values
mixed

updatePodcastsOneGo()

For each of a supplied list of podcast associative arrays, downloads the non-expired media for that podcast to the wiki folder specified.

public updatePodcastsOneGo(mixed $podcasts[, int $age = C\ONE_WEEK ][, bool $test_mode = false ]) : mixed
Parameters
$podcasts : mixed

list of podcast associative arrays to process

$age : int = C\ONE_WEEK

oldest age items to consider for download

$test_mode : bool = false

if true, then rather than updating items in the wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast

Return values
mixed

either true, or if $test_mode is true then the results as a string of the operations involved in downloading the podcasts

getPage()

Downloads the internet page with the given url.

private getPage( $url) : string
Parameters
$url :

the url to download

Return values
string

contents of downloaded page

makeFileNamePattern()

Used to construct a filename for a downloaded podcast item suitable to be used when stored in a wiki page's resource folder

private makeFileNamePattern(string $file_name, string $file_pattern[, string $title = "" ][, int $pubdate = null ]) : string
Parameters
$file_name : string

name of file

$file_pattern : string

string which can contain %F for the previous filename, %T for the title, and date codes of the form %date_command, for example, %Y for year, %m for month, %d for day, etc. These will be substituted with their values when writing out the wiki name for the downloaded podcast item.

$title : string = ""

a title string for wiki item

$pubdate : int = null

when the wiki item was published as a Unix timestamp. The value of this is used when computing values for the $file_pattern

Return values
string

output filename for wiki item
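
The placeholder substitution described for $file_pattern can be sketched as follows. This is an illustration under stated assumptions, not the method's actual code: the function `expandFilePattern` is hypothetical, only the %Y, %m, and %d date codes are supported, dates are rendered in UTC, and a null $pubdate falls back to the current time.

```php
<?php
/**
 * Hypothetical helper: expands %F, %T and a few date codes in
 * $file_pattern to build an output file name.
 */
function expandFilePattern(string $file_name, string $file_pattern,
    string $title = "", ?int $pubdate = null): string
{
    // Assumption: with no publication date, use the current time.
    $pubdate = $pubdate ?? time();
    $replacements = [
        '%F' => $file_name,
        '%T' => $title,
        '%Y' => gmdate('Y', $pubdate), // four digit year
        '%m' => gmdate('m', $pubdate), // two digit month
        '%d' => gmdate('d', $pubdate), // two digit day
    ];
    // strtr() with an array never rescans substituted text, so a
    // title that itself contains "%F" is left untouched.
    return strtr($file_pattern, $replacements);
}
```

For example, `expandFilePattern('ep1.mp3', '%Y-%m-%d-%T-%F', 'Intro', 0)` returns `1970-01-01-Intro-ep1.mp3`.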

makeFolder()

Makes a directory in a way compatible with yioop's error handling.

private makeFolder(string $folder) : bool
Parameters
$folder : string

name of directory/folder to create.

Return values
bool

whether directory was created


        
