Yioop_V9.5_Source_Code

PodcastDownloadJob extends MediaJob
in package Application

A media job to periodically download Podcasts and store them as resources of a Wiki Page

ITEM_EXPIRES_TIME = \seekquarry\yioop\configs\ONE_WEEK: how long in seconds before a podcast item expires
MAX_PODCASTS_ONE_GO = 100: Maximum number of feeds to download in one try
$controller : object: If MediaJob was instantiated in the web app, the controller that instantiated it
$db : object: Datasource object used to run db queries related to feed items (for storing and updating them)
$group_model : object: Instance of group model used to get a list of podcasts
$media_updater : object: If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater
$name_server_does_client_tasks : bool: Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks
$name_server_does_client_tasks_only : bool: Whether this MediaJob performs name server only tasks
$tasks : array<string|int, mixed>: The tasks most recently received from the name server for this MediaJob
$update_time : int: Time in the current epoch when feeds were last updated
__construct() : mixed: Instantiates the MediaJob with a reference to the object that instantiated it
checkPrerequisites() : bool: Only update if it's been more than an hour since the last update
doTasks() : mixed: For each podcast source, downloads the podcast web file, checks which podcast items are not in the database, and adds them.
downloadPodcastItem() : string: Helper method for downloadPodcastItemIfNew(), called when it is known that a podcast item should be downloaded. It downloads a podcast item.
downloadPodcastItemIfNew() : bool: Given a podcast item from a podcast feed page, determines whether it has already been downloaded and, if not, whether it is recent enough to download. If it is recent enough, it scrapes the file to download, downloads any intermediate files needed to find that file, and finally downloads the podcast item. If the podcast item is built out of multiple videos, it concatenates them into a single video. It then moves the podcast item to the appropriate wiki folder.
execNameServer() : array<string|int, mixed>: Executes a method on the name server's JobController.
finishTasks() : mixed: This method is called on the name server to finish processing any data returned by MediaUpdater clients.
getCurrentMachine() : string: Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request
getJobName() : string: Gets the class name (less the namespace and the word Job) of the current MediaJob
getLinkFromQueryPage() : string: Used to extract a URL from a page, given either as a string or in DOM form, and to canonicalize it based on a starting url.
getTasks() : array<string|int, mixed>: Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.
init() : mixed: Initializes the last update time to far in the past, so feeds will get updated immediately. Sets up the connection to the DB used to store feed items, and makes it so the same media job runs both on the name server and on client Media Updaters.
nondistributedTasks() : mixed: Gets the media sources from the local database and uses them to run the same task as in the distributed setting
parsePodcastAuxInfo() : mixed: Used to fill in details for an associative array containing the details of a wiki feed or scrape podcast which should be examined to see if new items should be downloaded to wiki pages. As part of processing, expired feed items for the given wiki might be deleted.
prepareTasks() : mixed: This method is called on the name server to prepare data for any MediaUpdater clients.
processFeedPodcast() : mixed: Processes the page contents of one podcast feed. Determines which podcast files on that page are fresh and, if a podcast is fresh, downloads it and moves it to the appropriate wiki folder.
processHtmlPodcast() : array<string|int, mixed>: Used to download the media item associated with an HTML scrape podcast
putTasks() : array<string|int, mixed>: After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The JobController of the name server's web app then calls this method to receive the data on the name server
run() : mixed: Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.
updatePodcastsOneGo() : mixed: For each of a supplied list of podcast associative arrays, downloads the non-expired media for that podcast to the wiki folder specified.
getPage() : string: Downloads the internet page with the given url.
makeFileNamePattern() : string: Used to construct a filename for a downloaded podcast item suitable to be used when stored in a wiki page's resource folder
makeFolder() : bool: Makes a directory in a way compatible with yioop's error handling.

ITEM_EXPIRES_TIME

how long in seconds before a podcast item expires


    public
        mixed
    ITEM_EXPIRES_TIME
    = \seekquarry\yioop\configs\ONE_WEEK

MAX_PODCASTS_ONE_GO

Maximum number of feeds to download in one try


    public
        mixed
    MAX_PODCASTS_ONE_GO
    = 100
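
As a rough illustration of how a per-try cap like this could be applied, the sketch below takes one batch of feeds from a longer list. The global constant, the helper name `takePodcastBatch`, and the `$offset` parameter are all hypothetical stand-ins introduced for the example; how PodcastDownloadJob actually selects its batch is not shown in this documentation.

```php
<?php
// Hypothetical stand-in for PodcastDownloadJob::MAX_PODCASTS_ONE_GO.
const MAX_PODCASTS_ONE_GO = 100;

/**
 * Returns at most MAX_PODCASTS_ONE_GO feeds, starting at $offset.
 * Illustrative helper only; not part of Yioop.
 */
function takePodcastBatch(array $podcasts, int $offset = 0): array
{
    // array_slice() stops at the end of the list, so a short final
    // batch is returned as-is.
    return array_slice($podcasts, $offset, MAX_PODCASTS_ONE_GO);
}
```

With 250 feeds, the first call would yield 100 entries and a call with an offset of 200 would yield the remaining 50.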

$controller

If MediaJob was instantiated in the web app, the controller that instantiated it


    public
        object
    $controller

$db

Datasource object used to run db queries related to feed items (for storing and updating them)


    public
        object
    $db

$group_model

Instance of group model used to get a list of podcasts


    public
        object
    $group_model

$media_updater

If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater


    public
        object
    $media_updater

$name_server_does_client_tasks

Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks


    public
        bool
    $name_server_does_client_tasks

$name_server_does_client_tasks_only

Whether this MediaJob performs name server only tasks


    public
        bool
    $name_server_does_client_tasks_only

$tasks

The tasks most recently received from the name server for this MediaJob


    public
        array<string|int, mixed>
    $tasks

$update_time

Time in the current epoch when feeds were last updated


    public
        int
    $update_time

__construct()

Instantiates the MediaJob with a reference to the object that instantiated it


    public
                    __construct([object $media_updater = null ][, object $controller = null ]) : mixed

Parameters

$media_updater : object = null: a reference to the media updater that instantiated this object (if being run in MediaUpdater)
$controller : object = null: a reference to the controller that instantiated this object (if being run in the web app)

Return values

mixed —

checkPrerequisites()

Only update if it's been more than an hour since the last update


    public
                    checkPrerequisites() : bool

Return values

bool —

whether it has been more than an hour since the last update
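
The check described here amounts to a simple elapsed-time comparison. The following is a minimal sketch, assuming the last update is stored as a Unix timestamp as `$update_time` is; the function name `hourHasPassed`, the local `ONE_HOUR` constant, and the injectable `$now` parameter are hypothetical and exist only to make the example testable.

```php
<?php
// Seconds in an hour; a local, hypothetical constant for this sketch.
const ONE_HOUR = 3600;

/**
 * Reports whether more than an hour separates $update_time from $now.
 * $now defaults to the current time when omitted.
 */
function hourHasPassed(int $update_time, ?int $now = null): bool
{
    $now = $now ?? time();
    return ($now - $update_time) > ONE_HOUR;
}
```

An update time of 0 (far in the past, as init() is described as setting) makes the check succeed on the first call.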

doTasks()

For each podcast source, downloads the podcast web file, checks which podcast items are not in the database, and adds them.


    public
                    doTasks(array<string|int, mixed> $tasks) : mixed

Parameters

$tasks : array<string|int, mixed>: array of feed info (url to download, paths to extract etc)

Return values

mixed —

the result of carrying out that processing

downloadPodcastItem()

Helper method for downloadPodcastItemIfNew(), called when it is known that a podcast item should be downloaded. It downloads a podcast item.


    public
                    downloadPodcastItem(string $url[, string $type = "mp4" ][, array<string|int, mixed> $audiolist_urls = [] ]) : string

If the podcast item is an intermediate file pointing to several items to download, such as videos, it downloads these and concatenates them to make a single video.

Parameters

$url : string: of podcast item to download
$type : string = "mp4": file type of podcast item
$audiolist_urls : array<string|int, mixed> = []: an array of audio urls to download if this has already been obtained

Return values

string —

containing the podcast item if successful, or false otherwise

downloadPodcastItemIfNew()

Given a podcast item from a podcast feed page, determines whether it has already been downloaded and, if not, whether it is recent enough to download. If it is recent enough, it scrapes the file to download, downloads any intermediate files needed to find that file, and finally downloads the podcast item. If the podcast item is built out of multiple videos, it concatenates them into a single video. It then moves the podcast item to the appropriate wiki folder.


    public
                    downloadPodcastItemIfNew(array<string|int, mixed> $item, array<string|int, mixed> &$podcast, int $age) : bool

Parameters

$item : array<string|int, mixed>: an associative array about one item on a podcast feed page
$podcast : array<string|int, mixed>: a reference to an associative array of the podcast feed the item is from. This is used for the language etc. of the item and is also used to store updates to what podcasts have already been downloaded
$age : int: how many seconds ago is still considered a recent enough podcast to process

Return values

bool —

whether downloaded or not.
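
The first decision described above, whether an item is both new and recent enough, can be sketched as follows. This is only an illustration: the `guid` and `pubdate` keys, the flat `$previously_downloaded` list, and the function name are assumptions made for the example, since the actual layout of `$item` and `$podcast` is not given here.

```php
<?php
/**
 * Decides whether a feed item should be downloaded.
 *
 * @param array $item hypothetical shape: ['guid' => string, 'pubdate' => int]
 * @param array $previously_downloaded hypothetical list of guids already stored
 * @param int $age how many seconds ago still counts as recent enough
 * @param int $now current Unix timestamp
 */
function isNewAndRecent(array $item, array $previously_downloaded,
    int $age, int $now): bool
{
    // Skip anything that has been downloaded before.
    if (in_array($item['guid'], $previously_downloaded, true)) {
        return false;
    }
    // Otherwise accept it only if it was published within the last $age seconds.
    return ($now - $item['pubdate']) <= $age;
}
```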

execNameServer()

Executes a method on the name server's JobController.


    public
            static        execNameServer(string $command[, string $args = null ]) : array<string|int, mixed>

It will typically execute either getTask or putTask for a specific MediaJob, or getUpdateProperties to find out how the current MediaUpdater should be configured.

Parameters

$command : string: the method to invoke on the name server
$args : string = null: additional arguments to be passed to the name server

Return values

array<string|int, mixed> —

data returned by the name server.

finishTasks()

This method is called on the name server to finish processing any data returned by MediaUpdater clients.


    public
                    finishTasks() : mixed

Return values

mixed —

getCurrentMachine()

Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request


    public
            static        getCurrentMachine() : string

Return values

string —

hash of current machine url

getJobName()

Gets the class name (less the namespace and the word Job) of the current MediaJob


    public
            static        getJobName() : string

Return values

string —

name of the current job
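
For illustration, deriving such a name from a fully qualified class name might look like the sketch below. The helper `jobNameFromClass` is hypothetical, and it assumes that "Job" appears as a suffix of the class name; the real method may strip it differently.

```php
<?php
/**
 * Strips the namespace and a trailing "Job" from a class name.
 * Illustrative helper; not Yioop's implementation.
 */
function jobNameFromClass(string $class_name): string
{
    // Keep only the part after the last namespace separator, if any.
    $pos = strrpos($class_name, '\\');
    $short = ($pos === false) ? $class_name : substr($class_name, $pos + 1);
    // Drop the word "Job" when it ends the class name.
    return preg_replace('/Job$/', '', $short);
}
```

Under these assumptions a class named PodcastDownloadJob, in any namespace, would give the job name PodcastDownload.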

getLinkFromQueryPage()

Used to extract a URL from a page, given either as a string or in DOM form, and to canonicalize it based on a starting url.


    public
                    getLinkFromQueryPage(string $xpath, string $page, string $dom, string $source_url) : string

Parameters

$xpath : string: either an xpath to look into a dom object or a regex to search a page as a string
$page : string: source page to search in as a string
$dom : string: source page as a dom object
$source_url : string: url to use to canonicalize an incomplete url if the extraction only produces part of a url

Return values

string —

desired url link
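
The canonicalization step mentioned for $source_url can be pictured with a simplified sketch. The helper `canonicalizeLink` below is hypothetical and covers only three cases (already absolute, root-relative, and relative to the source page's directory); ports, query strings, protocol-relative links, and "../" segments are deliberately left out, and Yioop's own url handling may differ.

```php
<?php
/**
 * Resolves a possibly partial link against the url of the page it came from.
 * Simplified, hypothetical helper; not Yioop's implementation.
 */
function canonicalizeLink(string $link, string $source_url): string
{
    // Absolute http(s) urls are returned unchanged.
    if (preg_match('#^https?://#i', $link) === 1) {
        return $link;
    }
    $parts = parse_url($source_url);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    // Root-relative link: keep only the scheme and host of the source.
    if (substr($link, 0, 1) === '/') {
        return $origin . $link;
    }
    // Otherwise resolve against the directory of the source page.
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $link;
}
```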

getTasks()

Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.


    public
                    getTasks(int $machine_id[, array<string|int, mixed> $data = null ]) : array<string|int, mixed>

Parameters

$machine_id : int: id of client requesting data
$data : array<string|int, mixed> = null: any additional info about data being requested

Return values

array<string|int, mixed> —

work for the client to process

init()

Initializes the last update time to far in the past, so feeds will get updated immediately. Sets up the connection to the DB used to store feed items, and makes it so the same media job runs both on the name server and on client Media Updaters.


    public
                    init() : mixed

Return values

mixed —

nondistributedTasks()

Gets the media sources from the local database and uses them to run the same task as in the distributed setting


    public
                    nondistributedTasks() : mixed

Return values

mixed —

parsePodcastAuxInfo()

Used to fill in details for an associative array containing the details of a wiki feed or scrape podcast which should be examined to see if new items should be downloaded to wiki pages. As part of processing, expired feed items for the given wiki might be deleted.


    public
                    parsePodcastAuxInfo(array<string|int, mixed> &$podcast[, bool $test_mode = false ]) : mixed

Parameters

$podcast : array<string|int, mixed>: after running, contains an associative array of details about a particular podcast. The input podcast is assumed to have at least the NAME, WIKI_PAGE, AUX_PATH, and CATEGORY fields filled in, the last of these holding the time in seconds until an item expires. If successful, the MAX_AGE (which is essentially the value of the CATEGORY field), WIKI_FILE_PATTERN, WIKI_PAGE_FOLDERS, and PREVIOUSLY_DOWNLOADED fields will be filled in.
$test_mode : bool = false: if true then does not cull expired feed items from disk, but will return previously downloaded as if it had.

Return values

mixed —
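
The interaction between expiry and $test_mode can be sketched as below. This is an illustration under stated assumptions: the map from file path to download timestamp, the function name `cullExpiredItems`, and the explicit `$now` argument are hypothetical, as the real structure of PREVIOUSLY_DOWNLOADED is not documented here.

```php
<?php
/**
 * Returns the items that are still within $max_age seconds of $now.
 * Expired files are deleted from disk unless $test_mode is true, in
 * which case the return value is the same but nothing is removed.
 *
 * @param array $downloaded hypothetical map of file path => download timestamp
 */
function cullExpiredItems(array $downloaded, int $max_age, int $now,
    bool $test_mode = false): array
{
    $kept = [];
    foreach ($downloaded as $path => $timestamp) {
        if ($now - $timestamp > $max_age) {
            // Expired: remove from disk only outside of test mode.
            if (!$test_mode && is_file($path)) {
                unlink($path);
            }
            continue;
        }
        $kept[$path] = $timestamp;
    }
    return $kept;
}
```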

prepareTasks()

This method is called on the name server to prepare data for any MediaUpdater clients.


    public
                    prepareTasks() : mixed

Return values

mixed —

processFeedPodcast()

Processes the page contents of one podcast feed. Determines which podcast files on that page are fresh and, if a podcast is fresh, downloads it and moves it to the appropriate wiki folder.


    public
                    processFeedPodcast(array<string|int, mixed> &$podcast, int $age[, bool $test_mode = false ]) : mixed

Parameters

$podcast : array<string|int, mixed>: associative array containing page data for a podcast feed page (not the video or audio files of a particular podcast on that page) together with rules for how to process it
$age : int: how many seconds ago is still considered a recent enough podcast to process
$test_mode : bool = false: if true, then rather than updating items in the wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast

Return values

mixed —

either true or, if $test_mode is true, a string with the results of the operations involved in downloading the podcasts

processHtmlPodcast()

Used to download the media item associated with an HTML scrape podcast


    public
                    processHtmlPodcast(array<string|int, mixed> &$podcast, int $age[, bool $test_mode = false ]) : array<string|int, mixed>

Parameters

$podcast : array<string|int, mixed>: associative array containing info about the location, how to handle, and where to download the podcast
$age : int: max age of the media item to be considered for download
$test_mode : bool = false: if true, then rather than updating items in the wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast

Return values

array<string|int, mixed> —

[whether item downloaded, test_mode_info_string if applicable or "" otherwise]

putTasks()

After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The JobController of the name server's web app then calls this method to receive the data on the name server


    public
                    putTasks(int $machine_id, mixed $data) : array<string|int, mixed>

Parameters

$machine_id : int: id of client that is sending data to name server
$data : mixed: results of computation done by client

Return values

array<string|int, mixed> —

any response information to send back to the client

run()

Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.


    public
                    run() : mixed

Return values

mixed —
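
The arrangement described for run(), a fixed driver method that calls overridable callbacks, is the template method pattern. The sketch below shows that pattern in isolation. The classes `SketchJob` and `OnceOnlySketchJob`, the `$log` property, and the order and gating of the calls are all assumptions made for the example; the real MediaJob::run() may sequence its callbacks differently and involves more of them.

```php
<?php
/**
 * Hypothetical base class: run() is fixed, the callbacks are overridden.
 */
abstract class SketchJob
{
    /** Records what happened, purely so the example can be inspected. */
    public array $log = [];

    /** The driver; subclasses are not expected to override it. */
    public function run(): void
    {
        if (!$this->checkPrerequisites()) {
            $this->log[] = 'skipped';
            return;
        }
        $this->log[] = 'ran';
        $this->nondistributedTasks();
    }

    abstract public function checkPrerequisites(): bool;
    abstract public function nondistributedTasks(): void;
}

/** Hypothetical job that only has work to do the first time it runs. */
final class OnceOnlySketchJob extends SketchJob
{
    private bool $done = false;

    public function checkPrerequisites(): bool
    {
        return !$this->done;
    }

    public function nondistributedTasks(): void
    {
        $this->done = true;
        $this->log[] = 'updated';
    }
}
```

Calling run() twice on a OnceOnlySketchJob performs the work once and then skips, without the subclass ever touching run() itself.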

updatePodcastsOneGo()

For each of a supplied list of podcast associative arrays, downloads the non-expired media for that podcast to the wiki folder specified.


    public
                    updatePodcastsOneGo(mixed $podcasts[, int $age = CONE_WEEK ][, bool $test_mode = false ]) : mixed

Parameters

$podcasts : mixed
$age : int = CONE_WEEK: oldest age items to consider for download
$test_mode : bool = false: if true, then rather than updating items in the wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast

Return values

mixed —

either true or, if $test_mode is true, a string with the results of the operations involved in downloading the podcasts

getPage()

Downloads the internet page with the given url.


    private
                    getPage($url) : string

Parameters

$url : the url to download

Return values

string —

contents of downloaded page

makeFileNamePattern()

Used to construct a filename for a downloaded podcast item suitable to be used when stored in a wiki page's resource folder


    private
                    makeFileNamePattern(string $file_name, string $file_pattern[, string $title = "" ][, int $pubdate = null ]) : string

Parameters

$file_name : string: name of file
$file_pattern : string: string which can contain %F for the previous filename, %T for the title, and date commands of the form %date_command (for example, %Y for year, %m for month, %d for day). These will be substituted with their values when writing out the wiki name for the downloaded podcast item.
$title : string = "": a title string for wiki item
$pubdate : int = null: when the wiki item was published as a Unix timestamp. The value of this is used when computing values for the $file_pattern

Return values

string —

output filename for wiki item
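
To make the placeholder substitution concrete, here is a minimal sketch that supports only %F, %T, %Y, %m, and %d. The function name `fillFilePattern`, the use of UTC via gmdate(), and the fallback to the current time when $pubdate is null are assumptions for the example; the real method may support more date commands and may sanitize the result.

```php
<?php
/**
 * Expands a small, illustrative subset of the placeholders in $file_pattern.
 * Hypothetical helper; not Yioop's implementation.
 */
function fillFilePattern(string $file_name, string $file_pattern,
    string $title = "", ?int $pubdate = null): string
{
    // Assumption: fall back to the current time when no date is supplied.
    $pubdate = $pubdate ?? time();
    $replacements = [
        '%F' => $file_name,
        '%T' => $title,
        '%Y' => gmdate('Y', $pubdate),
        '%m' => gmdate('m', $pubdate),
        '%d' => gmdate('d', $pubdate),
    ];
    // strtr() replaces each placeholder once and never rescans its own
    // output, so a title containing "%Y" is not expanded a second time.
    return strtr($file_pattern, $replacements);
}
```

Using strtr() with an array rather than chained str_replace() calls avoids accidentally expanding placeholders that appear inside the title or file name.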

makeFolder()

Makes a directory in a way compatible with yioop's error handling.


    private
                    makeFolder(string $folder) : bool

Parameters

$folder : string: name of directory/folder to create.

Return values

bool —

whether directory was created
