PodcastDownloadJob
extends MediaJob
A media job that periodically downloads podcasts and stores them as resources of a wiki page
Table of Contents
- ITEM_EXPIRES_TIME = \seekquarry\yioop\configs\ONE_WEEK
- how long in seconds before a podcast item expires
- MAX_PODCASTS_ONE_GO = 100
- Maximum number of feeds to download in one try
- $controller : object
- If MediaJob was instantiated in the web app, the controller that instantiated it
- $db : object
- Datasource object used to run DB queries related to feed items (for storing and updating them)
- $group_model : object
- Instance of group model used to get a list of podcasts
- $media_updater : object
- If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater
- $name_server_does_client_tasks : bool
- Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks
- $name_server_does_client_tasks_only : bool
- Whether this MediaJob performs name server only tasks
- $tasks : array<string|int, mixed>
- The tasks for this MediaJob most recently received from the name server
- $update_time : int
- Unix timestamp of when feeds were last updated
- __construct() : mixed
- Instantiates the MediaJob with a reference to the object that instantiated it
- checkPrerequisites() : bool
- Only updates if it has been more than an hour since the last update
- doTasks() : mixed
- For each podcast source, downloads the podcast web file, checks which podcast items are not in the database, and adds them.
- downloadPodcastItem() : string
- Helper method for downloadPodcastItemIfNew, called when it is known that a podcast item should be downloaded. It downloads the podcast item.
- downloadPodcastItemIfNew() : bool
- Given a podcast item from a podcast feed page, determines whether it has already been downloaded and, if not, whether it is recent enough to download. If it is recent enough, it scrapes the file to download, downloads any intermediate files needed to find that file, and finally downloads the podcast item itself. If the podcast item is built out of multiple videos, it concatenates them into a single video. It then moves the podcast item to the appropriate wiki folder.
- execNameServer() : array<string|int, mixed>
- Executes a method on the name server's JobController.
- finishTasks() : mixed
- This method is called on the name server to finish processing any data returned by MediaUpdater clients.
- getCurrentMachine() : string
- Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request
- getJobName() : string
- Gets the class name (without the namespace and the word Job) of the current MediaJob
- getLinkFromQueryPage() : string
- Used to extract a URL from a page, given either as a string or in DOM form, and to canonicalize it based on a starting URL.
- getTasks() : array<string|int, mixed>
- Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.
- init() : mixed
- Initializes the last update time to far in the past so that feeds will be updated immediately. Sets up a connection to the DB used to store feed items, and makes the same media job run on both the name server and client MediaUpdaters
- nondistributedTasks() : mixed
- Gets the media sources from the local database and uses them to run the same task as in the distributed setting
- parsePodcastAuxInfo() : mixed
- Used to fill in the details of an associative array describing a wiki feed or scrape podcast that should be examined to see whether new items should be downloaded to wiki pages. As part of this processing, expired feed items for the given wiki page may be deleted.
- prepareTasks() : mixed
- This method is called on the name server to prepare data for any MediaUpdater clients.
- processFeedPodcast() : mixed
- Processes the page contents of one podcast feed. Determines which podcast files on that page are fresh and, if a podcast is fresh, downloads it and moves it to the appropriate wiki folder.
- processHtmlPodcast() : array<string|int, mixed>
- Used to download the media item associated with an HTML scrape podcast
- putTasks() : array<string|int, mixed>
- After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The name server web app's JobController then calls this method to receive the data on the name server
- run() : mixed
- Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.
- updatePodcastsOneGo() : mixed
- For each of a supplied list of podcast associative arrays, downloads the non-expired media for that podcast to the wiki folder specified.
- getPage() : string
- Downloads the internet page with the given URL.
- makeFileNamePattern() : string
- Used to construct a filename for a downloaded podcast item suitable to be used when stored in a wiki page's resource folder
- makeFolder() : bool
- Makes a directory in a way compatible with yioop's error handling.
Constants
ITEM_EXPIRES_TIME
how long in seconds before a podcast item expires
public
mixed
ITEM_EXPIRES_TIME
= \seekquarry\yioop\configs\ONE_WEEK
MAX_PODCASTS_ONE_GO
Maximum number of feeds to download in one try
public
mixed
MAX_PODCASTS_ONE_GO
= 100
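The two constants can be illustrated with a short Python sketch (Python is used only for illustration; this is not Yioop's PHP code). It assumes that ONE_WEEK equals 604800 seconds and that an item's age is measured from a stored Unix timestamp; the helpers is_expired and batches are hypothetical names.

```python
import time

# Assumption: Yioop's configs define ONE_WEEK as the number of seconds in a week.
ONE_WEEK = 7 * 24 * 60 * 60
ITEM_EXPIRES_TIME = ONE_WEEK
MAX_PODCASTS_ONE_GO = 100


def is_expired(stored_time, now=None, expires=ITEM_EXPIRES_TIME):
    """Return True if an item stored at stored_time (Unix seconds) has expired."""
    if now is None:
        now = time.time()
    return now - stored_time > expires


def batches(podcasts, size=MAX_PODCASTS_ONE_GO):
    """Split a list of podcasts into consecutive chunks of at most `size` items."""
    return [podcasts[i:i + size] for i in range(0, len(podcasts), size)]


if __name__ == "__main__":
    print([len(chunk) for chunk in batches(list(range(250)))])  # [100, 100, 50]
    print(is_expired(0, now=ONE_WEEK + 1))  # True
```

Batching keeps any single update pass bounded; whether the real job splits work exactly this way is not stated above.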
Properties
$controller
If MediaJob was instantiated in the web app, the controller that instantiated it
public
object
$controller
$db
Datasource object used to run DB queries related to feed items (for storing and updating them)
public
object
$db
$group_model
Instance of group model used to get a list of podcasts
public
object
$group_model
$media_updater
If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater
public
object
$media_updater
$name_server_does_client_tasks
Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks
public
bool
$name_server_does_client_tasks
$name_server_does_client_tasks_only
Whether this MediaJob performs name server only tasks
public
bool
$name_server_does_client_tasks_only
$tasks
The tasks for this MediaJob most recently received from the name server
public
array<string|int, mixed>
$tasks
$update_time
Unix timestamp of when feeds were last updated
public
int
$update_time
Methods
__construct()
Instantiates the MediaJob with a reference to the object that instantiated it
public
__construct([object $media_updater = null ][, object $controller = null ]) : mixed
Parameters
- $media_updater : object = null
  a reference to the media updater that instantiated this object (if being run in MediaUpdater)
- $controller : object = null
  a reference to the controller that instantiated this object (if being run in the web app)
Return values
mixed
checkPrerequisites()
Only updates if it has been more than an hour since the last update
public
checkPrerequisites() : bool
Return values
bool —whether it has been more than an hour since the last update
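The hourly gate can be sketched in Python as follows (illustration only, not Yioop's PHP). PodcastJobSketch is a hypothetical class; it mirrors init() by starting update_time at 0, and it records the new time whenever the check passes, which is an assumption, since where the real class updates $update_time is not stated here.

```python
import time

ONE_HOUR = 60 * 60  # seconds


class PodcastJobSketch:
    """Hypothetical stand-in for the update-time bookkeeping of the job."""

    def __init__(self):
        # Like init(): start far in the past so the first check succeeds.
        self.update_time = 0

    def check_prerequisites(self, now=None):
        """Return True, and record the time, if more than an hour has passed."""
        if now is None:
            now = int(time.time())
        if now - self.update_time > ONE_HOUR:
            self.update_time = now
            return True
        return False
```

Passing `now` explicitly keeps the check deterministic, which makes the timing rule easy to exercise in isolation.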
doTasks()
For each podcast source, downloads the podcast web file, checks which podcast items are not in the database, and adds them.
public
doTasks(array<string|int, mixed> $tasks) : mixed
Parameters
- $tasks : array<string|int, mixed>
  array of feed info (URL to download, paths to extract, etc.)
Return values
mixed —the result of carrying out that processing
downloadPodcastItem()
Helper method for downloadPodcastItemIfNew, called when it is known that a podcast item should be downloaded. It downloads the podcast item.
public
downloadPodcastItem(string $url[, string $type = "mp4" ][, array<string|int, mixed> $audiolist_urls = [] ]) : string
If the podcast item is an intermediate file pointing to several items to download, such as videos, it downloads these and concatenates them to make a single video.
Parameters
- $url : string
  URL of the podcast item to download
- $type : string = "mp4"
  file type of the podcast item
- $audiolist_urls : array<string|int, mixed> = []
  an array of audio URLs to download, if these have already been obtained
Return values
string —the downloaded podcast item if successful, or false otherwise
downloadPodcastItemIfNew()
Given a podcast item from a podcast feed page, determines whether it has already been downloaded and, if not, whether it is recent enough to download. If it is recent enough, it scrapes the file to download, downloads any intermediate files needed to find that file, and finally downloads the podcast item itself. If the podcast item is built out of multiple videos, it concatenates them into a single video. It then moves the podcast item to the appropriate wiki folder.
public
downloadPodcastItemIfNew(array<string|int, mixed> $item, array<string|int, mixed> &$podcast, int $age) : bool
Parameters
- $item : array<string|int, mixed>
  an associative array about one item on a podcast feed page
- $podcast : array<string|int, mixed>
  a reference to an associative array describing the podcast feed the item is from. It supplies the item's language and similar settings, and is also used to record which podcasts have already been downloaded
- $age : int
  how many seconds ago is still considered recent enough for a podcast to be processed
Return values
bool —whether downloaded or not.
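The decision flow described above (skip if already downloaded, skip if too old, otherwise download and record) can be sketched in Python. This is an illustration under stated assumptions, not Yioop's code: the item keys "guid", "pubdate" and "url" are hypothetical, PREVIOUSLY_DOWNLOADED is modeled as a set, and fetch is a hypothetical stand-in for downloadPodcastItem. Scraping, concatenation and moving files into the wiki folder are left out.

```python
import time


def download_if_new(item, podcast, age, fetch, now=None):
    """Sketch of the downloadPodcastItemIfNew decision flow.

    item:    dict with hypothetical keys "guid", "pubdate" (Unix seconds), "url"
    podcast: dict updated in place; PREVIOUSLY_DOWNLOADED modeled as a set of guids
    age:     how many seconds ago still counts as recent enough
    fetch:   hypothetical stand-in for downloadPodcastItem; returns data or None
    """
    if now is None:
        now = int(time.time())
    seen = podcast.setdefault("PREVIOUSLY_DOWNLOADED", set())
    if item["guid"] in seen:
        return False  # already downloaded on an earlier run
    if now - item["pubdate"] > age:
        return False  # too old to be worth downloading
    data = fetch(item["url"])
    if data is None:
        return False  # download failed
    # A real implementation would now move the data into the wiki page folder.
    seen.add(item["guid"])
    return True
```

Recording the item only after a successful fetch means a failed download is retried on the next run.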
execNameServer()
Executes a method on the name server's JobController.
public
static execNameServer(string $command[, string $args = null ]) : array<string|int, mixed>
It will typically execute either getTask or putTask for a specific MediaJob, or getUpdateProperties to find out how the current MediaUpdater should be configured.
Parameters
- $command : string
  the method to invoke on the name server
- $args : string = null
  additional arguments to be passed to the name server
Return values
array<string|int, mixed> —data returned by the name server.
finishTasks()
This method is called on the name server to finish processing any data returned by MediaUpdater clients.
public
finishTasks() : mixed
Return values
mixed
getCurrentMachine()
Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request
public
static getCurrentMachine() : string
Return values
string —hash of current machine url
getJobName()
Gets the class name (without the namespace and the word Job) of the current MediaJob
public
static getJobName() : string
Return values
string —name of the current job
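The name transformation can be sketched in Python (illustration only). It assumes that "the word Job" means a trailing "Job" suffix, and the namespace in the usage line, seekquarry\yioop\library\media_jobs, is an assumed example path.

```python
def get_job_name(class_name):
    """Drop the PHP namespace and a trailing "Job" from a class name."""
    short_name = class_name.rsplit("\\", 1)[-1]  # text after the last backslash
    if short_name.endswith("Job"):
        short_name = short_name[:-len("Job")]
    return short_name


if __name__ == "__main__":
    # Assumed namespace, for illustration only.
    print(get_job_name("seekquarry\\yioop\\library\\media_jobs\\PodcastDownloadJob"))
    # PodcastDownload
```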
getLinkFromQueryPage()
Used to extract a URL from a page, given either as a string or in DOM form, and to canonicalize it based on a starting URL.
public
getLinkFromQueryPage(string $xpath, string $page, string $dom, string $source_url) : string
Parameters
- $xpath : string
  either an XPath to look into a DOM object or a regex to search a page as a string
- $page : string
  source page to search in, as a string
- $dom : string
  source page as a DOM object
- $source_url : string
  URL used to canonicalize an incomplete URL if the extraction only produces part of a URL
Return values
string —desired url link
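The regex path of this method, followed by canonicalization against the source URL, can be sketched in Python with re and urllib.parse.urljoin (illustration only, not Yioop's code). The XPath-on-DOM branch is omitted. Using capture group 1 when the pattern has one, and returning "" when nothing matches, are assumptions of this sketch; get_link_from_page is a hypothetical name.

```python
import re
from urllib.parse import urljoin


def get_link_from_page(pattern, page, source_url):
    """Regex-only sketch: find a link in `page` and make it absolute."""
    match = re.search(pattern, page)
    if match is None:
        return ""
    # Use the first capture group if the pattern defines one, else the whole match.
    link = match.group(1) if match.re.groups else match.group(0)
    if not link:
        return ""
    # urljoin resolves relative links ("ep2.mp3", "/media/x.mp3") against source_url.
    return urljoin(source_url, link)
```

For example, matching `href="([^"]+)"` on a page at https://example.com/show/index.html turns `/media/ep1.mp3` into https://example.com/media/ep1.mp3.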
getTasks()
Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.
public
getTasks(int $machine_id[, array<string|int, mixed> $data = null ]) : array<string|int, mixed>
Parameters
- $machine_id : int
  id of client requesting data
- $data : array<string|int, mixed> = null
  any additional info about data being requested
Return values
array<string|int, mixed> —work for the client to process
init()
Initializes the last update time to far in the past so that feeds will be updated immediately. Sets up a connection to the DB used to store feed items, and makes the same media job run on both the name server and client MediaUpdaters
public
init() : mixed
Return values
mixed
nondistributedTasks()
Gets the media sources from the local database and uses them to run the same task as in the distributed setting
public
nondistributedTasks() : mixed
Return values
mixed
parsePodcastAuxInfo()
Used to fill in the details of an associative array describing a wiki feed or scrape podcast that should be examined to see whether new items should be downloaded to wiki pages. As part of this processing, expired feed items for the given wiki page may be deleted.
public
parsePodcastAuxInfo(array<string|int, mixed> &$podcast[, bool $test_mode = false ]) : mixed
Parameters
- $podcast : array<string|int, mixed>
  after running, will contain an associative array of details about a particular podcast. The input podcast is assumed to have at least the NAME, WIKI_PAGE, AUX_PATH, and CATEGORY fields filled in, the last holding the time in seconds until an item expires. If successful, the MAX_AGE (which is essentially the value of the CATEGORY field), WIKI_FILE_PATTERN, WIKI_PAGE_FOLDERS, and PREVIOUSLY_DOWNLOADED fields will be filled in.
- $test_mode : bool = false
  if true, expired feed items are not culled from disk, but the previously downloaded items are still returned as if they had been.
Return values
mixed
prepareTasks()
This method is called on the name server to prepare data for any MediaUpdater clients.
public
prepareTasks() : mixed
Return values
mixed
processFeedPodcast()
Processes the page contents of one podcast feed. Determines which podcast files on that page are fresh and, if a podcast is fresh, downloads it and moves it to the appropriate wiki folder.
public
processFeedPodcast(array<string|int, mixed> &$podcast, int $age[, bool $test_mode = false ]) : mixed
Parameters
- $podcast : array<string|int, mixed>
  associative array containing page data for a podcast feed page (not the video or audio files of a particular podcast on that page) together with rules for how to process it
- $age : int
  how many seconds ago is still considered recent enough for a podcast to be processed
- $test_mode : bool = false
  if true, then rather than updating items in the wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast
Return values
mixed —either true, or if $test_mode is true then the results as a string of the operations involved in downloading the podcasts
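The freshness test at the heart of feed processing can be sketched in Python for a plain RSS 2.0 feed (illustration only). This assumes items carry a pubDate and an enclosure element, which is standard RSS but does not model the per-podcast extraction rules the real method reads from $podcast; fresh_enclosures is a hypothetical name. Dates without a timezone offset are interpreted in local time by datetime.timestamp().

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def fresh_enclosures(feed_xml, age, now):
    """Return enclosure URLs of RSS items published at most `age` seconds before `now`."""
    root = ET.fromstring(feed_xml)
    urls = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        enclosure = item.find("enclosure")
        if pub_date is None or enclosure is None:
            continue  # nothing downloadable, or no way to judge freshness
        published = parsedate_to_datetime(pub_date).timestamp()
        if now - published <= age:
            urls.append(enclosure.get("url"))
    return urls
```

Each URL returned here would then go through the per-item download and wiki-folder steps described for downloadPodcastItemIfNew.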
processHtmlPodcast()
Used to download the media item associated with an HTML scrape podcast
public
processHtmlPodcast(array<string|int, mixed> &$podcast, int $age[, bool $test_mode = false ]) : array<string|int, mixed>
Parameters
- $podcast : array<string|int, mixed>
  associative array containing info about the location, how to handle, and where to download the podcast
- $age : int
  max age of the media item for it to be considered for download
- $test_mode : bool = false
  if true, then rather than updating items in the wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast
Return values
array<string|int, mixed> —[whether item downloaded, test_mode_info_string if applicable or "" otherwise]
putTasks()
After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The name server web app's JobController then calls this method to receive the data on the name server
public
putTasks(int $machine_id, mixed $data) : array<string|int, mixed>
Parameters
- $machine_id : int
  id of client that is sending data to name server
- $data : mixed
  results of computation done by client
Return values
array<string|int, mixed> —any response information to send back to the client
run()
Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.
public
run() : mixed
Return values
mixed
updatePodcastsOneGo()
For each of a supplied list of podcast associative arrays, downloads the non-expired media for that podcast to the wiki folder specified.
public
updatePodcastsOneGo(mixed $podcasts[, int $age = C\ONE_WEEK ][, bool $test_mode = false ]) : mixed
Parameters
- $podcasts : mixed
  list of podcast associative arrays to update
- $age : int = C\ONE_WEEK
  maximum age (in seconds) of items to consider for download
- $test_mode : bool = false
  if true, then rather than updating items in the wiki, returns a string summarizing the results of the downloads that would occur as part of updating the podcast
Return values
mixed —either true, or if $test_mode is true then the results as a string of the operations involved in downloading the podcasts
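The "true, or a summary string in test mode" contract can be sketched in Python (illustration only). Here process is a hypothetical callable standing in for processFeedPodcast or processHtmlPodcast, simplified to return a description string in test mode; how the real method chooses between the two, and how it formats the summary, are not shown above.

```python
def update_podcasts_one_go(podcasts, age, process, test_mode=False):
    """Process each podcast; return True, or a summary string in test mode.

    process: hypothetical stand-in, called as process(podcast, age, test_mode);
    assumed to return a description string when test_mode is true.
    """
    summaries = []
    for podcast in podcasts:
        result = process(podcast, age, test_mode)
        if test_mode:
            summaries.append(result)
    return "\n".join(summaries) if test_mode else True
```

Collecting strings instead of touching the wiki lets an administrator preview what an update would do.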
getPage()
Downloads the internet page with the given URL.
private
getPage($url) : string
Parameters
- $url
  URL of the page to download
Return values
string —contents of downloaded page
makeFileNamePattern()
Used to construct a filename for a downloaded podcast item suitable to be used when stored in a wiki page's resource folder
private
makeFileNamePattern(string $file_name, string $file_pattern[, string $title = "" ][, int $pubdate = null ]) : string
Parameters
- $file_name : string
  name of file
- $file_pattern : string
  string which can contain %F for the previous filename, %T for the title, and date format codes (%date_command), for example %Y for year, %m for month, %d for day, etc. These are substituted with their values when writing out the wiki name for the downloaded podcast item.
- $title : string = ""
  a title string for the wiki item
- $pubdate : int = null
  when the wiki item was published, as a Unix timestamp. This value is used when computing values for the $file_pattern
Return values
string —output filename for wiki item
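A Python sketch of the pattern expansion (illustration only, not Yioop's PHP). It assumes the date codes follow C strftime conventions, that dates are rendered in UTC, and that a missing pubdate means the current time. %F and %T are replaced in a single regex pass before strftime runs, because strftime itself treats %F and %T as date codes; any "%" inside the inserted file name or title is doubled so strftime copies it literally.

```python
import re
import time


def make_file_name_pattern(file_name, file_pattern, title="", pubdate=None):
    """Expand %F (previous file name), %T (title) and strftime date codes."""
    if pubdate is None:
        pubdate = int(time.time())  # assumption: default to "now"

    def substitute(match):
        token = match.group(0)
        if token == "%F":
            value = file_name
        elif token == "%T":
            value = title
        else:
            return token  # keep "%%" so strftime turns it into a literal "%"
        # Double any "%" in inserted text so strftime copies it verbatim.
        return value.replace("%", "%%")

    expanded = re.sub(r"%%|%[FT]", substitute, file_pattern)
    return time.strftime(expanded, time.gmtime(pubdate))


if __name__ == "__main__":
    print(make_file_name_pattern("ep.mp3", "%Y-%m-%d %T", "My Show", 1704067200))
    # 2024-01-01 My Show
```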
makeFolder()
Makes a directory in a way compatible with yioop's error handling.
private
makeFolder(string $folder) : bool
Parameters
- $folder : string
  name of directory/folder to create.
Return values
bool —whether directory was created
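In Python terms, making directory creation "compatible with error handling" can be sketched as catching the error and reporting a boolean instead (illustration only). Creating a single level and returning False when the folder already exists are assumptions of this sketch; the real method may behave differently, for example by setting permissions.

```python
import os


def make_folder(folder):
    """Create a single directory; return False instead of raising on failure."""
    try:
        os.mkdir(folder)
    except OSError:
        return False  # e.g. it already exists or the parent is missing
    return True
```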