VideoProcessor
extends PageProcessor
Base abstract class common to all processors used to create crawl summary information from videos
Table of Contents
- MIN_ANIMATE_LENGTH = 60
- Minimum movie duration (in seconds) before an animated thumbnail is made
- NUM_ANIMATED_THUMBS = 10
- Number of images to use for an animated thumbnail
- $image_types : array<string|int, mixed>
- Array of file types which should be considered images.
- $indexed_file_types : array<string|int, mixed>
- Array of file extensions which can be handled by the search engine, other extensions will be ignored.
- $max_description_len : int
- Max number of chars to extract for description from a page to index.
- $max_links_to_extract : int
- Maximum number of urls to extract from a single document
- $mime_processor : array<string|int, mixed>
- Associative array of mime_type => (page processor name that can process that type). Sub-classes add to this array with the types they handle.
- $plugin_instances : array<string|int, mixed>
- indexing_plugins which might be used with the current processor
- $summarizer : object
- Stores the summarizer object used by this instance of page processor to be used in generating a summary
- $summarizer_option : string
- Stores the name of the summarizer used for crawling.
- $text_data : bool
- Whether the current processor is for text data (e.g., text, html, xml) or for some other format (e.g., gif, png)
- __construct() : mixed
- Sets up any indexing plugins associated with this page processor
- createThumbs() : mixed
- Creates a thumbnail file in a thumb folder from an .mp4 file; also creates an animated gif file.
- getDuration() : float
- Returns how long a video is in seconds
- handle() : array<string|int, mixed>
- Method used to handle processing data for a web page. It makes a summary for the page (via the process() function which should be subclassed) as well as runs any plugins that are associated with the processors to create sub-documents
- initializeIndexedFileTypes() : mixed
- Gets processors for different file types. Constructing them populates the self::$indexed_file_types, self::$image_types, and self::$mime_processor arrays.
- process() : array<string|int, mixed>
- Extracts summary data from the image provided in $page together with the url in $url where it was downloaded from
- saveTempFile() : mixed
- Used to save a temporary file with the data downloaded for a url while carrying out image processing
Constants
MIN_ANIMATE_LENGTH
Minimum movie duration (in seconds) before an animated thumbnail is made
public
mixed
MIN_ANIMATE_LENGTH
= 60
NUM_ANIMATED_THUMBS
Number of images to use for an animated thumbnail
public
mixed
NUM_ANIMATED_THUMBS
= 10
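As a rough illustration of how these two constants might interact, the sketch below decides whether a video is long enough to animate and, if so, picks evenly spaced sample times. The helper `frameTimes()` and the even-spacing rule are assumptions made for illustration and are not taken from the VideoProcessor source; only the two constant values come from the documentation above.

```php
<?php
// Values copied from the constants documented above.
const MIN_ANIMATE_LENGTH = 60;
const NUM_ANIMATED_THUMBS = 10;

/**
 * Hypothetical helper (not part of VideoProcessor): returns the times, in
 * seconds, at which frames could be sampled for an animated thumbnail, or an
 * empty array if the video is shorter than $min_animate_length.
 */
function frameTimes(float $duration, int $num_frames = NUM_ANIMATED_THUMBS,
    float $min_animate_length = MIN_ANIMATE_LENGTH): array
{
    if ($duration < $min_animate_length || $num_frames < 1) {
        return [];
    }
    $step = $duration / $num_frames;
    $times = [];
    for ($i = 0; $i < $num_frames; $i++) {
        // Sample the midpoint of each of the $num_frames equal segments.
        $times[] = ($i + 0.5) * $step;
    }
    return $times;
}
```

Under these assumptions, a 30 second clip yields no animation frames, while a 100 second clip yields ten sample times spaced 10 seconds apart.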
Properties
$image_types
Array of file types which should be considered images.
public
static array<string|int, mixed>
$image_types
= []
Sub-classes add to this array with the types they handle
$indexed_file_types
Array of file extensions which can be handled by the search engine, other extensions will be ignored.
public
static array<string|int, mixed>
$indexed_file_types
= ["unknown"]
Sub-classes add to this array with the types they handle
$max_description_len
Max number of chars to extract for description from a page to index.
public
static int
$max_description_len
Only words in the description are indexed.
$max_links_to_extract
Maximum number of urls to extract from a single document
public
static int
$max_links_to_extract
$mime_processor
Associative array of mime_type => (page processor name that can process that type). Sub-classes add to this array with the types they handle.
public
static array<string|int, mixed>
$mime_processor
= []
$plugin_instances
indexing_plugins which might be used with the current processor
public
array<string|int, mixed>
$plugin_instances
$summarizer
Stores the summarizer object used by this instance of page processor to be used in generating a summary
public
object
$summarizer
$summarizer_option
Stores the name of the summarizer used for crawling.
public
string
$summarizer_option
Possible values are self::BASIC_SUMMARIZER, self::GRAPH_BASED_SUMMARIZER, self::CENTROID_SUMMARIZER and self::CENTROID_WEIGHTED_SUMMARIZER
$text_data
Whether the current processor is for text data (e.g., text, html, xml) or for some other format (e.g., gif, png)
public
bool
$text_data
Methods
__construct()
Sets up any indexing plugins associated with this page processor
public
__construct([array<string|int, mixed> $plugins = [] ][, int $max_description_len = null ][, mixed $max_links_to_extract = null ][, int $summarizer_option = self::BASIC_SUMMARIZER ]) : mixed
Parameters
- $plugins : array<string|int, mixed> = []
-
an array of indexing plugins which might do further processing on the data handled by this page processor
- $max_description_len : int = null
-
maximal length of a page summary
- $max_links_to_extract : mixed = null
- $summarizer_option : int = self::BASIC_SUMMARIZER
-
CRAWL_CONSTANT specifying what kind of summarizer to use: self::BASIC_SUMMARIZER, self::GRAPH_BASED_SUMMARIZER, self::CENTROID_SUMMARIZER, or self::CENTROID_WEIGHTED_SUMMARIZER
Return values
mixed
createThumbs()
Creates a thumbnail file in a thumb folder from an .mp4 file; also creates an animated gif file.
public
static createThumbs(string $folder, string $thumb_folder, string $file_name[, int $width = CTHUMB_DIM ][, int $height = -1 ][, int $num_frames = self::NUM_ANIMATED_THUMBS ][, mixed $min_animate_length = self::MIN_ANIMATE_LENGTH ]) : mixed
Parameters
- $folder : string
-
folder with the video in it
- $thumb_folder : string
-
folder in which to generate the thumbs
- $file_name : string
-
name of the video file in $folder
- $width : int = CTHUMB_DIM
-
width in pixels of thumb
- $height : int = -1
-
height in pixels of thumb
- $num_frames : int = self::NUM_ANIMATED_THUMBS
-
number of frames to put in animated gif
- $min_animate_length : mixed = self::MIN_ANIMATE_LENGTH
Return values
mixed
getDuration()
Returns how long a video is in seconds
public
static getDuration(mixed $video) : float
Parameters
- $video : mixed
Return values
float —length of video in seconds
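The documentation above does not say how the duration is measured, and the `$video` parameter is undocumented. One common approach is to read the `Duration: HH:MM:SS.ss` line that ffmpeg prints for an input file and convert it to seconds. The sketch below shows only that conversion; the function name `parseDurationSeconds()` and the `-1.0` "not found" convention are assumptions, not part of VideoProcessor.

```php
<?php
/**
 * Hypothetical helper (not part of VideoProcessor): extracts a duration in
 * seconds from text containing a line such as
 * "Duration: 00:01:23.45, start: 0.000000, bitrate: 1205 kb/s".
 * Returns -1.0 when no duration is found (an assumed convention).
 */
function parseDurationSeconds(string $text): float
{
    $pattern = '/Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)/';
    if (!preg_match($pattern, $text, $m)) {
        return -1.0;
    }
    // hours, minutes, and (possibly fractional) seconds
    return 3600 * (int)$m[1] + 60 * (int)$m[2] + (float)$m[3];
}
```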
handle()
Method used to handle processing data for a web page. It makes a summary for the page (via the process() function which should be subclassed) as well as runs any plugins that are associated with the processors to create sub-documents
public
handle(string $page, string $url) : array<string|int, mixed>
Parameters
- $page : string
-
string of a web document
- $url : string
-
location the document came from
Return values
array<string|int, mixed> —a summary (title, description, links, and content) of the information in $page, together with a subdocs array containing any subdocuments returned from a plugin. Subdocuments might be things like recipes that appeared in a page, tweets, etc.
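The flow described for handle() can be sketched roughly as follows: call process() to build the summary, then let each plugin contribute subdocuments. The classes below are stand-ins written for illustration; the array keys (`title`, `description`, `subdocs`) and the plugin method name `pageProcessing()` are assumptions rather than the library's confirmed interface.

```php
<?php
// Stand-in classes for illustration only; the real PageProcessor,
// VideoProcessor, and plugin interfaces are not reproduced here.
abstract class ProcessorSketch
{
    /** @var array hypothetical plugin objects with a pageProcessing() method */
    public $plugin_instances = [];

    public function __construct(array $plugins = [])
    {
        $this->plugin_instances = $plugins;
    }

    // Subclasses supply the actual summary extraction.
    abstract public function process(string $page, string $url): ?array;

    // Build the summary, then let each plugin contribute subdocuments.
    public function handle(string $page, string $url): ?array
    {
        $summary = $this->process($page, $url);
        if ($summary === null) {
            return null;
        }
        $summary['subdocs'] = [];
        foreach ($this->plugin_instances as $plugin) {
            $subdocs = $plugin->pageProcessing($page, $url);
            if (is_array($subdocs)) {
                $summary['subdocs'] = array_merge($summary['subdocs'], $subdocs);
            }
        }
        return $summary;
    }
}

class UrlOnlyVideoSketch extends ProcessorSketch
{
    public function process(string $page, string $url): ?array
    {
        // Minimal summary in which the description is just the url.
        return ['title' => basename($url), 'description' => $url];
    }
}
```

Keeping process() abstract in the stand-in mirrors the idea that the base class defers the format-specific work to subclasses, while handle() stays the same for every processor.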
initializeIndexedFileTypes()
Gets processors for different file types. Constructing them populates the self::$indexed_file_types, self::$image_types, and self::$mime_processor arrays.
public
static initializeIndexedFileTypes() : mixed
Return values
mixed
process()
Extracts summary data from the image provided in $page together with the url in $url where it was downloaded from
public
process(string $page, string $url) : array<string|int, mixed>
VideoProcessor class defers a proper implementation of this method to subclasses
Parameters
- $page : string
-
the image represented as a character string
- $url : string
-
the url where the image was downloaded from
Return values
array<string|int, mixed> —summary information including a thumbnail and a description (where the description is just the url)
saveTempFile()
Used to save a temporary file with the data downloaded for a url while carrying out image processing
public
saveTempFile(string $page, string $url, string $file_extension) : mixed
Parameters
- $page : string
-
contains data about an image that one needs to save
- $url : string
-
where $page data came from
- $file_extension : string
-
to be associated with the $page data
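The return value and file naming of saveTempFile() are not documented above. A minimal sketch of the general idea, assuming the data is written to the system temp directory under a name derived from a hash of `$url`, might look like this; `saveTempFileSketch()` is a hypothetical stand-in, not the real method.

```php
<?php
/**
 * Hypothetical stand-in (not the real saveTempFile()): writes $page to a
 * file in the system temp directory and returns its path, or false on
 * failure. The naming scheme is an assumption.
 */
function saveTempFileSketch(string $page, string $url, string $file_extension)
{
    // Hash the url so the file name is filesystem-safe and tied to its source.
    $name = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'video_' .
        md5($url) . '_' . getmypid() . '.' . ltrim($file_extension, '.');
    if (file_put_contents($name, $page) === false) {
        return false;
    }
    return $name;
}
```

A caller using this sketch would be responsible for deleting the file (for example with `unlink()`) once processing is finished.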