Yioop V9.5 Source Code Documentation

VideoProcessor extends PageProcessor
in package

Base abstract class common to all processors used to create crawl summary information from videos.

Tags
author

Chris Pollett

Table of Contents

MIN_ANIMATE_LENGTH  = 60
Minimum duration of a movie (in seconds) before an animated thumbnail is made
NUM_ANIMATED_THUMBS  = 10
Number of images to use for an animated thumbnail
$image_types  : array<string|int, mixed>
Array of file types which should be considered images.
$indexed_file_types  : array<string|int, mixed>
Array of file extensions which can be handled by the search engine; other extensions will be ignored.
$max_description_len  : int
Maximum number of characters to extract for the description of a page being indexed.
$max_links_to_extract  : int
Maximum number of URLs to extract from a single document.
$mime_processor  : array<string|int, mixed>
Associative array of mime_type => (name of the page processor that can process that type). Sub-classes add the types they handle to this array.
$plugin_instances  : array<string|int, mixed>
Indexing plugins which might be used with the current processor.
$summarizer  : object
Stores the summarizer object this page processor instance uses to generate a summary.
$summarizer_option  : string
Stores the name of the summarizer used for crawling.
$text_data  : bool
Whether the current processor is for text data (i.e., text, html, xml, etc.) or for some other format (gif, png, etc.).
__construct()  : mixed
Sets up any indexing plugins associated with this page processor.
createThumbs()  : mixed
Creates a thumbnail file in a thumb folder from an .mp4 file; also creates an animated GIF file.
getDuration()  : float
Returns how long a video is, in seconds.
handle()  : array<string|int, mixed>
Handles processing of data for a web page. It makes a summary of the page (via the process() method, which subclasses should override) and runs any plugins associated with the processor to create sub-documents.
initializeIndexedFileTypes()  : mixed
Gets processors for different file types. Constructing them populates the self::$indexed_file_types, self::$image_types, and self::$mime_processor arrays.
process()  : array<string|int, mixed>
Extracts summary data from the image provided in $page, together with the URL $url it was downloaded from.
saveTempFile()  : mixed
Saves a temporary file containing the data downloaded for a URL while carrying out image processing.

Constants

MIN_ANIMATE_LENGTH

Minimum duration of a movie (in seconds) before an animated thumbnail is made

public mixed MIN_ANIMATE_LENGTH = 60

NUM_ANIMATED_THUMBS

Number of images to use for an animated thumbnail

public mixed NUM_ANIMATED_THUMBS = 10
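As a rough illustration of how these two constants could interact, the sketch below picks evenly spaced timestamps for the frames of an animated thumbnail. The function animatedThumbTimes() and the even-spacing rule are assumptions made for illustration, not Yioop's actual logic; only the constant values come from this page.

```php
<?php
// Hypothetical helper, not part of Yioop: chooses the timestamps (in seconds)
// at which frames for an animated thumbnail could be grabbed.
const MIN_ANIMATE_LENGTH = 60;   // mirrors VideoProcessor::MIN_ANIMATE_LENGTH
const NUM_ANIMATED_THUMBS = 10;  // mirrors VideoProcessor::NUM_ANIMATED_THUMBS

function animatedThumbTimes(float $duration,
    int $num_frames = NUM_ANIMATED_THUMBS,
    int $min_animate_length = MIN_ANIMATE_LENGTH): array
{
    // Videos shorter than the minimum get no animated thumbnail.
    if ($duration < $min_animate_length || $num_frames < 1) {
        return [];
    }
    // Split the video into $num_frames + 1 equal gaps so neither the very
    // first nor the very last frame is used.
    $step = $duration / ($num_frames + 1);
    $times = [];
    for ($i = 1; $i <= $num_frames; $i++) {
        $times[] = round($i * $step, 2);
    }
    return $times;
}

$times = animatedThumbTimes(110.0);
echo implode(", ", $times), "\n";
```

With a 110-second video the gaps are 10 seconds wide, so the frames land at 10, 20, ..., 100 seconds; a 30-second video falls below MIN_ANIMATE_LENGTH and gets none.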

Properties

$image_types

Array of file types which should be considered images.

public static array<string|int, mixed> $image_types = []

Sub-classes add the types they handle to this array.

$indexed_file_types

Array of file extensions which can be handled by the search engine; other extensions will be ignored.

public static array<string|int, mixed> $indexed_file_types = ["unknown"]

Sub-classes add the types they handle to this array.

$max_description_len

Maximum number of characters to extract for the description of a page being indexed.

public static int $max_description_len

Only words in the description are indexed.
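A minimal sketch of capping a description at $max_description_len, assuming the cut should back up to the previous space so that only whole words remain to be indexed. truncateDescription() is a hypothetical helper; note that strlen() and substr() count bytes, so multibyte text would need the mb_* functions instead.

```php
<?php
// Hypothetical helper: trims a description to at most $max_len bytes,
// backing up to the last space so no word is cut in half.
function truncateDescription(string $text, int $max_len): string
{
    if (strlen($text) <= $max_len) {
        return $text;
    }
    $cut = substr($text, 0, $max_len);
    $space = strrpos($cut, " ");
    // If there is no space at all, fall back to a hard cut.
    return $space === false ? $cut : substr($cut, 0, $space);
}

echo truncateDescription("a short video about cats", 12), "\n"; // a short
```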

$max_links_to_extract

Maximum number of URLs to extract from a single document.

public static int $max_links_to_extract

$mime_processor

Associative array of mime_type => (name of the page processor that can process that type). Sub-classes add the types they handle to this array.

public static array<string|int, mixed> $mime_processor = []

$plugin_instances

Indexing plugins which might be used with the current processor.

public array<string|int, mixed> $plugin_instances

$summarizer

Stores the summarizer object this page processor instance uses to generate a summary.

public object $summarizer

$summarizer_option

Stores the name of the summarizer used for crawling.

public string $summarizer_option

Possible values are self::BASIC_SUMMARIZER, self::GRAPH_BASED_SUMMARIZER, self::CENTROID_SUMMARIZER, or self::CENTROID_WEIGHTED_SUMMARIZER.

$text_data

Whether the current processor is for text data (i.e., text, html, xml, etc.) or for some other format (gif, png, etc.).

public bool $text_data

Methods

__construct()

Sets up any indexing plugins associated with this page processor.

public __construct([array<string|int, mixed> $plugins = [] ][, int $max_description_len = null ][, mixed $max_links_to_extract = null ][, int $summarizer_option = self::BASIC_SUMMARIZER ]) : mixed
Parameters
$plugins : array<string|int, mixed> = []

an array of indexing plugins which might do further processing on the data handled by this page processor

$max_description_len : int = null

maximal length of a page summary

$max_links_to_extract : mixed = null

maximum number of URLs to extract from a single document

$summarizer_option : int = self::BASIC_SUMMARIZER

CRAWL_CONSTANT specifying what kind of summarizer to use: self::BASIC_SUMMARIZER, self::GRAPH_BASED_SUMMARIZER, self::CENTROID_SUMMARIZER, or self::CENTROID_WEIGHTED_SUMMARIZER

Return values
mixed

createThumbs()

Creates a thumbnail file in a thumb folder from an .mp4 file; also creates an animated GIF file.

public static createThumbs(string $folder, string $thumb_folder, string $file_name[, int $width = CTHUMB_DIM ][, int $height = -1 ][, int $num_frames = self::NUM_ANIMATED_THUMBS ][, mixed $min_animate_length = self::MIN_ANIMATE_LENGTH ]) : mixed
Parameters
$folder : string

folder with the video in it

$thumb_folder : string

folder in which to generate the thumbnail files

$file_name : string

name of the video file in $folder

$width : int = CTHUMB_DIM

width in pixels of the thumbnail

$height : int = -1

height in pixels of the thumbnail

$num_frames : int = self::NUM_ANIMATED_THUMBS

number of frames to put in the animated GIF

$min_animate_length : mixed = self::MIN_ANIMATE_LENGTH

int minimum duration of a movie (in seconds) for which to try to make an animated GIF

Return values
mixed
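The sketch below shows one plausible way to grab a single scaled frame, assuming ffmpeg is the tool used; this page does not say so. buildThumbCommand(), the .jpg output name, the one-second seek, and the width of 150 (a stand-in for CTHUMB_DIM, whose value is not given here) are all hypothetical. In ffmpeg's scale filter a dimension of -1 keeps the aspect ratio, which is one reading of the $height = -1 default.

```php
<?php
// Hypothetical sketch: builds (but does not run) an ffmpeg command that grabs
// one frame of $folder/$file_name and writes a scaled copy to $thumb_folder.
function buildThumbCommand(string $folder, string $thumb_folder,
    string $file_name, int $width = 150, int $height = -1,
    float $at_seconds = 1.0): string
{
    $input = rtrim($folder, "/") . "/" . $file_name;
    // Name the thumbnail after the video, e.g. clip.mp4 -> clip.jpg
    $output = rtrim($thumb_folder, "/") . "/" .
        pathinfo($file_name, PATHINFO_FILENAME) . ".jpg";
    // -ss seeks before decoding; -frames:v 1 keeps a single frame;
    // a scale dimension of -1 preserves the aspect ratio.
    return "ffmpeg -y -ss " . $at_seconds .
        " -i " . escapeshellarg($input) .
        " -frames:v 1 -vf scale=" . $width . ":" . $height .
        " " . escapeshellarg($output);
}

echo buildThumbCommand("/tmp/videos", "/tmp/videos/thumbs", "clip.mp4"), "\n";
```

Returning the command string rather than executing it keeps the sketch testable; a real implementation would also need to check that the command succeeded.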

getDuration()

Returns how long a video is in seconds

public static getDuration(mixed $video) : float
Parameters
$video : mixed
Return values
float

length of video in seconds
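Since $video is undocumented above, here is a hedged sketch that assumes the duration is read from the "Duration: HH:MM:SS.ss" field ffmpeg prints when it opens a file. parseDurationSeconds() is hypothetical and returns 0.0 when no such field is found.

```php
<?php
// Hypothetical helper: converts ffmpeg's "Duration: HH:MM:SS.ss" field
// into a number of seconds.
function parseDurationSeconds(string $ffmpeg_output): float
{
    if (!preg_match('/Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)/',
        $ffmpeg_output, $m)) {
        return 0.0; // no duration field found
    }
    // hours * 3600 + minutes * 60 + (possibly fractional) seconds
    return 3600 * (int)$m[1] + 60 * (int)$m[2] + (float)$m[3];
}

$banner = "  Duration: 00:01:23.45, start: 0.000000, bitrate: 1205 kb/s";
echo parseDurationSeconds($banner), "\n"; // 83.45
```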

handle()

Handles processing of data for a web page. It makes a summary of the page (via the process() method, which subclasses should override) and runs any plugins associated with the processor to create sub-documents.

public handle(string $page, string $url) : array<string|int, mixed>
Parameters
$page : string

string of a web document

$url : string

location the document came from

Return values
array<string|int, mixed>

a summary (title, description, links, and content) of the information in $page. It also has a subdocs array containing any sub-documents returned from a plugin. Sub-documents might be things like recipes or tweets that appeared in a page.
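A simplified sketch of the flow handle() describes: call process() for the summary, then let each plugin contribute sub-documents under a subdocs key. The classes SketchProcessor and SketchPlugin and the method extractSubdocs() are hypothetical stand-ins, not Yioop's real PageProcessor or plugin API.

```php
<?php
// Hypothetical stand-ins for illustration only; these are not Yioop classes.
interface SketchPlugin
{
    // Returns an array of sub-documents (e.g. recipes) found in the page.
    public function extractSubdocs(string $page, string $url): array;
}

abstract class SketchProcessor
{
    /** @var SketchPlugin[] plugins run after process() */
    public array $plugin_instances = [];

    // Subclasses build the basic summary (title, description, ...).
    abstract public function process(string $page, string $url): array;

    public function handle(string $page, string $url): array
    {
        $summary = $this->process($page, $url);
        $summary["subdocs"] = [];
        // Collect every sub-document each plugin reports.
        foreach ($this->plugin_instances as $plugin) {
            foreach ($plugin->extractSubdocs($page, $url) as $subdoc) {
                $summary["subdocs"][] = $subdoc;
            }
        }
        return $summary;
    }
}

// Usage: a processor whose description is just the URL, plus one plugin.
$proc = new class extends SketchProcessor {
    public function process(string $page, string $url): array
    {
        return ["title" => "", "description" => $url];
    }
};
$proc->plugin_instances[] = new class implements SketchPlugin {
    public function extractSubdocs(string $page, string $url): array
    {
        return [["title" => "recipe"]];
    }
};
$result = $proc->handle("fake video data", "https://example.com/a.mp4");
echo count($result["subdocs"]), "\n"; // 1
```

Keeping the plugin loop in the base class means a subclass only has to override process(), which fits how VideoProcessor defers process() to its subclasses.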

initializeIndexedFileTypes()

Gets processors for different file types. Constructing them populates the self::$indexed_file_types, self::$image_types, and self::$mime_processor arrays.

public static initializeIndexedFileTypes() : mixed
Return values
mixed
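The static arrays are described as registries that sub-classes fill in when they are constructed. A minimal sketch of that pattern, with hypothetical class names and an assumed mp4 to video/mp4 pairing, might look like this:

```php
<?php
// Hypothetical stand-ins for illustration only; these are not Yioop classes.
abstract class SketchPageProcessor
{
    public static $indexed_file_types = ["unknown"];
    public static $image_types = [];
    public static $mime_processor = [];
}

class SketchMp4Processor extends SketchPageProcessor
{
    public function __construct()
    {
        // Register this processor for the types it handles.
        self::$mime_processor["video/mp4"] = "SketchMp4Processor";
        if (!in_array("mp4", self::$indexed_file_types)) {
            self::$indexed_file_types[] = "mp4";
        }
    }
}

// Constructing the processor is what populates the shared arrays.
new SketchMp4Processor();
echo SketchPageProcessor::$mime_processor["video/mp4"], "\n";
```

Because the subclass does not redeclare the static properties, self:: inside it writes to the parent's shared arrays, so other code can later look a processor up by mime type.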

process()

Extracts summary data from the image provided in $page, together with the URL $url it was downloaded from.

public process(string $page, string $url) : array<string|int, mixed>

The VideoProcessor class defers a proper implementation of this method to subclasses.

Parameters
$page : string

the image represented as a character string

$url : string

the url where the image was downloaded from

Return values
array<string|int, mixed>

summary information including a thumbnail and a description (where the description is just the url)

saveTempFile()

Used to save a temporary file with the data downloaded for a url while carrying out image processing

public saveTempFile(string $page, string $url, string $file_extension) : mixed
Parameters
$page : string

data about an image that needs to be saved

$url : string

URL the $page data came from

$file_extension : string

file extension to associate with the $page data

Return values
mixed
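A minimal sketch of saving downloaded data to a temporary file with a given extension. saveTempFileSketch() and its naming scheme (an md5 hash of the URL in the system temp directory) are assumptions for illustration; this page does not document where Yioop actually writes the file or what the method returns.

```php
<?php
// Hypothetical sketch, not Yioop's implementation: saves $page to a temp
// file named after a hash of $url, returning the path or null on failure.
function saveTempFileSketch(string $page, string $url,
    string $file_extension): ?string
{
    // Accept the extension with or without a leading dot.
    $extension = ltrim($file_extension, ".");
    $path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . md5($url) . "." .
        $extension;
    // file_put_contents() returns the number of bytes written, or false.
    if (file_put_contents($path, $page) === false) {
        return null;
    }
    return $path;
}

$path = saveTempFileSketch("raw video bytes", "https://example.com/v.mp4", "mp4");
echo $path, "\n";
```

Hashing the URL gives a stable, filesystem-safe name, at the cost that two downloads of the same URL overwrite each other.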

        
