Yioop_V9.5_Source_Code

BmpProcessor extends ImageProcessor in package Application

Used to create crawl summary information for BMP and ICO files

BLOCK_SIZE

Size in bytes of one block to read from a BMP file


    public mixed BLOCK_SIZE = 4096

BMP_HEADER_LEN

Size in bytes of BMP header


    public mixed BMP_HEADER_LEN = 108

BMP_ID

Size in bytes of BMP identifier and size info


    public mixed BMP_ID = 10

MAX_DIM

Maximum pixel width or height


    public mixed MAX_DIM = 1000

$image_types

Array of file types which should be considered images.


    public static array<string|int, mixed> $image_types = []

Sub-classes add to this array with the types they handle

$indexed_file_types

Array of file extensions which can be handled by the search engine; other extensions will be ignored.


    public static array<string|int, mixed> $indexed_file_types = ["unknown"]

Sub-classes add to this array with the types they handle

$max_description_len

Max number of chars to extract for description from a page to index.


    public static int $max_description_len

Only words in the description are indexed.

$max_links_to_extract

Maximum number of urls to extract from a single document


    public static int $max_links_to_extract

$mime_processor

Associative array of mime_type => (page processor name that can process that type). Sub-classes add to this array with the types they handle.


    public static array<string|int, mixed> $mime_processor = []

$plugin_instances

Indexing plugins which might be used with the current processor


    public array<string|int, mixed> $plugin_instances

$summarizer

Stores the summarizer object used by this instance of page processor to be used in generating a summary


    public object $summarizer

$summarizer_option

Stores the name of the summarizer used for crawling.


    public string $summarizer_option

Possible values are self::BASIC_SUMMARIZER, self::GRAPH_BASED_SUMMARIZER, self::CENTROID_SUMMARIZER and self::CENTROID_WEIGHTED_SUMMARIZER

$text_data

Whether the current processor is for text data (e.g., text, html, xml) or for some other format (e.g., gif, png)


    public bool $text_data

__construct()

Sets up any indexing plugins associated with this page processor


    public __construct([array<string|int, mixed> $plugins = [] ][, int $max_description_len = null ][, int $max_links_to_extract = null ][, string $summarizer_option = self::BASIC_SUMMARIZER ]) : mixed

Parameters

$plugins : array<string|int, mixed> = []: an array of indexing plugins which might do further processing on the data handled by this page processor
$max_description_len : int = null: maximal length of a page summary
$max_links_to_extract : int = null: maximum number of links to extract from a single document
$summarizer_option : string = self::BASIC_SUMMARIZER: CRAWL_CONSTANT specifying what kind of summarizer to use: self::BASIC_SUMMARIZER, self::GRAPH_BASED_SUMMARIZER, self::CENTROID_SUMMARIZER or self::CENTROID_WEIGHTED_SUMMARIZER

Return values

mixed —

addWidthHeightSummary()

Given an $image_string, determines, if possible, its width and height, then assigns the values to the CrawlConstants::WIDTH and CrawlConstants::HEIGHT fields of $summary


    public addWidthHeightSummary(array<string|int, mixed> &$summary, string $image_string) : array<string|int, mixed>

Parameters

$summary : array<string|int, mixed>: to write the width and height into
$image_string : string: the image represented as a character string

Return values

array<string|int, mixed> —

summary information including a thumbnail and a description (where the description is just the url)
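
As a rough illustration of how a width and height can be read from a BMP byte string, the sketch below parses the standard BMP headers directly. It is not Yioop's implementation: the helper name bmpDimensions and the plain 'width' and 'height' keys are hypothetical stand-ins for the CrawlConstants::WIDTH and CrawlConstants::HEIGHT fields, and it assumes a BITMAPINFOHEADER-style layout (width at byte offset 18, height at byte offset 22) on a 64-bit PHP build.

```php
<?php
/**
 * Hypothetical helper, not part of Yioop: returns the pixel dimensions
 * stored in a BMP byte string, or null if the string is not a usable BMP.
 */
function bmpDimensions(string $bmp_string): ?array
{
    // 14-byte file header plus the first 12 bytes of the info header
    if (strlen($bmp_string) < 26 || substr($bmp_string, 0, 2) !== "BM") {
        return null;
    }
    // both fields are 32-bit little-endian values starting at offset 18
    $fields = unpack("Vwidth/Vheight", $bmp_string, 18);
    $width = $fields['width'];
    $height = $fields['height'];
    // a negative (two's complement) height marks a top-down bitmap
    if ($height >= 0x80000000) {
        $height = 0x100000000 - $height;
    }
    return ['width' => $width, 'height' => $height];
}

// Build the first 26 bytes of a 4x3 BMP by hand and copy the result
// into a summary array, loosely mirroring what the method describes.
$header = "BM" . pack("V", 70) . pack("vv", 0, 0) . pack("V", 54)
    . pack("V", 40) . pack("V", 4) . pack("V", 3);
$summary = [];
$dims = bmpDimensions($header);
if ($dims !== null) {
    $summary += $dims;
}
```

Returning null rather than throwing keeps the caller free to leave the summary untouched when the dimensions cannot be determined, which matches the "if possible" wording above.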

averageColor()

Computes the average RGBA pixel value over an image by resampling the image down to a 1x1 pixel image, then extracting its rgba value as a vector


    public static averageColor(GdImage $image) : array<string|int, mixed>

Parameters

$image : GdImage: object to calculate average color for

Return values

array<string|int, mixed> —

a 4-tuple with components [red, green, blue, alpha]
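
The resample-to-one-pixel idea described above can be sketched with PHP's GD extension as follows. This is an illustrative approximation rather than the actual Yioop code: the function name averageColorSketch is hypothetical, and the choice to turn off alpha blending on the 1x1 target is an assumption made so that the source alpha is copied rather than blended.

```php
<?php
/**
 * Hypothetical sketch, not part of Yioop: averages an image by letting
 * GD resample it down to a single pixel. Requires the GD extension.
 */
function averageColorSketch(GdImage $image): array
{
    $pixel = imagecreatetruecolor(1, 1);
    // copy alpha values as-is instead of blending onto the black default
    imagealphablending($pixel, false);
    imagecopyresampled($pixel, $image, 0, 0, 0, 0, 1, 1,
        imagesx($image), imagesy($image));
    // split the packed color index of the single pixel into channels
    $rgba = imagecolorsforindex($pixel, imagecolorat($pixel, 0, 0));
    return [$rgba['red'], $rgba['green'], $rgba['blue'], $rgba['alpha']];
}

// Usage: a uniformly colored image should average to its own color.
$image = imagecreatetruecolor(8, 8);
imagefill($image, 0, 0, imagecolorallocate($image, 10, 20, 30));
$average = averageColorSketch($image);
```

Note that GD reports alpha on a 0 (opaque) to 127 (transparent) scale, so the fourth component is not a 0 to 255 value.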

createThumb()

Used to create a thumbnail from an image object


    public static createThumb(object $image[, int $width = CTHUMB_DIM ][, int $height = CTHUMB_DIM ]) : string

Parameters

$image : object: image object containing the image to make a thumbnail of
$width : int = CTHUMB_DIM: width in pixels of the thumb. If width is negative and height is positive, this dimension is set proportionally, based on the input image's width versus height
$height : int = CTHUMB_DIM: height in pixels of the thumb. If height is negative and width is positive, this dimension is set proportionally, based on the input image's width versus height

Return values

string —

string of the JPEG thumbnail image if one could be created, empty string otherwise
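
The proportional sizing rule for negative $width or $height values can be illustrated with a small, GD-free calculation. The helper thumbDimensions below is hypothetical and only shows one plausible reading of the rule; the rounding and the minimum size of 1 pixel are assumptions, and it expects positive source dimensions.

```php
<?php
/**
 * Hypothetical helper, not part of Yioop: resolves a negative thumb
 * dimension so the thumb keeps the source image's aspect ratio.
 * Assumes $src_width and $src_height are both positive.
 */
function thumbDimensions(int $src_width, int $src_height, int $width,
    int $height): array
{
    if ($width < 0 && $height > 0) {
        // derive the width from the requested height
        $width = max(1, (int) round($height * $src_width / $src_height));
    } elseif ($height < 0 && $width > 0) {
        // derive the height from the requested width
        $height = max(1, (int) round($width * $src_height / $src_width));
    }
    return [$width, $height];
}

// Usage for a 400x200 source image
$fixed_height = thumbDimensions(400, 200, -1, 50);
$fixed_width = thumbDimensions(400, 200, 50, -1);
```

When both dimensions are positive the values pass through unchanged, so the caller decides whether to preserve the aspect ratio.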

getXmpData()

Given an image, tries to extract any XMP info from it.


    public getXmpData(string $image_string) : array<string|int, mixed>

Parameters

$image_string : string: the image represented as a character string

Return values

array<string|int, mixed> —

XMP data converted from XML format to an array-like format

handle()

Method used to handle processing data for a web page. It makes a summary for the page (via the process() function which should be subclassed) as well as runs any plugins that are associated with the processors to create sub-documents


    public handle(string $page, string $url) : array<string|int, mixed>

Parameters

$page : string: string of a web document
$url : string: location the document came from

Return values

array<string|int, mixed> —

a summary (title, description, links, and content) of the information in $page. It also has a subdocs array containing any subdocuments returned from a plugin. A subdocument might be something like a recipe or a tweet that appeared in a page.

imagecreatefrombmp()

Reads in a 32- or 24-bit non-palette BMP file from the provided string and returns a PHP image object corresponding to it. This is a crude variation of code from the imagecreatewbmp function documentation at php.net


    public imagecreatefrombmp(string $bmp_string) : mixed

Parameters

$bmp_string : string: string with the contents of a bmp file

Return values

mixed —
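
To make the 24-bit, non-palette case more concrete, the sketch below decodes raw BMP pixel data into rows of [red, green, blue] triples. It relies on general properties of the BMP format (rows padded to a multiple of 4 bytes, stored bottom-up, bytes in blue-green-red order) and is not taken from Yioop; the function names bmpRowStride and decodeBmp24 are hypothetical, and top-down bitmaps are not handled.

```php
<?php
/**
 * Hypothetical helper, not part of Yioop: number of bytes one pixel row
 * occupies in a BMP file, including padding to a 4-byte boundary.
 */
function bmpRowStride(int $width, int $bits_per_pixel): int
{
    return intdiv($bits_per_pixel * $width + 31, 32) * 4;
}

/**
 * Hypothetical helper, not part of Yioop: converts bottom-up 24-bit BMP
 * pixel data into top-down rows of [red, green, blue] arrays.
 */
function decodeBmp24(string $pixel_data, int $width, int $height): array
{
    $stride = bmpRowStride($width, 24);
    $rows = [];
    for ($y = 0; $y < $height; $y++) {
        // the first row in the file is the bottom row of the picture
        $offset = ($height - 1 - $y) * $stride;
        $row = [];
        for ($x = 0; $x < $width; $x++) {
            $blue = ord($pixel_data[$offset + 3 * $x]);
            $green = ord($pixel_data[$offset + 3 * $x + 1]);
            $red = ord($pixel_data[$offset + 3 * $x + 2]);
            $row[] = [$red, $green, $blue];
        }
        $rows[] = $row;
    }
    return $rows;
}

// Usage: a 1x2 image whose file-order rows are blue, then red,
// each followed by one padding byte.
$pixels = decodeBmp24("\xFF\x00\x00\x00" . "\x00\x00\xFF\x00", 1, 2);
```

A real decoder would then write each triple into a GD image, for instance with imagesetpixel, after reading the pixel data offset from the file header.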

initializeIndexedFileTypes()

Gets processors for different file types. Constructing them will populate the self::$indexed_file_types, self::$image_types, and self::$mime_processor arrays


    public static initializeIndexedFileTypes() : mixed

Return values

mixed —

isBlackAndWhite()

Checks if an image is black and white (really gray scale) by sampling 200 points and checking that, for each point, the RGB values are the same.


    public static isBlackAndWhite(GdImage $image) : bool

Parameters

$image : GdImage: image object to check for being black and white

Return values

bool —

true if black and white
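
A sampling check of this kind might look like the following sketch, which uses PHP's GD extension. It is an approximation for illustration only: the name isGrayScaleSketch is hypothetical, and choosing the sample points uniformly at random with mt_rand is an assumption, since the description above does not say how the 200 points are selected.

```php
<?php
/**
 * Hypothetical sketch, not part of Yioop: treats an image as gray scale
 * if every sampled pixel has equal red, green, and blue components.
 * Requires the GD extension.
 */
function isGrayScaleSketch(GdImage $image, int $num_samples = 200): bool
{
    $max_x = imagesx($image) - 1;
    $max_y = imagesy($image) - 1;
    for ($i = 0; $i < $num_samples; $i++) {
        // assumption: sample positions are picked uniformly at random
        $color = imagecolorsforindex($image,
            imagecolorat($image, mt_rand(0, $max_x), mt_rand(0, $max_y)));
        if ($color['red'] !== $color['green'] ||
            $color['green'] !== $color['blue']) {
            return false;
        }
    }
    return true;
}

// Usage: one uniformly gray image and one uniformly red image
$gray = imagecreatetruecolor(16, 16);
imagefill($gray, 0, 0, imagecolorallocate($gray, 90, 90, 90));
$red = imagecreatetruecolor(16, 16);
imagefill($red, 0, 0, imagecolorallocate($red, 200, 0, 0));
$gray_result = isGrayScaleSketch($gray);
$red_result = isGrayScaleSketch($red);
```

Because only a sample of pixels is inspected, an image with a small colored region can still be reported as black and white; that trade-off buys a running time independent of image size.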

process()

Extracts summary data from the image provided in $page, together with the URL in $url from which it was downloaded


    public process(string $page, string $url) : array<string|int, mixed>

Parameters

$page : string: the image represented as a character string
$url : string: the url where the image was downloaded from

Return values

array<string|int, mixed> —

summary information including a thumbnail and a description (where the description is just the url)

saveTempFile()

Used to save a temporary file with the data downloaded for a url while carrying out image processing


    public saveTempFile(string $page, string $url, string $file_extension) : mixed

Parameters

$page : string: contains data about an image that one needs to save
$url : string: where $page data came from
$file_extension : string: to be associated with the $page data

Return values

mixed —
