Yioop_V9.5_Source_Code_Documentation

SearchController extends Controller
in package
implements CrawlConstants

Controller used to handle search requests to SeekQuarry search site. Used to both get and display search results.

Tags
author

Chris Pollett

Interfaces, Classes, Traits and Enums

CrawlConstants
Shared constants and enums used by components that are involved in the crawling process

Table of Contents

$activities  : array<string|int, mixed>
Says which activities (roughly, methods invoked from the web) this controller will respond to
$activity_component  : array<string|int, mixed>
Associative array of activity => component that the activity is on, used by the Controller::call method to actually invoke a given activity on a given component
$component_activities  : array<string|int, mixed>
Associative array of $components activities for this controller. Components are collections of activities (a little like traits) which can be reused.
$component_instances  : array<string|int, mixed>
Array of instances of components used by this controller
$image_subsearch_enabled  : bool
Flag to indicate if image subsearch is enabled
$model_instances  : array<string|int, mixed>
Array of instances of models used by this controller
$plugin_instances  : array<string|int, mixed>
Array of instances of indexing_plugins used by this controller
$subsearch_default_query  : int
Default query to use if user doesn't provide one for the current subsearch
$subsearch_identifier  : string
The localization identifier for the current subsearch
$subsearch_name  : string
Name of the sub-search currently in use
$subsearch_per_page  : int
Default number of results to display for the current subsearch
$video_subsearch_enabled  : bool
Flag to indicate if video subsearch is enabled
$view_instances  : array<string|int, mixed>
Array of instances of views used by this controller
$web_site  : WebSite
Stores a reference to the web server when Yioop runs in CLI mode; it acts as a request router in non-CLI mode.
__construct()  : mixed
In addition to calling the base class' constructor, set up FileCache objects if we're configured to do query caching
addCacheJavascriptTags()  : mixed
Add to supplied node subnodes containing script tags for javascript libraries used to display cache pages
addDifferentialPrivacy()  : int
Adds to an integer, $actual_value, epsilon-noise taken from an L_1 gaussian source centered at $actual_value to get an epsilon-private integer value.
addKeywordLinks()  : string
Function used to add links for keyword searches in the keyword_links array of $cache_item to the text of the $web_page whose cache we are going to display as part of a cache page request
addLandingHighlights()  : mixed
Adds data about the currently computed landing highlights to the view $data variable so the view can draw this information
addSearchViewData()  : mixed
Prepares the array $data so the SearchView can draw search results
baseLink()  : string
Used to create the base link for links to be displayed on caches of web pages. This link points to Yioop because links on cache pages are to other cache pages
cacheRequest()  : string
Part of Yioop Search API. Looks up the cached copy of a given url and returns a string with the contents of the cached page
cacheRequestAndOutput()  : mixed
Used to get and render a cached web page
calculateControlWords()  : array<string|int, mixed>
Extracts from the query string any control words: mix:, m:, raw:, no: and returns an array consisting of the query with these words removed, and then variables for their values.
call()  : mixed
Used to invoke an activity method of the current controller or one of its components
canonicalizeLinks()  : object
Makes relative links canonical with respect to the provided $url for links appearing within the DOM node.
checkCSRFTime()  : bool
Checks if the timestamp in $_REQUEST[$token_name] matches the timestamp of the last CSRF token accessed by this user for the kind of activity for which there might be a conflict.
checkCSRFToken()  : bool
Checks if the form CSRF (cross-site request forgery preventing) token matches the given user and has not expired (1 hour till expires)
checkRequest()  : bool
Checks whether the request is for a valid activity and whether it uses the correct authorization key
clean()  : string
Used to clean strings that might be tainted because they originate from the user
clearQuerySavepoint()  : mixed
Query timestamps can be used to save an iteration position in a set of query results. This method allows one to delete the supplied save point.
component()  : mixed
Dynamic loader for Component objects which might live on the current Component
convertArrayLines()  : string
Converts an array of lines of strings into a single string with proper newlines, each line having been trimmed and potentially cleaned
convertStringCleanArray()  : array<string|int, mixed>
Cleans a string consisting of lines, typically of urls into an array of clean lines. This is used in handling data from the crawl options text areas. # is treated as a comment
crawlItemSummary()  : string
Generates a string representation of a crawl item suitable for output in a cache page
createDomBoxNode()  : DOMElement
Creates a bordered tag (usually div) in which to put meta content on a page when it is displayed
createHistoryDataStructure()  : array<string|int, mixed>
Creates a data structure for storing years, months and associated timestamp components
createLinkDivs()  : DOMElement
Create divs for links based on all (year, month) combinations
createSummaryAndToggleNodes()  : DOMElement
Creates the toggle link and hidden div for extracted header and summary element on cache pages
displayView()  : mixed
Send the provided view to output, drawing it with the given data variable, using the current locale for translation, and writing mode
extractActivityQuery()  : array<string|int, mixed>
This method is responsible for parsing out the kind of query from the raw query string
fieldRequest()  : mixed
A fieldRequest is a special kind of cache request in which only one field (usually, the favicon field) of a crawl item is desired for a particular crawl time from a list of $crawl_items of a similar type from several different crawl times. This method takes a $request_field, a $crawl_time, and an associative array $crawl_items of pairs timestamp => $crawl_item, and outputs the desired field to the current stream with appropriate type HTTP headers (in the case of favicons, it does some processing to make an image out of a data url).
formatCachePage()  : mixed
Formats a cache of a web page (adds history ui and highlight keywords)
generateCSRFToken()  : string
Generates a cross site request forgery preventing token based on the provided user name, the current time and the hidden AUTH_KEY
getAccessModifiers()  : array<string|int, mixed>
Returns an array of the possible modifiers to the access to the activity in question.
getCrawlItems()  : array<string|int, mixed>
Get crawl items based on queue server setting.
getCSRFTime()  : int
Used to return just the timestamp portion of the CSRF token
getIndexingPluginList()  : mixed
Used to get a list of all available indexing plugins for this Yioop instance.
getIndexTimestamp()  : string
Finds the timestamp of the main crawl or mix to return results from. Does not do checking to make sure the timestamp exists.
getTopPhrases()  : array<string|int, mixed>
Given a page summary extract the words from it and try to find documents which match the most relevant words. The algorithm for "relevant" is pretty weak. For now we pick the $num many words whose ratio of number of occurrences in crawl item/ number of occurrences in all documents is the largest
historyUI()  : DOMElement
User Interface for history feature
imageCachePage()  : string
Makes an HTML web page for an image cache item
initializeAdFields()  : mixed
If external source advertisements are present in the output of this controller this function can be used to initialize the field variables used to write the appropriate Javascripts
initializeIndexInfo()  : array<string|int, mixed>
Determines which crawl or mix timestamp should be in use for this query. It also determines and returns info associated with this timestamp.
initializeResponseFormat()  : array<string|int, mixed>
Determines how this query is being run and return variables for the view
initializeSubsearches()  : array<string|int, mixed>
Determines if query results are using a subsearch, and if so initializes them. It also sets up the list of subsearches to draw at top of screen.
initializeUserAndDefaultActivity()  : array<string|int, mixed>
Determines the kind of user session that this search request is for
makeMediaGroups()  : array<string|int, mixed>
Groups together search result pages which have thumbnails from an array of search pages. Grouped thumbnail pages are stored at the array index of the first thumbnail found; non-thumbnail pages are stored where they were before
markChildren()  : object
Used in rendering a cached web page to highlight the search terms.
mirrorHandle()  : bool
Only used for serial network queries. Used to check if there are any mirrors of the current server.
model()  : mixed
Dynamic loader for Model objects which might live on the current Controller
pagingLogic()  : mixed
When an activity involves displaying tabular data (such as rows of users, groups, etc), this method might be called to set up $data fields for next, prev, and page links. It also makes the call to the model to get the row data sorted and restricted as desired. For some data sources, rather than directly make a call to the model to get the data it might be passed directly to this method.
parsePageHeadVars()  : array<string|int, mixed>
Used to parse head meta variables out of a data string provided either from a wiki page or a static page. Meta data is stored in lines before the first occurrence of END_HEAD_VARS. Head variables are name=value pairs. An example of a head variable might be "title = This web page's title". Anything after a semi-colon on a line in the head section is treated as a comment
parsePageHeadVarsView()  : mixed
Used to set up the head variables for and page_data of a wiki or static page associated with a view.
plugin()  : mixed
Dynamic loader for Plugin objects which might live on the current Controller
processQuery()  : mixed
Searches the database for the most relevant pages for the supplied search terms. Renders the results to the HTML page.
processRequest()  : mixed
This is the main entry point for handling a search request.
queryRequest()  : array<string|int, mixed>
Part of Yioop Search API. Performs a normal search query and returns associative array of query results
recordViewSession()  : mixed
Used to store in a session which media list items have been viewed so we can put an indicator by them when the media list is rendered
redirectLocation()  : mixed
Method to perform a 301 redirect to $location in both web server and CLI settings
redirectWithMessage()  : mixed
Does a 301 redirect to the given location and sets a session variable to display a message when the user gets there.
relatedRequest()  : array<string|int, mixed>
Part of Yioop Search API. Performs a related to a given url search query and returns associative array of query results
restrictQueryByUserAgent()  : string
Sometimes robots disobey the statistics page nofollow meta tag.
setupGraphicalCaptchaViewData()  : mixed
Sets up the graphical captcha view. Draws the string for graphical captcha
toggleHistory()  : mixed
The history toggle displays the year and month associated with the timestamp at which the page was cached.
view()  : mixed
Dynamic loader for View objects which might live on the current Controller
viewLinksByYearMonth()  : DOMElement
Display links based on selected year and month in History UI

Properties

$activities

Says which activities (roughly, methods invoked from the web) this controller will respond to

public array<string|int, mixed> $activities = ["query", "cache", "chart", "related", "signout", "recordClick", "trending"]

$activity_component

Associative array of activity => component that the activity is on, used by the Controller::call method to actually invoke a given activity on a given component

public array<string|int, mixed> $activity_component = []

$component_activities

Associative array of $components activities for this controller. Components are collections of activities (a little like traits) which can be reused.

public static array<string|int, mixed> $component_activities = []

$component_instances

Array of instances of components used by this controller

public array<string|int, mixed> $component_instances

$image_subsearch_enabled

Flag to indicate if image subsearch is enabled

public bool $image_subsearch_enabled

$model_instances

Array of instances of models used by this controller

public array<string|int, mixed> $model_instances

$plugin_instances

Array of instances of indexing_plugins used by this controller

public array<string|int, mixed> $plugin_instances

$subsearch_default_query

Default query to use if user doesn't provide one for the current subsearch

public int $subsearch_default_query = ""

$subsearch_identifier

The localization identifier for the current subsearch

public string $subsearch_identifier = ""

$subsearch_name

Name of the sub-search currently in use

public string $subsearch_name = ""

$subsearch_per_page

Default number of results to display for the current subsearch

public int $subsearch_per_page = 10

$video_subsearch_enabled

Flag to indicate if video subsearch is enabled

public bool $video_subsearch_enabled

$view_instances

Array of instances of views used by this controller

public array<string|int, mixed> $view_instances = []

$web_site

Stores a reference to the web server when Yioop runs in CLI mode; it acts as a request router in non-CLI mode.

public WebSite $web_site

In CLI mode, it is useful for caching files in RAM as they are read

Methods

__construct()

In addition to calling the base class' constructor, set up FileCache objects if we're configured to do query caching

public __construct([WebSite $web_site = null ]) : mixed
Parameters
$web_site : WebSite = null

is the web server when Yioop runs in CLI mode; it acts as a request router in non-CLI mode. In CLI mode, it is useful for caching files in RAM as they are read

Return values
mixed

addCacheJavascriptTags()

Add to supplied node subnodes containing script tags for javascript libraries used to display cache pages

public addCacheJavascriptTags(DOMDocument $dom, DomElement &$node) : mixed
Parameters
$dom : DOMDocument

used to create new nodes

$node : DomElement

what to add script node to

Return values
mixed

addDifferentialPrivacy()

Adds to an integer, $actual_value, epsilon-noise taken from an L_1 gaussian source centered at $actual_value to get an epsilon-private integer value.

public addDifferentialPrivacy(int $actual_value) : int
Parameters
$actual_value : int

number want to make private

Return values
int

$fuzzy_value number after noise added
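
As a rough illustration of the idea described above, the following standalone sketch adds Laplace-distributed noise (one reading of "L_1 gaussian") to an integer and rounds the result. The function name addLaplaceNoise, the $epsilon parameter and its default, and the sampling method are all assumptions made for this example; they are not taken from Yioop's implementation.

```php
<?php
/**
 * Hypothetical helper (not Yioop's code): returns $actual_value plus
 * Laplace noise of scale 1/$epsilon, rounded to an integer. The scale
 * assumes the value changes by at most 1 per individual contribution.
 */
function addLaplaceNoise(int $actual_value, float $epsilon = 0.5): int
{
    // Uniform sample strictly inside the interval (-0.5, 0.5)
    $max = mt_getrandmax();
    $u = (mt_rand(1, $max - 1) / $max) - 0.5;
    // Inverse CDF of a Laplace distribution centered at 0
    $sign = ($u < 0) ? -1 : 1;
    $noise = -(1.0 / $epsilon) * $sign * log(1 - 2 * abs($u));
    return (int) round($actual_value + $noise);
}

// Prints an integer near 100; the exact value varies per call
echo addLaplaceNoise(100), "\n";
```

A smaller $epsilon gives a larger noise scale, so the reported value hides the true count more strongly at the cost of accuracy.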

addKeywordLinks()

Function used to add links for keyword searches in the keyword_links array of $cache_item to the text of the $web_page whose cache we are going to display as part of a cache page request

public addKeywordLinks(string $web_page, array<string|int, mixed> &$cache_item) : string
Parameters
$web_page : string

to add links to

$cache_item : array<string|int, mixed>

original cache item web page generated from

Return values
string

modified web page

addLandingHighlights()

Adds data about the currently computed landing highlights to the view $data variable so the view can draw this information

public addLandingHighlights(array<string|int, mixed> &$data) : mixed
Parameters
$data : array<string|int, mixed>

of fields to be used by the view for drawing

Return values
mixed

addSearchViewData()

Prepares the array $data so the SearchView can draw search results

public addSearchViewData(array<string|int, mixed> $index_info, bool $no_query, int $raw, string $view, array<string|int, mixed> $subsearches, array<string|int, mixed> &$data) : mixed
Parameters
$index_info : array<string|int, mixed>

an array of info about the index in use

$no_query : bool

true in the case of a news subsearch when no query was entered by the user but still want to display news

$raw : int

what kind of grouping of identical results should be done (0 is default, 1 and higher used for internal queries)

$view : string

name of view class search results are for

$subsearches : array<string|int, mixed>

an array of data about each subsearch to draw to the view

$data : array<string|int, mixed>

that will eventually be sent to the view for rendering. This method adds fields to the array

Return values
mixed

baseLink()

Used to create the base link for links to be displayed on caches of web pages. This link points to Yioop because links on cache pages are to other cache pages

public baseLink() : string
Return values
string

desired base link

cacheRequest()

Part of Yioop Search API. Looks up the cached copy of a given url and returns a string with the contents of the cached page

public cacheRequest(string $url[, array<string|int, mixed> $ui_flags = [] ][, string $terms = "" ], string $crawl_time) : string
Parameters
$url : string

to get cached page for

$ui_flags : array<string|int, mixed> = []

array of ui features which should be added to the cache page. For example, "highlight" would say search terms should be highlighted, "history" says add history navigation for all copies of this cache page in yioop system.

$terms : string = ""

space separated list of search terms

$crawl_time : string

timestamp of crawl to look for cached page in

Return values
string

with contents of cached page

cacheRequestAndOutput()

Used to get and render a cached web page

public cacheRequestAndOutput(string $url[, array<string|int, mixed> $ui_flags = [] ][, string $terms = "" ], int $crawl_time) : mixed
Parameters
$url : string

the url of the page to find the cached version of

$ui_flags : array<string|int, mixed> = []

array of ui features which should be added to the cache page. For example, "highlight" would say search terms should be highlighted, "history" says add history navigation for all copies of this cache page in yioop system. "summaries" says add a toggle headers and extracted summaries link. "cache_link_referrer" says a link on a cache page referred us to the current cache request

$terms : string = ""

from original query responsible for cache request

$crawl_time : int

the timestamp of the crawl to look up the cached page in

Return values
mixed

calculateControlWords()

Extracts from the query string any control words: mix:, m:, raw:, no: and returns an array consisting of the query with these words removed, and then variables for their values.

public calculateControlWords(string $query, bool $raw, bool $is_mix, string $index_name) : array<string|int, mixed>
Parameters
$query : string

original query string

$raw : bool

the $_REQUEST['raw'] value

$is_mix : bool

if the current index name is that of a crawl mix

$index_name : string

timestamp of current mix or index

Return values
array<string|int, mixed>

($query, $raw, $network_allowed, $use_cache_if_possible, $guess_semantics)
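
The extraction step can be pictured with the standalone sketch below, which pulls tokens beginning with mix:, m:, raw:, or no: out of a query string. The function name stripControlWords, the assumption that each control word is a whitespace-delimited word:value token, and the shape of the returned array are hypothetical; the real method returns the five values listed above.

```php
<?php
/**
 * Hypothetical helper (not Yioop's code): removes tokens of the form
 * mix:VALUE, m:VALUE, raw:VALUE, and no:VALUE from a query string.
 * Returns [cleaned query, map of control word => list of values].
 */
function stripControlWords(string $query): array
{
    $found = [];
    // (?<!\S) ensures the control word starts a whitespace-delimited token
    $pattern = '/(?<!\S)(mix|m|raw|no):(\S+)/';
    if (preg_match_all($pattern, $query, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $found[$match[1]][] = $match[2];
        }
    }
    // Drop the control words, then collapse leftover whitespace
    $cleaned = preg_replace($pattern, '', $query);
    $cleaned = trim(preg_replace('/\s+/', ' ', $cleaned));
    return [$cleaned, $found];
}

[$query, $controls] = stripControlWords("raw:1 open source no:cache search");
echo $query, "\n"; // prints: open source search
print_r($controls);
```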

call()

Used to invoke an activity method of the current controller or one of its components

public call(string $activity[, string $modifiers = [] ]) : mixed
Parameters
$activity : string

method to invoke

$modifiers : string = []

access modifiers to executing this method

Return values
mixed

canonicalizeLinks()

Makes relative links canonical with respect to the provided $url for links appearing within the DOM node.

public canonicalizeLinks(object $node, string $url) : object
Parameters
$node : object

dom node to fix links for

$url : string

url to use to canonicalize links

Return values
object

updated dom node

checkCSRFTime()

Checks if the timestamp in $_REQUEST[$token_name] matches the timestamp of the last CSRF token accessed by this user for the kind of activity for which there might be a conflict.

public checkCSRFTime(string $token_name[, string $action = "" ]) : bool

This is to avoid accidental replays of postings etc if the back button used.

Parameters
$token_name : string

name of a $_REQUEST field used to hold a CSRF_TOKEN

$action : string = ""

name of current action to check for conflicts

Return values
bool

whether a conflicting action has occurred.

checkCSRFToken()

Checks if the form CSRF (cross-site request forgery preventing) token matches the given user and has not expired (1 hour till expires)

public checkCSRFToken(string $token_name, string $user_id[, bool $use_name_as_passed = false ]) : bool
Parameters
$token_name : string

attribute of $_REQUEST containing CSRFToken

$user_id : string

user id of the user to check the token for

$use_name_as_passed : bool = false

whether to use $token_name as the token (if true) or to use $_REQUEST[$token_name]

Return values
bool

whether the CSRF token was valid

checkRequest()

Checks whether the request is for a valid activity and whether it uses the correct authorization key

public checkRequest() : bool
Return values
bool

whether the request was valid or not

clean()

Used to clean strings that might be tainted because they originate from the user

public clean(mixed $value, mixed $type[, mixed $default = null ]) : string
Parameters
$value : mixed

tainted data

$type : mixed

type of data in $value. It can be one of the following strings: bool, color, double, float, int, hash, string, or web-url; or it can be an array listing allowed values. In the latter case, if the value is not in the array, the cleaned value will be the first element of the array if $default is null

$default : mixed = null

if $value is not set, the default value is returned. This isn't used much since if the error_reporting is E_ALL or -1 you would still get a Notice.

Return values
string

the clean input matching the type provided

clearQuerySavepoint()

Query timestamps can be used to save an iteration position in a set of query results. This method allows one to delete the supplied save point.

public clearQuerySavepoint(int $save_timestamp) : mixed
Parameters
$save_timestamp : int

timestamp of a previously saved query save point to delete

Return values
mixed

component()

Dynamic loader for Component objects which might live on the current Component

public component(string $component) : mixed
Parameters
$component : string

name of component to return

Return values
mixed

convertArrayLines()

Converts an array of lines of strings into a single string with proper newlines, each line having been trimmed and potentially cleaned

public convertArrayLines(array<string|int, mixed> $arr[, string $endline_string = " " ][, bool $clean = false ]) : string
Parameters
$arr : array<string|int, mixed>

the array of lines to be processed

$endline_string : string = " "

what string should be used to indicate the end of a line

$clean : bool = false

whether to clean each line

Return values
string

a concatenated string of cleaned lines

convertStringCleanArray()

Cleans a string consisting of lines, typically of urls into an array of clean lines. This is used in handling data from the crawl options text areas. # is treated as a comment

public convertStringCleanArray(string $str[, string $line_type = "url" ]) : array<string|int, mixed>
Parameters
$str : string

contains the url data

$line_type : string = "url"

does additional cleaning depending on the type of the lines. For instance, if it is "url" then a line not beginning with a url scheme will have http:// prepended.

Return values
array<string|int, mixed>

$lines an array of clean lines
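
The cleaning described above might look roughly like the standalone sketch below. The function name exampleCleanLines is hypothetical, and the sketch assumes that a comment is a line whose first non-blank character is #; how the real method handles a # later in a line is not stated here.

```php
<?php
/**
 * Hypothetical helper (not Yioop's code): splits $str into lines, trims
 * them, skips blank lines and lines starting with "#", and for "url"
 * lines prepends http:// when no scheme is present.
 */
function exampleCleanLines(string $str, string $line_type = "url"): array
{
    $lines = [];
    foreach (preg_split('/\R/', $str) as $line) {
        $line = trim($line);
        if ($line === "" || $line[0] === "#") {
            continue;
        }
        $has_scheme = preg_match('/^[a-z][a-z0-9+.\-]*:\/\//i', $line);
        if ($line_type === "url" && !$has_scheme) {
            $line = "http://" . $line;
        }
        $lines[] = $line;
    }
    return $lines;
}

$text = "# seed sites\nexample.com\n  https://www.example.org/a  \n\n";
print_r(exampleCleanLines($text));
```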

crawlItemSummary()

Generates a string representation of a crawl item suitable for output in a cache page

public crawlItemSummary(array<string|int, mixed> $crawl_item) : string
Parameters
$crawl_item : array<string|int, mixed>

summary information of a web page (title, description, etc)

Return values
string

suitable string formatting of item

createDomBoxNode()

Creates a bordered tag (usually div) in which to put meta content on a page when it is displayed

public createDomBoxNode(DOMDocument $dom, string $text_align[, string $more_styles = "" ][, string $tag = "div" ]) : DOMElement
Parameters
$dom : DOMDocument

representing cache page

$text_align : string

whether doc is ltr or rtl

$more_styles : string = ""

any additional styles for box

$tag : string = "div"

base tag of box (default div)

Return values
DOMElement

of styled box

createHistoryDataStructure()

Creates a data structure for storing years, months and associated timestamp components

public createHistoryDataStructure(array<string|int, mixed> $all_crawl_times, string $locale_type, string $url) : array<string|int, mixed>
Parameters
$all_crawl_times : array<string|int, mixed>

is an array storing all crawl times

$locale_type : string

is the locale tag

$url : string

is the URL for the cached page

Return values
array<string|int, mixed>

$results is an array storing years array, months array and the combined data structure for the History UI
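
To give a feel for the kind of structure involved, the standalone sketch below groups crawl timestamps by year and month. The function name exampleGroupByYearMonth and the nested array layout are assumptions for illustration only; the actual layout of $results, and how $locale_type and $url are used, are not shown in this page.

```php
<?php
/**
 * Hypothetical helper (not Yioop's code): groups Unix timestamps into
 * a nested array keyed by year, then by month number (1 to 12), using UTC.
 */
function exampleGroupByYearMonth(array $crawl_times): array
{
    $grouped = [];
    foreach ($crawl_times as $timestamp) {
        $timestamp = (int) $timestamp;
        $year = (int) gmdate("Y", $timestamp);
        $month = (int) gmdate("n", $timestamp);
        $grouped[$year][$month][] = $timestamp;
    }
    // Sort years, then the months within each year
    ksort($grouped);
    foreach (array_keys($grouped) as $year) {
        ksort($grouped[$year]);
    }
    return $grouped;
}

print_r(exampleGroupByYearMonth([1609459200, 1577836800, 1580515200]));
```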

createLinkDivs()

Create divs for links based on all (year, month) combinations

public createLinkDivs(array<string|int, mixed> $time_ds, string $current_year, string $current_month, DOMElement $d1, DOMDocument $dom, string $url, array<string|int, mixed> $years, bool $hist_ui_open, string $terms, long $crawl_time) : DOMElement
Parameters
$time_ds : array<string|int, mixed>

is the data structure for History UI

$current_year : string

is the year associated with the timestamp of the cached page

$current_month : string

is the month associated with the timestamp of the cached page

$d1 : DOMElement

is the section that contains options for years and months

$dom : DOMDocument

is the DOM for the cached page

$url : string

is the URL for the cached page

$years : array<string|int, mixed>

is an array storing years associated with all indexes

$hist_ui_open : bool

checks if the History UI state should be open

$terms : string

is a string containing the query terms

$crawl_time : long

is the crawl time for the cached page

Return values
DOMElement

$d1 is the section containing the options for selecting year and month

createSummaryAndToggleNodes()

Creates the toggle link and hidden div for extracted header and summary element on cache pages

public createSummaryAndToggleNodes(DOMDocument $dom, string $text_align, DOMElement $body, string $summary_string, array<string|int, mixed> $cache_item) : DOMElement
Parameters
$dom : DOMDocument

used to create new nodes to add to body object for page

$text_align : string

whether rtl or ltr language

$body : DOMElement

represent body of cached page

$summary_string : string

header and summary that were extracted

$cache_item : array<string|int, mixed>

contains info about the cached item

Return values
DOMElement

a div node with toggle link and hidden div

displayView()

Send the provided view to output, drawing it with the given data variable, using the current locale for translation, and writing mode

public displayView(string $view, array<string|int, mixed> $data) : mixed
Parameters
$view : string

the name of the view to draw

$data : array<string|int, mixed>

an array of values to use in drawing the view

Return values
mixed

extractActivityQuery()

This method is responsible for parsing out the kind of query from the raw query string

public extractActivityQuery() : array<string|int, mixed>

This method parses the raw query string for query activities. It parses the name of each activity and its argument

Return values
array<string|int, mixed>

list of search activities parsed out of the search string

fieldRequest()

A fieldRequest is a special kind of cache request in which only one field (usually, the favicon field) of a crawl item is desired for a particular crawl time from a list of $crawl_items of a similar type from several different crawl times. This method takes a $request_field, a $crawl_time, and an associative array $crawl_items of pairs timestamp => $crawl_item, and outputs the desired field to the current stream with appropriate type HTTP headers (in the case of favicons, it does some processing to make an image out of a data url).

public fieldRequest(string $request_field, int $crawl_time, array<string|int, mixed> $crawl_items) : mixed
Parameters
$request_field : string

field desired out of crawl_item

$crawl_time : int

timestamp of crawl_item in list of $crawl_items

$crawl_items : array<string|int, mixed>

pairs timestamp => $crawl_item of crawl item to look through

Return values
mixed

formatCachePage()

Formats a cache of a web page (adds history ui and highlight keywords)

public formatCachePage(array<string|int, mixed> $cache_item, string $cache_file, string $url, string $summary_string, int $crawl_time, array<string|int, mixed> $all_crawl_times, string $terms, array<string|int, mixed> $ui_flags) : mixed
Parameters
$cache_item : array<string|int, mixed>

details meta information about the cache page

$cache_file : string

contains current web page before formatting

$url : string

that cache web page was originally from

$summary_string : string

summary data that was extracted from the web page to be put in the actual inverted index

$crawl_time : int

timestamp of crawl cache page was from

$all_crawl_times : array<string|int, mixed>

timestamps of all crawl times currently in Yioop system

$terms : string

from original query responsible for cache request

$ui_flags : array<string|int, mixed>

array of ui features which should be added to the cache page. For example, "highlight" would say search terms should be highlighted, "history" says add history navigation for all copies of this cache page in yioop system.

Return values
mixed

string of formatted cached page

generateCSRFToken()

Generates a cross site request forgery preventing token based on the provided user name, the current time and the hidden AUTH_KEY

public generateCSRFToken(string $user) : string
Parameters
$user : string

username to use to generate token

Return values
string

a csrf token
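
The general pattern of deriving a token from a user name, a timestamp, and a secret, and later checking it against a one hour lifetime, can be sketched as follows. Everything here is hypothetical: the constant and function names, the use of HMAC-SHA256, and the "mac|timestamp" token layout are choices made for this example and may differ from how Yioop builds its tokens.

```php
<?php
// Hypothetical secret; stands in for the hidden AUTH_KEY mentioned above
const EXAMPLE_AUTH_KEY = "change-me";
const EXAMPLE_TOKEN_LIFETIME = 3600; // one hour, in seconds

/** Builds "<hmac>|<timestamp>" for the given user (illustrative layout). */
function exampleGenerateToken(string $user, ?int $now = null): string
{
    $now = $now ?? time();
    $mac = hash_hmac("sha256", $user . "|" . $now, EXAMPLE_AUTH_KEY);
    return $mac . "|" . $now;
}

/** Returns true if the token matches the user and has not expired. */
function exampleCheckToken(string $token, string $user, ?int $now = null): bool
{
    $now = $now ?? time();
    $parts = explode("|", $token);
    if (count($parts) != 2 || !preg_match('/^\d+$/', $parts[1])) {
        return false;
    }
    $time = (int) $parts[1];
    if ($time > $now || $now - $time > EXAMPLE_TOKEN_LIFETIME) {
        return false;
    }
    $expected = hash_hmac("sha256", $user . "|" . $time, EXAMPLE_AUTH_KEY);
    // hash_equals compares in constant time to avoid timing leaks
    return hash_equals($expected, $parts[0]);
}

$token = exampleGenerateToken("alice", 1700000000);
var_dump(exampleCheckToken($token, "alice", 1700000060)); // bool(true)
var_dump(exampleCheckToken($token, "bob", 1700000060));   // bool(false)
var_dump(exampleCheckToken($token, "alice", 1700007200)); // bool(false)
```

Keeping the timestamp readable in the token is what makes it possible to return just the time portion, as getCSRFTime() is described as doing.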

getAccessModifiers()

Returns an array of the possible modifiers to the access to the activity in question.

public getAccessModifiers(string $activity) : array<string|int, mixed>
Parameters
$activity : string

method to get access modifier list for

Return values
array<string|int, mixed>

of string names => translated names of the access modifiers for the method in question (if any exist).

getCrawlItems()

Get crawl items based on queue server setting.

public getCrawlItems(string $url, array<string|int, mixed> $crawl_times, array<string|int, mixed> $queue_servers) : array<string|int, mixed>
Parameters
$url : string

is the URL of the cached page

$crawl_times : array<string|int, mixed>

is an array storing crawl times for all indexes

$queue_servers : array<string|int, mixed>

is an array containing URLs for queue servers

Return values
array<string|int, mixed>

[$all_crawl_times, $all_crawl_items] is an array containing an array of crawl times and an array of their respective crawl items

getCSRFTime()

Used to return just the timestamp portion of the CSRF token

public getCSRFTime(string $token_name) : int
Parameters
$token_name : string

name of a $_REQUEST field used to hold a CSRF_TOKEN

Return values
int

the timestamp portion of the CSRF_TOKEN

getIndexingPluginList()

Used to get a list of all available indexing plugins for this Yioop instance.

public getIndexingPluginList() : mixed
Return values
mixed

getIndexTimestamp()

Finds the timestamp of the main crawl or mix to return results from. Does not do checking to make sure the timestamp exists.

public getIndexTimestamp() : string
Return values
string

current timestamp

getTopPhrases()

Given a page summary extract the words from it and try to find documents which match the most relevant words. The algorithm for "relevant" is pretty weak. For now we pick the $num many words whose ratio of number of occurrences in crawl item/ number of occurrences in all documents is the largest

public getTopPhrases(string $crawl_item, int $num, int $crawl_time) : array<string|int, mixed>
Parameters
$crawl_item : string

a page summary

$num : int

number of key phrases to return

$crawl_time : int

the timestamp of an index to use, if 0 then default used

Return values
array<string|int, mixed>

an array of most selective key phrases
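
The ratio-based selection described for this method can be sketched as follows. This is a minimal illustration, not the Yioop implementation: the helper name `topWordsByRatio` and the two count arrays passed in are hypothetical, and the real method works from a page summary and an index rather than precomputed counts.

```php
<?php
/**
 * Hypothetical helper: picks the $num words whose ratio
 * (occurrences in the item) / (occurrences in all documents) is largest.
 *
 * @param array $item_counts word => occurrences in the crawl item
 * @param array $corpus_counts word => occurrences in all documents
 * @param int $num number of words to return
 * @return array the selected words, most selective first
 */
function topWordsByRatio(array $item_counts, array $corpus_counts,
    int $num): array
{
    $ratios = [];
    foreach ($item_counts as $word => $count) {
        // Skip words with no corpus count to avoid dividing by zero
        if (empty($corpus_counts[$word])) {
            continue;
        }
        $ratios[$word] = $count / $corpus_counts[$word];
    }
    arsort($ratios); // sort by ratio, descending, keeping the word keys
    return array_slice(array_keys($ratios), 0, $num);
}
```

A very common word such as "the" gets a tiny ratio even if it occurs often in the item, so rarer words rise to the top.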

historyUI()

User Interface for history feature

public historyUI(long $crawl_time, array<string|int, mixed> $all_crawl_times, DOMElement $div_node, DOMDocument $dom, string $terms, bool $hist_ui_open, string $url) : DOMElement
Parameters
$crawl_time : long

is the crawl time

$all_crawl_times : array<string|int, mixed>

is an array storing all crawl times

$div_node : DOMElement

is the section that contains the History UI

$dom : DOMDocument

is the DOM of the cached page

$terms : string

is a string containing query terms

$hist_ui_open : bool

is a flag to check if History UI should be open by default

$url : string

is the URL of the page

Return values
DOMElement

the section containing the options for selecting year and month

imageCachePage()

Makes an HTML web page for an image cache item

public imageCachePage(string $url, array<string|int, mixed> $cache_item, string $cache_file,  $queue_servers) : string
Parameters
$url : string

original url of the image

$cache_item : array<string|int, mixed>

details about the image item

$cache_file : string

string with image

$queue_servers :

machines used by Yioop for the index that the current cache item is from. Used to find the URLs on which the image occurred

Return values
string

an HTML page with the image embedded as a data url
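
To illustrate what embedding an image as a data url means, here is a minimal sketch. The helper name `imageDataUrlPage`, its parameters, and the page layout are assumptions for this example; the actual method also reports the pages on which the image occurred.

```php
<?php
/**
 * Hypothetical helper: wraps raw image bytes in a minimal HTML page,
 * embedding them as a base64 data URL.
 */
function imageDataUrlPage(string $url, string $image_bytes,
    string $mime_type): string
{
    $data_url = "data:" . $mime_type . ";base64," .
        base64_encode($image_bytes);
    // Escape the original URL before writing it into the markup
    $safe_url = htmlspecialchars($url, ENT_QUOTES);
    return "<!DOCTYPE html>\n<html><head><title>" . $safe_url .
        "</title></head>\n<body><img src=\"" . $data_url .
        "\" alt=\"" . $safe_url . "\"></body></html>";
}
```

Because the image bytes travel inside the page itself, the browser needs no second request to the original host to display the cached image.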

initializeAdFields()

If external source advertisements are present in the output of this controller, this function can be used to initialize the field variables used to write the appropriate JavaScript.

public initializeAdFields(array<string|int, mixed> &$data[, bool $ads_off = false ]) : mixed
Parameters
$data : array<string|int, mixed>

data to be used in drawing the view

$ads_off : bool = false

whether or not ads are turned off so that this method should do nothing

Return values
mixed

initializeIndexInfo()

Determines which crawl or mix timestamp should be in use for this query. It also determines and returns info associated with this timestamp.

public initializeIndexInfo(bool $web_flag, int $raw, array<string|int, mixed> &$data) : array<string|int, mixed>
Parameters
$web_flag : bool

whether this is a web based query or one from the search API

$raw : int

whether the query should be validated against the list of known crawls, or whether it is an internal (say network) query that doesn't require validation (faster without).

$data : array<string|int, mixed>

that will eventually be sent to the view. We set the 'its' (index_time_stamp) field here

Return values
array<string|int, mixed>

consisting of the index timestamp of the crawl or mix in use, $index_info an array of info about that index, and $save_timestamp the timestamp of the last savepoint, used if this query is the query for a crawl mix archive crawl.

initializeResponseFormat()

Determines how this query is being run and returns variables for the view

public initializeResponseFormat() : array<string|int, mixed>

A query might be run as a web-based query where HTML is expected as the output, an RSS query, an API query, or as a serial query from a name_server or mirror instance back to one of the other queue servers in a Yioop installation. A query might also request different numbers of pages back, beginning at different starting points in the result.

Return values
array<string|int, mixed>

consisting of (view to be used to render results, flag for whether html results should be used, int code for what kind of group of similar urls should be done on the results, number of search results to return, start from which result)

initializeSubsearches()

Determines if query results are using a subsearch and, if so, initializes them. It also sets up the list of subsearches to draw at the top of the screen.

public initializeSubsearches() : array<string|int, mixed>
Return values
array<string|int, mixed>

(subsearches, no_query) where subsearches is itself an array of data about each subsearch to draw, and no_query is a bool flag used in the case of a news subsearch when no query was entered by the user but we still want to display news

initializeUserAndDefaultActivity()

Determines the kind of user session that this search request is for

public initializeUserAndDefaultActivity(array<string|int, mixed> &$data) : array<string|int, mixed>

This function is called by @see processRequest(). The user session might be one without a login, one with a login (so we need to validate against it to prevent CSRF attacks), one just after someone logged out, or a bot session (googlebot, etc.), in which case the query request is removed.

Parameters
$data : array<string|int, mixed>

that will eventually be sent to the view. We might update with error messages

Return values
array<string|int, mixed>

consisting of (query based on user info, whether highlighting should be used if this is a cache request, what activity the user wants, any arguments to this activity)

makeMediaGroups()

Given an array of search result pages, groups together those pages which have thumbnails. The grouped thumbnail pages are stored at the array index of the first thumbnail found; non-thumbnail pages are stored where they were before.

public makeMediaGroups( $pages) : array<string|int, mixed>
Parameters
$pages :

an array of search result pages, within which the pages with thumbs will be grouped

Return values
array<string|int, mixed>

[$pages after the grouping has been done, whether images or videos found]
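
The grouping behavior described for this method can be sketched as below. This is an illustration under stated assumptions, not the Yioop code: the helper name `groupThumbPages` and the field names `THUMB` and `THUMB_GROUP` are hypothetical.

```php
<?php
/**
 * Hypothetical sketch: collects every page that has a 'THUMB' field into
 * one group stored at the position of the first such page; other pages
 * keep their relative order.
 *
 * @return array [grouped pages, whether any thumbnail page was found]
 */
function groupThumbPages(array $pages): array
{
    $out = [];
    $group_index = -1; // position in $out of the thumbnail group, if any
    foreach ($pages as $page) {
        if (!isset($page['THUMB'])) {
            $out[] = $page;
            continue;
        }
        if ($group_index < 0) {
            // First thumbnail found: the group is created at this spot
            $group_index = count($out);
            $out[] = ['THUMB_GROUP' => []];
        }
        $out[$group_index]['THUMB_GROUP'][] = $page;
    }
    return [$out, $group_index >= 0];
}
```

With input pages a, b (thumb), c, d (thumb), the result is a, a group holding b and d, then c.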

markChildren()

Used in rendering a cached web page to highlight the search terms.

public markChildren(object $node, array<string|int, mixed> $words, object $dom) : object
Parameters
$node : object

DOM object to mark html elements of

$words : array<string|int, mixed>

an array of words to be highlighted

$dom : object

a DOM object for the whole document

Return values
object

the node modified to now have highlighting
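
One way to highlight terms in a DOM tree is to walk its text nodes and wrap each match in a span. The sketch below shows that general technique with PHP's DOM extension; the function name `highlightWords` and the `highlight` class name are assumptions, and the actual markChildren method may differ in how it matches and marks words.

```php
<?php
/**
 * Hypothetical sketch: walks the text nodes under $node and wraps each
 * occurrence of a word from $words in a <span class="highlight">.
 */
function highlightWords(DOMNode $node, array $words,
    DOMDocument $dom): DOMNode
{
    if (empty($words)) {
        return $node;
    }
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words);
    $pattern = '/(' . implode('|', $quoted) . ')/iu';
    // Copy the child list first since the loop modifies the tree
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMText) {
            $pieces = preg_split($pattern, $child->nodeValue, -1,
                PREG_SPLIT_DELIM_CAPTURE);
            if ($pieces === false || count($pieces) < 2) {
                continue; // nothing to highlight in this text node
            }
            $fragment = $dom->createDocumentFragment();
            foreach ($pieces as $i => $piece) {
                if ($i % 2 === 1) { // odd indices hold the matched words
                    $span = $dom->createElement("span");
                    $span->setAttribute("class", "highlight");
                    $span->appendChild($dom->createTextNode($piece));
                    $fragment->appendChild($span);
                } else if ($piece !== "") {
                    $fragment->appendChild($dom->createTextNode($piece));
                }
            }
            $node->replaceChild($fragment, $child);
        } else if ($child instanceof DOMElement) {
            highlightWords($child, $words, $dom);
        }
    }
    return $node;
}
```

Only text nodes are rewritten, so tag names and attribute values that happen to contain a search term are left untouched.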

mirrorHandle()

Only used for serial network queries. Used to check if there are any mirrors of the current server.

public mirrorHandle() : bool

If so, it tries to distribute the query requests randomly amongst the mirrors and itself. To determine if there are mirrors of the current server it looks in a mirror_table.txt file for machines that have notified this machine they are mirroring it.

Return values
bool

whether or not a mirror of the current site handled it
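
The random distribution of requests among the mirrors and the current server can be pictured with the sketch below. It assumes the list of mirror URLs has already been read; the format of mirror_table.txt is not described here, so loading it is left out. The helper name `pickQueryHandler` is hypothetical.

```php
<?php
/**
 * Hypothetical sketch: picks uniformly at random among this server and
 * its mirrors and returns the URL of the machine chosen.
 */
function pickQueryHandler(string $self_url, array $mirror_urls): string
{
    $candidates = array_merge([$self_url], array_values($mirror_urls));
    return $candidates[random_int(0, count($candidates) - 1)];
}

// A mirror handled the request exactly when the pick is not this server
$self = "http://localhost/";
$handled_by_mirror = pickQueryHandler($self, []) !== $self;
```

With no mirrors the current server is always chosen, which corresponds to a return value of false from the method.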

model()

Dynamic loader for Model objects which might live on the current Controller

public model(string $model) : mixed
Parameters
$model : string

name of model to return

Return values
mixed

pagingLogic()

When an activity involves displaying tabular data (such as rows of users, groups, etc), this method might be called to set up $data fields for next, prev, and page links, it also makes the call to the model to get the row data sorted and restricted as desired. For some data sources, rather than directly make a call to the model to get the data it might be passed directly to this method.

public pagingLogic(array<string|int, mixed> &$data, mixed $field_or_model, string $output_field, int $default_show[, array<string|int, mixed> $search_array = [] ][, string $var_prefix = "" ][, array<string|int, mixed> $args = null ]) : mixed
Parameters
$data : array<string|int, mixed>

used to send data to the view will be updated by this method with row and paging data

$field_or_model : mixed

if an object, this is assumed to be a model and so the getRows method of this model is called to get row data, sorted and restricted according to $search_array; if a string then the row data is assumed to be in $data[$field_or_model] and pagingLogic itself does the sorting and restricting.

$output_field : string

output rows for the view will be stored in $data[$output_field]

$default_show : int

if not specified by $_REQUEST, then this will be used to determine the maximum number of rows that will be written to $data[$output_field]

$search_array : array<string|int, mixed> = []

used to sort and restrict in the getRows call or the data from $data[$field_or_model]. Each element of this is a quadruple name of a field, what comparison to perform, a value to check, and an order (ascending/descending) to sort by

$var_prefix : string = ""

if there are multiple uses of pagingLogic presented on the same view, then $var_prefix can be prepended to the $data field variables like num_show, start_row, end_row to distinguish between them

$args : array<string|int, mixed> = null

additional arguments that are passed to getRows and in turn to selectCallback, fromCallback, and whereCallback that might provide user_id, etc to further control which rows are returned

Return values
mixed
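
To make the $search_array quadruples concrete, the sketch below filters and sorts in-memory rows. It is only an illustration: the helper name `applySearchArray`, the comparison names ("=", "CONTAINS"), the order names ("ASC", "DESC", "NONE"), and the sample field names are all assumptions for this example rather than the values pagingLogic itself accepts.

```php
<?php
/**
 * Hypothetical sketch: filters and sorts in-memory rows using quadruples
 * [field, comparison, value, order]. An empty value means no restriction.
 */
function applySearchArray(array $rows, array $search_array): array
{
    foreach ($search_array as [$field, $comparison, $value, $order]) {
        if ($value !== "") {
            $rows = array_values(array_filter($rows,
                function ($row) use ($field, $comparison, $value) {
                    $cell = (string)($row[$field] ?? "");
                    return ($comparison === "CONTAINS")
                        ? strpos($cell, $value) !== false
                        : $cell === $value;
                }));
        }
        if ($order === "ASC" || $order === "DESC") {
            usort($rows, function ($a, $b) use ($field, $order) {
                $cmp = strcmp((string)$a[$field], (string)$b[$field]);
                return ($order === "ASC") ? $cmp : -$cmp;
            });
        }
    }
    return $rows;
}

$rows = [
    ["USER_NAME" => "carol", "STATUS" => "active"],
    ["USER_NAME" => "alice", "STATUS" => "active"],
    ["USER_NAME" => "bob", "STATUS" => "inactive"],
];
$search_array = [
    ["STATUS", "=", "active", "NONE"],     // restrict, do not sort
    ["USER_NAME", "CONTAINS", "", "ASC"],  // no restriction, sort ascending
];
$result = applySearchArray($rows, $search_array);
```

Here $result holds the rows for alice and carol, in that order.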

parsePageHeadVars()

Used to parse head meta variables out of a data string provided either from a wiki page or a static page. Meta data is stored in lines before the first occurrence of END_HEAD_VARS. Head variables are name=value pairs. An example of a head variable might be: "title = This web page's title". Anything after a semi-colon on a line in the head section is treated as a comment.

public parsePageHeadVars(string $page_data[, mixed $with_body = false ]) : array<string|int, mixed>
Parameters
$page_data : string

this is the actual content of a wiki or static page

$with_body : mixed = false
Return values
array<string|int, mixed>

the associative array of head variables or pair [head vars, page body]
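
The head variable format described for this method (name=value lines before END_HEAD_VARS, with a semi-colon starting a comment) can be parsed roughly as follows. This is a sketch of the format only; the function name `parseHeadVarsSketch` is hypothetical and the real method may handle edge cases differently.

```php
<?php
/**
 * Hypothetical sketch: parses name=value lines that occur before the
 * first END_HEAD_VARS marker, treating ';' as the start of a comment.
 *
 * @return array [head variables, page body]
 */
function parseHeadVarsSketch(string $page_data): array
{
    $marker = "END_HEAD_VARS";
    $pos = strpos($page_data, $marker);
    if ($pos === false) {
        return [[], $page_data]; // no head section present
    }
    $head = substr($page_data, 0, $pos);
    $body = ltrim(substr($page_data, $pos + strlen($marker)));
    $head_vars = [];
    foreach (preg_split('/\R/', $head) as $line) {
        // Drop anything after a semi-colon, since it is a comment
        $semi = strpos($line, ";");
        if ($semi !== false) {
            $line = substr($line, 0, $semi);
        }
        $parts = explode("=", $line, 2);
        if (count($parts) === 2 && trim($parts[0]) !== "") {
            $head_vars[trim($parts[0])] = trim($parts[1]);
        }
    }
    return [$head_vars, $body];
}
```

Splitting on only the first "=" keeps values that themselves contain an equals sign intact.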

parsePageHeadVarsView()

Used to set up the head variables for and page_data of a wiki or static page associated with a view.

public parsePageHeadVarsView(object $view, string $page_name, string $page_data) : mixed
Parameters
$view : object

View on which page data will be rendered

$page_name : string

a string name/id to associate with page. For example, might have 404 for a page about 404 errors

$page_data : string

this is the actual content of a wiki or static page

Return values
mixed

plugin()

Dynamic loader for Plugin objects which might live on the current Controller

public plugin(string $plugin) : mixed
Parameters
$plugin : string

name of Plugin to return

Return values
mixed

processQuery()

Searches the database for the most relevant pages for the supplied search terms. Renders the results to the HTML page.

public processQuery(array<string|int, mixed> &$data, string $query, string $activity, string $arg, int $results_per_page, int $limit, int $index_name, int $raw, mixed $save_timestamp[, array<string|int, mixed> $ranking_factors = [] ]) : mixed
Parameters
$data : array<string|int, mixed>

an array of view data that will be updated to include at most results_per_page many search results

$query : string

a string containing the words to search on

$activity : string

besides a straight search for words query, one might have other searches, such as a search for related pages. this argument says what kind of search to do.

$arg : string

for a search other than a straight word query this argument provides auxiliary information on how to conduct the search. For instance on a related web page search, it might provide the url of the site with which to perform the related search.

$results_per_page : int

the maximum number of search results that can occur on a page

$limit : int

the first of all the pages with the query terms to return. For instance, if 10 then the tenth highest ranking page for those query terms will be returned, then the eleventh, etc.

$index_name : int

the timestamp of an index to use, if 0 then default used

$raw : int

if $raw == 0, normal grouping is done; if $raw > 0, no grouping is done on the data. If $raw == 1, no summary is returned (used with f=serial; an end user probably does not want this). In this case, one gets offset, generation, etc., so the summary could be looked up later.

$save_timestamp : mixed

if this timestamp is nonzero, then save the iterate position, so it can be resumed on future queries that make use of the timestamp. $save_timestamp may also be a string of the format timestamp-query_part to handle networked queries involving presentations

$ranking_factors : array<string|int, mixed> = []

fields that say how url, keywords, and title words should influence relevance and doc rank calculations

Return values
mixed

processRequest()

This is the main entry point for handling a search request.

public processRequest() : mixed

processRequest determines the type of search request (normal request, cache request, or related request), or if a user is returning from the admin panel via signout. It then calls the appropriate method to handle the given activity. Finally, it draws the search screen.

Return values
mixed

queryRequest()

Part of Yioop Search API. Performs a normal search query and returns associative array of query results

public queryRequest(string $query, int $results_per_page, int $limit, int $grouping, int $save_timestamp[, array<string|int, mixed> $ranking_factors = [] ]) : array<string|int, mixed>
Parameters
$query : string

this can be any query string that could be entered into the search bar on Yioop (other than related: and cache: queries)

$results_per_page : int

number of results to return

$limit : int

first result to return from the ordered query results

$grouping : int

($grouping == 0) normal grouping of links with associated document, ($grouping > 0) no grouping done on data

$save_timestamp : int

if this timestamp is nonzero, then save iterate position, so can resume on future queries that make use of the timestamp

$ranking_factors : array<string|int, mixed> = []

fields that say how url, keywords, and title words should influence relevance and doc rank calculations

Return values
array<string|int, mixed>

associative array of results for the query performed

recordViewSession()

Used to store in a session which media list items have been viewed so we can put an indicator by them when the media list is rendered

public recordViewSession(int $page_id, string $sub_path, string $media_name) : mixed
Parameters
$page_id : int

the id of page with media list

$sub_path : string

the resource folder on that page

$media_name : string

item to store an indicator into the session for

Return values
mixed

redirectLocation()

Method to perform a 301 redirect to $location in both the web server and CLI settings

public redirectLocation(string $location) : mixed
Parameters
$location : string

url to redirect to

Return values
mixed

redirectWithMessage()

Does a 301 redirect to the given location and sets a session variable to display a message on arrival.

public redirectWithMessage(string $message[, string $copy_fields = false ][, bool $restart = false ][, bool $use_base_url = false ]) : mixed
Parameters
$message : string

message to write

$copy_fields : string = false

$_REQUEST fields to copy for redirect

$restart : bool = false

whether to restart the server when Yioop is being run as its own server rather than under Apache.

$use_base_url : bool = false

set true if the base_url should be included in the redirect

Return values
mixed

relatedRequest()

Part of Yioop Search API. Performs a search for documents related to a given url and returns an associative array of query results

public relatedRequest(string $url, int $results_per_page, int $limit, string $crawl_time, int $grouping, int $save_timestamp) : array<string|int, mixed>
Parameters
$url : string

to find related documents for

$results_per_page : int

number of results to return

$limit : int

first result to return from the ordered query results

$crawl_time : string

timestamp of crawl to look for related request

$grouping : int

($grouping == 0) normal grouping of links with associated document, ($grouping > 0) no grouping done on data

$save_timestamp : int

if this timestamp is nonzero, then save iterate position, so can resume on future queries that make use of the timestamp

Return values
array<string|int, mixed>

associative array of results for the query performed

restrictQueryByUserAgent()

Sometimes robots disobey the statistics page nofollow meta tag.

public restrictQueryByUserAgent(string $query) : string

Such robots need to be stopped before they query the whole index.

Parameters
$query : string

the search request string

Return values
string

the search request string if not a bot; "" otherwise
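
The contract of this method (query unchanged for ordinary users, empty string for bots) can be sketched as below. The helper name `restrictQueryForAgent`, the explicit $user_agent parameter, and the small bot pattern are assumptions for this example; how Yioop actually recognizes robots is not stated here.

```php
<?php
/**
 * Hypothetical sketch: returns the query unchanged for ordinary user
 * agents and "" when the user agent looks like a robot. The pattern
 * below is an assumed, deliberately small list.
 */
function restrictQueryForAgent(string $query, string $user_agent): string
{
    $bot_pattern = '/bot|crawl|spider|slurp/i';
    if (preg_match($bot_pattern, $user_agent) === 1) {
        return "";
    }
    return $query;
}
```

Passing the user agent in as a parameter, rather than reading it from $_SERVER inside the function, keeps the sketch easy to test from the command line.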

setupGraphicalCaptchaViewData()

Sets up the graphical captcha view. Draws the string for the graphical captcha.

public setupGraphicalCaptchaViewData(array<string|int, mixed> &$data) : mixed
Parameters
$data : array<string|int, mixed>

used by the view to draw any dynamic content; in this case we append a field "CAPTCHA_IMAGE" with a data url of the captcha to draw.

Return values
mixed

toggleHistory()

The history toggle displays the year and month associated with the timestamp at which the page was cached.

public toggleHistory(array<string|int, mixed> $months, DOMElement $div_node, DOMDocument $dom) : mixed
Parameters
$months : array<string|int, mixed>

used to store month names for which we have a cache

$div_node : DOMElement

is the section that contains the History UI

$dom : DOMDocument

is the DOM of the cached page

Return values
mixed

view()

Dynamic loader for View objects which might live on the current Controller

public view(string $view) : mixed
Parameters
$view : string

name of view to return

Return values
mixed

viewLinksByYearMonth()

Display links based on selected year and month in History UI

public viewLinksByYearMonth(array<string|int, mixed> $years, array<string|int, mixed> $months, string $current_year, string $current_month, array<string|int, mixed> $time_ds, DOMDocument $dom) : DOMElement
Parameters
$years : array<string|int, mixed>

is an array storing years associated with all indexes

$months : array<string|int, mixed>

is an array storing months

$current_year : string

is the year associated with the timestamp of the cached page

$current_month : string

is the month associated with the timestamp of the cached page

$time_ds : array<string|int, mixed>

is the data structure for History UI

$dom : DOMDocument

is the DOM for the cached page

Return values
DOMElement

$d1 is the section containing the options for selecting year and month
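
A year and month selector needs crawl timestamps grouped by year and month. The sketch below shows one possible grouping; the helper name `groupCrawlTimesByYearMonth` and the nested array layout are assumptions and do not describe the actual $time_ds structure.

```php
<?php
/**
 * Hypothetical sketch: groups Unix crawl timestamps into a nested
 * year => month name => [timestamps] array, using UTC dates.
 */
function groupCrawlTimesByYearMonth(array $crawl_times): array
{
    $grouped = [];
    sort($crawl_times); // oldest first, so years and months come in order
    foreach ($crawl_times as $time) {
        $year = gmdate("Y", (int)$time);
        $month = gmdate("F", (int)$time);
        $grouped[$year][$month][] = (int)$time;
    }
    return $grouped;
}
```

Each year key can then back one option in a year selector, with its month keys supplying the month options for that year.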


        
