SearchController
extends Controller
implements CrawlConstants
Controller used to handle search requests to SeekQuarry search site. Used to both get and display search results.
Interfaces, Classes, Traits and Enums
- CrawlConstants
- Shared constants and enums used by components that are involved in the crawling process
Table of Contents
- $activities : array<string|int, mixed>
- Says which activities (roughly, methods invoked from the web) this controller will respond to
- $activity_component : array<string|int, mixed>
- Associative array of activity => component activity is on, used by @see Controller::call method to actually invoke a given activity on a given component
- $component_activities : array<string|int, mixed>
- Associative array of $components activities for this controller. Components are collections of activities (a little like traits) which can be reused.
- $component_instances : array<string|int, mixed>
- Array of instances of components used by this controller
- $image_subsearch_enabled : bool
- Flag to indicate if image subsearches are enabled
- $model_instances : array<string|int, mixed>
- Array of instances of models used by this controller
- $plugin_instances : array<string|int, mixed>
- Array of instances of indexing_plugins used by this controller
- $subsearch_default_query : int
- Default query to use if user doesn't provide one for the current subsearch
- $subsearch_identifier : string
- The localization identifier for the current subsearch
- $subsearch_name : string
- Name of the sub-search currently in use
- $subsearch_per_page : int
- Default number of results to display for the current subsearch
- $video_subsearch_enabled : bool
- Flag to indicate if video subsearches are enabled
- $view_instances : array<string|int, mixed>
- Array of instances of views used by this controller
- $web_site : WebSite
- Stores a reference to the web server when Yioop runs in CLI mode; it acts as a request router in non-CLI mode.
- __construct() : mixed
- In addition to calling the base class' constructor, set up FileCache objects if we're configured to do query caching
- addCacheJavascriptTags() : mixed
- Add to supplied node subnodes containing script tags for javascript libraries used to display cache pages
- addDifferentialPrivacy() : int
- Adds to an integer, $actual_value, epsilon-noise taken from an L_1 gaussian source centered at $actual_value to get an epsilon-private integer value.
- addKeywordLinks() : string
- Function used to add links for keyword searches in the keyword_links array of $cache_item to the text of the $web_page we are going to display the cache of as part of a cache page request
- addLandingHighlights() : mixed
- Adds data about the currently computed landing highlights to the view $data variable so the view can draw this information
- addSearchViewData() : mixed
- Prepares the array $data so the SearchView can draw search results
- baseLink() : string
- Used to create the base link for links to be displayed on caches of web pages. This link points to yioop because links on cache pages are to other cache pages
- cacheRequest() : string
- Part of Yioop Search API. Retrieves the cached copy of the page at a given url and returns its contents as a string
- cacheRequestAndOutput() : mixed
- Used to get and render a cached web page
- calculateControlWords() : array<string|int, mixed>
- Extracts from the query string any control words: mix:, m:, raw:, no: and returns an array consisting of the query with these words removed, and then variables for their values.
- call() : mixed
- Used to invoke an activity method of the current controller or one of its components
- canonicalizeLinks() : object
- Make relative links canonical with respect to the provided $url for links that appear within the DOM node.
- checkCSRFTime() : bool
- Checks if the timestamp in $_REQUEST[$token_name] matches the timestamp of the last CSRF token accessed by this user for the kind of activity for which there might be a conflict.
- checkCSRFToken() : bool
- Checks if the form CSRF (cross-site request forgery preventing) token matches the given user and has not expired (tokens expire after 1 hour)
- checkRequest() : bool
- Checks whether a request is for a valid activity and whether it uses the correct authorization key
- clean() : string
- Used to clean strings that might be tainted because they originate from the user
- clearQuerySavepoint() : mixed
- Query timestamps can be used to save an iteration position in a set of query results. This method allows one to delete the supplied save point.
- component() : mixed
- Dynamic loader for Component objects which might live on the current Controller
- convertArrayLines() : string
- Converts an array of lines of strings into a single string with proper newlines, each line having been trimmed and potentially cleaned
- convertStringCleanArray() : array<string|int, mixed>
- Cleans a string consisting of lines, typically of urls into an array of clean lines. This is used in handling data from the crawl options text areas. # is treated as a comment
- crawlItemSummary() : string
- Generates a string representation of a crawl item suitable for output in a cache page
- createDomBoxNode() : DOMElement
- Creates a bordered tag (usually div) in which to put meta content on a page when it is displayed
- createHistoryDataStructure() : array<string|int, mixed>
- Creates a data structure for storing years, months and associated timestamp components
- createLinkDivs() : DOMElement
- Create divs for links based on all (year, month) combinations
- createSummaryAndToggleNodes() : DOMElement
- Creates the toggle link and hidden div for extracted header and summary element on cache pages
- displayView() : mixed
- Send the provided view to output, drawing it with the given data variable, using the current locale for translation, and writing mode
- extractActivityQuery() : array<string|int, mixed>
- This method is responsible for parsing out the kind of query from the raw query string
- fieldRequest() : mixed
- A fieldRequest is a special kind of cache request in which only one field (usually the favicon field) of a crawl item is desired for a particular crawl time from a list of $crawl_items of a similar type from several different crawl times. This method takes a $request_field, a $crawl_time, and an associative array $crawl_items of pairs timestamp => $crawl_item, and outputs the desired field to the current stream with HTTP headers of the appropriate type (in the case of favicons, it does some processing to make an image out of a data url).
- formatCachePage() : mixed
- Formats a cache of a web page (adds history ui and highlight keywords)
- generateCSRFToken() : string
- Generates a cross site request forgery preventing token based on the provided user name, the current time and the hidden AUTH_KEY
- getAccessModifiers() : array<string|int, mixed>
- Returns an array of the possible modifiers to the access to the activity in question.
- getCrawlItems() : array<string|int, mixed>
- Get crawl items based on queue server setting.
- getCSRFTime() : int
- Used to return just the timestamp portion of the CSRF token
- getIndexingPluginList() : mixed
- Used to get a list of all available indexing plugins for this Yioop instance.
- getIndexTimestamp() : string
- Finds the timestamp of the main crawl or mix to return results from. Does not check that the timestamp exists.
- getTopPhrases() : array<string|int, mixed>
- Given a page summary, extract the words from it and try to find documents which match the most relevant words. The algorithm for "relevant" is pretty weak. For now we pick the $num words whose ratio of number of occurrences in the crawl item / number of occurrences in all documents is the largest
- historyUI() : DOMElement
- User Interface for history feature
- imageCachePage() : string
- Makes an HTML web page for an image cache item
- initializeAdFields() : mixed
- If external source advertisements are present in the output of this controller this function can be used to initialize the field variables used to write the appropriate Javascripts
- initializeIndexInfo() : array<string|int, mixed>
- Determines which crawl or mix timestamp should be in use for this query. It also determines and returns info associated with this timestamp.
- initializeResponseFormat() : array<string|int, mixed>
- Determines how this query is being run and returns variables for the view
- initializeSubsearches() : array<string|int, mixed>
- Determines if query results are using a subsearch and, if so, initializes them. It also sets up the list of subsearches to draw at the top of the screen.
- initializeUserAndDefaultActivity() : array<string|int, mixed>
- Determines the kind of user session that this search request is for
- makeMediaGroups() : array<string|int, mixed>
- Groups together search result pages which have thumbnails from an array of search pages. Grouped thumbnail pages are stored at the array index of the first thumbnail found; non-thumbnail pages are stored where they were before
- markChildren() : object
- Used in rendering a cached web page to highlight the search terms.
- mirrorHandle() : bool
- Only used for serial network queries. Used to check if there are any mirrors of the current server.
- model() : mixed
- Dynamic loader for Model objects which might live on the current Controller
- pagingLogic() : mixed
- When an activity involves displaying tabular data (such as rows of users, groups, etc), this method might be called to set up $data fields for next, prev, and page links, it also makes the call to the model to get the row data sorted and restricted as desired. For some data sources, rather than directly make a call to the model to get the data it might be passed directly to this method.
- parsePageHeadVars() : array<string|int, mixed>
- Used to parse head meta variables out of a data string provided either from a wiki page or a static page. Meta data is stored in lines before the first occurrence of END_HEAD_VARS. Head variables are name=value pairs. An example of a head variable might be: title = This web page's title. Anything after a semi-colon on a line in the head section is treated as a comment
- parsePageHeadVarsView() : mixed
- Used to set up the head variables and page_data of a wiki or static page associated with a view.
- plugin() : mixed
- Dynamic loader for Plugin objects which might live on the current Controller
- processQuery() : mixed
- Searches the database for the most relevant pages for the supplied search terms. Renders the results to the HTML page.
- processRequest() : mixed
- This is the main entry point for handling a search request.
- queryRequest() : array<string|int, mixed>
- Part of Yioop Search API. Performs a normal search query and returns associative array of query results
- recordViewSession() : mixed
- Used to store in a session which media list items have been viewed so we can put an indicator by them when the media list is rendered
- redirectLocation() : mixed
- Method to perform a 301 redirect to $location in both web server and CLI settings
- redirectWithMessage() : mixed
- Does a 301 redirect to the given location and sets a session variable to display a message when we get there.
- relatedRequest() : array<string|int, mixed>
- Part of Yioop Search API. Performs a search for pages related to a given url and returns an associative array of query results
- restrictQueryByUserAgent() : string
- Sometimes robots disobey the statistics page nofollow meta tag.
- setupGraphicalCaptchaViewData() : mixed
- Sets up the graphical captcha view. Draws the string for the graphical captcha
- toggleHistory() : mixed
- The history toggle displays the year and month associated with the timestamp at which the page was cached.
- view() : mixed
- Dynamic loader for View objects which might live on the current Controller
- viewLinksByYearMonth() : DOMElement
- Display links based on selected year and month in History UI
Properties
$activities
Says which activities (roughly, methods invoked from the web) this controller will respond to
public
array<string|int, mixed>
$activities
= ["query", "cache", "chart", "related", "signout", "recordClick", "trending"]
$activity_component
Associative array of activity => component activity is on, used by @see Controller::call method to actually invoke a given activity on a given component
public
array<string|int, mixed>
$activity_component
= []
$component_activities
Associative array of $components activities for this controller. Components are collections of activities (a little like traits) which can be reused.
public
static array<string|int, mixed>
$component_activities
= []
$component_instances
Array of instances of components used by this controller
public
array<string|int, mixed>
$component_instances
$image_subsearch_enabled
Flag to indicate if image subsearches are enabled
public
bool
$image_subsearch_enabled
$model_instances
Array of instances of models used by this controller
public
array<string|int, mixed>
$model_instances
$plugin_instances
Array of instances of indexing_plugins used by this controller
public
array<string|int, mixed>
$plugin_instances
$subsearch_default_query
Default query to use if user doesn't provide one for the current subsearch
public
int
$subsearch_default_query
= ""
$subsearch_identifier
The localization identifier for the current subsearch
public
string
$subsearch_identifier
= ""
$subsearch_name
Name of the sub-search currently in use
public
string
$subsearch_name
= ""
$subsearch_per_page
Default number of results to display for the current subsearch
public
int
$subsearch_per_page
= 10
$video_subsearch_enabled
Flag to indicate if video subsearches are enabled
public
bool
$video_subsearch_enabled
$view_instances
Array of instances of views used by this controller
public
array<string|int, mixed>
$view_instances
= []
$web_site
Stores a reference to the web server when Yioop runs in CLI mode; it acts as a request router in non-CLI mode.
public
WebSite
$web_site
In CLI mode, it is useful for caching files in RAM as they are read
Methods
__construct()
In addition to calling the base class' constructor, set up FileCache objects if we're configured to do query caching
public
__construct([WebSite $web_site = null ]) : mixed
Parameters
- $web_site : WebSite = null
-
is the web server when Yioop runs in CLI mode; it acts as a request router in non-CLI mode. In CLI mode, it is useful for caching files in RAM as they are read
Return values
mixed —
addCacheJavascriptTags()
Add to supplied node subnodes containing script tags for javascript libraries used to display cache pages
public
addCacheJavascriptTags(DOMDocument $dom, DomElement &$node) : mixed
Parameters
- $dom : DOMDocument
-
used to create new nodes
- $node : DomElement
-
what to add script node to
Return values
mixed —
addDifferentialPrivacy()
Adds to an integer, $actual_value, epsilon-noise taken from an L_1 gaussian source centered at $actual_value to get an epsilon-private integer value.
public
addDifferentialPrivacy(int $actual_value) : int
Parameters
- $actual_value : int
-
number we want to make private
Return values
int —$fuzzy_value number after noise added
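The noise step can be illustrated with a minimal sketch. It assumes that "L_1 gaussian" refers to the Laplace distribution commonly used for epsilon-differential privacy; the function name sketchAddLaplaceNoise and the explicit $epsilon parameter are hypothetical and are not part of SearchController.

```php
<?php
/**
 * Hypothetical helper (not Yioop's implementation): adds Laplace noise
 * of scale 1/$epsilon to an integer and rounds back to an integer.
 */
function sketchAddLaplaceNoise(int $actual_value, float $epsilon = 0.5): int
{
    $scale = 1.0 / $epsilon;
    // Uniform draw strictly inside (-0.5, 0.5) so the logarithm stays finite.
    $u = mt_rand(1, mt_getrandmax() - 1) / mt_getrandmax() - 0.5;
    $sign = ($u < 0) ? -1 : 1;
    // Inverse-CDF sampling of a Laplace(0, $scale) random variable.
    $noise = -$scale * $sign * log(1 - 2 * abs($u));
    return (int) round($actual_value + $noise);
}

// A smaller $epsilon gives more noise; the output varies from run to run.
$fuzzy_value = sketchAddLaplaceNoise(100);
```

A larger $epsilon means less noise and weaker privacy, so the value chosen here is only a placeholder.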
addKeywordLinks()
Function used to add links for keyword searches in the keyword_links array of $cache_item to the text of the $web_page we are going to display the cache of as part of a cache page request
public
addKeywordLinks(string $web_page, array<string|int, mixed> &$cache_item) : string
Parameters
- $web_page : string
-
to add links to
- $cache_item : array<string|int, mixed>
-
original cache item the web page was generated from
Return values
string —modified web page
addLandingHighlights()
Adds data about the currently computed landing highlights to the view $data variable so the view can draw this information
public
addLandingHighlights(array<string|int, mixed> &$data) : mixed
Parameters
- $data : array<string|int, mixed>
-
of fields to be used by the view for drawing
Return values
mixed —
addSearchViewData()
Prepares the array $data so the SearchView can draw search results
public
addSearchViewData(array<string|int, mixed> $index_info, bool $no_query, int $raw, string $view, array<string|int, mixed> $subsearches, array<string|int, mixed> &$data) : mixed
Parameters
- $index_info : array<string|int, mixed>
-
an array of info about the index in use
- $no_query : bool
-
true in the case of a news subsearch when no query was entered by the user but still want to display news
- $raw : int
-
what kind of grouping of identical results should be done (0 is default, 1 and higher used for internal queries)
- $view : string
-
name of view class search results are for
- $subsearches : array<string|int, mixed>
-
an array of data about each subsearch to draw to the view
- $data : array<string|int, mixed>
-
that will eventually be sent to the view for rendering. This method adds fields to the array
Return values
mixed —
baseLink()
Used to create the base link for links to be displayed on caches of web pages. This link points to yioop because links on cache pages are to other cache pages
public
baseLink() : string
Return values
string —desired base link
cacheRequest()
Part of Yioop Search API. Retrieves the cached copy of the page at a given url and returns its contents as a string
public
cacheRequest(string $url[, array<string|int, mixed> $ui_flags = [] ][, string $terms = "" ], string $crawl_time) : string
Parameters
- $url : string
-
to get cached page for
- $ui_flags : array<string|int, mixed> = []
-
array of ui features which should be added to the cache page. For example, "highlight" would say search terms should be highlighted, "history" says add history navigation for all copies of this cache page in yioop system.
- $terms : string = ""
-
space separated list of search terms
- $crawl_time : string
-
timestamp of crawl to look for cached page in
Return values
string —with contents of cached page
cacheRequestAndOutput()
Used to get and render a cached web page
public
cacheRequestAndOutput(string $url[, array<string|int, mixed> $ui_flags = [] ][, string $terms = "" ], int $crawl_time) : mixed
Parameters
- $url : string
-
the url of the page to find the cached version of
- $ui_flags : array<string|int, mixed> = []
-
array of ui features which should be added to the cache page. For example, "highlight" would say search terms should be highlighted, "history" says add history navigation for all copies of this cache page in yioop system. "summaries" says add a toggle headers and extracted summaries link. "cache_link_referrer" says a link on a cache page referred us to the current cache request
- $terms : string = ""
-
from original query responsible for cache request
- $crawl_time : int
-
the timestamp of the crawl to look up the cached page in
Return values
mixed —
calculateControlWords()
Extracts from the query string any control words: mix:, m:, raw:, no: and returns an array consisting of the query with these words removed, and then variables for their values.
public
calculateControlWords(string $query, bool $raw, bool $is_mix, string $index_name) : array<string|int, mixed>
Parameters
- $query : string
-
original query string
- $raw : bool
-
the $_REQUEST['raw'] value
- $is_mix : bool
-
if the current index name is that of a crawl mix
- $index_name : string
-
timestamp of current mix or index
Return values
array<string|int, mixed> —($query, $raw, $network_allowed, $use_cache_if_possible, $guess_semantics)
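As a rough illustration of stripping control words from a query, the sketch below pulls out any mix:, m:, raw:, or no: prefixed tokens with a regular expression. The function name sketchExtractControlWords and its two-element return shape are hypothetical; the real method returns the tuple listed above and applies rules not shown on this page.

```php
<?php
/**
 * Hypothetical helper: separates control words of the form name:value
 * from the rest of the query. Not the actual Yioop parsing logic.
 */
function sketchExtractControlWords(string $query): array
{
    // Match a control word only at the start of a whitespace-delimited token.
    $pattern = '/(?<!\S)(mix|m|raw|no):(\S+)/';
    $found = [];
    preg_match_all($pattern, $query, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $found[$match[1]] = $match[2];
    }
    // Remove the control words, then collapse the leftover whitespace.
    $stripped = preg_replace($pattern, ' ', $query);
    $stripped = trim(preg_replace('/\s+/', ' ', $stripped));
    return [$stripped, $found];
}

list($clean_query, $control_words) =
    sketchExtractControlWords("raw:1 yioop search m:1234");
```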
call()
Used to invoke an activity method of the current controller or one of its components
public
call(string $activity[, string $modifiers = [] ]) : mixed
Parameters
- $activity : string
-
method to invoke
- $modifiers : string = []
-
access modifiers for executing this method
Return values
mixed —
canonicalizeLinks()
Make relative links canonical with respect to the provided $url for links that appear within the DOM node.
public
canonicalizeLinks(object $node, string $url) : object
Parameters
- $node : object
-
dom node to fix links for
- $url : string
-
url to use to canonicalize links
Return values
object —updated dom node
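One way to picture link canonicalization is the simplified sketch below, which rewrites href attributes of anchor tags under a DOM element. It is an assumption-laden illustration: sketchCanonicalizeLinks is a hypothetical name, only `a` tags are handled, and "../" segments are not normalized.

```php
<?php
/**
 * Hypothetical helper: makes relative hrefs under $node absolute with
 * respect to $url. Simplified; not the actual Yioop implementation.
 */
function sketchCanonicalizeLinks(DOMElement $node, string $url): DOMElement
{
    $parts = parse_url($url);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    $path = $parts['path'] ?? '/';
    // Directory portion of the page path, always ending in "/".
    $directory = substr($path, 0, strrpos($path, '/') + 1);
    foreach ($node->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href === '' || preg_match('/^[a-z][a-z0-9+.-]*:/i', $href)) {
            continue; // empty, or already has a scheme
        }
        if (strncmp($href, '//', 2) === 0) {
            // Scheme-relative link: reuse the scheme of the page.
            $anchor->setAttribute('href', $parts['scheme'] . ':' . $href);
        } elseif ($href[0] === '/') {
            // Root-relative link.
            $anchor->setAttribute('href', $origin . $href);
        } else {
            // Path-relative link.
            $anchor->setAttribute('href', $origin . $directory . $href);
        }
    }
    return $node;
}
```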
checkCSRFTime()
Checks if the timestamp in $_REQUEST[$token_name] matches the timestamp of the last CSRF token accessed by this user for the kind of activity for which there might be a conflict.
public
checkCSRFTime(string $token_name[, string $action = "" ]) : bool
This is to avoid accidental replays of postings, etc. if the back button is used.
Parameters
- $token_name : string
-
name of a $_REQUEST field used to hold a CSRF_TOKEN
- $action : string = ""
-
name of current action to check for conflicts
Return values
bool —whether a conflicting action has occurred.
checkCSRFToken()
Checks if the form CSRF (cross-site request forgery preventing) token matches the given user and has not expired (tokens expire after 1 hour)
public
checkCSRFToken(string $token_name, string $user_id[, bool $use_name_as_passed = false ]) : bool
Parameters
- $token_name : string
-
attribute of $_REQUEST containing CSRFToken
- $user_id : string
-
user id of the user to check the token for
- $use_name_as_passed : bool = false
-
whether to use $token_name as the token (if true) or to use $_REQUEST[$token_name]
Return values
bool —whether the CSRF token was valid
checkRequest()
Checks whether a request is for a valid activity and whether it uses the correct authorization key
public
checkRequest() : bool
Return values
bool —whether the request was valid or not
clean()
Used to clean strings that might be tainted because they originate from the user
public
clean(mixed $value, mixed $type[, mixed $default = null ]) : string
Parameters
- $value : mixed
-
tainted data
- $type : mixed
-
type of data in value; can be one of the following strings: bool, color, double, float, int, hash, string, or web-url; or it can be an array listing allowed values. If the latter, then if the value is not in the array, the cleaned value will be the first element of the array if $default is null
- $default : mixed = null
-
if $value is not set, the default value is returned; this isn't used much since, if error_reporting is E_ALL or -1, you would still get a Notice.
Return values
string —the clean input matching the type provided
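The type-driven cleaning described above might look roughly like the following sketch. sketchClean is a hypothetical stand-in that covers only a few of the listed types and returns typed values rather than always a string; the real method's handling of color, hash, and web-url is not shown here.

```php
<?php
/**
 * Hypothetical helper: cleans a tainted value according to $type.
 * Assumes a non-empty list when $type is an array of allowed values.
 */
function sketchClean($value, $type, $default = null)
{
    if (is_array($type)) {
        if (in_array($value, $type, true)) {
            return $value;
        }
        // Value not allowed: fall back to $default, else the first entry.
        return $default ?? $type[0];
    }
    switch ($type) {
        case 'bool':
            return filter_var($value, FILTER_VALIDATE_BOOLEAN);
        case 'int':
            return intval($value);
        case 'float':
        case 'double':
            return floatval($value);
        case 'string':
        default:
            // Escape HTML special characters so the value is safe to echo.
            return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    }
}
```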
clearQuerySavepoint()
Query timestamps can be used to save an iteration position in a set of query results. This method allows one to delete the supplied save point.
public
clearQuerySavepoint(int $save_timestamp) : mixed
Parameters
- $save_timestamp : int
-
a previously saved query timestamp to delete
Return values
mixed —
component()
Dynamic loader for Component objects which might live on the current Controller
public
component(string $component) : mixed
Parameters
- $component : string
-
name of component to return
Return values
mixed —
convertArrayLines()
Converts an array of lines of strings into a single string with proper newlines, each line having been trimmed and potentially cleaned
public
convertArrayLines(array<string|int, mixed> $arr[, string $endline_string = "\n" ][, bool $clean = false ]) : string
Parameters
- $arr : array<string|int, mixed>
-
the array of lines to be processed
- $endline_string : string = "\n"
-
what string should be used to indicate the end of a line
- $clean : bool = false
-
whether to clean each line
Return values
string —a concatenated string of cleaned lines
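A minimal sketch of this kind of line joining is given below. sketchConvertArrayLines is a hypothetical name, htmlspecialchars stands in for whatever cleaning the controller applies, and whether the real method appends a trailing end-of-line string is not stated here.

```php
<?php
/**
 * Hypothetical helper: trims each line, optionally escapes it, and
 * joins the lines with $endline_string between them.
 */
function sketchConvertArrayLines(array $arr, string $endline_string = "\n",
    bool $clean = false): string
{
    $lines = [];
    foreach ($arr as $line) {
        $line = trim($line);
        if ($clean) {
            // Stand-in for the controller's cleaning step (an assumption).
            $line = htmlspecialchars($line, ENT_QUOTES, 'UTF-8');
        }
        $lines[] = $line;
    }
    return implode($endline_string, $lines);
}
```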
convertStringCleanArray()
Cleans a string consisting of lines, typically of urls into an array of clean lines. This is used in handling data from the crawl options text areas. # is treated as a comment
public
convertStringCleanArray(string $str[, string $line_type = "url" ]) : array<string|int, mixed>
Parameters
- $str : string
-
contains the url data
- $line_type : string = "url"
-
does additional cleaning depending on the type of the lines. For instance, if it is "url" then a line not beginning with a url scheme will have http:// prepended.
Return values
array<string|int, mixed> —$lines an array of clean lines
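The behavior described for url lines can be sketched as follows. This is an illustration under assumptions: sketchConvertStringCleanArray is a hypothetical name, and only lines whose first non-blank character is # are treated as comments, since urls may legitimately contain # fragments.

```php
<?php
/**
 * Hypothetical helper: splits text-area input into clean lines,
 * skipping blank lines and # comment lines.
 */
function sketchConvertStringCleanArray(string $str,
    string $line_type = 'url'): array
{
    $lines = [];
    // \R matches any newline sequence (\n, \r\n, or \r).
    foreach (preg_split('/\R/', $str) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        // For url lines, prepend http:// when no scheme is present.
        if ($line_type === 'url'
            && !preg_match('/^[a-z][a-z0-9+.-]*:\/\//i', $line)) {
            $line = 'http://' . $line;
        }
        $lines[] = $line;
    }
    return $lines;
}
```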
crawlItemSummary()
Generates a string representation of a crawl item suitable for output in a cache page
public
crawlItemSummary(array<string|int, mixed> $crawl_item) : string
Parameters
- $crawl_item : array<string|int, mixed>
-
summary information of a web page (title, description, etc)
Return values
string —suitable string formatting of item
createDomBoxNode()
Creates a bordered tag (usually div) in which to put meta content on a page when it is displayed
public
createDomBoxNode(DOMDocument $dom, string $text_align[, string $more_styles = "" ][, string $tag = "div" ]) : DOMElement
Parameters
- $dom : DOMDocument
-
representing cache page
- $text_align : string
-
whether doc is ltr or rtl
- $more_styles : string = ""
-
any additional styles for box
- $tag : string = "div"
-
base tag of box (default div)
Return values
DOMElement —of styled box
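For orientation, a bordered box node can be built with PHP's DOM extension along these lines. The sketch assumes $text_align is passed straight through as a CSS text-align value and uses made-up border and padding styles; sketchCreateDomBoxNode is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: creates a bordered element for meta content.
 * The specific styles are illustrative assumptions.
 */
function sketchCreateDomBoxNode(DOMDocument $dom, string $text_align,
    string $more_styles = '', string $tag = 'div'): DOMElement
{
    $box = $dom->createElement($tag);
    $box->setAttribute('style', "border: 1px solid black; padding: 5px; " .
        "text-align: $text_align; $more_styles");
    return $box;
}

$dom = new DOMDocument();
$box = sketchCreateDomBoxNode($dom, 'left', 'margin: 4px;');
```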
createHistoryDataStructure()
Creates a data structure for storing years, months and associated timestamp components
public
createHistoryDataStructure(array<string|int, mixed> $all_crawl_times, string $locale_type, string $url) : array<string|int, mixed>
Parameters
- $all_crawl_times : array<string|int, mixed>
-
is an array storing all crawl times
- $locale_type : string
-
is the locale tag
- $url : string
-
is the URL for the cached page
Return values
array<string|int, mixed> —$results is an array storing years array, months array and the combined data structure for the History UI
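A year and month grouping of crawl timestamps might be organized as in the sketch below. The exact shape of the real $results array is not documented on this page, so the nested year => month => timestamps layout and the name sketchHistoryDataStructure are assumptions; locale handling and the $url argument are omitted.

```php
<?php
/**
 * Hypothetical helper: groups Unix timestamps by UTC year and month.
 * Returns [years, months, nested year => month => timestamps array].
 */
function sketchHistoryDataStructure(array $all_crawl_times): array
{
    $time_ds = [];
    foreach ($all_crawl_times as $timestamp) {
        $year = (int) gmdate('Y', $timestamp);
        $month = (int) gmdate('n', $timestamp);
        $time_ds[$year][$month][] = $timestamp;
    }
    ksort($time_ds);
    $years = array_keys($time_ds);
    $months = [];
    foreach ($time_ds as $year => $by_month) {
        ksort($time_ds[$year]);
        $months = array_merge($months, array_keys($by_month));
    }
    // Distinct months seen in any year, in ascending order.
    $months = array_values(array_unique($months));
    sort($months);
    return [$years, $months, $time_ds];
}
```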
createLinkDivs()
Create divs for links based on all (year, month) combinations
public
createLinkDivs(array<string|int, mixed> $time_ds, string $current_year, string $current_month, DOMElement $d1, DOMDocument $dom, string $url, array<string|int, mixed> $years, bool $hist_ui_open, string $terms, long $crawl_time) : DOMElement
Parameters
- $time_ds : array<string|int, mixed>
-
is the data structure for History UI
- $current_year : string
-
is the year associated with the timestamp of the cached page
- $current_month : string
-
is the month associated with the timestamp of the cached page
- $d1 : DOMElement
-
is the section that contains options for years and months
- $dom : DOMDocument
-
is the DOM for the cached page
- $url : string
-
is the URL for the cached page
- $years : array<string|int, mixed>
-
is an array storing years associated with all indexes
- $hist_ui_open : bool
-
checks if the History UI state should be open
- $terms : string
-
is a string containing the query terms
- $crawl_time : long
-
is the crawl time for the cached page
Return values
DOMElement —$d1 is the section containing the options for selecting year and month
createSummaryAndToggleNodes()
Creates the toggle link and hidden div for extracted header and summary element on cache pages
public
createSummaryAndToggleNodes(DOMDocument $dom, string $text_align, DOMElement $body, string $summary_string, array<string|int, mixed> $cache_item) : DOMElement
Parameters
- $dom : DOMDocument
-
used to create new nodes to add to body object for page
- $text_align : string
-
whether rtl or ltr language
- $body : DOMElement
-
represent body of cached page
- $summary_string : string
-
header and summary that were extracted
- $cache_item : array<string|int, mixed>
-
contains info about the cached item
Return values
DOMElement —a div node with toggle link and hidden div
displayView()
Send the provided view to output, drawing it with the given data variable, using the current locale for translation, and writing mode
public
displayView(string $view, array<string|int, mixed> $data) : mixed
Parameters
- $view : string
-
the name of the view to draw
- $data : array<string|int, mixed>
-
an array of values to use in drawing the view
Return values
mixed —
extractActivityQuery()
This method is responsible for parsing out the kind of query from the raw query string
public
extractActivityQuery() : array<string|int, mixed>
This method parses the raw query string for query activities. It parses the name of each activity and its argument
Return values
array<string|int, mixed> —list of search activities parsed out of the search string
fieldRequest()
A fieldRequest is a special kind of cache request in which only one field (usually the favicon field) of a crawl item is desired for a particular crawl time from a list of $crawl_items of a similar type from several different crawl times. This method takes a $request_field, a $crawl_time, and an associative array $crawl_items of pairs timestamp => $crawl_item, and outputs the desired field to the current stream with HTTP headers of the appropriate type (in the case of favicons, it does some processing to make an image out of a data url).
public
fieldRequest(string $request_field, int $crawl_time, array<string|int, mixed> $crawl_items) : mixed
Parameters
- $request_field : string
-
field desired from the crawl_item
- $crawl_time : int
-
timestamp of crawl_item in list of $crawl_items
- $crawl_items : array<string|int, mixed>
-
pairs timestamp => $crawl_item of crawl item to look through
Return values
mixed —
formatCachePage()
Formats a cache of a web page (adds history ui and highlight keywords)
public
formatCachePage(array<string|int, mixed> $cache_item, string $cache_file, string $url, string $summary_string, int $crawl_time, array<string|int, mixed> $all_crawl_times, string $terms, array<string|int, mixed> $ui_flags) : mixed
Parameters
- $cache_item : array<string|int, mixed>
-
details meta information about the cache page
- $cache_file : string
-
contains current web page before formatting
- $url : string
-
that cache web page was originally from
- $summary_string : string
-
summary data that was extracted from the web page to be put in the actual inverted index
- $crawl_time : int
-
timestamp of crawl cache page was from
- $all_crawl_times : array<string|int, mixed>
-
timestamps of all crawl times currently in Yioop system
- $terms : string
-
from original query responsible for cache request
- $ui_flags : array<string|int, mixed>
-
array of ui features which should be added to the cache page. For example, "highlight" would say search terms should be highlighted, "history" says add history navigation for all copies of this cache page in yioop system.
Return values
mixed —string of formatted cached page
generateCSRFToken()
Generates a cross site request forgery preventing token based on the provided user name, the current time and the hidden AUTH_KEY
public
generateCSRFToken(string $user) : string
Parameters
- $user : string
-
username to use to generate token
Return values
string —a csrf token
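The token scheme described for generateCSRFToken(), checkCSRFToken(), and getCSRFTime() can be approximated by the sketch below. Everything concrete in it is an assumption: the HMAC-SHA256 construction, the "hash*timestamp" layout, the constant names, and the explicit $now parameter (added so the expiry check is easy to exercise) are hypothetical and not taken from Yioop.

```php
<?php
// Hypothetical stand-ins for the hidden AUTH_KEY and the 1 hour lifetime.
const SKETCH_AUTH_KEY = 'hypothetical-secret';
const SKETCH_TOKEN_LIFETIME = 3600;

/** Builds a token of the form "<hmac>*<timestamp>" for $user. */
function sketchGenerateCSRFToken(string $user, int $now): string
{
    return hash_hmac('sha256', $user . $now, SKETCH_AUTH_KEY) . '*' . $now;
}

/** Returns just the timestamp portion of a token (0 if malformed). */
function sketchGetCSRFTime(string $token): int
{
    $parts = explode('*', $token);
    return (count($parts) === 2 && ctype_digit($parts[1]))
        ? (int) $parts[1] : 0;
}

/** Checks that $token belongs to $user and is at most one hour old. */
function sketchCheckCSRFToken(string $token, string $user, int $now): bool
{
    $time = sketchGetCSRFTime($token);
    if ($time === 0 || $now - $time > SKETCH_TOKEN_LIFETIME) {
        return false;
    }
    // Recompute the expected token and compare in constant time.
    return hash_equals(sketchGenerateCSRFToken($user, $time), $token);
}
```

Embedding the timestamp in the token lets the check recompute the hash without storing anything server-side.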
getAccessModifiers()
Returns an array of the possible modifiers to the access to the activity in question.
public
getAccessModifiers(string $activity) : array<string|int, mixed>
Parameters
- $activity : string
-
method to get access modifier list for
Return values
array<string|int, mixed> —of string names => translated names of the access modifiers for the method in question (if any exist).
getCrawlItems()
Get crawl items based on queue server setting.
public
getCrawlItems(string $url, array<string|int, mixed> $crawl_times, array<string|int, mixed> $queue_servers) : array<string|int, mixed>
Parameters
- $url : string
-
is the URL of the cached page
- $crawl_times : array<string|int, mixed>
-
is an array storing crawl times for all indexes
- $queue_servers : array<string|int, mixed>
-
is an array containing URLs for queue servers
Return values
array<string|int, mixed> —[$all_crawl_times, $all_crawl_items] is an array containing an array of crawl times and an array of their respective crawl items
getCSRFTime()
Used to return just the timestamp portion of the CSRF token
public
getCSRFTime(string $token_name) : int
Parameters
- $token_name : string
-
name of a $_REQUEST field used to hold a CSRF_TOKEN
Return values
int —the timestamp portion of the CSRF_TOKEN
getIndexingPluginList()
Used to get a list of all available indexing plugins for this Yioop instance.
public
getIndexingPluginList() : mixed
Return values
mixed —
getIndexTimestamp()
Finds the timestamp of the main crawl or mix to return results from. Does not check that the timestamp exists.
public
getIndexTimestamp() : string
Return values
string —current timestamp
getTopPhrases()
Given a page summary, extract the words from it and try to find documents which match the most relevant words. The algorithm for "relevant" is pretty weak. For now we pick the $num many words whose ratio (number of occurrences in crawl item) / (number of occurrences in all documents) is the largest.
public
getTopPhrases(string $crawl_item, int $num, int $crawl_time) : array<string|int, mixed>
Parameters
- $crawl_item : string
-
a page summary
- $num : int
-
number of key phrases to return
- $crawl_time : int
-
the timestamp of an index to use, if 0 then default used
Return values
array<string|int, mixed> —an array of most selective key phrases
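The ratio-based selection described for getTopPhrases() can be sketched as follows. The function name and the `$corpus_counts` map (word => number of occurrences in all documents) are hypothetical stand-ins for whatever the index provides; the real method works against a crawl index rather than an in-memory array.

```php
<?php
/**
 * Sketch only: ranks the words of a summary by
 * (occurrences in the summary) / (occurrences in all documents)
 * and returns the $num words with the largest ratio.
 */
function sketchTopWords(string $summary, int $num, array $corpus_counts): array
{
    $words = preg_split('/\W+/', strtolower($summary), -1,
        PREG_SPLIT_NO_EMPTY);
    $local_counts = array_count_values($words);
    $ratios = [];
    foreach ($local_counts as $word => $count) {
        // Treat unseen words as occurring once to avoid division by zero
        $ratios[$word] = $count / max(1, $corpus_counts[$word] ?? 1);
    }
    arsort($ratios); // largest ratio first
    // Numeric-looking words become int keys, so cast back to strings
    return array_map('strval', array_slice(array_keys($ratios), 0, $num));
}
```

Words that are common everywhere (such as "the") get a small ratio and drop out, while words that are frequent in the summary but rare overall rise to the top.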
historyUI()
User Interface for history feature
public
historyUI(long $crawl_time, array<string|int, mixed> $all_crawl_times, DOMElement $div_node, DOMDocument $dom, string $terms, bool $hist_ui_open, string $url) : DOMElement
Parameters
- $crawl_time : long
-
is the crawl time
- $all_crawl_times : array<string|int, mixed>
-
is an array storing all crawl times
- $div_node : DOMElement
-
is the section that contains the History UI
- $dom : DOMDocument
-
is the DOM of the cached page
- $terms : string
-
is a string containing query terms
- $hist_ui_open : bool
-
is a flag to check if History UI should be open by default
- $url : string
-
is the URL of the page
Return values
DOMElement —the section containing the options for selecting year and month
imageCachePage()
Makes an HTML web page for an image cache item
public
imageCachePage(string $url, array<string|int, mixed> $cache_item, string $cache_file, $queue_servers) : string
Parameters
- $url : string
-
original url of the image
- $cache_item : array<string|int, mixed>
-
details about the image item
- $cache_file : string
-
string with image
- $queue_servers :
-
machines used by yioop for the current index cache item is from. Used to find out urls on which image occurred
Return values
string —an HTML page with the image embedded as a data url
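To illustrate what "embedded as a data url" means, here is a minimal sketch that wraps raw image bytes in an HTML page. The function name and the `$mime_type` parameter are assumptions for this example; the documented method has a different signature and also consults queue servers.

```php
<?php
/**
 * Sketch only: wraps raw image bytes in a minimal HTML page using a
 * data URL. $mime_type is a hypothetical parameter.
 */
function sketchImagePage(string $url, string $image_bytes,
    string $mime_type): string
{
    $data_url = "data:" . $mime_type . ";base64," .
        base64_encode($image_bytes);
    // Escape the original url before placing it in HTML
    $safe_url = htmlspecialchars($url, ENT_QUOTES);
    return "<!DOCTYPE html>\n<html><head><title>$safe_url</title></head>" .
        "<body><img src=\"$data_url\" alt=\"$safe_url\"></body></html>";
}
```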
initializeAdFields()
If external source advertisements are present in the output of this controller this function can be used to initialize the field variables used to write the appropriate Javascripts
public
initializeAdFields(array<string|int, mixed> &$data[, bool $ads_off = false ]) : mixed
Parameters
- $data : array<string|int, mixed>
-
data to be used in drawing the view
- $ads_off : bool = false
-
whether or not ads are turned off so that this method should do nothing
Return values
mixed
initializeIndexInfo()
Determines which crawl or mix timestamp should be in use for this query. It also determines and returns info associated with this timestamp.
public
initializeIndexInfo(bool $web_flag, int $raw, array<string|int, mixed> &$data) : array<string|int, mixed>
Parameters
- $web_flag : bool
-
whether this is a web based query or one from the search API
- $raw : int
-
should validate against list of known crawls or an internal (say network) query that doesn't require validation (faster without).
- $data : array<string|int, mixed>
-
that will eventually be sent to the view. We set the 'its' (index_time_stamp) field here
Return values
array<string|int, mixed> —consisting of index timestamp of crawl or mix in use, $index_info an array of info about that index, and $save_timestamp timestamp of last savepoint, used if this query is for a crawl mix archive crawl.
initializeResponseFormat()
Determines how this query is being run and return variables for the view
public
initializeResponseFormat() : array<string|int, mixed>
A query might be run as a web-based where HTML is expected as the output, an RSS query, an API query, or as a serial query from a name_server or mirror instance back to one of the other queue servers in a Yioop installation. A query might also request different numbers of pages back beginning at different starting points in the result.
Return values
array<string|int, mixed> —consisting of (view to be used to render results, flag for whether html results should be used, int code for what kind of group of similar urls should be done on the results, number of search results to return, start from which result)
initializeSubsearches()
Determines if query results are using a subsearch, and if so initializes them, also it sets up list of subsearches to draw at top of screen.
public
initializeSubsearches() : array<string|int, mixed>
Return values
array<string|int, mixed> —(subsearches, no_query) where subsearches is itself an array of data about each subsearch to draw, and no_query is a bool flag used in the case of a news subsearch when no query was entered by the user but still want to display news
initializeUserAndDefaultActivity()
Determines the kind of user session that this search request is for
public
initializeUserAndDefaultActivity(array<string|int, mixed> &$data) : array<string|int, mixed>
This function is called by @see processRequest(). The user session might be one without a login; one with a login, which needs to be validated to prevent CSRF attacks; one just after someone logged out; or a bot session (googlebot, etc.), in which case the query request is removed.
Parameters
- $data : array<string|int, mixed>
-
that will eventually be sent to the view. We might update with error messages
Return values
array<string|int, mixed> —consisting of (query based on user info, whether highlighting should be used for a cache request, what activity user wants, any arguments to this activity)
makeMediaGroups()
Groups together search result pages which have thumbnails from an array of search pages. Grouped thumbnail pages are stored at the array index of the first thumbnail found; non-thumbnail pages are stored where they were before.
public
makeMediaGroups( $pages) : array<string|int, mixed>
Parameters
Return values
array<string|int, mixed> —[$pages after the grouping has been done, whether images or videos found]
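The grouping rule for makeMediaGroups() can be sketched roughly as below. The function name and the `THUMB` and `GROUP` field names are hypothetical; the actual page fields and the handling of images versus videos are not given in this documentation.

```php
<?php
/**
 * Sketch only: pages with a "THUMB" field (a hypothetical field name)
 * are collected into one group placed at the position of the first such
 * page; the other pages keep their relative order.
 */
function sketchGroupThumbnails(array $pages): array
{
    $out = [];
    $group_index = null;
    foreach ($pages as $page) {
        if (isset($page["THUMB"])) {
            if ($group_index === null) {
                // First thumbnail seen: reserve its slot for the group
                $group_index = count($out);
                $out[] = ["GROUP" => []];
            }
            $out[$group_index]["GROUP"][] = $page;
        } else {
            $out[] = $page;
        }
    }
    return [$out, $group_index !== null];
}
```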
markChildren()
Used in rendering a cached web page to highlight the search terms.
public
markChildren(object $node, array<string|int, mixed> $words, object $dom) : object
Parameters
- $node : object
-
DOM object to mark html elements of
- $words : array<string|int, mixed>
-
an array of words to be highlighted
- $dom : object
-
a DOM object for the whole document
Return values
object —the node modified to now have highlighting
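A minimal sketch of term highlighting in a DOM tree follows, assuming PHP's DOM extension is available. The function name, the choice of a `span` element, and the `highlight` class are assumptions for illustration; the real method may mark nodes differently.

```php
<?php
/**
 * Sketch only: wraps each occurrence of a word from $words found in the
 * text nodes below $node in <span class="highlight">...</span>.
 */
function sketchHighlight(DOMNode $node, array $words,
    DOMDocument $dom): DOMNode
{
    $words = array_filter($words, fn($w) => $w !== "");
    if (empty($words)) {
        return $node;
    }
    $quoted = array_map(fn($w) => preg_quote($w, '/'), $words);
    $pattern = '/(' . implode('|', $quoted) . ')/i';
    // Copy the child list because the tree is modified while walking it
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMText) {
            $pieces = preg_split($pattern, $child->nodeValue, -1,
                PREG_SPLIT_DELIM_CAPTURE);
            if (count($pieces) < 2) {
                continue; // no search term in this text node
            }
            $fragment = $dom->createDocumentFragment();
            foreach ($pieces as $i => $piece) {
                if ($piece === "") {
                    continue;
                }
                if ($i % 2 === 1) { // odd positions hold matched terms
                    $span = $dom->createElement("span");
                    $span->setAttribute("class", "highlight");
                    $span->appendChild($dom->createTextNode($piece));
                    $fragment->appendChild($span);
                } else {
                    $fragment->appendChild($dom->createTextNode($piece));
                }
            }
            $node->replaceChild($fragment, $child);
        } elseif ($child instanceof DOMElement) {
            sketchHighlight($child, $words, $dom);
        }
    }
    return $node;
}
```

Working on text nodes only, rather than on the serialized HTML, avoids accidentally rewriting tag names or attribute values that happen to contain a search term.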
mirrorHandle()
Only used for serial network queries. Used to check if there are any mirrors of the current server.
public
mirrorHandle() : bool
If so, it tries to distribute the query requests randomly amongst the mirrors and itself. To determine if there are mirrors of the current server it looks in a mirror_table.txt file for machines that have notified this machine they are mirroring it.
Return values
bool —whether or not a mirror of the current site handled it
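The random distribution among "the mirrors and itself" could look something like the sketch below. The function name and the convention of returning null for "handle locally" are assumptions; reading mirror_table.txt and forwarding the request are left out.

```php
<?php
/**
 * Sketch only: picks, uniformly at random, either this server or one of
 * its mirrors to handle a request. Returns null when this server should
 * answer itself, otherwise the URL of the chosen mirror.
 */
function sketchPickHandler(array $mirror_urls): ?string
{
    if (empty($mirror_urls)) {
        return null;
    }
    $mirror_urls = array_values($mirror_urls);
    // Index 0 stands for this server, 1..n for the mirrors
    $choice = random_int(0, count($mirror_urls));
    return ($choice === 0) ? null : $mirror_urls[$choice - 1];
}
```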
model()
Dynamic loader for Model objects which might live on the current Controller
public
model(string $model) : mixed
Parameters
- $model : string
-
name of model to return
Return values
mixed
pagingLogic()
When an activity involves displaying tabular data (such as rows of users, groups, etc), this method might be called to set up $data fields for next, prev, and page links, it also makes the call to the model to get the row data sorted and restricted as desired. For some data sources, rather than directly make a call to the model to get the data it might be passed directly to this method.
public
pagingLogic(array<string|int, mixed> &$data, mixed $field_or_model, string $output_field, int $default_show[, array<string|int, mixed> $search_array = [] ][, string $var_prefix = "" ][, array<string|int, mixed> $args = null ]) : mixed
Parameters
- $data : array<string|int, mixed>
-
used to send data to the view will be updated by this method with row and paging data
- $field_or_model : mixed
-
if an object, this is assumed to be a model and so the getRows method of this model is called to get row data, sorted and restricted according to $search_array; if a string then the row data is assumed to be in $data[$field_or_model] and pagingLogic itself does the sorting and restricting.
- $output_field : string
-
output rows for the view will be stored in $data[$output_field]
- $default_show : int
-
if not specified by $_REQUEST, then this will be used to determine the maximum number of rows that will be written to $data[$output_field]
- $search_array : array<string|int, mixed> = []
-
used to sort and restrict in the getRows call or the data from $data[$field_or_model]. Each element of this is a quadruple name of a field, what comparison to perform, a value to check, and an order (ascending/descending) to sort by
- $var_prefix : string = ""
-
if there are multiple uses of pagingLogic presented on the same view then $var_prefix can be prepended to the $data field variables like num_show, start_row, end_row to distinguish between them
- $args : array<string|int, mixed> = null
-
additional arguments that are passed to getRows and in turn to selectCallback, fromCallback, and whereCallback that might provide user_id, etc to further control which rows are returned
Return values
mixed
parsePageHeadVars()
Used to parse head meta variables out of a data string provided either from a wiki page or a static page. Meta data is stored in lines before the first occurrence of END_HEAD_VARS. Head variables are name=value pairs. An example of a head variable might be: title = This web page's title. Anything after a semi-colon on a line in the head section is treated as a comment.
public
parsePageHeadVars(string $page_data[, mixed $with_body = false ]) : array<string|int, mixed>
Parameters
- $page_data : string
-
this is the actual content of a wiki or static page
- $with_body : mixed = false
Return values
array<string|int, mixed> —the associative array of head variables or pair [head vars, page body]
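The head-variable format described above can be parsed along these lines. This is a minimal sketch under the stated rules (lines before END_HEAD_VARS, name=value pairs, semicolon comments); the function name is hypothetical and the real method may treat edge cases differently.

```php
<?php
/**
 * Sketch only: splits page data at the first END_HEAD_VARS marker and
 * reads name=value pairs from the lines before it. Text after a
 * semicolon on a head line is dropped as a comment.
 */
function sketchParseHeadVars(string $page_data,
    bool $with_body = false): array
{
    $parts = explode("END_HEAD_VARS", $page_data, 2);
    $head_vars = [];
    $body = $page_data; // with no marker, everything is body
    if (count($parts) === 2) {
        $body = $parts[1];
        foreach (preg_split('/\R/', $parts[0]) as $line) {
            $line = explode(";", $line, 2)[0]; // strip the comment
            $pair = explode("=", $line, 2);
            if (count($pair) === 2 && trim($pair[0]) !== "") {
                $head_vars[trim($pair[0])] = trim($pair[1]);
            }
        }
    }
    return $with_body ? [$head_vars, $body] : $head_vars;
}
```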
parsePageHeadVarsView()
Used to set up the head variables for and page_data of a wiki or static page associated with a view.
public
parsePageHeadVarsView(object $view, string $page_name, string $page_data) : mixed
Parameters
- $view : object
-
View on which page data will be rendered
- $page_name : string
-
a string name/id to associate with page. For example, might have 404 for a page about 404 errors
- $page_data : string
-
this is the actual content of a wiki or static page
Return values
mixed
plugin()
Dynamic loader for Plugin objects which might live on the current Controller
public
plugin(string $plugin) : mixed
Parameters
- $plugin : string
-
name of Plugin to return
Return values
mixed
processQuery()
Searches the database for the most relevant pages for the supplied search terms. Renders the results to the HTML page.
public
processQuery(array<string|int, mixed> &$data, string $query, string $activity, string $arg, int $results_per_page, int $limit, int $index_name, int $raw, mixed $save_timestamp[, array<string|int, mixed> $ranking_factors = [] ]) : mixed
Parameters
- $data : array<string|int, mixed>
-
an array of view data that will be updated to include at most results_per_page many search results
- $query : string
-
a string containing the words to search on
- $activity : string
-
besides a straight search for words query, one might have other searches, such as a search for related pages. this argument says what kind of search to do.
- $arg : string
-
for a search other than a straight word query this argument provides auxiliary information on how to conduct the search. For instance on a related web page search, it might provide the url of the site with which to perform the related search.
- $results_per_page : int
-
the maximum number of search results that can occur on a page
- $limit : int
-
the first page, among all the pages with the query terms, to return. For instance, if 10 then the tenth highest ranking page for those query terms will be returned, then the eleventh, etc.
- $index_name : int
-
the timestamp of an index to use, if 0 then default used
- $raw : int
-
($raw == 0) normal grouping, ($raw > 0) no grouping done on data. If $raw == 1, no summary is returned (used with f=serial, which an end user probably does not want); in this case, the offset, generation, etc. are returned so the summary could be looked up later.
- $save_timestamp : mixed
-
if this timestamp is nonzero, then save iterate position, so can resume on future queries that make use of the timestamp. $save_timestamp may also be in the format of string timestamp-query_part to handle networked queries involving presentations
- $ranking_factors : array<string|int, mixed> = []
-
fields that say how url, keywords, and title words should influence relevance and doc rank calculations
Return values
mixed
processRequest()
This is the main entry point for handling a search request.
public
processRequest() : mixed
processRequest determines the type of search request (normal request, cache request, or related request), or if a user is returning from the admin panel via signout. It then calls the appropriate method to handle the given activity. Finally, it draws the search screen.
Return values
mixed
queryRequest()
Part of Yioop Search API. Performs a normal search query and returns associative array of query results
public
queryRequest(string $query, int $results_per_page, int $limit, int $grouping, int $save_timestamp[, array<string|int, mixed> $ranking_factors = [] ]) : array<string|int, mixed>
Parameters
- $query : string
-
this can be any query string that could be entered into the search bar on Yioop (other than related: and cache: queries)
- $results_per_page : int
-
number of results to return
- $limit : int
-
first result to return from the ordered query results
- $grouping : int
-
($grouping == 0) normal grouping of links with associated document, ($grouping > 0) no grouping done on data
- $save_timestamp : int
-
if this timestamp is nonzero, then save iterate position, so can resume on future queries that make use of the timestamp
- $ranking_factors : array<string|int, mixed> = []
-
fields that say how url, keywords, and title words should influence relevance and doc rank calculations
Return values
array<string|int, mixed> —associative array of results for the query performed
recordViewSession()
Used to store in a session which media list items have been viewed so we can put an indicator by them when the media list is rendered
public
recordViewSession(int $page_id, string $sub_path, string $media_name) : mixed
Parameters
- $page_id : int
-
the id of page with media list
- $sub_path : string
-
the resource folder on that page
- $media_name : string
-
item to store indicator into session for
Return values
mixed
redirectLocation()
Method to perform a 301 redirect to $location in both the web server and CLI settings
public
redirectLocation(string $location) : mixed
Parameters
- $location : string
-
url to redirect to
Return values
mixed
redirectWithMessage()
Does a 301 redirect to the given location and sets a session variable to display a message on arrival.
public
redirectWithMessage(string $message[, string $copy_fields = false ][, bool $restart = false ][, bool $use_base_url = false ]) : mixed
Parameters
- $message : string
-
message to write
- $copy_fields : string = false
-
$_REQUEST fields to copy for redirect
- $restart : bool = false
-
if yioop is being run as its own server rather than under apache whether to restart this server.
- $use_base_url : bool = false
-
set true if the base_url should be included in the redirect
Return values
mixed
relatedRequest()
Part of Yioop Search API. Performs a search for documents related to a given url and returns an associative array of query results
public
relatedRequest(string $url, int $results_per_page, int $limit, string $crawl_time, int $grouping, int $save_timestamp) : array<string|int, mixed>
Parameters
- $url : string
-
to find related documents for
- $results_per_page : int
-
number of results to return
- $limit : int
-
first result to return from the ordered query results
- $crawl_time : string
-
timestamp of crawl to look for related request
- $grouping : int
-
($grouping == 0) normal grouping of links with associated document, ($grouping > 0) no grouping done on data
- $save_timestamp : int
-
if this timestamp is nonzero, then save iterate position, so can resume on future queries that make use of the timestamp
Return values
array<string|int, mixed> —associative array of results for the query performed
restrictQueryByUserAgent()
Sometimes robots disobey the statistics page nofollow meta tag.
public
restrictQueryByUserAgent(string $query) : string
Such robots need to be stopped before they query the whole index.
Parameters
- $query : string
-
the search request string
Return values
string —the search request string if not a bot; "" otherwise
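A minimal sketch of the "return the query unless the requester is a bot" behavior follows. The function name, the explicit `$user_agent` parameter, and the pattern used to spot crawlers are all illustrative guesses, not Yioop's actual detection rule.

```php
<?php
/**
 * Sketch only: returns the query unchanged for ordinary browsers and ""
 * when the user agent looks like a crawler. The pattern below is an
 * illustrative guess.
 */
function sketchRestrictQuery(string $query, string $user_agent): string
{
    $bot_pattern = '/bot|crawl|spider|slurp/i';
    return preg_match($bot_pattern, $user_agent) === 1 ? "" : $query;
}
```

In a web setting the user agent would typically come from `$_SERVER['HTTP_USER_AGENT']`; it is passed in here so the sketch can run from the command line.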
setupGraphicalCaptchaViewData()
Sets up the graphical captcha view. Draws the string for graphical captcha
public
setupGraphicalCaptchaViewData(array<string|int, mixed> &$data) : mixed
Parameters
- $data : array<string|int, mixed>
-
used by view to draw any dynamic content in this case we append a field "CAPTCHA_IMAGE" with a data url of the captcha to draw.
Return values
mixed
toggleHistory()
The history toggle displays the year and month associated with the timestamp at which the page was cached.
public
toggleHistory(array<string|int, mixed> $months, DOMElement $div_node, DOMDocument $dom) : mixed
Parameters
- $months : array<string|int, mixed>
-
used to store month names for which we have a cache
- $div_node : DOMElement
-
is the section that contains the History UI
- $dom : DOMDocument
-
is the DOM of the cached page
Return values
mixed
view()
Dynamic loader for View objects which might live on the current Controller
public
view(string $view) : mixed
Parameters
- $view : string
-
name of view to return
Return values
mixed
viewLinksByYearMonth()
Display links based on selected year and month in History UI
public
viewLinksByYearMonth(array<string|int, mixed> $years, array<string|int, mixed> $months, string $current_year, string $current_month, array<string|int, mixed> $time_ds, DOMDocument $dom) : DOMElement
Parameters
- $years : array<string|int, mixed>
-
is an array storing years associated with all indexes
- $months : array<string|int, mixed>
-
is an array storing months
- $current_year : string
-
is the year associated with the timestamp of the cached page
- $current_month : string
-
is the month associated with the timestamp of the cached page
- $time_ds : array<string|int, mixed>
-
is the data structure for History UI
- $dom : DOMDocument
-
is the DOM for the cached page
Return values
DOMElement —$d1 is the section containing the options for selecting year and month
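The History UI organizes cached copies by year and month. One plausible shape for such a data structure is sketched below; the function name and the year => month => timestamps layout are assumptions, since the exact format of $time_ds is not described here.

```php
<?php
/**
 * Sketch only: groups Unix timestamps into a
 * year => month name => list of timestamps structure (UTC).
 */
function sketchGroupByYearMonth(array $crawl_times): array
{
    $grouped = [];
    foreach ($crawl_times as $time) {
        $year = gmdate("Y", (int) $time);
        $month = gmdate("F", (int) $time);
        $grouped[$year][$month][] = (int) $time;
    }
    return $grouped;
}
```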