SourceModel
extends ParallelModel
Used to manage data related to video, news, and other search sources. Also used to manage data about available subsearches seen in SearchView.
Table of Contents
- DEFAULT_DESCRIPTION_LENGTH = 150
- Default maximum character length of a search summary
- MAX_SNIPPET_TITLE_LENGTH = 20
- MIN_DESCRIPTION_LENGTH = 100
- the minimum length of a description before we stop appending additional link doc summaries
- MIN_SNIPPET_LENGTH = 100
- SNIPPET_LENGTH_LEFT = 20
- SNIPPET_LENGTH_RIGHT = 40
- SNIPPET_TITLE_LENGTH = 20
- $any_fields : array<string|int, mixed>
- These fields, if present in $search_array (used by getRows()) but with value "-1", will be skipped as part of the WHERE clause but will still be used for the ORDER BY clause
- $cache : object
- Cache object to be used if we are doing caching
- $current_machine : int
- If known the id of the queue_server this belongs to
- $db : object
- Reference to a DatasourceManager
- $db_name : string
- Name of the search engine database
- $edited_page_summaries : array<string|int, mixed>
- Associative array of page summaries which might be used to override default page summaries if set.
- $index_name : string
- Stores the name of the current index archive to use to get search results from
- $private_db : object
- Reference to a private DatasourceManager
- $private_db_name : string
- Name of the private search engine database
- $search_table_column_map : array<string|int, mixed>
- Associations of the form name of field for web forms => database column names/abbreviations
- $web_site : object
- Reference to a WebSite object in use to serve pages (if any)
- __construct() : mixed
- Sets up the database manager that will be used and name of the search engine database
- addMediaSource() : mixed
- Used to add a new video, rss, html news, or other sources to Yioop
- addSubsearch() : mixed
- Adds a new subsearch to the list of subsearches. These are displayed at the top of the Yioop search pages.
- boldKeywords() : string
- Given a string, wraps in bold html tags a set of key words it contains.
- clearFeedData() : mixed
- Used to delete any feed data (IndexDataFeed bundle) and trending data in this Yioop installation.
- clearQuerySavePoint() : mixed
- A save point is used to store to disk a sequence of generation-doc-offset pairs of a particular mix query when doing an archive crawl of a crawl mix. This is used so that the mix can remember where it was the next time it is invoked by the web app on the machine in question.
- createIfNecessaryDirectory() : int
- Creates a directory and sets it to world permission if it doesn't already exist
- deleteMediaSource() : mixed
- Deletes the media source whose id is the given timestamp
- deleteSubsearch() : mixed
- Deletes a subsearch from the subsearch table and removes its associated translations
- execMachines() : array<string|int, mixed>
- This method is invoked by other ParallelModel methods (see CrawlModel for examples) when they want to have their method performed on an array of other Yioop instances. The results returned can then be aggregated. The invocation sequence is: crawlModelMethodA invokes execMachines with a list of urls of other Yioop instances. execMachines makes REST requests to those instances with the given command and optional arguments. Each request is handled by a CrawlController, which in turn calls crawlModelMethodA on the given Yioop instance, serializes the result, and gives it back to execMachines, which returns it to the originally calling function.
- fileGetContents() : string
- Either a wrapper for file_get_contents, or if a WebSite object is being used to serve pages, it reads the file in using blocking I/O file_get_contents() and caches it before returning its string contents.
- filePutContents() : mixed
- Either a wrapper for file_put_contents, or if a WebSite object is being used to serve pages, writes $data to the persistent file with name $filename. Saves a copy in the RAM cache if there is a copy already there.
- formatSinglePageResult() : array<string|int, mixed>
- Given a page summary, extracts snippets which are related to a set of search words. For each snippet, bold faces the search terms, and then creates a new summary array.
- fromCallback() : string
- Controls which tables, and the names of those tables, underlie the given model and should be used in a getRows call. SourceModel is used for both media sources and subsearches.
- getCrawlItem() : array<string|int, mixed>
- Gets a summary of a document based on its url, the active machines, and the index we want to look up in.
- getCrawlItems() : array<string|int, mixed>
- Gets summaries for a set of documents by their urls, or by group of 5-tuples of the form (machine, key, index, generation, offset).
- getDbmsList() : array<string|int, mixed>
- Gets a list of all DBMS that work with the search engine
- getMachineHashUrls() : array<string|int, mixed>
- Receives a request to get machine data for an array of hashes of urls
- getMediaCategories() : array<string|int, mixed>
- Returns the media categories of the search sources that have been stored
- getMediaSource() : array<string|int, mixed>
- Return the media source by the name of the source
- getMediaSources() : array<string|int, mixed>
- Returns a list of media sources such as (video, rss sites) and their URL and thumb url formats, etc
- getRows() : array<string|int, mixed>
- Gets a range of rows which match the provided search criteria from the provided table
- getSnippets() : string
- Given a string, extracts snippets of text related to a given set of key words. For a given word, a snippet is a window of characters to its left and right that is less than a maximum total number of characters.
- getSubsearch() : array<string|int, mixed>
- Returns the subsearch with the given folder name
- getSubsearches() : array<string|int, mixed>
- Returns a list of the subsearches used by the current Yioop instances including their names translated to the current locale
- getSubsearchName() : string
- Given the folder name for a subsearch and a locale tag, returns the natural language name of the subsearch in that locale
- getUserId() : string
- Get the user_id associated with a given username (In base class as used as an internal method in both signin and user models)
- isSingleLocalhost() : bool
- Used to determine if an action involves just one yioop instance on the current local machine or not
- loginDbms() : bool
- Returns whether the provided dbms needs a login and password or not (sqlite and sqlite3 do not)
- lookupSummaryOffsetGeneration() : array<string|int, mixed>
- Determines the offset into the summaries WebArchiveBundle and generation of the provided url (or hash_url) so that the info:url (info:base64_hash_url) summary can be retrieved. This assumes of course that the info:url meta word has been stored.
- networkGetCrawlItems() : array<string|int, mixed>
- In a multiple queue server setting, gets summaries for a set of documents by their urls, or by group of 5-tuples of the form (machine, key, index, generation, offset). This makes an execMachines call to make a network request to the CrawlControllers on each machine, which in turn call getCrawlItems (and thence nonNetworkGetCrawlItems) on each machine. The results are then sent back to networkGetCrawlItems and aggregated.
- nonNetworkGetCrawlItems() : array<string|int, mixed>
- Gets summaries on a particular machine for a set of documents by their urls, or by group of 5-tuples of the form (machine, key, index, generation, offset). This may be used in either the single queue_server setting or it may be called indirectly by a particular machine's CrawlController as part of fulfilling a network-based getCrawlItems request. $lookups contains items which are to be grouped (as they came from the same url or site with the same cache), so this function aggregates their descriptions.
- postQueryCallback() : array<string|int, mixed>
- Called after getRows has retrieved all the rows that it would retrieve but before they are returned to give one last place where they could be further manipulated. For example, in MachineModel this callback is used to make parallel network calls to get the status of each machine returned by getRows. The default for this method is to leave the rows that would be returned unchanged
- rowCallback() : array<string|int, mixed>
- Called after a row is retrieved by getRows from the database to perform some manipulation that would be useful for this model.
- searchArrayToWhereOrderClauses() : array<string|int, mixed>
- Creates the WHERE and ORDER BY clauses for a query of a Yioop table such as USERS, ROLE, GROUP, which have associated search web forms. Searches are case insensitive
- selectCallback() : string
- Controls which columns, and the names of those columns, from the tables underlying the given model should be returned from a getRows call.
- translateDb() : mixed
- Used to get the translation of a string_id stored in the database to the given locale.
- updateMediaSource() : mixed
- Used to update the fields stored in a MEDIA_SOURCE row according to an array holding new values
- updateSubsearch() : mixed
- Used to update the fields stored in a SUBSEARCH row according to an array holding new values
- whereCallback() : string
- Controls the WHERE clause of the SQL query that underlies the given model and should be used in a getRows call.
Constants
DEFAULT_DESCRIPTION_LENGTH
Default maximum character length of a search summary
public
mixed
DEFAULT_DESCRIPTION_LENGTH
= 150
MAX_SNIPPET_TITLE_LENGTH
public
mixed
MAX_SNIPPET_TITLE_LENGTH
= 20
MIN_DESCRIPTION_LENGTH
the minimum length of a description before we stop appending additional link doc summaries
public
mixed
MIN_DESCRIPTION_LENGTH
= 100
MIN_SNIPPET_LENGTH
public
mixed
MIN_SNIPPET_LENGTH
= 100
SNIPPET_LENGTH_LEFT
public
mixed
SNIPPET_LENGTH_LEFT
= 20
SNIPPET_LENGTH_RIGHT
public
mixed
SNIPPET_LENGTH_RIGHT
= 40
SNIPPET_TITLE_LENGTH
public
mixed
SNIPPET_TITLE_LENGTH
= 20
Properties
$any_fields
These fields, if present in $search_array (used by getRows()) but with value "-1", will be skipped as part of the WHERE clause but will still be used for the ORDER BY clause
public
array<string|int, mixed>
$any_fields
= []
$cache
Cache object to be used if we are doing caching
public
static object
$cache
$current_machine
If known the id of the queue_server this belongs to
public
int
$current_machine
$db
Reference to a DatasourceManager
public
object
$db
$db_name
Name of the search engine database
public
string
$db_name
$edited_page_summaries
Associative array of page summaries which might be used to override default page summaries if set.
public
array<string|int, mixed>
$edited_page_summaries
= null
$index_name
Stores the name of the current index archive to use to get search results from
public
string
$index_name
$private_db
Reference to a private DatasourceManager
public
object
$private_db
$private_db_name
Name of the private search engine database
public
string
$private_db_name
$search_table_column_map
Associations of the form name of field for web forms => database column names/abbreviations
public
array<string|int, mixed>
$search_table_column_map
= []
$web_site
Reference to a WebSite object in use to serve pages (if any)
public
object
$web_site
Methods
__construct()
Sets up the database manager that will be used and name of the search engine database
public
__construct([string $db_name = CDB_NAME ][, bool $connect = true ][, mixed $web_site = null ]) : mixed
Parameters
- $db_name : string = CDB_NAME
-
the name of the database for the search engine
- $connect : bool = true
-
whether to connect to the database by default after making the datasource class
- $web_site : mixed = null
Return values
mixed
addMediaSource()
Used to add a new video, rss, html news, or other sources to Yioop
public
addMediaSource(string $name, string $source_type, string $source_category, string $source_url, string $aux_info[, string $language = CDEFAULT_LOCALE ]) : mixed
Parameters
- $name : string
-
user-friendly name for media source, for example, New York Times
- $source_type : string
-
whether rss, html, json, regex, feed podcast, or scrape podcast
- $source_category : string
-
whether news, weather, etc.
- $source_url : string
-
url regex of resource (video) or actual resource (rss). This is not quite a real regex: you add } at the location in the url where the name of the particular video should go, for example http://www.youtube.com/watch?v=}& (anything after & is ignored, so whatever lies between = and & will be matched as the name of a video)
- $aux_info : string
-
xpaths or regex to scrape news items or podcast feeds
- $language : string = CDEFAULT_LOCALE
-
the locale tag for the media source (rss)
Return values
mixed
addSubsearch()
Adds a new subsearch to the list of subsearches. These are displayed at the top of the Yioop search pages.
public
addSubsearch(string $folder_name, string $index_identifier, int $per_page[, string $default_query = "" ]) : mixed
Parameters
- $folder_name : string
-
name of subsearch in terms of urls (not translated name that appears in the subsearch bar)
- $index_identifier : string
-
timestamp of crawl or mix to be used for results of subsearch
- $per_page : int
-
number of search results per page when this subsearch is used
- $default_query : string = ""
-
query to use when using subsearch if no query provided by user. For example, for image search this might be the empty query, for news it might be lang:default to get all the news for the default locale.
Return values
mixed
boldKeywords()
Given a string, wraps in bold html tags a set of key words it contains.
public
boldKeywords(string $text, array<string|int, mixed> $words) : string
Parameters
- $text : string
-
haystack string to look for the key words
- $words : array<string|int, mixed>
-
an array of words to bold face
Return values
string —the resulting string after boldfacing has been applied
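The keyword wrapping described above can be sketched in a few lines. This is a minimal illustration and not Yioop's implementation: the function name boldKeywordsSketch, the case-insensitive matching, and the longest-word-first ordering are all assumptions made for the example.

```php
<?php
/**
 * Hypothetical helper (not Yioop's code): wraps each keyword found in
 * $text in <b> tags, matching case-insensitively.
 */
function boldKeywordsSketch(string $text, array $words): string
{
    // Try longer keywords first so "search engine" wins over "search"
    usort($words, fn($a, $b) => strlen($b) <=> strlen($a));
    $patterns = [];
    foreach ($words as $word) {
        if ($word !== "") {
            // preg_quote escapes regex metacharacters in the keyword
            $patterns[] = preg_quote($word, "/");
        }
    }
    if (empty($patterns)) {
        return $text;
    }
    $regex = "/(" . implode("|", $patterns) . ")/ui";
    // preg_replace returns null on error (e.g. invalid UTF-8), so fall back
    return preg_replace($regex, "<b>$1</b>", $text) ?? $text;
}

echo boldKeywordsSketch("Yioop is a search engine", ["search", "yioop"]), "\n";
```

The original capitalization of each match is kept because the replacement reuses the captured text rather than the keyword itself.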
clearFeedData()
Used to delete any feed data (IndexDataFeed bundle) and trending data in this Yioop installation.
public
clearFeedData([array<string|int, mixed> $machine_urls = null ]) : mixed
Parameters
- $machine_urls : array<string|int, mixed> = null
-
a list of machines which are running MediaUpdaters for this instance of Yioop. If empty, it is assumed to be just the Name Server
Return values
mixed
clearQuerySavePoint()
A save point is used to store to disk a sequence of generation-doc-offset pairs of a particular mix query when doing an archive crawl of a crawl mix. This is used so that the mix can remember where it was the next time it is invoked by the web app on the machine in question.
public
clearQuerySavePoint(int $save_timestamp[, array<string|int, mixed> $machine_urls = null ]) : mixed
This function deletes such a save point associated with a timestamp
Parameters
- $save_timestamp : int
-
timestamp of save point to delete
- $machine_urls : array<string|int, mixed> = null
-
machines on which to try to delete savepoint
Return values
mixed
createIfNecessaryDirectory()
Creates a directory and sets it to world permission if it doesn't already exist
public
createIfNecessaryDirectory(string $directory) : int
Parameters
- $directory : string
-
name of directory to create
Return values
int —-1 on failure, 0 if already existed, 1 if created
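The three return codes can be illustrated with a small stand-alone function. This is only a sketch under stated assumptions: the name createDirectorySketch is hypothetical, and "world permission" is interpreted here as mode 0777, which the documentation above does not spell out.

```php
<?php
/**
 * Hypothetical sketch (not Yioop's code) of the -1 / 0 / 1 contract.
 */
function createDirectorySketch(string $directory): int
{
    if (file_exists($directory)) {
        return 0; // already existed
    }
    // @ suppresses the warning mkdir() emits on failure
    if (!@mkdir($directory)) {
        return -1; // could not be created
    }
    // mkdir() is subject to the umask, so set the mode explicitly.
    // 0777 is an assumed reading of "world permission".
    chmod($directory, 0777);
    return 1; // created
}

$dir = sys_get_temp_dir() . "/sketch_" . uniqid();
var_dump(createDirectorySketch($dir)); // first call creates the directory
var_dump(createDirectorySketch($dir)); // second call finds it already there
rmdir($dir);
```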
deleteMediaSource()
Deletes the media source whose id is the given timestamp
public
deleteMediaSource(int $timestamp) : mixed
Parameters
- $timestamp : int
-
of media source to be deleted
Return values
mixed
deleteSubsearch()
Deletes a subsearch from the subsearch table and removes its associated translations
public
deleteSubsearch(string $folder_name) : mixed
Parameters
- $folder_name : string
-
of subsearch to delete
Return values
mixed
execMachines()
This method is invoked by other ParallelModel methods (see CrawlModel for examples) when they want to have their method performed on an array of other Yioop instances. The results returned can then be aggregated. The invocation sequence is: crawlModelMethodA invokes execMachines with a list of urls of other Yioop instances. execMachines makes REST requests to those instances with the given command and optional arguments. Each request is handled by a CrawlController, which in turn calls crawlModelMethodA on the given Yioop instance, serializes the result, and gives it back to execMachines, which returns it to the originally calling function.
public
execMachines(string $command, array<string|int, mixed> $machine_urls[, string $arg = null ], int $num_machines[, bool $send_specs = false ][, int $fetcher_queue_server_ratio = 1 ]) : array<string|int, mixed>
Parameters
- $command : string
-
the ParallelModel method to invoke on the remote Yioop instances
- $machine_urls : array<string|int, mixed>
-
machines to invoke this command on
- $arg : string = null
-
additional arguments to be passed to the remote machine
- $num_machines : int
-
the integer to be used in calculating partition
- $send_specs : bool = false
-
whether to send the queue_server, num fetcher info for given machine
- $fetcher_queue_server_ratio : int = 1
-
maximum of the number 1 and the number of active fetchers running across all yioop instances currently divided by the number of queue servers
Return values
array<string|int, mixed> —a list of outputs from each machine that was called.
fileGetContents()
Either a wrapper for file_get_contents, or if a WebSite object is being used to serve pages, it reads the file in using blocking I/O file_get_contents() and caches it before returning its string contents.
public
fileGetContents(string $filename[, bool $force_read = false ]) : string
Note this function assumes that only the web server is performing I/O with this file. filemtime() can be used to see if a file on disk has been changed, and then you can use $force_read = true below to force re-reading the file into the cache.
Parameters
- $filename : string
-
name of file to get contents of
- $force_read : bool = false
-
whether to force the file to be read from persistent storage rather than the cache
Return values
string —contents of the file given by $filename
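The interaction between the cache and $force_read can be shown with a small in-memory stand-in. This is a sketch only: the class FileCacheSketch is hypothetical and simply keeps file contents in an array, which is an assumption about how a RAM cache might behave rather than a description of the WebSite object.

```php
<?php
/**
 * Hypothetical stand-in (not Yioop's code) for a file cache held in RAM
 * by a long-running server process.
 */
class FileCacheSketch
{
    /** @var array<string, string> file name => cached contents */
    private array $cache = [];

    public function fileGetContents(string $filename,
        bool $force_read = false): string
    {
        if (!$force_read && isset($this->cache[$filename])) {
            return $this->cache[$filename]; // served from RAM
        }
        $data = @file_get_contents($filename); // blocking read from disk
        if ($data === false) {
            throw new RuntimeException("Could not read $filename");
        }
        $this->cache[$filename] = $data;
        return $data;
    }
}

$cache = new FileCacheSketch();
$file = tempnam(sys_get_temp_dir(), "sketch");
file_put_contents($file, "first");
echo $cache->fileGetContents($file), "\n";       // read from disk, then cached
file_put_contents($file, "second");              // changed behind the cache's back
echo $cache->fileGetContents($file), "\n";       // still the cached copy
echo $cache->fileGetContents($file, true), "\n"; // forced re-read from disk
unlink($file);
```

The second read returns stale data, which is why the note above suggests checking filemtime() and passing $force_read = true when another process may have written the file.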
filePutContents()
Either a wrapper for file_put_contents, or if a WebSite object is being used to serve pages, writes $data to the persistent file with name $filename. Saves a copy in the RAM cache if there is a copy already there.
public
filePutContents(string $filename, string $data) : mixed
Parameters
- $filename : string
-
name of file to write to persistent storage
- $data : string
-
string of data to store in file
Return values
mixed
formatSinglePageResult()
Given a page summary, extracts snippets which are related to a set of search words. For each snippet, bold faces the search terms, and then creates a new summary array.
public
formatSinglePageResult(array<string|int, mixed> $page[, array<string|int, mixed> $words = null ][, int $description_length = self::DEFAULT_DESCRIPTION_LENGTH ]) : array<string|int, mixed>
Parameters
- $page : array<string|int, mixed>
-
a single search result summary
- $words : array<string|int, mixed> = null
-
keywords (typically what was searched on)
- $description_length : int = self::DEFAULT_DESCRIPTION_LENGTH
-
length of the description
Return values
array<string|int, mixed> —$page which has been snippified and bold faced
fromCallback()
Controls which tables, and the names of those tables, underlie the given model and should be used in a getRows call. SourceModel is used for both media sources and subsearches.
public
fromCallback([string $args = null ]) : string
The underlying table might be MEDIA_SOURCE or it might be SUBSEARCH. The $args variable is a string which is assumed to say which.
Parameters
- $args : string = null
-
if is "SUBSEARCH" then the SUBSEARCH table will be used by getRows rather than MEDIA_SOURCE.
Return values
string —which table to use
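The table selection described above amounts to a single branch on $args. The following is a minimal sketch, with the hypothetical name fromCallbackSketch, of how such a callback might feed the FROM part of a query; it is not taken from Yioop's source.

```php
<?php
/**
 * Hypothetical sketch (not Yioop's code): choose the table that a
 * getRows-style query should read from.
 */
function fromCallbackSketch(?string $args = null): string
{
    // Anything other than "SUBSEARCH" falls back to MEDIA_SOURCE
    return ($args === "SUBSEARCH") ? "SUBSEARCH" : "MEDIA_SOURCE";
}

echo "SELECT * FROM " . fromCallbackSketch("SUBSEARCH"), "\n";
echo "SELECT * FROM " . fromCallbackSketch(), "\n";
```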
getCrawlItem()
Gets a summary of a document based on its url, the active machines, and the index we want to look up in.
public
getCrawlItem(string $url[, array<string|int, mixed> $machine_urls = null ][, string $index_name = "" ]) : array<string|int, mixed>
Parameters
- $url : string
-
of summary we are trying to look-up
- $machine_urls : array<string|int, mixed> = null
-
an array of urls of yioop queue servers
- $index_name : string = ""
-
timestamp of the index to do the lookup in
Return values
array<string|int, mixed> —summary data of the matching document
getCrawlItems()
Gets summaries for a set of documents by their urls, or by group of 5-tuples of the form (machine, key, index, generation, offset).
public
getCrawlItems(string $lookups[, array<string|int, mixed> $machine_urls = null ][, array<string|int, mixed> $exclude_fields = [] ][, array<string|int, mixed> $format_words = null ][, int $description_length = self::DEFAULT_DESCRIPTION_LENGTH ]) : array<string|int, mixed>
For version >= 3 indexes, offset is the code "PDB", as a lookup can be done using the first four items.
Parameters
- $lookups : string
-
things whose summaries we are trying to look up
- $machine_urls : array<string|int, mixed> = null
-
an array of urls of yioop queue servers
- $exclude_fields : array<string|int, mixed> = []
-
an array of fields which might be in the crawlItem but which should be excluded from the result. This will make the result smaller and so hopefully faster to transmit
- $format_words : array<string|int, mixed> = null
-
words which should be highlighted in search snippets returned
- $description_length : int = self::DEFAULT_DESCRIPTION_LENGTH
-
length of snippets to be returned for each search result
Return values
array<string|int, mixed> —of summary data for the matching documents
getDbmsList()
Gets a list of all DBMS that work with the search engine
public
getDbmsList() : array<string|int, mixed>
Return values
array<string|int, mixed> —Names of available data sources
getMachineHashUrls()
Receives a request to get machine data for an array of hashes of urls
public
getMachineHashUrls() : array<string|int, mixed>
Return values
array<string|int, mixed> —a list of urls of machines used by this instance of Yioop for crawling
getMediaCategories()
Returns the media categories of the search sources that have been stored
public
getMediaCategories([array<string|int, mixed> $exclude_categories = [] ][, string $type = "" ]) : array<string|int, mixed>
Parameters
- $exclude_categories : array<string|int, mixed> = []
- $type : string = ""
Return values
array<string|int, mixed> —of arrays distinct ["NAME" => media category, "TYPE" => source_type]
getMediaSource()
Return the media source by the name of the source
public
getMediaSource(string $timestamp) : array<string|int, mixed>
Parameters
- $timestamp : string
-
of the media source to look up
Return values
array<string|int, mixed> —associative array with SOURCE_NAME, TYPE, SOURCE_URL, AUX_INFO, and LANGUAGE
getMediaSources()
Returns a list of media sources such as (video, rss sites) and their URL and thumb url formats, etc
public
getMediaSources([string $source_type = "" ]) : array<string|int, mixed>
Parameters
- $source_type : string = ""
-
the particular kind of media source to return for example, video
Return values
array<string|int, mixed> —a list of web sites which are either video or news sites
getRows()
Gets a range of rows which match the provided search criteria from the provided table
public
getRows(int $limit, int $num, int &$total[, array<string|int, mixed> $search_array = [] ][, array<string|int, mixed> $args = null ]) : array<string|int, mixed>
Parameters
- $limit : int
-
starting row from the potential results to return
- $num : int
-
number of rows after start row to return
- $total : int
-
gets set with the total number of rows that can be returned by the given database query
- $search_array : array<string|int, mixed> = []
-
each element of this is a quadruple: the name of a field, what comparison to perform, a value to check, and an order (ascending/descending) to sort by
- $args : array<string|int, mixed> = null
-
additional values which may be used to get rows (what these are will typically depend on the subclass implementation)
Return values
array<string|int, mixed>
getSnippets()
Given a string, extracts snippets of text related to a given set of key words. For a given word, a snippet is a window of characters to its left and right that is less than a maximum total number of characters.
public
getSnippets(string $text, array<string|int, mixed> $words, string $description_length) : string
There is also a rule that a snippet should avoid ending in the middle of a word
Parameters
- $text : string
-
haystack to extract snippet from
- $words : array<string|int, mixed>
-
keywords to look for in the haystack
- $description_length : string
-
length of the description desired
Return values
string —a concatenation of the extracted snippets of each word
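The windowing idea can be sketched as follows. This is an illustration under several assumptions and not Yioop's algorithm: the name snippetSketch is hypothetical, the window sizes reuse the values of SNIPPET_LENGTH_LEFT (20) and SNIPPET_LENGTH_RIGHT (40) listed above, only the first occurrence of each word is used, snippets are joined with " ... ", and byte-based string functions are used, so multibyte text would need the mb_* equivalents.

```php
<?php
// Assumed window sizes, mirroring SNIPPET_LENGTH_LEFT / SNIPPET_LENGTH_RIGHT
const SKETCH_LEFT = 20;
const SKETCH_RIGHT = 40;

/**
 * Hypothetical sketch (not Yioop's code): one window per keyword,
 * trimmed inward to whitespace so no word is cut in half.
 */
function snippetSketch(string $text, array $words,
    int $description_length): string
{
    $snippets = [];
    $total = 0;
    $len = strlen($text);
    foreach ($words as $word) {
        $pos = ($word === "") ? false : stripos($text, $word);
        if ($pos === false) {
            continue; // keyword not present in the haystack
        }
        $word_end = $pos + strlen($word);
        $start = max(0, $pos - SKETCH_LEFT);
        $end = min($len, $word_end + SKETCH_RIGHT);
        // Left edge landed inside a word: advance to the next word start
        if ($start > 0 && $text[$start - 1] !== " ") {
            $space = strpos($text, " ", $start);
            $start = ($space === false || $space >= $pos) ? $pos : $space + 1;
        }
        // Right edge landed inside a word: back up to the previous space
        if ($end < $len && $text[$end] !== " ") {
            $space = strrpos(substr($text, 0, $end), " ");
            $end = ($space === false || $space < $word_end)
                ? $word_end : $space;
        }
        $snippet = substr($text, $start, $end - $start);
        if ($total + strlen($snippet) > $description_length) {
            break; // adding this snippet would exceed the requested length
        }
        $snippets[] = $snippet;
        $total += strlen($snippet);
    }
    return implode(" ... ", $snippets);
}

$text = "The quick brown fox jumps over the lazy dog near the quiet " .
    "river bank today";
echo snippetSketch($text, ["fox", "quiet"], 150), "\n";
```

Note that the documented signature types $description_length as a string; the sketch uses an int for clarity.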
getSubsearch()
Returns the subsearch with the given folder name
public
getSubsearch(string $folder_name) : array<string|int, mixed>
Parameters
- $folder_name : string
Return values
array<string|int, mixed>
getSubsearches()
Returns a list of the subsearches used by the current Yioop instances including their names translated to the current locale
public
getSubsearches() : array<string|int, mixed>
Return values
array<string|int, mixed> —associative array containing subsearch info name in locale, folder name, index, number of results per page
getSubsearchName()
Given the folder name for a subsearch and a locale tag, returns the natural language name of the subsearch in that locale
public
getSubsearchName(string $folder_name, string $locale_tag) : string
Parameters
- $folder_name : string
-
folder name of the subsearch to look up
- $locale_tag : string
-
locale tag of the language in which the human-readable subsearch name is wanted
Return values
string —natural language name of subsearch
getUserId()
Get the user_id associated with a given username (In base class as used as an internal method in both signin and user models)
public
getUserId(string $username) : string
Parameters
- $username : string
-
the username to look up
Return values
string —the corresponding userid
isSingleLocalhost()
Used to determine if an action involves just one yioop instance on the current local machine or not
public
isSingleLocalhost(array<string|int, mixed> $machine_urls[, string $index_timestamp = -1 ]) : bool
Parameters
- $machine_urls : array<string|int, mixed>
-
urls of yioop instances to which the action applies
- $index_timestamp : string = -1
-
if timestamp exists checks if the index has declared itself to be a no network index.
Return values
bool —whether it involves a single local yioop instance (true) or not (false)
loginDbms()
Returns whether the provided dbms needs a login and password or not (sqlite and sqlite3 do not)
public
loginDbms(string $dbms) : bool
Parameters
- $dbms : string
-
the name of a database management system
Return values
bool —true if needs a login and password; false otherwise
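Since the description singles out sqlite and sqlite3, the check can be sketched as a simple membership test. The name loginDbmsSketch and the lowercasing of the input are assumptions made for this example, not details confirmed by the documentation.

```php
<?php
/**
 * Hypothetical sketch (not Yioop's code): file-based SQLite variants
 * need no account, every other DBMS name is assumed to need one.
 */
function loginDbmsSketch(string $dbms): bool
{
    // Lowercasing is an assumption so that "SQLite3" is also recognized
    return !in_array(strtolower($dbms), ["sqlite", "sqlite3"], true);
}

var_dump(loginDbmsSketch("sqlite3")); // bool(false)
var_dump(loginDbmsSketch("mysql"));   // bool(true)
```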
lookupSummaryOffsetGeneration()
Determines the offset into the summaries WebArchiveBundle and generation of the provided url (or hash_url) so that the info:url (info:base64_hash_url) summary can be retrieved. This assumes of course that the info:url meta word has been stored.
public
lookupSummaryOffsetGeneration(string $url_or_key[, string $index_name = "" ][, bool $is_key = false ]) : array<string|int, mixed>
Parameters
- $url_or_key : string
-
either info:base64_hash_url or just a url to lookup
- $index_name : string = ""
-
index into which to do the lookup
- $is_key : bool = false
-
whether the string is info:base64_hash_url or just a url
Return values
array<string|int, mixed> —(offset, generation) into the web archive bundle
networkGetCrawlItems()
In a multiple queue server setting, gets summaries for a set of documents by their urls, or by group of 5-tuples of the form (machine, key, index, generation, offset). This makes an execMachines call to make a network request to the CrawlControllers on each machine, which in turn call getCrawlItems (and thence nonNetworkGetCrawlItems) on each machine. The results are then sent back to networkGetCrawlItems and aggregated.
public
networkGetCrawlItems(string $lookups, array<string|int, mixed> $machine_urls[, array<string|int, mixed> $exclude_fields = [] ][, array<string|int, mixed> $format_words = null ][, int $description_length = self::DEFAULT_DESCRIPTION_LENGTH ]) : array<string|int, mixed>
Parameters
- $lookups : string
-
things whose summaries we are trying to look up
- $machine_urls : array<string|int, mixed>
-
an array of urls of yioop queue servers
- $exclude_fields : array<string|int, mixed> = []
-
an array of fields which might be in the crawlItem but which should be excluded from the result. This will make the result smaller and so hopefully faster to transmit
- $format_words : array<string|int, mixed> = null
-
words which should be highlighted in search snippets returned
- $description_length : int = self::DEFAULT_DESCRIPTION_LENGTH
-
length of snippets to be returned for each search result
Return values
array<string|int, mixed> —of summary data for the matching documents
nonNetworkGetCrawlItems()
Gets summaries on a particular machine for a set of documents by their urls, or by group of 5-tuples of the form (machine, key, index, generation, offset). This may be used in either the single queue_server setting or it may be called indirectly by a particular machine's CrawlController as part of fulfilling a network-based getCrawlItems request. $lookups contains items which are to be grouped (as they came from the same url or site with the same cache), so this function aggregates their descriptions.
public
nonNetworkGetCrawlItems(string $lookups[, array<string|int, mixed> $exclude_fields = [] ][, array<string|int, mixed> $format_words = null ][, int $description_length = self::DEFAULT_DESCRIPTION_LENGTH ]) : array<string|int, mixed>
Parameters
- $lookups : string
-
things whose summaries we are trying to look up
- $exclude_fields : array<string|int, mixed> = []
-
an array of fields which might be in the crawlItem but which should be excluded from the result. This will make the result smaller and so hopefully faster to transmit
- $format_words : array<string|int, mixed> = null
-
words which should be highlighted in search snippets returned
- $description_length : int = self::DEFAULT_DESCRIPTION_LENGTH
-
length of snippets to be returned for each search result
Return values
array<string|int, mixed> —of summary data for the matching documents
postQueryCallback()
Called after getRows has retrieved all the rows that it would retrieve but before they are returned to give one last place where they could be further manipulated. For example, in MachineModel this callback is used to make parallel network calls to get the status of each machine returned by getRows. The default for this method is to leave the rows that would be returned unchanged
public
postQueryCallback(array<string|int, mixed> $rows) : array<string|int, mixed>
Parameters
- $rows : array<string|int, mixed>
-
that have been calculated so far by getRows
Return values
array<string|int, mixed> —$rows after this final manipulation
rowCallback()
Called after a row is retrieved by getRows from the database to perform some manipulation that would be useful for this model.
public
rowCallback(array<string|int, mixed> $row, mixed $args) : array<string|int, mixed>
For example, in CrawlModel, after a row representing a crawl mix has been gotten, this is used to perform an additional query to marshal its components. By default this method just returns this row unchanged.
Parameters
- $row : array<string|int, mixed>
-
row as retrieved from database query
- $args : mixed
-
additional arguments that might be used by this callback
Return values
array<string|int, mixed> —$row after callback manipulation
searchArrayToWhereOrderClauses()
Creates the WHERE and ORDER BY clauses for a query of a Yioop table such as USERS, ROLE, GROUP, which have associated search web forms. Searches are case insensitive
public
searchArrayToWhereOrderClauses(array<string|int, mixed> $search_array[, array<string|int, mixed> $any_fields = ['status'] ]) : array<string|int, mixed>
Parameters
- $search_array : array<string|int, mixed>
-
each element of this is a quadruple: the name of a field, what comparison to perform, a value to check, and an order (ascending/descending) to sort by
- $any_fields : array<string|int, mixed> = ['status']
-
these fields, if present in the search array but with value "-1", will be skipped as part of the WHERE clause but will still be used for the ORDER BY clause
Return values
array<string|int, mixed> —string for where clause, string for order by clause
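The quadruple format and the "-1" rule for $any_fields can be illustrated with a simplified builder. Everything specific here is an assumption for the sake of the example: the name whereOrderSketch, the comparison codes "=", "!=" and "CONTAINS", the order codes "ASC" and "DESC", and the extra third return element holding bound parameters (the documented method returns only the two clause strings).

```php
<?php
/**
 * Hypothetical sketch (not Yioop's code): turn search quadruples into
 * WHERE and ORDER BY clauses with case-insensitive comparisons.
 */
function whereOrderSketch(array $search_array,
    array $any_fields = ["status"]): array
{
    // Assumed comparison codes mapped to SQL with ? placeholders
    $operators = ["=" => "= ?", "!=" => "!= ?", "CONTAINS" => "LIKE ?"];
    $where = [];
    $order = [];
    $params = [];
    foreach ($search_array as [$field, $comparison, $value, $direction]) {
        // $field is spliced into SQL, so accept only plain identifiers
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $field)) {
            continue;
        }
        // An "any" field with value -1 is not filtered on, only sorted on
        $skip_where = in_array($field, $any_fields, true) &&
            (string)$value === "-1";
        if (!$skip_where && isset($operators[$comparison])) {
            $where[] = "LOWER($field) " . $operators[$comparison];
            $text = strtolower((string)$value);
            $params[] = ($comparison === "CONTAINS") ? "%$text%" : $text;
        }
        if (in_array($direction, ["ASC", "DESC"], true)) {
            $order[] = "$field $direction";
        }
    }
    $where_clause = $where ? "WHERE " . implode(" AND ", $where) : "";
    $order_clause = $order ? "ORDER BY " . implode(", ", $order) : "";
    return [$where_clause, $order_clause, $params];
}

[$where, $order, $params] = whereOrderSketch([
    ["name", "CONTAINS", "Times", "ASC"],
    ["status", "=", "-1", "DESC"],
]);
echo $where, "\n", $order, "\n";
```

In this sketch the status field contributes to the ORDER BY clause but not to the WHERE clause, because its value is "-1" and it is listed in $any_fields.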
selectCallback()
Controls which columns, and the names of those columns, from the tables underlying the given model should be returned from a getRows call.
public
selectCallback([mixed $args = null ]) : string
This defaults to *, but in general will be overridden in subclasses of Model
Parameters
- $args : mixed = null
-
any additional arguments which should be used to determine the columns
Return values
string —a comma separated list of columns suitable for a SQL query
translateDb()
Used to get the translation of a string_id stored in the database to the given locale.
public
translateDb(string $string_id, string $locale_tag) : mixed
Parameters
- $string_id : string
-
id to translate
- $locale_tag : string
-
to translate to
Return values
mixed —the translation if found; otherwise $string_id
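The fallback behaviour in the return value can be sketched with an in-memory table. The function translateSketch and the nested array standing in for the database are hypothetical; only the "return $string_id when no translation exists" rule comes from the description above.

```php
<?php
/**
 * Hypothetical sketch (not Yioop's code): look up a translation and
 * fall back to the string id itself when none is stored.
 *
 * @param array<string, array<string, string>> $translations
 *      locale tag => (string id => translated text), a stand-in for the DB
 */
function translateSketch(array $translations, string $string_id,
    string $locale_tag): string
{
    return $translations[$locale_tag][$string_id] ?? $string_id;
}

$translations = ["fr-FR" => ["db_subsearch_news" => "Actualités"]];
echo translateSketch($translations, "db_subsearch_news", "fr-FR"), "\n";
echo translateSketch($translations, "db_subsearch_news", "de"), "\n";
```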
updateMediaSource()
Used to update the fields stored in a MEDIA_SOURCE row according to an array holding new values
public
updateMediaSource(array<string|int, mixed> $source_info) : mixed
Parameters
- $source_info : array<string|int, mixed>
-
updated values for a MEDIA_SOURCE row
Return values
mixed
updateSubsearch()
Used to update the fields stored in a SUBSEARCH row according to an array holding new values
public
updateSubsearch(array<string|int, mixed> $search_info) : mixed
Parameters
- $search_info : array<string|int, mixed>
-
updated values for a SUBSEARCH row
Return values
mixed
whereCallback()
Controls the WHERE clause of the SQL query that underlies the given model and should be used in a getRows call.
public
whereCallback([mixed $args = null ]) : string
This defaults to an empty WHERE clause.
Parameters
- $args : mixed = null
-
additional arguments that might be used to construct the WHERE clause.
Return values
string —a SQL WHERE clause