Yioop_V9.5_Source_Code_Documentation

PhraseModel extends ParallelModel
in package

This class is used to handle results for a given phrase search

Tags
author

Chris Pollett

Table of Contents

DEFAULT_DESCRIPTION_LENGTH  = 150
Default maximum character length of a search summary
INFO_HASH_LEN  = 16
Length of info hash record phrase
MAX_SNIPPET_TITLE_LENGTH  = 20
MIN_DESCRIPTION_LENGTH  = 100
The minimum length of a description before we stop appending additional link doc summaries
MIN_SNIPPET_LENGTH  = 100
SNIPPET_LENGTH_LEFT  = 20
SNIPPET_LENGTH_RIGHT  = 40
SNIPPET_TITLE_LENGTH  = 20
$additional_meta_words  : array<string|int, mixed>
An associative array of additional meta words and the max description length of results if such a meta word is used. This array is typically set in index.php
$any_fields  : array<string|int, mixed>
These fields, if present in $search_array (used by getRows()) but with value "-1", will be skipped as part of the WHERE clause but will still be used for the ORDER BY clause
$cache  : object
Cache object to be used if we are doing caching
$current_machine  : int
If known, the id of the queue_server this belongs to
$db  : object
Reference to a DatasourceManager
$db_name  : string
Name of the search engine database
$edited_page_summaries  : array<string|int, mixed>
Associative array of page summaries which might be used to override default page summaries if set.
$index_name  : string
Stores the name of the current index archive to use to get search results from
$private_db  : object
Reference to a private DatasourceManager
$private_db_name  : string
Name of the private search engine database
$program_indicator  : string
An indicator used to mark source code files
$programming_language_map  : string
Used to hold the file extensions of programming languages, which are used to identify the language
$query_info  : array<string|int, mixed>
Used to hold query statistics about the current query
$search_table_column_map  : array<string|int, mixed>
Associations of the form name of field for web forms => database column names/abbreviations
$web_site  : object
Reference to a WebSite object in use to serve pages (if any)
__construct()  : mixed
Sets up the database manager that will be used and name of the search engine database
beginMatch()  : string
Matches terms (non white-char strings) in the language $lang_tag in $phrase that begin with $start_with and don't contain $not_contains, replaces $start_with with $new_prefix, and adds $suffix to the end
boldKeywords()  : string
Given a string, wraps in bold html tags a set of key words it contains.
clearQuerySavePoint()  : mixed
A save point is used to store to disk a sequence of generation-doc-offset pairs of a particular mix query when doing an archive crawl of a crawl mix. This is used so that the mix can remember where it was the next time it is invoked by the web app on the machine in question.
createIfNecessaryDirectory()  : int
Creates a directory and sets it to world permission if it doesn't already exist
endMatch()  : string
Matches terms (non white-char strings) in the language $lang_tag in $phrase that end with $end_with and don't contain $not_contains, replaces $end_with with $new_suffix (if not empty), and adds $prefix to the beginning
execMachines()  : array<string|int, mixed>
This method is invoked by other ParallelModel (@see CrawlModel for examples) methods when they want to have their method performed on an array of other Yioop instances. The results returned can then be aggregated. The invocation sequence is: crawlModelMethodA invokes execMachines with a list of urls of other Yioop instances. execMachines makes REST requests of those instances of the given command and optional arguments. This request would be handled by a CrawlController, which in turn calls crawlModelMethodA on the given Yioop instance, serializes the result, and gives it back to execMachines and then back to the originally calling function.
extractMetaWordInfo()  : array<string|int, mixed>
Given a query string, this method extracts meta words, which of these are "materialized" (i.e., should be encoded as part of word ids), disallowed phrases, the query string after meta words are removed and ampersand substitution is applied, the query string with meta words but with ampersand substitution applied, the index, and the weights found as part of the query string. Finally, it extracts the locale_tag for the query.
fileGetContents()  : string
Either a wrapper for file_get_contents, or if a WebSite object is being used to serve pages, it reads the file in using blocking I/O file_get_contents() and caches it before returning its string contents.
filePutContents()  : mixed
Either a wrapper for file_put_contents, or if a WebSite object is being used to serve pages, writes $data to the persistent file with name $filename. Saves a copy in the RAM cache if there is a copy already there.
formatSinglePageResult()  : array<string|int, mixed>
Given a page summary, extracts snippets which are related to a set of search words. For each snippet, bold faces the search terms, and then creates a new summary array.
fromCallback()  : string
Controls which tables and the names of tables underlie the given model and should be used in a getRows call. This defaults to the single table whose name is whatever is before Model in the name of the model. For example, by default on FooModel this method would return "FOO". If a different behavior is desired, this can be overridden in subclasses of Model
getCrawlItem()  : array<string|int, mixed>
Get a summary of a document based on its url, the active machines, and the index we want to look up in.
getCrawlItems()  : array<string|int, mixed>
Gets summaries for a set of documents by their url, or by group of 5-tuples of the form (machine, key, index, generation, offset).
getDbmsList()  : array<string|int, mixed>
Gets a list of all DBMS that work with the search engine
getPhrasePageResults()  : array<string|int, mixed>
Given a query phrase, returns formatted document summaries of the documents that match the phrase.
getQueryIterator()  : object
Using the supplied $word_structs, constructs an iterator for getting results to a query
getRows()  : array<string|int, mixed>
Gets a range of rows which match the provided search criteria from the provided table
getSnippets()  : string
Given a string, extracts snippets of text related to a given set of key words. For a given word, a snippet is a window of characters to its left and right that is less than a maximum total number of characters.
getSummariesByHash()  : array<string|int, mixed>
Gets doc summaries of documents containing given words and meeting the additional provided criteria
getSummariesFromOffsets()  : array<string|int, mixed>
Used to look up summary info for the pages provided (using their self::SUMMARY_OFFSET field). If any of the looked-up summaries are HTTP Location redirect pages, then these are looked up in turn.
getUserId()  : string
Get the user_id associated with a given username (In base class as used as an internal method in both signin and user models)
guessSemantics()  : string
Ideally, this function tries to guess from the query what the user is looking for. For now, we are just doing simple things like detecting when a query term is a url and rewriting it to the appropriate meta word.
indexExists()  : bool
Returns whether there is an index with the provided timestamp
isSingleLocalhost()  : bool
Used to determine if an action involves just one yioop instance on the current local machine or not
loginDbms()  : bool
Returns whether the provided dbms needs a login and password or not (sqlite or sqlite3)
lookupSummaryOffsetGeneration()  : array<string|int, mixed>
Determines the offset into the summaries WebArchiveBundle and generation of the provided url (or hash_url) so that the info:url (info:base64_hash_url) summary can be retrieved. This assumes of course that the info:url meta word has been stored.
networkGetCrawlItems()  : array<string|int, mixed>
In a multiple queue server setting, gets summaries for a set of documents by their url, or by group of 5-tuples of the form (machine, key, index, generation, offset). This makes an execMachines call to make a network request to the CrawlControllers on each machine, which in turn call getCrawlItems (and thence nonNetworkGetCrawlItems) on each machine. The results are then sent back to networkGetCrawlItems and aggregated.
nonNetworkGetCrawlItems()  : array<string|int, mixed>
Gets summaries on a particular machine for a set of documents by their url, or by group of 5-tuples of the form (machine, key, index, generation, offset). This may be used in either the single queue_server setting or it may be called indirectly by a particular machine's CrawlController as part of fulfilling a network-based getCrawlItems request. $lookups contains items which are to be grouped (as they came from the same url or site with the same cache), so this function aggregates their descriptions.
parseIfConditions()  : string
Evaluates any if: conditional meta-words in the query string to calculate a new query string.
parseWordStructConjunctiveQuery()  : array<string|int, mixed>
Parses from a string phrase representing a conjunctive query, a struct consisting of the words keys searched for, the allowed and disallowed phrases, the weight that should be put on these query results, and which archive to use.
postQueryCallback()  : array<string|int, mixed>
Called after getRows has retrieved all the rows that it would retrieve but before they are returned to give one last place where they could be further manipulated. For example, in MachineModel this callback is used to make parallel network calls to get the status of each machine returned by getRows. The default for this method is to leave the rows that would be returned unchanged
rewriteMixQuery()  : string
Rewrites a mix query so that it maps directly to a query about crawls
rowCallback()  : array<string|int, mixed>
Called after a row is retrieved by getRows from the database to perform some manipulation that would be useful for this model.
searchArrayToWhereOrderClauses()  : array<string|int, mixed>
Creates the WHERE and ORDER BY clauses for a query of a Yioop table such as USERS, ROLE, GROUP, which have associated search web forms. Searches are case insensitive
selectCallback()  : string
Controls which columns and the names of those columns from the tables underlying the given model should be returned from a getRows call.
translateDb()  : mixed
Used to get the translation of a string_id stored in the database to the given locale.
whereCallback()  : string
Controls the WHERE clause of the SQL query that underlies the given model and should be used in a getRows call.

Constants

DEFAULT_DESCRIPTION_LENGTH

Default maximum character length of a search summary

public mixed DEFAULT_DESCRIPTION_LENGTH = 150

INFO_HASH_LEN

Length of info hash record phrase

public mixed INFO_HASH_LEN = 16

MAX_SNIPPET_TITLE_LENGTH

public mixed MAX_SNIPPET_TITLE_LENGTH = 20

MIN_DESCRIPTION_LENGTH

The minimum length of a description before we stop appending additional link doc summaries

public mixed MIN_DESCRIPTION_LENGTH = 100

MIN_SNIPPET_LENGTH

public mixed MIN_SNIPPET_LENGTH = 100

SNIPPET_LENGTH_LEFT

public mixed SNIPPET_LENGTH_LEFT = 20

SNIPPET_LENGTH_RIGHT

public mixed SNIPPET_LENGTH_RIGHT = 40

SNIPPET_TITLE_LENGTH

public mixed SNIPPET_TITLE_LENGTH = 20

Properties

$additional_meta_words

An associative array of additional meta words and the max description length of results if such a meta word is used. This array is typically set in index.php

public array<string|int, mixed> $additional_meta_words

$any_fields

These fields, if present in $search_array (used by getRows()) but with value "-1", will be skipped as part of the WHERE clause but will still be used for the ORDER BY clause

public array<string|int, mixed> $any_fields = []

$cache

Cache object to be used if we are doing caching

public static object $cache

$current_machine

If known, the id of the queue_server this belongs to

public int $current_machine

$db

Reference to a DatasourceManager

public object $db

$db_name

Name of the search engine database

public string $db_name

$edited_page_summaries

Associative array of page summaries which might be used to override default page summaries if set.

public array<string|int, mixed> $edited_page_summaries = null

$index_name

Stores the name of the current index archive to use to get search results from

public string $index_name

$private_db

Reference to a private DatasourceManager

public object $private_db

$private_db_name

Name of the private search engine database

public string $private_db_name

$program_indicator

An indicator used to mark source code files

public string $program_indicator

$programming_language_map

Used to hold the file extensions of programming languages, which are used to identify the language

public string $programming_language_map

$query_info

Used to hold query statistics about the current query

public array<string|int, mixed> $query_info

$search_table_column_map

Associations of the form name of field for web forms => database column names/abbreviations

public array<string|int, mixed> $search_table_column_map = []

$web_site

Reference to a WebSite object in use to serve pages (if any)

public object $web_site

Methods

__construct()

Sets up the database manager that will be used and name of the search engine database

public __construct([string $db_name = CDB_NAME ][, bool $connect = true ]) : mixed
Parameters
$db_name : string = CDB_NAME

the name of the database for the search engine

$connect : bool = true

whether to connect to the database by default after making the datasource class

Return values
mixed

beginMatch()

Matches terms (non white-char strings) in the language $lang_tag in $phrase that begin with $start_with and don't contain $not_contains, replaces $start_with with $new_prefix, and adds $suffix to the end

public beginMatch(string $phrase, string $start_with, string $new_prefix[, string $suffix = "" ][, string $not_contains = [] ][, string $lang_tag = "en-US" ]) : string
Parameters
$phrase : string

string to look for terms in

$start_with : string

what we're looking to see if term begins with

$new_prefix : string

what to change $start_with to

$suffix : string = ""

what to tack on to the end of the term if there is a match

$not_contains : string = []

string match is not allowed to contain

$lang_tag : string = "en-US"

what language the phrase must be in for the rule to apply

Return values
string

$phrase after modifications have been made
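
The rewriting rule described for beginMatch() can be illustrated with a small stand-alone function. This is only a sketch under stated assumptions, not Yioop's implementation: the name sketchBeginMatch is hypothetical, $not_contains is treated as a single string, and the $lang_tag check is left out because this page does not say how the language of a phrase is determined.

```php
<?php
/**
 * Hypothetical stand-alone sketch of the rule described for beginMatch().
 * Language detection ($lang_tag) is deliberately omitted.
 */
function sketchBeginMatch(string $phrase, string $start_with,
    string $new_prefix, string $suffix = "", string $not_contains = ""): string
{
    // Split on whitespace but keep the separators so spacing is preserved
    $parts = preg_split('/(\s+)/', $phrase, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $term) {
        if ($term === "" || trim($term) === "") {
            continue; // whitespace separator, leave untouched
        }
        $begins = strncmp($term, $start_with, strlen($start_with)) === 0;
        $forbidden = $not_contains !== "" &&
            strpos($term, $not_contains) !== false;
        if ($begins && !$forbidden) {
            // Swap the old prefix for the new one and append the suffix
            $parts[$i] = $new_prefix . substr($term, strlen($start_with)) .
                $suffix;
        }
    }
    return implode("", $parts);
}
```

With these assumptions, calling sketchBeginMatch("prefix other preview", "pre", "X:", "!") rewrites both terms that start with "pre", while passing "view" as the last argument leaves "preview" untouched.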

boldKeywords()

Given a string, wraps in bold html tags a set of key words it contains.

public boldKeywords(string $text, array<string|int, mixed> $words) : string
Parameters
$text : string

haystack string to look for the key words

$words : array<string|int, mixed>

an array of words to bold face

Return values
string

the resulting string after boldfacing has been applied
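
As a rough illustration of what wrapping key words in bold tags can look like, the following sketch does a single case-insensitive pass over the text. The function name sketchBoldKeywords and the choice of a plain `<b>` tag with case-insensitive matching are assumptions of this example; the actual method may match and mark up words differently.

```php
<?php
/**
 * Hypothetical sketch: wraps each occurrence of the given words in <b> tags.
 * A single regex pass is used so inserted tags are never re-matched.
 */
function sketchBoldKeywords(string $text, array $words): string
{
    $quoted = [];
    foreach ($words as $word) {
        if ($word !== "") {
            $quoted[] = preg_quote($word, "/");
        }
    }
    if (!$quoted) {
        return $text; // nothing to highlight
    }
    $pattern = "/(" . implode("|", $quoted) . ")/ui";
    // preg_replace returns null on error; fall back to the original text
    return preg_replace($pattern, '<b>$1</b>', $text) ?? $text;
}
```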

clearQuerySavePoint()

A save point is used to store to disk a sequence of generation-doc-offset pairs of a particular mix query when doing an archive crawl of a crawl mix. This is used so that the mix can remember where it was the next time it is invoked by the web app on the machine in question.

public clearQuerySavePoint(int $save_timestamp[, array<string|int, mixed> $machine_urls = null ]) : mixed

This function deletes such a save point associated with a timestamp

Parameters
$save_timestamp : int

timestamp of save point to delete

$machine_urls : array<string|int, mixed> = null

machines on which to try to delete savepoint

Return values
mixed

createIfNecessaryDirectory()

Creates a directory and sets it to world permission if it doesn't already exist

public createIfNecessaryDirectory(string $directory) : int
Parameters
$directory : string

name of directory to create

Return values
int

-1 on failure, 0 if already existed, 1 if created
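
The three return codes can be reproduced with a few standard PHP calls. The sketch below is not the method's source: the name sketchCreateIfNecessaryDirectory is hypothetical, and reading "world permission" as mode 0777 is an assumption.

```php
<?php
/**
 * Hypothetical sketch of the documented return codes:
 * -1 on failure, 0 if the path already existed, 1 if it was created.
 */
function sketchCreateIfNecessaryDirectory(string $directory): int
{
    if (file_exists($directory)) {
        return 0;
    }
    // Non-recursive mkdir; @ suppresses the warning emitted on failure
    if (!@mkdir($directory)) {
        return -1;
    }
    // Assumption: "world permission" means readable/writable by everyone
    chmod($directory, 0777);
    return 1;
}
```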

endMatch()

Matches terms (non white-char strings) in the language $lang_tag in $phrase that end with $end_with and don't contain $not_contains, replaces $end_with with $new_suffix (if not empty), and adds $prefix to the beginning

public endMatch(string $phrase, string $end_with, string $prefix[, string $new_suffix = "" ][, string $not_contains = [] ][, string $lang_tag = "en-US" ]) : string
Parameters
$phrase : string

string to look for terms in

$end_with : string

what we're looking to see if term ends with

$prefix : string

what to tack on to the start if there is a match

$new_suffix : string = ""

what to change $end_with to

$not_contains : string = []

string match is not allowed to contain

$lang_tag : string = "en-US"

what language the phrase must be in for the rule to apply

Return values
string

$phrase after modifications have been made

execMachines()

This method is invoked by other ParallelModel (@see CrawlModel for examples) methods when they want to have their method performed on an array of other Yioop instances. The results returned can then be aggregated. The invocation sequence is: crawlModelMethodA invokes execMachines with a list of urls of other Yioop instances. execMachines makes REST requests of those instances of the given command and optional arguments. This request would be handled by a CrawlController, which in turn calls crawlModelMethodA on the given Yioop instance, serializes the result, and gives it back to execMachines and then back to the originally calling function.

public execMachines(string $command, array<string|int, mixed> $machine_urls[, string $arg = null ], int $num_machines[, bool $send_specs = false ][, int $fetcher_queue_server_ratio = 1 ]) : array<string|int, mixed>
Parameters
$command : string

the ParallelModel method to invoke on the remote Yioop instances

$machine_urls : array<string|int, mixed>

machines to invoke this command on

$arg : string = null

additional arguments to be passed to the remote machine

$num_machines : int

the integer to be used in calculating partition

$send_specs : bool = false

whether to send the queue_server, num fetcher info for given machine

$fetcher_queue_server_ratio : int = 1

maximum of the number 1 and the number of active fetchers running across all yioop instances currently divided by the number of queue servers

Return values
array<string|int, mixed>

a list of outputs from each machine that was called.
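
The invocation sequence described for execMachines() follows a fan-out and aggregate pattern, which the sketch below imitates in miniature. Everything here is an assumption made for illustration: the function sketchExecMachines, the injected $send_request callable standing in for the REST request, the example machine urls, and the sequential loop (the real method may well issue its requests in parallel).

```php
<?php
/**
 * Hypothetical sketch of the fan-out/aggregate pattern.
 * $send_request stands in for the REST call to a remote CrawlController;
 * it is an assumption of this example, not a Yioop function.
 */
function sketchExecMachines(string $command, array $machine_urls,
    callable $send_request, ?string $arg = null): array
{
    $outputs = [];
    foreach ($machine_urls as $index => $url) {
        // The remote side is assumed to reply with a serialized PHP value
        $reply = $send_request($url, $command, $arg);
        $outputs[$index] = unserialize($reply, ["allowed_classes" => false]);
    }
    return $outputs;
}

// Stand-in transport: pretends each machine ran $command and serialized
// its result. A real transport would issue an HTTP request instead.
$fake_transport = function (string $url, string $command, ?string $arg) {
    return serialize(["machine" => $url, "command" => $command,
        "arg" => $arg]);
};
$results = sketchExecMachines("crawlModelMethodA",
    ["http://machine-a.example/", "http://machine-b.example/"],
    $fake_transport, "some-arg");
```

Injecting the transport keeps the sketch runnable without a network; the caller ends up with one output per machine, ready to be aggregated.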

extractMetaWordInfo()

Given a query string, this method extracts meta words, which of these are "materialized" (i.e., should be encoded as part of word ids), disallowed phrases, the query string after meta words are removed and ampersand substitution is applied, the query string with meta words but with ampersand substitution applied, the index, and the weights found as part of the query string. Finally, it extracts the locale_tag for the query.

public extractMetaWordInfo(string $phrase) : array<string|int, mixed>
Parameters
$phrase : string

the query string

Return values
array<string|int, mixed>

containing items listed above in the description of this method

fileGetContents()

Either a wrapper for file_get_contents, or if a WebSite object is being used to serve pages, it reads the file in using blocking I/O file_get_contents() and caches it before returning its string contents.

public fileGetContents(string $filename[, bool $force_read = false ]) : string

Note this function assumes that only the web server is performing I/O with this file. filemtime() can be used to see if a file on disk has been changed, and then you can use $force_read = true below to force re-reading the file into the cache

Parameters
$filename : string

name of file to get contents of

$force_read : bool = false

whether to force the file to be read from persistent storage rather than the cache

Return values
string

contents of the file given by $filename
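
The interaction between the cache and $force_read can be seen in a minimal in-memory version. This is a sketch assuming a simple per-object array cache; the class name SketchFileCache is hypothetical and the real WebSite-backed cache is not shown on this page.

```php
<?php
/**
 * Hypothetical sketch of a read-through file cache with a force-read switch.
 */
final class SketchFileCache
{
    /** @var array<string, string> file name => cached contents */
    private $cache = [];

    public function fileGetContents(string $filename,
        bool $force_read = false): string
    {
        if (!$force_read && isset($this->cache[$filename])) {
            return $this->cache[$filename]; // served from RAM
        }
        $contents = file_get_contents($filename); // blocking read
        if ($contents === false) {
            throw new RuntimeException("Could not read $filename");
        }
        $this->cache[$filename] = $contents;
        return $contents;
    }
}
```

In this sketch, if another process rewrites the file after the first read, later calls keep returning the cached copy until one is made with $force_read set to true, which matches the caveat in the note above.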

filePutContents()

Either a wrapper for file_put_contents, or if a WebSite object is being used to serve pages, writes $data to the persistent file with name $filename. Saves a copy in the RAM cache if there is a copy already there.

public filePutContents(string $filename, string $data) : mixed
Parameters
$filename : string

name of file to write to persistent storage

$data : string

string of data to store in file

Return values
mixed

formatSinglePageResult()

Given a page summary, extracts snippets which are related to a set of search words. For each snippet, bold faces the search terms, and then creates a new summary array.

public formatSinglePageResult(array<string|int, mixed> $page[, array<string|int, mixed> $words = null ][, int $description_length = self::DEFAULT_DESCRIPTION_LENGTH ]) : array<string|int, mixed>
Parameters
$page : array<string|int, mixed>

a single search result summary

$words : array<string|int, mixed> = null

keywords (typically what was searched on)

$description_length : int = self::DEFAULT_DESCRIPTION_LENGTH

length of the description

Return values
array<string|int, mixed>

$page which has been snippified and bold faced

fromCallback()

Controls which tables and the names of tables underlie the given model and should be used in a getRows call. This defaults to the single table whose name is whatever is before Model in the name of the model. For example, by default on FooModel this method would return "FOO". If a different behavior is desired, this can be overridden in subclasses of Model

public fromCallback([mixed $args = null ]) : string
Parameters
$args : mixed = null

any additional arguments which should be used to determine these tables

Return values
string

a comma separated list of tables suitable for a SQL query
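
The default naming rule (FooModel maps to "FOO") can be sketched as a small string transformation. The helper name sketchDefaultTableName and the handling of namespaced class names are assumptions of this example rather than details taken from the source.

```php
<?php
/**
 * Hypothetical sketch of the default rule: the table name is whatever
 * precedes "Model" in the class name, upper-cased.
 */
function sketchDefaultTableName(string $class_name): string
{
    // Drop any namespace prefix (assumption: class may be namespaced)
    $pos = strrpos($class_name, "\\");
    $short = ($pos === false) ? $class_name : substr($class_name, $pos + 1);
    if (substr($short, -5) === "Model") {
        $short = substr($short, 0, -5);
    }
    return strtoupper($short);
}
```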

getCrawlItem()

Get a summary of a document based on its url, the active machines, and the index we want to look up in.

public getCrawlItem(string $url[, array<string|int, mixed> $machine_urls = null ][, string $index_name = "" ]) : array<string|int, mixed>
Parameters
$url : string

of summary we are trying to look-up

$machine_urls : array<string|int, mixed> = null

an array of urls of yioop queue servers

$index_name : string = ""

timestamp of the index to do the lookup in

Return values
array<string|int, mixed>

summary data of the matching document

getCrawlItems()

Gets summaries for a set of documents by their url, or by group of 5-tuples of the form (machine, key, index, generation, offset).

public getCrawlItems(string $lookups[, array<string|int, mixed> $machine_urls = null ][, array<string|int, mixed> $exclude_fields = [] ][, array<string|int, mixed> $format_words = null ][, int $description_length = self::DEFAULT_DESCRIPTION_LENGTH ]) : array<string|int, mixed>

For version >= 3 indexes, offset is the code "PDB", as a lookup can be done using the first four items.

Parameters
$lookups : string

things whose summaries we are trying to look up

$machine_urls : array<string|int, mixed> = null

an array of urls of yioop queue servers

$exclude_fields : array<string|int, mixed> = []

an array of fields which might be in the crawlItem but which should be excluded from the result. This will make the result smaller and so hopefully faster to transmit

$format_words : array<string|int, mixed> = null

words which should be highlighted in search snippets returned

$description_length : int = self::DEFAULT_DESCRIPTION_LENGTH

length of snippets to be returned for each search result

Return values
array<string|int, mixed>

of summary data for the matching documents

getDbmsList()

Gets a list of all DBMS that work with the search engine

public getDbmsList() : array<string|int, mixed>
Return values
array<string|int, mixed>

Names of available data sources

getPhrasePageResults()

Given a query phrase, returns formatted document summaries of the documents that match the phrase.

public getPhrasePageResults(string $input_phrase, int $low[, int $results_per_page = CNUM_RESULTS_PER_PAGE ][, bool $format = true ][, SearchfiltersModel $filter = null ][, bool $use_cache_if_allowed = true ], int $raw[, array<string|int, mixed> $queue_servers = [] ][, bool $guess_semantics = true ], int $save_timestamp[, array<string|int, mixed> $ranking_factors = [] ]) : array<string|int, mixed>
Parameters
$input_phrase : string

the phrase to try to match

$low : int

return results beginning with the $low document

$results_per_page : int = CNUM_RESULTS_PER_PAGE

how many results to return

$format : bool = true

whether to highlight in the returned summaries the matched text

$filter : SearchfiltersModel = null

Model responsible for keeping track of edited and deleted search results

$use_cache_if_allowed : bool = true

if true and USE_CACHE is true, then an attempt will be made to look up the results in the file cache. Otherwise, items will be recomputed and then potentially re-stored in the cache

$raw : int

($raw == 0) normal grouping, ($raw == 1) no grouping done on data also no summaries returned (only lookup info), $raw > 1 return summaries but no grouping

$queue_servers : array<string|int, mixed> = []

a list of urls of yioop machines which might be used during lookup

$guess_semantics : bool = true

whether to do query rewriting before lookup

$save_timestamp : int

if this timestamp is nonzero, then save iterate position, so can resume on future queries that make use of the timestamp

$ranking_factors : array<string|int, mixed> = []

fields that say how url, keywords, and title words should influence relevance and doc rank calculations

Return values
array<string|int, mixed>

an array of summary data

getQueryIterator()

Using the supplied $word_structs, constructs an iterator for getting results to a query

public getQueryIterator(array<string|int, mixed> $word_structs, SearchfiltersModel $filter, int $raw, int &$to_retrieve[, array<string|int, mixed> $queue_servers = [] ][, string $original_query = "" ][, string $save_timestamp_name = "" ][, array<string|int, mixed> $ranking_factors = [] ]) : object
Parameters
$word_structs : array<string|int, mixed>

an array of word_structs. Here a word_struct is an associative array with at least the following fields:
KEYS -- an array of word keys
QUOTE_POSITIONS -- an array of positions of words that appeared in quotes (so need to be matched exactly)
DISALLOW_PHRASES -- an array of words the document must not contain
WEIGHT -- a weight to multiply scores returned from this iterator by
INDEX_NAME -- an index timestamp to get results from

$filter : SearchfiltersModel

Model responsible for keeping track of edited and deleted search results

$raw : int

($raw == 0) normal grouping, ($raw == 1) no grouping done on data also no summaries returned (only lookup info), $raw > 1 return summaries but no grouping

$to_retrieve : int

number of items to retrieve from location in iterator

$queue_servers : array<string|int, mixed> = []

a list of urls of yioop machines which might be used during lookup

$original_query : string = ""

if set, the original query that corresponds to $word_structs

$save_timestamp_name : string = ""

if this timestamp is non empty, then when making iterator get sub-iterators to advance to gen doc_offset stored with respect to save_timestamp if exists.

$ranking_factors : array<string|int, mixed> = []

fields that say how url, keywords, and title words should influence relevance and doc rank calculations

Return values
object

an iterator for iterating through results to the query
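
To make the shape of a word_struct concrete, the following sketch builds one with the five documented fields and checks that none is missing. The field names come from the parameter description above; every value is made up, the helper sketchMissingWordStructFields is hypothetical, and whether the real keys are plain strings or class constants is not stated on this page.

```php
<?php
// Field names follow the $word_structs description; all values are made up.
$word_struct = [
    "KEYS" => ["hypothetical-key-1", "hypothetical-key-2"],
    "QUOTE_POSITIONS" => [],
    "DISALLOW_PHRASES" => ["unwanted"],
    "WEIGHT" => 1,
    "INDEX_NAME" => "1700000000",
];

/** Hypothetical helper: lists required fields missing from a word_struct. */
function sketchMissingWordStructFields(array $word_struct): array
{
    $required = ["KEYS", "QUOTE_POSITIONS", "DISALLOW_PHRASES", "WEIGHT",
        "INDEX_NAME"];
    return array_values(array_diff($required, array_keys($word_struct)));
}
```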

getRows()

Gets a range of rows which match the provided search criteria from the provided table

public getRows(int $limit, int $num, int &$total[, array<string|int, mixed> $search_array = [] ][, array<string|int, mixed> $args = null ]) : array<string|int, mixed>
Parameters
$limit : int

starting row from the potential results to return

$num : int

number of rows after start row to return

$total : int

gets set with the total number of rows that can be returned by the given database query

$search_array : array<string|int, mixed> = []

each element of this is a quadruple: the name of a field, what comparison to perform, a value to check, and an order (ascending/descending) to sort by

$args : array<string|int, mixed> = null

additional values which may be used to get rows (what these are will typically depend on the subclass implementation)

Return values
array<string|int, mixed>
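
The way $search_array quadruples, $search_table_column_map, and $any_fields fit together can be sketched as follows. This is an illustration only: the function sketchWhereOrderClauses, the comparison names "=" and "CONTAINS", the direction names "ASC" and "DESC", and the use of ? placeholders are all assumptions, since this page does not list the values the real code accepts.

```php
<?php
/**
 * Hypothetical sketch: turns search quadruples into WHERE / ORDER BY text.
 * Fields listed in $any_fields with value "-1" are skipped in WHERE but
 * still contribute to ORDER BY. Comparisons are case insensitive.
 */
function sketchWhereOrderClauses(array $search_array, array $column_map,
    array $any_fields = []): array
{
    $conditions = [];
    $params = [];
    $orderings = [];
    foreach ($search_array as [$field, $comparison, $value, $direction]) {
        if (!isset($column_map[$field])) {
            continue; // unknown form field, ignore it
        }
        $column = $column_map[$field];
        $is_any = in_array($field, $any_fields, true) && $value === "-1";
        if (!$is_any) {
            if ($comparison === "=") {
                $conditions[] = "LOWER($column) = LOWER(?)";
                $params[] = $value;
            } elseif ($comparison === "CONTAINS") {
                $conditions[] = "LOWER($column) LIKE LOWER(?)";
                $params[] = "%" . $value . "%";
            }
        }
        if ($direction === "ASC" || $direction === "DESC") {
            $orderings[] = "$column $direction";
        }
    }
    return [
        "where" => $conditions ?
            "WHERE " . implode(" AND ", $conditions) : "",
        "order_by" => $orderings ?
            "ORDER BY " . implode(", ", $orderings) : "",
        "params" => $params,
    ];
}
```

Binding values through placeholders rather than concatenating them is a design choice of the sketch, made to keep user input out of the SQL text.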

getSnippets()

Given a string, extracts snippets of text related to a given set of key words. For a given word, a snippet is a window of characters to its left and right that is less than a maximum total number of characters.

public getSnippets(string $text, array<string|int, mixed> $words, string $description_length) : string

There is also a rule that a snippet should avoid ending in the middle of a word

Parameters
$text : string

haystack to extract snippet from

$words : array<string|int, mixed>

keywords to look for in the haystack

$description_length : string

length of the description desired

Return values
string

a concatenation of the extracted snippets of each word
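
The windowing idea behind a snippet can be sketched for a single key word. The defaults of 20 and 40 characters mirror the SNIPPET_LENGTH_LEFT and SNIPPET_LENGTH_RIGHT constants listed earlier, but how the real method combines them with $description_length is not described here; the function name sketchSnippet, the byte-based (ASCII) string handling, and the exact word-boundary adjustment are assumptions.

```php
<?php
/**
 * Hypothetical sketch: returns a window around the first occurrence of
 * $word, shrunk so it neither starts nor ends in the middle of a word.
 * Works on bytes, so it assumes ASCII text.
 */
function sketchSnippet(string $text, string $word, int $left = 20,
    int $right = 40): string
{
    if ($word === "") {
        return "";
    }
    $pos = stripos($text, $word);
    if ($pos === false) {
        return "";
    }
    $match_end = $pos + strlen($word);
    $start = max(0, $pos - $left);
    $end = min(strlen($text), $match_end + $right);
    // Window starts mid-word: move forward to the next word start
    if ($start > 0 && $text[$start - 1] !== " ") {
        $space = strpos($text, " ", $start);
        $start = ($space !== false && $space < $pos) ? $space + 1 : $pos;
    }
    // Window ends mid-word: move back to the previous space
    if ($end < strlen($text) && $text[$end] !== " ") {
        $space = strrpos(substr($text, 0, $end), " ");
        $end = ($space !== false && $space >= $match_end) ?
            $space : $match_end;
    }
    return substr($text, $start, $end - $start);
}
```

A full getSnippets-style routine would presumably repeat this for each key word and concatenate the windows until the desired description length is reached.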

getSummariesByHash()

Gets doc summaries of documents containing given words and meeting the additional provided criteria

public getSummariesByHash(array<string|int, mixed> $word_structs, int $limit, int $num, SearchfiltersModel $filter[, bool $use_cache_if_allowed = true ], int $raw[, array<string|int, mixed> $queue_servers = [] ][, string $original_query = "" ][, string $save_timestamp_name = "" ][, array<string|int, mixed> $format_words = null ][, array<string|int, mixed> $ranking_factors = [] ]) : array<string|int, mixed>
Parameters
$word_structs : array<string|int, mixed>

an array of word_structs. Here a word_struct is an associative array with at least the following fields:
KEYS -- an array of word keys
QUOTE_POSITIONS -- an array of positions of words that appeared in quotes (so need to be matched exactly)
DISALLOW_PHRASES -- an array of words the document must not contain
WEIGHT -- a weight to multiply scores returned from this iterator by
INDEX_NAME -- an index timestamp to get results from

$limit : int

number of first document in order to return

$num : int

number of documents to return summaries of

$filter : SearchfiltersModel

Model responsible for keeping track of edited and deleted search results

$use_cache_if_allowed : bool = true

if true and USE_CACHE is true, then an attempt will be made to look up the results in the file cache. Otherwise, items will be recomputed and then potentially re-stored in the cache

$raw : int

($raw == 0) normal grouping, ($raw > 0) no grouping done on data. if ($raw == 1) no lookups of summaries done

$queue_servers : array<string|int, mixed> = []

a list of urls of yioop machines which might be used during lookup

$original_query : string = ""

if set, the original query that corresponds to $word_structs

$save_timestamp_name : string = ""

if this timestamp is not empty, then save iterate position, so can resume on future queries that make use of the timestamp. If used then $limit ignored and get next $num docs after $save_timestamp 's previous iterate position.

$format_words : array<string|int, mixed> = null

words which should be highlighted in search snippets returned

$ranking_factors : array<string|int, mixed> = []

fields that say how url, keywords, and title words should influence relevance and doc rank calculations

Return values
array<string|int, mixed>

document summaries

getSummariesFromOffsets()

Used to look up summary info for the pages provided (using their self::SUMMARY_OFFSET field). If any of the looked-up summaries are HTTP Location redirect pages, then these are looked up in turn.

public getSummariesFromOffsets(array<string|int, mixed> &$pages, array<string|int, mixed> &$queue_servers, int $raw, bool $groups_with_docs, bool $with_question_answer_info[, array<string|int, mixed> $format_words = null ][, int $description_length = self::DEFAULT_DESCRIPTION_LENGTH ]) : array<string|int, mixed>

This method handles robot meta tags which might forbid indexing.

Parameters
$pages : array<string|int, mixed>

of page data without text summaries

$queue_servers : array<string|int, mixed>

array of queue servers to find data on

$raw : int

only lookup locations if 0

$groups_with_docs : bool

whether to return only groups that contain at least one doc as opposed to groups with only links

$with_question_answer_info : bool

whether question answer info in summaries needs to be returned

$format_words : array<string|int, mixed> = null

words which should be highlighted in search snippets returned

$description_length : int = self::DEFAULT_DESCRIPTION_LENGTH

length of snippets to be returned for each search result

Return values
array<string|int, mixed>

pages with summaries added

getUserId()

Gets the user_id associated with a given username (in the base class, as it is used as an internal method in both the signin and user models)

public getUserId(string $username) : string
Parameters
$username : string

the username to look up

Return values
string

the corresponding userid

guessSemantics()

Ideally, this function tries to guess from the query what the user is looking for. For now, we are just doing simple things like detecting when a query term is a url and rewriting it to the appropriate meta word.

public guessSemantics(string $phrase) : string
Parameters
$phrase : string

input query to guess semantics of

Return values
string

a phrase that more closely matches the intentions of the query.
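
As a rough illustration of the url case described above, the sketch below rewrites a query that looks like a single url into a meta-word query. The function name guessSemanticsSketch, the regular expression, and the choice of the info: prefix (which this documentation mentions for info:url lookups) are all assumptions made for illustration; the rules Yioop actually applies may differ.

```php
<?php
/**
 * Hypothetical sketch: if the whole query looks like a url, rewrite it
 * to an info: meta-word query; otherwise return it unchanged.
 */
function guessSemanticsSketch(string $phrase): string
{
    $trimmed = trim($phrase);
    // Assumed heuristic: one token that starts with http:// or https://
    if (preg_match('#^https?://\S+$#i', $trimmed)) {
        return "info:" . $trimmed;
    }
    return $phrase;
}
```

Under these assumptions, a query such as "https://www.example.com/" would become "info:https://www.example.com/", while an ordinary multi-word query passes through untouched.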

indexExists()

Returns whether there is an index with the provided timestamp

public indexExists(int $index_time_stamp) : bool
Parameters
$index_time_stamp : int

timestamp of the index to check if in cache

Return values
bool

whether it exists or not

isSingleLocalhost()

Used to determine if an action involves just one yioop instance on the current local machine or not

public isSingleLocalhost(array<string|int, mixed> $machine_urls[, string $index_timestamp = -1 ]) : bool
Parameters
$machine_urls : array<string|int, mixed>

urls of yioop instances to which the action applies

$index_timestamp : string = -1

if the timestamp exists, checks if the index has declared itself to be a no network index.

Return values
bool

whether it involves a single local yioop instance (true) or not (false)
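
A minimal sketch of the machine-url part of this check might look like the following. It is not Yioop's implementation: the name isSingleLocalhostSketch is hypothetical, the no network index check on $index_timestamp is left out, and treating only the hosts localhost and 127.0.0.1 as local (and an empty list as not local) is an assumption.

```php
<?php
/**
 * Hypothetical sketch: true only when exactly one machine url is given
 * and its host is a loopback name. The index timestamp check described
 * above is intentionally omitted.
 */
function isSingleLocalhostSketch(array $machine_urls): bool
{
    if (count($machine_urls) !== 1) {
        return false;
    }
    // parse_url returns null or false when no host can be extracted
    $host = parse_url((string) reset($machine_urls), PHP_URL_HOST);
    // Assumption: these two hosts are what "local" means here
    return in_array($host, ["localhost", "127.0.0.1"], true);
}
```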

loginDbms()

Returns whether the provided dbms needs a login and password or not (sqlite and sqlite3 do not)

public loginDbms(string $dbms) : bool
Parameters
$dbms : string

the name of a database management system

Return values
bool

true if needs a login and password; false otherwise
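
The check can be sketched as a simple membership test. The sketch assumes that the parenthetical in the description lists the systems that do not need credentials, since SQLite databases are plain files; the function name loginDbmsSketch and the case-insensitive comparison are assumptions as well.

```php
<?php
/**
 * Hypothetical sketch: every dbms except the file-based SQLite
 * variants is assumed to need a login and password.
 */
function loginDbmsSketch(string $dbms): bool
{
    // Assumption: only these two names denote credential-free systems
    return !in_array(strtolower($dbms), ["sqlite", "sqlite3"], true);
}
```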

lookupSummaryOffsetGeneration()

Determines the offset into the summaries WebArchiveBundle and generation of the provided url (or hash_url) so that the info:url (info:base64_hash_url) summary can be retrieved. This assumes of course that the info:url meta word has been stored.

public lookupSummaryOffsetGeneration(string $url_or_key[, string $index_name = "" ][, bool $is_key = false ]) : array<string|int, mixed>
Parameters
$url_or_key : string

either info:base64_hash_url or just a url to lookup

$index_name : string = ""

index into which to do the lookup

$is_key : bool = false

whether the string is info:base64_hash_url or just a url

Return values
array<string|int, mixed>

(offset, generation) into the web archive bundle

networkGetCrawlItems()

In a multiple queue server setting, gets summaries for a set of documents by their url, or by a group of 5-tuples of the form (machine, key, index, generation, offset). This makes an execMachines call to make a network request to the CrawlControllers on each machine, which in turn call getCrawlItems (and thence nonNetworkGetCrawlItems) on each machine. The results are then sent back to networkGetCrawlItems and aggregated.

public networkGetCrawlItems(string $lookups, array<string|int, mixed> $machine_urls[, array<string|int, mixed> $exclude_fields = [] ][, array<string|int, mixed> $format_words = null ][, int $description_length = self::DEFAULT_DESCRIPTION_LENGTH ]) : array<string|int, mixed>
Parameters
$lookups : string

things whose summaries we are trying to look up

$machine_urls : array<string|int, mixed>

an array of urls of yioop queue servers

$exclude_fields : array<string|int, mixed> = []

an array of fields which might be in the crawlItem but which should be excluded from the result. This will make the result smaller and so hopefully faster to transmit

$format_words : array<string|int, mixed> = null

words which should be highlighted in search snippets returned

$description_length : int = self::DEFAULT_DESCRIPTION_LENGTH

length of snippets to be returned for each search result

Return values
array<string|int, mixed>

of summary data for the matching documents

nonNetworkGetCrawlItems()

Gets summaries on a particular machine for a set of documents by their url, or by a group of 5-tuples of the form (machine, key, index, generation, offset). This may be used in either the single queue_server setting or it may be called indirectly by a particular machine's CrawlController as part of fulfilling a network-based getCrawlItems request. $lookups contains items which are to be grouped (as they came from the same url or site with the same cache), so this function aggregates their descriptions.

public nonNetworkGetCrawlItems(string $lookups[, array<string|int, mixed> $exclude_fields = [] ][, array<string|int, mixed> $format_words = null ][, int $description_length = self::DEFAULT_DESCRIPTION_LENGTH ]) : array<string|int, mixed>
Parameters
$lookups : string

things whose summaries we are trying to look up

$exclude_fields : array<string|int, mixed> = []

an array of fields which might be in the crawlItem but which should be excluded from the result. This will make the result smaller and so hopefully faster to transmit

$format_words : array<string|int, mixed> = null

words which should be highlighted in search snippets returned

$description_length : int = self::DEFAULT_DESCRIPTION_LENGTH

length of snippets to be returned for each search result

Return values
array<string|int, mixed>

of summary data for the matching documents

parseIfConditions()

Evaluates any if: conditional meta-words in the query string to calculate a new query string.

public parseIfConditions(string $phrase) : string
Parameters
$phrase : string

original query string

Return values
string

query string after if: meta words have been evaluated

parseWordStructConjunctiveQuery()

Parses, from a string phrase representing a conjunctive query, a struct consisting of the word keys searched for, the allowed and disallowed phrases, the weight that should be put on these query results, and which archive to use.

public parseWordStructConjunctiveQuery(string &$phrase) : array<string|int, mixed>
Parameters
$phrase : string

string to extract struct from. If the phrase semantics is guessed or an if condition is processed, the value of $phrase will be altered (this helps when feeding it to network queries)

Return values
array<string|int, mixed>

struct representing the conjunctive query

postQueryCallback()

Called after getRows has retrieved all the rows that it would retrieve but before they are returned, to give one last place where they could be further manipulated. For example, in MachineModel this callback is used to make parallel network calls to get the status of each machine returned by getRows. The default for this method is to leave the rows that would be returned unchanged.

public postQueryCallback(array<string|int, mixed> $rows) : array<string|int, mixed>
Parameters
$rows : array<string|int, mixed>

that have been calculated so far by getRows

Return values
array<string|int, mixed>

$rows after this final manipulation

rewriteMixQuery()

Rewrites a mix query so that it maps directly to a query about crawls

public rewriteMixQuery(string $query, object $mix) : string
Parameters
$query : string

the original query before a rewrite

$mix : object

a mix object saying how the mix is built out of crawls

Return values
string

a rewritten query in terms of crawls

rowCallback()

Called after a row is retrieved by getRows from the database to perform some manipulation that would be useful for this model.

public rowCallback(array<string|int, mixed> $row, mixed $args) : array<string|int, mixed>

For example, in CrawlModel, after a row representing a crawl mix has been gotten, this is used to perform an additional query to marshal its components. By default this method just returns this row unchanged.

Parameters
$row : array<string|int, mixed>

row as retrieved from database query

$args : mixed

additional arguments that might be used by this callback

Return values
array<string|int, mixed>

$row after callback manipulation

searchArrayToWhereOrderClauses()

Creates the WHERE and ORDER BY clauses for a query of a Yioop table such as USERS, ROLE, GROUP, which have associated search web forms. Searches are case insensitive.

public searchArrayToWhereOrderClauses(array<string|int, mixed> $search_array[, array<string|int, mixed> $any_fields = ['status'] ]) : array<string|int, mixed>
Parameters
$search_array : array<string|int, mixed>

each element of this is a quadruple: the name of a field, what comparison to perform, a value to check, and an order (ascending/descending) to sort by

$any_fields : array<string|int, mixed> = ['status']

these fields if present in search array but with value "-1" will be skipped as part of the where clause but will be used for order by clause

Return values
array<string|int, mixed>

string for where clause, string for order by clause
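
The conversion described above can be sketched roughly as follows. This is not Yioop's code: the function name whereOrderSketch, the comparison keyword CONTAINS, the ASC/DESC order values, and the use of ? placeholders are assumptions for illustration. Unlike the documented method, the sketch also returns the bound values as a third element so that it stays safe to use with a prepared statement.

```php
<?php
/**
 * Hypothetical sketch of turning search quadruples into SQL clauses.
 * Each element of $search_array is [field, comparison, value, order].
 * Returns [where_clause, order_by_clause, bound_values].
 */
function whereOrderSketch(array $search_array,
    array $any_fields = ['status']): array
{
    $conditions = [];
    $params = [];
    $orders = [];
    foreach ($search_array as [$field, $comparison, $value, $order]) {
        // Accept only simple identifiers as column names
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $field)) {
            continue;
        }
        // An "any" field with value -1 is left out of the WHERE clause
        $skip_where = in_array($field, $any_fields, true) &&
            (string) $value === "-1";
        if (!$skip_where && $value !== "") {
            if ($comparison === "CONTAINS") {
                // LOWER on both sides makes the match case insensitive
                $conditions[] = "LOWER($field) LIKE LOWER(?)";
                $params[] = "%" . $value . "%";
            } else {
                // Every other comparison is treated as equality here
                $conditions[] = "LOWER($field) = LOWER(?)";
                $params[] = $value;
            }
        }
        if (in_array($order, ["ASC", "DESC"], true)) {
            $orders[] = "$field $order";
        }
    }
    $where = $conditions ? "WHERE " . implode(" AND ", $conditions) : "";
    $order_by = $orders ? "ORDER BY " . implode(", ", $orders) : "";
    return [$where, $order_by, $params];
}
```

In this sketch a status entry whose value is "-1" contributes nothing to the WHERE clause but still appears in the ORDER BY clause, which mirrors the behaviour described for $any_fields.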

selectCallback()

Controls which columns and the names of those columns from the tables underlying the given model should be returned from a getRows call.

public selectCallback([mixed $args = null ]) : string

This defaults to *, but in general will be overridden in subclasses of Model

Parameters
$args : mixed = null

any additional arguments which should be used to determine the columns

Return values
string

a comma separated list of columns suitable for a SQL query

translateDb()

Used to get the translation of a string_id stored in the database to the given locale.

public translateDb(string $string_id, string $locale_tag) : mixed
Parameters
$string_id : string

id to translate

$locale_tag : string

to translate to

Return values
mixed

translation if found; $string_id otherwise
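
The fallback behaviour of the return value can be sketched with an in-memory array standing in for the database table. The function name translateDbSketch and the layout of the $translations array are assumptions; the real method queries the database instead.

```php
<?php
/**
 * Hypothetical sketch: look up a translation by locale and string id,
 * falling back to the string id itself when nothing is found.
 * $translations is assumed to be [locale_tag => [string_id => text]].
 */
function translateDbSketch(array $translations, string $string_id,
    string $locale_tag)
{
    return $translations[$locale_tag][$string_id] ?? $string_id;
}
```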

whereCallback()

Controls the WHERE clause of the SQL query that underlies the given model and should be used in a getRows call.

public whereCallback([mixed $args = null ]) : string

This defaults to an empty WHERE clause.

Parameters
$args : mixed = null

additional arguments that might be used to construct the WHERE clause.

Return values
string

a SQL WHERE clause
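
To show how selectCallback, whereCallback, rowCallback, and postQueryCallback fit together as hooks around a getRows-style query, here is a toy sketch. ToyModel and ToyMachineModel are hypothetical classes, not Yioop's Model or MachineModel, and the column names NAME, URL, and ACTIVE are invented. The sketch only builds the SQL string and post-processes rows that are handed to it; it does not talk to a database.

```php
<?php
/** Toy stand-in for a base model; the defaults leave everything unchanged. */
class ToyModel
{
    public function selectCallback($args = null) { return "*"; }
    public function whereCallback($args = null) { return ""; }
    public function rowCallback($row, $args) { return $row; }
    public function postQueryCallback($rows) { return $rows; }

    /** Builds the SQL that a getRows-style method might run. */
    public function buildSql(string $table, $args = null): string
    {
        $sql = "SELECT " . $this->selectCallback($args) . " FROM $table";
        $where = $this->whereCallback($args);
        return ($where === "") ? $sql : "$sql $where";
    }

    /** Applies the per-row hook, then the final hook, to fetched rows. */
    public function processRows(array $fetched, $args = null): array
    {
        $rows = [];
        foreach ($fetched as $row) {
            $rows[] = $this->rowCallback($row, $args);
        }
        return $this->postQueryCallback($rows);
    }
}

/** Hypothetical subclass that overrides all four hooks. */
class ToyMachineModel extends ToyModel
{
    public function selectCallback($args = null) { return "NAME, URL"; }
    public function whereCallback($args = null) { return "WHERE ACTIVE = 1"; }
    public function rowCallback($row, $args)
    {
        // Normalize each url so that it ends in exactly one slash
        $row["URL"] = rtrim($row["URL"], "/") . "/";
        return $row;
    }
    public function postQueryCallback($rows)
    {
        // Final pass over the whole result set: sort by machine name
        usort($rows, function ($a, $b) {
            return strcmp($a["NAME"], $b["NAME"]);
        });
        return $rows;
    }
}
```

The design choice illustrated here is the template method pattern: the base class fixes the order of the steps, and subclasses customize only the hooks they need.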


        
