Yioop_V9.5_Source_Code_Documentation

ImpressionModel extends Model
in package

Model used to keep track of analytic and user experience (UX) activities that users carry out on a Yioop web site. For analytics, things that might be tracked include wiki page views, queries, and query outcomes. For UX, the impression model keeps track of the groups a user has recently visited in order to provide better breadcrumb drop-downs, makes the manage account landing page list more relevant groups, and determines whether a media item has been started, completely watched, etc.

In terms of the database implementation, the tables ITEM_IMPRESSION and ITEM_IMPRESSION_SUMMARY contain the raw statistics of activities. If differential privacy is in use, then ITEM_IMPRESSION_STAT keeps track of fuzzified statistics.

Tags
author

Chris Pollett

Table of Contents

DEFAULT_DESCRIPTION_LENGTH  = 150
Default maximum character length of a search summary
MAX_SNIPPET_TITLE_LENGTH  = 20
MIN_SNIPPET_LENGTH  = 100
SNIPPET_LENGTH_LEFT  = 20
SNIPPET_LENGTH_RIGHT  = 40
SNIPPET_TITLE_LENGTH  = 20
$any_fields  : array<string|int, mixed>
These fields, if present in $search_array (used by getRows()) but with value "-1", will be skipped as part of the WHERE clause but will still be used for the ORDER BY clause
$cache  : object
Cache object to be used if we are doing caching
$db  : object
Reference to a DatasourceManager
$db_name  : string
Name of the search engine database
$edited_page_summaries  : array<string|int, mixed>
Associative array of page summaries which might be used to override default page summaries if set.
$private_db  : object
Reference to a private DatasourceManager
$private_db_name  : string
Name of the private search engine database
$search_table_column_map  : array<string|int, mixed>
Associations of the form name of field for web forms => database column names/abbreviations
$web_site  : object
Reference to a WebSite object in use to serve pages (if any)
__construct()  : mixed
Sets up the database manager that will be used and name of the search engine database
add()  : mixed
Used to add a count record related to a particular user for a particular activity to the impression analytics. This entails adding both a log-like record of when the activity happened and incrementing a global count of this activity.
addQueryExitImpression()  : mixed
Used to add an impression record related to a user clicking a link on the query result page for web search. This entails adding a record to QUERY_ITEM for the particular link click from the query if it doesn't already exist, together with an add() impression call.
addQueryImpression()  : mixed
Used to add a count record related to a web search to the impression analytics. This entails adding a record to QUERY_ITEM if it doesn't exist together with an add() call.
addWithDb()  : mixed
Used to add a count record related to a particular user for a particular activity to the impression analytics. This entails adding both a log-like record of when the activity happened and incrementing a global count of this activity. This static method version requires having an initialized data source manager and may be appropriate to call in the context of another model.
boldKeywords()  : string
Given a string, wraps each occurrence of a given set of keywords it contains in bold HTML tags.
computeStatistics()  : mixed
Used by the Analytics job to aggregate raw impression data into hourly, daily, monthly, and yearly impression statistics.
createIfNecessaryDirectory()  : int
Creates a directory and sets it to world permissions if it doesn't already exist
delete()  : mixed
Used to delete information related to a particular user from the impression analytics.
deleteQueryStatistics()  : mixed
Deletes statistics related to web queries.
deleteWithDb()  : mixed
Used to delete information related to a particular user from the impression analytics. This static method version requires having an initialized data source manager and may be appropriate to call in the context of another model.
fileGetContents()  : string
Either a wrapper for file_get_contents, or if a WebSite object is being used to serve pages, it reads the file in using blocking I/O file_get_contents() and caches it before returning its string contents.
filePutContents()  : mixed
Either a wrapper for file_put_contents, or if a WebSite object is being used to serve pages, writes $data to the persistent file with name $filename. Saves a copy in the RAM cache if there is a copy already there.
formatSinglePageResult()  : array<string|int, mixed>
Given a page summary, extracts snippets which are related to a set of search words. For each snippet, bold faces the search terms, and then creates a new summary array.
fromCallback()  : string
Controls which tables, and the names of those tables, underlie the given model and should be used in a getRows call. This defaults to the single table whose name is whatever is before Model in the name of the model. For example, by default on FooModel this method would return "FOO". If a different behavior is desired, this can be overridden in subclasses of Model.
getDbmsList()  : array<string|int, mixed>
Gets a list of all DBMS that work with the search engine
getImpressionStat()  : array<string|int, mixed>
Returns the fuzzy statistics of the specified impression item.
getLastTimestamp()  : array<string|int, mixed>
Subtracts the timestamp to get actual time period window.
getPeriodHistogramData()  : array<string|int, mixed>
Calculates total number of views of given item id for given time period
getRows()  : array<string|int, mixed>
Gets a range of rows which match the provided search criteria from the provided table
getSnippets()  : string
Given a string, extracts snippets of text related to a given set of key words. For a given word, a snippet is a window of characters to its left and right that is less than a maximum total number of characters.
getStatistics()  : array<string|int, mixed>
Used to return an array of impression statistics for a particular update period for a particular type of impression.
getUserId()  : string
Get the user_id associated with a given username (in the base class because it is used as an internal method by both the signin and user models)
init()  : mixed
Used to create a new counter related to a particular user for a particular activity in the impression analytics. This entails adding both a log-like record of when the activity happened and creating a new global count for this activity.
initWithDb()  : mixed
Used to create a new counter related to a particular user for a particular activity in the impression analytics. This entails adding both a log-like record of when the activity happened and creating a new global count for this activity. This static method version requires having an initialized data source manager and may be appropriate to call in the context of another model.
isSingleLocalhost()  : bool
Used to determine if an action involves just one Yioop instance on the current local machine or not
loginDbms()  : bool
Returns whether the provided DBMS needs a login and password (sqlite and sqlite3 do not)
mostRecentGroupView()  : int
Returns the timestamp of the most recent view impression a user has of any group.
mostRecentGroupViews()  : array<string|int, mixed>
For a list of group_ids that $user_id may belong to, returns an array of pairs group_id => timestamp of most recent view
mostRecentThreadView()  : int
Returns the timestamp of the most recent view impression a user has of the given thread.
postQueryCallback()  : array<string|int, mixed>
Called after getRows has retrieved all the rows that it would retrieve, but before they are returned, to give one last place where they could be further manipulated. For example, in MachineModel this callback is used to make parallel network calls to get the status of each machine returned by getRows. The default for this method is to leave the rows that would be returned unchanged.
recent()  : array<string|int, mixed>
Returns the $num most recent impression items of the given type for a user
rowCallback()  : array<string|int, mixed>
Called after a row is retrieved by getRows from the database to perform some manipulation that would be useful for this model.
searchArrayToWhereOrderClauses()  : array<string|int, mixed>
Creates the WHERE and ORDER BY clauses for a query of a Yioop table such as USERS, ROLE, or GROUP, which have associated search web forms. Searches are case insensitive.
selectCallback()  : string
Controls which columns, and the names of those columns, from the tables underlying the given model should be returned from a getRows call.
translateDb()  : mixed
Used to get the translation of a string_id stored in the database to the given locale.
updateImpressionStat()  : mixed
Used to update the fuzzy statistics of impression items.
updatePrivacyViews()  : mixed
Used to update the fuzzified view counts of a thread item.
whereCallback()  : string
Controls the WHERE clause of the SQL query that underlies the given model and should be used in a getRows call.

Constants

DEFAULT_DESCRIPTION_LENGTH

Default maximum character length of a search summary

public mixed DEFAULT_DESCRIPTION_LENGTH = 150

MAX_SNIPPET_TITLE_LENGTH

public mixed MAX_SNIPPET_TITLE_LENGTH = 20

MIN_SNIPPET_LENGTH

public mixed MIN_SNIPPET_LENGTH = 100

SNIPPET_LENGTH_LEFT

public mixed SNIPPET_LENGTH_LEFT = 20

SNIPPET_LENGTH_RIGHT

public mixed SNIPPET_LENGTH_RIGHT = 40

SNIPPET_TITLE_LENGTH

public mixed SNIPPET_TITLE_LENGTH = 20

Properties

$any_fields

These fields, if present in $search_array (used by getRows()) but with value "-1", will be skipped as part of the WHERE clause but will still be used for the ORDER BY clause

public array<string|int, mixed> $any_fields = []

$cache

Cache object to be used if we are doing caching

public static object $cache

$db

Reference to a DatasourceManager

public object $db

$db_name

Name of the search engine database

public string $db_name

$edited_page_summaries

Associative array of page summaries which might be used to override default page summaries if set.

public array<string|int, mixed> $edited_page_summaries = null

$private_db

Reference to a private DatasourceManager

public object $private_db

$private_db_name

Name of the private search engine database

public string $private_db_name

$search_table_column_map

Associations of the form name of field for web forms => database column names/abbreviations

public array<string|int, mixed> $search_table_column_map = []

$web_site

Reference to a WebSite object in use to serve pages (if any)

public object $web_site

Methods

__construct()

Sets up the database manager that will be used and name of the search engine database

public __construct([string $db_name = CDB_NAME ][, bool $connect = true ][, mixed $web_site = null ]) : mixed
Parameters
$db_name : string = CDB_NAME

the name of the database for the search engine

$connect : bool = true

whether to connect to the database by default after making the datasource class

$web_site : mixed = null

reference to a WebSite object in use to serve pages (if any)
Return values
mixed

add()

Used to add a count record related to a particular user for a particular activity to the impression analytics. This entails adding both a log-like record of when the activity happened and incrementing a global count of this activity.

public add(int $user_id, int $item_id, int $type_id) : mixed
Parameters
$user_id : int

id of user we are adding analytic information for

$item_id : int

id of particular item we are adding analytic information of

$type_id : int

type of particular item we are adding analytic information of (group, wiki, thread, etc)

Return values
mixed
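The two-part bookkeeping that add() describes (a log-like record of each activity plus a running global count) can be illustrated with plain in-memory arrays. This is a minimal sketch only: the class ImpressionLogSketch and its $log and $summary arrays are hypothetical stand-ins for the ITEM_IMPRESSION and ITEM_IMPRESSION_SUMMARY tables, not Yioop's actual implementation or schema.

```php
<?php
// Hypothetical in-memory stand-in for the impression tables; not Yioop code.
class ImpressionLogSketch
{
    /** @var array<int, array{int,int,int,int}> log rows: [user_id, item_id, type_id, time] */
    public array $log = [];
    /** @var array<string, int> global counts keyed by "type_id:item_id" */
    public array $summary = [];

    // Mirrors the described add(): append a log record, then bump the global count.
    public function add(int $user_id, int $item_id, int $type_id, ?int $time = null): void
    {
        $this->log[] = [$user_id, $item_id, $type_id, $time ?? time()];
        $key = "$type_id:$item_id";
        $this->summary[$key] = ($this->summary[$key] ?? 0) + 1;
    }
}

$store = new ImpressionLogSketch();
$store->add(7, 42, 1);
$store->add(8, 42, 1);
echo count($store->log), " ", $store->summary["1:42"], "\n"; // prints "2 2"
```

Keeping the summary count separate from the log means a total can be read without scanning every log row, which is the trade-off the two-table design suggests.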

addQueryExitImpression()

Used to add an impression record related to a user clicking a link on the query result page for web search. This entails adding a record to QUERY_ITEM for the particular link click from the query if it doesn't already exist, together with an add() impression call.

public addQueryExitImpression(string $query, string $link) : mixed
Parameters
$query : string

search query the user was clicking a link on

$link : string

url that the user clicked on from the results

Return values
mixed

addQueryImpression()

Used to add a count record related to a web search to the impression analytics. This entails adding a record to QUERY_ITEM if it doesn't exist together with an add() call.

public addQueryImpression(string $query) : mixed
Parameters
$query : string

search query we are adding an impression for

Return values
mixed

addWithDb()

Used to add a count record related to a particular user for a particular activity to the impression analytics. This entails adding both a log-like record of when the activity happened and incrementing a global count of this activity. This static method version requires having an initialized data source manager and may be appropriate to call in the context of another model.

public static addWithDb(int $user_id, int $item_id, int $type_id, object $db) : mixed
Parameters
$user_id : int

id of user we are adding analytic information for

$item_id : int

id of particular item we are adding analytic information of

$type_id : int

type of particular item we are adding analytic information of (group, wiki, thread, etc)

$db : object

a DatasourceManager used to query the Yioop database

Return values
mixed

boldKeywords()

Given a string, wraps each occurrence of a given set of keywords it contains in bold HTML tags.

public boldKeywords(string $text, array<string|int, mixed> $words) : string
Parameters
$text : string

haystack string to look for the key words

$words : array<string|int, mixed>

an array of words to bold face

Return values
string

the resulting string after boldfacing has been applied
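A minimal sketch of the boldfacing idea, assuming case-insensitive matching. The function boldKeywordsSketch is hypothetical and is not Yioop's boldKeywords(); it only shows one way to wrap keywords in b tags safely with respect to regex metacharacters.

```php
<?php
// Hypothetical helper illustrating the idea behind boldKeywords(); not Yioop's code.
function boldKeywordsSketch(string $text, array $words): string
{
    foreach ($words as $word) {
        if ($word === '') {
            continue; // an empty pattern would match everywhere
        }
        // Case-insensitive, UTF-8 aware match; preg_quote escapes regex metacharacters.
        $pattern = '/' . preg_quote($word, '/') . '/iu';
        // $0 is the whole matched text, so the original capitalization is kept.
        $text = preg_replace($pattern, '<b>$0</b>', $text);
    }
    return $text;
}

echo boldKeywordsSketch("Yioop is a search engine", ["search", "yioop"]), "\n";
// prints "<b>Yioop</b> is a <b>search</b> engine"
```

This naive version processes words one at a time, so a keyword such as "b" would also match inside tags inserted by an earlier pass; a production version would need to guard against that.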

computeStatistics()

Used by the Analytics job to aggregate raw impression data into hourly, daily, monthly, and yearly impression statistics.

public computeStatistics() : mixed
Return values
mixed
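Aggregating raw impressions into fixed-length periods amounts to assigning each timestamp to the period that contains it and counting per period. The sketch below is an illustration under that assumption only; bucketImpressionsSketch and the constants ONE_HOUR and ONE_DAY are hypothetical names, and the sketch covers just hour and day periods rather than Yioop's month and year handling.

```php
<?php
// Hypothetical aggregation of raw impression timestamps into fixed-length
// periods; illustrates the idea only, not Yioop's computeStatistics().
const ONE_HOUR = 3600;
const ONE_DAY = 86400;

/**
 * @param int[] $timestamps Unix times of raw view impressions
 * @return array<int, int> period start time => number of impressions
 */
function bucketImpressionsSketch(array $timestamps, int $period): array
{
    $counts = [];
    foreach ($timestamps as $t) {
        $bucket = intdiv($t, $period) * $period; // start of the containing period
        $counts[$bucket] = ($counts[$bucket] ?? 0) + 1;
    }
    ksort($counts); // oldest period first
    return $counts;
}

$views = [0, 1800, 3599, 3600, 90000];
print_r(bucketImpressionsSketch($views, ONE_HOUR)); // 0 => 3, 3600 => 1, 90000 => 1
print_r(bucketImpressionsSketch($views, ONE_DAY));  // 0 => 4, 86400 => 1
```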

createIfNecessaryDirectory()

Creates a directory and sets it to world permissions if it doesn't already exist

public createIfNecessaryDirectory(string $directory) : int
Parameters
$directory : string

name of directory to create

Return values
int

-1 on failure, 0 if already existed, 1 if created
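The -1 / 0 / 1 return contract can be sketched with PHP's built-in is_dir(), mkdir(), and chmod(). The function createIfNecessaryDirectorySketch is hypothetical, and treating "world permission" as mode 0777 is an assumption of this sketch.

```php
<?php
// Hypothetical sketch of the documented -1 / 0 / 1 contract; not Yioop's code.
function createIfNecessaryDirectorySketch(string $directory): int
{
    if (is_dir($directory)) {
        return 0; // already existed
    }
    if (!@mkdir($directory)) {
        return -1; // creation failed (e.g. missing parent or no permission)
    }
    // Assumed meaning of "world permission": rwx for owner, group, and others.
    // chmod() is used because mkdir()'s mode is reduced by the process umask.
    @chmod($directory, 0777);
    return 1; // newly created
}

$dir = sys_get_temp_dir() . "/impression_sketch_" . getmypid();
echo createIfNecessaryDirectorySketch($dir), "\n"; // 1 (created)
echo createIfNecessaryDirectorySketch($dir), "\n"; // 0 (already existed)
rmdir($dir);
```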

delete()

Used to delete information related to a particular user from the impression analytics.

public delete(int $user_id, int $item_id, int $type_id) : mixed
Parameters
$user_id : int

id of user we are deleting analytic information for

$item_id : int

id of particular item we are deleting analytic information of

$type_id : int

type of particular item we are deleting analytic information of (group, wiki, thread, etc)

Return values
mixed

deleteQueryStatistics()

Deletes statistics related to web queries.

public deleteQueryStatistics() : mixed
Return values
mixed

deleteWithDb()

Used to delete information related to a particular user from the impression analytics. This static method version requires having an initialized data source manager and may be appropriate to call in the context of another model.

public static deleteWithDb(int $user_id, int $item_id, int $type_id, object $db) : mixed
Parameters
$user_id : int

id of user we are deleting analytic information for

$item_id : int

id of particular item we are deleting analytic information of

$type_id : int

type of particular item we are deleting analytic information of (group, wiki, thread, etc)

$db : object

a DatasourceManager used to query the Yioop database

Return values
mixed

fileGetContents()

Either a wrapper for file_get_contents, or if a WebSite object is being used to serve pages, it reads the file in using blocking I/O file_get_contents() and caches it before returning its string contents.

public fileGetContents(string $filename[, bool $force_read = false ]) : string

Note this function assumes that only the web server is performing I/O with this file. filemtime() can be used to see if a file on disk has been changed, and then $force_read = true can be used to force re-reading the file into the cache.

Parameters
$filename : string

name of file to get contents of

$force_read : bool = false

whether to force the file to be read from persistent storage rather than the cache

Return values
string

contents of the file given by $filename
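The caching behavior and the role of $force_read can be shown with a small in-memory cache. CachedFileReaderSketch is a hypothetical class, not Yioop's WebSite-backed cache; it only demonstrates why a stale cached copy is returned until a forced re-read.

```php
<?php
// Hypothetical cache-backed reader illustrating the $force_read idea; not Yioop's code.
class CachedFileReaderSketch
{
    /** @var array<string, string> filename => cached contents */
    private array $cache = [];

    public function fileGetContents(string $filename, bool $force_read = false): string
    {
        if (!$force_read && isset($this->cache[$filename])) {
            return $this->cache[$filename]; // served from RAM
        }
        $contents = file_get_contents($filename); // blocking read from disk
        if ($contents === false) {
            throw new RuntimeException("Could not read $filename");
        }
        $this->cache[$filename] = $contents;
        return $contents;
    }
}

$file = tempnam(sys_get_temp_dir(), "yio");
file_put_contents($file, "v1");
$reader = new CachedFileReaderSketch();
$reader->fileGetContents($file);                  // reads "v1" and caches it
file_put_contents($file, "v2");                   // file changes behind the cache
echo $reader->fileGetContents($file), "\n";       // v1 (cached copy)
echo $reader->fileGetContents($file, true), "\n"; // v2 (forced re-read)
unlink($file);
```

A caller could compare filemtime() against the time of the cached read to decide when to pass $force_read = true.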

filePutContents()

Either a wrapper for file_put_contents, or if a WebSite object is being used to serve pages, writes $data to the persistent file with name $filename. Saves a copy in the RAM cache if there is a copy already there.

public filePutContents(string $filename, string $data) : mixed
Parameters
$filename : string

name of file to write to persistent storage

$data : string

string of data to store in file

Return values
mixed

formatSinglePageResult()

Given a page summary, extracts snippets which are related to a set of search words. For each snippet, bold faces the search terms, and then creates a new summary array.

public formatSinglePageResult(array<string|int, mixed> $page[, array<string|int, mixed> $words = null ][, int $description_length = self::DEFAULT_DESCRIPTION_LENGTH ]) : array<string|int, mixed>
Parameters
$page : array<string|int, mixed>

a single search result summary

$words : array<string|int, mixed> = null

keywords (typically what was searched on)

$description_length : int = self::DEFAULT_DESCRIPTION_LENGTH

length of the description

Return values
array<string|int, mixed>

$page which has been snippified and bold faced

fromCallback()

Controls which tables, and the names of those tables, underlie the given model and should be used in a getRows call. This defaults to the single table whose name is whatever is before Model in the name of the model. For example, by default on FooModel this method would return "FOO". If a different behavior is desired, this can be overridden in subclasses of Model.

public fromCallback([mixed $args = null ]) : string
Parameters
$args : mixed = null

any additional arguments which should be used to determine these tables

Return values
string

a comma separated list of tables suitable for a SQL query
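The documented default naming rule ("FooModel" gives "FOO") can be sketched as plain string manipulation. defaultTableNameSketch is a hypothetical function, and stripping a namespace prefix first is an assumption of this sketch; str_ends_with() requires PHP 8.

```php
<?php
// Hypothetical sketch of the documented default: "FooModel" -> "FOO"; not Yioop's code.
function defaultTableNameSketch(string $class_name): string
{
    // Drop any namespace prefix, e.g. "some\ns\FooModel" -> "FooModel".
    $pos = strrpos($class_name, "\\");
    $short = ($pos === false) ? $class_name : substr($class_name, $pos + 1);
    // Strip a trailing "Model" suffix, then upper-case what remains.
    if (str_ends_with($short, "Model")) {
        $short = substr($short, 0, -strlen("Model"));
    }
    return strtoupper($short);
}

echo defaultTableNameSketch("FooModel"), "\n"; // FOO
```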

getDbmsList()

Gets a list of all DBMS that work with the search engine

public getDbmsList() : array<string|int, mixed>
Return values
array<string|int, mixed>

Names of available data sources

getImpressionStat()

Returns the fuzzy statistics of the specified impression item.

public getImpressionStat(int $item_id, int $item_type, int $period) : array<string|int, mixed>

If no statistics exist, it creates default dummy statistics. It is assumed this function is always called at least once before updateImpressionStat().

Parameters
$item_id : int

id of the item to return the statistics for

$item_type : int

type of the item

$period : int

time period of the item

Tags
see
updateImpressionStat
Return values
array<string|int, mixed>

values of $sum and $fuzzy_num_views

getLastTimestamp()

Subtracts the timestamp to get actual time period window.

public getLastTimestamp(int $period) : array<string|int, mixed>

To get statistics for one time period, data for the next lower time period needs to be extracted. For example, to get the last day's statistics, the hourly results for the last 24 hours need to be extracted.

Parameters
$period : int

time period

Return values
array<string|int, mixed>
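The roll-up described for getLastTimestamp() (a day's figure built from the hourly results of the last 24 hours) can be illustrated as a window sum. This is a hypothetical sketch: lastDayFromHourlySketch and its hour-start => count input are assumptions, and it does not reproduce the array that getLastTimestamp() actually returns.

```php
<?php
// Hypothetical roll-up: derive a day total from hourly counts in the 24 hours
// before $now, as the getLastTimestamp() description outlines. Not Yioop code.
function lastDayFromHourlySketch(array $hourly_counts, int $now): int
{
    $window_start = $now - 24 * 3600; // subtract one day from the current time
    $total = 0;
    foreach ($hourly_counts as $hour_start => $count) {
        // Keep only hours that begin inside [now - 24h, now).
        if ($hour_start >= $window_start && $hour_start < $now) {
            $total += $count;
        }
    }
    return $total;
}

$now = 100 * 3600;
$hourly = [75 * 3600 => 9, 76 * 3600 => 2, 99 * 3600 => 5, 100 * 3600 => 4];
echo lastDayFromHourlySketch($hourly, $now), "\n"; // 7
```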

getPeriodHistogramData()

Calculates total number of views of given item id for given time period

public getPeriodHistogramData(int $type, int $period, int $item_id) : array<string|int, mixed>
Parameters
$type : int

the impression type to get data for

$period : int

time period for which to show stats

$item_id : int

item identifier of item for which to show stats

Return values
array<string|int, mixed>

getRows()

Gets a range of rows which match the provided search criteria from the provided table

public getRows(int $limit, int $num, int &$total[, array<string|int, mixed> $search_array = [] ][, array<string|int, mixed> $args = null ]) : array<string|int, mixed>
Parameters
$limit : int

starting row from the potential results to return

$num : int

number of rows after start row to return

$total : int

gets set with the total number of rows that can be returned by the given database query

$search_array : array<string|int, mixed> = []

each element is a quadruple: the name of a field, the comparison to perform, a value to check, and an order (ascending/descending) to sort by

$args : array<string|int, mixed> = null

additional values which may be used to get rows (what these are will typically depend on the subclass implementation)

Return values
array<string|int, mixed>
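The paging semantics of $limit, $num, and the by-reference $total can be shown over an in-memory result set. getRowsSketch is hypothetical; the real method builds its SQL from the select, from, and where callbacks rather than slicing an array.

```php
<?php
// Hypothetical illustration of getRows() paging semantics over an in-memory
// result set; the real method builds SQL from callbacks instead.
function getRowsSketch(array $all_rows, int $limit, int $num, int &$total): array
{
    $total = count($all_rows);                   // every row the query could return
    return array_slice($all_rows, $limit, $num); // $num rows starting at row $limit
}

$rows = [["ID" => 1], ["ID" => 2], ["ID" => 3], ["ID" => 4], ["ID" => 5]];
$total = 0;
$page = getRowsSketch($rows, 2, 2, $total);
echo $total, " ", json_encode(array_column($page, "ID")), "\n"; // 5 [3,4]
```

Returning the total through a reference lets a caller render pagination controls from the same call that fetches one page.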

getSnippets()

Given a string, extracts snippets of text related to a given set of key words. For a given word, a snippet is a window of characters to its left and right that is less than a maximum total number of characters.

public getSnippets(string $text, array<string|int, mixed> $words, string $description_length) : string

There is also a rule that a snippet should avoid ending in the middle of a word

Parameters
$text : string

haystack to extract snippet from

$words : array<string|int, mixed>

keywords to look for in the haystack

$description_length : string

length of the description desired

Return values
string

a concatenation of the extracted snippets of each word
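One way to realize the window-around-each-keyword rule, including not starting or ending mid-word, is sketched below. It borrows the SNIPPET_LENGTH_LEFT (20) and SNIPPET_LENGTH_RIGHT (40) values from the class constants, but getSnippetsSketch itself, its " ... " separator, and treating $description_length as an int budget (separators not counted) are assumptions, not Yioop's getSnippets().

```php
<?php
// Hypothetical snippet extractor; window sizes mirror the class constants,
// but the algorithm is an illustrative guess, not Yioop's getSnippets().
const SNIPPET_LENGTH_LEFT = 20;
const SNIPPET_LENGTH_RIGHT = 40;

function getSnippetsSketch(string $text, array $words, int $description_length): string
{
    $snippets = [];
    $used = 0;
    $len = strlen($text);
    foreach ($words as $word) {
        $pos = ($word === '') ? false : stripos($text, $word);
        if ($pos === false) {
            continue; // keyword not in haystack
        }
        $word_end = $pos + strlen($word);
        $start = max(0, $pos - SNIPPET_LENGTH_LEFT);
        $end = min($len, $word_end + SNIPPET_LENGTH_RIGHT);
        // If the window starts mid-word, move forward past the next space.
        if ($start > 0 && $text[$start - 1] !== ' ') {
            $sp = strpos($text, ' ', $start);
            if ($sp !== false && $sp < $pos) {
                $start = $sp + 1;
            }
        }
        // If the window ends mid-word, move back to the previous space.
        if ($end < $len && $text[$end] !== ' ') {
            $sp = strrpos(substr($text, 0, $end), ' ');
            if ($sp !== false && $sp >= $word_end) {
                $end = $sp;
            }
        }
        $snippet = substr($text, $start, $end - $start);
        if ($used + strlen($snippet) > $description_length) {
            break; // respect the overall length budget
        }
        $snippets[] = $snippet;
        $used += strlen($snippet);
    }
    return implode(" ... ", $snippets);
}

$t = "The quick brown fox jumps over the lazy dog near the river bank today";
echo getSnippetsSketch($t, ["river"], 150), "\n"; // lazy dog near the river bank today
```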

getStatistics()

Used to return an array of impression statistics for a particular update period for a particular type of impression.

public getStatistics(int $type, int $period[, int $filter = "" ][, int $group_id = CPUBLIC_GROUP_ID ][, int $user_id = CPUBLIC_USER_ID ], int $limit[, int $num = 100 ]) : array<string|int, mixed>
Parameters
$type : int

type of impression to return statistic

$period : int

an update period to get statistics for, in seconds; for example, 3600 would give statistics for an hour. Only the hour, day, month, and year periods (expressed in seconds) are supported

$filter : int = ""

a string used to filter the item names of the returned statistics (for example, it could be used to filter statistics about popular thread names with respect to the number of views)

$group_id : int = CPUBLIC_GROUP_ID

identifier of the group we want stats for

$user_id : int = CPUBLIC_USER_ID

identifier of the user we want stats for

$limit : int

first row we want from the result set

$num : int = 100

number of rows we want starting from the first row in the result set

Return values
array<string|int, mixed>

getUserId()

Get the user_id associated with a given username (in the base class because it is used as an internal method by both the signin and user models)

public getUserId(string $username) : string
Parameters
$username : string

the username to look up

Return values
string

the corresponding userid

init()

Used to create a new counter related to a particular user for a particular activity in the impression analytics. This entails adding both a log-like record of when the activity happened and creating a new global count for this activity.

public init(int $user_id, int $item_id, int $type_id) : mixed
Parameters
$user_id : int

id of user we are adding analytic information for

$item_id : int

id of particular item we are adding analytic information of

$type_id : int

type of particular item we are adding analytic information of (group, wiki, thread, etc)

Return values
mixed

initWithDb()

Used to create a new counter related to a particular user for a particular activity in the impression analytics. This entails adding both a log-like record of when the activity happened and creating a new global count for this activity. This static method version requires having an initialized data source manager and may be appropriate to call in the context of another model.

public static initWithDb(int $user_id, int $item_id, int $type_id, object $db) : mixed
Parameters
$user_id : int

id of user we are adding analytic information for

$item_id : int

id of particular item we are adding analytic information of

$type_id : int

type of particular item we are adding analytic information of (group, wiki, thread, etc)

$db : object

a DatasourceManager used to query the Yioop database

Return values
mixed

isSingleLocalhost()

Used to determine if an action involves just one Yioop instance on the current local machine or not

public isSingleLocalhost(array<string|int, mixed> $machine_urls[, string $index_timestamp = -1 ]) : bool
Parameters
$machine_urls : array<string|int, mixed>

urls of yioop instances to which the action applies

$index_timestamp : string = -1

if timestamp exists checks if the index has declared itself to be a no network index.

Return values
bool

whether it involves a single local yioop instance (true) or not (false)

loginDbms()

Returns whether the provided DBMS needs a login and password (sqlite and sqlite3 do not)

public loginDbms(string $dbms) : bool
Parameters
$dbms : string

the name of a database management system

Return values
bool

true if needs a login and password; false otherwise
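The rule stated above (file-based SQLite back ends need no credentials) fits in one expression. loginDbmsSketch is hypothetical, and the case-insensitive comparison and the sample name "mysql" are assumptions used only for illustration.

```php
<?php
// Hypothetical sketch of the documented rule: sqlite and sqlite3 need no
// login/password; any other DBMS name is assumed to need one.
function loginDbmsSketch(string $dbms): bool
{
    return !in_array(strtolower($dbms), ["sqlite", "sqlite3"], true);
}

var_dump(loginDbmsSketch("sqlite3")); // bool(false)
var_dump(loginDbmsSketch("mysql"));   // bool(true)
```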

mostRecentGroupView()

Returns the timestamp of the most recent view impression a user has of any group.

public mostRecentGroupView(int $user_id) : int
Parameters
$user_id : int

id of the user we want the most recent impression for

Return values
int

timestamp of most recent impression

mostRecentGroupViews()

For a list of group_ids that $user_id may belong to, returns an array of pairs group_id => timestamp of most recent view

public mostRecentGroupViews(int $user_id, array<string|int, mixed> $group_ids) : array<string|int, mixed>
Parameters
$user_id : int

user to look up most recent views for

$group_ids : array<string|int, mixed>

groups to check in

Return values
array<string|int, mixed>

pairs group_id => timestamp
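Computing group_id => most recent view time is a filtered "max per key" over impression records. The sketch below is hypothetical: mostRecentGroupViewsSketch and its [user_id, group_id, view_time] row layout are assumptions standing in for Yioop's database query.

```php
<?php
// Hypothetical in-memory version: rows are [user_id, group_id, view_time]
// impression records; not Yioop's query or schema.
function mostRecentGroupViewsSketch(array $views, int $user_id, array $group_ids): array
{
    $latest = [];
    $wanted = array_flip($group_ids); // constant-time membership checks
    foreach ($views as [$uid, $gid, $time]) {
        if ($uid !== $user_id || !isset($wanted[$gid])) {
            continue; // other users or groups not asked about
        }
        $latest[$gid] = max($latest[$gid] ?? 0, $time);
    }
    return $latest; // group_id => timestamp of most recent view
}

$views = [[1, 10, 100], [1, 10, 250], [1, 20, 175], [2, 10, 999], [1, 30, 400]];
print_r(mostRecentGroupViewsSketch($views, 1, [10, 20])); // group 10 => 250, group 20 => 175
```

In this sketch, groups the user never viewed are simply absent from the result rather than mapped to 0.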

mostRecentThreadView()

Returns the timestamp of the most recent view impression a user has of the given thread.

public mostRecentThreadView(int $user_id, mixed $thread_id) : int
Parameters
$user_id : int

id of the user we want the most recent impression for

$thread_id : mixed

id of the thread to check

Return values
int

timestamp of most recent impression

postQueryCallback()

Called after getRows has retrieved all the rows that it would retrieve, but before they are returned, to give one last place where they could be further manipulated. For example, in MachineModel this callback is used to make parallel network calls to get the status of each machine returned by getRows. The default for this method is to leave the rows that would be returned unchanged.

public postQueryCallback(array<string|int, mixed> $rows) : array<string|int, mixed>
Parameters
$rows : array<string|int, mixed>

rows that have been calculated so far by getRows

Return values
array<string|int, mixed>

$rows after this final manipulation

recent()

Returns the $num most recent impression items of the given type for a user

public recent(int $user_id, int $type_id, int $num) : array<string|int, mixed>
Parameters
$user_id : int

id of user we are looking for information about

$type_id : int

type of particular item we want information on (group, wiki, thread, etc)

$num : int

how many most recent entries we want to get

Return values
array<string|int, mixed>

the $num most recent item ids of type $type_id for the given $user_id

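A minimal in-memory sketch of recent(): filter a user's impressions by type, keep each item's latest time, sort newest first, and take $num ids. recentSketch, its [user_id, item_id, type_id, time] row layout, and the de-duplication of repeat views are all assumptions of this sketch rather than documented behavior.

```php
<?php
// Hypothetical in-memory recent(): each log row is [user_id, item_id, type_id, time].
// De-duplicating repeat views of the same item is an assumption of this sketch.
function recentSketch(array $log, int $user_id, int $type_id, int $num): array
{
    $latest = []; // item_id => time of this user's latest impression of it
    foreach ($log as [$uid, $item_id, $type, $time]) {
        if ($uid === $user_id && $type === $type_id) {
            $latest[$item_id] = max($latest[$item_id] ?? 0, $time);
        }
    }
    arsort($latest); // newest impression first, keys preserved
    return array_slice(array_keys($latest), 0, $num);
}

$log = [[1, 5, 2, 100], [1, 6, 2, 300], [1, 5, 2, 400], [1, 7, 3, 500], [2, 8, 2, 600]];
print_r(recentSketch($log, 1, 2, 2)); // item ids 5 then 6
```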
rowCallback()

Called after a row is retrieved by getRows from the database to perform some manipulation that would be useful for this model.

public rowCallback(array<string|int, mixed> $row, mixed $args) : array<string|int, mixed>

For example, in CrawlModel, after a row representing a crawl mix has been gotten, this is used to perform an additional query to marshal its components. By default this method just returns this row unchanged.

Parameters
$row : array<string|int, mixed>

row as retrieved from database query

$args : mixed

additional arguments that might be used by this callback

Return values
array<string|int, mixed>

$row after callback manipulation

searchArrayToWhereOrderClauses()

Creates the WHERE and ORDER BY clauses for a query of a Yioop table such as USERS, ROLE, or GROUP, which have associated search web forms. Searches are case insensitive.

public searchArrayToWhereOrderClauses(array<string|int, mixed> $search_array[, array<string|int, mixed> $any_fields = ['status'] ]) : array<string|int, mixed>
Parameters
$search_array : array<string|int, mixed>

each element is a quadruple: the name of a field, the comparison to perform, a value to check, and an order (ascending/descending) to sort by

$any_fields : array<string|int, mixed> = ['status']

these fields, if present in the search array but with value "-1", will be skipped as part of the WHERE clause but will still be used for the ORDER BY clause

Return values
array<string|int, mixed>

string for where clause, string for order by clause
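The quadruple format and the "-1" rule for $any_fields can be illustrated by a small clause builder. This is a hypothetical sketch: searchArrayToClausesSketch, the comparison names "=" and "CONTAINS", the use of bound "?" parameters, the extra $params return value, and skipping empty values are all assumptions, not Yioop's implementation. The column map plays the role of $search_table_column_map.

```php
<?php
// Hypothetical sketch of searchArrayToWhereOrderClauses(); comparison names,
// bound "?" parameters, and the extra $params return value are assumptions.
function searchArrayToClausesSketch(array $search_array, array $column_map,
    array $any_fields = ["status"]): array
{
    $where = [];
    $params = [];
    $order = [];
    foreach ($search_array as [$field, $comparison, $value, $sort]) {
        if (!isset($column_map[$field])) {
            continue; // only mapped fields may become column names
        }
        $column = $column_map[$field];
        // "-1" on an $any_fields entry means "any value": no WHERE condition.
        $skip = $value === "" ||
            (in_array($field, $any_fields, true) && $value === "-1");
        if (!$skip && $comparison === "=") {
            $where[] = "LOWER($column) = LOWER(?)"; // case-insensitive equality
            $params[] = $value;
        } elseif (!$skip && $comparison === "CONTAINS") {
            $where[] = "LOWER($column) LIKE LOWER(?)"; // case-insensitive substring
            $params[] = "%$value%";
        }
        $sort = strtoupper((string) $sort);
        if ($sort === "ASC" || $sort === "DESC") {
            $order[] = "$column $sort"; // ordering applies even when WHERE is skipped
        }
    }
    $where_clause = $where ? "WHERE " . implode(" AND ", $where) : "";
    $order_clause = $order ? "ORDER BY " . implode(", ", $order) : "";
    return [$where_clause, $order_clause, $params];
}

[$where, $order, $params] = searchArrayToClausesSketch(
    [["name", "CONTAINS", "bob", "asc"], ["status", "=", "-1", "DESC"]],
    ["name" => "USER_NAME", "status" => "STATUS"]);
echo "$where\n$order\n";
// WHERE LOWER(USER_NAME) LIKE LOWER(?)
// ORDER BY USER_NAME ASC, STATUS DESC
```

Binding values as parameters keeps user input out of the SQL text; this sketch does not escape % or _ inside CONTAINS values, so those would still act as LIKE wildcards.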

selectCallback()

Controls which columns, and the names of those columns, from the tables underlying the given model should be returned from a getRows call.

public selectCallback([mixed $args = null ]) : string

This defaults to *, but in general will be overridden in subclasses of Model.

Parameters
$args : mixed = null

any additional arguments which should be used to determine the columns

Return values
string

a comma separated list of columns suitable for a SQL query

translateDb()

Used to get the translation of a string_id stored in the database to the given locale.

public translateDb(string $string_id, string $locale_tag) : mixed
Parameters
$string_id : string

id to translate

$locale_tag : string

locale to translate to

Return values
mixed

translation if found, otherwise $string_id

updateImpressionStat()

Used to update the fuzzy statistics of impression items.

public updateImpressionStat(int $item_id, int $item_type, int $period, int $num_views, int $fuzzy_num_views) : mixed
Parameters
$item_id : int

id of the item to update the statistics for

$item_type : int

type of the item

$period : int

time period of the item

$num_views : int

number of views of the item for specified time period

$fuzzy_num_views : int

fuzzified views of the item for specified time period

Return values
mixed

updatePrivacyViews()

Used to update the fuzzified view counts of a thread item.

public updatePrivacyViews(int $item_id, int $num_views, int $fuzzy_num_views) : mixed
Parameters
$item_id : int

id of the thread item to update fuzzified counts for

$num_views : int

current number of views for the item. This value is stored in the TMP_NUM_VIEWS column to remember the view count at the last time the FUZZY_NUM_VIEWS column was updated. This method only needs to be called when the TMP_NUM_VIEWS column differs from the NUM_VIEWS column.

$fuzzy_num_views : int

number of views after epsilon differential-privacy fuzzification has been applied.

Return values
mixed
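A common way to fuzz a count under epsilon differential privacy is the Laplace mechanism: add noise drawn from a Laplace distribution with scale 1/epsilon. The source does not say which noise Yioop uses, so this is a swapped-in illustration. laplaceNoiseSketch, fuzzyViewsSketch, needsPrivacyUpdateSketch, the sensitivity of 1, and the round-and-clamp step are all assumptions; only the "update when NUM_VIEWS differs from TMP_NUM_VIEWS" check comes from the description above.

```php
<?php
// Hypothetical sketch: Laplace-mechanism fuzzing of a view count plus the
// documented "only update when NUM_VIEWS differs from TMP_NUM_VIEWS" check.
function laplaceNoiseSketch(float $scale): float
{
    // Inverse-CDF sampling with u uniform in the open interval (-0.5, 0.5).
    do {
        $u = mt_rand() / mt_getrandmax() - 0.5;
    } while (abs($u) >= 0.5); // reject endpoints to avoid log(0)
    $sign = ($u < 0) ? -1.0 : 1.0;
    return -$scale * $sign * log(1 - 2 * abs($u));
}

function fuzzyViewsSketch(int $num_views, float $epsilon): int
{
    // Assumes sensitivity 1 (one user changes the count by at most 1),
    // so the Laplace scale is 1 / epsilon.
    $noisy = $num_views + laplaceNoiseSketch(1.0 / $epsilon);
    return max(0, (int) round($noisy)); // a view count cannot be negative
}

function needsPrivacyUpdateSketch(int $num_views, int $tmp_num_views): bool
{
    return $num_views !== $tmp_num_views;
}

mt_srand(42); // fixed seed only to make this demo repeatable
$num_views = 120;
$tmp_num_views = 118; // count remembered at the last fuzzification
if (needsPrivacyUpdateSketch($num_views, $tmp_num_views)) {
    $fuzzy = fuzzyViewsSketch($num_views, 0.5);
    // A real model would now store TMP_NUM_VIEWS = 120 and FUZZY_NUM_VIEWS = $fuzzy.
    echo "fuzzy views: $fuzzy\n";
}
```

Smaller epsilon values give larger noise and stronger privacy; clamping at zero slightly biases very small counts upward.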

whereCallback()

Controls the WHERE clause of the SQL query that underlies the given model and should be used in a getRows call.

public whereCallback([mixed $args = null ]) : string

This defaults to an empty WHERE clause.

Parameters
$args : mixed = null

additional arguments that might be used to construct the WHERE clause.

Return values
string

a SQL WHERE clause
