Yioop_V9.5_Source_Code_Documentation

RecommendationJob extends MediaJob

Recommendation Job recommends trending threads, as well as threads and groups which are relevant based on the user's viewing history

Table of Contents

CONTEXT_WINDOW_LENGTH  = 5
Length of context window for calculating term embeddings
DESCRIPTION_STOP_WORDS  = ["author", "authors", "plot", "genre", "genres", "star", "stars", "credits", "rating", "ratings", "year", "director", "cast", "runtime"]
Stop words to exclude from the descriptions fetched by DescriptionUpdate media job
HASH_ALGORITHM  = "md5"
Hash algorithm to be used for calculating hash in Hash2Vec embedding
MAX_BATCH_SIZE  = 200
Maximum number of resources used in making resource recommendations; also the maximum number of group items to hold in memory at one time
MAX_GROUP_ITEMS  = 50000
Maximum number of group items used in making recommendations
MAX_TERM_EMBEDDINGS  = 500
Maximum number of term embeddings fetched from the database to initialize the LRUCache
MAX_TERMS  = 20000
Maximum number of terms used in making recommendations
RECOMMENDATION_FILE  = \seekquarry\yioop\configs\APP_DIR . "/resources/recommendation.txt"
File containing paths to description folders of wiki page resources that should be used to create data corpus for computing recommendations
SIGN_HASH_ALGORITHM  = "crc32"
Hash algorithm to be used for calculating sign in Hash2Vec term embedding
UPDATE_PERIOD  = \seekquarry\yioop\configs\ONE_MONTH
Update period to consider for fetching the records from ITEM_IMPRESSION_SUMMARY table
$active_time  : int
Used to track what is the active recommendation timestamp
$controller  : object
If the MediaJob was instantiated in the web app, the controller that instantiated it
$cron_model  : object
Model used for timing when things were computed
$db  : object
Datasource object used to run db queries related to recommendation items (for storing and updating them)
$item_idf  : array<string|int, mixed>
Associative array of the number of items a term appears in
$lru_cache  : mixed
LRUCache for term embeddings
$media_updater  : object
If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater
$name_server_does_client_tasks  : bool
Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks
$name_server_does_client_tasks_only  : bool
Whether this MediaJob performs name server only tasks
$tasks  : array<string|int, mixed>
The tasks most recently received from the name server for this MediaJob
$update_time  : int
Time in current epoch when analytics were last updated
$user_idf  : array<string|int, mixed>
Associative array of the number of user views a term appears in
__construct()  : mixed
Instantiates the MediaJob with a reference to the object that instantiated it
checkPrerequisites()  : bool
Only update if it's been more than an hour since the last update
cleanRemoveStopWords()  : array<string|int, mixed>
Splits the given text into terms, cleans the terms by removing non-alphanumeric characters, and removes the stop terms in order to reduce noise when calculating the embeddings
computeGroupEmbeddings()  : array<string|int, mixed>
Computes the group embeddings using the item embeddings for the items in a group. Additionally fetches the existing group embeddings from database and updates them if the item embeddings are updated
computeGroupUserEmbeddings()  : array<string|int, mixed>
Computes the user embeddings based on the embeddings of groups for which the user has impressions in the ITEM_IMPRESSION_SUMMARY table within the defined UPDATE_PERIOD, or of which the user is a member
computeGroupUserRecommendations()  : array<string|int, mixed>
Computes group recommendations for a user based on the cosine similarity between user embeddings and group embeddings. Recommendations are calculated for groups which the user has not yet interacted with and is not a member of
computeItemEmbeddings()  : array<string|int, mixed>
Computes the item embeddings for individual items (main thread only and not comments) in groups feeds using the term embeddings for their terms.
computeItemTermEmbeddings()  : array<string|int, mixed>
Computes the term embeddings for individual items (main thread only and not comments) in group feeds for the terms in their title and description text. Processes at most MAX_GROUP_ITEMS items which are either newly created or recently edited
computeItemUserEmbeddings()  : array<string|int, mixed>
Computes the user embeddings based on the embeddings of items for which the user has impressions in the ITEM_IMPRESSION_SUMMARY table within the defined UPDATE_PERIOD
computeItemUserRecommendations()  : array<string|int, mixed>
Computes item recommendations for a user based on the cosine similarity between user embeddings and item embeddings. Recommendations are calculated for items which the user has not yet interacted with and which belong to groups where the user is already a member
computeThreadGroupRecommendations()  : mixed
Manages the whole process of computing thread and group recommendations for users. Makes a series of calls to handle parts of this computation before synthesizing the result
computeWikiResourceEmbeddings()  : array<string|int, mixed>
Computes the embeddings for wiki page resources using the calculated term embeddings and adds the metadata details separately to the embeddings
computeWikiResourceRecommendations()  : mixed
Manages the whole process of computing wiki resource recommendations for users. Makes a series of calls to handle parts of this computation before synthesizing the result
computeWikiTermEmbeddings()  : array<string|int, mixed>
Computes the embedding for new terms in the description of wiki resources and updates the embedding of existing terms using the Hash2Vec approach
computeWikiUserEmbeddings()  : array<string|int, mixed>
Computes user embeddings for wiki resources based on the user's resource impressions logged in the ITEM_IMPRESSION_SUMMARY table for the defined update period
computeWikiUserRecommendations()  : mixed
Computes the wiki resource recommendations based on cosine similarity between resource embeddings and user embeddings
doTasks()  : mixed
This method is run on a MediaUpdater client with data obtained from the name server by getTasks. The idea is that the client is supposed to then process this information and, if need be, send the results back to the name server
execNameServer()  : array<string|int, mixed>
Executes a method on the name server's JobController.
finishTasks()  : mixed
This method is called on the name server to finish processing any data returned by MediaUpdater clients.
getCurrentMachine()  : string
Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request
getDescriptionFiles()  : array<string|int, mixed>
Returns all the resource description files in a given thumb folder, recursively scanning through subfolders if there are any
getJobName()  : string
Gets the class name (less namespace and the word Job) of the current MediaJob
getTasks()  : array<string|int, mixed>
Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.
getTermEmbedding()  : string
Returns the term embedding either from LRU cache or database
getWikiResourceDescriptions()  : array<string|int, mixed>
Fetches the description for the eligible wiki resources having the root folder path captured in RECOMMENDATION_FILE
init()  : mixed
Sets up the database connection so that tables related to recommendations can be accessed. Initializes timing info related to the job.
initializeNewUserRecommendations()  : mixed
Computes recommendations for users who have yet to receive any recommendation of the given type based on what is the most popular recommendation
nondistributedTasks()  : mixed
For now analytics update is only done on name server as Yioop currently only supports one DBMS at a time.
prepareTasks()  : mixed
This method is called on the name server to prepare data for any MediaUpdater clients.
putTasks()  : array<string|int, mixed>
After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The name server's web app's JobController then calls this method to receive the data on the name server
run()  : mixed
Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.
saveTermEmbeddingsCacheToDb()  : mixed
Writes the term embeddings in the cache back to the database and frees up memory
updateTermEmbeddingCache()  : mixed
Updates the LRU cache of term embeddings and saves the evicted embedding back to the database

Constants

CONTEXT_WINDOW_LENGTH

Length of context window for calculating term embeddings

public mixed CONTEXT_WINDOW_LENGTH = 5

DESCRIPTION_STOP_WORDS

Stop words to exclude from the descriptions fetched by DescriptionUpdate media job

public mixed DESCRIPTION_STOP_WORDS = ["author", "authors", "plot", "genre", "genres", "star", "stars", "credits", "rating", "ratings", "year", "director", "cast", "runtime"]

HASH_ALGORITHM

Hash algorithm to be used for calculating hash in Hash2Vec embedding

public mixed HASH_ALGORITHM = "md5"

MAX_BATCH_SIZE

Maximum number of resources used in making resource recommendations; also the maximum number of group items to hold in memory at one time

public mixed MAX_BATCH_SIZE = 200

MAX_GROUP_ITEMS

Maximum number of group items used in making recommendations

public mixed MAX_GROUP_ITEMS = 50000

MAX_TERM_EMBEDDINGS

Maximum number of term embeddings fetched from the database to initialize the LRUCache

public mixed MAX_TERM_EMBEDDINGS = 500

MAX_TERMS

Maximum number of terms used in making recommendations

public mixed MAX_TERMS = 20000

RECOMMENDATION_FILE

File containing paths to description folders of wiki page resources that should be used to create data corpus for computing recommendations

public mixed RECOMMENDATION_FILE = \seekquarry\yioop\configs\APP_DIR . "/resources/recommendation.txt"

SIGN_HASH_ALGORITHM

Hash algorithm to be used for calculating sign in Hash2Vec term embedding

public mixed SIGN_HASH_ALGORITHM = "crc32"

UPDATE_PERIOD

Update period to consider for fetching the records from ITEM_IMPRESSION_SUMMARY table

public mixed UPDATE_PERIOD = \seekquarry\yioop\configs\ONE_MONTH

Properties

$active_time

Used to track what is the active recommendation timestamp

public int $active_time

$controller

If the MediaJob was instantiated in the web app, the controller that instantiated it

public object $controller

$cron_model

Model used for timing when things were computed

public object $cron_model

$db

Datasource object used to run db queries related to recommendation items (for storing and updating them)

public object $db

$item_idf

Associative array of the number of items a term appears in

public array<string|int, mixed> $item_idf

$media_updater

If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater

public object $media_updater

$name_server_does_client_tasks

Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks

public bool $name_server_does_client_tasks

$name_server_does_client_tasks_only

Whether this MediaJob performs name server only tasks

public bool $name_server_does_client_tasks_only

$tasks

The tasks most recently received from the name server for this MediaJob

public array<string|int, mixed> $tasks

$update_time

Time in current epoch when analytics were last updated

public int $update_time

$user_idf

Associative array of the number of user views a term appears in

public array<string|int, mixed> $user_idf

Methods

__construct()

Instantiates the MediaJob with a reference to the object that instantiated it

public __construct([object $media_updater = null ][, object $controller = null ]) : mixed
Parameters
$media_updater : object = null

a reference to the media updater that instantiated this object (if being run in MediaUpdater)

$controller : object = null

a reference to the controller that instantiated this object (if being run in the web app)

Return values
mixed

checkPrerequisites()

Only update if it's been more than an hour since the last update

public checkPrerequisites() : bool
Return values
bool

whether it's been an hour since the last update
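
The time check described above can be pictured with a small PHP sketch. The function name, its parameters, and the ONE_HOUR_SKETCH constant are hypothetical and are not part of the Yioop API; how the job actually stores and reads its last update time is not shown in this documentation.

```php
<?php
// Hypothetical constant: one hour expressed in seconds
const ONE_HOUR_SKETCH = 3600;

/**
 * Returns true when more than $min_gap seconds have passed since
 * $last_update. Both timestamps are Unix times in seconds.
 */
function shouldUpdateSketch(int $last_update, int $now,
    int $min_gap = ONE_HOUR_SKETCH): bool
{
    return ($now - $last_update) > $min_gap;
}

// Usage: compare a stored timestamp against the current time
$last_update = time() - 2 * ONE_HOUR_SKETCH;
var_dump(shouldUpdateSketch($last_update, time()));
```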

cleanRemoveStopWords()

Splits the given text into terms, cleans the terms by removing non-alphanumeric characters, and removes the stop terms in order to reduce noise when calculating the embeddings

public cleanRemoveStopWords(string $text[, bool $description_stop_word_flag = false ]) : array<string|int, mixed>
Parameters
$text : string

the text which needs to be processed

$description_stop_word_flag : bool = false

whether to remove words present in DESCRIPTION_STOP_WORDS

Return values
array<string|int, mixed>

$terms [term_id, term], where term_id is calculated using the md5 hash of the term
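
As a rough illustration of the cleaning steps described for this method, the PHP sketch below splits text on whitespace, strips non-alphanumeric characters, drops stop words, and pairs each surviving term with its md5 hash. The function name, the lowercasing step, the stop word list, and the choice to return a list of [term_id, term] pairs with the id kept as a hex string are all assumptions made for the example, not details confirmed by the Yioop source.

```php
<?php
/**
 * Hypothetical helper: tokenizes $text and removes any term found
 * in $stop_words. Returns a list of [term_id, term] pairs.
 */
function cleanTermsSketch(string $text, array $stop_words = []): array
{
    $terms = [];
    // Split on runs of whitespace, discarding empty pieces
    $pieces = preg_split('/\s+/', strtolower($text), -1,
        PREG_SPLIT_NO_EMPTY);
    foreach ($pieces as $raw) {
        // Keep only ASCII letters and digits
        $term = preg_replace('/[^a-z0-9]/', '', $raw);
        if ($term === '' || in_array($term, $stop_words, true)) {
            continue;
        }
        // Term id derived from the md5 hash of the cleaned term
        $terms[] = [md5($term), $term];
    }
    return $terms;
}

// Usage with an illustrative stop word list
$stop_words = ["the", "plot", "genre"];
print_r(cleanTermsSketch("Plot: The heist, part 2!", $stop_words));
```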

computeGroupEmbeddings()

Computes the group embeddings using the item embeddings for the items in a group. Additionally fetches the existing group embeddings from database and updates them if the item embeddings are updated

public computeGroupEmbeddings(array<string|int, mixed> $item_embeddings) : array<string|int, mixed>
Parameters
$item_embeddings : array<string|int, mixed>

embedding for the items

Return values
array<string|int, mixed>

$updated_group_embeddings containing embeddings for groups
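
One simple way to derive a group embedding from the embeddings of its items is to average them component by component. The PHP sketch below does that; the averaging rule, the function names, and the $item_groups map from item id to group id are assumptions for illustration, and the database update step mentioned above is left out.

```php
<?php
/**
 * Hypothetical helper: component-wise mean of a list of vectors.
 */
function meanVectorSketch(array $vectors): array
{
    $count = count($vectors);
    if ($count === 0) {
        return [];
    }
    $sum = [];
    foreach ($vectors as $vector) {
        foreach ($vector as $k => $value) {
            $sum[$k] = ($sum[$k] ?? 0.0) + $value;
        }
    }
    return array_map(fn($v) => $v / $count, $sum);
}

/**
 * Groups item embeddings by group id and averages each group.
 * $item_groups maps item id => group id (assumed input shape).
 */
function groupEmbeddingsSketch(array $item_embeddings,
    array $item_groups): array
{
    $by_group = [];
    foreach ($item_embeddings as $item_id => $embedding) {
        if (isset($item_groups[$item_id])) {
            $by_group[$item_groups[$item_id]][] = $embedding;
        }
    }
    // array_map with a single array preserves the group id keys
    return array_map('meanVectorSketch', $by_group);
}

$item_embeddings = [1 => [1.0, 0.0], 2 => [0.0, 1.0], 3 => [2.0, 2.0]];
$item_groups = [1 => 7, 2 => 7, 3 => 8];
print_r(groupEmbeddingsSketch($item_embeddings, $item_groups));
```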

computeGroupUserEmbeddings()

Computes the user embeddings based on the embeddings of groups for which the user has impressions in the ITEM_IMPRESSION_SUMMARY table within the defined UPDATE_PERIOD, or of which the user is a member

public computeGroupUserEmbeddings(array<string|int, mixed> $group_embeddings) : array<string|int, mixed>
Parameters
$group_embeddings : array<string|int, mixed>

embedding vectors of groups

Return values
array<string|int, mixed>

[$group_user_embedding, $user_groups] user embeddings for groups and the ids of the groups in which the user has membership

computeGroupUserRecommendations()

Computes group recommendations for a user based on the cosine similarity between user embeddings and group embeddings. Recommendations are calculated for groups which the user has not yet interacted with and is not a member of

public computeGroupUserRecommendations(array<string|int, mixed> $group_embeddings, array<string|int, mixed> $group_user_embeddings, array<string|int, mixed> $user_groups, mixed $user_group_impression) : array<string|int, mixed>
Parameters
$group_embeddings : array<string|int, mixed>

embeddings vector for groups

$group_user_embeddings : array<string|int, mixed>

embeddings vector for users

$user_groups : array<string|int, mixed>

ids of the groups in which the user has membership

$user_group_impression : mixed

group ids which the user has seen

Return values
array<string|int, mixed>
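
The ranking step described for this method can be sketched in PHP as follows: score every candidate group by cosine similarity to the user embedding, skip groups the user has already seen or joined, and keep the highest scores. The function names, the $limit parameter, and the plain numeric-array representation of embeddings are assumptions for the example only.

```php
<?php
/**
 * Cosine similarity of two numeric vectors; returns 0.0 if either
 * vector has zero length.
 */
function cosineSimilaritySketch(array $a, array $b): float
{
    $dot = 0.0;
    $norm_a = 0.0;
    $norm_b = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * ($b[$i] ?? 0.0);
        $norm_a += $value * $value;
    }
    foreach ($b as $value) {
        $norm_b += $value * $value;
    }
    if ($norm_a == 0.0 || $norm_b == 0.0) {
        return 0.0;
    }
    return $dot / (sqrt($norm_a) * sqrt($norm_b));
}

/**
 * Hypothetical ranking helper: returns group id => score for the
 * $limit best groups not listed in $excluded_group_ids.
 */
function recommendGroupsSketch(array $user_embedding,
    array $group_embeddings, array $excluded_group_ids,
    int $limit = 3): array
{
    $scores = [];
    foreach ($group_embeddings as $group_id => $embedding) {
        if (in_array($group_id, $excluded_group_ids)) {
            continue;
        }
        $scores[$group_id] =
            cosineSimilaritySketch($user_embedding, $embedding);
    }
    arsort($scores); // highest similarity first, keys preserved
    return array_slice($scores, 0, $limit, true);
}

$groups = [10 => [1.0, 0.0], 20 => [0.0, 1.0], 30 => [1.0, 1.0]];
print_r(recommendGroupsSketch([1.0, 0.0], $groups, [10]));
```

The same scoring pattern would apply to item and wiki resource recommendations, with the exclusion list built from the corresponding impressions.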

computeItemEmbeddings()

Computes the item embeddings for individual items (main thread only and not comments) in groups feeds using the term embeddings for their terms.

public computeItemEmbeddings(array<string|int, mixed> $item_terms) : array<string|int, mixed>

Additionally fetches the existing item embeddings from database and updates them if the term embeddings are updated for their terms

Parameters
$item_terms : array<string|int, mixed>

terms in each item

Return values
array<string|int, mixed>

$updated_item_embeddings containing embeddings for items
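
A common way to build an item embedding from term embeddings is a weighted average in which rarer terms count more. The PHP sketch below uses a weight of log(1 + N / df), where df is the number of items a term appears in (the kind of count $item_idf is documented to hold) and N is the total number of items. The weighting formula, the function name, and the input shapes are assumptions; the source does not state how RecommendationJob combines term embeddings.

```php
<?php
/**
 * Hypothetical helper: weighted average of the embeddings of the
 * terms in one item.
 * $term_embeddings maps term id => numeric vector.
 * $item_df maps term id => number of items containing the term.
 */
function itemEmbeddingSketch(array $term_ids, array $term_embeddings,
    array $item_df, int $num_items): array
{
    $sum = [];
    $total_weight = 0.0;
    foreach ($term_ids as $term_id) {
        if (!isset($term_embeddings[$term_id])) {
            continue; // no embedding known for this term
        }
        $df = max(1, $item_df[$term_id] ?? 1);
        $weight = log(1 + $num_items / $df); // assumed IDF-style weight
        foreach ($term_embeddings[$term_id] as $k => $value) {
            $sum[$k] = ($sum[$k] ?? 0.0) + $weight * $value;
        }
        $total_weight += $weight;
    }
    if ($total_weight == 0.0) {
        return [];
    }
    return array_map(fn($v) => $v / $total_weight, $sum);
}

$term_embeddings = ['a' => [1.0, 0.0], 'b' => [0.0, 1.0]];
print_r(itemEmbeddingSketch(['a', 'b'], $term_embeddings,
    ['a' => 1, 'b' => 1], 2));
```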

computeItemTermEmbeddings()

Computes the term embeddings for individual items (main thread only and not comments) in group feeds for the terms in their title and description text. Processes at most MAX_GROUP_ITEMS items which are either newly created or recently edited

public computeItemTermEmbeddings() : array<string|int, mixed>
Return values
array<string|int, mixed>

$item_terms terms in each item

computeItemUserEmbeddings()

Computes the user embeddings based on the embeddings of items for which the user has impressions in the ITEM_IMPRESSION_SUMMARY table within the defined UPDATE_PERIOD

public computeItemUserEmbeddings(array<string|int, mixed> $item_embeddings) : array<string|int, mixed>
Parameters
$item_embeddings : array<string|int, mixed>

embedding vectors of items

Return values
array<string|int, mixed>

[$item_user_embedding, $user_items] user embeddings for items and the ids of the items for which the user has impressions

computeItemUserRecommendations()

Computes item recommendations for a user based on the cosine similarity between user embeddings and item embeddings. Recommendations are calculated for items which the user has not yet interacted with and which belong to groups where the user is already a member

public computeItemUserRecommendations(array<string|int, mixed> $item_embeddings, array<string|int, mixed> $item_user_embeddings, array<string|int, mixed> $user_items) : array<string|int, mixed>
Parameters
$item_embeddings : array<string|int, mixed>

embeddings vectors for items

$item_user_embeddings : array<string|int, mixed>

embeddings vectors for user

$user_items : array<string|int, mixed>

ids of the items for the user in the impression table

Return values
array<string|int, mixed>

$user_groups group ids where the user is a member

computeThreadGroupRecommendations()

Manages the whole process of computing thread and group recommendations for users. Makes a series of calls to handle parts of this computation before synthesizing the result

public computeThreadGroupRecommendations() : mixed
Return values
mixed

computeWikiResourceEmbeddings()

Computes the embeddings for wiki page resources using the calculated term embeddings and adds the metadata details separately to the embeddings

public computeWikiResourceEmbeddings(array<string|int, mixed> $resource_terms, array<string|int, mixed> $meta_details_terms) : array<string|int, mixed>
Parameters
$resource_terms : array<string|int, mixed>

array of processed terms from resource descriptions

$meta_details_terms : array<string|int, mixed>

array of raw resource descriptions

Return values
array<string|int, mixed>

$updated_item_embeddings array of updated wiki resource embeddings

computeWikiResourceRecommendations()

Manages the whole process of computing wiki resource recommendations for users. Makes a series of calls to handle parts of this computation before synthesizing the result

public computeWikiResourceRecommendations() : mixed
Return values
mixed

computeWikiTermEmbeddings()

Computes the embedding for new terms in the description of wiki resources and updates the embedding of existing terms using the Hash2Vec approach

public computeWikiTermEmbeddings(array<string|int, mixed> $descriptions) : array<string|int, mixed>
Parameters
$descriptions : array<string|int, mixed>

of resources

Return values
array<string|int, mixed>

[$resource_terms, $meta_details_term]
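
To make the Hash2Vec idea concrete, the PHP sketch below builds a fixed-length vector for each term by hashing the neighbouring terms inside a context window: one hash (md5, as in HASH_ALGORITHM) picks the vector position and a second hash (crc32, as in SIGN_HASH_ALGORITHM) picks the sign. The embedding dimension, the 1/distance weighting, the way the hash digests are reduced to an index and a sign, and all function names are assumptions for illustration and may differ from what RecommendationJob does.

```php
<?php
const SKETCH_EMBED_DIM = 8; // hypothetical embedding dimension
const SKETCH_WINDOW = 5;    // mirrors CONTEXT_WINDOW_LENGTH = 5

// Maps a term to a vector position using the first 7 hex digits of md5
function hashIndexSketch(string $term, int $dim): int
{
    return hexdec(substr(hash('md5', $term), 0, 7)) % $dim;
}

// Maps a term to +1 or -1 using part of its crc32 digest
function hashSignSketch(string $term): int
{
    $value = hexdec(substr(hash('crc32', $term), 0, 7));
    return ($value % 2 === 0) ? 1 : -1;
}

/**
 * $terms is a list of terms in reading order. Returns
 * term => vector, accumulating signed, distance-weighted
 * contributions from each neighbour within the window.
 */
function hash2VecSketch(array $terms, int $dim = SKETCH_EMBED_DIM,
    int $window = SKETCH_WINDOW): array
{
    $terms = array_values($terms);
    $embeddings = [];
    $n = count($terms);
    foreach ($terms as $i => $term) {
        if (!isset($embeddings[$term])) {
            $embeddings[$term] = array_fill(0, $dim, 0.0);
        }
        $start = max(0, $i - $window);
        $end = min($n - 1, $i + $window);
        for ($j = $start; $j <= $end; $j++) {
            if ($j === $i) {
                continue;
            }
            $context = $terms[$j];
            $embeddings[$term][hashIndexSketch($context, $dim)] +=
                hashSignSketch($context) / abs($i - $j);
        }
    }
    return $embeddings;
}

print_r(hash2VecSketch(['red', 'fast', 'car']));
```

Because the vector length is fixed by the hash range rather than by the vocabulary, new terms can be added without resizing existing embeddings.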

computeWikiUserEmbeddings()

Computes user embeddings for wiki resources based on the user's resource impressions logged in the ITEM_IMPRESSION_SUMMARY table for the defined update period

public computeWikiUserEmbeddings(array<string|int, mixed> $item_embeddings) : array<string|int, mixed>
Parameters
$item_embeddings : array<string|int, mixed>

array of wiki page resource embeddings

Return values
array<string|int, mixed>

[$user_embeddings, $user_items] array of user embeddings for wiki resources and the user's resource impressions

computeWikiUserRecommendations()

Computes the wiki resource recommendations based on cosine similarity between resource embeddings and user embeddings

public computeWikiUserRecommendations(array<string|int, mixed> $item_embeddings, array<string|int, mixed> $user_embeddings, array<string|int, mixed> $user_items, mixed $resource_metadata) : mixed
Parameters
$item_embeddings : array<string|int, mixed>

array of wiki resource embeddings

$user_embeddings : array<string|int, mixed>

array of embeddings of the wiki resources consumed by users

$user_items : array<string|int, mixed>

array of the wiki resources consumed by users

$resource_metadata : mixed
Return values
mixed

doTasks()

This method is run on a MediaUpdater client with data obtained from the name server by getTasks. The idea is that the client is supposed to then process this information and, if need be, send the results back to the name server

public doTasks(array<string|int, mixed> $tasks) : mixed
Parameters
$tasks : array<string|int, mixed>

data that the MediaJob running on a client MediaUpdater needs to process

Return values
mixed

the result of carrying out that processing

execNameServer()

Executes a method on the name server's JobController.

public static execNameServer(string $command[, string $args = null ]) : array<string|int, mixed>

It will typically execute either getTask or putTask for a specific MediaJob, or getUpdateProperties to find out how the current MediaUpdater should be configured.

Parameters
$command : string

the method to invoke on the name server

$args : string = null

additional arguments to be passed to the name server

Return values
array<string|int, mixed>

data returned by the name server.

finishTasks()

This method is called on the name server to finish processing any data returned by MediaUpdater clients.

public finishTasks() : mixed
Return values
mixed

getCurrentMachine()

Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request

public static getCurrentMachine() : string
Return values
string

hash of current machine url

getDescriptionFiles()

Returns all the resource description files in a given thumb folder, recursively scanning through subfolders if there are any

public getDescriptionFiles(string $thumb_folder) : array<string|int, mixed>
Parameters
$thumb_folder : string

path of a thumb folder

Return values
array<string|int, mixed>

$files list of description file paths in the given folder
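
A recursive folder scan of this kind can be written in PHP with the standard SPL directory iterators. In the sketch below the function name and the $suffix filter are hypothetical; this documentation does not say how description files are named, so the ".txt" default is only a placeholder.

```php
<?php
/**
 * Hypothetical helper: returns the sorted paths of all files under
 * $folder (including subfolders) whose names end with $suffix.
 */
function descriptionFilesSketch(string $folder,
    string $suffix = '.txt'): array
{
    $files = [];
    if (!is_dir($folder)) {
        return $files;
    }
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($folder,
            FilesystemIterator::SKIP_DOTS));
    foreach ($iterator as $file) {
        $name = $file->getFilename();
        if ($file->isFile() &&
            substr($name, -strlen($suffix)) === $suffix) {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

// Usage: list matching files under the system temp folder
print_r(descriptionFilesSketch(sys_get_temp_dir()));
```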

getJobName()

Gets the class name (less namespace and the word Job) of the current MediaJob

public static getJobName() : string
Return values
string

name of the current job
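
The naming rule can be illustrated with a short PHP sketch that takes a fully qualified class name, drops the namespace, and removes a trailing "Job". The function name and the example namespace are hypothetical, and stripping only a trailing "Job" is an assumption about what "less the word Job" means.

```php
<?php
/**
 * Hypothetical helper: derives a job name from a class name by
 * removing the namespace prefix and a trailing "Job".
 */
function jobNameSketch(string $class_name): string
{
    $pos = strrpos($class_name, '\\');
    $short = ($pos === false) ? $class_name
        : substr($class_name, $pos + 1);
    return preg_replace('/Job$/', '', $short);
}

// Usage with an illustrative namespace
echo jobNameSketch('example\jobs\RecommendationJob'), "\n";
```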

getTasks()

Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.

public getTasks(int $machine_id[, array<string|int, mixed> $data = null ]) : array<string|int, mixed>
Parameters
$machine_id : int

id of client requesting data

$data : array<string|int, mixed> = null

any additional info about data being requested

Return values
array<string|int, mixed>

work for the client to process

getTermEmbedding()

Returns the term embedding either from LRU cache or database

public getTermEmbedding(int $term_id, int $item_type[, bool $update = false ]) : string
Parameters
$term_id : int
$item_type : int
$update : bool = false

indicates whether to update the cache

Return values
string

$term_embedding

getWikiResourceDescriptions()

Fetches the description for the eligible wiki resources having the root folder path captured in RECOMMENDATION_FILE

public getWikiResourceDescriptions() : array<string|int, mixed>
Return values
array<string|int, mixed>

$descriptions of resources

init()

Sets up the database connection so that tables related to recommendations can be accessed. Initializes timing info related to the job.

public init() : mixed
Return values
mixed

initializeNewUserRecommendations()

Computes recommendations for users who have yet to receive any recommendation of the given type based on what is the most popular recommendation

public initializeNewUserRecommendations() : mixed
Return values
mixed
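
A popularity fallback for users without recommendations could look like the PHP sketch below: count how often each item appears in the recommendations of existing users and hand the most frequent ones to each new user. The function name, the input shapes, and the use of frequency across existing recommendations as the measure of popularity are assumptions; the documentation does not say how popularity is determined.

```php
<?php
/**
 * Hypothetical helper.
 * $existing maps user id => list of recommended item ids.
 * Returns new user id => list of the $limit most frequent item ids.
 */
function popularFallbackSketch(array $existing, array $new_user_ids,
    int $limit = 2): array
{
    $counts = [];
    foreach ($existing as $items) {
        foreach ($items as $item_id) {
            $counts[$item_id] = ($counts[$item_id] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequent first, keys preserved
    $top = array_slice(array_keys($counts), 0, $limit);
    return array_fill_keys($new_user_ids, $top);
}

$existing = [1 => [5, 6, 7], 2 => [5, 6], 3 => [5]];
print_r(popularFallbackSketch($existing, [9]));
```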

nondistributedTasks()

For now analytics update is only done on name server as Yioop currently only supports one DBMS at a time.

public nondistributedTasks() : mixed
Return values
mixed

prepareTasks()

This method is called on the name server to prepare data for any MediaUpdater clients.

public prepareTasks() : mixed
Return values
mixed

putTasks()

After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The name server's web app's JobController then calls this method to receive the data on the name server

public putTasks(int $machine_id, mixed $data) : array<string|int, mixed>
Parameters
$machine_id : int

id of client that is sending data to name server

$data : mixed

results of computation done by client

Return values
array<string|int, mixed>

any response information to send back to the client

run()

Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.

public run() : mixed
Return values
mixed

saveTermEmbeddingsCacheToDb()

Writes the term embeddings in the cache back to the database and frees up memory

public saveTermEmbeddingsCacheToDb(int $item_type) : mixed
Parameters
$item_type : int

value for ITEM_TYPE column

Return values
mixed

updateTermEmbeddingCache()

Updates the LRU cache of term embeddings and saves the evicted embedding back to the database

public updateTermEmbeddingCache(int $term_id, string $term_embedding, int $item_type[, mixed $message = "" ]) : mixed
Parameters
$term_id : int
$term_embedding : string
$item_type : int
$message : mixed = ""
Return values
mixed
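
The cache behaviour described for getTermEmbedding(), updateTermEmbeddingCache(), and saveTermEmbeddingsCacheToDb() can be pictured as a least-recently-used cache whose evicted entries are written back to storage. The PHP sketch below is a stand-in, not Yioop's LRUCache class: the class name, its methods, and the plain array used in place of the database are all hypothetical.

```php
<?php
/**
 * Hypothetical LRU cache. PHP arrays keep insertion order, so the
 * first key is always the least recently used entry.
 */
class LruCacheSketch
{
    private $items = [];
    private $capacity;

    public function __construct(int $capacity)
    {
        $this->capacity = $capacity;
    }

    // Returns the cached value and marks it most recently used
    public function get(string $key): ?string
    {
        if (!array_key_exists($key, $this->items)) {
            return null;
        }
        $value = $this->items[$key];
        unset($this->items[$key]);
        $this->items[$key] = $value;
        return $value;
    }

    // Stores a value; returns [key, value] of the evicted entry, if any
    public function put(string $key, string $value): ?array
    {
        unset($this->items[$key]);
        $this->items[$key] = $value;
        if (count($this->items) <= $this->capacity) {
            return null;
        }
        $oldest = array_key_first($this->items);
        $evicted = [$oldest, $this->items[$oldest]];
        unset($this->items[$oldest]);
        return $evicted;
    }
}

// Usage: $store stands in for the database table of term embeddings
$store = [];
$cache = new LruCacheSketch(2);
foreach (['t1' => 'e1', 't2' => 'e2', 't3' => 'e3'] as $id => $emb) {
    $evicted = $cache->put($id, $emb);
    if ($evicted !== null) {
        $store[$evicted[0]] = $evicted[1]; // write back on eviction
    }
}
print_r($store);
```

Writing back only on eviction, plus one final flush of whatever remains in the cache, keeps the number of database writes bounded by the number of distinct terms touched.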

        
