
RecommendationJob extends MediaJob

RecommendationJob recommends trending threads, as well as threads and groups that are relevant based on the user's viewing history

Table of Contents

CONTEXT_WINDOW_LENGTH  = 5
Length of context window for calculating term embeddings
DESCRIPTION_STOP_WORDS  = ["author", "authors", "plot", "genre", "genres", "star", "stars", "credits", "rating", "ratings", "year", "director", "cast", "runtime"]
Stop words to exclude from the descriptions fetched by DescriptionUpdate media job
HASH_ALGORITHM  = "md5"
Hash algorithm to be used for calculating hash in Hash2Vec embedding
MAX_BATCH_SIZE  = 200
Maximum number of resources used in making resource recommendations; also the maximum number of group items to hold in memory at one time
MAX_GROUP_ITEMS  = 50000
Maximum number of group items used in making recommendations
MAX_TERM_EMBEDDINGS  = 500
Maximum number of term embeddings fetched from the database to initialize the LRU cache
MAX_TERMS  = 20000
Maximum number of terms used in making recommendations
RECOMMENDATION_FILE  = \seekquarry\yioop\configs\APP_DIR . "/resources/recommendation.txt"
File containing paths to description folders of wiki page resources that should be used to create data corpus for computing recommendations
SIGN_HASH_ALGORITHM  = "crc32"
Hash algorithm to be used for calculating sign in Hash2Vec term embedding
UPDATE_PERIOD  = \seekquarry\yioop\configs\ONE_MONTH
Update period to consider for fetching the records from ITEM_IMPRESSION_SUMMARY table
$active_time  : int
Used to track what is the active recommendation timestamp
$controller  : object
If MediaJob was instantiated in the web app, the controller that instantiated it
$cron_model  : object
Model used for timing when things were computed
$db  : object
Datasource object used to run db queries related to recommendation items (for storing and updating them)
$item_idf  : array<string|int, mixed>
Associative array of the number of items a term appears in
$lru_cache  : mixed
LRUCache for term embeddings
$media_updater  : object
If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater
$name_server_does_client_tasks  : bool
Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks
$name_server_does_client_tasks_only  : bool
Whether this MediaJob performs name server only tasks
$tasks  : array<string|int, mixed>
The tasks most recently received from the name server for this MediaJob
$update_time  : int
Time in the current epoch when analytics were last updated
$user_idf  : array<string|int, mixed>
Associative array of the number of user views a term appears in
__construct()  : mixed
Instantiates the MediaJob with a reference to the object that instantiated it
checkPrerequisites()  : bool
Only update if it's been more than an hour since the last update
cleanRemoveStopWords()  : array<string|int, mixed>
Splits the given text into terms, cleans the terms by removing non-alphanumeric characters, and removes the stop terms to reduce noise while calculating the embeddings
computeGroupEmbeddings()  : array<string|int, mixed>
Computes the group embeddings using the item embeddings for the items in a group. Additionally fetches the existing group embeddings from the database and updates them if the item embeddings are updated
computeGroupUserEmbeddings()  : array<string|int, mixed>
Computes the user embeddings based on the embeddings of groups for which the user has an impression in the ITEM_IMPRESSION_SUMMARY table within the defined UPDATE_PERIOD, or of which the user is a member
computeGroupUserRecommendations()  : array<string|int, mixed>
Computes the group recommendations for a user based on the cosine similarity between user embeddings and group embeddings. Recommendations are calculated for groups which the user has not yet interacted with and is not a member of
computeItemEmbeddings()  : array<string|int, mixed>
Computes the item embeddings for individual items (main thread only and not comments) in groups feeds using the term embeddings for their terms.
computeItemTermEmbeddings()  : array<string|int, mixed>
Computes the term embeddings for individual items (main thread only and not comments) in groups feeds for the terms in their title and description text. Processes only MAX_GROUP_ITEMS which are either newly created or recently edited
computeItemUserEmbeddings()  : array<string|int, mixed>
Computes the user embeddings based on the embeddings of items for which the user has an impression in the ITEM_IMPRESSION_SUMMARY table within the defined UPDATE_PERIOD
computeItemUserRecommendations()  : array<string|int, mixed>
Computes the item recommendations for a user based on the cosine similarity between user embeddings and item embeddings. Recommendations are calculated for items the user has not yet interacted with, drawn from groups of which the user is already a member
computeThreadGroupRecommendations()  : mixed
Manages the whole process of computing thread and group recommendations for users. Makes a series of calls to handle parts of this computation before synthesizing the result
computeWikiResourceEmbeddings()  : array<string|int, mixed>
Computes the embeddings for wiki page resources using the calculated term embeddings and adds the metadata details separately to the embeddings
computeWikiResourceRecommendations()  : mixed
Manages the whole process of computing wiki resource recommendations for users. Makes a series of calls to handle parts of this computation before synthesizing the result
computeWikiTermEmbeddings()  : array<string|int, mixed>
Computes the embedding for new terms in the description of wiki resources and updates the embedding of existing terms using the Hash2Vec approach
computeWikiUserEmbeddings()  : array<string|int, mixed>
Computes user embeddings for wiki resources based on the user's resource impressions logged in the ITEM_IMPRESSION_SUMMARY table for the defined update period
computeWikiUserRecommendations()  : mixed
Computes the wiki resource recommendations based on cosine similarity between resource embeddings and user embeddings
doTasks()  : mixed
This method is run on a MediaUpdater client with data obtained from the name server by getTasks. The idea is that the client processes this information and, if need be, sends the results back to the name server
execNameServer()  : array<string|int, mixed>
Executes a method on the name server's JobController.
finishTasks()  : mixed
This method is called on the name server to finish processing any data returned by MediaUpdater clients.
getCurrentMachine()  : string
Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request
getDescriptionFiles()  : array<string|int, mixed>
Returns all the resource description files in a given thumb folder, recursively scanning any subfolders
getJobName()  : string
Gets the class name (less namespace and the word Job ) of the current MediaJob
getTasks()  : array<string|int, mixed>
Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.
getTermEmbedding()  : string
Returns the term embedding either from LRU cache or database
getWikiResourceDescriptions()  : array<string|int, mixed>
Fetches the description for the eligible wiki resources having the root folder path captured in RECOMMENDATION_FILE
init()  : mixed
Sets up the database connection so that tables related to recommendations can be accessed. Initializes timing info related to the job.
initializeNewUserRecommendations()  : mixed
Computes recommendations for users who have yet to receive any recommendation of the given type, based on the most popular recommendation
nondistributedTasks()  : mixed
For now analytics update is only done on name server as Yioop currently only supports one DBMS at a time.
prepareTasks()  : mixed
This method is called on the name server to prepare data for any MediaUpdater clients.
putTasks()  : array<string|int, mixed>
After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The name server's web app's JobController then calls this method to receive the data on the name server
run()  : mixed
Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.
saveTermEmbeddingsCacheToDb()  : mixed
Writes the term embeddings in the cache back to the database and frees up memory
updateTermEmbeddingCache()  : mixed
Updates the LRU cache of term embeddings and saves any evicted embedding back to the database



Length of context window for calculating term embeddings

public mixed CONTEXT_WINDOW_LENGTH = 5


Stop words to exclude from the descriptions fetched by DescriptionUpdate media job

public mixed DESCRIPTION_STOP_WORDS = ["author", "authors", "plot", "genre", "genres", "star", "stars", "credits", "rating", "ratings", "year", "director", "cast", "runtime"]


Hash algorithm to be used for calculating hash in Hash2Vec embedding

public mixed HASH_ALGORITHM = "md5"


Maximum number of resources used in making resource recommendations; also the maximum number of group items to hold in memory at one time

public mixed MAX_BATCH_SIZE = 200


Maximum number of group items used in making recommendations

public mixed MAX_GROUP_ITEMS = 50000


Maximum number of term embeddings fetched from the database to initialize the LRU cache

public mixed MAX_TERM_EMBEDDINGS = 500


Maximum number of terms used in making recommendations

public mixed MAX_TERMS = 20000


File containing paths to description folders of wiki page resources that should be used to create data corpus for computing recommendations

public mixed RECOMMENDATION_FILE = \seekquarry\yioop\configs\APP_DIR . "/resources/recommendation.txt"


Hash algorithm to be used for calculating sign in Hash2Vec term embedding

public mixed SIGN_HASH_ALGORITHM = "crc32"


Update period to consider for fetching the records from ITEM_IMPRESSION_SUMMARY table

public mixed UPDATE_PERIOD = \seekquarry\yioop\configs\ONE_MONTH



Used to track what is the active recommendation timestamp

public int $active_time


If MediaJob was instantiated in the web app, the controller that instantiated it

public object $controller


Model used for timing when things were computed

public object $cron_model


Datasource object used to run db queries related to recommendation items (for storing and updating them)

public object $db


Associative array of the number of items a term appears in

public array<string|int, mixed> $item_idf


If the MediaJob was instantiated in a MediaUpdater, this is a reference to that updater

public object $media_updater


Whether to run the job's client tasks on the name server in addition to prepareTasks and finishTasks

public bool $name_server_does_client_tasks


Whether this MediaJob performs name server only tasks

public bool $name_server_does_client_tasks_only


The tasks most recently received from the name server for this MediaJob

public array<string|int, mixed> $tasks


Time in the current epoch when analytics were last updated

public int $update_time


Associative array of the number of user views a term appears in

public array<string|int, mixed> $user_idf



Instantiates the MediaJob with a reference to the object that instantiated it

public __construct([object $media_updater = null ][, object $controller = null ]) : mixed
$media_updater : object = null

a reference to the media updater that instantiated this object (if being run in MediaUpdater)

$controller : object = null

a reference to the controller that instantiated this object (if being run in the web app)

Return values


Only update if it's been more than an hour since the last update

public checkPrerequisites() : bool
Return values

whether it's been more than an hour since the last update
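
As a rough illustration of the check described above, the following sketch compares a stored timestamp against the current time. The function name, the ONE_HOUR_SKETCH constant, and the optional $now parameter are assumptions made for this example; this page does not show how RecommendationJob stores or reads its last update time.

```php
<?php
// Hypothetical constant for this sketch: one hour in seconds
const ONE_HOUR_SKETCH = 3600;

/**
 * Returns true when more than an hour separates $last_update_time from
 * $now. $now defaults to the current time and is exposed only so the
 * sketch can be exercised deterministically.
 */
function moreThanAnHourSince(int $last_update_time, ?int $now = null): bool
{
    $now = $now ?? time();
    return ($now - $last_update_time) > ONE_HOUR_SKETCH;
}
```

A job could call such a helper with its $update_time property and skip the run when it returns false.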


Splits the given text into terms, cleans the terms by removing non-alphanumeric characters, and removes the stop terms to reduce noise while calculating the embeddings

public cleanRemoveStopWords(string $text[, bool $description_stop_word_flag = false ]) : array<string|int, mixed>
$text : string

text that needs to be processed

$description_stop_word_flag : bool = false

whether to remove words present in DESCRIPTION_STOP_WORDS

Return values
array<string|int, mixed>

$terms array of [term_id, term] pairs, where term_id is calculated using the md5 hash of the term
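
The cleaning steps described for this method can be sketched roughly as follows. This is not Yioop's implementation: the function name, the short general stop word list, the ASCII-only cleaning rule, and the choice to keep the md5 hex digest as the term id are all assumptions for illustration. Only the DESCRIPTION_STOP_WORDS values are taken from this page.

```php
<?php
/**
 * Hypothetical stand-in for cleanRemoveStopWords(). Returns a list of
 * [term_id, term] pairs for the terms that survive cleaning.
 */
function cleanRemoveStopWordsSketch(string $text,
    bool $description_stop_word_flag = false): array
{
    // Assumed general stop words; the real list is not shown on this page
    $stop_words = ["a", "an", "and", "in", "of", "the", "to"];
    if ($description_stop_word_flag) {
        // Values of DESCRIPTION_STOP_WORDS as documented above
        $stop_words = array_merge($stop_words, ["author", "authors",
            "plot", "genre", "genres", "star", "stars", "credits",
            "rating", "ratings", "year", "director", "cast", "runtime"]);
    }
    $terms = [];
    foreach (preg_split('/\s+/', strtolower($text)) as $token) {
        // Drop every character that is not an ASCII letter or digit
        $term = preg_replace('/[^a-z0-9]/', '', $token);
        if ($term === '' || in_array($term, $stop_words, true)) {
            continue;
        }
        // Assumption: the hex digest itself serves as the term id
        $terms[] = [md5($term), $term];
    }
    return $terms;
}
```

With the flag set to true, a word such as "plot" is dropped along with the general stop words; with the default of false it is kept.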


Computes the group embeddings using the item embeddings for the items in a group. Additionally fetches the existing group embeddings from the database and updates them if the item embeddings are updated

public computeGroupEmbeddings(array<string|int, mixed> $item_embeddings) : array<string|int, mixed>
$item_embeddings : array<string|int, mixed>

embedding for the items

Return values
array<string|int, mixed>

$updated_group_embeddings containing embeddings for groups


Computes the user embeddings based on the embeddings of groups for which the user has an impression in the ITEM_IMPRESSION_SUMMARY table within the defined UPDATE_PERIOD, or of which the user is a member

public computeGroupUserEmbeddings(array<string|int, mixed> $group_embeddings) : array<string|int, mixed>
$group_embeddings : array<string|int, mixed>

embedding vectors of groups

Return values
array<string|int, mixed>

[$group_user_embedding, $user_groups] user embeddings for groups and the ids of the groups of which the user is a member


Computes the group recommendations for a user based on the cosine similarity between user embeddings and group embeddings. Recommendations are calculated for groups which the user has not yet interacted with and is not a member of

public computeGroupUserRecommendations(array<string|int, mixed> $group_embeddings, array<string|int, mixed> $group_user_embeddings, array<string|int, mixed> $user_groups, mixed $user_group_impression) : array<string|int, mixed>
$group_embeddings : array<string|int, mixed>

embedding vectors for groups

$group_user_embeddings : array<string|int, mixed>

embedding vectors for users

$user_groups : array<string|int, mixed>

ids of the groups of which the user is a member

$user_group_impression : mixed
Return values
array<string|int, mixed>

$user_group_impression group ids which the user has seen
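
To make the ranking step concrete, here is a minimal sketch of cosine-similarity scoring that skips groups the user belongs to or has already seen. The function names, the flat numeric-array representation of embeddings, and the $limit parameter are assumptions for this example and are not taken from the RecommendationJob source.

```php
<?php
/** Cosine similarity of two numeric vectors; 0.0 if either has zero length. */
function cosineSimilaritySketch(array $a, array $b): float
{
    $dot = 0.0;
    $norm_a = 0.0;
    $norm_b = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * ($b[$i] ?? 0.0);
        $norm_a += $value * $value;
    }
    foreach ($b as $value) {
        $norm_b += $value * $value;
    }
    if ($norm_a == 0.0 || $norm_b == 0.0) {
        return 0.0;
    }
    return $dot / (sqrt($norm_a) * sqrt($norm_b));
}

/**
 * Scores every group the user is neither a member of nor has seen, and
 * returns the best $limit groups as group_id => score, highest first.
 */
function recommendGroupsSketch(array $user_embedding,
    array $group_embeddings, array $member_of, array $seen,
    int $limit = 3): array
{
    $scores = [];
    foreach ($group_embeddings as $group_id => $embedding) {
        if (in_array($group_id, $member_of) || in_array($group_id, $seen)) {
            continue;
        }
        $scores[$group_id] =
            cosineSimilaritySketch($user_embedding, $embedding);
    }
    arsort($scores); // sort by score, descending, preserving group ids
    return array_slice($scores, 0, $limit, true);
}
```

The same scoring function would apply to item and wiki resource recommendations; only the exclusion rule differs.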


Computes the item embeddings for individual items (main thread only and not comments) in groups feeds using the term embeddings for their terms.

public computeItemEmbeddings(array<string|int, mixed> $item_terms) : array<string|int, mixed>

Additionally fetches the existing item embeddings from database and updates them if the term embeddings are updated for their terms

$item_terms : array<string|int, mixed>

terms in each item

Return values
array<string|int, mixed>

$updated_item_embeddings containing embeddings for items
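
This page does not say how term embeddings are combined into an item embedding. One common choice, shown here purely as an assumption, is to average the vectors of the item's known terms. The function names and the array shapes ($item_terms as item_id => list of terms, $term_embeddings as term => vector) are hypothetical.

```php
<?php
/** Element-wise mean of a non-empty list of equal-length vectors. */
function averageEmbeddingsSketch(array $vectors): array
{
    $dimension = count(reset($vectors));
    $sum = array_fill(0, $dimension, 0.0);
    foreach ($vectors as $vector) {
        foreach ($vector as $i => $value) {
            $sum[$i] += $value;
        }
    }
    $n = count($vectors);
    return array_map(fn($total) => $total / $n, $sum);
}

/**
 * Builds item_id => embedding by averaging the embeddings of each item's
 * terms. Terms without an embedding are ignored; items with no known
 * terms are left out of the result.
 */
function itemEmbeddingsSketch(array $item_terms,
    array $term_embeddings): array
{
    $item_embeddings = [];
    foreach ($item_terms as $item_id => $terms) {
        $vectors = [];
        foreach ($terms as $term) {
            if (isset($term_embeddings[$term])) {
                $vectors[] = $term_embeddings[$term];
            }
        }
        if ($vectors !== []) {
            $item_embeddings[$item_id] = averageEmbeddingsSketch($vectors);
        }
    }
    return $item_embeddings;
}
```

The same averaging could be applied one level up to derive a group embedding from its item embeddings, though whether the actual job averages, sums, or weights the vectors is not stated here.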


Computes the term embeddings for individual items (main thread only and not comments) in groups feeds for the terms in their title and description text. Processes only MAX_GROUP_ITEMS which are either newly created or recently edited

public computeItemTermEmbeddings() : array<string|int, mixed>
Return values
array<string|int, mixed>

$item_terms terms in each item


Computes the user embeddings based on the embeddings of items for which the user has an impression in the ITEM_IMPRESSION_SUMMARY table within the defined UPDATE_PERIOD

public computeItemUserEmbeddings(array<string|int, mixed> $item_embeddings) : array<string|int, mixed>
$item_embeddings : array<string|int, mixed>

embedding vectors of items

Return values
array<string|int, mixed>

[$item_user_embedding, $user_items] user embeddings for items and the ids of the items for which the user has an impression


Computes the item recommendations for a user based on the cosine similarity between user embeddings and item embeddings. Recommendations are calculated for items the user has not yet interacted with, drawn from groups of which the user is already a member

public computeItemUserRecommendations(array<string|int, mixed> $item_embeddings, array<string|int, mixed> $item_user_embeddings, array<string|int, mixed> $user_items) : array<string|int, mixed>
$item_embeddings : array<string|int, mixed>

embedding vectors for items

$item_user_embeddings : array<string|int, mixed>

embedding vectors for users

$user_items : array<string|int, mixed>

ids of the items for the user in the impression table

Return values
array<string|int, mixed>

$user_groups group ids where the user is a member


Manages the whole process of computing thread and group recommendations for users. Makes a series of calls to handle parts of this computation before synthesizing the result

public computeThreadGroupRecommendations() : mixed
Return values


Computes the embeddings for wiki page resources using the calculated term embeddings and adds the metadata details separately to the embeddings

public computeWikiResourceEmbeddings(array<string|int, mixed> $resource_terms, array<string|int, mixed> $meta_details_terms) : array<string|int, mixed>
$resource_terms : array<string|int, mixed>

array of processed terms from resource descriptions

$meta_details_terms : array<string|int, mixed>

array of raw resource descriptions

Return values
array<string|int, mixed>

$updated_item_embeddings array of updated wiki resource embeddings


Manages the whole process of computing wiki resource recommendations for users. Makes a series of calls to handle parts of this computation before synthesizing the result

public computeWikiResourceRecommendations() : mixed
Return values


Computes the embedding for new terms in the description of wiki resources and updates the embedding of existing terms using the Hash2Vec approach

public computeWikiTermEmbeddings(array<string|int, mixed> $descriptions) : array<string|int, mixed>
$descriptions : array<string|int, mixed>

array of resource descriptions

Return values
array<string|int, mixed>

[$resource_terms, $meta_details_term]
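
For readers unfamiliar with Hash2Vec, the sketch below shows the general idea under stated assumptions: each context word within the window is hashed to a vector position by one hash function and to a sign by a second one, and the signed count is added to the embedding of the center term. The embedding size, the function names, the way the digests are reduced to an index and a sign, and the absence of any distance weighting are all choices made for this example, not details confirmed by this page. The "md5" and "crc32" algorithm names and the window of 5 mirror the constants documented above.

```php
<?php
const HASH2VEC_WINDOW = 5;     // mirrors CONTEXT_WINDOW_LENGTH
const HASH2VEC_DIMENSION = 16; // hypothetical embedding size

/** Maps a term to a vector position using the md5 digest. */
function hash2vecBucket(string $term): int
{
    // Seven hex digits keep the value within a 32-bit integer
    return hexdec(substr(hash("md5", $term), 0, 7)) % HASH2VEC_DIMENSION;
}

/** Maps a term to +1 or -1 using the crc32 digest. */
function hash2vecSign(string $term): int
{
    $last_digit = hexdec(substr(hash("crc32", $term), -1));
    return ($last_digit % 2 === 0) ? 1 : -1;
}

/**
 * Builds term => embedding for a sequence of terms by accumulating the
 * signed, hashed context words found within the window around each term.
 */
function hash2vecSketch(array $terms): array
{
    $embeddings = [];
    $count = count($terms);
    for ($i = 0; $i < $count; $i++) {
        $term = $terms[$i];
        $embeddings[$term] ??= array_fill(0, HASH2VEC_DIMENSION, 0.0);
        $start = max(0, $i - HASH2VEC_WINDOW);
        $end = min($count - 1, $i + HASH2VEC_WINDOW);
        for ($j = $start; $j <= $end; $j++) {
            if ($j === $i) {
                continue; // a term is not its own context
            }
            $context = $terms[$j];
            $embeddings[$term][hash2vecBucket($context)] +=
                hash2vecSign($context);
        }
    }
    return $embeddings;
}
```

Because the vector positions come from hashing, no vocabulary has to be built in advance, which is what allows embeddings of existing terms to be updated incrementally as new descriptions arrive.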


Computes user embeddings for wiki resources based on the user's resource impressions logged in the ITEM_IMPRESSION_SUMMARY table for the defined update period

public computeWikiUserEmbeddings(array<string|int, mixed> $item_embeddings) : array<string|int, mixed>
$item_embeddings : array<string|int, mixed>

array of wiki page resource embeddings

Return values
array<string|int, mixed>

[$user_embeddings, $user_items] array of user embeddings for wiki resources and the user's resource impressions


Computes the wiki resource recommendations based on cosine similarity between resource embeddings and user embeddings

public computeWikiUserRecommendations(array<string|int, mixed> $item_embeddings, array<string|int, mixed> $user_embeddings, array<string|int, mixed> $user_items, mixed $resource_metadata) : mixed
$item_embeddings : array<string|int, mixed>

array of wiki resource embeddings

$user_embeddings : array<string|int, mixed>

array of user embeddings derived from the wiki resources users have consumed

$user_items : array<string|int, mixed>

array of the wiki resources users have consumed

$resource_metadata : mixed
Return values


This method is run on a MediaUpdater client with data obtained from the name server by getTasks. The idea is that the client processes this information and, if need be, sends the results back to the name server

public doTasks(array<string|int, mixed> $tasks) : mixed
$tasks : array<string|int, mixed>

data that the MediaJob running on a client MediaUpdater needs to process

Return values

the result of carrying out that processing


Executes a method on the name server's JobController.

public static execNameServer(string $command[, string $args = null ]) : array<string|int, mixed>

It will typically execute either getTask or putTask for a specific MediaJob, or getUpdateProperties to find out how the current MediaUpdater should be configured.

$command : string

the method to invoke on the name server

$args : string = null

additional arguments to be passed to the name server

Return values
array<string|int, mixed>

data returned by the name server.


This method is called on the name server to finish processing any data returned by MediaUpdater clients.

public finishTasks() : mixed
Return values


Returns a hash of the url of the current machine based on the value saved to self::current_machine_info_file by a machine statuses request

public static getCurrentMachine() : string
Return values

hash of current machine url


Returns all the resource description files in a given thumb folder, recursively scanning any subfolders

public getDescriptionFiles(string $thumb_folder) : array<string|int, mixed>
$thumb_folder : string

path of a thumb folder

Return values
array<string|int, mixed>

$files list of description file paths in the given folder
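
A recursive scan of this kind might look like the sketch below. The function name and the $suffix parameter (used to decide what counts as a description file) are assumptions; this page does not state how description files are identified.

```php
<?php
/**
 * Collects the paths of all files under $thumb_folder whose names end in
 * $suffix, descending into subfolders. The ".txt" default is hypothetical.
 */
function descriptionFilesSketch(string $thumb_folder,
    string $suffix = ".txt"): array
{
    $files = [];
    if (!is_dir($thumb_folder)) {
        return $files;
    }
    foreach (scandir($thumb_folder) as $entry) {
        if ($entry === "." || $entry === "..") {
            continue;
        }
        $path = $thumb_folder . DIRECTORY_SEPARATOR . $entry;
        if (is_dir($path)) {
            // Recurse and append whatever the subfolder contains
            $files = array_merge($files,
                descriptionFilesSketch($path, $suffix));
        } elseif (substr($entry, -strlen($suffix)) === $suffix) {
            $files[] = $path;
        }
    }
    return $files;
}
```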


Gets the class name (less namespace and the word Job ) of the current MediaJob

public static getJobName() : string
Return values

name of the current job
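
As an illustration of the naming rule, the helper below strips the namespace and a trailing "Job" from a fully qualified class name. The function name and the example namespace are hypothetical, and treating "Job" as a suffix only is an assumption about what the actual method does.

```php
<?php
/**
 * Returns the short class name without its namespace and without a
 * trailing "Job", e.g. "RecommendationJob" becomes "Recommendation".
 */
function jobNameSketch(string $class_name): string
{
    $pos = strrpos($class_name, "\\");
    $short = ($pos === false) ? $class_name : substr($class_name, $pos + 1);
    if (substr($short, -3) === "Job") {
        $short = substr($short, 0, -3);
    }
    return $short;
}
```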


Method called from JobController when a MediaUpdater client contacts the name server's web app. This method is supposed to marshal any data on the name server that the requesting client should process.

public getTasks(int $machine_id[, array<string|int, mixed> $data = null ]) : array<string|int, mixed>
$machine_id : int

id of client requesting data

$data : array<string|int, mixed> = null

any additional info about data being requested

Return values
array<string|int, mixed>

work for the client to process


Returns the term embedding either from LRU cache or database

public getTermEmbedding(int $term_id, int $item_type[, bool $update = false ]) : string
$term_id : int
$item_type : int
$update : bool = false

indicates whether to update the cache

Return values



Fetches the description for the eligible wiki resources having the root folder path captured in RECOMMENDATION_FILE

public getWikiResourceDescriptions() : array<string|int, mixed>
Return values
array<string|int, mixed>

$descriptions of resources


Sets up the database connection so that tables related to recommendations can be accessed. Initializes timing info related to the job.

public init() : mixed
Return values


Computes recommendations for users who have yet to receive any recommendation of the given type, based on the most popular recommendation

public initializeNewUserRecommendations() : mixed
Return values


For now analytics update is only done on name server as Yioop currently only supports one DBMS at a time.

public nondistributedTasks() : mixed
Return values


This method is called on the name server to prepare data for any MediaUpdater clients.

public prepareTasks() : mixed
Return values


After a MediaUpdater client is done with the task given to it by the name server's media updater, the client contacts the name server's web app. The name server's web app's JobController then calls this method to receive the data on the name server

public putTasks(int $machine_id, mixed $data) : array<string|int, mixed>
$machine_id : int

id of client that is sending data to name server

$data : mixed

results of computation done by client

Return values
array<string|int, mixed>

any response information to send back to the client


Method executed by MediaUpdater to perform the MediaJob. This method shouldn't need to be overridden. Instead, the various callbacks it calls (listed in the class description) should be overridden.

public run() : mixed
Return values


Writes the term embeddings in the cache back to the database and frees up memory

public saveTermEmbeddingsCacheToDb(int $item_type) : mixed
$item_type : int

value for ITEM_TYPE column

Return values


Updates the LRU cache of term embeddings and saves any evicted embedding back to the database

public updateTermEmbeddingCache(int $term_id, string $term_embedding, int $item_type[, mixed $message = "" ]) : mixed
$term_id : int
$term_embedding : string
$item_type : int
$message : mixed = ""
Return values
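
The interplay between the LRU cache and the database described for getTermEmbedding(), updateTermEmbeddingCache(), and saveTermEmbeddingsCacheToDb() can be sketched as a small write-back cache. Everything in this example is hypothetical: the class and method names, the in-memory array that stands in for the database table, and the omission of the $item_type argument. It relies only on the fact that PHP arrays preserve insertion order.

```php
<?php
class TermEmbeddingCacheSketch
{
    /** @var array<int, string> term_id => embedding, oldest entry first */
    private array $items = [];
    /** @var array<int, string> stand-in for the database table */
    public array $database = [];
    private int $capacity;

    public function __construct(int $capacity)
    {
        $this->capacity = $capacity;
    }

    /** Looks in the cache first, then falls back to the database. */
    public function get(int $term_id): ?string
    {
        if (!isset($this->items[$term_id])) {
            return $this->database[$term_id] ?? null;
        }
        // Re-insert so the entry becomes the most recently used
        $value = $this->items[$term_id];
        unset($this->items[$term_id]);
        $this->items[$term_id] = $value;
        return $value;
    }

    /** Stores an embedding, writing back the evicted entry if full. */
    public function put(int $term_id, string $embedding): void
    {
        unset($this->items[$term_id]);
        $this->items[$term_id] = $embedding;
        if (count($this->items) > $this->capacity) {
            $oldest = array_key_first($this->items);
            $this->database[$oldest] = $this->items[$oldest];
            unset($this->items[$oldest]);
        }
    }

    /** Writes every cached embedding to the database and empties the cache. */
    public function flush(): void
    {
        foreach ($this->items as $term_id => $embedding) {
            $this->database[$term_id] = $embedding;
        }
        $this->items = [];
    }

    /** Ids currently cached, least recently used first. */
    public function cachedIds(): array
    {
        return array_keys($this->items);
    }
}
```

Deferring database writes until eviction or an explicit flush keeps frequently updated term embeddings in memory, at the cost of needing the final flush so that no updates are lost.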

