PriorityQueue
extends StringArray
in package
implements
CrawlConstants
Code used to manage a memory efficient priority queue.
Weights for the queue must be flaots. The queue itself is implemented using heaps
Tags
Interfaces, Classes, Traits and Enums
- CrawlConstants
- Shared constants and enums used by components that are involved in the crawling process
Table of Contents
- DEFAULT_SAVE_FREQUENCY = 50000
- If not specified in the constructor, this will be the number of operations between saves
- $adjust_weight_callback : string
- Name of a function to be called if an element in the priority has moved because its weight was adjusted this allows other data structures the ability to know where the element has moved to in the priority queue
- $count : int
- Number of items that are currently stored in the queue
- $data_size : int
- Size of each item in bytes to be stored
- $filename : string
- Name of the file in which to store the PersistentStructure
- $min_or_max : string
- When the polling the queue returns the least or most weighted value
- $notifier : object
- An object that implements the Notifier interface (for instance, WebQueueArchive)
- $num_values : int
- Number of values that can be stored in the priority queue
- $save_frequency : int
- Number of operation between saves. If == -1 never save using checkSave
- $string_array : string
- Character string used to store the packed data of the StringArray
- $string_array_size : int
- Number of bytes of storage need by the string array
- $unsaved_operations : int
- Number of operations since the last save
- $value_size : int
- Number of bytes needed to store a value associated with a weight
- $weight_size : int
- Number of bytes needed to store a weight in the queue
- __construct() : mixed
- Makes a priority queue (implemented as an array heap) with the given operating parameters
- adjustWeight() : mixed
- Add $delta to the $ith element in the priority queue and then adjusts the queue to store the heap property
- checkSave() : mixed
- Add one to the unsaved_operations count. If this goes above the save_frquency then save the PersistentStructure to secondary storage
- compare() : int
- Computes the difference of the two values $value1 and $value2
- get() : string
- Looks up the ith item in the StringArray
- getContents() : array<string|int, mixed>
- Return the contents of the priority queue as an array of value weight pairs.
- getRow() : array<string|int, mixed>
- Gets the ith element of the PriorityQueue viewed as an array
- insert() : mixed
- Inserts a new item into the priority queue.
- load() : object
- Load a PersistentStructure from a file
- normalize() : mixed
- Scales the weights of elements in the queue so that the sum fo the new weights is $new_total
- peek() : mixed
- Gets the data stored at the ith location in the priority queue
- percolateDown() : mixed
- If the ith element in the PriorityQueue violates the heap property with some child node (children should be of lower priority than the parent), this function tries modify the heap to restore the heap property.
- percolateUp() : int
- If the $ith element in the PriorityQueue violates the heap property with its parent node (children should be of lower priority than the parent), this function tries modify the heap to restore the heap property.
- poll() : mixed
- Removes and returns the ith element out of the Priority queue.
- printContents() : mixed
- Pretty prints the contents of the queue viewed as an array.
- put() : mixed
- Puts data into the ith item of the StringArray
- putRow() : mixed
- Add data to the $i row of the priority queue viewed as an array Calls the notifier associated with this queue about the change in data's location
- save() : mixed
- Save the PersistentStructure to its filename This method is generic but super memory inefficient, so reimplement for subclasses is needed
- totalWeight() : int
- Computes and returns the weight of all items in priority queue
Constants
DEFAULT_SAVE_FREQUENCY
If not specified in the constructor, this will be the number of operations between saves
public
int
DEFAULT_SAVE_FREQUENCY
= 50000
Properties
$adjust_weight_callback
Name of a function to be called if an element in the priority has moved because its weight was adjusted this allows other data structures the ability to know where the element has moved to in the priority queue
public
string
$adjust_weight_callback
$count
Number of items that are currently stored in the queue
public
int
$count
$data_size
Size of each item in bytes to be stored
public
int
$data_size
$filename
Name of the file in which to store the PersistentStructure
public
string
$filename
$min_or_max
When the polling the queue returns the least or most weighted value
public
string
$min_or_max
$notifier
An object that implements the Notifier interface (for instance, WebQueueArchive)
public
object
$notifier
$num_values
Number of values that can be stored in the priority queue
public
int
$num_values
$save_frequency
Number of operation between saves. If == -1 never save using checkSave
public
int
$save_frequency
$string_array
Character string used to store the packed data of the StringArray
public
string
$string_array
$string_array_size
Number of bytes of storage need by the string array
public
int
$string_array_size
$unsaved_operations
Number of operations since the last save
public
int
$unsaved_operations
$value_size
Number of bytes needed to store a value associated with a weight
public
int
$value_size
$weight_size
Number of bytes needed to store a weight in the queue
public
int
$weight_size
= 4
Methods
__construct()
Makes a priority queue (implemented as an array heap) with the given operating parameters
public
__construct(string $fname, int $num_values, int $value_size, string $min_or_max[, object $notifier = null ][, int $save_frequency = self::DEFAULT_SAVE_FREQUENCY ][, string $adjust_weight_callback = null ]) : mixed
Parameters
- $fname : string
-
filename to store the data associated with the queue
- $num_values : int
-
number of values the queue can hold
- $value_size : int
-
the size in a bytes of a value
- $min_or_max : string
-
whether this priority queue return least or most weight values when polled
- $notifier : object = null
-
object to call when a value changes in the queue
- $save_frequency : int = self::DEFAULT_SAVE_FREQUENCY
-
how often the data in the queue should be save to disk. (It's default location is RAM)
- $adjust_weight_callback : string = null
-
name of a function to be called if an element in the priority has moved because its weight was adjusted this allows other data structures the ability to know where the element has moved to in the priority queue
Return values
mixed —adjustWeight()
Add $delta to the $ith element in the priority queue and then adjusts the queue to store the heap property
public
adjustWeight(int $i, int $delta) : mixed
Parameters
- $i : int
-
element whose weight should be adjusted
- $delta : int
-
how much to change the weight by
Return values
mixed —checkSave()
Add one to the unsaved_operations count. If this goes above the save_frquency then save the PersistentStructure to secondary storage
public
checkSave() : mixed
Return values
mixed —compare()
Computes the difference of the two values $value1 and $value2
public
compare(int $value1, int $value2) : int
Which is subtracted from which is determined by whether this is a min_or_max priority queue
Parameters
- $value1 : int
-
a value to take the difference between
- $value2 : int
-
the other value
Return values
int —the differences
get()
Looks up the ith item in the StringArray
public
get(int $i) : string
Parameters
- $i : int
-
array index of item to look up
Return values
string —the looked-up item of length $this->data_size
getContents()
Return the contents of the priority queue as an array of value weight pairs.
public
getContents() : array<string|int, mixed>
Return values
array<string|int, mixed> —contents of the queue
getRow()
Gets the ith element of the PriorityQueue viewed as an array
public
getRow(int $i) : array<string|int, mixed>
Parameters
- $i : int
-
element to get
Return values
array<string|int, mixed> —value stored in queue together with its weight as a two element array
insert()
Inserts a new item into the priority queue.
public
insert(string $data, int $weight) : mixed
Parameters
- $data : string
-
what to insert into the queue
- $weight : int
-
how much the new data should be weighted
Return values
mixed —index location in queue where item was stored if successful, otherwise false.
load()
Load a PersistentStructure from a file
public
static load(string $fname) : object
Parameters
- $fname : string
-
the name of the file to load the PersistentStructure from
Return values
object —the PersistentStructure loaded
normalize()
Scales the weights of elements in the queue so that the sum fo the new weights is $new_total
public
normalize([int $new_total = CNUM_URLS_QUEUE_RAM ]) : mixed
This function is used periodically to prevent the queue from being gummed up because all of the weights stored in it are too small.
Parameters
- $new_total : int = CNUM_URLS_QUEUE_RAM
-
what the new sum of weights of elements in the queue will be after normalization
Return values
mixed —peek()
Gets the data stored at the ith location in the priority queue
public
peek([int $i = 1 ]) : mixed
Parameters
- $i : int = 1
-
location to return data from
Return values
mixed —array data if the value of $i is between 1 and count, false otherwise
percolateDown()
If the ith element in the PriorityQueue violates the heap property with some child node (children should be of lower priority than the parent), this function tries modify the heap to restore the heap property.
public
percolateDown(int $i) : mixed
Parameters
- $i : int
-
node to consider in restoring the heap property
Return values
mixed —percolateUp()
If the $ith element in the PriorityQueue violates the heap property with its parent node (children should be of lower priority than the parent), this function tries modify the heap to restore the heap property.
public
percolateUp(int $i) : int
Parameters
- $i : int
-
node to consider in restoring the heap property
Return values
int —final position $ith node ends up at
poll()
Removes and returns the ith element out of the Priority queue.
public
poll([int $i = 1 ]) : mixed
Since this is a priority queue the first element in the queue will either be the min or max (depending on queue type) element stored. If $i is not in range an error message is written to the log. This operation also performs a check to see if the queue should be saved to disk
Parameters
- $i : int = 1
-
element to get out of the queue
Return values
mixed —array data if the value of $i is between 1 and count, false otherwise
printContents()
Pretty prints the contents of the queue viewed as an array.
public
printContents() : mixed
Return values
mixed —put()
Puts data into the ith item of the StringArray
public
put(int $i, string $data) : mixed
Parameters
- $i : int
-
array index of where to store data
- $data : string
-
at least $this->data_size many bytes of data to store
Return values
mixed —putRow()
Add data to the $i row of the priority queue viewed as an array Calls the notifier associated with this queue about the change in data's location
public
putRow(int $i, array<string|int, mixed> $row) : mixed
Parameters
- $i : int
-
location to add data
- $row : array<string|int, mixed>
-
data to add (a two element array in the form key, int value).
Return values
mixed —save()
Save the PersistentStructure to its filename This method is generic but super memory inefficient, so reimplement for subclasses is needed
public
save() : mixed
Return values
mixed —totalWeight()
Computes and returns the weight of all items in priority queue
public
totalWeight() : int
Return values
int —weight of all items stored in the priority queue