LinearHashTable
in package
This class implements a linear hash table for storing records that use PackedTableTools for their format
Tags
Table of Contents
- ACTIVE_INDEX = "active"
- Name of staging partition used for rows before they are added to the main linear hash table
- DEFAULT_COMPRESSOR = \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"
- Compression strategy used to compress blob and serial columns
- DEFAULT_PARAMETERS = ["COMPRESSOR" => self::DEFAULT_COMPRESSOR, "COUNT" => 0, "PARTITION_SIZE_THRESHOLD" => self::PARTITION_SIZE_THRESHOLD, "FORMAT" => ["PRIMARY KEY" => "KEY", "VALUE" => "BLOB"], "MAX_ITEMS_PER_FILE" => self::MAX_ITEMS_PER_FILE, "SAVE_PARTITION" => 0, "ACTIVE_COUNT" => 0]
- Default parameters to use when constructing a LinearHashTable
- HASH_INDEX_EXTENSION = ".hix"
- File extension to use for hash index files for partitions These are the file buckets which have names based on portions of hash key values and store records for keys matching the bit pattern
- INDEX_CACHE_SIZE = 100
- Maximum number of indexes to partitions to cache for reads
- KEY_PREFIX = "key_"
- For each archive partition, the file with the list of keys (not hashes of keys) that were added for this partition will be prefixed this string.
- MAX_ITEMS_PER_FILE = 16384
- Default maximum number of records to store in a hash table partition
- PARAMETERS_FILE = "lht_parameters.txt"
- File name of file used to store the parameters of this LinearHashTable
- PARTITION_PREFIX = "partition_"
- Prefix to file names of LinearHashTable partition files
- PARTITION_SIZE_THRESHOLD = 2147483648
- Maximum number of bytes a partition can have before a split.
- $active_index : index
- In memory copy of the active partition index. key => packed_row pairs for the active partition.
- $add_archive_cache : array<string|int, mixed>
- Field variable used as a cache for the file handle, file name, and time of the last archive file for a partition added to
- $add_index_cache : array<string|int, mixed>
- Field variable used as a cache for the file handle, file name, and time of the last index hash bucket file was added to
- $add_key_archive_cache : array<string|int, mixed>
- Field variable used as a cache for the file handle, file name, and time of the last key index file for a partition added to (the key index file is just a sorted file of the keys that appear in a hash table partition). Its intended use is to aid in the creation of additional index structures on top of the linear hash table
- $blob_columns : array<string|int, mixed>
- Array of column names for the columns in a LinearHashTable which are of type BLOB or SERIAL
- $compressor : object
- The seekquarry\yioop\library\compressors\Compressor object used to compress record files and blob items.
- $folder : string
- Folder path where the LinearHashTable is stored
- $get_archive_cache : array<string|int, mixed>
- Field variable used as a cache for the file handle, file name, and time of the last archive file for a partition accessed for a value
- $index_buffer : array<string|int, mixed>
- Used to cached hash_buck_path => contents_of_hash_bucket pairs in to speed up retrieval and update.
- $instance_time : int
- Used to keep track of when this instance was created, as part of managing file handles expiration (could be set/updated externally to reflect some other instance using the table)
- $key_field : string
- Name of primary key column for records
- $parameters : array<string|int, mixed>
- Stores the constructor parameters used to create this LinearHashTable
- $save_partition_handle : resource
- File handle for index file of current partition
- $serial_columns : array<string|int, mixed>
- Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL
- $table_tools : object
- The PackedTableTools object used to pack and unpack records in LinearHashTable
- __construct() : mixed
- Creates/Loads LinearHashTable having specified folder and parameters
- addCount() : mixed
- Add $num to maintained counter $field
- advanceSavePartition() : bool
- Save the active partition to $new_save_partition and advances the next save partition to be $new_save_partition + 1. If $new_save_partition ==0, $this->parameters["SAVE_PARTITION"] + 1 is used for $new_save_partition . If $new_save_partition <= $this->parameters["SAVE_PARTITION"], then this function returns false. Saving the active partition means moving key => record entries from the active index to their appropriate linear hash table buckets. The Archive file for the save partition is closed, and a new file handle for the next save partition file is opened.
- bitStatistics() : array<string|int, mixed>
- Given an item $count, and a maximum number of items in a bucket for the LHT determines, the number of buckets, the maximum number of bits to be used from the hash function to determine bucket, 2^max_bits -1, and a threshold on when to use this number of bit versus one less.
- delete() : bool
- Deletes a key or set of keys and their associated values from the linear hash table
- exists() : bool
- Checks if a key exists in the linear hash table
- get() : mixed
- Returns the record associated with the given $key in the LinearHashTable
- getActiveIndex() : string
- Returns the path to the index file for the active partition (the key value pairs not yet stored in the main LinearHashTable)
- getArchive() : string
- Returns the blob item from $archive_filename at $offset of length $len, uncompress the result
- getHashPath() : string
- Gets the file path for the bucket associated with $hash_key, assuming the number of items in the linear hash table is $count, $max_items_per_file is the maximum number of items per file.
- getKeyPartition() : string
- Returns the path to the key index file for the partition $i (file used to store blob and serial columns)
- getParameterInfo() : mixed
- Returns the parameters (such as its signature, max number of documents per partition and counts) used to configure the LinearHashTable stored at $folder
- getPartition() : string
- Returns the path to the archive file for the partition $i (file used to store blob and serial columns)
- hashKey() : string
- Computes the fixed length hash value of a $key for the purposes of storing the key associated with a value in this lienar hash table.
- initCountIfNotExists() : mixed
- Creates a new counter $field to be maintained
- put() : bool
- Used to add a new record or an array of new records to the LinearHashTable
- saveParameters() : mixed
- Save the operating parameters of this LinearHashTable
- addArchive() : array<string|int, mixed>
- Add a $value to save partition of this linear hash table
- addIndex() : bool
- Adds the $hash_key, $value pair to the linear hash table bucket (index file) for $hash_key under the assumption that $count items are in the hash table. If bulk_mode is enabled then the index file is kept in memory rather than immediately flushed to disk.
- addKeyArchive() : mixed
- Add a key (not its hash) to the archive file for keys for the current save partition. This is used to keep track of all the keys stored in a partition. Note a key archive is only created if the constructor $with_key_archive parameter was called with value true
- checkSplitMerge() : int
- Checks if $new_count many stored items would entail splitting one of the hash partitions of this linear hash table (> 0), require no change in the number of hash partitions (0), or require two partitions to be merges (< 0)
- computeMigratePaths() : array<string|int, mixed>
- Given $count items in LHT with bucket size $max_items_per_file, and assuming a split/merge just occurred, returns the path of the two new buckets if a split, and the one new merged bucket if a merge
- getIndexInfo() : array<string|int, mixed>
- Computes the path to the hash partition that would contain $hash_key as well as the contents of that partition
- insertRecordsFromIndex() : mixed
- Reads in index records from $hash_path, then reinserts them into LHT under the assumption that the size of the LHT is $new_count (defaults to current size + 1)
- mergeMigrate() : mixed
- Handles merging two buckets into one, which might happen after a delete on the LHT
- putIndex() : bool
- Assuming a $value record where blob items have already been replaced by their integer offsets into the archive partition, inserts in the LHT a $value record according to this LHT's signature for $key.
- splitMigrate() : mixed
- Handles splitting one bucket into two, which might happen after an insert on the LHT
- unlinkHashPath() : mixed
- Unlinks the index file at the given $hash_path if it exists
Constants
ACTIVE_INDEX
Name of staging partition used for rows before they are added to the main linear hash table
public
mixed
ACTIVE_INDEX
= "active"
DEFAULT_COMPRESSOR
Compression strategy used to compress blob and serial columns
public
mixed
DEFAULT_COMPRESSOR
= \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"
DEFAULT_PARAMETERS
Default parameters to use when constructing a LinearHashTable
public
mixed
DEFAULT_PARAMETERS
= ["COMPRESSOR" => self::DEFAULT_COMPRESSOR, "COUNT" => 0, "PARTITION_SIZE_THRESHOLD" => self::PARTITION_SIZE_THRESHOLD, "FORMAT" => ["PRIMARY KEY" => "KEY", "VALUE" => "BLOB"], "MAX_ITEMS_PER_FILE" => self::MAX_ITEMS_PER_FILE, "SAVE_PARTITION" => 0, "ACTIVE_COUNT" => 0]
HASH_INDEX_EXTENSION
File extension to use for hash index files for partitions These are the file buckets which have names based on portions of hash key values and store records for keys matching the bit pattern
public
mixed
HASH_INDEX_EXTENSION
= ".hix"
INDEX_CACHE_SIZE
Maximum number of indexes to partitions to cache for reads
public
mixed
INDEX_CACHE_SIZE
= 100
KEY_PREFIX
For each archive partition, the file with the list of keys (not hashes of keys) that were added for this partition will be prefixed this string.
public
mixed
KEY_PREFIX
= "key_"
MAX_ITEMS_PER_FILE
Default maximum number of records to store in a hash table partition
public
mixed
MAX_ITEMS_PER_FILE
= 16384
PARAMETERS_FILE
File name of file used to store the parameters of this LinearHashTable
public
mixed
PARAMETERS_FILE
= "lht_parameters.txt"
PARTITION_PREFIX
Prefix to file names of LinearHashTable partition files
public
mixed
PARTITION_PREFIX
= "partition_"
PARTITION_SIZE_THRESHOLD
Maximum number of bytes a partition can have before a split.
public
mixed
PARTITION_SIZE_THRESHOLD
= 2147483648
Notice this implies a maximum file size to store in BLOB columns
Properties
$active_index
In memory copy of the active partition index. key => packed_row pairs for the active partition.
public
index
$active_index
$add_archive_cache
Field variable used as a cache for the file handle, file name, and time of the last archive file for a partition added to
public
array<string|int, mixed>
$add_archive_cache
= [null, "", -1]
$add_index_cache
Field variable used as a cache for the file handle, file name, and time of the last index hash bucket file was added to
public
array<string|int, mixed>
$add_index_cache
= ["", [], -1]
$add_key_archive_cache
Field variable used as a cache for the file handle, file name, and time of the last key index file for a partition added to (the key index file is just a sorted file of the keys that appear in a hash table partition). Its intended use is to aid in the creation of additional index structures on top of the linear hash table
public
array<string|int, mixed>
$add_key_archive_cache
= [null, "", -1]
$blob_columns
Array of column names for the columns in a LinearHashTable which are of type BLOB or SERIAL
public
array<string|int, mixed>
$blob_columns
$compressor
The seekquarry\yioop\library\compressors\Compressor object used to compress record files and blob items.
public
object
$compressor
$folder
Folder path where the LinearHashTable is stored
public
string
$folder
$get_archive_cache
Field variable used as a cache for the file handle, file name, and time of the last archive file for a partition accessed for a value
public
array<string|int, mixed>
$get_archive_cache
= [null, "", -1]
$index_buffer
Used to cached hash_buck_path => contents_of_hash_bucket pairs in to speed up retrieval and update.
public
array<string|int, mixed>
$index_buffer
$instance_time
Used to keep track of when this instance was created, as part of managing file handles expiration (could be set/updated externally to reflect some other instance using the table)
public
int
$instance_time
$key_field
Name of primary key column for records
public
string
$key_field
$parameters
Stores the constructor parameters used to create this LinearHashTable
public
array<string|int, mixed>
$parameters
$save_partition_handle
File handle for index file of current partition
public
resource
$save_partition_handle
$serial_columns
Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL
public
array<string|int, mixed>
$serial_columns
$table_tools
The PackedTableTools object used to pack and unpack records in LinearHashTable
public
object
$table_tools
Methods
__construct()
Creates/Loads LinearHashTable having specified folder and parameters
public
__construct(string $folder[, array<string|int, mixed> $format = self::DEFAULT_PARAMETERS["FORMAT"] ][, int $max_items_per_file = self::MAX_ITEMS_PER_FILE ][, int $partition_size_threshold = self::PARTITION_SIZE_THRESHOLD ][, object $compressor_type = self::DEFAULT_COMPRESSOR ][, bool $with_key_archive = false ]) : mixed
Parameters
- $folder : string
-
is the folder for storing the LinearHashTable files
- $format : array<string|int, mixed> = self::DEFAULT_PARAMETERS["FORMAT"]
-
the column names, keys and types for this LinearHashTable object
- $max_items_per_file : int = self::MAX_ITEMS_PER_FILE
-
maximum number of items to store in a partition before making the next partition
- $partition_size_threshold : int = self::PARTITION_SIZE_THRESHOLD
-
maximum length of a partition file in bytes before a new partition file should be started
- $compressor_type : object = self::DEFAULT_COMPRESSOR
-
seekquarry\yioop\library\compressors\Compressor object used to compress index files and blob items.
- $with_key_archive : bool = false
-
whether to create and add to a key archive when doing inserts
Return values
mixed —addCount()
Add $num to maintained counter $field
public
addCount(int $num[, string $field = "COUNT" ]) : mixed
Parameters
- $num : int
-
number of items to add to current count
- $field : string = "COUNT"
-
field of info struct to add to the count of
Return values
mixed —advanceSavePartition()
Save the active partition to $new_save_partition and advances the next save partition to be $new_save_partition + 1. If $new_save_partition ==0, $this->parameters["SAVE_PARTITION"] + 1 is used for $new_save_partition . If $new_save_partition <= $this->parameters["SAVE_PARTITION"], then this function returns false. Saving the active partition means moving key => record entries from the active index to their appropriate linear hash table buckets. The Archive file for the save partition is closed, and a new file handle for the next save partition file is opened.
public
advanceSavePartition(int $new_save_partition) : bool
Parameters
- $new_save_partition : int
-
index of the save partition when starting a new one.
Return values
bool —success (true) or failure (false)
bitStatistics()
Given an item $count, and a maximum number of items in a bucket for the LHT determines, the number of buckets, the maximum number of bits to be used from the hash function to determine bucket, 2^max_bits -1, and a threshold on when to use this number of bit versus one less.
public
bitStatistics(int $count, int $max_items_per_file) : array<string|int, mixed>
Parameters
- $count : int
-
number of items in LHT
- $max_items_per_file : int
-
max items per bucket
Return values
array<string|int, mixed> —4-tuple [num files, max_num_bit, 2^max_bits -1, threshold]
delete()
Deletes a key or set of keys and their associated values from the linear hash table
public
delete(mixed $key_or_keys[, bool $is_hash_key = false ]) : bool
Parameters
- $key_or_keys : mixed
-
either the string of a key or an array of strings of keys to delete
- $is_hash_key : bool = false
-
whether or not the key or keys have already been hash using the linear hash table's hash function
Return values
bool —success (true) or failure (false) for deletion
exists()
Checks if a key exists in the linear hash table
public
exists(string $key[, bool $compute_hash = true ][, bool $check_active = true ]) : bool
Parameters
- $key : string
-
key to check
- $compute_hash : bool = true
-
whether the key has already had the linear hash table's hash function applied or not
- $check_active : bool = true
-
whether to check the active index of buffered key values that have not yet been put into the main table
Return values
bool —whether the $key exists in the linear hash table
get()
Returns the record associated with the given $key in the LinearHashTable
public
get(string $key[, array<string|int, mixed> $fields = [] ][, bool $is_hash_key = false ]) : mixed
Parameters
- $key : string
-
a key to look up
- $fields : array<string|int, mixed> = []
-
which fields (columns) from the record to return
- $is_hash_key : bool = false
-
has the supplied $key already been hash using the LinearHashTable's hash function (the function used to determine the partition a key should be in)
Return values
mixed —array containing unpacked record associated with $key if it exists in the table, false otherwise
getActiveIndex()
Returns the path to the index file for the active partition (the key value pairs not yet stored in the main LinearHashTable)
public
getActiveIndex() : string
Return values
string —path to active partition
getArchive()
Returns the blob item from $archive_filename at $offset of length $len, uncompress the result
public
getArchive(string $archive_filename, int $offset, int $len) : string
Parameters
- $archive_filename : string
-
path to an archive node file for this LinearHashTable
- $offset : int
-
byte offset into archive node file file
- $len : int
-
length of blob item
Return values
string —uncompressed blob item from $archive_filename
getHashPath()
Gets the file path for the bucket associated with $hash_key, assuming the number of items in the linear hash table is $count, $max_items_per_file is the maximum number of items per file.
public
getHashPath(string $hash_key[, mixed $count = -1 ][, int $max_items_per_file = -1 ][, bool $mkdir_if_not_exists = false ]) : string
Parameters
- $hash_key : string
-
LHT hash of a key value
- $count : mixed = -1
- $max_items_per_file : int = -1
-
maximum number of items per bucket (defaults to this parameter for the current LHT)
- $mkdir_if_not_exists : bool = false
-
make folders along path that don't exist already if true.
Return values
string —path to bucket
getKeyPartition()
Returns the path to the key index file for the partition $i (file used to store blob and serial columns)
public
getKeyPartition(mixed $i) : string
Parameters
- $i : mixed
Return values
string —path to partition $i key index file
getParameterInfo()
Returns the parameters (such as its signature, max number of documents per partition and counts) used to configure the LinearHashTable stored at $folder
public
static getParameterInfo(mixed $folder) : mixed
Parameters
- $folder : mixed
Return values
mixed —getPartition()
Returns the path to the archive file for the partition $i (file used to store blob and serial columns)
public
getPartition(mixed $i) : string
Parameters
- $i : mixed
Return values
string —path to partition $i archive file
hashKey()
Computes the fixed length hash value of a $key for the purposes of storing the key associated with a value in this lienar hash table.
public
hashKey(string $key) : string
Parameters
- $key : string
-
to hash
Return values
string —fixed length result of hashing
initCountIfNotExists()
Creates a new counter $field to be maintained
public
initCountIfNotExists([string $field = "COUNT" ]) : mixed
Parameters
- $field : string = "COUNT"
-
field of info struct to add a counter for
Return values
mixed —put()
Used to add a new record or an array of new records to the LinearHashTable
public
put(array<string|int, mixed> $row_or_rows[, bool $is_hash_key = false ][, bool $allow_duplicates = true ]) : bool
Parameters
- $row_or_rows : array<string|int, mixed>
-
either array of record with fields given by this LinearHashTable's signature or an array of rows.
- $is_hash_key : bool = false
-
whether or not the key_field in the row or rows has already been hash to a string suitable for storage in this LinearHashTable
- $allow_duplicates : bool = true
-
whether to allow multiple records with same key or not. If allow then a find will return the most recent.
Return values
bool —success (true) or not (false) for the insert
saveParameters()
Save the operating parameters of this LinearHashTable
public
saveParameters() : mixed
Return values
mixed —addArchive()
Add a $value to save partition of this linear hash table
protected
addArchive(string $value) : array<string|int, mixed>
Parameters
- $value : string
-
value to add
Return values
array<string|int, mixed> —[offset into archive, length data saved, index of partition]
addIndex()
Adds the $hash_key, $value pair to the linear hash table bucket (index file) for $hash_key under the assumption that $count items are in the hash table. If bulk_mode is enabled then the index file is kept in memory rather than immediately flushed to disk.
protected
addIndex(string $hash_key, string $value[, int $count = -1 ][, bool $bulk_mode = false ]) : bool
Parameters
- $hash_key : string
-
key to determine hash table bucket (index file)
- $value : string
-
packed table data to associate with key
- $count : int = -1
-
number of items to assume in linear hash table. If -1 then use based on the saved parameter count
- $bulk_mode : bool = false
-
whether to immediately flush index/bucket to disk
Return values
bool —success (true) or failure (false) of addition
addKeyArchive()
Add a key (not its hash) to the archive file for keys for the current save partition. This is used to keep track of all the keys stored in a partition. Note a key archive is only created if the constructor $with_key_archive parameter was called with value true
protected
addKeyArchive(string $key) : mixed
Parameters
- $key : string
-
key to add
Return values
mixed —checkSplitMerge()
Checks if $new_count many stored items would entail splitting one of the hash partitions of this linear hash table (> 0), require no change in the number of hash partitions (0), or require two partitions to be merges (< 0)
protected
checkSplitMerge(int $new_count) : int
Parameters
- $new_count : int
-
new item count
Return values
int —change (<0 merge, >0split) or no change required (0).
computeMigratePaths()
Given $count items in LHT with bucket size $max_items_per_file, and assuming a split/merge just occurred, returns the path of the two new buckets if a split, and the one new merged bucket if a merge
protected
computeMigratePaths([int $count = -1 ][, int $max_items_per_file = -1 ]) : array<string|int, mixed>
Parameters
- $count : int = -1
-
number of items in LHT
- $max_items_per_file : int = -1
-
maximum number of items/bucket
Return values
array<string|int, mixed> —[path to merge bucket, path to split bucket low, path to split bucket high]
getIndexInfo()
Computes the path to the hash partition that would contain $hash_key as well as the contents of that partition
protected
getIndexInfo(string $hash_key[, int $count = -1 ]) : array<string|int, mixed>
Parameters
- $hash_key : string
-
key to find partition for
- $count : int = -1
-
number of items assumed to be stored in table, if -1 uses $this->parameters['COUNT']
Return values
array<string|int, mixed> —[hash_partition_path, hash_partition_contents] if $count is not -1, then the partition might not exists in which case the second component might be -1
insertRecordsFromIndex()
Reads in index records from $hash_path, then reinserts them into LHT under the assumption that the size of the LHT is $new_count (defaults to current size + 1)
protected
insertRecordsFromIndex(string $hash_path[, int $new_count = -1 ]) : mixed
Parameters
- $hash_path : string
-
location of bucket to read records from
- $new_count : int = -1
-
size of LHT to assume when reinserting records
Return values
mixed —mergeMigrate()
Handles merging two buckets into one, which might happen after a delete on the LHT
protected
mergeMigrate() : mixed
Return values
mixed —putIndex()
Assuming a $value record where blob items have already been replaced by their integer offsets into the archive partition, inserts in the LHT a $value record according to this LHT's signature for $key.
protected
putIndex(string $key, string $value[, bool $is_hash_key = false ], int $change_count[, bool $bulk_insert = false ]) : bool
Parameters
- $key : string
-
key to use for insert
- $value : string
-
record accordin to this LHT's signature
- $is_hash_key : bool = false
-
whether or not $key has already been hash using this LHT's hash function
- $change_count : int
-
for much the size of the LHT should change by
- $bulk_insert : bool = false
-
whether the current insert is one of many (affects how long items are kept in memory flushing)
Return values
bool —success of insert (true) or failure (false)
splitMigrate()
Handles splitting one bucket into two, which might happen after an insert on the LHT
protected
splitMigrate() : mixed
Return values
mixed —unlinkHashPath()
Unlinks the index file at the given $hash_path if it exists
protected
unlinkHashPath(string $hash_path) : mixed
Parameters
- $hash_path : string
-
path to index file