Yioop_V9.5_Source_Code_Documentation

LinearHashTable
in package

This class implements a linear hash table for storing records that use PackedTableTools for their format

Tags
author

Chris Pollett

Table of Contents

ACTIVE_INDEX  = "active"
Name of staging partition used for rows before they are added to the main linear hash table
DEFAULT_COMPRESSOR  = \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"
Compression strategy used to compress blob and serial columns
DEFAULT_PARAMETERS  = ["COMPRESSOR" => self::DEFAULT_COMPRESSOR, "COUNT" => 0, "PARTITION_SIZE_THRESHOLD" => self::PARTITION_SIZE_THRESHOLD, "FORMAT" => ["PRIMARY KEY" => "KEY", "VALUE" => "BLOB"], "MAX_ITEMS_PER_FILE" => self::MAX_ITEMS_PER_FILE, "SAVE_PARTITION" => 0, "ACTIVE_COUNT" => 0]
Default parameters to use when constructing a LinearHashTable
HASH_INDEX_EXTENSION  = ".hix"
File extension to use for hash index files for partitions These are the file buckets which have names based on portions of hash key values and store records for keys matching the bit pattern
INDEX_CACHE_SIZE  = 100
Maximum number of indexes to partitions to cache for reads
KEY_PREFIX  = "key_"
For each archive partition, the file with the list of keys (not hashes of keys) that were added for this partition will be prefixed this string.
MAX_ITEMS_PER_FILE  = 16384
Default maximum number of records to store in a hash table partition
PARAMETERS_FILE  = "lht_parameters.txt"
File name of file used to store the parameters of this LinearHashTable
PARTITION_PREFIX  = "partition_"
Prefix to file names of LinearHashTable partition files
PARTITION_SIZE_THRESHOLD  = 2147483648
Maximum number of bytes a partition can have before a split.
$active_index  : index
In memory copy of the active partition index. key => packed_row pairs for the active partition.
$add_archive_cache  : array<string|int, mixed>
Field variable used as a cache for the file handle, file name, and time of the last archive file for a partition added to
$add_index_cache  : array<string|int, mixed>
Field variable used as a cache for the file handle, file name, and time of the last index hash bucket file was added to
$add_key_archive_cache  : array<string|int, mixed>
Field variable used as a cache for the file handle, file name, and time of the last key index file for a partition added to (the key index file is just a sorted file of the keys that appear in a hash table partition). Its intended use is to aid in the creation of additional index structures on top of the linear hash table
$blob_columns  : array<string|int, mixed>
Array of column names for the columns in a LinearHashTable which are of type BLOB or SERIAL
$compressor  : object
The seekquarry\yioop\library\compressors\Compressor object used to compress record files and blob items.
$folder  : string
Folder path where the LinearHashTable is stored
$get_archive_cache  : array<string|int, mixed>
Field variable used as a cache for the file handle, file name, and time of the last archive file for a partition accessed for a value
$index_buffer  : array<string|int, mixed>
Used to cached hash_buck_path => contents_of_hash_bucket pairs in to speed up retrieval and update.
$instance_time  : int
Used to keep track of when this instance was created, as part of managing file handles expiration (could be set/updated externally to reflect some other instance using the table)
$key_field  : string
Name of primary key column for records
$parameters  : array<string|int, mixed>
Stores the constructor parameters used to create this LinearHashTable
$save_partition_handle  : resource
File handle for index file of current partition
$serial_columns  : array<string|int, mixed>
Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL
$table_tools  : object
The PackedTableTools object used to pack and unpack records in LinearHashTable
__construct()  : mixed
Creates/Loads LinearHashTable having specified folder and parameters
addCount()  : mixed
Add $num to maintained counter $field
advanceSavePartition()  : bool
Save the active partition to $new_save_partition and advances the next save partition to be $new_save_partition + 1. If $new_save_partition ==0, $this->parameters["SAVE_PARTITION"] + 1 is used for $new_save_partition . If $new_save_partition <= $this->parameters["SAVE_PARTITION"], then this function returns false. Saving the active partition means moving key => record entries from the active index to their appropriate linear hash table buckets. The Archive file for the save partition is closed, and a new file handle for the next save partition file is opened.
bitStatistics()  : array<string|int, mixed>
Given an item $count, and a maximum number of items in a bucket for the LHT determines, the number of buckets, the maximum number of bits to be used from the hash function to determine bucket, 2^max_bits -1, and a threshold on when to use this number of bit versus one less.
delete()  : bool
Deletes a key or set of keys and their associated values from the linear hash table
exists()  : bool
Checks if a key exists in the linear hash table
get()  : mixed
Returns the record associated with the given $key in the LinearHashTable
getActiveIndex()  : string
Returns the path to the index file for the active partition (the key value pairs not yet stored in the main LinearHashTable)
getArchive()  : string
Returns the blob item from $archive_filename at $offset of length $len, uncompress the result
getHashPath()  : string
Gets the file path for the bucket associated with $hash_key, assuming the number of items in the linear hash table is $count, $max_items_per_file is the maximum number of items per file.
getKeyPartition()  : string
Returns the path to the key index file for the partition $i (file used to store blob and serial columns)
getParameterInfo()  : mixed
Returns the parameters (such as its signature, max number of documents per partition and counts) used to configure the LinearHashTable stored at $folder
getPartition()  : string
Returns the path to the archive file for the partition $i (file used to store blob and serial columns)
hashKey()  : string
Computes the fixed length hash value of a $key for the purposes of storing the key associated with a value in this lienar hash table.
initCountIfNotExists()  : mixed
Creates a new counter $field to be maintained
put()  : bool
Used to add a new record or an array of new records to the LinearHashTable
saveParameters()  : mixed
Save the operating parameters of this LinearHashTable
addArchive()  : array<string|int, mixed>
Add a $value to save partition of this linear hash table
addIndex()  : bool
Adds the $hash_key, $value pair to the linear hash table bucket (index file) for $hash_key under the assumption that $count items are in the hash table. If bulk_mode is enabled then the index file is kept in memory rather than immediately flushed to disk.
addKeyArchive()  : mixed
Add a key (not its hash) to the archive file for keys for the current save partition. This is used to keep track of all the keys stored in a partition. Note a key archive is only created if the constructor $with_key_archive parameter was called with value true
checkSplitMerge()  : int
Checks if $new_count many stored items would entail splitting one of the hash partitions of this linear hash table (> 0), require no change in the number of hash partitions (0), or require two partitions to be merges (< 0)
computeMigratePaths()  : array<string|int, mixed>
Given $count items in LHT with bucket size $max_items_per_file, and assuming a split/merge just occurred, returns the path of the two new buckets if a split, and the one new merged bucket if a merge
getIndexInfo()  : array<string|int, mixed>
Computes the path to the hash partition that would contain $hash_key as well as the contents of that partition
insertRecordsFromIndex()  : mixed
Reads in index records from $hash_path, then reinserts them into LHT under the assumption that the size of the LHT is $new_count (defaults to current size + 1)
mergeMigrate()  : mixed
Handles merging two buckets into one, which might happen after a delete on the LHT
putIndex()  : bool
Assuming a $value record where blob items have already been replaced by their integer offsets into the archive partition, inserts in the LHT a $value record according to this LHT's signature for $key.
splitMigrate()  : mixed
Handles splitting one bucket into two, which might happen after an insert on the LHT
unlinkHashPath()  : mixed
Unlinks the index file at the given $hash_path if it exists

Constants

ACTIVE_INDEX

Name of staging partition used for rows before they are added to the main linear hash table

public mixed ACTIVE_INDEX = "active"

DEFAULT_COMPRESSOR

Compression strategy used to compress blob and serial columns

public mixed DEFAULT_COMPRESSOR = \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"

DEFAULT_PARAMETERS

Default parameters to use when constructing a LinearHashTable

public mixed DEFAULT_PARAMETERS = ["COMPRESSOR" => self::DEFAULT_COMPRESSOR, "COUNT" => 0, "PARTITION_SIZE_THRESHOLD" => self::PARTITION_SIZE_THRESHOLD, "FORMAT" => ["PRIMARY KEY" => "KEY", "VALUE" => "BLOB"], "MAX_ITEMS_PER_FILE" => self::MAX_ITEMS_PER_FILE, "SAVE_PARTITION" => 0, "ACTIVE_COUNT" => 0]

HASH_INDEX_EXTENSION

File extension to use for hash index files for partitions These are the file buckets which have names based on portions of hash key values and store records for keys matching the bit pattern

public mixed HASH_INDEX_EXTENSION = ".hix"

INDEX_CACHE_SIZE

Maximum number of indexes to partitions to cache for reads

public mixed INDEX_CACHE_SIZE = 100

KEY_PREFIX

For each archive partition, the file with the list of keys (not hashes of keys) that were added for this partition will be prefixed this string.

public mixed KEY_PREFIX = "key_"

MAX_ITEMS_PER_FILE

Default maximum number of records to store in a hash table partition

public mixed MAX_ITEMS_PER_FILE = 16384

PARAMETERS_FILE

File name of file used to store the parameters of this LinearHashTable

public mixed PARAMETERS_FILE = "lht_parameters.txt"

PARTITION_PREFIX

Prefix to file names of LinearHashTable partition files

public mixed PARTITION_PREFIX = "partition_"

PARTITION_SIZE_THRESHOLD

Maximum number of bytes a partition can have before a split.

public mixed PARTITION_SIZE_THRESHOLD = 2147483648

Notice this implies a maximum file size to store in BLOB columns

Properties

$active_index

In memory copy of the active partition index. key => packed_row pairs for the active partition.

public index $active_index

$add_archive_cache

Field variable used as a cache for the file handle, file name, and time of the last archive file for a partition added to

public array<string|int, mixed> $add_archive_cache = [null, "", -1]

$add_index_cache

Field variable used as a cache for the file handle, file name, and time of the last index hash bucket file was added to

public array<string|int, mixed> $add_index_cache = ["", [], -1]

$add_key_archive_cache

Field variable used as a cache for the file handle, file name, and time of the last key index file for a partition added to (the key index file is just a sorted file of the keys that appear in a hash table partition). Its intended use is to aid in the creation of additional index structures on top of the linear hash table

public array<string|int, mixed> $add_key_archive_cache = [null, "", -1]

$blob_columns

Array of column names for the columns in a LinearHashTable which are of type BLOB or SERIAL

public array<string|int, mixed> $blob_columns

$compressor

The seekquarry\yioop\library\compressors\Compressor object used to compress record files and blob items.

public object $compressor

$folder

Folder path where the LinearHashTable is stored

public string $folder

$get_archive_cache

Field variable used as a cache for the file handle, file name, and time of the last archive file for a partition accessed for a value

public array<string|int, mixed> $get_archive_cache = [null, "", -1]

$index_buffer

Used to cached hash_buck_path => contents_of_hash_bucket pairs in to speed up retrieval and update.

public array<string|int, mixed> $index_buffer

$instance_time

Used to keep track of when this instance was created, as part of managing file handles expiration (could be set/updated externally to reflect some other instance using the table)

public int $instance_time

$key_field

Name of primary key column for records

public string $key_field

$parameters

Stores the constructor parameters used to create this LinearHashTable

public array<string|int, mixed> $parameters

$save_partition_handle

File handle for index file of current partition

public resource $save_partition_handle

$serial_columns

Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL

public array<string|int, mixed> $serial_columns

$table_tools

The PackedTableTools object used to pack and unpack records in LinearHashTable

public object $table_tools

Methods

__construct()

Creates/Loads LinearHashTable having specified folder and parameters

public __construct(string $folder[, array<string|int, mixed> $format = self::DEFAULT_PARAMETERS["FORMAT"] ][, int $max_items_per_file = self::MAX_ITEMS_PER_FILE ][, int $partition_size_threshold = self::PARTITION_SIZE_THRESHOLD ][, object $compressor_type = self::DEFAULT_COMPRESSOR ][, bool $with_key_archive = false ]) : mixed
Parameters
$folder : string

is the folder for storing the LinearHashTable files

$format : array<string|int, mixed> = self::DEFAULT_PARAMETERS["FORMAT"]

the column names, keys and types for this LinearHashTable object

$max_items_per_file : int = self::MAX_ITEMS_PER_FILE

maximum number of items to store in a partition before making the next partition

$partition_size_threshold : int = self::PARTITION_SIZE_THRESHOLD

maximum length of a partition file in bytes before a new partition file should be started

$compressor_type : object = self::DEFAULT_COMPRESSOR

seekquarry\yioop\library\compressors\Compressor object used to compress index files and blob items.

$with_key_archive : bool = false

whether to create and add to a key archive when doing inserts

Return values
mixed

addCount()

Add $num to maintained counter $field

public addCount(int $num[, string $field = "COUNT" ]) : mixed
Parameters
$num : int

number of items to add to current count

$field : string = "COUNT"

field of info struct to add to the count of

Return values
mixed

advanceSavePartition()

Save the active partition to $new_save_partition and advances the next save partition to be $new_save_partition + 1. If $new_save_partition ==0, $this->parameters["SAVE_PARTITION"] + 1 is used for $new_save_partition . If $new_save_partition <= $this->parameters["SAVE_PARTITION"], then this function returns false. Saving the active partition means moving key => record entries from the active index to their appropriate linear hash table buckets. The Archive file for the save partition is closed, and a new file handle for the next save partition file is opened.

public advanceSavePartition(int $new_save_partition) : bool
Parameters
$new_save_partition : int

index of the save partition when starting a new one.

Return values
bool

success (true) or failure (false)

bitStatistics()

Given an item $count, and a maximum number of items in a bucket for the LHT determines, the number of buckets, the maximum number of bits to be used from the hash function to determine bucket, 2^max_bits -1, and a threshold on when to use this number of bit versus one less.

public bitStatistics(int $count, int $max_items_per_file) : array<string|int, mixed>
Parameters
$count : int

number of items in LHT

$max_items_per_file : int

max items per bucket

Return values
array<string|int, mixed>

4-tuple [num files, max_num_bit, 2^max_bits -1, threshold]

delete()

Deletes a key or set of keys and their associated values from the linear hash table

public delete(mixed $key_or_keys[, bool $is_hash_key = false ]) : bool
Parameters
$key_or_keys : mixed

either the string of a key or an array of strings of keys to delete

$is_hash_key : bool = false

whether or not the key or keys have already been hash using the linear hash table's hash function

Return values
bool

success (true) or failure (false) for deletion

exists()

Checks if a key exists in the linear hash table

public exists(string $key[, bool $compute_hash = true ][, bool $check_active = true ]) : bool
Parameters
$key : string

key to check

$compute_hash : bool = true

whether the key has already had the linear hash table's hash function applied or not

$check_active : bool = true

whether to check the active index of buffered key values that have not yet been put into the main table

Return values
bool

whether the $key exists in the linear hash table

get()

Returns the record associated with the given $key in the LinearHashTable

public get(string $key[, array<string|int, mixed> $fields = [] ][, bool $is_hash_key = false ]) : mixed
Parameters
$key : string

a key to look up

$fields : array<string|int, mixed> = []

which fields (columns) from the record to return

$is_hash_key : bool = false

has the supplied $key already been hash using the LinearHashTable's hash function (the function used to determine the partition a key should be in)

Return values
mixed

array containing unpacked record associated with $key if it exists in the table, false otherwise

getActiveIndex()

Returns the path to the index file for the active partition (the key value pairs not yet stored in the main LinearHashTable)

public getActiveIndex() : string
Return values
string

path to active partition

getArchive()

Returns the blob item from $archive_filename at $offset of length $len, uncompress the result

public getArchive(string $archive_filename, int $offset, int $len) : string
Parameters
$archive_filename : string

path to an archive node file for this LinearHashTable

$offset : int

byte offset into archive node file file

$len : int

length of blob item

Return values
string

uncompressed blob item from $archive_filename

getHashPath()

Gets the file path for the bucket associated with $hash_key, assuming the number of items in the linear hash table is $count, $max_items_per_file is the maximum number of items per file.

public getHashPath(string $hash_key[, mixed $count = -1 ][, int $max_items_per_file = -1 ][, bool $mkdir_if_not_exists = false ]) : string
Parameters
$hash_key : string

LHT hash of a key value

$count : mixed = -1
$max_items_per_file : int = -1

maximum number of items per bucket (defaults to this parameter for the current LHT)

$mkdir_if_not_exists : bool = false

make folders along path that don't exist already if true.

Return values
string

path to bucket

getKeyPartition()

Returns the path to the key index file for the partition $i (file used to store blob and serial columns)

public getKeyPartition(mixed $i) : string
Parameters
$i : mixed
Return values
string

path to partition $i key index file

getParameterInfo()

Returns the parameters (such as its signature, max number of documents per partition and counts) used to configure the LinearHashTable stored at $folder

public static getParameterInfo(mixed $folder) : mixed
Parameters
$folder : mixed
Return values
mixed

getPartition()

Returns the path to the archive file for the partition $i (file used to store blob and serial columns)

public getPartition(mixed $i) : string
Parameters
$i : mixed
Return values
string

path to partition $i archive file

hashKey()

Computes the fixed length hash value of a $key for the purposes of storing the key associated with a value in this lienar hash table.

public hashKey(string $key) : string
Parameters
$key : string

to hash

Return values
string

fixed length result of hashing

initCountIfNotExists()

Creates a new counter $field to be maintained

public initCountIfNotExists([string $field = "COUNT" ]) : mixed
Parameters
$field : string = "COUNT"

field of info struct to add a counter for

Return values
mixed

put()

Used to add a new record or an array of new records to the LinearHashTable

public put(array<string|int, mixed> $row_or_rows[, bool $is_hash_key = false ][, bool $allow_duplicates = true ]) : bool
Parameters
$row_or_rows : array<string|int, mixed>

either array of record with fields given by this LinearHashTable's signature or an array of rows.

$is_hash_key : bool = false

whether or not the key_field in the row or rows has already been hash to a string suitable for storage in this LinearHashTable

$allow_duplicates : bool = true

whether to allow multiple records with same key or not. If allow then a find will return the most recent.

Return values
bool

success (true) or not (false) for the insert

saveParameters()

Save the operating parameters of this LinearHashTable

public saveParameters() : mixed
Return values
mixed

addArchive()

Add a $value to save partition of this linear hash table

protected addArchive(string $value) : array<string|int, mixed>
Parameters
$value : string

value to add

Return values
array<string|int, mixed>

[offset into archive, length data saved, index of partition]

addIndex()

Adds the $hash_key, $value pair to the linear hash table bucket (index file) for $hash_key under the assumption that $count items are in the hash table. If bulk_mode is enabled then the index file is kept in memory rather than immediately flushed to disk.

protected addIndex(string $hash_key, string $value[, int $count = -1 ][, bool $bulk_mode = false ]) : bool
Parameters
$hash_key : string

key to determine hash table bucket (index file)

$value : string

packed table data to associate with key

$count : int = -1

number of items to assume in linear hash table. If -1 then use based on the saved parameter count

$bulk_mode : bool = false

whether to immediately flush index/bucket to disk

Return values
bool

success (true) or failure (false) of addition

addKeyArchive()

Add a key (not its hash) to the archive file for keys for the current save partition. This is used to keep track of all the keys stored in a partition. Note a key archive is only created if the constructor $with_key_archive parameter was called with value true

protected addKeyArchive(string $key) : mixed
Parameters
$key : string

key to add

Return values
mixed

checkSplitMerge()

Checks if $new_count many stored items would entail splitting one of the hash partitions of this linear hash table (> 0), require no change in the number of hash partitions (0), or require two partitions to be merges (< 0)

protected checkSplitMerge(int $new_count) : int
Parameters
$new_count : int

new item count

Return values
int

change (<0 merge, >0split) or no change required (0).

computeMigratePaths()

Given $count items in LHT with bucket size $max_items_per_file, and assuming a split/merge just occurred, returns the path of the two new buckets if a split, and the one new merged bucket if a merge

protected computeMigratePaths([int $count = -1 ][, int $max_items_per_file = -1 ]) : array<string|int, mixed>
Parameters
$count : int = -1

number of items in LHT

$max_items_per_file : int = -1

maximum number of items/bucket

Return values
array<string|int, mixed>

[path to merge bucket, path to split bucket low, path to split bucket high]

getIndexInfo()

Computes the path to the hash partition that would contain $hash_key as well as the contents of that partition

protected getIndexInfo(string $hash_key[, int $count = -1 ]) : array<string|int, mixed>
Parameters
$hash_key : string

key to find partition for

$count : int = -1

number of items assumed to be stored in table, if -1 uses $this->parameters['COUNT']

Return values
array<string|int, mixed>

[hash_partition_path, hash_partition_contents] if $count is not -1, then the partition might not exists in which case the second component might be -1

insertRecordsFromIndex()

Reads in index records from $hash_path, then reinserts them into LHT under the assumption that the size of the LHT is $new_count (defaults to current size + 1)

protected insertRecordsFromIndex(string $hash_path[, int $new_count = -1 ]) : mixed
Parameters
$hash_path : string

location of bucket to read records from

$new_count : int = -1

size of LHT to assume when reinserting records

Return values
mixed

mergeMigrate()

Handles merging two buckets into one, which might happen after a delete on the LHT

protected mergeMigrate() : mixed
Return values
mixed

putIndex()

Assuming a $value record where blob items have already been replaced by their integer offsets into the archive partition, inserts in the LHT a $value record according to this LHT's signature for $key.

protected putIndex(string $key, string $value[, bool $is_hash_key = false ], int $change_count[, bool $bulk_insert = false ]) : bool
Parameters
$key : string

key to use for insert

$value : string

record accordin to this LHT's signature

$is_hash_key : bool = false

whether or not $key has already been hash using this LHT's hash function

$change_count : int

for much the size of the LHT should change by

$bulk_insert : bool = false

whether the current insert is one of many (affects how long items are kept in memory flushing)

Return values
bool

success of insert (true) or failure (false)

splitMigrate()

Handles splitting one bucket into two, which might happen after an insert on the LHT

protected splitMigrate() : mixed
Return values
mixed

unlinkHashPath()

Unlinks the index file at the given $hash_path if it exists

protected unlinkHashPath(string $hash_path) : mixed
Parameters
$hash_path : string

path to index file

Return values
mixed

        

Search results