BPlusTree
in package
This class implements the B+-tree structure over existing file system
Tags
Table of Contents
- ARCHIVE_PREFIX = "a"
- Leaf node archive file (used for blob and serial record columns) name prefix
- DEFAULT_COMPRESSOR = \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"
- The default compressor object used to compress tree nodes
- DEFAULT_PARAMETERS = ["NODE_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "BLOB_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "FORMAT" => ["PRIMARY KEY" => ["KEY", -1], "VALUE" => "BLOB"], "MAX_KEYS" => self::MAX_KEYS]
- Default parameters to use when constructing a BPlusTree
- LEAST_NODE_NAME = "start"
- Internal nodes of BPlusTree are folders, subfolders/subfiles are names according to their least key except for the first subdfolder/ subfile of the node which is given the name of the LEAST_NODE_NAME constant
- MAX_CACHE_SIZE = 200
- Maximum size of a cache of any type used by this BPlusTree
- MAX_KEYS = 501
- The MAXIMUM number of keys stored in a tree node (will split if have more)
- NEXT_NODE_NAME = "next"
- Internal nodes of BPlusTree are folders. For nodes of the same height in the tree NEXT_NODE_NAME is used as the name of the file with the serialized name of the next folder of the same height in the tree.
- NODE_PREFIX = "i"
- Leaf node index record file name prefix
- PARAMETERS_FILE = "bpt_parameters.txt"
- File name of file used to store the parameters of this BPlusTree
- TEMP_NODE_NAME = "tmp_node"
- Name of temporary file used when splitting a BPlusTree node.
- $add_archive_cache : array<string|int, mixed>
- Field variable used as a cache for the file handle, file name, and time of the last archive file for a B+-tree node added to
- $blob_columns : array<string|int, mixed>
- Array of column names for the columns in a BPlusTree which are of type BLOB or SERIAL
- $blob_compressor : object
- The seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.
- $folder : string
- Folder for storing the B-Tree files
- $get_archive_cache : array<string|int, mixed>
- Field variable used as a cache for the file handle, file name, and time of the last archive file for a B+-tree node accessed for a value
- $insert_node_cache : array<string|int, mixed>
- A cache for contents of nodes that have been recently added to.
- $instance_time : int
- Used to keep track of when this instance was created, as part of managing file handles expiration (could be set/updated externally to reflect some other instance using the BPlusTree)
- $key_field : string
- Name of primary key column for records
- $last_find_folder : string
- Last folder path of a find operation, provided this was cacheable
- $last_find_key : string
- Last encoded key used for a find operation, provided this was cacheable Used to avoid recomputing path down tree if will be the same.
- $last_find_next_key : string
- First key of next node after returned node for the last find operation, provided this was cacheable
- $last_insert_node_path : string
- File path of last inserted node
- $node_compressor : object
- The seekquarry\yioop\library\compressors\Compressor object used to compress node files.
- $parameters : object
- Storage for root node of the B-Tree
- $serial_columns : array<string|int, mixed>
- Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL
- $table_tools : object
- The PackedTableTools object used to pack and unpack records in BPlusTree
- $tree_path_cache : array<string|int, mixed>
- Cache of key-values (internal_node_path => array of subnodes) used to speed up tree traversal
- __construct() : mixed
- Creates/Loads B+-Tree having specified folder and minimum_degree.
- archiveFilenameFromNodeFilename() : string
- Given the path to an index node file in the BPlusTree compute the path to the corresponding archive node file.
- find() : string
- Returns the file path to the file that may or may not contain $key in the BPlusTree
- flushLastPutNode() : mixed
- Saves the the BPlusTree $insert_node_path (if not empty) node to disk.
- get() : array<string|int, mixed>
- Returns the record associated with a $key as stored in the BPlusTree.
- getArchive() : string
- Return the blob item from $archive_filename at $offset of length $len, uncompress the result
- getFromNode() : array<string|int, mixed>
- Returns records associated with $key in $key_node and the file associated with $archive_filename (for blob and serial column values) if it exists.
- getParameterInfo() : array<string|int, mixed>
- Returns the parameters (such as its signature, max keys per nodes, etc) used to configure the BPlusTree stored at $folder
- getParentFolder() : string
- Returns the file path of the parent node of the current internal node
- put() : bool
- Adds an entry to the BPlusTree. Entries added might not be fully on disk until flushLastPutNode is called, as this memory tries to avoid disk writes.
- putNode() : mixed
- Given an entry to add to a BPlusTree node, the index file of the node (for non blob or serial fields), and the name of the archive file for node (for blob or serial fields) stores the entry to the node.
- saveParameters() : mixed
- Save the operating parameters of this BPlusTree
- splitRecordsInLeaf() : mixed
- Splits BPlusTree leaf $node with path $node_path into two nodes each with an equal number of keys.
- splitRootNode() : bool
- This method handles split the root node of a BPlusTree when it gets too full. Unlike for other nodes when we can add a key in the parent, for the root, we make a temporary folder and move all the root node's nodes to it then rename this the the least node of the root, and call updateNodePath on this to restore the BPlusTree property.
- updateNodePath() : bool
- For a non-leaf node checks if it needs to be split, and if so splits and fixes keys up the tree to restore BPlus-Tree properties.
- addArchive() : array<string|int, mixed>
- Add a value to an archive file $archive_filename of a B+-tree node
- getArchiveName() : string
- Given a encoded hash_key to be used as the name of in the name of an archive file for a BPlusTree node and a folder where that node should be stored, return the path name for an archive file for a node
- writeBlobsAndPack() : string
- Write the blob and serial columns from a row to insert into BPlusTree and then pack the rest of the row as a string
Constants
ARCHIVE_PREFIX
Leaf node archive file (used for blob and serial record columns) name prefix
public
mixed
ARCHIVE_PREFIX
= "a"
DEFAULT_COMPRESSOR
The default compressor object used to compress tree nodes
public
mixed
DEFAULT_COMPRESSOR
= \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"
DEFAULT_PARAMETERS
Default parameters to use when constructing a BPlusTree
public
mixed
DEFAULT_PARAMETERS
= ["NODE_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "BLOB_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "FORMAT" => ["PRIMARY KEY" => ["KEY", -1], "VALUE" => "BLOB"], "MAX_KEYS" => self::MAX_KEYS]
LEAST_NODE_NAME
Internal nodes of BPlusTree are folders, subfolders/subfiles are names according to their least key except for the first subdfolder/ subfile of the node which is given the name of the LEAST_NODE_NAME constant
public
mixed
LEAST_NODE_NAME
= "start"
MAX_CACHE_SIZE
Maximum size of a cache of any type used by this BPlusTree
public
mixed
MAX_CACHE_SIZE
= 200
MAX_KEYS
The MAXIMUM number of keys stored in a tree node (will split if have more)
public
mixed
MAX_KEYS
= 501
NEXT_NODE_NAME
Internal nodes of BPlusTree are folders. For nodes of the same height in the tree NEXT_NODE_NAME is used as the name of the file with the serialized name of the next folder of the same height in the tree.
public
mixed
NEXT_NODE_NAME
= "next"
NODE_PREFIX
Leaf node index record file name prefix
public
mixed
NODE_PREFIX
= "i"
PARAMETERS_FILE
File name of file used to store the parameters of this BPlusTree
public
mixed
PARAMETERS_FILE
= "bpt_parameters.txt"
TEMP_NODE_NAME
Name of temporary file used when splitting a BPlusTree node.
public
mixed
TEMP_NODE_NAME
= "tmp_node"
Properties
$add_archive_cache
Field variable used as a cache for the file handle, file name, and time of the last archive file for a B+-tree node added to
public
array<string|int, mixed>
$add_archive_cache
= [null, "", -1]
$blob_columns
Array of column names for the columns in a BPlusTree which are of type BLOB or SERIAL
public
array<string|int, mixed>
$blob_columns
$blob_compressor
The seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.
public
object
$blob_compressor
$folder
Folder for storing the B-Tree files
public
string
$folder
$get_archive_cache
Field variable used as a cache for the file handle, file name, and time of the last archive file for a B+-tree node accessed for a value
public
array<string|int, mixed>
$get_archive_cache
= [null, "", -1]
$insert_node_cache
A cache for contents of nodes that have been recently added to.
public
array<string|int, mixed>
$insert_node_cache
= []
consists of path_to_node => node_content pairs
$instance_time
Used to keep track of when this instance was created, as part of managing file handles expiration (could be set/updated externally to reflect some other instance using the BPlusTree)
public
int
$instance_time
$key_field
Name of primary key column for records
public
string
$key_field
$last_find_folder
Last folder path of a find operation, provided this was cacheable
public
string
$last_find_folder
= null
$last_find_key
Last encoded key used for a find operation, provided this was cacheable Used to avoid recomputing path down tree if will be the same.
public
string
$last_find_key
= null
$last_find_next_key
First key of next node after returned node for the last find operation, provided this was cacheable
public
string
$last_find_next_key
= null
$last_insert_node_path
File path of last inserted node
public
string
$last_insert_node_path
$node_compressor
The seekquarry\yioop\library\compressors\Compressor object used to compress node files.
public
object
$node_compressor
$parameters
Storage for root node of the B-Tree
public
object
$parameters
$serial_columns
Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL
public
array<string|int, mixed>
$serial_columns
$table_tools
The PackedTableTools object used to pack and unpack records in BPlusTree
public
object
$table_tools
$tree_path_cache
Cache of key-values (internal_node_path => array of subnodes) used to speed up tree traversal
public
array<string|int, mixed>
$tree_path_cache
= []
Methods
__construct()
Creates/Loads B+-Tree having specified folder and minimum_degree.
public
__construct(string $folder[, array<string|int, mixed> $format = self::DEFAULT_PARAMETERS["FORMAT"] ][, int $max_keys = self::MAX_KEYS ][, object $node_compressor_type = self::DEFAULT_COMPRESSOR ][, object $blob_compressor_type = self::DEFAULT_COMPRESSOR ]) : mixed
Parameters
- $folder : string
-
is the folder for storing the B+-Tree files
- $format : array<string|int, mixed> = self::DEFAULT_PARAMETERS["FORMAT"]
-
the column names, keys and types for this B+-Tree object
- $max_keys : int = self::MAX_KEYS
-
the maximum number of keys a node is allowed to hold
- $node_compressor_type : object = self::DEFAULT_COMPRESSOR
-
seekquarry\yioop\library\compressors\Compressor object used to compress index node files.
- $blob_compressor_type : object = self::DEFAULT_COMPRESSOR
-
seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.
Return values
mixed —archiveFilenameFromNodeFilename()
Given the path to an index node file in the BPlusTree compute the path to the corresponding archive node file.
public
archiveFilenameFromNodeFilename(string $node_filename) : string
Parameters
- $node_filename : string
-
of index node path
Return values
string —associated archive node path
find()
Returns the file path to the file that may or may not contain $key in the BPlusTree
public
find(string $key[, bool $is_encoded_key = false ]) : string
Parameters
- $key : string
-
to look up in BPlusTree
- $is_encoded_key : bool = false
-
whether $key has been rawurlencode'd or not
Return values
string —file path to node in BPlusTree that may or may not contain this key
flushLastPutNode()
Saves the the BPlusTree $insert_node_path (if not empty) node to disk.
public
flushLastPutNode([string $insert_node_path = "" ]) : mixed
If $insert_node_path == "" then the last node put'ted to will be saved.
Parameters
- $insert_node_path : string = ""
-
name of node to save to disk.
Return values
mixed —get()
Returns the record associated with a $key as stored in the BPlusTree.
public
get(string $key[, bool $is_encoded_key = true ][, bool $use_string_node = false ][, bool $decode = true ][, bool $look_up_blobs = true ], int $offset[, int $limit = -1 ]) : array<string|int, mixed>
If $key is not present in tree then returns null
Parameters
- $key : string
-
to look up in current BPlusTree
- $is_encoded_key : bool = true
-
whether $key has been rawurlencode'd or not
- $use_string_node : bool = false
-
rather than load in a node and split into keys for use of all keys, read in node as a string and uncompress but do not perform further decoding (faster if only doing reads).
- $decode : bool = true
-
whether or not to unpack record
- $look_up_blobs : bool = true
-
whether to leave in the record the offset and length for blob items or to look up the blob item
- $offset : int
-
starting record item associated with this key to return
- $limit : int = -1
-
maximum number of record items associated with this key to return
Return values
array<string|int, mixed> —of items associated with this key in the BPlusTree if $look_up_blobs == false, then returns a pair ["ARCHIVE_FILE" => [filename where blob columns stored, array of items associated with this key]]
getArchive()
Return the blob item from $archive_filename at $offset of length $len, uncompress the result
public
getArchive(string $archive_filename, int $offset, int $len) : string
Parameters
- $archive_filename : string
-
path to an archive node file for this BPlusTree
- $offset : int
-
byte offset into archive node file file
- $len : int
-
length of blob item
Return values
string —uncompressed blob item from $archive_filename
getFromNode()
Returns records associated with $key in $key_node and the file associated with $archive_filename (for blob and serial column values) if it exists.
public
getFromNode(string $key, array<string|int, mixed> $key_node, string $archive_filename[, bool $is_encoded_key = true ][, mixed $is_string_node = false ][, bool $decode = true ][, bool $look_up_blobs = true ], int $offset[, int $limit = -1 ]) : array<string|int, mixed>
Parameters
- $key : string
-
the key to look up
- $key_node : array<string|int, mixed>
-
the BPlusTree index node in which to look up the key
- $archive_filename : string
-
file name of archive data (blob and serial columns) associated with $key_node
- $is_encoded_key : bool = true
-
whether the key is rawurlencoded or not
- $is_string_node : mixed = false
- $decode : bool = true
-
whether to $decode the records or leave them packed
- $look_up_blobs : bool = true
-
whether to look up blob and serial columns
- $offset : int
-
index of the first record associated with $key to return
- $limit : int = -1
-
maximum number of records associated with key to return
Return values
array<string|int, mixed> —of value records associated with $key
getParameterInfo()
Returns the parameters (such as its signature, max keys per nodes, etc) used to configure the BPlusTree stored at $folder
public
static getParameterInfo(string $folder) : array<string|int, mixed>
Parameters
- $folder : string
-
file path to a stored BPlusTree
Return values
array<string|int, mixed> —configuration info about the BPlusTree
getParentFolder()
Returns the file path of the parent node of the current internal node
public
getParentFolder(string $folder) : string
Parameters
- $folder : string
-
path to internal BPlusTree node folder. (Internal nodes are folders, leaves are files)
Return values
string —file_path to parent or false, if no parent.
put()
Adds an entry to the BPlusTree. Entries added might not be fully on disk until flushLastPutNode is called, as this memory tries to avoid disk writes.
public
put(array<string|int, mixed> $row[, bool $is_encoded_key = true ][, int $mode = PackedTableTools::APPEND_MODE ]) : bool
Parameters
- $row : array<string|int, mixed>
-
associative array of filed_name => values for each column field field this BPlusTree's signature has
- $is_encoded_key : bool = true
-
whether in $row the primary key field has already been encoded (rawurlencode) or not
- $mode : int = PackedTableTools::APPEND_MODE
-
says what to do in the event that an entry with the given key already exists in the BPlusTree. Possibilities are either PackedTableTools::APPEND_MODE or PackedTableTools::REPLACE_MODE The former appends as a subrecord the nonkey fields of the entry to the existing list of sub-records associated to the given key, the later replaces the key value pair.
Return values
bool —success (true) or failure (false)
putNode()
Given an entry to add to a BPlusTree node, the index file of the node (for non blob or serial fields), and the name of the archive file for node (for blob or serial fields) stores the entry to the node.
public
putNode(array<string|int, mixed> $row, array<string|int, mixed> &$node, string $archive_filename[, bool $is_encoded_key = true ][, int $mode = PackedTableTools::APPEND_MODE ]) : mixed
Parameters
- $row : array<string|int, mixed>
-
associative array of filed_name => values for each column field field this BPlusTree's signature has
- $node : array<string|int, mixed>
-
an in-memory code of a node (will be modified by this method)
- $archive_filename : string
-
the name of the archive file associated with this file used for blob or serial fields.
- $is_encoded_key : bool = true
-
whether in $row the primary key field has already been encoded (rawurlencode) or not
- $mode : int = PackedTableTools::APPEND_MODE
-
says what to do in the event that an entry with the given key already exists in the BPlusTree. Possibilities are either PackedTableTools::APPEND_MODE or PackedTableTools::REPLACE_MODE The former appends as a subrecord the nonkey fields of the entry to the existing list of sub-records associated to the given key, the later replaces the key value pair.
Return values
mixed —saveParameters()
Save the operating parameters of this BPlusTree
public
saveParameters() : mixed
Return values
mixed —splitRecordsInLeaf()
Splits BPlusTree leaf $node with path $node_path into two nodes each with an equal number of keys.
public
splitRecordsInLeaf(string $node_path, mixed $node) : mixed
Parameters
- $node_path : string
-
path to file used to store a BPlusTree node Used to name files after split.
- $node : mixed
Return values
mixed —splitRootNode()
This method handles split the root node of a BPlusTree when it gets too full. Unlike for other nodes when we can add a key in the parent, for the root, we make a temporary folder and move all the root node's nodes to it then rename this the the least node of the root, and call updateNodePath on this to restore the BPlusTree property.
public
splitRootNode() : bool
Return values
bool —success (true) or failure (false)
updateNodePath()
For a non-leaf node checks if it needs to be split, and if so splits and fixes keys up the tree to restore BPlus-Tree properties.
public
updateNodePath(string $folder) : bool
Parameters
- $folder : string
-
to check if needs split
Return values
bool —success (true) or failure (false)
addArchive()
Add a value to an archive file $archive_filename of a B+-tree node
protected
addArchive(string $archive_filename, string $value) : array<string|int, mixed>
Parameters
- $archive_filename : string
-
name of archive file
- $value : string
-
value to add
Return values
array<string|int, mixed> —[offset into archive, length data saved]
getArchiveName()
Given a encoded hash_key to be used as the name of in the name of an archive file for a BPlusTree node and a folder where that node should be stored, return the path name for an archive file for a node
protected
getArchiveName(string $current_folder, string $encode_key) : string
Parameters
- $current_folder : string
-
to store archive file in
- $encode_key : string
-
encode key value to be used in the file name of a BPlusTree node archive file
Return values
string —archive file path
writeBlobsAndPack()
Write the blob and serial columns from a row to insert into BPlusTree and then pack the rest of the row as a string
protected
writeBlobsAndPack(string $archive_filename, array<string|int, mixed> $row) : string
Parameters
- $archive_filename : string
-
name of archive file for a BPlusTree node
- $row : array<string|int, mixed>
-
BPlusTree row to pack Blob columns for
Return values
string —packed row