Yioop_V9.5_Source_Code_Documentation

BPlusTree
in package

This class implements the B+-tree structure over existing file system

Tags
author

Chris Pollett

Table of Contents

ARCHIVE_PREFIX  = "a"
Leaf node archive file (used for blob and serial record columns) name prefix
DEFAULT_COMPRESSOR  = \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"
The default compressor object used to compress tree nodes
DEFAULT_PARAMETERS  = ["NODE_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "BLOB_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "FORMAT" => ["PRIMARY KEY" => ["KEY", -1], "VALUE" => "BLOB"], "MAX_KEYS" => self::MAX_KEYS]
Default parameters to use when constructing a BPlusTree
LEAST_NODE_NAME  = "start"
Internal nodes of BPlusTree are folders, subfolders/subfiles are names according to their least key except for the first subdfolder/ subfile of the node which is given the name of the LEAST_NODE_NAME constant
MAX_CACHE_SIZE  = 200
Maximum size of a cache of any type used by this BPlusTree
MAX_KEYS  = 501
The MAXIMUM number of keys stored in a tree node (will split if have more)
NEXT_NODE_NAME  = "next"
Internal nodes of BPlusTree are folders. For nodes of the same height in the tree NEXT_NODE_NAME is used as the name of the file with the serialized name of the next folder of the same height in the tree.
NODE_PREFIX  = "i"
Leaf node index record file name prefix
PARAMETERS_FILE  = "bpt_parameters.txt"
File name of file used to store the parameters of this BPlusTree
TEMP_NODE_NAME  = "tmp_node"
Name of temporary file used when splitting a BPlusTree node.
$add_archive_cache  : array<string|int, mixed>
Field variable used as a cache for the file handle, file name, and time of the last archive file for a B+-tree node added to
$blob_columns  : array<string|int, mixed>
Array of column names for the columns in a BPlusTree which are of type BLOB or SERIAL
$blob_compressor  : object
The seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.
$folder  : string
Folder for storing the B-Tree files
$get_archive_cache  : array<string|int, mixed>
Field variable used as a cache for the file handle, file name, and time of the last archive file for a B+-tree node accessed for a value
$insert_node_cache  : array<string|int, mixed>
A cache for contents of nodes that have been recently added to.
$instance_time  : int
Used to keep track of when this instance was created, as part of managing file handles expiration (could be set/updated externally to reflect some other instance using the BPlusTree)
$key_field  : string
Name of primary key column for records
$last_find_folder  : string
Last folder path of a find operation, provided this was cacheable
$last_find_key  : string
Last encoded key used for a find operation, provided this was cacheable Used to avoid recomputing path down tree if will be the same.
$last_find_next_key  : string
First key of next node after returned node for the last find operation, provided this was cacheable
$last_insert_node_path  : string
File path of last inserted node
$node_compressor  : object
The seekquarry\yioop\library\compressors\Compressor object used to compress node files.
$parameters  : object
Storage for root node of the B-Tree
$serial_columns  : array<string|int, mixed>
Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL
$table_tools  : object
The PackedTableTools object used to pack and unpack records in BPlusTree
$tree_path_cache  : array<string|int, mixed>
Cache of key-values (internal_node_path => array of subnodes) used to speed up tree traversal
__construct()  : mixed
Creates/Loads B+-Tree having specified folder and minimum_degree.
archiveFilenameFromNodeFilename()  : string
Given the path to an index node file in the BPlusTree compute the path to the corresponding archive node file.
find()  : string
Returns the file path to the file that may or may not contain $key in the BPlusTree
flushLastPutNode()  : mixed
Saves the the BPlusTree $insert_node_path (if not empty) node to disk.
get()  : array<string|int, mixed>
Returns the record associated with a $key as stored in the BPlusTree.
getArchive()  : string
Return the blob item from $archive_filename at $offset of length $len, uncompress the result
getFromNode()  : array<string|int, mixed>
Returns records associated with $key in $key_node and the file associated with $archive_filename (for blob and serial column values) if it exists.
getParameterInfo()  : array<string|int, mixed>
Returns the parameters (such as its signature, max keys per nodes, etc) used to configure the BPlusTree stored at $folder
getParentFolder()  : string
Returns the file path of the parent node of the current internal node
put()  : bool
Adds an entry to the BPlusTree. Entries added might not be fully on disk until flushLastPutNode is called, as this memory tries to avoid disk writes.
putNode()  : mixed
Given an entry to add to a BPlusTree node, the index file of the node (for non blob or serial fields), and the name of the archive file for node (for blob or serial fields) stores the entry to the node.
saveParameters()  : mixed
Save the operating parameters of this BPlusTree
splitRecordsInLeaf()  : mixed
Splits BPlusTree leaf $node with path $node_path into two nodes each with an equal number of keys.
splitRootNode()  : bool
This method handles split the root node of a BPlusTree when it gets too full. Unlike for other nodes when we can add a key in the parent, for the root, we make a temporary folder and move all the root node's nodes to it then rename this the the least node of the root, and call updateNodePath on this to restore the BPlusTree property.
updateNodePath()  : bool
For a non-leaf node checks if it needs to be split, and if so splits and fixes keys up the tree to restore BPlus-Tree properties.
addArchive()  : array<string|int, mixed>
Add a value to an archive file $archive_filename of a B+-tree node
getArchiveName()  : string
Given a encoded hash_key to be used as the name of in the name of an archive file for a BPlusTree node and a folder where that node should be stored, return the path name for an archive file for a node
writeBlobsAndPack()  : string
Write the blob and serial columns from a row to insert into BPlusTree and then pack the rest of the row as a string

Constants

ARCHIVE_PREFIX

Leaf node archive file (used for blob and serial record columns) name prefix

public mixed ARCHIVE_PREFIX = "a"

DEFAULT_COMPRESSOR

The default compressor object used to compress tree nodes

public mixed DEFAULT_COMPRESSOR = \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"

DEFAULT_PARAMETERS

Default parameters to use when constructing a BPlusTree

public mixed DEFAULT_PARAMETERS = ["NODE_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "BLOB_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "FORMAT" => ["PRIMARY KEY" => ["KEY", -1], "VALUE" => "BLOB"], "MAX_KEYS" => self::MAX_KEYS]

LEAST_NODE_NAME

Internal nodes of BPlusTree are folders, subfolders/subfiles are names according to their least key except for the first subdfolder/ subfile of the node which is given the name of the LEAST_NODE_NAME constant

public mixed LEAST_NODE_NAME = "start"

MAX_CACHE_SIZE

Maximum size of a cache of any type used by this BPlusTree

public mixed MAX_CACHE_SIZE = 200

MAX_KEYS

The MAXIMUM number of keys stored in a tree node (will split if have more)

public mixed MAX_KEYS = 501

NEXT_NODE_NAME

Internal nodes of BPlusTree are folders. For nodes of the same height in the tree NEXT_NODE_NAME is used as the name of the file with the serialized name of the next folder of the same height in the tree.

public mixed NEXT_NODE_NAME = "next"

NODE_PREFIX

Leaf node index record file name prefix

public mixed NODE_PREFIX = "i"

PARAMETERS_FILE

File name of file used to store the parameters of this BPlusTree

public mixed PARAMETERS_FILE = "bpt_parameters.txt"

TEMP_NODE_NAME

Name of temporary file used when splitting a BPlusTree node.

public mixed TEMP_NODE_NAME = "tmp_node"

Properties

$add_archive_cache

Field variable used as a cache for the file handle, file name, and time of the last archive file for a B+-tree node added to

public array<string|int, mixed> $add_archive_cache = [null, "", -1]

$blob_columns

Array of column names for the columns in a BPlusTree which are of type BLOB or SERIAL

public array<string|int, mixed> $blob_columns

$blob_compressor

The seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.

public object $blob_compressor

$folder

Folder for storing the B-Tree files

public string $folder

$get_archive_cache

Field variable used as a cache for the file handle, file name, and time of the last archive file for a B+-tree node accessed for a value

public array<string|int, mixed> $get_archive_cache = [null, "", -1]

$insert_node_cache

A cache for contents of nodes that have been recently added to.

public array<string|int, mixed> $insert_node_cache = []

consists of path_to_node => node_content pairs

$instance_time

Used to keep track of when this instance was created, as part of managing file handles expiration (could be set/updated externally to reflect some other instance using the BPlusTree)

public int $instance_time

$key_field

Name of primary key column for records

public string $key_field

$last_find_folder

Last folder path of a find operation, provided this was cacheable

public string $last_find_folder = null

$last_find_key

Last encoded key used for a find operation, provided this was cacheable Used to avoid recomputing path down tree if will be the same.

public string $last_find_key = null

$last_find_next_key

First key of next node after returned node for the last find operation, provided this was cacheable

public string $last_find_next_key = null

$last_insert_node_path

File path of last inserted node

public string $last_insert_node_path

$node_compressor

The seekquarry\yioop\library\compressors\Compressor object used to compress node files.

public object $node_compressor

$parameters

Storage for root node of the B-Tree

public object $parameters

$serial_columns

Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL

public array<string|int, mixed> $serial_columns

$table_tools

The PackedTableTools object used to pack and unpack records in BPlusTree

public object $table_tools

$tree_path_cache

Cache of key-values (internal_node_path => array of subnodes) used to speed up tree traversal

public array<string|int, mixed> $tree_path_cache = []

Methods

__construct()

Creates/Loads B+-Tree having specified folder and minimum_degree.

public __construct(string $folder[, array<string|int, mixed> $format = self::DEFAULT_PARAMETERS["FORMAT"] ][, int $max_keys = self::MAX_KEYS ][, object $node_compressor_type = self::DEFAULT_COMPRESSOR ][, object $blob_compressor_type = self::DEFAULT_COMPRESSOR ]) : mixed
Parameters
$folder : string

is the folder for storing the B+-Tree files

$format : array<string|int, mixed> = self::DEFAULT_PARAMETERS["FORMAT"]

the column names, keys and types for this B+-Tree object

$max_keys : int = self::MAX_KEYS

the maximum number of keys a node is allowed to hold

$node_compressor_type : object = self::DEFAULT_COMPRESSOR

seekquarry\yioop\library\compressors\Compressor object used to compress index node files.

$blob_compressor_type : object = self::DEFAULT_COMPRESSOR

seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.

Return values
mixed

archiveFilenameFromNodeFilename()

Given the path to an index node file in the BPlusTree compute the path to the corresponding archive node file.

public archiveFilenameFromNodeFilename(string $node_filename) : string
Parameters
$node_filename : string

of index node path

Return values
string

associated archive node path

find()

Returns the file path to the file that may or may not contain $key in the BPlusTree

public find(string $key[, bool $is_encoded_key = false ]) : string
Parameters
$key : string

to look up in BPlusTree

$is_encoded_key : bool = false

whether $key has been rawurlencode'd or not

Return values
string

file path to node in BPlusTree that may or may not contain this key

flushLastPutNode()

Saves the the BPlusTree $insert_node_path (if not empty) node to disk.

public flushLastPutNode([string $insert_node_path = "" ]) : mixed

If $insert_node_path == "" then the last node put'ted to will be saved.

Parameters
$insert_node_path : string = ""

name of node to save to disk.

Return values
mixed

get()

Returns the record associated with a $key as stored in the BPlusTree.

public get(string $key[, bool $is_encoded_key = true ][, bool $use_string_node = false ][, bool $decode = true ][, bool $look_up_blobs = true ], int $offset[, int $limit = -1 ]) : array<string|int, mixed>

If $key is not present in tree then returns null

Parameters
$key : string

to look up in current BPlusTree

$is_encoded_key : bool = true

whether $key has been rawurlencode'd or not

$use_string_node : bool = false

rather than load in a node and split into keys for use of all keys, read in node as a string and uncompress but do not perform further decoding (faster if only doing reads).

$decode : bool = true

whether or not to unpack record

$look_up_blobs : bool = true

whether to leave in the record the offset and length for blob items or to look up the blob item

$offset : int

starting record item associated with this key to return

$limit : int = -1

maximum number of record items associated with this key to return

Return values
array<string|int, mixed>

of items associated with this key in the BPlusTree if $look_up_blobs == false, then returns a pair ["ARCHIVE_FILE" => [filename where blob columns stored, array of items associated with this key]]

getArchive()

Return the blob item from $archive_filename at $offset of length $len, uncompress the result

public getArchive(string $archive_filename, int $offset, int $len) : string
Parameters
$archive_filename : string

path to an archive node file for this BPlusTree

$offset : int

byte offset into archive node file file

$len : int

length of blob item

Return values
string

uncompressed blob item from $archive_filename

getFromNode()

Returns records associated with $key in $key_node and the file associated with $archive_filename (for blob and serial column values) if it exists.

public getFromNode(string $key, array<string|int, mixed> $key_node, string $archive_filename[, bool $is_encoded_key = true ][, mixed $is_string_node = false ][, bool $decode = true ][, bool $look_up_blobs = true ], int $offset[, int $limit = -1 ]) : array<string|int, mixed>
Parameters
$key : string

the key to look up

$key_node : array<string|int, mixed>

the BPlusTree index node in which to look up the key

$archive_filename : string

file name of archive data (blob and serial columns) associated with $key_node

$is_encoded_key : bool = true

whether the key is rawurlencoded or not

$is_string_node : mixed = false
$decode : bool = true

whether to $decode the records or leave them packed

$look_up_blobs : bool = true

whether to look up blob and serial columns

$offset : int

index of the first record associated with $key to return

$limit : int = -1

maximum number of records associated with key to return

Return values
array<string|int, mixed>

of value records associated with $key

getParameterInfo()

Returns the parameters (such as its signature, max keys per nodes, etc) used to configure the BPlusTree stored at $folder

public static getParameterInfo(string $folder) : array<string|int, mixed>
Parameters
$folder : string

file path to a stored BPlusTree

Return values
array<string|int, mixed>

configuration info about the BPlusTree

getParentFolder()

Returns the file path of the parent node of the current internal node

public getParentFolder(string $folder) : string
Parameters
$folder : string

path to internal BPlusTree node folder. (Internal nodes are folders, leaves are files)

Return values
string

file_path to parent or false, if no parent.

put()

Adds an entry to the BPlusTree. Entries added might not be fully on disk until flushLastPutNode is called, as this memory tries to avoid disk writes.

public put(array<string|int, mixed> $row[, bool $is_encoded_key = true ][, int $mode = PackedTableTools::APPEND_MODE ]) : bool
Parameters
$row : array<string|int, mixed>

associative array of filed_name => values for each column field field this BPlusTree's signature has

$is_encoded_key : bool = true

whether in $row the primary key field has already been encoded (rawurlencode) or not

$mode : int = PackedTableTools::APPEND_MODE

says what to do in the event that an entry with the given key already exists in the BPlusTree. Possibilities are either PackedTableTools::APPEND_MODE or PackedTableTools::REPLACE_MODE The former appends as a subrecord the nonkey fields of the entry to the existing list of sub-records associated to the given key, the later replaces the key value pair.

Return values
bool

success (true) or failure (false)

putNode()

Given an entry to add to a BPlusTree node, the index file of the node (for non blob or serial fields), and the name of the archive file for node (for blob or serial fields) stores the entry to the node.

public putNode(array<string|int, mixed> $row, array<string|int, mixed> &$node, string $archive_filename[, bool $is_encoded_key = true ][, int $mode = PackedTableTools::APPEND_MODE ]) : mixed
Parameters
$row : array<string|int, mixed>

associative array of filed_name => values for each column field field this BPlusTree's signature has

$node : array<string|int, mixed>

an in-memory code of a node (will be modified by this method)

$archive_filename : string

the name of the archive file associated with this file used for blob or serial fields.

$is_encoded_key : bool = true

whether in $row the primary key field has already been encoded (rawurlencode) or not

$mode : int = PackedTableTools::APPEND_MODE

says what to do in the event that an entry with the given key already exists in the BPlusTree. Possibilities are either PackedTableTools::APPEND_MODE or PackedTableTools::REPLACE_MODE The former appends as a subrecord the nonkey fields of the entry to the existing list of sub-records associated to the given key, the later replaces the key value pair.

Return values
mixed

saveParameters()

Save the operating parameters of this BPlusTree

public saveParameters() : mixed
Return values
mixed

splitRecordsInLeaf()

Splits BPlusTree leaf $node with path $node_path into two nodes each with an equal number of keys.

public splitRecordsInLeaf(string $node_path, mixed $node) : mixed
Parameters
$node_path : string

path to file used to store a BPlusTree node Used to name files after split.

$node : mixed
Return values
mixed

splitRootNode()

This method handles split the root node of a BPlusTree when it gets too full. Unlike for other nodes when we can add a key in the parent, for the root, we make a temporary folder and move all the root node's nodes to it then rename this the the least node of the root, and call updateNodePath on this to restore the BPlusTree property.

public splitRootNode() : bool
Return values
bool

success (true) or failure (false)

updateNodePath()

For a non-leaf node checks if it needs to be split, and if so splits and fixes keys up the tree to restore BPlus-Tree properties.

public updateNodePath(string $folder) : bool
Parameters
$folder : string

to check if needs split

Return values
bool

success (true) or failure (false)

addArchive()

Add a value to an archive file $archive_filename of a B+-tree node

protected addArchive(string $archive_filename, string $value) : array<string|int, mixed>
Parameters
$archive_filename : string

name of archive file

$value : string

value to add

Return values
array<string|int, mixed>

[offset into archive, length data saved]

getArchiveName()

Given a encoded hash_key to be used as the name of in the name of an archive file for a BPlusTree node and a folder where that node should be stored, return the path name for an archive file for a node

protected getArchiveName(string $current_folder, string $encode_key) : string
Parameters
$current_folder : string

to store archive file in

$encode_key : string

encode key value to be used in the file name of a BPlusTree node archive file

Return values
string

archive file path

writeBlobsAndPack()

Write the blob and serial columns from a row to insert into BPlusTree and then pack the rest of the row as a string

protected writeBlobsAndPack(string $archive_filename, array<string|int, mixed> $row) : string
Parameters
$archive_filename : string

name of archive file for a BPlusTree node

$row : array<string|int, mixed>

BPlusTree row to pack Blob columns for

Return values
string

packed row


        

Search results