PartitionDocumentBundle
A partition document bundle is a collection of partitions, each of which can hold a concatenated sequence of compressed documents, and which are managed together. It is a successor format to the earlier WebArchiveBundle of Yioop. The partition document bundle stores individual records using a record format defined via the PackedTableTools class.
This basic format has been extended by two new types, BLOB and SERIAL (a PHP serialized object represented as a blob). Data for columns of these types is stored in separate files from the rest of the record. Offsets into these archive files for blobs and serials are stored in a record as columns representing a difference list of ints, together with a LAST_BLOB_LEN column. Using this information, the blobs and serials associated with a record can be retrieved. How many documents are collected together into a partition can be tuned for read, write, and in-memory efficiency.
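As a rough illustration of the difference-list idea described above, the following sketch recovers (offset, length) pairs from a list of differences plus the length of the last blob. The layout assumed here (a first absolute offset followed by gaps, with blobs stored back to back) is an assumption made for illustration only; the actual column layout is defined by PartitionDocumentBundle and PackedTableTools. The function name decodeBlobSpans is hypothetical.

```php
<?php
/**
 * Hypothetical helper: turns a difference list of offsets plus the length
 * of the last blob into [offset, length] pairs.
 * Assumed layout: $diffs[0] is the absolute offset of the first blob and
 * each later entry is the gap to the next blob's offset.
 */
function decodeBlobSpans(array $diffs, int $last_blob_len): array
{
    $offsets = [];
    $current = 0;
    foreach ($diffs as $diff) {
        $current += $diff; // running sum recovers absolute offsets
        $offsets[] = $current;
    }
    $spans = [];
    $num = count($offsets);
    for ($i = 0; $i < $num; $i++) {
        // a blob ends where the next one begins; only the last blob
        // needs the separately stored length
        $len = ($i + 1 < $num) ? $offsets[$i + 1] - $offsets[$i]
            : $last_blob_len;
        $spans[] = [$offsets[$i], $len];
    }
    return $spans;
}

$spans = decodeBlobSpans([100, 40, 25], 10);
print_r($spans);
```

Storing gaps rather than absolute offsets keeps the stored integers small, which is presumably why a difference list is used; only the final blob needs its length recorded explicitly.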
Table of Contents
- DEFAULT_COMPRESSOR = \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"
- Compression strategy used to compress blob and serial columns
- DEFAULT_PARAMETERS = ["RECORD_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "BLOB_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "COUNT" => 0, "PARTITION_SIZE_THRESHOLD" => self::PARTITION_SIZE_THRESHOLD, "FORMAT" => ["PRIMARY KEY" => "KEY", "VALUE" => "BLOB"], "MAX_ITEMS_PER_FILE" => self::MAX_ITEMS_PER_FILE, "SAVE_PARTITION" => 0, "ACTIVE_COUNT" => 0]
- Default parameters to use when constructing a PartitionDocumentBundle
- INDEX_EXTENSION = ".ix"
- Extension for PartitionDocumentBundle partition files used to contain records
- MAX_ITEMS_PER_FILE = 16384
- Default maximum number of records to store in a partition
- PARAMETERS_FILE = "pdb_parameters.txt"
- File name of file used to store the parameters of this PartitionDocumentBundle
- PARTITION_PREFIX = "partition_"
- Prefix to file names of PartitionDocumentBundle partition files
- PARTITION_SIZE_THRESHOLD = 2147483648
- Maximum number of bytes a partition can have before the next partition is started. Notice this implies a maximum file size to store in BLOB columns
- $add_archive_cache : array<string|int, mixed>
- Stores the file handle, partition number, and time of the most recent addition of an item's blob/serial columns to the PartitionDocumentBundle
- $blob_columns : array<string|int, mixed>
- Array of column names for the columns in a PartitionDocumentBundle which are of type BLOB or SERIAL
- $blob_compressor : object
- The seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.
- $folder : string
- Folder path where the PartitionDocumentBundle is stored
- $get_archive_cache : array<string|int, mixed>
- Stores the file handle, partition number, and time of the most recent access to an item's blob/serial columns in the PartitionDocumentBundle
- $index_cache : array<string|int, mixed>
- In memory cache of partitions from the PartitionDocumentBundle
- $index_cache_size : mixed
- Maximum number of items the partition cache is allowed to hold
- $instance_time : int
- Used to keep track of when this instance was created, as part of managing file handle expiration (could be set/updated externally to reflect some other instance using the bundle)
- $key_field : string
- Name of primary key column for records
- $parameters : array<string|int, mixed>
- Stores the constructor parameters used to create this PartitionDocumentBundle
- $record_compressor : object
- The seekquarry\yioop\library\compressors\Compressor object used to compress record files.
- $save_index : mixed
- Holds loaded unserialized index file data from $partition partition bundle
- $serial_columns : array<string|int, mixed>
- Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL
- $table_tools : object
- The PackedTableTools object used to pack and unpack records in partitions
- __construct() : mixed
- Used to create a new instance of a PartitionDocumentBundle
- addCount() : mixed
- Add $num to maintained counter $field
- advanceSavePartition() : mixed
- Saves the current save partition, adds one to the save partition number, and starts a new save partition.
- get() : array<string|int, mixed>|false
- Returns $fields columns from the record associated with $key in the $partition partition of this PartitionDocumentBundle, if it exists.
- getArchive() : string
- Retrieves a BLOB string from the file $archive_filename at byte position $offset of length $len. It uncompresses this string using $compressor->uncompress and returns the result.
- getParameterInfo() : array<string|int, mixed>
- Returns the parameters (such as its signature, max number of documents per partition and counts) used to configure the PartitionDocumentBundle stored at $folder
- getPartition() : string
- Returns the path to the archive file (used to store BLOB and SERIAL columns) for the $i partition in this PartitionDocumentBundle
- getPartitionIndex() : string
- Returns the path to the index file (used to store all columns of a partition record except blob and serial columns) for the $i partition in this PartitionDocumentBundle
- initCountIfNotExists() : mixed
- Creates a new counter $field to be maintained
- loadPartitionIndex() : mixed
- Returns the unserialized index file for the $partition partition of this PartitionDocumentBundle. If $force_load is set to true then reloads from disk rather than use a cached value if present.
- put() : bool
- Used to add new records to the PartitionDocumentBundle
- saveParameters() : mixed
- Save the operating parameters of this PartitionDocumentBundle
- addArchive() : array<string|int, mixed>
- Used to add a blob item to the current save partition file.
Constants
DEFAULT_COMPRESSOR
Compression strategy used to compress blob and serial columns
public
mixed
DEFAULT_COMPRESSOR
= \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"
DEFAULT_PARAMETERS
Default parameters to use when constructing a PartitionDocumentBundle
public
mixed
DEFAULT_PARAMETERS
= ["RECORD_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "BLOB_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "COUNT" => 0, "PARTITION_SIZE_THRESHOLD" => self::PARTITION_SIZE_THRESHOLD, "FORMAT" => ["PRIMARY KEY" => "KEY", "VALUE" => "BLOB"], "MAX_ITEMS_PER_FILE" => self::MAX_ITEMS_PER_FILE, "SAVE_PARTITION" => 0, "ACTIVE_COUNT" => 0]
INDEX_EXTENSION
Extension for PartitionDocumentBundle partition files used to contain records
public
mixed
INDEX_EXTENSION
= ".ix"
MAX_ITEMS_PER_FILE
Default maximum number of records to store in a partition
public
mixed
MAX_ITEMS_PER_FILE
= 16384
PARAMETERS_FILE
File name of file used to store the parameters of this PartitionDocumentBundle
public
mixed
PARAMETERS_FILE
= "pdb_parameters.txt"
PARTITION_PREFIX
Prefix to file names of PartitionDocumentBundle partition files
public
mixed
PARTITION_PREFIX
= "partition_"
PARTITION_SIZE_THRESHOLD
Maximum number of bytes a partition can have before the next partition is started. Notice this implies a maximum file size to store in BLOB columns
public
mixed
PARTITION_SIZE_THRESHOLD
= 2147483648
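The two limits above, MAX_ITEMS_PER_FILE and PARTITION_SIZE_THRESHOLD (2^31 bytes, that is 2 GiB), both bound how large a partition may grow. The sketch below shows one way such a rollover check could look. The function shouldStartNewPartition and the exact comparison it makes are assumptions for illustration, not the class's actual logic.

```php
<?php
// Values copied from the constants documented above.
const MAX_ITEMS_PER_FILE = 16384;
const PARTITION_SIZE_THRESHOLD = 2147483648; // 2^31 bytes = 2 GiB

/**
 * Hypothetical check: should a new partition be started before adding
 * $incoming_bytes more bytes to a partition that already holds $items
 * records and $bytes bytes?
 */
function shouldStartNewPartition(int $items, int $bytes,
    int $incoming_bytes): bool
{
    return $items >= MAX_ITEMS_PER_FILE ||
        $bytes + $incoming_bytes > PARTITION_SIZE_THRESHOLD;
}

var_dump(shouldStartNewPartition(10, 1000, 1000));
var_dump(shouldStartNewPartition(16384, 0, 1));
```

Under this reading, a single blob larger than the threshold could never fit in any partition, which matches the note that the threshold implies a maximum size for BLOB column values.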
Properties
$add_archive_cache
Stores the file handle, partition number, and time of the most recent addition of an item's blob/serial columns to the PartitionDocumentBundle
public
array<string|int, mixed>
$add_archive_cache
= [null, "", -1]
$blob_columns
Array of column names for the columns in a PartitionDocumentBundle which are of type BLOB or SERIAL
public
array<string|int, mixed>
$blob_columns
$blob_compressor
The seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.
public
object
$blob_compressor
$folder
Folder path where the PartitionDocumentBundle is stored
public
string
$folder
$get_archive_cache
Stores the file handle, partition number, and time of the most recent access to an item's blob/serial columns in the PartitionDocumentBundle
public
array<string|int, mixed>
$get_archive_cache
= [null, "", -1]
$index_cache
In memory cache of partitions from the PartitionDocumentBundle
public
array<string|int, mixed>
$index_cache
$index_cache_size
Maximum number of items the partition cache is allowed to hold
public
mixed
$index_cache_size
$instance_time
Used to keep track of when this instance was created, as part of managing file handle expiration (could be set/updated externally to reflect some other instance using the bundle)
public
int
$instance_time
$key_field
Name of primary key column for records
public
string
$key_field
$parameters
Stores the constructor parameters used to create this PartitionDocumentBundle
public
array<string|int, mixed>
$parameters
$record_compressor
The seekquarry\yioop\library\compressors\Compressor object used to compress record files.
public
object
$record_compressor
$save_index
Holds loaded unserialized index file data from $partition partition bundle
public
mixed
$save_index
$serial_columns
Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL
public
array<string|int, mixed>
$serial_columns
$table_tools
The PackedTableTools object used to pack and unpack records in partitions
public
object
$table_tools
Methods
__construct()
Used to create a new instance of a PartitionDocumentBundle
public
__construct(string $folder[, array<string|int, mixed> $format = self::DEFAULT_PARAMETERS["FORMAT"] ][, int $max_items_per_file = self::MAX_ITEMS_PER_FILE ][, int $partition_size_threshold = self::PARTITION_SIZE_THRESHOLD ][, object $record_compressor_type = self::DEFAULT_COMPRESSOR ][, object $blob_compressor_type = self::DEFAULT_COMPRESSOR ]) : mixed
Parameters
- $folder : string
-
the path to the folder to store this PartitionDocumentBundle
- $format : array<string|int, mixed> = self::DEFAULT_PARAMETERS["FORMAT"]
-
the column names, keys and types for this PartitionDocumentBundle object
- $max_items_per_file : int = self::MAX_ITEMS_PER_FILE
-
maximum number of items to store in a partition before making the next partition
- $partition_size_threshold : int = self::PARTITION_SIZE_THRESHOLD
-
maximum length of a partition file in bytes before a new partition file should be started
- $record_compressor_type : object = self::DEFAULT_COMPRESSOR
-
seekquarry\yioop\library\compressors\Compressor object used to compress record files excluding blob columns.
- $blob_compressor_type : object = self::DEFAULT_COMPRESSOR
-
seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.
Return values
mixed —
addCount()
Add $num to maintained counter $field
public
addCount(int $num[, string $field = "COUNT" ]) : mixed
Parameters
- $num : int
-
number of items to add to current count
- $field : string = "COUNT"
-
field of info struct to add to the count of
Return values
mixed —
advanceSavePartition()
Saves the current save partition, adds one to the save partition number, and starts a new save partition.
public
advanceSavePartition(int $new_save_partition) : mixed
Parameters
- $new_save_partition : int
-
the partition number to save and then add one to. If the default is used, this method uses the "SAVE_PARTITION" value from the parameters.
Return values
mixed —
get()
Returns $fields columns from the record associated with $key in the $partition partition of this PartitionDocumentBundle, if it exists.
public
get(string $key, int $partition[, array<string|int, mixed> $fields = [] ]) : array<string|int, mixed>|false
If $fields is empty, all columns are returned.
Parameters
- $key : string
-
to look up in partition
- $partition : int
-
to look for record in
- $fields : array<string|int, mixed> = []
-
names of fields in this PartitionDocumentBundle to return
Return values
array<string|int, mixed>|false —unpacked record on success, otherwise false
getArchive()
Retrieves a BLOB string from the file $archive_filename at byte position $offset of length $len. It uncompresses this string using $compressor->uncompress and returns the result.
public
getArchive(string $archive_filename, int $offset, int $len) : string
Parameters
- $archive_filename : string
-
the filename of a partition archive file to get a blob object from
- $offset : int
-
a byte position in that file
- $len : int
-
number of bytes from $offset to read.
Return values
string —the result of uncompressing the string at $offset of length $len
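The read path described for getArchive() can be sketched as a seek, a bounded read, and an uncompress step. The code below is a standalone approximation under stated assumptions: readArchiveSpan and IdentityCompressor are hypothetical names, the compressor is passed in explicitly rather than taken from the bundle, and the identity compressor only mimics the idea of a compressor that leaves data unchanged.

```php
<?php
/** Hypothetical stand-in for a compressor that stores data unchanged. */
class IdentityCompressor
{
    public function compress(string $str): string
    {
        return $str;
    }
    public function uncompress(string $str): string
    {
        return $str;
    }
}

/**
 * Hypothetical free-function version of the read path: seek to $offset,
 * read $len bytes, then uncompress them. Returns null on failure.
 */
function readArchiveSpan(string $archive_filename, int $offset, int $len,
    object $compressor): ?string
{
    $fh = fopen($archive_filename, "rb");
    if ($fh === false) {
        return null;
    }
    $data = null;
    if ($len > 0 && fseek($fh, $offset) === 0) {
        $raw = fread($fh, $len);
        if ($raw !== false) {
            $data = $compressor->uncompress($raw);
        }
    }
    fclose($fh);
    return $data;
}

// Two blobs stored back to back: 10 bytes, then 11 bytes.
$file = tempnam(sys_get_temp_dir(), "pdb");
file_put_contents($file, "first-blobsecond-blob");
$second = readArchiveSpan($file, 10, 11, new IdentityCompressor());
unlink($file);
echo $second, "\n";
```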
getParameterInfo()
Returns the parameters (such as its signature, max number of documents per partition and counts) used to configure the PartitionDocumentBundle stored at $folder
public
static getParameterInfo(string $folder) : array<string|int, mixed>
Parameters
- $folder : string
-
file path to a stored PartitionDocumentBundle
Return values
array<string|int, mixed> —configuration info about the PartitionDocumentBundle
getPartition()
Returns the path to the archive file (used to store BLOB and SERIAL columns) for the $i partition in this PartitionDocumentBundle
public
getPartition(int $i) : string
Parameters
- $i : int
-
partition to get the archive file name for
Return values
string —path of $i partition archive file
getPartitionIndex()
Returns the path to the index file (used to store all columns of a partition record except blob and serial columns) for the $i partition in this PartitionDocumentBundle
public
getPartitionIndex(int $i) : string
Parameters
- $i : int
-
partition to get the index file name for
Return values
string —path of $i partition index file
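To make the role of PARTITION_PREFIX and INDEX_EXTENSION concrete, here is a minimal sketch of how a partition path might be assembled. This page does not spell out the exact naming scheme, so the composition below (folder, prefix, partition number, optional extension) is an assumption, and partitionPath is a hypothetical helper rather than a method of the class.

```php
<?php
// Values copied from the constants documented on this page.
const PARTITION_PREFIX = "partition_";
const INDEX_EXTENSION = ".ix";

/**
 * Hypothetical helper: builds <folder>/partition_<i><extension>.
 * The extension used for archive files is not documented here, so it is
 * left as a caller-supplied parameter.
 */
function partitionPath(string $folder, int $i,
    string $extension = ""): string
{
    return rtrim($folder, "/") . "/" . PARTITION_PREFIX . $i . $extension;
}

$index_path = partitionPath("/tmp/bundle", 3, INDEX_EXTENSION);
echo $index_path, "\n";
```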
initCountIfNotExists()
Creates a new counter $field to be maintained
public
initCountIfNotExists([string $field = "COUNT" ]) : mixed
Parameters
- $field : string = "COUNT"
-
field of info struct to add a counter for
Return values
mixed —
loadPartitionIndex()
Returns the unserialized index file for the $partition partition of this PartitionDocumentBundle. If $force_load is set to true then reloads from disk rather than use a cached value if present.
public
loadPartitionIndex(int $partition[, bool $force_load = false ][, int $mode = PackedTableTools::REPLACE_MODE ]) : mixed
Parameters
- $partition : int
-
which partition index to read
- $force_load : bool = false
-
whether to reload the index from disk or to use a cached value if present
- $mode : int = PackedTableTools::REPLACE_MODE
-
PackedTableTools mode to use when reading in partition
Return values
mixed —either a string if $mode is AS_STRING_MODE, or an array of $key => packed record pairs, where records are packed according to this PartitionDocumentBundle's signature
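The interaction between $force_load and the in-memory index cache can be illustrated with a small standalone class. Everything here is a sketch under assumptions: IndexCacheSketch, its load() method, and the oldest-first eviction policy are hypothetical, and the disk read is simulated so the example stays self-contained.

```php
<?php
/**
 * Hypothetical loader that caches index data per partition and evicts the
 * entry inserted earliest once the cache is full.
 */
class IndexCacheSketch
{
    public $index_cache = [];
    public $index_cache_size;
    public $loads = 0; // counts simulated disk reads, for demonstration

    public function __construct(int $index_cache_size = 2)
    {
        $this->index_cache_size = $index_cache_size;
    }

    public function load(int $partition, bool $force_load = false): array
    {
        if (!$force_load && isset($this->index_cache[$partition])) {
            return $this->index_cache[$partition];
        }
        $index = $this->readFromDisk($partition);
        unset($this->index_cache[$partition]); // re-insert as newest
        if (count($this->index_cache) >= $this->index_cache_size) {
            // drop the entry that was inserted first
            unset($this->index_cache[array_key_first($this->index_cache)]);
        }
        $this->index_cache[$partition] = $index;
        return $index;
    }

    /** Simulated disk read; a real loader would unpack an index file. */
    private function readFromDisk(int $partition): array
    {
        $this->loads++;
        return ["key_$partition" => "packed record"];
    }
}

$cache = new IndexCacheSketch();
$cache->load(0);
$cache->load(0);       // served from the cache
$cache->load(0, true); // forces a reload
echo $cache->loads, "\n";
```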
put()
Used to add new records to the PartitionDocumentBundle
public
put(array<string|int, mixed> $row_or_rows) : bool
Parameters
- $row_or_rows : array<string|int, mixed>
-
either a single record (an array with fields given by this PartitionDocumentBundle's signature) or an array of such records.
Return values
bool —success or not
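Since put() accepts either one record or several, an implementation needs some way to tell the two apart. The sketch below shows one plausible approach: treat the argument as a single record when the primary key column is present at the top level. The helper normalizeRows and this detection strategy are assumptions for illustration, not necessarily what the class does.

```php
<?php
/**
 * Hypothetical normalizer: returns a list of rows whether the caller passed
 * one record or several. A single record is detected by the presence of the
 * primary key column at the top level of the array.
 */
function normalizeRows(array $row_or_rows, string $key_field = "KEY"): array
{
    if (array_key_exists($key_field, $row_or_rows)) {
        return [$row_or_rows];
    }
    return array_values($row_or_rows);
}

$one = normalizeRows(["KEY" => "a", "VALUE" => "x"]);
$many = normalizeRows([
    ["KEY" => "a", "VALUE" => "x"],
    ["KEY" => "b", "VALUE" => "y"],
]);
echo count($one), " ", count($many), "\n";
```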
saveParameters()
Save the operating parameters of this PartitionDocumentBundle
public
saveParameters() : mixed
Return values
mixed —
addArchive()
Used to add a blob item to the current save partition file.
protected
addArchive(string $value) : array<string|int, mixed>
Parameters
- $value : string
-
blob item to be added to file
Return values
array<string|int, mixed> —[offset into save partition, length stored, partition number of current save partition]
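The shape of this return value can be demonstrated with a minimal append routine. The sketch below is a standalone approximation: appendToArchive is a hypothetical function, it takes the file name and partition number as arguments instead of reading them from the bundle, and it skips the compression step that the real method would presumably apply with the blob compressor before writing.

```php
<?php
/**
 * Hypothetical append: writes $value at the end of $archive_filename and
 * reports [offset, length written, partition number].
 * Returns an empty array if the file cannot be opened or written.
 */
function appendToArchive(string $archive_filename, string $value,
    int $partition): array
{
    // "c+b" opens for read/write without truncating, so ftell() is
    // reliable after seeking to the end
    $fh = fopen($archive_filename, "c+b");
    if ($fh === false) {
        return [];
    }
    fseek($fh, 0, SEEK_END);
    $offset = ftell($fh);
    $len = fwrite($fh, $value);
    fclose($fh);
    if ($len === false) {
        return [];
    }
    return [$offset, $len, $partition];
}

$file = tempnam(sys_get_temp_dir(), "pdb");
$first = appendToArchive($file, "hello", 0);
$second = appendToArchive($file, "world!", 0);
unlink($file);
print_r([$first, $second]);
```

The offset and length returned here are exactly what a record would need to store in order to read the blob back later.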