A partition document bundle is a collection of partitions, each of which holds a concatenated sequence of compressed documents, and which are managed together. It is the successor to Yioop's earlier WebArchiveBundle format. The partition document bundle stores individual records using a record format defined via the PackedTableTools class.

This basic format has been extended with two new types: BLOB and SERIAL (a PHP serialized object represented as a blob). Data for columns of these types is stored in files separate from the rest of the record. Offsets into these archive files for blobs and serials are stored in the record as columns representing a difference list of ints, together with a LAST_BLOB_LEN column. Using this information, the blobs and serials associated with a record can be retrieved. The number of documents collected into a partition can be tuned for read, write, and in-memory efficiency.
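
To make the offset bookkeeping concrete, here is a minimal sketch of how a difference list plus a final length can recover the byte range of each blob. The function name and the exact encoding (the first entry is an absolute offset, later entries are gaps between consecutive offsets, and blobs of one record sit back to back) are assumptions for illustration, not necessarily the layout PartitionDocumentBundle uses.

```php
<?php
/**
 * Hypothetical helper: turns a difference list of offsets plus the length
 * of the last blob into [offset, length] pairs, one per blob.
 * Assumes the blobs of one record are stored back to back, so each
 * difference is also the length of the preceding blob.
 */
function blobRangesFromDifferences(array $differences, int $last_blob_len): array
{
    $ranges = [];
    $offset = 0;
    $num = count($differences);
    for ($i = 0; $i < $num; $i++) {
        // a running sum converts differences back into absolute offsets
        $offset += $differences[$i];
        // the length is the gap to the next offset, or LAST_BLOB_LEN at the end
        $len = ($i + 1 < $num) ? $differences[$i + 1] : $last_blob_len;
        $ranges[] = [$offset, $len];
    }
    return $ranges;
}

// three blobs starting at bytes 100, 130, 175 with lengths 30, 45, 20
$ranges = blobRangesFromDifferences([100, 30, 45], 20);
```

Storing differences rather than absolute offsets keeps the stored integers small, which is the usual motivation for a difference list.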


Chris Pollett

Table of Contents

DEFAULT_COMPRESSOR  = \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"
Compression strategy used to compress blob and serial columns
DEFAULT_PARAMETERS
Default parameters to use when constructing a PartitionDocumentBundle
INDEX_EXTENSION  = ".ix"
Extension for PartitionDocumentBundle partition files used to contain records
MAX_ITEMS_PER_FILE  = 16384
Default maximum number of records to store in a partition
PARAMETERS_FILE  = "pdb_parameters.txt"
File name of file used to store the parameters of this PartitionDocumentBundle
PARTITION_PREFIX  = "partition_"
Prefix to file names of PartitionDocumentBundle partition files
PARTITION_SIZE_THRESHOLD  = 2147483648
Maximum number of bytes a partition can have before the next partition is started. Notice this implies a maximum file size to store in BLOB columns
$add_archive_cache  : array<string|int, mixed>
Stores the file handle, the partition number, and the time of the most recent addition of an item's blob/serial columns to the PartitionDocumentBundle
$blob_columns  : array<string|int, mixed>
Array of column names for the columns in a PartitionDocumentBundle which are of type BLOB or SERIAL
$blob_compressor  : object
The seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.
$folder  : string
Folder path where the PartitionDocumentBundle is stored
$get_archive_cache  : array<string|int, mixed>
Stores the file handle, the partition number, and the time of the most recent access to an item's blob/serial columns in the PartitionDocumentBundle
$index_cache  : array<string|int, mixed>
In memory cache of partitions from the PartitionDocumentBundle
$index_cache_size  : mixed
Maximum number of items the partition cache is allowed to hold
$instance_time  : int
Used to keep track of when this instance was created, as part of managing file handle expiration (could be set/updated externally to reflect some other instance using the bundle)
$key_field  : string
Name of primary key column for records
$parameters  : array<string|int, mixed>
Stores the constructor parameters used to create this PartitionDocumentBundle
$record_compressor  : object
The seekquarry\yioop\library\compressors\Compressor object used to compress record files.
$save_index  : mixed
Holds loaded unserialized index file data from $partition partition bundle
$serial_columns  : array<string|int, mixed>
Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL
$table_tools  : object
The PackedTableTools object used to pack and unpack records in partitions
__construct()  : mixed
Used to create a new instance of a PartitionDocumentBundle
addCount()  : mixed
Add $num to maintained counter $field
advanceSavePartition()  : mixed
Saves the current save partition, adds one to the save partition number, and starts a new save partition.
get()  : array<string|int, mixed>|false
Returns $fields columns from the record associated with $key in the $partition partition of this PartitionDocumentBundle, if it exists.
getArchive()  : string
Retrieves a BLOB string from the file $archive_filename at byte position $offset of length $len. It uncompresses this string using $compressor->uncompress and returns the result.
getParameterInfo()  : array<string|int, mixed>
Returns the parameters (such as its signature, max number of documents per partition and counts) used to configure the PartitionDocumentBundle stored at $folder
getPartition()  : string
Returns the path to the archive file (used to store BLOB and SERIAL columns) for the $i partition in this PartitionDocumentBundle
getPartitionIndex()  : string
Returns the path to the index file (used to store all columns of a partition record except blob and serial columns) for the $i partition in this PartitionDocumentBundle
initCountIfNotExists()  : mixed
Creates a new counter $field to be maintained
loadPartitionIndex()  : mixed
Returns the unserialized index file for the $partition partition of this PartitionDocumentBundle. If $force_load is set to true then reloads from disk rather than use a cached value if present.
put()  : bool
Used to add new records to the PartitionDocumentBundle
saveParameters()  : mixed
Save the operating parameters of this PartitionDocumentBundle
addArchive()  : array<string|int, mixed>
Used to add a blob item to the current save partition file.



Compression strategy used to compress blob and serial columns

public mixed DEFAULT_COMPRESSOR = \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"


Default parameters to use when constructing a PartitionDocumentBundle



Extension for PartitionDocumentBundle partition files used to contain records

public mixed INDEX_EXTENSION = ".ix"


Default maximum number of records to store in a partition

public mixed MAX_ITEMS_PER_FILE = 16384


File name of file used to store the parameters of this PartitionDocumentBundle

public mixed PARAMETERS_FILE = "pdb_parameters.txt"


Prefix to file names of PartitionDocumentBundle partition files

public mixed PARTITION_PREFIX = "partition_"


Maximum number of bytes a partition can have before the next partition is started. Notice this implies a maximum file size to store in BLOB columns

public mixed PARTITION_SIZE_THRESHOLD = 2147483648
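
As a rough illustration of how the record limit and the size limit interact, the sketch below decides whether the next item should go into a fresh partition. The function and its arguments are hypothetical, and whether the real comparison is strict is an assumption; only the two constant values come from this page. Note that 2147483648 is 2^31 bytes, i.e. 2 GiB.

```php
<?php
// values copied from the constants documented on this page
const MAX_ITEMS_PER_FILE = 16384;
const PARTITION_SIZE_THRESHOLD = 2147483648; // 2^31 bytes = 2 GiB

/**
 * Hypothetical check: should the next item go into a fresh partition?
 */
function needsNewPartition(int $items_in_partition, int $bytes_in_partition,
    int $incoming_bytes): bool
{
    if ($items_in_partition >= MAX_ITEMS_PER_FILE) {
        return true; // record limit reached
    }
    // adding this item would push the file past the size limit
    return $bytes_in_partition + $incoming_bytes > PARTITION_SIZE_THRESHOLD;
}
```

Under this reading, a single blob larger than the threshold could never fit in any partition, which is one way to understand the remark about a maximum file size for BLOB columns.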



Stores the file handle, the partition number, and the time of the most recent addition of an item's blob/serial columns to the PartitionDocumentBundle

public array<string|int, mixed> $add_archive_cache = [null, "", -1]


Array of column names for the columns in a PartitionDocumentBundle which are of type BLOB or SERIAL

public array<string|int, mixed> $blob_columns


The seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.

public object $blob_compressor


Stores the file handle, the partition number, and the time of the most recent access to an item's blob/serial columns in the PartitionDocumentBundle

public array<string|int, mixed> $get_archive_cache = [null, "", -1]


In memory cache of partitions from the PartitionDocumentBundle

public array<string|int, mixed> $index_cache


Maximum number of items the partition cache is allowed to hold

public mixed $index_cache_size


Used to keep track of when this instance was created, as part of managing file handle expiration (could be set/updated externally to reflect some other instance using the bundle)

public int $instance_time


Stores the constructor parameters used to create this PartitionDocumentBundle

public array<string|int, mixed> $parameters


The seekquarry\yioop\library\compressors\Compressor object used to compress record files.

public object $record_compressor


Holds loaded unserialized index file data from $partition partition bundle

public mixed $save_index


Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL

public array<string|int, mixed> $serial_columns


The PackedTableTools object used to pack and unpack records in partitions

public object $table_tools



Used to create a new instance of a PartitionDocumentBundle

public __construct(string $folder[, array<string|int, mixed> $format = self::DEFAULT_PARAMETERS["FORMAT"] ][, int $max_items_per_file = self::MAX_ITEMS_PER_FILE ][, int $partition_size_threshold = self::PARTITION_SIZE_THRESHOLD ][, object $record_compressor_type = self::DEFAULT_COMPRESSOR ][, object $blob_compressor_type = self::DEFAULT_COMPRESSOR ]) : mixed
$folder : string

the path to the folder to store this PartitionDocumentBundle

$format : array<string|int, mixed> = self::DEFAULT_PARAMETERS["FORMAT"]

the column names, keys and types for this PartitionDocumentBundle object

$max_items_per_file : int = self::MAX_ITEMS_PER_FILE

maximum number of items to store in a partition before making the next partition

$partition_size_threshold : int = self::PARTITION_SIZE_THRESHOLD

maximum length of a partition file in bytes before a new partition file should be started

$record_compressor_type : object = self::DEFAULT_COMPRESSOR

seekquarry\yioop\library\compressors\Compressor object used to compress record files excluding blob columns.

$blob_compressor_type : object = self::DEFAULT_COMPRESSOR

seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.

Return values


Add $num to maintained counter $field

public addCount(int $num[, string $field = "COUNT" ]) : mixed
$num : int

number of items to add to current count

$field : string = "COUNT"

field of the info struct whose count is to be increased

Return values


Saves the current save partition, adds one to the save partition number, and starts a new save partition.

public advanceSavePartition(int $new_save_partition) : mixed
$new_save_partition : int

partition number to add one to. If the default value is used, this method uses the "SAVE_PARTITION" value from the parameters.

Return values


Returns $fields columns from the record associated with $key in the $partition partition of this PartitionDocumentBundle, if it exists.

public get(string $key, int $partition[, array<string|int, mixed> $fields = [] ]) : array<string|int, mixed>|false

If $fields is empty, all columns are returned.

$key : string

to look up in partition

$partition : int

to look for record in

$fields : array<string|int, mixed> = []

names of fields in this PartitionDocumentBundle to return

Return values
array<string|int, mixed>|false

unpacked record on success, otherwise false


Retrieves a BLOB string from the file $archive_filename at byte position $offset of length $len. It uncompresses this string using $compressor->uncompress and returns the result.

public getArchive(string $archive_filename, int $offset, int $len) : string
$archive_filename : string

the filename of a partition archive file to get a blob object from

$offset : int

a byte position in that file

$len : int

number of bytes from $offset to read.

Return values
string

the result of uncompressing the string at $offset of length $len
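
The read path described here can be sketched in a few lines: seek to the offset, read the requested number of bytes, and pass them to an uncompress step. The function name, the callable parameter, and the null-on-failure behavior are assumptions made so that the example is self-contained; they are not the signature of getArchive().

```php
<?php
/**
 * Hypothetical stand-in for reading one blob: seek to $offset, read $len
 * bytes, then hand them to an uncompress callable. Returns null on failure.
 */
function readBlobAt(string $archive_filename, int $offset, int $len,
    callable $uncompress): ?string
{
    $fh = fopen($archive_filename, "rb");
    if ($fh === false) {
        return null;
    }
    $compressed = null;
    if ($len > 0 && fseek($fh, $offset) === 0) {
        $data = fread($fh, $len);
        // only accept the read if all requested bytes were available
        if ($data !== false && strlen($data) === $len) {
            $compressed = $data;
        }
    }
    fclose($fh);
    return ($compressed === null) ? null : $uncompress($compressed);
}

// demo: a temporary file holding two blobs back to back
$file = tempnam(sys_get_temp_dir(), "pdb");
file_put_contents($file, "first blobsecond blob");
// identity callable, playing the role of a compressor that does nothing
$identity = function (string $s): string {
    return $s;
};
$blob = readBlobAt($file, 10, 11, $identity);
unlink($file);
```

This sketch reopens the file on every call for simplicity, whereas the class documents a $get_archive_cache property for holding a file handle between accesses.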


Returns the parameters (such as its signature, max number of documents per partition and counts) used to configure the PartitionDocumentBundle stored at $folder

public static getParameterInfo(string $folder) : array<string|int, mixed>
$folder : string

file path to a stored PartitionDocumentBundle

Return values
array<string|int, mixed>

configuration info about the PartitionDocumentBundle
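
For orientation, the following sketch stores a small parameter array in a bundle folder and reads it back. The file name comes from PARAMETERS_FILE on this page; the helper names, the sample keys' values, and above all the on-disk encoding (PHP serialize) are assumptions, since the page does not state how the parameters are written.

```php
<?php
const PARAMETERS_FILE = "pdb_parameters.txt"; // name documented on this page

/** Hypothetical writer: stores $parameters inside $folder. */
function writeParameterSketch(string $folder, array $parameters): bool
{
    $path = $folder . DIRECTORY_SEPARATOR . PARAMETERS_FILE;
    return file_put_contents($path, serialize($parameters)) !== false;
}

/** Hypothetical reader: returns the stored array, or [] if unavailable. */
function readParameterSketch(string $folder): array
{
    $path = $folder . DIRECTORY_SEPARATOR . PARAMETERS_FILE;
    if (!is_file($path)) {
        return [];
    }
    $parameters = unserialize(file_get_contents($path));
    return is_array($parameters) ? $parameters : [];
}

// demo in a throwaway folder
$folder = sys_get_temp_dir() . DIRECTORY_SEPARATOR . uniqid("pdb_demo_");
mkdir($folder);
writeParameterSketch($folder, ["SAVE_PARTITION" => 0, "COUNT" => 0]);
$info = readParameterSketch($folder);
// clean up the demo files
unlink($folder . DIRECTORY_SEPARATOR . PARAMETERS_FILE);
rmdir($folder);
```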


Returns the path to the archive file (used to store BLOB and SERIAL columns) for the $i partition in this PartitionDocumentBundle

public getPartition(int $i) : string
$i : int

partition to get the archive file name for

Return values
string

path of $i partition archive file


Returns the path to the index file (used to store all columns of a partition record except blob and serial columns) for the $i partition in this PartitionDocumentBundle

public getPartitionIndex(int $i) : string
$i : int

partition to get the index file name for

Return values
string

path of $i partition index file
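
One plausible way such a path could be assembled from the documented constants is sketched below. The helper name and the way folder, prefix, partition number, and extension are joined are assumptions; the extension of the archive file is not stated on this page, so only the index path is shown.

```php
<?php
// values documented on this page
const PARTITION_PREFIX = "partition_";
const INDEX_EXTENSION = ".ix";

/** Hypothetical path builder for the index file of partition $i. */
function indexPathSketch(string $folder, int $i): string
{
    // strip any trailing slash so the separator is not doubled
    return rtrim($folder, "/") . "/" . PARTITION_PREFIX . $i . INDEX_EXTENSION;
}

$path = indexPathSketch("/tmp/bundle/", 3);
```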


Creates a new counter $field to be maintained

public initCountIfNotExists([string $field = "COUNT" ]) : mixed
$field : string = "COUNT"

field of info struct to add a counter for

Return values
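
The two counter methods, initCountIfNotExists() and addCount(), can be pictured with a small stand-in class. This is a sketch under assumptions: the class name, the use of a plain array for the info struct, the field name DEMO_COUNT, and the choice to create a missing counter inside addCount() are all hypothetical.

```php
<?php
/** Hypothetical holder for counters kept in a parameters array. */
class CounterSketch
{
    /** @var array counter name => current value */
    public $parameters = [];

    /** Creates the counter $field with value 0 unless it already exists. */
    public function initCountIfNotExists(string $field = "COUNT"): void
    {
        if (!isset($this->parameters[$field])) {
            $this->parameters[$field] = 0;
        }
    }

    /** Adds $num to the counter $field, creating it first if needed. */
    public function addCount(int $num, string $field = "COUNT"): void
    {
        $this->initCountIfNotExists($field);
        $this->parameters[$field] += $num;
    }
}

$counts = new CounterSketch();
$counts->initCountIfNotExists("DEMO_COUNT");
$counts->addCount(5);               // default field "COUNT"
$counts->addCount(2, "DEMO_COUNT");
$counts->addCount(3);
```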


Returns the unserialized index file for the $partition partition of this PartitionDocumentBundle. If $force_load is set to true then reloads from disk rather than use a cached value if present.

public loadPartitionIndex(int $partition[, bool $force_load = false ][, int $mode = PackedTableTools::REPLACE_MODE ]) : mixed
$partition : int

which partition index to read

$force_load : bool = false

whether to reload the index from disk or to use a cached value if present

$mode : int = PackedTableTools::REPLACE_MODE

PackedTableTools mode to use when reading in partition

Return values
mixed

either a string if $mode is AS_STRING_MODE, or an array of $key => packed record pairs, where records are packed according to this PartitionDocumentBundle's signature
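
The interplay of a cached index, a forced reload, and a bounded cache size can be sketched as follows. The class, its loader callable, and the eviction rule (drop the entry that was inserted first) are assumptions for illustration; the page only states that a cache exists and that it has a maximum size.

```php
<?php
/** Hypothetical bounded cache keyed by partition number. */
class IndexCacheSketch
{
    private $cache = [];
    private $max_size;
    private $loader;      // callable: partition number => index data
    public $loads = 0;    // how many times the loader was invoked

    public function __construct(callable $loader, int $max_size)
    {
        $this->loader = $loader;
        $this->max_size = $max_size;
    }

    public function load(int $partition, bool $force_load = false)
    {
        if (!$force_load && isset($this->cache[$partition])) {
            return $this->cache[$partition]; // serve the cached copy
        }
        $index = ($this->loader)($partition);
        $this->loads++;
        unset($this->cache[$partition]);
        if ($this->cache !== [] && count($this->cache) >= $this->max_size) {
            // assumption: evict the entry that was inserted first
            unset($this->cache[array_key_first($this->cache)]);
        }
        $this->cache[$partition] = $index;
        return $index;
    }
}

$cache = new IndexCacheSketch(
    function (int $p): array {
        return ["partition" => $p];
    },
    2
);
$cache->load(0);
$cache->load(0);        // served from the cache, no second load
$cache->load(0, true);  // forced reload
$cache->load(1);
$cache->load(2);        // cache is full, so partition 0 is evicted
```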


Used to add new records to the PartitionDocumentBundle

public put(array<string|int, mixed> $row_or_rows) : bool
$row_or_rows : array<string|int, mixed>

either a single record (an array with fields given by this PartitionDocumentBundle's signature) or an array of such rows.

Return values
bool

whether the records were added successfully
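
Because put() accepts either one record or several, a caller-side view of the distinction may help. The sketch below normalizes both shapes into a list of rows by checking for the primary key column; the helper name, this detection rule, and the column names ID and PAGE are hypothetical, and the class may tell the two cases apart differently.

```php
<?php
/**
 * Hypothetical normalizer: accepts one record or a list of records and
 * always returns a list of records.
 */
function normalizeRowsSketch(array $row_or_rows, string $key_field): array
{
    // a single record carries the primary key column at its top level
    if (array_key_exists($key_field, $row_or_rows)) {
        return [$row_or_rows];
    }
    return $row_or_rows;
}

$single = normalizeRowsSketch(["ID" => "a1", "PAGE" => "<html>"], "ID");
$many = normalizeRowsSketch([
    ["ID" => "a1", "PAGE" => "<html>"],
    ["ID" => "b2", "PAGE" => "<body>"],
], "ID");
```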


Save the operating parameters of this PartitionDocumentBundle

public saveParameters() : mixed
Return values


Used to add a blob item to the current save partition file.

protected addArchive(string $value) : array<string|int, mixed>
$value : string

blob item to be added to file

Return values
array<string|int, mixed>

[offset into save partition, length stored, partition number of current save partition]
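
The shape of that return value can be illustrated with a simple append routine: compress the item, note the current end of the file, write the bytes there, and report the triple. The function name, the explicit partition and compressor arguments, and the use of file_put_contents are assumptions; the class itself documents an $add_archive_cache property for keeping a file handle open between additions.

```php
<?php
/**
 * Hypothetical append: writes $value at the end of $archive_filename and
 * reports [offset, stored length, partition number].
 */
function appendBlobSketch(string $archive_filename, string $value,
    int $partition, callable $compress): array
{
    $compressed = $compress($value);
    // file sizes are cached by PHP, so refresh before measuring
    clearstatcache(true, $archive_filename);
    $offset = file_exists($archive_filename) ?
        filesize($archive_filename) : 0;
    file_put_contents($archive_filename, $compressed, FILE_APPEND);
    return [$offset, strlen($compressed), $partition];
}

// demo with an identity "compressor" and a temporary archive file
$identity = function (string $s): string {
    return $s;
};
$archive = tempnam(sys_get_temp_dir(), "pdb");
$first = appendBlobSketch($archive, "hello", 7, $identity);
$second = appendBlobSketch($archive, "world!", 7, $identity);
unlink($archive);
```

The stored length is the compressed length, since that is the number of bytes a later read has to fetch from the archive file.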


