Yioop_V9.5_Source_Code_Documentation

PartitionDocumentBundle

A partition document bundle is a collection of partitions, each of which can hold a concatenated sequence of compressed documents, and which are managed together. It is a successor format to the earlier WebArchiveBundle of Yioop. The partition document bundle stores individual records using a record format defined via the PackedTableTools class.

This basic format has been extended by two new types, BLOB and SERIAL (a PHP serialized object represented as a blob). Data for columns of these types is stored in separate files from the rest of the record. Offsets into these archive files for blobs and serials are stored in a record as columns representing a difference list of ints, together with a LAST_BLOB_LEN column. Using this information, the blobs and serials associated with a record can be retrieved. The number of documents collected into a partition can be tuned for read, write, and in-memory efficiency.
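As a rough illustration of the difference-list idea described above, the following PHP sketch turns a list of offset differences plus the length of the last blob into [offset, length] pairs. The function name blobRanges, the meaning given to the first list entry, and the assumption that a record's blobs are stored back to back are all made up for this example; the real column layout is defined by PackedTableTools and may differ.

```php
<?php
/**
 * Hypothetical helper (not part of Yioop): converts a difference list of
 * blob offsets and the length of the last blob into [offset, length] pairs.
 * Assumption: $diffs[0] is the absolute offset of the first blob and every
 * later entry is the gap from one blob's start to the next blob's start.
 */
function blobRanges(array $diffs, int $last_blob_len): array
{
    $diffs = array_values($diffs);
    $num = count($diffs);
    $ranges = [];
    $offset = 0;
    foreach ($diffs as $i => $diff) {
        $offset += $diff;
        // Every blob except the last ends where the next one starts,
        // so its length equals the next difference.
        $len = ($i < $num - 1) ? $diffs[$i + 1] : $last_blob_len;
        $ranges[] = [$offset, $len];
    }
    return $ranges;
}

$ranges = blobRanges([100, 40, 25], 10);
```

Under these assumptions, $ranges holds [100, 40], [140, 25], and [165, 10]: only the final blob needs an explicit length, which is the role a LAST_BLOB_LEN column would play.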

Tags
author

Chris Pollett

Table of Contents

DEFAULT_COMPRESSOR  = \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"
Compression strategy used to compress blob and serial columns
DEFAULT_PARAMETERS  = ["RECORD_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "BLOB_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "COUNT" => 0, "PARTITION_SIZE_THRESHOLD" => self::PARTITION_SIZE_THRESHOLD, "FORMAT" => ["PRIMARY KEY" => "KEY", "VALUE" => "BLOB"], "MAX_ITEMS_PER_FILE" => self::MAX_ITEMS_PER_FILE, "SAVE_PARTITION" => 0, "ACTIVE_COUNT" => 0]
Default parameters to use when constructing a PartitionDocumentBundle
INDEX_EXTENSION  = ".ix"
Extension for PartitionDocumentBundle partition files used to contain records
MAX_ITEMS_PER_FILE  = 16384
Default maximum number of records to store in a partition
PARAMETERS_FILE  = "pdb_parameters.txt"
File name of file used to store the parameters of this PartitionDocumentBundle
PARTITION_PREFIX  = "partition_"
Prefix to file names of PartitionDocumentBundle partition files
PARTITION_SIZE_THRESHOLD  = 2147483648
Maximum number of bytes a partition can have before the next partition is started. Note that this implies a maximum size for a file stored in a BLOB column
$add_archive_cache  : array<string|int, mixed>
Stores the file handle, partition number, and time of the most recent operation that added an item's blob/serial columns to the PartitionDocumentBundle
$blob_columns  : array<string|int, mixed>
Array of column names for the columns in a PartitionDocumentBundle which are of type BLOB or SERIAL
$blob_compressor  : object
The seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.
$folder  : string
Folder path where the PartitionDocumentBundle is stored
$get_archive_cache  : array<string|int, mixed>
Stores the file handle, partition number, and time of the most recent access to an item's blob/serial columns in the PartitionDocumentBundle
$index_cache  : array<string|int, mixed>
In memory cache of partitions from the PartitionDocumentBundle
$index_cache_size  : mixed
Maximum number of items the partition cache is allowed to hold
$instance_time  : int
Used to keep track of when this instance was created, as part of managing file handle expiration (could be set/updated externally to reflect some other instance using the bundle)
$key_field  : string
Name of primary key column for records
$parameters  : array<string|int, mixed>
Stores the constructor parameters used to create this PartitionDocumentBundle
$record_compressor  : object
The seekquarry\yioop\library\compressors\Compressor object used to compress record files.
$save_index  : mixed
Holds loaded unserialized index file data from $partition partition bundle
$serial_columns  : array<string|int, mixed>
Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL
$table_tools  : object
The PackedTableTools object used to pack and unpack records in partitions
__construct()  : mixed
Used to create a new instance of a PartitionDocumentBundle
addCount()  : mixed
Add $num to maintained counter $field
advanceSavePartition()  : mixed
Saves the current save partition, adds one to the save partition number, and starts a new save partition.
get()  : array<string|int, mixed>|false
Returns $fields columns from the record associated with $key in the $partition partition of this PartitionDocumentBundle if it exists.
getArchive()  : string
Retrieves a BLOB string in the file $archive_filename at byte position $offset of length $len. It uncompresses this string using $compressor->uncompress and returns the result.
getParameterInfo()  : array<string|int, mixed>
Returns the parameters (such as its signature, max number of documents per partition and counts) used to configure the PartitionDocumentBundle stored at $folder
getPartition()  : string
Returns the path to the archive file (used to store BLOB and SERIAL columns) for the $i partition in this PartitionDocumentBundle
getPartitionIndex()  : string
Returns the path to the index file (used to store all columns of a partition record except blob and serial columns) for the $i partition in this PartitionDocumentBundle
initCountIfNotExists()  : mixed
Creates a new counter $field to be maintained
loadPartitionIndex()  : mixed
Returns the unserialized index file for the $partition partition of this PartitionDocumentBundle. If $force_load is set to true then reloads from disk rather than use a cached value if present.
put()  : bool
Used to add new records to the PartitionDocumentBundle
saveParameters()  : mixed
Save the operating parameters of this PartitionDocumentBundle
addArchive()  : array<string|int, mixed>
Used to add a blob item to the current save partition file.

Constants

DEFAULT_COMPRESSOR

Compression strategy used to compress blob and serial columns

public mixed DEFAULT_COMPRESSOR = \seekquarry\yioop\configs\NS_COMPRESSORS . "NonCompressor"

DEFAULT_PARAMETERS

Default parameters to use when constructing a PartitionDocumentBundle

public mixed DEFAULT_PARAMETERS = ["RECORD_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "BLOB_COMPRESSOR" => self::DEFAULT_COMPRESSOR, "COUNT" => 0, "PARTITION_SIZE_THRESHOLD" => self::PARTITION_SIZE_THRESHOLD, "FORMAT" => ["PRIMARY KEY" => "KEY", "VALUE" => "BLOB"], "MAX_ITEMS_PER_FILE" => self::MAX_ITEMS_PER_FILE, "SAVE_PARTITION" => 0, "ACTIVE_COUNT" => 0]

INDEX_EXTENSION

Extension for PartitionDocumentBundle partition files used to contain records

public mixed INDEX_EXTENSION = ".ix"

MAX_ITEMS_PER_FILE

Default maximum number of records to store in a partition

public mixed MAX_ITEMS_PER_FILE = 16384

PARAMETERS_FILE

File name of file used to store the parameters of this PartitionDocumentBundle

public mixed PARAMETERS_FILE = "pdb_parameters.txt"

PARTITION_PREFIX

Prefix to file names of PartitionDocumentBundle partition files

public mixed PARTITION_PREFIX = "partition_"

PARTITION_SIZE_THRESHOLD

Maximum number of bytes a partition can have before the next partition is started. Note that this implies a maximum size for a file stored in a BLOB column

public mixed PARTITION_SIZE_THRESHOLD = 2147483648
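To make the two partition limits concrete, here is a minimal sketch of a rollover check that combines MAX_ITEMS_PER_FILE with PARTITION_SIZE_THRESHOLD (2147483648 bytes, that is 2^31 bytes or 2 GiB). The function needsNewPartition and the exact comparison it performs are assumptions for illustration only; the documentation above does not say when or how the class applies these limits. The sketch also assumes a 64-bit PHP build so that the threshold fits in an int.

```php
<?php
// Values copied from the constants documented on this page.
const MAX_ITEMS_PER_FILE = 16384;
const PARTITION_SIZE_THRESHOLD = 2147483648; // 2^31 bytes = 2 GiB

/**
 * Hypothetical check (not Yioop's code): should a new partition be started
 * before writing $incoming_bytes more bytes to the current one?
 */
function needsNewPartition(int $num_items, int $num_bytes,
    int $incoming_bytes): bool
{
    return $num_items >= MAX_ITEMS_PER_FILE ||
        $num_bytes + $incoming_bytes > PARTITION_SIZE_THRESHOLD;
}
```

Under a check like this, a single blob larger than the threshold could never fit in any partition, which is one way to read the remark that the threshold implies a maximum size for BLOB columns.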

Properties

$add_archive_cache

Stores the file handle, partition number, and time of the most recent operation that added an item's blob/serial columns to the PartitionDocumentBundle

public array<string|int, mixed> $add_archive_cache = [null, "", -1]

$blob_columns

Array of column names for the columns in a PartitionDocumentBundle which are of type BLOB or SERIAL

public array<string|int, mixed> $blob_columns

$blob_compressor

The seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.

public object $blob_compressor

$get_archive_cache

Stores the file handle, partition number, and time of the most recent access to an item's blob/serial columns in the PartitionDocumentBundle

public array<string|int, mixed> $get_archive_cache = [null, "", -1]

$index_cache

In memory cache of partitions from the PartitionDocumentBundle

public array<string|int, mixed> $index_cache

$index_cache_size

Maximum number of items the partition cache is allowed to hold

public mixed $index_cache_size

$instance_time

Used to keep track of when this instance was created, as part of managing file handle expiration (could be set/updated externally to reflect some other instance using the bundle)

public int $instance_time

$parameters

Stores the constructor parameters used to create this PartitionDocumentBundle

public array<string|int, mixed> $parameters

$record_compressor

The seekquarry\yioop\library\compressors\Compressor object used to compress record files.

public object $record_compressor

$save_index

Holds loaded unserialized index file data from $partition partition bundle

public mixed $save_index

$serial_columns

Array of column names for the columns in a PartitionDocumentBundle which are of type SERIAL

public array<string|int, mixed> $serial_columns

$table_tools

The PackedTableTools object used to pack and unpack records in partitions

public object $table_tools

Methods

__construct()

Used to create a new instance of a PartitionDocumentBundle

public __construct(string $folder[, array<string|int, mixed> $format = self::DEFAULT_PARAMETERS["FORMAT"] ][, int $max_items_per_file = self::MAX_ITEMS_PER_FILE ][, int $partition_size_threshold = self::PARTITION_SIZE_THRESHOLD ][, object $record_compressor_type = self::DEFAULT_COMPRESSOR ][, object $blob_compressor_type = self::DEFAULT_COMPRESSOR ]) : mixed
Parameters
$folder : string

the path to the folder to store this PartitionDocumentBundle

$format : array<string|int, mixed> = self::DEFAULT_PARAMETERS["FORMAT"]

the column names, keys and types for this PartitionDocumentBundle object

$max_items_per_file : int = self::MAX_ITEMS_PER_FILE

maximum number of items to store in a partition before making the next partition

$partition_size_threshold : int = self::PARTITION_SIZE_THRESHOLD

maximum length of a partition file in bytes before a new partition file should be started

$record_compressor_type : object = self::DEFAULT_COMPRESSOR

seekquarry\yioop\library\compressors\Compressor object used to compress record files excluding blob columns.

$blob_compressor_type : object = self::DEFAULT_COMPRESSOR

seekquarry\yioop\library\compressors\Compressor object used to compress blob columns.

Return values
mixed

addCount()

Add $num to maintained counter $field

public addCount(int $num[, string $field = "COUNT" ]) : mixed
Parameters
$num : int

number of items to add to current count

$field : string = "COUNT"

field of the info struct whose count is to be increased

Return values
mixed

advanceSavePartition()

Saves the current save partition, adds one to the save partition number, and starts a new save partition.

public advanceSavePartition(int $new_save_partition) : mixed
Parameters
$new_save_partition : int

the partition to save and add one to. If the default is used, then this method will use the "SAVE_PARTITION" value from the parameters.

Return values
mixed

get()

Returns $fields columns from the record associated with $key in the $partition partition of this PartitionDocumentBundle if it exists.

public get(string $key, int $partition[, array<string|int, mixed> $fields = [] ]) : array<string|int, mixed>|false

If $fields is empty, all columns are returned.

Parameters
$key : string

to look up in partition

$partition : int

to look for record in

$fields : array<string|int, mixed> = []

names of fields in this PartitionDocumentBundle to return

Return values
array<string|int, mixed>|false

unpacked record on success, otherwise false
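The $fields behaviour described above (an empty list means every column) can be sketched on an already unpacked record as follows. The helper selectFields is hypothetical and works on a plain associative array; how the real get() handles unknown field names is not stated here, so this sketch simply ignores them.

```php
<?php
/**
 * Hypothetical helper (not part of Yioop): keeps only the requested
 * columns of an unpacked record; an empty $fields list keeps all columns.
 */
function selectFields(array $record, array $fields = []): array
{
    if (empty($fields)) {
        return $record;
    }
    // array_flip turns the list of names into keys for the intersection.
    return array_intersect_key($record, array_flip($fields));
}

$record = ["KEY" => "doc1", "VALUE" => "some blob data"];
$only_value = selectFields($record, ["VALUE"]);
```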

getArchive()

Retrieves a BLOB string in the file $archive_filename at byte position $offset of length $len. It uncompresses this string using $compressor->uncompress and returns the result.

public getArchive(string $archive_filename, int $offset, int $len) : string
Parameters
$archive_filename : string

the filename of a partition archive file to get a blob object from

$offset : int

a byte position in that file

$len : int

number of bytes from $offset to read.

Return values
string

the result of uncompressing the string at $offset of length $len
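The read path described for getArchive() amounts to seeking to a byte offset, reading a fixed number of bytes, and uncompressing them. Below is a minimal standalone sketch of that pattern using only PHP's file functions. The function readBlob and its $uncompress callable are assumptions for this example; the real method uses a Compressor object and, judging by the $get_archive_cache property, keeps file handles open between calls, which this sketch does not do.

```php
<?php
/**
 * Hypothetical stand-in for the read step (not Yioop's code): reads $len
 * bytes at $offset from $archive_filename and passes them to $uncompress.
 * Returns false if the file cannot be opened or read.
 */
function readBlob(string $archive_filename, int $offset, int $len,
    callable $uncompress)
{
    if (!is_readable($archive_filename)) {
        return false;
    }
    $fh = fopen($archive_filename, "rb");
    if ($fh === false) {
        return false;
    }
    $data = false;
    if (fseek($fh, $offset) === 0) {
        // fread() rejects a length of 0 in PHP 8, so handle it directly.
        $data = ($len > 0) ? fread($fh, $len) : "";
    }
    fclose($fh);
    return ($data === false) ? false : $uncompress($data);
}

// Usage with an identity "uncompress", mirroring a no-op compressor.
$path = tempnam(sys_get_temp_dir(), "pdb");
file_put_contents($path, "HEADERhello blobTRAILER");
$blob = readBlob($path, 6, 10, function ($s) { return $s; });
unlink($path);
```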

getParameterInfo()

Returns the parameters (such as its signature, max number of documents per partition and counts) used to configure the PartitionDocumentBundle stored at $folder

public static getParameterInfo(string $folder) : array<string|int, mixed>
Parameters
$folder : string

file path to a stored PartitionDocumentBundle

Return values
array<string|int, mixed>

configuration info about the PartitionDocumentBundle

getPartition()

Returns the path to the archive file (used to store BLOB and SERIAL columns) for the $i partition in this PartitionDocumentBundle

public getPartition(int $i) : string
Parameters
$i : int

partition to get the archive file name for

Return values
string

path of $i partition archive file

getPartitionIndex()

Returns the path to the index file (used to store all columns of a partition record except blob and serial columns) for the $i partition in this PartitionDocumentBundle

public getPartitionIndex(int $i) : string
Parameters
$i : int

partition to get the index file name for

Return values
string

path of $i partition index file
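Given the PARTITION_PREFIX and INDEX_EXTENSION constants, an index file path plausibly has the form folder/partition_N.ix. The sketch below builds such a path; the function partitionIndexPath and the exact way the pieces are joined are assumptions, since this page documents only the constants and not the concatenation. No archive file path is sketched because its extension is not documented here.

```php
<?php
// Values copied from the constants documented on this page.
const PARTITION_PREFIX = "partition_";
const INDEX_EXTENSION = ".ix";

/**
 * Hypothetical helper (not Yioop's code): builds the assumed index file
 * path for partition $i of a bundle stored in $folder.
 */
function partitionIndexPath(string $folder, int $i): string
{
    return rtrim($folder, "/") . "/" . PARTITION_PREFIX . $i .
        INDEX_EXTENSION;
}

$index_path = partitionIndexPath("/tmp/my_bundle", 3);
```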

initCountIfNotExists()

Creates a new counter $field to be maintained

public initCountIfNotExists([string $field = "COUNT" ]) : mixed
Parameters
$field : string = "COUNT"

field of info struct to add a counter for

Return values
mixed

loadPartitionIndex()

Returns the unserialized index file for the $partition partition of this PartitionDocumentBundle. If $force_load is set to true then reloads from disk rather than use a cached value if present.

public loadPartitionIndex(int $partition[, bool $force_load = false ][, int $mode = PackedTableTools::REPLACE_MODE ]) : mixed
Parameters
$partition : int

which partition index to read

$force_load : bool = false

whether to reload the index from disk or to use a cached value if present

$mode : int = PackedTableTools::REPLACE_MODE

PackedTableTools mode to use when reading in partition

Return values
mixed

either a string if $mode is AS_STRING_MODE, or an array of $key => packed record pairs where records are packed according to this PartitionDocumentBundle's signature
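The interaction between $force_load and a size-bounded cache such as $index_cache can be sketched as follows. The class IndexCacheSketch, its loader callable, and the oldest-first eviction policy are all assumptions for illustration; the documentation above only states that a cached value is used unless $force_load is true and that the cache has a maximum size.

```php
<?php
/**
 * Hypothetical bounded cache (not Yioop's code) showing one way a
 * force-reload flag and a maximum cache size could interact.
 */
class IndexCacheSketch
{
    public $index_cache = [];
    public $index_cache_size;
    public $loads = 0; // counts simulated disk reads
    private $loader;

    public function __construct(callable $loader, int $index_cache_size = 2)
    {
        $this->loader = $loader;
        $this->index_cache_size = $index_cache_size;
    }

    public function load(int $partition, bool $force_load = false)
    {
        if (!$force_load && isset($this->index_cache[$partition])) {
            return $this->index_cache[$partition];
        }
        $index = ($this->loader)($partition);
        $this->loads++;
        unset($this->index_cache[$partition]);
        if (count($this->index_cache) >= $this->index_cache_size) {
            // Assumed policy: evict the entry that was inserted first.
            reset($this->index_cache);
            unset($this->index_cache[key($this->index_cache)]);
        }
        $this->index_cache[$partition] = $index;
        return $index;
    }
}
```

A second call for the same partition is served from memory, while passing true for $force_load triggers another call to the loader.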

put()

Used to add new records to the PartitionDocumentBundle

public put(array<string|int, mixed> $row_or_rows) : bool
Parameters
$row_or_rows : array<string|int, mixed>

either a single record, with fields given by this PartitionDocumentBundle's signature, or an array of such records.

Return values
bool

success or not
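Because put() accepts either one record or an array of records, an implementation needs some way to tell the two apart. The sketch below shows one possible test, based on whether the array has consecutive integer keys; the function normalizeRows and this detection rule are assumptions and not necessarily what put() does internally.

```php
<?php
/**
 * Hypothetical helper (not Yioop's code): returns a list of rows whether
 * given a single associative record or a list of records.
 * Assumption: a single record is keyed by column names, while a list of
 * rows has the consecutive integer keys 0, 1, 2, ...
 */
function normalizeRows(array $row_or_rows): array
{
    if ($row_or_rows === []) {
        return [];
    }
    $is_list = array_keys($row_or_rows) ===
        range(0, count($row_or_rows) - 1);
    return $is_list ? $row_or_rows : [$row_or_rows];
}

$rows = normalizeRows(["KEY" => "doc1", "VALUE" => "blob data"]);
```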

saveParameters()

Save the operating parameters of this PartitionDocumentBundle

public saveParameters() : mixed
Return values
mixed

addArchive()

Used to add a blob item to the current save partition file.

protected addArchive(string $value) : array<string|int, mixed>
Parameters
$value : string

blob item to be added to file

Return values
array<string|int, mixed>

[offset into save partition, length stored, partition number of current save partition]
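The return value described above can be illustrated with a small append-only writer that reports where each blob landed. The function appendBlob, its $compress callable, and the use of file_put_contents with FILE_APPEND are assumptions for this sketch; the real method is protected, writes to the current save partition, and may manage file handles differently (see $add_archive_cache).

```php
<?php
/**
 * Hypothetical writer (not Yioop's code): appends the compressed $value to
 * $archive_filename and returns [offset, stored length, partition number],
 * or false if the write fails.
 */
function appendBlob(string $archive_filename, string $value, int $partition,
    callable $compress)
{
    $data = $compress($value);
    // filesize() results are cached by PHP, so clear the cache first.
    clearstatcache(true, $archive_filename);
    $offset = file_exists($archive_filename) ?
        filesize($archive_filename) : 0;
    if (file_put_contents($archive_filename, $data, FILE_APPEND) === false) {
        return false;
    }
    return [$offset, strlen($data), $partition];
}

// Usage with an identity "compress", mirroring a no-op compressor.
$path = tempnam(sys_get_temp_dir(), "pdb");
$first = appendBlob($path, "hello", 0, function ($s) { return $s; });
$second = appendBlob($path, "abc", 0, function ($s) { return $s; });
unlink($path);
```

The offset and length returned for each blob are the values a record would need to store, directly or as differences, to find the blob again later.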


        
