Yioop_V9.5_Source_Code_Documentation

BZip2BlockIterator
in package

This class is used to allow one to iterate through a Bzip2 file.

The main advantage of using this class over PHP's built-in bzip2 functions is that it can "remember" where it left off between serializations, so it can continue where it left off between web invocations. This is used in archive crawls of wiki dumps to let the name server pick up where it left off.
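A minimal sketch of the general resume-after-unserialize pattern, using only standard PHP serialization hooks. The class ResumableFileReader and its read() method are hypothetical and are not Yioop's code. The sketch also adds a __sleep() hook (not listed for BZip2BlockIterator) so that only the path and offset are serialized, because an open file handle cannot survive serialization; __wakeup() then reopens the file and seeks back to the saved offset.

```php
<?php
// Hypothetical class illustrating the resume pattern; not BZip2BlockIterator.
class ResumableFileReader
{
    public string $path;
    public int $file_offset = 0; // byte offset to resume from
    private $fd = null;          // stream resource; not serializable

    public function __construct(string $path)
    {
        $this->path = $path;
        $fd = fopen($path, 'rb');
        if ($fd === false) {
            throw new RuntimeException("Cannot open $path");
        }
        $this->fd = $fd;
    }

    /** Reads up to $n bytes and advances the saved offset. */
    public function read(int $n): string
    {
        $data = fread($this->fd, $n);
        if ($data === false) {
            $data = '';
        }
        $this->file_offset += strlen($data);
        return $data;
    }

    public function close(): bool
    {
        return fclose($this->fd);
    }

    // Persist only the path and offset, never the open handle.
    public function __sleep(): array
    {
        return ['path', 'file_offset'];
    }

    // unserialize() calls this after restoring path and file_offset,
    // so the handle is reopened and moved back to where reading stopped.
    public function __wakeup(): void
    {
        $this->fd = fopen($this->path, 'rb');
        fseek($this->fd, $this->file_offset);
    }
}
```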

Tags
author

Shawn Tice, (some docs added by Chris Pollett chris@pollett.org)

Table of Contents

BLOCK_ENDMARK  = "\x17rE8P\x90"
String that follows the last block of a bz2 stream (the end-of-stream marker)
BLOCK_HEADER  = "1AY&SY"
String at the start of each bz2 block
BLOCK_LEADER_RE  = ' / \\x41\\x59\\x26\\x53\\x59 | \\xa0\\xac\\x93\\x29\\xac | \\x50\\x56\\x49\\x94\\xd6 |\\x28\\x2b\\x24\\xca\\x6b | \\x14\\x15\\x92\\x65\\x35 | \\x8a\\x0a\\xc9\\x32\\x9a |\\xc5\\x05\\x64\\x99\\x4d | \\x62\\x82\\xb2\\x4c\\xa6 |\\x72\\x45\\x38\\x50\\x90 | \\xb9\\x22\\x9c\\x28\\x48 | \\xdc\\x91\\x4e\\x14\\x24 |\\xee\\x48\\xa7\\x0a\\x12 | \\x77\\x24\\x53\\x85\\x09 | \\xbb\\x92\\x29\\xc2\\x84 |\\x5d\\xc9\\x14\\xe1\\x42 | \\x2e\\xe4\\x8a\\x70\\xa1 /x'
Blocks are NOT byte-aligned, so the block header (and endmark) may show up shifted right by 0-7 bits in various places throughout the file. This regular expression matches any of the possible shifts for both the block header and the block endmark.
BLOCK_SIZE  = 8192
How many bytes to read into the buffer from the bz2 stream at a time
MAGIC  = 'BZh'
String used to tell whether a file is a bz2 file
$bits  : int
Stores the leftover bits of a bz2 block
$block  : string
Used to build and store a bz2 block from the file stream
$buffer  : string
Since block sizes are not constant, this stores enough bytes from the stream to properly extract the next blocks
$fd  : resource
File handle for bz2 file
$file_offset  : int
Byte offset into bz2 file
$header  : string
$header_info  : array<string|int, mixed>
Lookup table for the number of bits by which the magic number for the next block has been shifted right. The second component of each sub-array says whether the magic number is a block header (true) or an endmark (false)
$num_extra_bits  : int
Stores how many leftover bits there are
$path  : string
__construct()  : mixed
Creates a new iterator over a bz2 file by opening the file, doing a sanity check, and then setting the initial file_offset to where the data starts
__wakeup()  : mixed
Called by unserialize() when the object is restored, before the iterator is used again
close()  : bool
Used to close the file associated with this iterator
eof()  : bool
Checks whether the current Bzip2 file has reached its end
nextBlock()  : mixed
Extracts the next bz2 block from the bzip2 file this iterator works on
packLeft()  : mixed
Computes the new bzip2 block portion and the bits left over after adding $bytes to the passed $block.

Constants

BLOCK_ENDMARK

String that follows the last block of a bz2 stream (the end-of-stream marker)

public mixed BLOCK_ENDMARK = "\x17rE8P\x90"

BLOCK_HEADER

String at the start of each bz2 block

public mixed BLOCK_HEADER = "1AY&SY"

BLOCK_LEADER_RE

Blocks are NOT byte-aligned, so the block header (and endmark) may show up shifted right by 0-7 bits in various places throughout the file. This regular expression matches any of the possible shifts for both the block header and the block endmark.

public mixed BLOCK_LEADER_RE = ' / \\x41\\x59\\x26\\x53\\x59 | \\xa0\\xac\\x93\\x29\\xac | \\x50\\x56\\x49\\x94\\xd6 |\\x28\\x2b\\x24\\xca\\x6b | \\x14\\x15\\x92\\x65\\x35 | \\x8a\\x0a\\xc9\\x32\\x9a |\\xc5\\x05\\x64\\x99\\x4d | \\x62\\x82\\xb2\\x4c\\xa6 |\\x72\\x45\\x38\\x50\\x90 | \\xb9\\x22\\x9c\\x28\\x48 | \\xdc\\x91\\x4e\\x14\\x24 |\\xee\\x48\\xa7\\x0a\\x12 | \\x77\\x24\\x53\\x85\\x09 | \\xbb\\x92\\x29\\xc2\\x84 |\\x5d\\xc9\\x14\\xe1\\x42 | \\x2e\\xe4\\x8a\\x70\\xa1 /x'
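The sixteen alternatives can be derived mechanically. The sketch below uses hypothetical helper names and assumes a 64-bit PHP build, since the magic numbers are 48 bits wide. It shifts each magic number right by 0 to 7 bits and keeps bytes 1 to 5 of the resulting byte-aligned window, which consist only of magic-number bits whatever the shift. The first byte of each resulting pattern is the key used in $header_info.

```php
<?php
// Sketch only: helper names are hypothetical; requires 64-bit PHP integers.
const BZ_BLOCK_MAGIC = 0x314159265359; // "1AY&SY", start of each block
const BZ_END_MAGIC = 0x177245385090;   // "\x17rE8P\x90", end-of-stream marker

/**
 * Returns bytes 1-5 of the byte-aligned window holding $magic after it has
 * been shifted right by $shift bits (0 <= $shift <= 7). Those five bytes hold
 * only magic-number bits; byte 0 (and, for $shift > 0, byte 6) also holds
 * neighbouring stream bits, so it is left out of the pattern.
 */
function shiftedMagicBytes(int $magic, int $shift): string
{
    $value = ($magic >> $shift) & 0xFFFFFFFFFF; // keep the low 40 bits
    return substr(pack('J', $value), 3);        // 'J' = 64-bit big endian; keep last 5 bytes
}

/** Builds one alternation regex covering all 8 shifts of both magic numbers. */
function buildLeaderRegex(): string
{
    $alternatives = [];
    foreach ([BZ_BLOCK_MAGIC, BZ_END_MAGIC] as $magic) {
        for ($shift = 0; $shift < 8; $shift++) {
            $alternatives[] = preg_quote(shiftedMagicBytes($magic, $shift), '/');
        }
    }
    return '/' . implode('|', $alternatives) . '/';
}
```

For example, shiftedMagicBytes(BZ_BLOCK_MAGIC, 1) gives the bytes a0 ac 93 29 ac, the second alternative in BLOCK_LEADER_RE.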

BLOCK_SIZE

How many bytes to read into the buffer from the bz2 stream at a time

public mixed BLOCK_SIZE = 8192

Properties

$block

Used to build and store a bz2 block from the file stream

public string $block = ''

$buffer

Since block sizes are not constant, this stores enough bytes from the stream to properly extract the next blocks

public string $buffer = ''
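One reason such a buffer is needed is that a 5-byte leader can straddle two reads of BLOCK_SIZE bytes. The sketch below shows the usual remedy under that assumption; findMarkerOffsets() is hypothetical and takes already-read chunks instead of a file handle. After scanning, it keeps only the last four bytes, so a marker split across reads is still found and no match is reported twice.

```php
<?php
/**
 * Hypothetical sketch: finds every fixed-length marker matched by $regex in a
 * stream delivered as chunks, returning absolute byte offsets.
 */
function findMarkerOffsets(iterable $chunks, string $regex, int $markerLen = 5): array
{
    $buffer = '';
    $dropped = 0; // bytes already discarded from the front of the stream
    $offsets = [];
    foreach ($chunks as $chunk) {
        $buffer .= $chunk;
        $pos = 0;
        while (preg_match($regex, $buffer, $m, PREG_OFFSET_CAPTURE, $pos) === 1) {
            $offsets[] = $dropped + $m[0][1];
            $pos = $m[0][1] + 1;
        }
        // A complete marker cannot fit in the last ($markerLen - 1) bytes, so
        // keeping just those bytes never re-reports a match but still catches
        // a marker that continues in the next chunk.
        $keep = min(strlen($buffer), $markerLen - 1);
        $dropped += strlen($buffer) - $keep;
        $buffer = substr($buffer, strlen($buffer) - $keep);
    }
    return $offsets;
}
```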

$header_info

Lookup table for the number of bits by which the magic number for the next block has been shifted right. The second component of each sub-array says whether the magic number is a block header (true) or an endmark (false)

public static array<string|int, mixed> $header_info = ["A" => [0, true], "\xa0" => [1, true], "P" => [2, true], "(" => [3, true], "\x14" => [4, true], "\x8a" => [5, true], "\xc5" => [6, true], "b" => [7, true], "r" => [0, false], "\xb9" => [1, false], "\xdc" => [2, false], "\xee" => [3, false], "w" => [4, false], "\xbb" => [5, false], "]" => [6, false], "." => [7, false]]
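A minimal sketch of how a table shaped like $header_info can interpret a match of the leader regex, assuming the match is found with preg_match() and PREG_OFFSET_CAPTURE. The function describeLeader() and its returned keys are hypothetical. Because the five matched bytes begin one byte after the byte where the magic number starts, the magic's first bit sits $shift bits into the byte just before the match.

```php
<?php
// Same contents as BZip2BlockIterator::$header_info: first byte of the matched
// pattern => [right shift in bits, true for block header / false for endmark].
$headerInfo = ["A" => [0, true], "\xa0" => [1, true], "P" => [2, true],
    "(" => [3, true], "\x14" => [4, true], "\x8a" => [5, true],
    "\xc5" => [6, true], "b" => [7, true], "r" => [0, false],
    "\xb9" => [1, false], "\xdc" => [2, false], "\xee" => [3, false],
    "w" => [4, false], "\xbb" => [5, false], "]" => [6, false],
    "." => [7, false]];

// Same alternatives as BLOCK_LEADER_RE, written without whitespace.
$leaderRegex = '/\x41\x59\x26\x53\x59|\xa0\xac\x93\x29\xac|\x50\x56\x49\x94\xd6|'
    . '\x28\x2b\x24\xca\x6b|\x14\x15\x92\x65\x35|\x8a\x0a\xc9\x32\x9a|'
    . '\xc5\x05\x64\x99\x4d|\x62\x82\xb2\x4c\xa6|\x72\x45\x38\x50\x90|'
    . '\xb9\x22\x9c\x28\x48|\xdc\x91\x4e\x14\x24|\xee\x48\xa7\x0a\x12|'
    . '\x77\x24\x53\x85\x09|\xbb\x92\x29\xc2\x84|\x5d\xc9\x14\xe1\x42|'
    . '\x2e\xe4\x8a\x70\xa1/';

/** Returns where and how the first header or endmark in $data is shifted, or null. */
function describeLeader(string $data, string $regex, array $headerInfo): ?array
{
    if (preg_match($regex, $data, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return null;
    }
    [$bytes, $offset] = $m[0];
    [$shift, $isHeader] = $headerInfo[$bytes[0]];
    return [
        'offset' => $offset,      // byte offset of the 5 matched bytes
        'shift' => $shift,        // bits the magic number is shifted right
        'is_header' => $isHeader, // true: block header, false: endmark
        // the 48-bit magic begins $shift bits into the byte before the match
        'magic_start_bit' => ($offset - 1) * 8 + $shift,
    ];
}
```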

$num_extra_bits

Stores how many leftover bits there are

public int $num_extra_bits = 0

Methods

__construct()

Creates a new iterator over a bz2 file by opening the file, doing a sanity check, and then setting the initial file_offset to where the data starts

public __construct(string $path) : mixed
Parameters
$path : string

file path of bz2 file

Return values
mixed
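The sanity check can be sketched as follows, assuming it verifies the standard bzip2 stream header: the MAGIC string 'BZh' followed by a block-size digit '1' to '9'. The helper bz2DataOffset() is hypothetical. It returns 4 because in a well-formed file the first block header begins right after those four bytes, which is presumably where the initial file_offset points.

```php
<?php
/**
 * Hypothetical helper: checks the bzip2 stream header of $path and returns
 * the byte offset at which block data starts.
 */
function bz2DataOffset(string $path): int
{
    $fd = fopen($path, 'rb');
    if ($fd === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $head = fread($fd, 4);
    fclose($fd);
    // "BZh" (the MAGIC constant) plus a block-size digit '1'-'9' (100k-900k blocks)
    if (!is_string($head) || preg_match('/\ABZh[1-9]/', $head) !== 1) {
        throw new UnexpectedValueException("$path does not look like a bzip2 file");
    }
    return 4; // the first block header starts right after these four bytes
}
```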

__wakeup()

Called by unserialize() when the object is restored, before the iterator is used again

public __wakeup() : mixed
Return values
mixed

close()

Used to close the file associated with this iterator

public close() : bool
Return values
bool

whether the file close was successful

eof()

Checks whether the current Bzip2 file has reached its end

public eof() : bool
Return values
bool

whether the end of the file has been reached

nextBlock()

Extracts the next bz2 block from the bzip2 file this iterator works on

public nextBlock([bool $raw = false ]) : mixed
Parameters
$raw : bool = false

if false, decompress the recovered block; if true, return it still compressed

Return values
mixed

packLeft()

Computes the new bzip2 block portion and the bits left over after adding $bytes to the passed $block.

public packLeft(string &$block, int &$bits, string $bytes, int $num_extra_bits) : mixed
Parameters
$block : string

the block to add to

$bits : int

used to hold bits left over

$bytes : string

what to add to the bzip block

$num_extra_bits : int

how many extra bits there are

Return values
mixed
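Since the body of packLeft() is not shown here, the following only sketches the underlying technique: re-aligning a bit stream whose data starts $shift bits into the first byte. The function packLeftSketch() and its nullable $carry parameter are hypothetical and do not reproduce Yioop's code. Each output byte takes the low (8 - $shift) bits of one input byte and the top $shift bits of the next, so one partial byte is always held back in $carry.

```php
<?php
/**
 * Hypothetical streaming bit re-aligner in the spirit of packLeft().
 *
 * @param string   $block output; realigned bytes are appended here
 * @param int|null $carry high bits of the next output byte (null before any input)
 * @param string   $bytes next input bytes
 * @param int      $shift how many bits (0-7) the data is shifted right
 */
function packLeftSketch(string &$block, ?int &$carry, string $bytes, int $shift): void
{
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($carry !== null) {
            // $carry already holds the high (8 - $shift) bits of this output
            // byte; the top $shift bits of $b complete it.
            $block .= chr($carry | ($b >> (8 - $shift)));
        }
        // The low (8 - $shift) bits of $b start the next output byte.
        $carry = ($b << $shift) & 0xFF;
    }
}
```

For example, the bytes 0x98 0xA0 0x80 are the characters "1A" (0x31 0x41) preceded by one stray bit, so re-aligning them with a shift of 1 yields "1A", whether they arrive in one call or across several.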

        
