BZip2BlockIterator
This class iterates through a bzip2 file one block at a time.
Its main advantage over the built-in bzip2 support is that it can "remember" where it left off between serializations, so it can continue from the same position across web invocations. This is used when doing archive crawls of wiki dumps, allowing the name server to pick up where it left off.
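The resume-after-serialization idea can be illustrated with a minimal sketch. ResumableReader below is a hypothetical stand-in, not the real BZip2BlockIterator: it only assumes that the file path and byte offset are serialized and that __wakeup() reopens the file and seeks back to that offset. The actual class may restore more state (for example its buffer and left-over bits).

```php
<?php
// Hypothetical stand-in for illustration only; not the real class.
class ResumableReader
{
    public $path;
    public $file_offset = 0;
    public $fd = null;

    public function __construct(string $path)
    {
        $this->path = $path;
        $this->fd = fopen($path, 'rb');
    }

    // Serialize only plain data; a file handle cannot be serialized.
    public function __sleep()
    {
        return ['path', 'file_offset'];
    }

    // Called by unserialize(): reopen the file and seek to the saved offset.
    public function __wakeup()
    {
        $this->fd = fopen($this->path, 'rb');
        fseek($this->fd, $this->file_offset);
    }

    // Read some bytes and record how far into the file we are.
    public function read(int $length): string
    {
        $bytes = fread($this->fd, $length);
        $this->file_offset = ftell($this->fd);
        return $bytes;
    }
}

// Usage: read, serialize (as if between two web requests), then resume.
$path = tempnam(sys_get_temp_dir(), 'rr');
file_put_contents($path, 'abcdefghij');

$reader = new ResumableReader($path);
$first = $reader->read(4);          // "abcd"
$saved = serialize($reader);
fclose($reader->fd);

$resumed = unserialize($saved);     // __wakeup() reopens and seeks to offset 4
$second = $resumed->read(3);        // "efg"
fclose($resumed->fd);
unlink($path);
```

The same pattern is why a serialized iterator can be stored between requests and later continue from the stored offset rather than from the start of the dump.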
Table of Contents
- BLOCK_ENDMARK = "\x17rE8P\x90"
- String at the end of each bz2 block
- BLOCK_HEADER = "1AY&SY"
- String at the start of each bz2 block
- BLOCK_LEADER_RE = ' / \\x41\\x59\\x26\\x53\\x59 | \\xa0\\xac\\x93\\x29\\xac | \\x50\\x56\\x49\\x94\\xd6 |\\x28\\x2b\\x24\\xca\\x6b | \\x14\\x15\\x92\\x65\\x35 | \\x8a\\x0a\\xc9\\x32\\x9a |\\xc5\\x05\\x64\\x99\\x4d | \\x62\\x82\\xb2\\x4c\\xa6 |\\x72\\x45\\x38\\x50\\x90 | \\xb9\\x22\\x9c\\x28\\x48 | \\xdc\\x91\\x4e\\x14\\x24 |\\xee\\x48\\xa7\\x0a\\x12 | \\x77\\x24\\x53\\x85\\x09 | \\xbb\\x92\\x29\\xc2\\x84 |\\x5d\\xc9\\x14\\xe1\\x42 | \\x2e\\xe4\\x8a\\x70\\xa1 /x'
- Blocks are NOT byte-aligned, so the block header (and endmark) may show up shifted right by 0-7 bits in various places throughout the file. This regular expression matches any of the eight possible shifts for both the block header and the block endmark.
- BLOCK_SIZE = 8192
- How many bytes to read into buffer from bz2 stream in one go
- MAGIC = 'BZh'
- String to tell if file is a bz2 file
- $bits : int
- Stores the left over bits of a bz2 block
- $block : string
- Used to build and store a bz2 block from the file stream
- $buffer : string
- Since block sizes are not constant, this buffer stores enough bytes to properly extract the next blocks
- $fd : resource
- File handle for bz2 file
- $file_offset : int
- Byte offset into bz2 file
- $header : string
- $header_info : array<string|int, mixed>
- Lookup table for the number of bits by which the magic number for the next block has been shifted right. The second component of each sub-array says whether the match is a block header (true) or an endmark (false)
- $num_extra_bits : int
- Store how many left-over bits there are
- $path : string
- __construct() : mixed
- Creates a new iterator of a bz2 file by opening the file, doing a sanity check and then setting up the initial file_offset to where the data starts
- __wakeup() : mixed
- Called by unserialize prior to execution
- close() : bool
- Used to close the file associated with this iterator
- eof() : bool
- Checks whether the current Bzip2 file has reached an end of file
- nextBlock() : mixed
- Extracts the next bz2 block from the bzip2 file this iterator works on
- packLeft() : mixed
- Computes the new bzip2 block portion and the bits left over after adding $bytes to the passed $block.
Constants
BLOCK_ENDMARK
String at the end of each bz2 block
public
mixed
BLOCK_ENDMARK
= "\x17rE8P\x90"
BLOCK_HEADER
String at the start of each bz2 block
public
mixed
BLOCK_HEADER
= "1AY&SY"
BLOCK_LEADER_RE
Blocks are NOT byte-aligned, so the block header (and endmark) may show up shifted right by 0-7 bits in various places throughout the file. This regular expression matches any of the eight possible shifts for both the block header and the block endmark.
public
mixed
BLOCK_LEADER_RE
= '
/
\\x41\\x59\\x26\\x53\\x59 | \\xa0\\xac\\x93\\x29\\xac | \\x50\\x56\\x49\\x94\\xd6
|\\x28\\x2b\\x24\\xca\\x6b | \\x14\\x15\\x92\\x65\\x35 | \\x8a\\x0a\\xc9\\x32\\x9a
|\\xc5\\x05\\x64\\x99\\x4d | \\x62\\x82\\xb2\\x4c\\xa6
|\\x72\\x45\\x38\\x50\\x90 | \\xb9\\x22\\x9c\\x28\\x48 | \\xdc\\x91\\x4e\\x14\\x24
|\\xee\\x48\\xa7\\x0a\\x12 | \\x77\\x24\\x53\\x85\\x09 | \\xbb\\x92\\x29\\xc2\\x84
|\\x5d\\xc9\\x14\\xe1\\x42 | \\x2e\\xe4\\x8a\\x70\\xa1
/x'
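The alternatives in this pattern can be derived from BLOCK_HEADER and BLOCK_ENDMARK. The sketch below uses a hypothetical helper, shiftedLeaders(), which is not part of the class. It shifts each 6-byte magic number right by 0 to 7 bits and keeps the five bytes after the first one, since the first byte of a shifted magic number is mixed with bits from whatever precedes it.

```php
<?php
// Hypothetical helper for illustration; not a method of BZip2BlockIterator.
// Returns, for each shift 0..7, the five fully determined bytes of the
// magic number after shifting it right by that many bits.
function shiftedLeaders(string $magic): array
{
    $leaders = [];
    for ($shift = 0; $shift < 8; $shift++) {
        $leader = '';
        // Skip byte 0: after a shift, its top bits come from preceding data.
        for ($i = 1; $i < strlen($magic); $i++) {
            // Combine two neighbouring bytes, shift, keep the low byte.
            $pair = (ord($magic[$i - 1]) << 8) | ord($magic[$i]);
            $leader .= chr(($pair >> $shift) & 0xFF);
        }
        $leaders[$shift] = $leader;
    }
    return $leaders;
}

$header_leaders = shiftedLeaders("1AY&SY");
$endmark_leaders = shiftedLeaders("\x17rE8P\x90");

echo bin2hex($header_leaders[1]), "\n";   // a0ac9329ac
echo bin2hex($endmark_leaders[7]), "\n";  // 2ee48a70a1
```

The two printed values correspond to the second and the last alternatives in the pattern above.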
BLOCK_SIZE
How many bytes to read into buffer from bz2 stream in one go
public
mixed
BLOCK_SIZE
= 8192
MAGIC
String to tell if file is a bz2 file
public
mixed
MAGIC
= 'BZh'
Properties
$bits
Stores the left over bits of a bz2 block
public
int
$bits
= 0
$block
Used to build and store a bz2 block from the file stream
public
string
$block
= ''
$buffer
Since block sizes are not constant, this buffer stores enough bytes to properly extract the next blocks
public
string
$buffer
= ''
$fd
File handle for bz2 file
public
resource
$fd
= null
$file_offset
Byte offset into bz2 file
public
int
$file_offset
= 0
$header
public
string
$header
$header_info
Lookup table for the number of bits by which the magic number for the next block has been shifted right. The second component of each sub-array says whether the match is a block header (true) or an endmark (false)
public
static array<string|int, mixed>
$header_info
= ["A" => [0, true], "\xa0" => [1, true], "P" => [2, true], "(" => [3, true], "\x14" => [4, true], "\x8a" => [5, true], "\xc5" => [6, true], "b" => [7, true], "r" => [0, false], "\xb9" => [1, false], "\xdc" => [2, false], "\xee" => [3, false], "w" => [4, false], "\xbb" => [5, false], "]" => [6, false], "." => [7, false]]
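As a rough illustration of how this table pairs with BLOCK_LEADER_RE, the sketch below looks up the first byte of a regex match to recover the shift and the marker type. The constants are local copies of the documented values, and classifyLeader() is a hypothetical function; the real nextBlock() may use the table differently.

```php
<?php
// Local copies of the documented values, for a self-contained sketch.
const LEADER_RE = '/
 \\x41\\x59\\x26\\x53\\x59 | \\xa0\\xac\\x93\\x29\\xac | \\x50\\x56\\x49\\x94\\xd6
|\\x28\\x2b\\x24\\xca\\x6b | \\x14\\x15\\x92\\x65\\x35 | \\x8a\\x0a\\xc9\\x32\\x9a
|\\xc5\\x05\\x64\\x99\\x4d | \\x62\\x82\\xb2\\x4c\\xa6
|\\x72\\x45\\x38\\x50\\x90 | \\xb9\\x22\\x9c\\x28\\x48 | \\xdc\\x91\\x4e\\x14\\x24
|\\xee\\x48\\xa7\\x0a\\x12 | \\x77\\x24\\x53\\x85\\x09 | \\xbb\\x92\\x29\\xc2\\x84
|\\x5d\\xc9\\x14\\xe1\\x42 | \\x2e\\xe4\\x8a\\x70\\xa1
/x';

const HEADER_INFO = [
    "A" => [0, true], "\xa0" => [1, true], "P" => [2, true],
    "(" => [3, true], "\x14" => [4, true], "\x8a" => [5, true],
    "\xc5" => [6, true], "b" => [7, true],
    "r" => [0, false], "\xb9" => [1, false], "\xdc" => [2, false],
    "\xee" => [3, false], "w" => [4, false], "\xbb" => [5, false],
    "]" => [6, false], "." => [7, false],
];

// Hypothetical function: find the first leader in $buffer and classify it.
function classifyLeader(string $buffer): ?array
{
    if (!preg_match(LEADER_RE, $buffer, $matches, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    [$match, $offset] = $matches[0];
    // The first matched byte identifies both the shift and the marker type.
    [$shift, $is_header] = HEADER_INFO[$match[0]];
    return ['offset' => $offset, 'shift' => $shift, 'is_header' => $is_header];
}

$endmark_hit = classifyLeader("xx\xb9\x22\x9c\x28\x48yy");
$header_hit = classifyLeader("zzAY&SY");
$no_hit = classifyLeader("nothing");
```

Here $endmark_hit reports an endmark shifted right by one bit and $header_hit reports an unshifted block header. Note that the reported offset points at the second byte of the magic number, because the pattern omits its first byte.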
$num_extra_bits
Store how many left-over bits there are
public
int
$num_extra_bits
= 0
$path
public
string
$path
Methods
__construct()
Creates a new iterator of a bz2 file by opening the file, doing a sanity check and then setting up the initial file_offset to where the data starts
public
__construct(string $path) : mixed
Parameters
- $path : string
-
file path of bz2 file
Return values
mixed
__wakeup()
Called by unserialize prior to execution
public
__wakeup() : mixed
Return values
mixed
close()
Used to close the file associated with this iterator
public
close() : bool
Return values
bool —whether the file close was successful
eof()
Checks whether the current Bzip2 file has reached an end of file
public
eof() : bool
Return values
bool —eof or not
nextBlock()
Extracts the next bz2 block from the bzip2 file this iterator works on
public
nextBlock([bool $raw = false ]) : mixed
Parameters
- $raw : bool = false
-
if false then decompress the recovered block
Return values
mixed
packLeft()
Computes the new bzip2 block portion and the bits left over after adding $bytes to the passed $block.
public
packLeft(string &$block, int &$bits, string $bytes, int $num_extra_bits) : mixed
Parameters
- $block : string
-
the block to add to
- $bits : int
-
used to hold bits left over
- $bytes : string
-
what to add to the bzip block
- $num_extra_bits : int
-
how many extra bits there are