Utility.php
SeekQuarry/Yioop -- Open Source Pure PHP Search Engine, Crawler, and Indexer
Copyright (C) 2009 - 2023 Chris Pollett chris@pollett.org
LICENSE:
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
END LICENSE
A library of string, error reporting, log, hash, time, and conversion functions
Interfaces, Classes, Traits and Enums
- Mod9Constants
- Mini-class (so not in its own file) used to hold encode/decode info related to Mod9 encoding (a variant of Simplified-9 specific to Yioop).
Table of Contents
- addRegexDelimiters() : string
- Adds delimiters to a regex that may or may not have them
- preg_search() : mixed
- Searches for a PCRE pattern in a subject from a given offset; returns the position of the first match if found, -1 otherwise.
- preg_offset_replace() : string
- Replaces a pcre pattern with a replacement in $subject starting from some offset.
- parse_ini_with_fallback() : array<string|int, mixed>
- Yioop replacement for parse_ini_file($name, true) in case parse_ini_file is on the disable_functions list. Name has underscores to match original function. This function checks whether parse_ini_file is disabled or not. If not, it just calls parse_ini_file; otherwise, it simulates it enough so that configure.ini files used for string translations can be read.
- getIniAssignMatch() : mixed
- Auxiliary function called from parse_ini_with_fallback to extract from the $matches array produced by the former function's preg_match what kind of assignment occurred in the ini file being parsed.
- charCopy() : mixed
- Copies from $source string beginning at position $start, $length many bytes to destination string
- vByteEncode() : string
- Encodes an integer using variable byte coding.
- vByteDecode() : int
- Decodes from a string using variable byte coding an integer.
- appendUnary() : mixed
- Appends a number re-encoded in unary to the end of an input string starting at a given bit offset into the string. Here n in unary has bit representation n-1 0's followed by a 1.
- decodeUnary() : int
- Decodes a unary number from an input string at a given bit offset. Here n in unary has bit representation n-1 0's followed by a 1.
- appendBits() : string
- Appends $num_bits bits from the start of the binary rep of $number beginning at offset $start_bit_offset of $input string overwriting any bits present. If $num_bits == -1, then appends all of $number.
- decodeBits() : int
- Decodes $num_bits many bits from the $input string beginning at offset $start_bit_offset. As a result of this operation, $start_bit_offset is advanced by the number of bits that were able to be decoded.
- appendGamma() : string
- Appends gamma code of $number beginning at offset $start_bit_offset of $input string overwriting any bits present. $start_bit_offset is updated to bit position after append.
- decodeGammaList() : array<string|int, mixed>
- Decodes up to $num_decode gamma encoded integers beginning at $start_bit_offset. $start_bit_offset is updated to the bit position after the decoded integers.
- appendRiceSequence() : string
- Appends using a Rice coding a sequence of integers $int_sequence at offset $start_bit_offset to the string $output, overwriting any bits present at that location. $start_bit_offset is updated to bit position after append.
- decodeRiceSequence() : array<string|int, mixed>
- Decodes up to $num_decode rice encoded difference list of integers beginning at $start_bit_offset. $start_bit_offset is updated to the bit position after the decoded integers. If $delta_start >= 0 then the first int is assumed to be the difference from $delta_start;
- encodePositionList() : string
- Encodes a list of integer positions of a term in a document. This is done as a gamma code of the first integer followed by the Rice coding of the remaining integers using a modulus based on the average gap between integers. If the number of positions is 1 or 2 then a gamma of each position only is used.
- decodePositionList() : array<string|int, mixed>
- Decodes up to $num_decode term in document position integers from string $input under the assumption $input is encoded as per
- encode255() : string
- Recodes a string in a 1-1 fashion to a string not involving \xFF (255). I.e., it maps characters \xFE -> \xFE\xFD and \xFF -> \xFE\xFE
- decode255() : string
- Decodes a string in a 1-1 fashion from a string not involving \xFF (255). I.e., it maps characters \xFE\xFE -> \xFF and \xFE\xFD -> \xFE
- encodeUnderscore() : string
- Recodes a string in a 1-1 fashion to a string not involving underscore (_). I.e., it maps characters - -> -- and _ -> -=
- decodeUnderscore() : string
- Decodes a string in a 1-1 fashion from a string not involving underscore (_). I.e., it maps characters -= -> _ and -- -> -
- packEncode255() : string
- Encodes a list of strings as their @see encode255 versions separated by \xFF's
- unpackDecode255() : array<string|int, mixed>
- Decodes a list of strings from a string that was encoded as the @see encode255 versions of its elements separated by \xFF's
- packPosting() : string
- Makes a packed integer string from a docindex and the number of occurrences of a word in the document with that docindex.
- unpackPosting() : array<string|int, mixed>
- Given a packed integer string, uses the top three bytes to calculate a doc_index of a document in the shard, and uses the low order byte to compute a number of occurrences of a word in that document.
- addDocIndexPostings() : string
- This method is used while appending one index shard to another.
- deltaList() : array<string|int, mixed>
- Computes the difference of a list of integers.
- deDeltaList() : array<string|int, mixed>
- Given an array of differences of integers reconstructs the original list. This computes the inverse of the deltaList function
- encodeModified9() : string
- Encodes a sequence of integers x, such that 1 <= x <= 2<<28-1 as a string. NOTICE x>=1.
- packListModified9() : string
- Packs the contents of a single word of a sequence being encoded using Modified9.
- nextPostString() : string
- Returns the next complete posting string from $input_string beginning at $offset.
- decodeModified9() : array<string|int, mixed>
- Decodes a sequence of positive integers from a string that has been encoded using Modified 9
- unpackListModified9() : array<string|int, mixed>
- Decode a single word with high two bits off according to modified 9
- docIndexModified9() : int
- Given an int encoding a doc_index followed by a position list using Modified 9, extracts just the doc_index.
- unpackInt() : int
- Unpacks an int from a 4 char string
- packInt() : string
- Packs an int into a 4 char string
- unpackFloat() : float
- Unpacks a float from a 4 char string
- packFloat() : string
- Packs a float into a 4 char string
- renameSerializedObject() : string
- Used to change the namespace of a serialized php object (assumes doesn't have nested subobjects)
- getDomFromString() : DOMDocument
- Parses a provided string to make a DOM object. First tries to parse using XML and, if this fails, uses the more robust HTML DOM parser and manipulates the resulting DOM tree to make it correspond to the original tags for XML that isn't HTML
- getTags() : array<string|int, mixed>
- Returns an array of DOMDocuments for the nodes that match an xpath query on $dom, a DOMDocument
- toHexString() : string
- Converts a string to string where each char has been replaced by its hexadecimal equivalent
- toIntString() : string
- Converts a string to a string where each char has been replaced by its integer equivalent
- toBinString() : string
- Converts a string to string where each char has been replaced by its binary equivalent
- metricToInt() : int
- Converts a string consisting of an int followed by K, M, or G to the corresponding integer.
- intToMetric() : string
- Converts a number to a string followed by nothing, K, M, G, T depending on whether number is < 1000, < 10^6, < 10^9, or < 10^(12)
- crawlLog() : mixed
- Logs a message to a logfile or the screen. The super-global field $_SERVER['LOG_TO_FILES'] determines if this will log to a file. If not, then in cli mode, it will log to stdout; otherwise it will use error_log. When logging to a file, $_SERVER["NO_ROTATE_LOGS"] controls whether or not there will be a log file rotation. The first call to this method is typically used to set up a process to check for liveness. For example, a call: crawlLog("\n\nInitialize logger..", $this->process_name, true); says $this->process_name should be checked for liveness as part of any subsequent logging activity such as a call crawlLog("Another Message"); (note subsequent calls don't need to specify the process name).
- makeTimestamp() : string
- Used to make a log file entry time string of format: entry number, time in r format.
- crawlTimeoutLog() : bool
- Writes a log message $msg if more than LOG_TIMEOUT time has passed since the last time crawlTimeoutLog was called. Useful in loops to write a message as progress is made through the loop (but not on every iteration, but say every 30 seconds).
- crawlHash() : string
- Computes an 8 byte hash of a string for use in storing documents.
- crawlHashWord() : string
- Used to create a 20 byte hash of a string (typically a word or phrase with a wikipedia page). Format is 8 byte crawlHash of term (md5 of term two halves XOR'd), followed by a \x00, followed by the first 11 characters from the term. If there are not enough char's to make 20 bytes, then the string is padded with \x00s to 20bytes.
- canonicalTerm() : string
- Takes a $term that might have come from a document and converts it to a string of 16 bytes which is either the original term padded by underscores or the first seven chars of the term followed by an underscore followed by the base64 encoding of the first 6 chars of its md5 hash.
- compareWordHashes() : int
- Used to compare two ids for index dictionary lookup. Ids are an 8 byte crawlHash together with a 12 byte non-hash suffix.
- base64Hash() : string
- Converts a crawl hash number to something closer to base64 coded but so doesn't get confused in urls or DBs
- unbase64Hash() : string
- Decodes a crawl hash number from base64 to raw ASCII
- webencode() : string
- Encodes a string in a format suitable for post data (mainly, base64, but str_replace data that might mess up post in result)
- webdecode() : string
- Decodes a string encoded by webencode
- crawlCrypt() : string
- The crawlHash function is used to encrypt passwords stored in the database.
- partitionByHash() : array<string|int, mixed>
- Used by a controller to take a table and return those rows in the table that a given queue_server would be responsible for handling
- calculatePartition() : int
- Used by a controller to say which queue_server should receive a given input
- changeInMicrotime() : float
- Measures the change in time in seconds between two timestamps to microsecond precision
- microTimestamp() : string
- Timestamp of current epoch with microsecond precision useful for situations where time() might cause too many collisions (account creation, etc)
- checkTimeInterval() : int
- Checks that a timestamp is within the time interval given by a start time (HH:mm) and a duration
- convertPixels() : int
- Converts a CSS unit string into its equivalent in pixels. This is used by @see SvgProcessor.
- countFiles() : int
- Returns the number of files in a folder
- makePath() : bool
- Creates folders along a filesystem path if they don't exist
- deleteFileOrDir() : mixed
- This is a callback function used in the process of recursively deleting a directory
- setWorldPermissions() : mixed
- This is a callback function used in the process of recursively chmoding to 777 all files in a folder
- fileInfo() : an
- This is a callback function used in the process of recursively calculating an array of file modification times and files sizes for a directory
- orderCallback() : int
- Callback function used to sort documents by a field
- stringOrderCallback() : int
- Callback function used to sort documents by a field where the field is assumed to be a string
- stringROrderCallback() : int
- Callback function used to sort documents by a field where the field is assumed to be a string
- rorderCallback() : int
- Callback function used to sort documents by a field in reverse order
- lessThan() : int
- Callback to check if $a is less than $b
- greaterThan() : int
- Callback to check if $a is greater than $b
- e() : mixed
- shorthand for echo
- remoteAddress() : mixed
- Compute the real remote address of the incoming connection including forwarding
- readInput() : string
- Used to read a line of input from the command-line
- readPassword() : string
- Used to read a line of input from the command-line (on unix machines without echoing it)
- readMessage() : string
- Used to read several lines from the terminal up until a last line consisting of just a "."
- mimeType() : string
- Returns the mime type of the provided file name if it can be determined.
- generalIsA() : bool
- Checks if class_1 is the same as class_2 or has class_2 as a parent. Behaves like the 3 param version (last param true) of the PHP is_a function that came into being with Version 5.3.9.
- stripAttributes() : string
- Given the contents of a start XML/HTML tag, strips out all the attributes not listed in $safe_attribute_list
- parseCsv() : array<string|int, mixed>
- Used to parse into a two dimensional array a string that contains CSV data.
- arraytoCsv() : string
- Converts an array of values to a comma separated value formatted string.
- diff() : string
- Computes a Unix-style diff of two strings. That is it only outputs lines which disagree between the two strings. It outputs +line if a line occurs in the second but not first string and -line if a line occurs in the first string but not the second.
- computeLCS() : mixed
- Computes the longest common subsequence of two arrays
- extractLCSFromTable() : mixed
- Extracts from a table of longest common sequence moves (probably calculated by @see computeLCS) and a starting coordinate $i, $j in that table, a longest common subsequence
- tail() : array<string|int, mixed>
- Returns an array of the last $num_lines many lines out of a file
- lineFilter() : array<string|int, mixed>
- Given an array of lines returns a subarray of those lines containing the filter string or filter array
- logLineTimestamp() : int
- Tries to extract a timestamp from a line which is presumed to come from a Yioop log file
- isPositiveInteger() : bool
- Returns whether an input can be parsed to a positive integer
- measureCall() : mixed
- Used to measure the memory footprint in bytes and time spent calling a method of an object. It also records the number of times the method has been called.
- measureObject() : mixed
- Used to measure the memory footprint of an object in Yioop and save it to a statistics file. No recording is done until an initial call to the function measureCall(null, save_statistics_file) where save_statistics_file is the name of the file you want to store statistics to.
- measureObjectCall() : mixed
- General method called by @see measureCall and @see measureObject. Used to measure the memory footprint in bytes of an object or the memory and time spent calling a method of an object. It also records the number of times the method has been called. When used to call a method before initialization, just calls the method without any recording or timing. To initialize, an initial call to the function measureCall(null, save_statistics_file), where save_statistics_file is the name of the file you want to store statistics to, should be done.
- variableClone() : mixed
- Makes a deep copy of a variable regardless of its type
- garbageCollect() : int
- Runs various system garbage collection functions and returns number of bytes freed.
- utf8SafeSaveHtml() : string
- The DOM method saveHTML has a tendency to replace UTF-8, non-ASCII characters with HTML entities. This function is supposed to save while avoiding that replacement.
- utf8WordWrap() : string
- A UTF-8 safe version of PHP's wordwrap function that wraps a string to a given number of characters
Functions
addRegexDelimiters()
Adds delimiters to a regex that may or may not have them
addRegexDelimiters(string $expression) : string
Parameters
- $expression : string
-
a regex
Return values
string —regex with delimiters added if they were not already there
preg_search()
Searches for a PCRE pattern in a subject from a given offset; returns the position of the first match if found, -1 otherwise.
preg_search(string $pattern, string $subject, int $offset[, bool $return_match = false ]) : mixed
Parameters
- $pattern : string
-
a Perl compatible regular expression
- $subject : string
-
to search for pattern in
- $offset : int
-
character offset into $subject to begin searching from
- $return_match : bool = false
-
whether to return as well what the match was for the pattern
Return values
mixed —if $return_match is false then the integer position of first match, otherwise, it returns the ordered pair [$pos, $match].
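A minimal sketch of how such an offset search can be built on PHP's preg_match. The name sketchPregSearch is hypothetical and this is not Yioop's implementation; it assumes byte offsets (as preg_match uses) and assumes that a failed search with $return_match set yields [-1, ""], which the description above does not specify.

```php
<?php
// Hypothetical stand-in for preg_search(); illustration only.
function sketchPregSearch(string $pattern, string $subject, int $offset,
    bool $return_match = false)
{
    // PREG_OFFSET_CAPTURE makes preg_match report where each match starts
    $found = preg_match($pattern, $subject, $matches,
        PREG_OFFSET_CAPTURE, $offset);
    if ($found === 1) {
        [$match, $pos] = $matches[0];
    } else {
        // assumption: empty match text accompanies the -1 position
        [$match, $pos] = ["", -1];
    }
    return $return_match ? [$pos, $match] : $pos;
}
```

For example, sketchPregSearch('/o+/', 'foo boo', 3) skips the first "oo" and reports position 5.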
preg_offset_replace()
Replaces a pcre pattern with a replacement in $subject starting from some offset.
preg_offset_replace(string $pattern, string $replacement, string $subject, int $offset) : string
Parameters
- $pattern : string
-
a Perl compatible regular expression
- $replacement : string
-
what to replace the pattern with
- $subject : string
-
to search for pattern in
- $offset : int
-
character offset into $subject to begin searching from
Return values
string —result of the replacements
parse_ini_with_fallback()
Yioop replacement for parse_ini_file($name, true) in case parse_ini_file is on the disable_functions list. Name has underscores to match original function. This function checks whether parse_ini_file is disabled or not. If not, it just calls parse_ini_file; otherwise, it simulates it enough so that configure.ini files used for string translations can be read.
parse_ini_with_fallback(string $file) : array<string|int, mixed>
Parameters
- $file : string
-
filename of ini data to parse into an array
Return values
array<string|int, mixed> —data parsed from file
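The check-then-fallback idea described above might look roughly like the following. Both function names are hypothetical, and the fallback parser is a deliberately small sketch that only understands comments, [section] headers, and key = "value" lines; it is not a faithful copy of what Yioop or parse_ini_file does.

```php
<?php
// Hypothetical helper: is parse_ini_file usable on this installation?
function sketchIniParserAvailable(): bool
{
    $disabled = array_map('trim',
        explode(',', (string)ini_get('disable_functions')));
    return function_exists('parse_ini_file') &&
        !in_array('parse_ini_file', $disabled);
}
// Hypothetical fallback: parses ini text into [section => [key => value]]
function sketchParseIniString(string $ini): array
{
    $result = [];
    $section = null;
    foreach (preg_split('/\R/', $ini) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === ';') {
            continue; // skip blank lines and comments
        }
        if (preg_match('/^\[(.+)\]$/', $line, $m)) {
            $section = $m[1];
            $result[$section] = [];
        } elseif (preg_match('/^([^=]+?)\s*=\s*"?(.*?)"?$/', $line, $m)) {
            if ($section === null) {
                $result[$m[1]] = $m[2];
            } else {
                $result[$section][$m[1]] = $m[2];
            }
        }
    }
    return $result;
}
```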
getIniAssignMatch()
Auxiliary function called from parse_ini_with_fallback to extract from the $matches array produced by the former function's preg_match what kind of assignment occurred in the ini file being parsed.
getIniAssignMatch(string $matches) : mixed
Parameters
- $matches : string
-
produced by a preg_match in parse_ini_with_fallback
Return values
mixed —value of ini file assignment
charCopy()
Copies from $source string beginning at position $start, $length many bytes to destination string
charCopy(string $source, string &$destination, int $start, int $length[, string $timeout_msg = "" ]) : mixed
Parameters
- $source : string
-
string to copy from
- $destination : string
-
string to copy to
- $start : int
-
starting offset
- $length : int
-
number of bytes to copy
- $timeout_msg : string = ""
-
message to print if taking more than 30 seconds
Return values
mixed —
vByteEncode()
Encodes an integer using variable byte coding.
vByteEncode(int $pos_int) : string
Parameters
- $pos_int : int
-
integer to encode
Return values
string —a string of 1-5 chars depending on how big $pos_int was
vByteDecode()
Decodes from a string using variable byte coding an integer.
vByteDecode(string $str, int &$offset) : int
Parameters
- $str : string
-
string to use for decoding
- $offset : int
-
byte offset into string where var int is stored
Return values
int —the decoded integer
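To make the idea of variable byte coding concrete, here is a sketch of one common layout: seven payload bits per byte, least significant group first, with the high bit set on every byte except the last. The function names are hypothetical and the byte layout is an assumption; Yioop's own vByteEncode and vByteDecode may order or flag bytes differently.

```php
<?php
// Hypothetical variable byte encoder; layout is an assumption.
function sketchVByteEncode(int $n): string
{
    $out = "";
    while ($n >= 128) {
        $out .= chr(($n & 127) | 128); // high bit set: more bytes follow
        $n >>= 7;
    }
    return $out . chr($n); // last byte has its high bit clear
}
// Hypothetical decoder; advances $offset past the bytes it consumed.
function sketchVByteDecode(string $str, int &$offset): int
{
    $n = 0;
    $shift = 0;
    do {
        $byte = ord($str[$offset++]);
        $n |= ($byte & 127) << $shift;
        $shift += 7;
    } while ($byte & 128);
    return $n;
}
```

Under this layout a 32-bit value needs at most five bytes, which matches the 1-5 chars mentioned above.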
appendUnary()
Appends a number re-encoded in unary to the end of an input string starting at a given bit offset into the string. Here n in unary has bit representation n-1 0's followed by a 1.
appendUnary(int $number, mixed $input, mixed &$start_bit_offset[, mixed $just_bit_offset = false ]) : mixed
Parameters
- $number : int
-
number to append
- $input : mixed
- $start_bit_offset : mixed
- $just_bit_offset : mixed = false
Return values
mixed —either the resulting string or its length
decodeUnary()
Decodes a unary number from an input string at a given bit offset. Here n in unary has bit representation n-1 0's followed by a 1.
decodeUnary(string $input, int &$start_bit_offset) : int
Parameters
- $input : string
-
the string that we want to decode a unary number from
- $start_bit_offset : int
-
the starting bit offset in $input to start decoding from. After the call it will be the position after the decode
Return values
int —the decoded unary number
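The unary convention used here (n-1 zeros followed by a one) can be illustrated on strings of "0" and "1" characters. This is a readable stand-in with hypothetical function names; the real functions work on packed bits inside a byte string.

```php
<?php
// Hypothetical: unary code of $n (n >= 1) as a string of '0'/'1' chars.
function sketchUnaryBits(int $n): string
{
    return str_repeat("0", $n - 1) . "1";
}
// Hypothetical: reads one unary number and moves $offset just past it.
// Assumes a terminating '1' exists at or after $offset.
function sketchDecodeUnaryBits(string $bits, int &$offset): int
{
    $one_pos = strpos($bits, "1", $offset);
    $n = $one_pos - $offset + 1;
    $offset = $one_pos + 1;
    return $n;
}
```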
appendBits()
Appends $num_bits bits from the start of the binary rep of $number beginning at offset $start_bit_offset of $input string overwriting any bits present. If $num_bits == -1, then appends all of $number.
appendBits(int $number, string $input, int &$start_bit_offset[, $num_bits = -1 ]) : string
Parameters
- $number : int
-
to append
- $input : string
-
the string to append to.
- $start_bit_offset : int
-
starting location to begin append from
- $num_bits : = -1
-
number of bits of $number to append.
Return values
string —resulting string
decodeBits()
Decodes $num_bits many bits from the $input string beginning at offset $start_bit_offset. As a result of this operation, $start_bit_offset is advanced by the number of bits that were able to be decoded.
decodeBits(string $input, int &$start_bit_offset, int $num_bits) : int
Parameters
- $input : string
-
string to decode bits from
- $start_bit_offset : int
-
bit offset to start decoding from in $input
- $num_bits : int
-
number of bits to try to decode
Return values
int —the number decoded
appendGamma()
Appends gamma code of $number beginning at offset $start_bit_offset of $input string overwriting any bits present. $start_bit_offset is updated to bit position after append.
appendGamma(int $number, string $input, int &$start_bit_offset) : string
Parameters
- $number : int
-
to append
- $input : string
-
the string to append to.
- $start_bit_offset : int
-
starting bit location to begin append from
Return values
string —resulting string
decodeGammaList()
Decodes up to $num_decode gamma encoded integers beginning at $start_bit_offset. $start_bit_offset is updated to the bit position after the decoded integers.
decodeGammaList(string $input, int &$start_bit_offset, int $num_decode) : array<string|int, mixed>
Parameters
- $input : string
-
the string to decode from
- $start_bit_offset : int
-
starting bit location to decode from
- $num_decode : int
-
number of int's to decode
Return values
array<string|int, mixed> —decoded int's
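As an illustration of gamma coding, the sketch below uses the textbook Elias gamma code on strings of "0" and "1" characters: the bit length of the number in unary (zeros then a one), followed by the number's remaining low-order bits. The function names are hypothetical, and whether Yioop's appendGamma uses exactly this variant is an assumption.

```php
<?php
// Hypothetical: Elias gamma code of $x (x >= 1) as '0'/'1' characters.
function sketchGammaBits(int $x): string
{
    $binary = decbin($x);            // e.g. 5 -> "101"
    $k = strlen($binary);            // bit length, sent in unary
    return str_repeat("0", $k - 1) . "1" . substr($binary, 1);
}
// Hypothetical: decodes up to $num_decode gamma codes from $offset on.
function sketchDecodeGammaBits(string $bits, int &$offset,
    int $num_decode): array
{
    $out = [];
    while ($num_decode-- > 0 && $offset < strlen($bits)) {
        $one_pos = strpos($bits, "1", $offset);
        if ($one_pos === false) {
            break; // no further complete code
        }
        $k = $one_pos - $offset + 1;
        // the terminating 1 doubles as the number's leading bit
        $out[] = bindec("1" . substr($bits, $one_pos + 1, $k - 1));
        $offset = $one_pos + $k;
    }
    return $out;
}
```

As in the documented functions, the offset is passed by reference so a caller can keep decoding from where the previous call stopped.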
appendRiceSequence()
Appends using a Rice coding a sequence of integers $int_sequence at offset $start_bit_offset to the string $output, overwriting any bits present at that location. $start_bit_offset is updated to bit position after append.
appendRiceSequence(array<string|int, mixed> $int_sequence, int $modulus, string $output, int &$start_bit_offset[, int $delta_start = -1 ]) : string
Encoding is done as a difference list. If $delta_start is set to a value >= 0 then the first gap is assumed to be from int $delta_start
Parameters
- $int_sequence : array<string|int, mixed>
-
int's to append
- $modulus : int
-
i in the 2^i modulus to use for Rice code
- $output : string
-
the string to append to.
- $start_bit_offset : int
-
starting bit location to begin append from
- $delta_start : int = -1
-
if >= 0 previous int to use for difference list otherwise the first integer is encoded as itself rather than a difference
Return values
string —resulting string
decodeRiceSequence()
Decodes up to $num_decode rice encoded difference list of integers beginning at $start_bit_offset. $start_bit_offset is updated to the bit position after the decoded integers. If $delta_start >= 0 then the first int is assumed to be the difference from $delta_start;
decodeRiceSequence(string $input, int &$start_bit_offset, int $num_decode[, int $delta_start = -1 ]) : array<string|int, mixed>
Parameters
- $input : string
-
the string to decode from
- $start_bit_offset : int
-
starting bit location to decode from
- $num_decode : int
-
number of int's to decode
- $delta_start : int = -1
-
if >= 0 previous int to use for difference list otherwise the first integer is decoded as itself rather than a difference
Return values
array<string|int, mixed> —decoded int's
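The following sketch shows a textbook Rice coding of a difference list on strings of "0" and "1" characters: each gap is split into a quotient (sent in unary) and an $i bit remainder, where the modulus is 2^$i. All names are hypothetical, the decoder takes the modulus exponent explicitly, and details such as how Yioop offsets gaps before coding are assumptions not confirmed by the descriptions above.

```php
<?php
// Hypothetical Rice encoder over a nondecreasing integer sequence.
function sketchRiceBits(array $ints, int $i, int $delta_start = -1): string
{
    $bits = "";
    // with no $delta_start the first integer is coded as itself
    $prev = max($delta_start, 0);
    foreach ($ints as $x) {
        $gap = $x - $prev;
        $prev = $x;
        $bits .= str_repeat("0", $gap >> $i) . "1"; // quotient in unary
        if ($i > 0) { // remainder in exactly $i bits
            $bits .= str_pad(decbin($gap & ((1 << $i) - 1)), $i, "0",
                STR_PAD_LEFT);
        }
    }
    return $bits;
}
// Hypothetical decoder; $offset is advanced past the decoded codes.
function sketchDecodeRiceBits(string $bits, int $i, int &$offset,
    int $num_decode, int $delta_start = -1): array
{
    $out = [];
    $prev = max($delta_start, 0);
    while ($num_decode-- > 0 &&
        ($one_pos = strpos($bits, "1", $offset)) !== false) {
        $quotient = $one_pos - $offset;
        $remainder = ($i > 0) ?
            bindec(substr($bits, $one_pos + 1, $i)) : 0;
        $prev += ($quotient << $i) + $remainder;
        $out[] = $prev;
        $offset = $one_pos + 1 + $i;
    }
    return $out;
}
```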
encodePositionList()
Encodes a list of integer positions of a term in a document. This is done as a gamma code of the first integer followed by the Rice coding of the remaining integers using a modulus based on the average gap between integers. If the number of positions is 1 or 2 then a gamma of each position only is used.
encodePositionList(array<string|int, mixed> $positions) : string
Parameters
- $positions : array<string|int, mixed>
-
integer term positions
Return values
string —encoded position list
decodePositionList()
Decodes up to $num_decode term in document position integers from string $input under the assumption $input is encoded as per
decodePositionList(string $input, int $num_decode) : array<string|int, mixed>
Parameters
- $input : string
-
string to decode from
- $num_decode : int
-
number of integer to decode
Tags
Return values
array<string|int, mixed> —decoded positions
encode255()
Recodes a string in a 1-1 fashion to a string not involving \xFF (255). I.e., it maps characters \xFE -> \xFE\xFD and \xFF -> \xFE\xFE
encode255(string $str) : string
Parameters
- $str : string
-
to be encoded
Return values
string —encoded string without \xFF
decode255()
Decodes a string in a 1-1 fashion from a string not involving \xFF (255). I.e., it maps characters \xFE\xFE -> \xFF and \xFE\xFD -> \xFE
decode255(string $str) : string
Parameters
- $str : string
-
to be decoded
Return values
string —decoded string
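The escape mapping described for encode255 and decode255 can be sketched with PHP's strtr, which makes a single left-to-right pass and never rescans text it has already replaced. The function names are hypothetical; only the byte mapping comes from the descriptions above.

```php
<?php
// Hypothetical: removes every \xFF byte by escaping with \xFE.
function sketchEncode255(string $str): string
{
    return strtr($str, ["\xFE" => "\xFE\xFD", "\xFF" => "\xFE\xFE"]);
}
// Hypothetical: inverse of sketchEncode255.
function sketchDecode255(string $str): string
{
    return strtr($str, ["\xFE\xFE" => "\xFF", "\xFE\xFD" => "\xFE"]);
}
```

Because \xFE itself is escaped, the mapping stays 1-1 and decoding recovers the original bytes.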
encodeUnderscore()
Recodes a string in a 1-1 fashion to a string not involving underscore (_). I.e., it maps characters - -> -- and _ -> -=
encodeUnderscore(string $str) : string
Parameters
- $str : string
-
to be encoded
Return values
string —encoded string without _
decodeUnderscore()
Decodes a string in a 1-1 fashion from a string not involving underscore (_). I.e., it maps characters -= -> _ and -- -> -
decodeUnderscore(string $str) : string
Parameters
- $str : string
-
to be decoded
Return values
string —decoded string
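The underscore mapping follows the same escaping pattern, with "-" as the escape character. A sketch using strtr, under hypothetical function names:

```php
<?php
// Hypothetical: rewrites a string so that it contains no underscore.
function sketchEncodeUnderscore(string $str): string
{
    return strtr($str, ["-" => "--", "_" => "-="]);
}
// Hypothetical: inverse of sketchEncodeUnderscore.
function sketchDecodeUnderscore(string $str): string
{
    return strtr($str, ["-=" => "_", "--" => "-"]);
}
```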
packEncode255()
Encodes a list of strings as their @see encode255 versions separated by \xFF's
packEncode255(array<string|int, mixed> $strs) : string
Parameters
- $strs : array<string|int, mixed>
-
strings to encode as a single string
Return values
string —encoded list
unpackDecode255()
Decodes a list of strings from a string that was encoded as the @see encode255 versions of its elements separated by \xFF's
unpackDecode255(string $encoded_strs) : array<string|int, mixed>
Parameters
- $encoded_strs : string
-
string to decode into a list of strings
Return values
array<string|int, mixed> —decoded list
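Since an encode255 result never contains \xFF, that byte is free to act as a separator. A sketch of the pack and unpack pair under hypothetical names, with the escaping inlined via strtr so the example stands alone:

```php
<?php
// Hypothetical: joins escaped strings with \xFF separators.
function sketchPackEncode255(array $strs): string
{
    $escape = ["\xFE" => "\xFE\xFD", "\xFF" => "\xFE\xFE"];
    $encoded = [];
    foreach ($strs as $str) {
        $encoded[] = strtr($str, $escape);
    }
    return implode("\xFF", $encoded);
}
// Hypothetical: splits on \xFF and undoes the escaping of each piece.
function sketchUnpackDecode255(string $encoded_strs): array
{
    $unescape = ["\xFE\xFE" => "\xFF", "\xFE\xFD" => "\xFE"];
    $decoded = [];
    foreach (explode("\xFF", $encoded_strs) as $piece) {
        $decoded[] = strtr($piece, $unescape);
    }
    return $decoded;
}
```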
packPosting()
Makes a packed integer string from a docindex and the number of occurrences of a word in the document with that docindex.
packPosting(int $doc_index, array<string|int, mixed> $position_list[, bool $delta = true ]) : string
Parameters
- $doc_index : int
-
index (i.e., a count of which document it is rather than a byte offset) of a document in the document string
- $position_list : array<string|int, mixed>
-
integer positions word occurred in that doc
- $delta : bool = true
-
if true then stores the position_list as a sequence of differences (a delta list)
Return values
string —a modified9 (our compression scheme) packed string containing this info.
unpackPosting()
Given a packed integer string, uses the top three bytes to calculate a doc_index of a document in the shard, and uses the low order byte to compute a number of occurrences of a word in that document.
unpackPosting(string $posting, int &$offset[, bool $dedelta = true ]) : array<string|int, mixed>
Parameters
- $posting : string
-
a string containing a doc index position list pair encoded using modified9
- $offset : int
-
an offset into the string where the modified9 posting is encoded
- $dedelta : bool = true
-
if true then assumes the list is a sequence of differences (a delta list) and undoes the difference to get the original sequence
Return values
array<string|int, mixed> —consisting of integer doc_index and a subarray consisting of integer positions of word in doc.
addDocIndexPostings()
This method is used while appending one index shard to another.
addDocIndexPostings(string &$postings, int $add_offset) : string
Given a string of postings, adds $add_offset to each offset into the document map in each posting.
Parameters
- $postings : string
-
a string of index shard postings
- $add_offset : int
-
a fixed amount to add to each posting's doc map offset
Return values
string —$new_postings where each doc offset has had $add_offset added to it
deltaList()
Computes the difference of a list of integers.
deltaList(array<string|int, mixed> $list) : array<string|int, mixed>
i.e., (a1, a2, a3, a4) becomes (a1, a2-a1, a3-a2, a4-a3)
Parameters
- $list : array<string|int, mixed>
-
a nondecreasing list of integers
Return values
array<string|int, mixed> —the corresponding list of differences of adjacent integers
deDeltaList()
Given an array of differences of integers reconstructs the original list. This computes the inverse of the deltaList function
deDeltaList(array<string|int, mixed> &$delta_list) : array<string|int, mixed>
Parameters
- $delta_list : array<string|int, mixed>
-
a list of nonnegative integers
Return values
array<string|int, mixed> —a nondecreasing list of integers
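The relationship between deltaList and deDeltaList can be sketched as a pair of running-difference and running-sum loops. The function names below are hypothetical and the sketch takes its argument by value rather than by reference.

```php
<?php
// Hypothetical: (a1, a2, a3) becomes (a1, a2 - a1, a3 - a2).
function sketchDeltaList(array $list): array
{
    $deltas = [];
    $prev = 0;
    foreach ($list as $x) {
        $deltas[] = $x - $prev;
        $prev = $x;
    }
    return $deltas;
}
// Hypothetical: running sums undo the differences.
function sketchDeDeltaList(array $delta_list): array
{
    $list = [];
    $sum = 0;
    foreach ($delta_list as $delta) {
        $sum += $delta;
        $list[] = $sum;
    }
    return $list;
}
```

For a nondecreasing input such as [3, 7, 7, 12], the differences [3, 4, 0, 5] are small nonnegative numbers, which is what makes the later compression steps effective.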
encodeModified9()
Encodes a sequence of integers x, such that 1 <= x <= 2<<28-1 as a string. NOTICE x>=1.
encodeModified9(array<string|int, mixed> $list) : string
The encoded string is a sequence of 4 byte words (packed int's). The high order 2 bits of a given word indicate whether or not to look at the next word. The codes are as follows: 11 start of encoded string, 10 continue four more bytes, 01 end of encoded, and 00 indicates whole sequence encoded in one word.
After the high order 2 bits, the next most significant bits indicate the format of the current word. There are nine possibilities: 00 - 1 28 bit number, 01 - 2 14 bit numbers, 10 - 3 9 bit numbers, 1100 - 4 6 bit numbers, 1101 - 5 5 bit numbers, 1110 - 6 4 bit numbers, 11110 - 7 3 bit numbers, 111110 - 12 2 bit numbers, 111111 - 24 1 bit numbers.
Parameters
- $list : array<string|int, mixed>
-
a list of positive integers satisfying above
Return values
string —encoded string
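The word layout described above can be read back with a small header parser. The sketch below only classifies a 32-bit word: it extracts the two continue bits and matches the format prefix to find how many numbers the word holds and how wide each is. The function name is hypothetical, and it assumes the format prefix sits immediately below the continue bits, as the description reads; it does not extract the packed values themselves.

```php
<?php
// Hypothetical: returns [continue bits, count of numbers, bits per number]
function sketchModified9Layout(int $word): array
{
    $continue_bits = ($word >> 30) & 3;
    $rest = $word & 0x3FFFFFFF; // the 30 bits below the continue bits
    // each row: [prefix, prefix length, count, bits per number]
    $formats = [
        [0b00, 2, 1, 28], [0b01, 2, 2, 14], [0b10, 2, 3, 9],
        [0b1100, 4, 4, 6], [0b1101, 4, 5, 5], [0b1110, 4, 6, 4],
        [0b11110, 5, 7, 3], [0b111110, 6, 12, 2], [0b111111, 6, 24, 1],
    ];
    foreach ($formats as [$prefix, $len, $count, $bits]) {
        if (($rest >> (30 - $len)) === $prefix) {
            return [$continue_bits, $count, $bits];
        }
    }
    // unreachable: the nine prefixes cover every 30 bit pattern
    throw new \InvalidArgumentException("unrecognized format");
}
```

In every format the prefix length plus count times width is at most 30, so each layout fits in the bits left after the continue bits.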
packListModified9()
Packs the contents of a single word of a sequence being encoded using Modified9.
packListModified9(int $continue_bits, int $cnt, array<string|int, mixed> $pack_list) : string
Parameters
- $continue_bits : int
-
the high order 2 bits of the word
- $cnt : int
-
the number of elements that will be packed in this word
- $pack_list : array<string|int, mixed>
-
a list of positive integers to pack into word
Return values
string —encoded 4 byte string
nextPostString()
Returns the next complete posting string from $input_string beginning at $offset.
nextPostString(string &$input_string, int &$offset) : string
Does not do any decoding.
Parameters
- $input_string : string
-
a string of postings
- $offset : int
-
an offset to this string which will be updated after call
Return values
string —undecoded posting
decodeModified9()
Decodes a sequence of positive integers from a string that has been encoded using Modified 9
decodeModified9(string $input_string, int &$offset) : array<string|int, mixed>
Parameters
- $input_string : string
-
string to decode from
- $offset : int
-
where to string in the string, after decode points to where one was after decoding.
Return values
array<string|int, mixed> —sequence of positive integers that were decoded
unpackListModified9()
Decodes a single word, with its high two bits turned off, according to Modified 9
unpackListModified9(string $encoded_list) : array<string|int, mixed>
Parameters
- $encoded_list : string
-
four byte string to decode
Return values
array<string|int, mixed> —sequence of integers that results from the decoding.
docIndexModified9()
Given an int encoding a doc_index followed by a position list using Modified 9, extracts just the doc_index.
docIndexModified9(int $encoded_list) : int
Parameters
- $encoded_list : int
-
in the just described format
Return values
int —a doc index into an index shard document map.
unpackInt()
Unpacks an int from a 4 char string
unpackInt(string $str) : int
Parameters
- $str : string
-
where to extract int from
Return values
int —extracted integer
packInt()
Packs an int into a 4 char string
packInt(int $my_int) : string
Parameters
- $my_int : int
-
the integer to pack
Return values
string —the packed string
unpackFloat()
Unpacks a float from a 4 char string
unpackFloat(string $str) : float
Parameters
- $str : string
-
where to extract float from
Return values
float —extracted float
packFloat()
Packs a float into a four char string
packFloat(float $my_float) : string
Parameters
- $my_float : float
-
the float to pack
Return values
string —the packed string
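The four pack/unpack helpers above can be approximated with PHP's built-in pack() and unpack(). The sketch below assumes big-endian byte order ("N" for 32-bit ints, "G" for 32-bit floats, the latter requiring PHP 7.0.15 or later); the byte order Yioop actually uses is not stated in this documentation, and the sketch* names are hypothetical.

```php
<?php
// Assumption: big-endian, 4 bytes each. These are illustrative stand-ins,
// not the Utility.php implementations.
function sketchPackInt(int $my_int): string
{
    return pack("N", $my_int);      // unsigned 32-bit, big-endian
}
function sketchUnpackInt(string $str): int
{
    return unpack("N", $str)[1];    // unpack() results are indexed from 1
}
function sketchPackFloat(float $my_float): string
{
    return pack("G", $my_float);    // 32-bit float, big-endian
}
function sketchUnpackFloat(string $str): float
{
    return unpack("G", $str)[1];
}
```

A value such as 1.5 survives the float round trip exactly because it is representable in 32 bits; most other floats lose precision when packed into four bytes.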
renameSerializedObject()
Used to change the namespace of a serialized PHP object (assumes the object has no nested subobjects)
renameSerializedObject(string $class_name, string $object_string) : string
Parameters
- $class_name : string
-
new fully qualified name with namespace
- $object_string : string
-
serialized object
Return values
string —serialized object with new name
getDomFromString()
Parses a provided string to make a DOM object. First tries to parse using XML and, if this fails, uses the more robust HTML DOM parser and manipulates the resulting DOM tree to make it correspond to the original tags for XML that isn't HTML
getDomFromString(string $to_parse) : DOMDocument
Parameters
- $to_parse : string
-
the string to parse a DOMDocument from
Return values
DOMDocument —computed based on the provided string
getTags()
Returns an array of DOMDocuments for the nodes that match an xpath query on $dom, a DOMDocument
getTags(DOMDocument $dom, string $query) : array<string|int, mixed>
Parameters
- $dom : DOMDocument
-
document to run xpath query on
- $query : string
-
xpath query to run
Return values
array<string|int, mixed> —of DOMDocuments one for each node matching the xpath query in the original DOMDocument
toHexString()
Converts a string to string where each char has been replaced by its hexadecimal equivalent
toHexString(string $str) : string
Parameters
- $str : string
-
what we want rewritten in hex
Return values
string —the hexified string
toIntString()
Converts a string to a string where each char has been replaced by its integer equivalent
toIntString(string $str) : string
Parameters
- $str : string
-
what we want rewritten as integers
Return values
string —the string of integers
toBinString()
Converts a string to string where each char has been replaced by its binary equivalent
toBinString(string $str) : string
Parameters
- $str : string
-
what we want rewritten in binary
Return values
string —the binary string
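The three conversions above (toHexString, toIntString, toBinString) differ only in the base used for each character. The sketch below shows one way to do this; the space separator and zero padding are assumptions, and the sketch* names are hypothetical rather than the library's own.

```php
<?php
// Hypothetical shared helper: formats each byte of $str with a sprintf format.
function sketchFormatBytes(string $str, string $format): string
{
    if ($str === "") {
        return "";
    }
    $out = [];
    foreach (str_split($str) as $char) {
        $out[] = sprintf($format, ord($char));
    }
    return implode(" ", $out); // assumption: bytes separated by single spaces
}
function sketchToHexString(string $str): string
{
    return sketchFormatBytes($str, "%02x");
}
function sketchToIntString(string $str): string
{
    return sketchFormatBytes($str, "%d");
}
function sketchToBinString(string $str): string
{
    return sketchFormatBytes($str, "%08b");
}
```

With these definitions, "AB" becomes "41 42" in hex, "65 66" as integers, and "01000001 01000010" in binary.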
metricToInt()
Converts a string consisting of an int followed by K, M, or G into its integer equivalent.
metricToInt(string $metric_num) : int
For example, 4K would become 4000, 16M would become 16000000, and 1G would become 1000000000. Note that K, M, and G are powers of 10 here, not powers of 2.
Parameters
- $metric_num : string
-
metric number to convert
Return values
int —number the metric string corresponded to
intToMetric()
Converts a number to a string followed by nothing, K, M, G, or T, depending on whether the number is < 1000, < 10^6, < 10^9, < 10^12, or larger
intToMetric(int $num) : string
Parameters
- $num : int
-
number to convert
Return values
string —metric string the number corresponds to
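The two metric conversions can be sketched as follows. The names sketchMetricToInt and sketchIntToMetric are hypothetical, and the rounding in the int-to-metric direction (truncation toward zero) is an assumption, since the documentation does not say how fractional values are handled.

```php
<?php
// Illustrative only: base-10 multipliers, as stated in the description.
function sketchMetricToInt(string $metric_num): int
{
    $multipliers = ["K" => 1000, "M" => 1000000, "G" => 1000000000];
    $metric_num = trim($metric_num);
    $suffix = strtoupper(substr($metric_num, -1));
    if (isset($multipliers[$suffix])) {
        return intval(substr($metric_num, 0, -1)) * $multipliers[$suffix];
    }
    return intval($metric_num); // no suffix: plain integer
}

function sketchIntToMetric(int $num): string
{
    $suffixes = ["", "K", "M", "G", "T"];
    $index = 0;
    // Assumption: divide by 1000 and truncate until the value is below 1000
    while ($num >= 1000 && $index < count($suffixes) - 1) {
        $num = intdiv($num, 1000);
        $index++;
    }
    return $num . $suffixes[$index];
}
```

Under these assumptions the round trip is lossy: 1500 becomes "1K", which converts back to 1000.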
crawlLog()
Logs a message to a log file or the screen. The super-global field $_SERVER['LOG_TO_FILES'] determines whether this logs to a file. If it does not, then in CLI mode the message goes to stdout; otherwise error_log is used. When logging to a file, $_SERVER["NO_ROTATE_LOGS"] controls whether log files are rotated. The first call to this function is typically used to set up a process to be checked for liveness. For example, the call crawlLog("\n\nInitialize logger..", $this->process_name, true); says that $this->process_name should be checked for liveness as part of any subsequent logging activity, such as the call crawlLog("Another Message"); (subsequent calls do not need to specify the process name).
crawlLog(string $msg[, string $lname = null ][, bool $check_process_handler = false ]) : mixed
Parameters
- $msg : string
-
message to log. If empty then no message written
- $lname : string = null
-
name of the log file in the LOG_DIR directory. Rotated logs also use this as their basename, followed by a number, and are stored gzipped. (Older versions of Yioop used bzip. Some distros don't have bzip but do have gzip, and gzip was already used elsewhere in Yioop, so bzip was replaced to remove the dependency.)
- $check_process_handler : bool = false
-
false by default. Once a call has been made with this set to true, subsequent calls that leave it false will call processHandler to check how long the code has run since processHandler was last called.
Return values
mixed —
makeTimestamp()
Used to make a log file entry time string of format: entry number, time in r format.
makeTimestamp([int $time = -1 ]) : string
Parameters
- $time : int = -1
-
a unix timestamp
Return values
string —[line_count_in_log r_formatted_date]
crawlTimeoutLog()
Writes a log message $msg if more than LOG_TIMEOUT time has passed since the last time crawlTimeoutLog was called. Useful in loops to write a message as progress is made through the loop (not on every iteration but, say, every 30 seconds).
crawlTimeoutLog(mixed $msg) : bool
Parameters
- $msg : mixed
-
usually a string with what to be printed out after the timeout period. If $msg === true then clears the timeout cache
Return values
bool —whether a log message was written
crawlHash()
Computes an 8 byte hash of a string for use in storing documents.
crawlHash(string $string[, bool $raw = false ]) : string
An eight byte hash was chosen so that the odds of collision even for a few billion documents via the birthday problem are still reasonable. If the raw flag is set to false then an 11 byte base64 encoding of the 8 byte hash is returned. The hash is calculated as the xor of the two halves of the 16 byte md5 of the string. (8 bytes takes less storage which is useful for keeping more doc info in memory)
Parameters
- $string : string
-
the string to hash
- $raw : bool = false
-
whether to leave raw or base 64 encode
Return values
string —the hash of $string
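The hash construction described above (XOR of the two halves of the 16 byte md5, optionally base64 encoded to 11 characters) can be sketched as follows. The function name sketchCrawlHash is hypothetical, and the URL-safe character substitutions are an assumption about how the base64 form is made safe.

```php
<?php
// Illustrative sketch of the described construction, not the library code.
function sketchCrawlHash(string $string, bool $raw = false): string
{
    $md5 = md5($string, true);                        // 16 raw bytes
    $hash = substr($md5, 0, 8) ^ substr($md5, 8, 8);  // bytewise XOR, 8 bytes
    if ($raw) {
        return $hash;
    }
    // 8 bytes encode to 12 base64 chars ending in one "="; dropping the
    // padding gives 11. Assumption: "+" and "/" are swapped for "-" and "_".
    return rtrim(strtr(base64_encode($hash), "+/", "-_"), "=");
}
```

The 11 character length follows directly from base64 arithmetic: 64 bits need eleven 6-bit groups.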
crawlHashWord()
Used to create a 20 byte hash of a string (typically a word or phrase with a wikipedia page). The format is the 8 byte crawlHash of the term (the two halves of the term's md5 XOR'd), followed by a \x00, followed by the first 11 characters of the term. If there are not enough characters to make 20 bytes, the string is padded with \x00s to 20 bytes.
crawlHashWord(string $string[, bool $raw = false ]) : string
Parameters
- $string : string
-
word to hash
- $raw : bool = false
-
whether to base64Hash the result
Return values
string —first 8 bytes of md5 of $string concatenated with \x00 to indicate the hash is of a word not a phrase concatenated with the padded to 11 byte $meta_string.
canonicalTerm()
Takes a $term that might have come from a document and converts it to a string of 16 bytes, which is either the original term padded by underscores or the first seven chars of the term followed by an underscore followed by the base64 encoding of the first 6 chars of its md5 hash.
canonicalTerm(string $term) : string
Base64 used to make this all nice and printable.
Parameters
- $term : string
-
to be made into a canonical form
Return values
string —version of the term canonicalized as described above.
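A minimal sketch of the 16 byte canonical form follows. The name sketchCanonicalTerm is hypothetical; the cutoff (terms of at most 16 bytes are padded, longer ones are hashed) and the URL-safe base64 substitutions are assumptions not confirmed by this documentation.

```php
<?php
// Illustrative only: short terms are padded, long terms are abbreviated.
function sketchCanonicalTerm(string $term): string
{
    if (strlen($term) <= 16) {
        return str_pad($term, 16, "_");       // pad on the right with "_"
    }
    $hash = substr(md5($term, true), 0, 6);   // first 6 raw md5 bytes
    // 6 bytes encode to exactly 8 base64 chars, so 7 + 1 + 8 = 16 bytes
    return substr($term, 0, 7) . "_" .
        strtr(base64_encode($hash), "+/", "-_");
}
```

Either branch yields exactly 16 printable bytes, so the results can serve as fixed-width keys.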
compareWordHashes()
Used to compare two ids for index dictionary lookup. Ids are an 8 byte crawlHash together with a 12 byte non-hash suffix.
compareWordHashes(string $id1, string $id2) : int
Parameters
- $id1 : string
-
20 byte word id to compare
- $id2 : string
-
20 byte word id to compare
Return values
int —negative if $id1 smaller, positive if bigger, and 0 if same
base64Hash()
Converts a crawl hash number to something close to base64 encoding, but altered so that it doesn't get confused in URLs or DBs
base64Hash(string $string) : string
Parameters
- $string : string
-
a hash to base64 encode
Return values
string —the encoded hash
unbase64Hash()
Decodes a crawl hash number from base64 to raw ASCII
unbase64Hash(string $base64) : string
Parameters
- $base64 : string
-
a hash to decode
Return values
string —the decoded hash
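The encode/decode pair above can be sketched as a URL-safe variant of base64. The exact substitutions Yioop makes are not given in this documentation, so the replacements below ("+" to "-", "/" to "_", padding dropped) and the sketch* names are assumptions.

```php
<?php
// Illustrative URL-safe base64 round trip for raw hash bytes.
function sketchBase64Hash(string $string): string
{
    return rtrim(strtr(base64_encode($string), "+/", "-_"), "=");
}
function sketchUnbase64Hash(string $base64): string
{
    // base64_decode in its default non-strict mode accepts missing padding
    return base64_decode(strtr($base64, "-_", "+/"));
}
```

Eight 0xFF bytes, which plain base64 renders as "//////////8=", come out as "__________8" and decode back to the original bytes.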
webencode()
Encodes a string in a format suitable for POST data (mainly base64, but with characters that might disrupt a POST replaced in the result)
webencode(string $str) : string
Parameters
- $str : string
-
string to encode
Return values
string —encoded string
webdecode()
Decodes a string encoded by webencode
webdecode(string $str) : string
Parameters
- $str : string
-
string to decode
Return values
string —decoded string
crawlCrypt()
The crawlCrypt function is used to encrypt passwords stored in the database.
crawlCrypt(string $string[, int $salt = null ]) : string
It tries to use the best version of the Blowfish variant of PHP's crypt function available on the current system.
Parameters
- $string : string
-
the string to encrypt
- $salt : int = null
-
salt value to be used (needed to verify if a password is valid)
Return values
string —the crypted string where crypting is done using crawlHash
partitionByHash()
Used by a controller to take a table and return those rows in the table that a given queue_server would be responsible for handling
partitionByHash(array<string|int, mixed> $table, string $field, int $num_partition, int $instance[, object $callback = null ]) : array<string|int, mixed>
Parameters
- $table : array<string|int, mixed>
-
an array of rows of associative arrays which a queue_server might need to process
- $field : string
-
column of $table whose values should be used for partitioning
- $num_partition : int
-
number of queue_servers to choose between
- $instance : int
-
the id of the particular server we are interested in
- $callback : object = null
-
function or static method that might be applied to input before deciding the responsible queue_server. For example, if input was a url we might want to get the host before deciding on the queue_server
Return values
array<string|int, mixed> —the reduced table that the $instance queue_server is responsible for
calculatePartition()
Used by a controller to say which queue_server should receive a given input
calculatePartition(string $input, int $num_partition[, object $callback = null ]) : int
Parameters
- $input : string
-
can be viewed as a key that might be processed by a queue_server. For example, in some cases input might be a url and we want to determine which queue_server should be responsible for queuing that url
- $num_partition : int
-
number of queue_servers to choose between
- $callback : object = null
-
function or static method that might be applied to input before deciding the responsible queue_server. For example, if the input was a url we might want to get the host before deciding on the queue_server
Return values
int —id of server responsible for input
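The partitioning idea behind calculatePartition and partitionByHash can be sketched as hashing the (optionally transformed) input and reducing it modulo the number of queue servers. The hash used below (the first 7 hex digits of md5) is an assumption chosen for illustration, and the sketch* names are hypothetical.

```php
<?php
// Illustrative only: maps an input to a server id in [0, $num_partition).
function sketchCalculatePartition(string $input, int $num_partition,
    ?callable $callback = null): int
{
    if ($callback !== null) {
        $input = (string) $callback($input); // e.g. reduce a url to its host
    }
    // Assumption: 7 hex digits of md5 give a non-negative int on any platform
    return hexdec(substr(md5($input), 0, 7)) % $num_partition;
}

// Keeps only the rows whose $field value maps to server $instance.
function sketchPartitionByHash(array $table, string $field,
    int $num_partition, int $instance, ?callable $callback = null): array
{
    $rows = [];
    foreach ($table as $row) {
        $id = sketchCalculatePartition($row[$field], $num_partition,
            $callback);
        if ($id === $instance) {
            $rows[] = $row;
        }
    }
    return $rows;
}
```

Passing a callback such as `fn($url) => parse_url($url, PHP_URL_HOST)` sends every url from the same host to the same server, which is the use case the parameter description mentions.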
changeInMicrotime()
Measures the change in time in seconds between two timestamps to microsecond precision
changeInMicrotime(string $start[, string $end = null ]) : float
Parameters
- $start : string
-
starting time with microseconds
- $end : string = null
-
ending time with microseconds, if null use current time
Return values
float —time difference in seconds
microTimestamp()
Timestamp of current epoch with microsecond precision useful for situations where time() might cause too many collisions (account creation, etc)
microTimestamp() : string
Return values
string —timestamp, to microsecond precision, of the time in seconds since the start of the current epoch
checkTimeInterval()
Checks that a timestamp is within the time interval given by a start time (HH:mm) and a duration
checkTimeInterval(string $start_time, string $duration[, int $time = -1 ]) : int
Parameters
- $start_time : string
-
string of the form (HH:mm)
- $duration : string
-
string containing an int in seconds
- $time : int = -1
-
a Unix timestamp.
Return values
int —-1 if the time of day of $time is not within the given interval. Otherwise, the Unix timestamp at which the interval will be over for the same day as $time.
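The interval check can be sketched as follows. The name sketchCheckTimeInterval is hypothetical, and this version assumes the interval does not cross midnight and that the default PHP timezone is the one intended; neither point is specified in the documentation.

```php
<?php
// Illustrative only: returns the end of the interval as a Unix timestamp
// when $time falls inside it, and -1 otherwise.
function sketchCheckTimeInterval(string $start_time, string $duration,
    int $time = -1): int
{
    if ($time === -1) {
        $time = time(); // assumption: -1 means "now"
    }
    [$hour, $minute] = array_map('intval', explode(":", $start_time));
    // Start of the interval on the same calendar day as $time
    $start = mktime($hour, $minute, 0, (int) date("n", $time),
        (int) date("j", $time), (int) date("Y", $time));
    $end = $start + intval($duration);
    return ($time >= $start && $time < $end) ? $end : -1;
}
```

For a start time of "10:00" and a duration of "3600", a timestamp at 10:30 on some day yields the timestamp for 11:00 on that day.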
convertPixels()
Converts a CSS unit string into its equivalent in pixels. This is used by @see SvgProcessor.
convertPixels(string $value) : int
Parameters
- $value : string
-
a number followed by a legal CSS unit
Return values
int —a number in pixels
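A conversion of this kind can be sketched with the standard CSS absolute-length ratios (96 pixels per inch). The function name sketchConvertPixels, the set of units handled, and the treatment of a missing or unknown unit as pixels are assumptions for illustration.

```php
<?php
// Illustrative only: converts "<number><unit>" to a rounded pixel count.
function sketchConvertPixels(string $value): int
{
    // CSS absolute units expressed in pixels (1in = 96px = 72pt = 6pc)
    $per_unit = ["px" => 1, "in" => 96, "pt" => 96 / 72, "pc" => 16,
        "cm" => 96 / 2.54, "mm" => 96 / 25.4];
    if (!preg_match('/^\s*([\d.]+)\s*([a-z]*)\s*$/i', $value, $matches)) {
        return 0; // assumption: unparseable input counts as 0 pixels
    }
    $unit = strtolower($matches[2]) ?: "px";
    // Assumption: unknown units are treated as pixels
    return (int) round(floatval($matches[1]) * ($per_unit[$unit] ?? 1));
}
```

Relative units such as em or % depend on context the function does not receive, so they are left out of this sketch.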
countFiles()
Returns the number of files in a folder
countFiles(string $folder) : int
Parameters
- $folder : string
-
path to folder to count
Return values
int —number of files
makePath()
Creates folders along a filesystem path if they don't exist
makePath(string $path) : bool
Parameters
- $path : string
-
a file system path
Return values
bool —success or failure
deleteFileOrDir()
This is a callback function used in the process of recursively deleting a directory
deleteFileOrDir(string $file_or_dir) : mixed
Parameters
- $file_or_dir : string
-
the filename or directory name to be deleted
Return values
mixed —
setWorldPermissions()
This is a callback function used in the process of recursively chmoding to 777 all files in a folder
setWorldPermissions(string $file) : mixed
Parameters
- $file : string
-
the filename or directory name to be chmod
Return values
mixed —
fileInfo()
This is a callback function used in the process of recursively calculating an array of file modification times and file sizes for a directory
fileInfo(string $file) : array
Parameters
- $file : string
-
a name of a file in the file system
Return values
array —whose single element contains an associative array with the size and modification time of the file
orderCallback()
Callback function used to sort documents by a field
orderCallback(string $word_doc_a, string $word_doc_b[, string $order_field = null ]) : int
Should be initialized before using in usort with a call like: orderCallback($tmp, $tmp, "field_want");
Parameters
- $word_doc_a : string
-
doc id of first document to compare
- $word_doc_b : string
-
doc id of second document to compare
- $order_field : string = null
-
which field of these associative arrays to sort by
Return values
int —-1 if first doc bigger 1 otherwise
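The "initialize, then use in usort" pattern described above can be sketched with a static variable that remembers the sort field between calls. The name sketchOrderCallback is hypothetical, and the sketch treats the two documents as associative arrays, as the $order_field description suggests.

```php
<?php
// Illustrative only: a three-argument call stores the field name; later
// two-argument calls (as made by usort) compare on that field.
function sketchOrderCallback($word_doc_a, $word_doc_b, $order_field = null)
{
    static $field = "";
    if ($order_field !== null) {
        $field = $order_field; // initialization call, nothing to compare
        return 0;
    }
    // -1 if the first doc is bigger, 1 otherwise, as documented
    return ($word_doc_a[$field] > $word_doc_b[$field]) ? -1 : 1;
}

$docs = [["score" => 1], ["score" => 3], ["score" => 2]];
$tmp = [];
sketchOrderCallback($tmp, $tmp, "score"); // set the field to sort by
usort($docs, "sketchOrderCallback");      // largest score ends up first
```

Because the field lives in a static variable, the callback must be re-initialized whenever a different field is wanted.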
stringOrderCallback()
Callback function used to sort documents by a field where the field is assumed to be a string
stringOrderCallback(string $word_doc_a, string $word_doc_b[, string $order_field = null ]) : int
Should be initialized before using in usort with a call like: stringOrderCallback($tmp, $tmp, "field_want");
Parameters
- $word_doc_a : string
-
doc id of first document to compare
- $word_doc_b : string
-
doc id of second document to compare
- $order_field : string = null
-
which field of these associative arrays to sort by
Return values
int —-1 if first doc smaller 1 otherwise
stringROrderCallback()
Callback function used to sort documents by a field where the field is assumed to be a string
stringROrderCallback(string $word_doc_a, string $word_doc_b[, string $order_field = null ]) : int
Should be initialized before using in usort with a call like: stringROrderCallback($tmp, $tmp, "field_want");
Parameters
- $word_doc_a : string
-
doc id of first document to compare
- $word_doc_b : string
-
doc id of second document to compare
- $order_field : string = null
-
which field of these associative arrays to sort by
Return values
int —-1 if first doc bigger 1 otherwise
rorderCallback()
Callback function used to sort documents by a field in reverse order
rorderCallback(string $word_doc_a, string $word_doc_b[, string $order_field = null ]) : int
Should be initialized before using in usort with a call like: rorderCallback($tmp, $tmp, "field_want");
Parameters
- $word_doc_a : string
-
doc id of first document to compare
- $word_doc_b : string
-
doc id of second document to compare
- $order_field : string = null
-
which field of these associative arrays to sort by
Return values
int —1 if first doc bigger -1 otherwise
lessThan()
Callback to check if $a is less than $b
lessThan(float $a, float $b) : int
Used to help sort document results returned in PhraseModel called in IndexArchiveBundle
Parameters
- $a : float
-
first value to compare
- $b : float
-
second value to compare
Return values
int —-1 if $a is less than $b; 1 otherwise
greaterThan()
Callback to check if $a is greater than $b
greaterThan(float $a, float $b) : int
Used to help sort document results returned in PhraseModel called in IndexArchiveBundle
Parameters
- $a : float
-
first value to compare
- $b : float
-
second value to compare
Return values
int —-1 if $a is greater than $b; 1 otherwise
e()
shorthand for echo
e(string $text) : mixed
Parameters
- $text : string
-
string to send to the current output
Return values
mixed —
remoteAddress()
Compute the real remote address of the incoming connection including forwarding
remoteAddress() : mixed
Return values
mixed —
readInput()
Used to read a line of input from the command-line
readInput() : string
Return values
string —from the command-line
readPassword()
Used to read a line of input from the command-line (on unix machines without echoing it)
readPassword() : string
Return values
string —from the command-line
readMessage()
Used to read several lines from the terminal up until a last line consisting of just a "."
readMessage() : string
Return values
string —from the command-line
mimeType()
Returns the mime type of the provided file name if it can be determined.
mimeType(string $file_name[, bool $use_extension = false ]) : string
Parameters
- $file_name : string
-
(name of file including path to figure out mime type for)
- $use_extension : bool = false
-
whether to just try to guess from the file extension rather than looking at the file
Return values
string —mime type or unknown if can't be determined
generalIsA()
Checks if class_1 is the same as class_2 or has class_2 as a parent. Behaves like the 3 param version (last param true) of the PHP is_a function that came into being with Version 5.3.9.
generalIsA(mixed $class_1, mixed $class_2) : bool
Parameters
- $class_1 : mixed
-
object or string class name to see if in class2
- $class_2 : mixed
-
object or string class name to see if contains class1
Return values
bool —equal or contains class
stripAttributes()
Given the contents of a start XML/HTML tag, strips out all the attributes not listed in $safe_attribute_list
stripAttributes(string $start_tag_contents[, array<string|int, mixed> $safe_attribute_list = [] ]) : string
Parameters
- $start_tag_contents : string
-
the contents of an HTML/XML tag. I.e., if the tag was <tag stuff> then $start_tag_contents could be stuff
- $safe_attribute_list : array<string|int, mixed> = []
-
a list of attributes which should be kept
Return values
string —containing only safe attributes and their values
parseCsv()
Used to parse into a two dimensional array a string that contains CSV data.
parseCsv(string $csv_string) : array<string|int, mixed>
Parameters
- $csv_string : string
-
string with csv data
Return values
array<string|int, mixed> —two dimensional array of elements from csv
arraytoCsv()
Converts an array of values to a comma separated value formatted string.
arraytoCsv(array<string|int, mixed> $arr) : string
Parameters
- $arr : array<string|int, mixed>
-
values to convert
Return values
string —CSV string after conversion
diff()
Computes a Unix-style diff of two strings. That is, it only outputs lines which disagree between the two strings. It outputs +line if a line occurs in the second but not the first string, and -line if a line occurs in the first string but not the second.
diff(string $data1, string $data2[, bool $html = false ]) : string
Parameters
- $data1 : string
-
first string to compare
- $data2 : string
-
second string to compare
- $html : bool = false
-
whether to output html highlighting
Return values
string —representing info about where $data1 and $data2 don't match
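A line diff of this kind is commonly built on a longest common subsequence (LCS) of the two line arrays. The sketch below is a generic textbook version under that assumption; the name sketchDiff is hypothetical, and it omits the HTML highlighting option and any trimming of common leading lines.

```php
<?php
// Illustrative LCS-based diff: "-line" for lines only in $data1,
// "+line" for lines only in $data2; common lines are not output.
function sketchDiff(string $data1, string $data2): string
{
    $a = explode("\n", $data1);
    $b = explode("\n", $data2);
    $n = count($a);
    $m = count($b);
    // $len[$i][$j] = LCS length of $a[$i..] and $b[$j..]
    $len = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $len[$i][$j] = ($a[$i] === $b[$j]) ? $len[$i + 1][$j + 1] + 1 :
                max($len[$i + 1][$j], $len[$i][$j + 1]);
        }
    }
    $out = [];
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $i++;
            $j++;                      // common line, skip it
        } elseif ($len[$i + 1][$j] >= $len[$i][$j + 1]) {
            $out[] = "-" . $a[$i++];   // line only in the first string
        } else {
            $out[] = "+" . $b[$j++];   // line only in the second string
        }
    }
    while ($i < $n) {
        $out[] = "-" . $a[$i++];
    }
    while ($j < $m) {
        $out[] = "+" . $b[$j++];
    }
    return implode("\n", $out);
}
```

The table costs time and memory proportional to the product of the two line counts, which is one reason to strip shared leading lines before computing it.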
computeLCS()
Computes the longest common subsequence of two arrays
computeLCS(array<string|int, mixed> $lines1, array<string|int, mixed> $lines2, int $offset) : mixed
Parameters
- $lines1 : array<string|int, mixed>
-
an array of lines to compute LCS of
- $lines2 : array<string|int, mixed>
-
an array of lines to compute LCS of
- $offset : int
-
an offset to shift over array addresses in output by
Return values
mixed —
extractLCSFromTable()
Extracts from a table of longest common sequence moves (probably calculated by @see computeLCS) and a starting coordinate $i, $j in that table, a longest common subsequence
extractLCSFromTable(array<string|int, mixed> $lcs_moves, array<string|int, mixed> $lines, int $i, int $j, int $offset, array<string|int, mixed> &$lcs) : mixed
Parameters
- $lcs_moves : array<string|int, mixed>
-
a table of moves computed by computeLCS
- $lines : array<string|int, mixed>
-
from first of the two arrays computing LCS of
- $i : int
-
a line number in string 1
- $j : int
-
a line number in string 2
- $offset : int
-
a number to add to each line number output into $lcs. This is useful if we have trimmed off the initially common lines from our two strings we are trying to compute the LCS of
- $lcs : array<string|int, mixed>
-
an array of triples (index_string1, index_string2, line). The indexes indicate the line number in each string; line is the line common to the two strings
Return values
mixed —
tail()
Returns an array of the last $num_lines lines of a file
tail(string $file_name, string $num_lines) : array<string|int, mixed>
Parameters
- $file_name : string
-
name of file to return lines from
- $num_lines : string
-
number of lines to retrieve
Return values
array<string|int, mixed> —retrieved lines
lineFilter()
Given an array of lines returns a subarray of those lines containing the filter string or filter array
lineFilter(string $lines, mixed $filters[, bool $case_insensitive = true ]) : array<string|int, mixed>
Parameters
- $lines : string
-
to search
- $filters : mixed
-
either string to filter lines with or an array of strings (any of which can be present to pass the filter)
- $case_insensitive : bool = true
-
whether search should be done case insensitively or not.
Return values
array<string|int, mixed> —lines containing the string
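The filtering described above can be sketched with strpos and stripos. The name sketchLineFilter is hypothetical, and the sketch takes $lines as an array, following the summary rather than the string type shown in the signature.

```php
<?php
// Illustrative only: keeps the lines containing any of the filter strings.
function sketchLineFilter(array $lines, $filters,
    bool $case_insensitive = true): array
{
    $filters = (array) $filters;  // a single string becomes a 1-element array
    $find = $case_insensitive ? "stripos" : "strpos";
    $kept = [];
    foreach ($lines as $line) {
        foreach ($filters as $filter) {
            if ($find($line, $filter) !== false) {
                $kept[] = $line;
                break;            // one matching filter is enough
            }
        }
    }
    return $kept;
}
```

With the default case-insensitive search, the filter "error" keeps both "Error: disk" and "WARN error".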
logLineTimestamp()
Tries to extract a timestamp from a line which is presumed to come from a Yioop log file
logLineTimestamp(string $line) : int
Parameters
- $line : string
-
to search
Return values
int —timestamp of that log entry
isPositiveInteger()
Returns whether an input can be parsed to a positive integer
isPositiveInteger(mixed $input) : bool
Parameters
- $input : mixed
Return values
bool —whether $input can be parsed to a positive integer.
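One way to express this check is PHP's filter_var with FILTER_VALIDATE_INT and a minimum of 1. The name sketchIsPositiveInteger is hypothetical, and how Yioop treats edge cases such as surrounding whitespace or floats is not stated here.

```php
<?php
// Illustrative only: true when $input parses as an integer >= 1.
function sketchIsPositiveInteger($input): bool
{
    $options = ["options" => ["min_range" => 1]];
    return filter_var($input, FILTER_VALIDATE_INT, $options) !== false;
}
```

Strings such as "12" and ints such as 7 pass, while "0", "-3", "1.5", and "abc" do not.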
measureCall()
Used to measure the memory footprint in bytes and time spent calling a method of an object. It also records the number of times the method has been called.
measureCall(object $object, string $method[, mixed $arguments = [] ][, string $call_name = "" ]) : mixed
Just calls the method without any recording or timing until an initial call to the function measureCall(null, save_statistics_file), where save_statistics_file is the name of the file you want to store statistics to.
Parameters
- $object : object
-
name of object whose method we want to call and measure
- $method : string
-
method we're calling
- $arguments : mixed = []
- $call_name : string = ""
-
name to use when outputting stats for this call, defaults to $method.
Return values
mixed —whatever the method would normally return when called as above
measureObject()
Used to measure the memory footprint of an object in Yioop and save it to a statistics file. No recording is done until an initial call to the function measureCall(null, save_statistics_file), where save_statistics_file is the name of the file you want to store statistics to.
measureObject(object $object[, string $save_file = "" ][, mixed $class_name = "" ]) : mixed
Parameters
- $object : object
-
name of object whose size we want to measure
- $save_file : string = ""
-
statistics file to write info to
- $class_name : mixed = ""
Return values
mixed —
measureObjectCall()
General method called by @see measureCall and @see measureObject. Used to measure the memory footprint in bytes of an object, or the memory and time spent calling a method of an object. It also records the number of times the method has been called. When used to call a method before initialization, it just calls the method without any recording or timing. To initialize, make an initial call to the function measureCall(null, save_statistics_file), where save_statistics_file is the name of the file you want to store statistics to.
measureObjectCall(object $object, string $method[, mixed $arguments = [] ][, string $call_name = "" ]) : mixed
Parameters
- $object : object
-
name of object whose method we want to call and measure
- $method : string
-
method we're calling
- $arguments : mixed = []
- $call_name : string = ""
-
name to use when outputting stats for this call, defaults to $method.
Return values
mixed —whatever the method would normally return when called as above
variableClone()
Makes a deep copy of a variable regardless of its type
variableClone(mixed $var) : mixed
Parameters
- $var : mixed
-
variable to deep copy
Return values
mixed —the deep copy
garbageCollect()
Runs various system garbage collection functions and returns number of bytes freed.
garbageCollect() : int
Return values
int —number of bytes freed
utf8SafeSaveHtml()
The DOM method saveHTML has a tendency to replace non-ASCII UTF-8 characters with HTML entities. This function is meant to save the DOM while avoiding that replacement.
utf8SafeSaveHtml(DOMDocument $dom) : string
It first saves the DOM, then replaces HTML entities of the form &single_char; or &#some_number; with the UTF-8 characters they correspond to. It leaves all other entities as they are.
Parameters
- $dom : DOMDocument
Return values
string —output of saving html
utf8WordWrap()
A UTF-8 safe version of PHP's wordwrap function that wraps a string to a given number of characters
utf8WordWrap(string $string[, int $width = 75 ][, string $break = "\n" ][, bool $cut = false ]) : string
Parameters
- $string : string
-
the input string
- $width : int = 75
-
the number of characters at which the string will be wrapped
- $break : string = "\n"
-
string used to break a line into two
- $cut : bool = false
-
whether to always force wrap at $width characters even if word hasn't ended
Return values
string —the given string wrapped at the specified length
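PHP's built-in wordwrap counts bytes, so multi-byte UTF-8 characters make lines break too early. The sketch below counts characters with the /u regex modifier instead. The name sketchUtf8WordWrap is hypothetical, the input is assumed to be valid UTF-8, and only single spaces are treated as word boundaries; this is not the library's implementation.

```php
<?php
// Illustrative only: wraps at $width characters (not bytes).
function sketchUtf8WordWrap(string $string, int $width = 75,
    string $break = "\n", bool $cut = false): string
{
    // Character count of a valid UTF-8 string
    $len = function (string $s) {
        return preg_match_all('/./us', $s);
    };
    $lines = [];
    $line = "";
    foreach (explode(" ", $string) as $word) {
        // With $cut, split an over-long word into chunks of $width characters
        $pieces = ($cut && $len($word) > $width) ?
            preg_split('/(.{' . $width . '})/us', $word, -1,
                PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY) : [$word];
        foreach ($pieces as $piece) {
            if ($line === "") {
                $line = $piece;
            } elseif ($len($line) + 1 + $len($piece) <= $width) {
                $line .= " " . $piece;
            } else {
                $lines[] = $line;   // current line is full, start a new one
                $line = $piece;
            }
        }
    }
    $lines[] = $line;
    return implode($break, $lines);
}
```

With a width of 11, "héllo wörld ünïcode" keeps "héllo wörld" on one line because it is 11 characters long, even though it occupies 13 bytes.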