Yioop_V9.5_Source_Code_Documentation

library

Namespaces

archive_bundle_iterators
classifiers
compressors
index_bundle_iterators
indexing_plugins
media_jobs
processors
summarizers

Interfaces, Classes, Traits and Enums

CrawlConstants
Shared constants and enums used by components that are involved in the crawling process
MediaConstants
Shared constants and enums used by components that are involved in media-related operations
Notifier
A Notifier is an object that a priority queue notifies when the index of some data item in the queue, viewed as an array, has changed.
AnalyticsManager
Used to set and get SQL query and search query timing statistics between models and index_bundle_iterators
BloomFilterBundle
A BloomFilterBundle is a directory of BloomFilterFile objects.
BloomFilterFile
Code used to manage a bloom filter in-memory and in file.
BPlusTree
This class implements the B+-tree structure over an existing file system
BZip2BlockIterator
This class is used to allow one to iterate through a Bzip2 file.
ComputerVision
Class used to encapsulate various methods related to computer vision that might be useful for indexing documents. These include recognizing text in images
ContextTagger
Abstract, base context tagger class.
CrawlDaemon
Used to run scripts as a daemon on *nix systems
CrawlQueueBundle
Encapsulates the data structures needed to have a queue of to-crawl URLs
DoubleIndexBundle
A DoubleIndexBundle encapsulates and provides methods for two IndexDocumentBundles used to store a repeating crawl. One of these bundles is used to handle current search queries, while the other is used to store an ongoing crawl. Once the crawl time has been reached, the roles of the two bundles are swapped.
FeedArchiveBundle
Subclass of IndexArchiveBundle with bloom filters to make it easy to check if a news feed item has been added to the bundle already before adding it
FeedDocumentBundle
Subclass of IndexDocumentBundle with bloom filters to make it easy to check if a news feed item has been added to the bundle already before adding it
FetchGitRepositoryUrls
Library of functions used to fetch Git internal urls
FetchUrl
Code used to manage HTTP or Gopher requests from one or more URLS
FileCache
Library of functions used to implement a simple file cache
HashTable
Code used to manage a memory efficient hash table. Weights for the queue must be floats
IndexArchiveBundle
Encapsulates a set of web page summaries and an inverted word-index of terms from these summaries which allow one to search for summaries containing a particular word.
IndexDictionary
Data structure used to store entries of the form: word id, index shard generation, posting list offset, and length of posting list. It has entries for all words stored in a given IndexArchiveBundle. There might be multiple entries for a given word_id if it occurs in more than one index shard in the given IndexArchiveBundle.
IndexDocumentBundle
Encapsulates a set of web page documents and an inverted word-index of terms from these documents which allow one to search for documents containing a particular word.
IndexManager
Class used to manage open IndexArchiveBundle's while performing a query. Ensures an easy place to obtain references to these bundles and ensures only one object per bundle is instantiated in a Singleton-esque way.
IndexShard
Data structure used to store one generation worth of the word document index (inverted index). This data structure consists of three main components: word entries, word_doc entries, and document entries.
JavascriptUnitTest
Super class of all the test classes testing Javascript functions.
LinearAlgebra
Class useful for handling linear algebra operations on associative array with key => value pairs where the value is a number.
LinearHashTable
This class implements a linear hash table for storing records that use PackedTableTools for their format
MailServer
A small class for communicating with an SMTP server. Used to avoid configuration issues that might be needed with PHP's built-in mail() function. Here is an example of how one might use this class:
NamedEntityContextTagger
Machine learning based named entity recognizer.
NWordGrams
Library of functions used to create and extract n word grams
PackedTableTools
A collection of methods to encode and decode records according to a signature.
PageRuleParser
Has methods to parse user-defined page rules to apply to documents to be indexed.
PartialZipArchive
Used to extract files from an initial segment or a fragment of a ZIP Archive.
PartitionDocumentBundle
A partition document bundle is a collection of partitions, each of which in turn can hold a concatenated sequence of compressed documents, and which are managed together. It is a successor format to the earlier WebArchiveBundle of Yioop. The partition document bundle stores individual records using a record format defined via the PackedTableTools class.
PartOfSpeechContextTagger
Machine learning based Part of Speech tagger.
PersistentStructure
A PersistentStructure is a data structure which every so many operations will be saved to secondary storage (such as disk).
PhraseParser
Library of functions used to manipulate words and phrases
PriorityQueue
Code used to manage a memory efficient priority queue.
ScraperManager
Class used by html processors to detect if a page matches a particular signature such as that of a content management system, and also to provide scraping mechanisms for the content of such a page
StochasticTermSegmenter
Class for segmenting terms using Stochastic Finite State Word Segmentation
StringArray
Memory efficient implementation of persistent arrays
SuffixTree
Data structure used to maintain a suffix tree for a passage of words.
Trie
Implements a trie data structure which can be used to store terms read from a dictionary in a succinct way
UnitTest
Base class for all the SeekQuarry/Yioop engine Unit tests
UrlParser
Library of functions used to manipulate and to extract components from urls
Mod9Constants
Mini-class (so not in its own file) used to hold encode/decode info related to Mod9 encoding (a variant of Simplified-9 specific to Yioop).
VersionManager
VersionManager can be used to create and manage versions of files in a folder so that a user can revert the files to any desired version, back to the time the folder was first put under management. It is used by Yioop's Wiki system to handle versions of image and other media resources for a Wiki page.
WebArchive
Code used to manage web archive files
WebArchiveBundle
A web archive bundle is a collection of web archives which are managed together. It is useful to split data across several archive files rather than just store it in one, for both read efficiency and to keep file sizes from getting too big. In some places we are using 4 byte int's to store file offsets which restricts the size of the files we can use for web archives.
WebSite
A single file, low dependency, pure PHP web server and web routing engine class.
WebException
Exception generated when a running WebSite script calls webExit()
WikiParser
Class with methods to parse mediawiki documents, both within Yioop, and when Yioop indexes mediawiki dumps as from Wikipedia.
LRUCache
Implements a least recently used cache

Table of Contents

main()  : mixed
Command-line shell for testing the class
localesWithStopwordsList()  : array<string|int, mixed>
Returns an array of locales that have a stop words list and a stop words remover method
localeTagToIso639_2Tag()  : string
Converts a $locale_tag (major-minor) to an ISO 639-2 language name
guessLocale()  : string
Attempts to guess the user's locale based on the request, session, and user-agent data
guessLocaleFromString()  : string
Attempts to guess the user's locale based on a string sample
checkQuery()  : string
Tries to find whether query belongs to a programming language
guessLangEncoding()  : string
Tries to guess at a language tag based on the name of a character encoding
guessEncodingHtmlXml()  : mixed
Tries to guess the encoding used for an Html document
convertUtf8IfNeeded()  : mixed
Converts page data in a site associative array to UTF-8 if it is not already in UTF-8
tl()  : string
Translate the supplied arguments into the current locale.
setLocaleObject()  : mixed
Sets the language to be used for locale settings
getLocaleTag()  : string
Gets the language tag (for instance, en_US for American English) of the locale that is currently being used. This function has the side effect of setting Yioop's current locale.
getLocaleDirection()  : string
Returns the current language direction.
getLocaleQueryStatistics()  : array<string|int, mixed>
Returns the query statistics info for the current locale.
getBlockProgression()  : string
Returns the current locale's method of writing blocks (things like divs or paragraphs). A language like English puts blocks one after another from the top of the page to the bottom. Other languages like classical Chinese list them from right to left.
getWritingMode()  : string
Returns the writing mode of the current locale. This is a combination of the locale direction and the block progression. For instance, for English the writing mode is lr-tb (left-to-right top-to-bottom).
w1256ToUTF8()  : string
Convert the string $str encoded in Windows-1256 into UTF-8
utf8chr()  : string
Given a unicode codepoint convert it to UTF-8
formatDateByLocale()  : string
Function for formatting a date string based on the locale.
upgradeLocalesCheck()  : mixed
Checks to see if the locale data of Yioop! of a locale in the work dir is older than the currently running Yioop!
upgradeLocales()  : mixed
If the locale data of Yioop! in the work directory is older than the currently running Yioop! then this function is called to at least try to copy the new strings into the old profile.
upgradePublicHelpWiki()  : mixed
Used to force push the default Public and Wiki pages into the current database
upgradeDatabaseWorkDirectoryCheck()  : mixed
Checks to see if the database data or work_dir folder of Yioop! is from an older version of Yioop! than the currently running Yioop!
upgradeDatabaseWorkDirectory()  : mixed
If the database data of Yioop is older than the version of the currently running Yioop then this function is called to try to upgrade the database to the new version
updateVersionNumber()  : mixed
Update the database version number to a new number
getWikiHelpPages()  : mixed
Reads the Help articles from default db and returns the array of pages.
addActivityAtId()  : mixed
Used to insert a new activity into the database at a given activity_id
updateTranslationForStringId()  : mixed
Adds or replaces a translation for a database message string for a given IANA locale tag.
addRegexDelimiters()  : string
Adds delimiters to a regex that may or may not have them
preg_search()  : mixed
Searches for a pcre pattern in a subject from a given offset. Returns the position of the first match if found, -1 otherwise.
preg_offset_replace()  : string
Replaces a pcre pattern with a replacement in $subject starting from some offset.
parse_ini_with_fallback()  : array<string|int, mixed>
Yioop replacement for parse_ini_file($name, true) in case parse_ini_file is on the disable_functions list. Name has underscores to match original function. This function checks if parse_ini_file is disabled or not. If not, it just calls parse_ini_file; otherwise, it simulates it enough so that configure.ini files used for string translations can be read.
getIniAssignMatch()  : mixed
Auxiliary function called from parse_ini_with_fallback to extract from the $matches array produced by the former function's preg_match what kind of assignment occurred in the ini file being parsed.
charCopy()  : mixed
Copies from $source string beginning at position $start, $length many bytes to destination string
vByteEncode()  : string
Encodes an integer using variable byte coding.
vByteDecode()  : int
Decodes an integer from a string using variable byte coding.
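The listing does not say which variable byte layout vByteEncode() and vByteDecode() use, so the following is only a minimal Python sketch of one common convention, assumed here for illustration: seven data bits per byte, low-order group first, with the high bit set on every byte except the last. The names vbyte_encode and vbyte_decode are hypothetical and are not part of Yioop.

```python
def vbyte_encode(n):
    """Encode a non-negative int, 7 bits per byte, low-order group first.

    Assumed layout: the high bit marks "more bytes follow".
    """
    if n < 0:
        raise ValueError("variable byte coding needs a non-negative integer")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # continuation byte
        n >>= 7
    out.append(n)  # final byte has its high bit clear
    return bytes(out)


def vbyte_decode(data, offset=0):
    """Decode one integer starting at offset; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: last byte of this integer
            return value, offset
        shift += 7
```

Small integers cost a single byte under this layout, which is why such codes are popular for posting list gaps.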
appendUnary()  : mixed
Appends a number re-encoded in unary to the end of an input string starting at a given bit offset into the string. Here n in unary has bit representation n-1 0's followed by a 1.
decodeUnary()  : int
Decodes a unary number from an input string at a given bit offset. Here n in unary has bit representation n-1 0's followed by a 1.
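As a rough illustration of the unary code described for appendUnary() and decodeUnary() (n is written as n-1 zero bits followed by a one bit), here is a short Python sketch. It assumes bits are numbered from the most significant bit of each byte, which the listing does not state, and the names append_unary and decode_unary are hypothetical.

```python
def append_unary(buf, bit_offset, n):
    """Write n in unary into bytearray buf at bit_offset; return new offset.

    Assumption: bit 0 is the most significant bit of buf[0].
    """
    if n < 1:
        raise ValueError("this unary code is only defined for n >= 1")
    end = bit_offset + n  # one past the terminating 1 bit
    needed = (end + 7) // 8
    if len(buf) < needed:
        buf.extend(b"\x00" * (needed - len(buf)))
    # Overwrite n-1 positions with 0 bits.
    for pos in range(bit_offset, end - 1):
        buf[pos // 8] &= ~(0x80 >> (pos % 8)) & 0xFF
    # Terminate with a single 1 bit.
    last = end - 1
    buf[last // 8] |= 0x80 >> (last % 8)
    return end


def decode_unary(buf, bit_offset):
    """Read one unary number at bit_offset; return (n, new_offset)."""
    n = 1
    pos = bit_offset
    while not buf[pos // 8] & (0x80 >> (pos % 8)):
        n += 1
        pos += 1
    return n, pos + 1
```

Writing 3, 1 and 5 in sequence produces the bit string 001 1 00001, and decoding from offset 0 recovers the same three numbers.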
appendBits()  : string
Appends $num_bits bits from the start of the binary rep of $number beginning at offset $start_bit_offset of $input string overwriting any bits present. If $num_bits == -1, then appends all of $number.
decodeBits()  : int
Decodes $num_bits many bits from the $input string beginning at offset $start_bit_offset. As a result of this operation, $start_bit_offset is increased by the number of bits that were able to be decoded.
appendGamma()  : string
Appends gamma code of $number beginning at offset $start_bit_offset of $input string overwriting any bits present. $start_bit_offset is updated to bit position after append.
decodeGammaList()  : array<string|int, mixed>
Decodes up to $num_decode gamma encoded integers beginning at $start_bit_offset. $start_bit_offset is updated to the bit position after the decoded integers.
appendRiceSequence()  : string
Appends using a Rice coding a sequence of integers $int_sequence at offset $start_bit_offset to the string $output, overwriting any bits present at that location. $start_bit_offset is updated to bit position after append.
decodeRiceSequence()  : array<string|int, mixed>
Decodes up to $num_decode rice encoded difference list of integers beginning at $start_bit_offset. $start_bit_offset is updated to the bit position after the decoded integers. If $delta_start >= 0 then the first int is assumed to be the difference from $delta_start;
encodePositionList()  : string
Encodes a list of integer positions of a term in a document. This is done as a gamma code of the first integer followed by the Rice coding of the remaining integers using a modulus based on the average gap between integers. If the number of positions is 1 or 2 then a gamma of each position only is used.
decodePositionList()  : array<string|int, mixed>
Decodes up to $num_decode term in document position integers from string $input under the assumption $input is encoded as per
encode255()  : string
Recodes a string in a 1-1 fashion to a string not involving \xFF (255). I.e., it maps characters \xFE -> \xFE\xFD and \xFF -> \xFE\xFE
decode255()  : string
Decodes a string in a 1-1 fashion from a string not involving \xFF (255). I.e., it maps characters \xFE\xFE -> \xFF and \xFE\xFD -> \xFE
encodeUnderscore()  : string
Recodes a string in a 1-1 fashion to a string not involving underscore (_). I.e., it maps characters - -> -- and _ -> -=
decodeUnderscore()  : string
Decodes a string in a 1-1 fashion from a string not involving underscore (_). I.e., it maps characters -= -> _ and -- -> -
packEncode255()  : string
Encodes a list of strings as their @see encode255 versions separated by \xFF's
unpackDecode255()  : array<string|int, mixed>
Decodes a list of strings from a string that was encoded as the @see encode255 versions of its elements separated by \xFF's
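To make the \xFF-free escaping and the packing built on it concrete, here is a small Python sketch over byte strings. It assumes the escape pairs are \xFE -> \xFE\xFD and \xFF -> \xFE\xFE, with decoding as the exact inverse; the function names are hypothetical and the real PHP functions may treat edge cases such as an empty list differently.

```python
def encode255(data):
    """Escape so that the result contains no 0xFF byte."""
    # 0xFE is escaped first; the 0xFE bytes introduced for 0xFF are then
    # never touched again.
    return data.replace(b"\xfe", b"\xfe\xfd").replace(b"\xff", b"\xfe\xfe")


def decode255(data):
    """Invert encode255 with a single left-to-right pass."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0xFE and i + 1 < len(data):
            out.append(0xFF if data[i + 1] == 0xFE else 0xFE)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def pack_encode255(items):
    """Join the escaped items with 0xFF, which can no longer occur in them."""
    return b"\xff".join(encode255(item) for item in items)


def unpack_decode255(packed):
    """Split on the 0xFF separators and unescape each piece."""
    return [decode255(piece) for piece in packed.split(b"\xff")]
```

Since the escaped items never contain \xFF, splitting on that byte recovers the item boundaries without any length prefixes.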
packPosting()  : string
Makes a packed integer string from a docindex and the number of occurrences of a word in the document with that docindex.
unpackPosting()  : array<string|int, mixed>
Given a packed integer string, uses the top three bytes to calculate a doc_index of a document in the shard, and uses the low order byte to compute the number of occurrences of a word in that document.
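The byte layout described for unpackPosting() (top three bytes for the doc_index, low order byte for the occurrence count) can be illustrated with Python's struct module. This sketch assumes a big-endian 4-byte integer and rejects out-of-range values; both choices are assumptions, and the names pack_posting and unpack_posting are hypothetical rather than Yioop's own.

```python
import struct


def pack_posting(doc_index, occurrences):
    """Pack a 24-bit doc_index and an 8-bit count into 4 bytes (big-endian)."""
    if not 0 <= doc_index < (1 << 24):
        raise ValueError("doc_index must fit in three bytes")
    if not 0 <= occurrences < (1 << 8):
        raise ValueError("occurrences must fit in one byte")
    return struct.pack(">I", (doc_index << 8) | occurrences)


def unpack_posting(packed):
    """Return (doc_index, occurrences) from a 4-byte packed posting."""
    (value,) = struct.unpack(">I", packed)
    return value >> 8, value & 0xFF
```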
addDocIndexPostings()  : string
This method is used while appending one index shard to another.
deltaList()  : array<string|int, mixed>
Computes the difference of a list of integers.
deDeltaList()  : array<string|int, mixed>
Given an array of differences of integers reconstructs the original list. This computes the inverse of the deltaList function
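A brief Python sketch of the delta encoding that deltaList() and deDeltaList() describe. It assumes the first difference is taken against 0, so the first element is stored unchanged; the listing does not confirm this detail, and the function names are hypothetical.

```python
from itertools import accumulate


def delta_list(values):
    """Return the successive differences of values (first one taken from 0)."""
    deltas = []
    previous = 0
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def de_delta_list(deltas):
    """Rebuild the original list as the running sum of the differences."""
    return list(accumulate(deltas))
```

For a sorted list such as [3, 7, 8, 15] the differences [3, 4, 1, 7] are smaller numbers, which is what makes gap-based codes like gamma or Rice coding effective on posting lists.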
encodeModified9()  : string
Encodes a sequence of integers x, such that 1 <= x <= 2<<28-1 as a string. NOTICE x>=1.
packListModified9()  : string
Packs the contents of a single word of a sequence being encoded using Modified9.
nextPostString()  : string
Returns the next complete posting string from $input_string beginning at offset.
decodeModified9()  : array<string|int, mixed>
Decodes a sequence of positive integers from a string that has been encoded using Modified 9
unpackListModified9()  : array<string|int, mixed>
Decode a single word with high two bits off according to modified 9
docIndexModified9()  : int
Given an int encoding a doc_index followed by a position list using Modified 9, extracts just the doc_index.
unpackInt()  : int
Unpacks an int from a 4 char string
packInt()  : string
Packs an int into a 4 char string
unpackFloat()  : float
Unpacks a float from a 4 char string
packFloat()  : string
Packs a float into a 4 char string
renameSerializedObject()  : string
Used to change the namespace of a serialized php object (assumes doesn't have nested subobjects)
getDomFromString()  : DOMDocument
Parses a provided string to make a DOM object. First tries to parse using XML and, if this fails, uses the more robust HTML DOM parser and manipulates the resulting DOM tree to make it correspond to the original tags for XML that isn't HTML
getTags()  : array<string|int, mixed>
Returns an array of DOMDocuments for the nodes that match an xpath query on $dom, a DOMDocument
toHexString()  : string
Converts a string to a string where each char has been replaced by its hexadecimal equivalent
toIntString()  : string
Converts a string to a string where each char has been replaced by its integer equivalent
toBinString()  : string
Converts a string to a string where each char has been replaced by its binary equivalent
metricToInt()  : int
Converts a string of the form some int followed by K, M, or G to the integer it represents.
intToMetric()  : string
Converts a number to a string followed by nothing, K, M, G, T depending on whether number is < 1000, < 10^6, < 10^9, or < 10^(12)
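As a hedged illustration of the two conversions metricToInt() and intToMetric(), the Python sketch below treats K, M, G and T as decimal multiples (10^3, 10^6, 10^9, 10^12), following the thresholds quoted above. Rounding down in int_to_metric and accepting lowercase suffixes are assumptions, and both function names are hypothetical.

```python
_MULTIPLIERS = {"K": 10**3, "M": 10**6, "G": 10**9}
_SUFFIXES = ["", "K", "M", "G", "T"]


def metric_to_int(text):
    """Convert strings such as '12', '5K', '2M' or '3G' to an integer."""
    text = text.strip()
    if text and text[-1].upper() in _MULTIPLIERS:
        return int(text[:-1]) * _MULTIPLIERS[text[-1].upper()]
    return int(text)


def int_to_metric(number):
    """Render number with the largest suffix that applies, rounding down."""
    for power, suffix in enumerate(_SUFFIXES):
        # Use this suffix if the number is below the next threshold,
        # or if we have run out of larger suffixes.
        if number < 1000 ** (power + 1) or suffix == "T":
            return f"{number // 1000 ** power}{suffix}"
```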
crawlLog()  : mixed
Logs a message to a logfile or the screen. The super-global field $_SERVER['LOG_TO_FILES'] determines if this will log to a file. If not, then in cli mode, will log to stdout, otherwise it will use error_log. When logging to file $_SERVER["NO_ROTATE_LOGS"] controls whether or not there will be a log file rotation. The first call to this method is typically used to set up a process to check for liveness. For example a call: crawlLog("\n\nInitialize logger..", $this->process_name, true); says $this->process_name should be checked for liveness as part of any subsequent logging activity such as a call crawlLog("Another Message"); (note subsequent calls don't need to specify the process name).
makeTimestamp()  : string
Used to make a log file entry time string of format: entry number, time in r format.
crawlTimeoutLog()  : bool
Writes a log message $msg if more than LOG_TIMEOUT time has passed since the last time crawlTimeoutLog was called. Useful in loops to write a message as progress is made through the loop (but not on every iteration, but say every 30 seconds).
crawlHash()  : string
Computes an 8 byte hash of a string for use in storing documents.
crawlHashWord()  : string
Used to create a 20 byte hash of a string (typically a word or phrase with a wikipedia page). Format is 8 byte crawlHash of term (md5 of term two halves XOR'd), followed by a \x00, followed by the first 11 characters from the term. If there are not enough char's to make 20 bytes, then the string is padded with \x00s to 20 bytes.
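The 20 byte layout described for crawlHashWord() can be sketched in Python with hashlib. This follows only what the description states: an 8 byte hash formed by XORing the two halves of the term's md5, a \x00 byte, the first 11 bytes of the term, then \x00 padding up to 20 bytes. Any further handling the real PHP functions perform is not modelled, and the names crawl_hash and crawl_hash_word are hypothetical.

```python
import hashlib


def crawl_hash(data):
    """Return 8 bytes: the two halves of md5(data) XORed together."""
    digest = hashlib.md5(data).digest()  # 16 raw bytes
    return bytes(a ^ b for a, b in zip(digest[:8], digest[8:]))


def crawl_hash_word(term):
    """Return 20 bytes: 8-byte hash, a NUL, then up to 11 bytes of the term."""
    out = crawl_hash(term) + b"\x00" + term[:11]
    return out.ljust(20, b"\x00")  # pad short terms with NUL bytes
```

Keeping a readable prefix of the term after the hash means two ids can be compared on the hash first and on the suffix second.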
canonicalTerm()  : string
Takes a $term that might have come from a document and converts it to a string of 16 bytes which is either the original term padded by underscores or the first seven chars of the term followed by an underscore followed by the base64 encoding of the first 6 chars of its md5 hash.
compareWordHashes()  : int
Used to compare two ids for index dictionary lookup. ids are an 8 byte crawlHash together with a 12 byte non-hash suffix.
base64Hash()  : string
Converts a crawl hash number to something close to base64 coded, but chosen so that it doesn't get confused in URLs or DBs
unbase64Hash()  : string
Decodes a crawl hash number from base64 to raw ASCII
webencode()  : string
Encodes a string in a format suitable for post data (mainly, base64, but str_replace data that might mess up post in result)
webdecode()  : string
Decodes a string encoded by webencode
crawlCrypt()  : string
The crawlCrypt function is used to encrypt passwords stored in the database.
partitionByHash()  : array<string|int, mixed>
Used by a controller to take a table and return those rows in the table that a given queue_server would be responsible for handling
calculatePartition()  : int
Used by a controller to say which queue_server should receive a given input
changeInMicrotime()  : float
Measures the change in time in seconds between two timestamps to microsecond precision
microTimestamp()  : string
Timestamp of current epoch with microsecond precision useful for situations where time() might cause too many collisions (account creation, etc)
checkTimeInterval()  : int
Checks that a timestamp is within the time interval given by a start time (HH:mm) and a duration
convertPixels()  : int
Converts a CSS unit string into its equivalent in pixels. This is used by @see SvgProcessor.
countFiles()  : int
Returns the number of files in a folder
makePath()  : bool
Creates folders along a filesystem path if they don't exist
deleteFileOrDir()  : mixed
This is a callback function used in the process of recursively deleting a directory
setWorldPermissions()  : mixed
This is a callback function used in the process of recursively chmoding to 777 all files in a folder
fileInfo()  : an
This is a callback function used in the process of recursively calculating an array of file modification times and files sizes for a directory
orderCallback()  : int
Callback function used to sort documents by a field
stringOrderCallback()  : int
Callback function used to sort documents by a field where field is assumed to be a string
stringROrderCallback()  : int
Callback function used to sort documents by a field where field is assumed to be a string
rorderCallback()  : int
Callback function used to sort documents by a field in reverse order
lessThan()  : int
Callback to check if $a is less than $b
greaterThan()  : int
Callback to check if $a is greater than $b
e()  : mixed
shorthand for echo
remoteAddress()  : mixed
Compute the real remote address of the incoming connection including forwarding
readInput()  : string
Used to read a line of input from the command-line
readPassword()  : string
Used to read a line of input from the command-line (on unix machines without echoing it)
readMessage()  : string
Used to read several lines from the terminal up until a last line consisting of just a "."
mimeType()  : string
Returns the mime type of the provided file name if it can be determined.
generalIsA()  : bool
Checks if class_1 is the same as class_2 or has class_2 as a parent. Behaves like the 3 param version (last param true) of the PHP is_a function that came into being with Version 5.3.9.
stripAttributes()  : string
Given the contents of a start XML/HTML tag strips out all the attributes not listed in $safe_attribute_list
parseCsv()  : array<string|int, mixed>
Used to parse into a two dimensional array a string that contains CSV data.
arraytoCsv()  : string
Converts an array of values to a comma separated value formatted string.
diff()  : string
Computes a Unix-style diff of two strings. That is it only outputs lines which disagree between the two strings. It outputs +line if a line occurs in the second but not first string and -line if a line occurs in the first string but not the second.
computeLCS()  : mixed
Computes the longest common subsequence of two arrays
extractLCSFromTable()  : mixed
Extracts from a table of longest common sequence moves (probably calculated by @see computeLCS) and a starting coordinate $i, $j in that table, a longest common subsequence
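For readers unfamiliar with the technique behind computeLCS() and extractLCSFromTable(), here is a compact Python sketch of the textbook dynamic program: fill a table of subsequence lengths, then walk back from a coordinate (i, j) to read off one longest common subsequence. Yioop's table is described as holding moves rather than lengths, so this is an analogous illustration under that stated difference, with hypothetical function names.

```python
def compute_lcs_table(a, b):
    """table[i][j] = length of an LCS of a[:i] and b[:j]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def extract_lcs_from_table(table, a, b, i, j):
    """Walk back from (i, j) and return one LCS of a[:i] and b[:j]."""
    out = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])  # matching element belongs to the LCS
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    out.reverse()
    return out


def compute_lcs(a, b):
    """Convenience wrapper: one longest common subsequence of a and b."""
    table = compute_lcs_table(a, b)
    return extract_lcs_from_table(table, a, b, len(a), len(b))
```

A line-based diff can be built on top of this: lines of the first input that are missing from the subsequence are reported with a leading -, and lines of the second input that are missing are reported with a leading +.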
tail()  : array<string|int, mixed>
Returns an array of the last $num_lines many lines out of a file
lineFilter()  : array<string|int, mixed>
Given an array of lines returns a subarray of those lines containing the filter string or filter array
logLineTimestamp()  : int
Tries to extract a timestamp from a line which is presumed to come from a Yioop log file
isPositiveInteger()  : bool
Returns whether an input can be parsed to a positive integer
measureCall()  : mixed
Used to measure the memory footprint in bytes and time spent calling a method of an object. It also records the number of times the method has been called.
measureObject()  : mixed
Used to measure the memory footprint of an object in Yioop and save it to a statistics file. No recording is done until an initial call to the function measureCall(null, save_statistics_file) where save_statistics_file is the name of the file you want to store statistics to.
measureObjectCall()  : mixed
General method called by @see measureCall and @see measureObject. Used to measure the memory footprint in bytes of an object or memory and time spent calling a method of an object. It also records the number of times the method has been called. When used to call a method before initialization, just calls the method without any recording or timing. To initialize, an initial call to the function measureCall(null, save_statistics_file) where save_statistics_file is the name of the file you want to store statistics to should be done.
variableClone()  : mixed
Makes a deep copy of a variable regardless of its type
garbageCollect()  : int
Runs various system garbage collection functions and returns number of bytes freed.
utf8SafeSaveHtml()  : string
The dom method saveHTML has a tendency to replace UTF-8, non-ascii characters with html entities. This function is meant to save the HTML while avoiding that replacement.
utf8WordWrap()  : string
A UTF-8 safe version of PHP's wordwrap function that wraps a string to a given number of characters
upgradeDatabaseVersion1()  : mixed
Upgrades a Version 0 version of the Yioop database to a Version 1 version
upgradeDatabaseVersion2()  : mixed
Upgrades a Version 1 version of the Yioop database to a Version 2 version
upgradeDatabaseVersion3()  : mixed
Upgrades a Version 2 version of the Yioop database to a Version 3 version
upgradeDatabaseVersion4()  : mixed
Upgrades a Version 3 version of the Yioop database to a Version 4 version
upgradeDatabaseVersion5()  : mixed
Upgrades a Version 4 version of the Yioop database to a Version 5 version
upgradeDatabaseVersion6()  : mixed
Upgrades a Version 5 version of the Yioop database to a Version 6 version
upgradeDatabaseVersion7()  : mixed
Upgrades a Version 6 version of the Yioop database to a Version 7 version
upgradeDatabaseVersion8()  : mixed
Upgrades a Version 7 version of the Yioop database to a Version 8 version
upgradeDatabaseVersion9()  : mixed
Upgrades a Version 8 version of the Yioop database to a Version 9 version
upgradeDatabaseVersion10()  : mixed
Upgrades a Version 9 version of the Yioop database to a Version 10 version
upgradeDatabaseVersion11()  : mixed
Upgrades a Version 10 version of the Yioop database to a Version 11 version
upgradeDatabaseVersion12()  : mixed
Upgrades a Version 11 version of the Yioop database to a Version 12 version
upgradeDatabaseVersion13()  : mixed
Upgrades a Version 12 version of the Yioop database to a Version 13 version
upgradeDatabaseVersion14()  : mixed
Upgrades a Version 13 version of the Yioop database to a Version 14 version
upgradeDatabaseVersion15()  : mixed
Upgrades a Version 14 version of the Yioop database to a Version 15 version
upgradeDatabaseVersion16()  : mixed
Upgrades a Version 15 version of the Yioop database to a Version 16 version
upgradeDatabaseVersion17()  : mixed
Upgrades a Version 16 version of the Yioop database to a Version 17 version
upgradeDatabaseVersion18()  : mixed
Upgrades a Version 17 version of the Yioop database to a Version 18 version
upgradeDatabaseVersion19()  : mixed
Upgrades a Version 18 version of the Yioop database to a Version 19 version. This update has been superseded by the Version20 update and so its contents have been eliminated.
upgradeDatabaseVersion20()  : mixed
Upgrades a Version 19 version of the Yioop database to a Version 20 version. This is a major upgrade as the user tables have changed. This also acts as a cumulative upgrade since version 0.98. It involves a web form that has only been localized to English.
upgradeDatabaseVersion21()  : mixed
Upgrades a Version 20 version of the Yioop database to a Version 21 version
upgradeDatabaseVersion22()  : mixed
Upgrades a Version 21 version of the Yioop database to a Version 22 version
upgradeDatabaseVersion23()  : mixed
Upgrades a Version 22 version of the Yioop database to a Version 23 version
upgradeDatabaseVersion24()  : mixed
Upgrades a Version 23 version of the Yioop database to a Version 24 version
upgradeDatabaseVersion25()  : mixed
Upgrades a Version 24 version of the Yioop database to a Version 25 version. This version upgrade includes creation of a Help group that holds help pages.
upgradeDatabaseVersion26()  : mixed
Upgrades a Version 25 version of the Yioop database to a Version 26 version. This version upgrade includes an update of the Help pages in the database to work with the changes to the way hyperlinks are specified in wiki markup.
upgradeDatabaseVersion27()  : mixed
Upgrades a Version 26 version of the Yioop database to a Version 27 version
upgradeDatabaseVersion28()  : mixed
Upgrades a Version 27 version of the Yioop database to a Version 28 version
upgradeDatabaseVersion29()  : mixed
Upgrades a Version 28 version of the Yioop database to a Version 29 version
upgradeDatabaseVersion30()  : mixed
Upgrades a Version 29 version of the Yioop database to a Version 30 version
upgradeDatabaseVersion31()  : mixed
Upgrades a Version 30 version of the Yioop database to a Version 31 version
upgradeDatabaseVersion32()  : mixed
Upgrades a Version 31 version of the Yioop database to a Version 32 version
upgradeDatabaseVersion33()  : mixed
Upgrades a Version 32 version of the Yioop database to a Version 33 version
upgradeDatabaseVersion34()  : mixed
Upgrades a Version 33 version of the Yioop database to a Version 34 version
upgradeDatabaseVersion35()  : mixed
Upgrades a Version 34 version of the Yioop database to a Version 35 version
upgradeDatabaseVersion36()  : mixed
Upgrades a Version 35 version of the Yioop database to a Version 36 version
upgradeDatabaseVersion37()  : mixed
Upgrades a Version 36 version of the Yioop database to a Version 37 version
upgradeDatabaseVersion38()  : mixed
Upgrades a Version 37 version of the Yioop database to a Version 38 version
upgradeDatabaseVersion39()  : mixed
Upgrades a Version 38 version of the Yioop database to a Version 39 version
upgradeDatabaseVersion40()  : mixed
Upgrades a Version 39 version of the Yioop database to a Version 40 version
upgradeDatabaseVersion41()  : mixed
Upgrades a Version 40 version of the Yioop database to a Version 41 version
upgradeDatabaseVersion42()  : mixed
Upgrades a Version 41 version of the Yioop database to a Version 42 version
upgradeDatabaseVersion43()  : mixed
Upgrades a Version 42 version of the Yioop database to a Version 43 version
upgradeDatabaseVersion44()  : mixed
Upgrades a Version 43 version of the Yioop database to a Version 44 version
upgradeDatabaseVersion45()  : mixed
Upgrades a Version 44 version of the Yioop database to a Version 45 version
upgradeDatabaseVersion46()  : mixed
Upgrades a Version 45 version of the Yioop database to a Version 46 version
upgradeDatabaseVersion47()  : mixed
Upgrades a Version 46 version of the Yioop database to a Version 47 version
upgradeDatabaseVersion48()  : mixed
Upgrades a Version 47 version of the Yioop database to a Version 48 version
upgradeDatabaseVersion49()  : mixed
Upgrades a Version 48 version of the Yioop database to a Version 49 version
upgradeDatabaseVersion50()  : mixed
Upgrades a Version 49 version of the Yioop database to a Version 50 version
upgradeDatabaseVersion51()  : mixed
Upgrades a Version 50 version of the Yioop database to a Version 51 version
upgradeDatabaseVersion52()  : mixed
Upgrades a Version 51 version of the Yioop database to a Version 52 version
upgradeDatabaseVersion53()  : mixed
Upgrades a Version 52 version of the Yioop database to a Version 53 version
upgradeDatabaseVersion54()  : mixed
Upgrades a Version 53 version of the Yioop database to a Version 54 version
upgradeDatabaseVersion55()  : mixed
Upgrades a Version 54 version of the Yioop database to a Version 55 version
upgradeDatabaseVersion57()  : mixed
Upgrades a Version 56 version of the Yioop database to a Version 57 version
upgradeDatabaseVersion58()  : mixed
Upgrades a Version 57 version of the Yioop database to a Version 58 version
upgradeDatabaseVersion59()  : mixed
Upgrades a Version 58 version of the Yioop database to a Version 59 version
upgradeDatabaseVersion60()  : mixed
Upgrades a Version 59 version of the Yioop database to a Version 60 version
upgradeDatabaseVersion61()  : mixed
Upgrades a Version 60 version of the Yioop database to a Version 61 version
upgradeDatabaseVersion62()  : mixed
Upgrades a Version 61 version of the Yioop database to a Version 62 version
upgradeDatabaseVersion64()  : mixed
Upgrades a Version 63 version of the Yioop database to a Version 64 version
upgradeDatabaseVersion65()  : mixed
Upgrades a Version 64 version of the Yioop database to a Version 65 version
upgradeDatabaseVersion66()  : mixed
Upgrades a Version 65 version of the Yioop database to a Version 66 version
upgradeDatabaseVersion67()  : mixed
Upgrades a Version 66 version of the Yioop database to a Version 67 version
upgradeDatabaseVersion68()  : mixed
Upgrades a Version 67 version of the Yioop database to a Version 68 version
upgradeDatabaseVersion69()  : mixed
Upgrades a Version 68 version of the Yioop database to a Version 69 version
upgradeDatabaseVersion70()  : mixed
Upgrades a Version 69 version of the Yioop database to a Version 70 version
upgradeDatabaseVersion71()  : mixed
Upgrades a Version 70 version of the Yioop database to a Version 71 version
upgradeDatabaseVersion72()  : mixed
Upgrades a Version 71 version of the Yioop database to a Version 72 version
upgradeDatabaseVersion73()  : mixed
Upgrades a Version 72 version of the Yioop database to a Version 73 version
upgradeDatabaseVersion74()  : mixed
Upgrades a Version 73 version of the Yioop database to a Version 74 version
upgradeDatabaseVersion75()  : mixed
Upgrades a Version 74 version of the Yioop database to a Version 75 version
upgradeDatabaseVersion76()  : mixed
Upgrades a Version 75 version of the Yioop database to a Version 76 version
upgradeDatabaseVersion77()  : mixed
Upgrades a Version 76 version of the Yioop database to a Version 77 version
upgradeDatabaseVersion78()  : mixed
Upgrades a Version 77 version of the Yioop database to a Version 78 version
upgradeDatabaseVersion79()  : mixed
Upgrades a Version 78 version of the Yioop database to a Version 79 version
upgradeDatabaseVersion80()  : mixed
Upgrades a Version 79 version of the Yioop database to a Version 80 version
upgradeDatabaseVersion81()  : mixed
Upgrades a Version 80 version of the Yioop database to a Version 81 version
webExit()  : mixed
Function to call instead of exit() to indicate that the script processing the current web page is done processing. Use this rather than exit(), as exit() will also terminate WebSite.
makeTableCallback()  : mixed
Callback used by a preg_replace_callback in nextPage to make a table
citeCallback()  : string
Used to convert {{cite }} to a numbered link to a citation
fixLinksCallback()  : string
Used to change spaces to underscores in links generated from our earlier matching rules
base64EncodeCallback()  : string
Callback used to base64 encode the contents of nowiki tags so they won't be manipulated by wiki replacements.
spaceEncodeCallback()  : string
Callback used to encode the contents of pre tags so they won't accidentally get sub-pre tags because a bunch of leading lines have spaces
spanEncodeCallback()  : string
Callback used to encode the contents of span tags so that newlines within them don't accidentally get treated as new wiki paragraphs
base64DecodeCallback()  : string
Callback used to base64 decode the contents of previously base64 encoded (@see base64EncodeCallback) nowiki tags after all mediawiki substitutions have been done
spaceDecodeCallback()  : string
Cleans up pre tags after other wiki rules applied

Functions

main()

Command-line shell for testing the class

main() : mixed
Return values
mixed

localesWithStopwordsList()

Returns an array of locales that have a stop words list and a stop words remover method

localesWithStopwordsList() : array<string|int, mixed>
Return values
array<string|int, mixed>

list of locales that have a stopwords list;

localeTagToIso639_2Tag()

Converts a $locale_tag (major-minor) to an ISO 639-2 language name

localeTagToIso639_2Tag(string $locale_tag) : string
Parameters
$locale_tag : string

locale tag we want to convert

Return values
string

corresponding ISO 639-2 language tag

guessLocale()

Attempts to guess the user's locale based on the request, session, and user-agent data

guessLocale() : string
Return values
string

IANA language tag of the guessed locale

guessLocaleFromString()

Attempts to guess the user's locale based on a string sample

guessLocaleFromString(string $phrase_string[, string $locale_tag = null ]) : string
Parameters
$phrase_string : string

used to make guess

$locale_tag : string = null

language tag to use if can't guess -- if not provided uses current locale's value

Return values
string

IANA language tag of the guessed locale

checkQuery()

Tries to find whether query belongs to a programming language

checkQuery(string $query) : string
Parameters
$query : string

query entered by user

Return values
string

$lang programming language for the query provided

guessLangEncoding()

Tries to guess at a language tag based on the name of a character encoding

guessLangEncoding(string $encoding) : string
Parameters
$encoding : string

a character encoding name

Return values
string

guessed language tag

guessEncodingHtmlXml()

Tries to guess the encoding used for an Html document

guessEncodingHtmlXml(string $html[, string $return_loc_info = false ]) : mixed
Parameters
$html : string

a string containing the HTML or XML document whose encoding is to be guessed

$return_loc_info : string = false

if meta http-equiv info was used to find the encoding, then if $return_loc_info is true, we return the location of charset substring. This allows converting to UTF-8 later so cached pages will display correctly and redirects without char encoding won't be given a different hash.

Return values
mixed

either string or array if string then guessed encoding, if array guessed encoding, start_pos of where charset info came from, length

convertUtf8IfNeeded()

Converts page data in a site associative array to UTF-8 if it is not already in UTF-8

convertUtf8IfNeeded(array<string|int, mixed> &$site, string $page_field, string $encoding_field[, function $log_function = "" ]) : mixed
Parameters
$site : array<string|int, mixed>

an associative array of info about a web site

$page_field : string

the field in the associative array that contains the $site's web page as a string.

$encoding_field : string

the field in the associative array that contains the character encoding the page is currently in

$log_function : function = ""

a callback function used to write log messages with, if desired.

Return values
mixed

tl()

Translate the supplied arguments into the current locale.

tl() : string

This function takes a variable number of arguments. The first being an identifier to translate. Additional arguments are used to interpolate values in for %s's in the translation.

Return values
string

translated string
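
The interpolation described for tl() can be pictured with a rough sketch. Everything here is illustrative: sketchTranslate() and the $translations array are hypothetical names, and falling back to the identifier when no translation exists is an assumption rather than documented Yioop behavior.

```php
<?php
/**
 * Hypothetical stand-in for a translation lookup with %s interpolation.
 * $translations maps string identifiers to templates for one locale.
 */
function sketchTranslate(array $translations, string $string_id,
    ...$args): string
{
    // Assumption: an unknown identifier is returned unchanged
    $template = $translations[$string_id] ?? $string_id;
    // Additional arguments fill the %s placeholders in order
    return $args ? vsprintf($template, $args) : $template;
}
```

A call such as sketchTranslate($translations, 'search_results', 10, 'yioop') would fill two %s placeholders in the template stored under 'search_results'.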

setLocaleObject()

Sets the language to be used for locale settings

setLocaleObject(string $locale_tag) : mixed
Parameters
$locale_tag : string

the tag of the language to use to determine locale settings

Return values
mixed

getLocaleTag()

Gets the language tag (for instance, en_US for American English) of the locale that is currently being used. This function has the side effect of setting Yioop's current locale.

getLocaleTag() : string
Return values
string

the tag of the language currently being used for locale settings

getLocaleDirection()

Returns the current language direction.

getLocaleDirection() : string
Return values
string

ltr or rtl depending on if the language is left-to-right or right-to-left

getLocaleQueryStatistics()

Returns the query statistics info for the current locale.

getLocaleQueryStatistics() : array<string|int, mixed>
Return values
array<string|int, mixed>

consisting of queries and elapsed times for locale computations

getBlockProgression()

Returns the current locale's method of writing blocks (things like divs or paragraphs). A language like English puts blocks one after another from the top of the page to the bottom. Other languages like classical Chinese list them from right to left.

getBlockProgression() : string
Return values
string

tb, lr, or rl depending on the current locale's block progression

getWritingMode()

Returns the writing mode of the current locale. This is a combination of the locale direction and the block progression. For instance, for English the writing mode is lr-tb (left-to-right top-to-bottom).

getWritingMode() : string
Return values
string

the locale's writing mode

w1256ToUTF8()

Convert the string $str encoded in Windows-1256 into UTF-8

w1256ToUTF8(string $str) : string
Parameters
$str : string

Windows-1256 string to convert

Return values
string

the UTF-8 equivalent

utf8chr()

Given a unicode codepoint convert it to UTF-8

utf8chr(int $code) : string
Parameters
$code : int

the codepoint to convert

Return values
string

the corresponding UTF-8 string
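
As an illustration of what converting a code point to UTF-8 involves, here is a minimal sketch of the standard UTF-8 bit layout. sketchUtf8Chr() is a hypothetical name; it is not Yioop's utf8chr() and does no validation of surrogates or out-of-range values.

```php
<?php
/**
 * Hypothetical helper: encodes a single Unicode code point as UTF-8
 * using the standard 1 to 4 byte layouts.
 */
function sketchUtf8Chr(int $code): string
{
    if ($code < 0x80) {
        return chr($code); // 0xxxxxxx
    }
    if ($code < 0x800) { // 110xxxxx 10xxxxxx
        return chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
    }
    if ($code < 0x10000) { // 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($code >> 12)) .
            chr(0x80 | (($code >> 6) & 0x3F)) .
            chr(0x80 | ($code & 0x3F));
    }
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($code >> 18)) .
        chr(0x80 | (($code >> 12) & 0x3F)) .
        chr(0x80 | (($code >> 6) & 0x3F)) .
        chr(0x80 | ($code & 0x3F));
}
```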

formatDateByLocale()

Function for formatting a date string based on the locale.

formatDateByLocale( $timestamp,  $locale_tag) : string
Parameters
$timestamp :

is the crawl time

$locale_tag :

is the tag for locale

Return values
string

formatted date string

upgradeLocalesCheck()

Checks to see if the locale data of Yioop! of a locale in the work dir is older than the currently running Yioop!

upgradeLocalesCheck(string $locale_tag) : mixed
Parameters
$locale_tag : string

locale to check directory of

Return values
mixed

upgradeLocales()

If the locale data of Yioop! in the work directory is older than the currently running Yioop! then this function is called to at least try to copy the new strings into the old profile.

upgradeLocales() : mixed
Return values
mixed

upgradePublicHelpWiki()

Used to force push the default Public and Wiki pages into the current database

upgradePublicHelpWiki(resource &$db) : mixed
Parameters
$db : resource

datasource to use to upgrade

Return values
mixed

upgradeDatabaseWorkDirectoryCheck()

Checks to see if the database data or work_dir folder of Yioop! is from an older version of Yioop! than the currently running Yioop!

upgradeDatabaseWorkDirectoryCheck() : mixed
Return values
mixed

upgradeDatabaseWorkDirectory()

If the database data of Yioop is older than the version of the currently running Yioop then this function is called to try to upgrade the database to the new version

upgradeDatabaseWorkDirectory() : mixed
Return values
mixed

updateVersionNumber()

Update the database version number to a new number

updateVersionNumber(object &$db, int $number) : mixed
Parameters
$db : object

datasource for Yioop database

$number : int

the new database number

Return values
mixed

getWikiHelpPages()

Reads the Help articles from default db and returns the array of pages.

getWikiHelpPages() : mixed
Return values
mixed

addActivityAtId()

Used to insert a new activity into the database at a given activity_id

addActivityAtId(resource &$db, string $string_id, string $method_name, int $activity_id) : mixed

Inserting at an ID rather than at the end is useful since activities are displayed in admin panel in order of increasing id.

Parameters
$db : resource

database handle where Yioop database stored

$string_id : string

message identifier to give translations for the activity

$method_name : string

admin_controller method to be called to perform this activity

$activity_id : int

the id location at which to create this activity; activities at and below this location will be shifted down by 1.

Return values
mixed

updateTranslationForStringId()

Adds or replaces a translation for a database message string for a given IANA locale tag.

updateTranslationForStringId(resource &$db, string $string_id, string $locale_tag, string $translation) : mixed
Parameters
$db : resource

database handle where Yioop database stored

$string_id : string

message identifier to give translation for

$locale_tag : string

the IANA language tag to update the strings of

$translation : string

the translation for $string_id in the language $locale_tag

Return values
mixed

addRegexDelimiters()

Adds delimiters to a regex that may or may not have them

addRegexDelimiters(string $expression) : string
Parameters
$expression : string

a regex

Return values
string

regex with delimiters added if they were not already there

preg_search()

Searches for a PCRE pattern in a subject from a given offset; returns the position of the first match if found, -1 otherwise.

preg_search(string $pattern, string $subject, int $offset[, bool $return_match = false ]) : mixed
Parameters
$pattern : string

a Perl compatible regular expression

$subject : string

to search for pattern in

$offset : int

character offset into $subject to begin searching from

$return_match : bool = false

whether to return as well what the match was for the pattern

Return values
mixed

if $return_match is false then the integer position of first match, otherwise, it returns the ordered pair [$pos, $match].
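
A search with these semantics can be approximated with PHP's preg_match() and the PREG_OFFSET_CAPTURE flag. The sketch below uses a hypothetical name, sketchPregSearch(); what to return for a failed search when $return_match is true is an assumption, since the description above does not say. Note that preg_match() offsets are byte offsets.

```php
<?php
/**
 * Hypothetical sketch of an offset-based regex search.
 * Returns the byte position of the first match at or after $offset,
 * or -1 if there is none.
 */
function sketchPregSearch(string $pattern, string $subject,
    int $offset = 0, bool $return_match = false)
{
    $found = preg_match($pattern, $subject, $matches,
        PREG_OFFSET_CAPTURE, $offset);
    if ($found === 1) {
        // With PREG_OFFSET_CAPTURE each entry is [text, position]
        [$match, $pos] = $matches[0];
        return $return_match ? [$pos, $match] : $pos;
    }
    // Assumption: pair -1 with an empty match on failure
    return $return_match ? [-1, ""] : -1;
}
```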

preg_offset_replace()

Replaces a pcre pattern with a replacement in $subject starting from some offset.

preg_offset_replace(string $pattern, string $replacement, string $subject, int $offset) : string
Parameters
$pattern : string

a Perl compatible regular expression

$replacement : string

what to replace the pattern with

$subject : string

to search for pattern in

$offset : int

character offset into $subject to begin searching from

Return values
string

result of the replacements
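
One simple way to get replacement-from-an-offset behavior is to leave the prefix untouched and run preg_replace() on the remainder. This is a sketch under that assumption, with the hypothetical name sketchPregOffsetReplace(); because the prefix is cut off, anchors and lookbehinds see only the suffix, which may differ from how Yioop's function behaves.

```php
<?php
/**
 * Hypothetical sketch: replace matches of $pattern only in the part of
 * $subject starting at byte offset $offset.
 */
function sketchPregOffsetReplace(string $pattern, string $replacement,
    string $subject, int $offset): string
{
    $prefix = substr($subject, 0, $offset);
    $suffix = substr($subject, $offset);
    // preg_replace returns null on error; cast keeps the return a string
    return $prefix . (string) preg_replace($pattern, $replacement, $suffix);
}
```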

parse_ini_with_fallback()

Yioop replacement for parse_ini_file($name, true) in case parse_ini_file is on the disable_functions list. Name has underscores to match original function. This function checks if parse_ini_file is disabled or not. If not, it just calls parse_ini_file; otherwise, it simulates it enough so that configure.ini files used for string translations can be read.

parse_ini_with_fallback(string $file) : array<string|int, mixed>
Parameters
$file : string

filename of ini data to parse into an array

Return values
array<string|int, mixed>

data parsed from file

getIniAssignMatch()

Auxiliary function called from parse_ini_with_fallback to extract from the $matches array produced by the former function's preg_match what kind of assignment occurred in the ini file being parsed.

getIniAssignMatch(string $matches) : mixed
Parameters
$matches : string

produced by a preg_match in parse_ini_with_fallback

Return values
mixed

value of ini file assignment

charCopy()

Copies from $source string beginning at position $start, $length many bytes to destination string

charCopy(string $source, string &$destination, int $start, int $length[, string $timeout_msg = "" ]) : mixed
Parameters
$source : string

string to copy from

$destination : string

string to copy to

$start : int

starting offset

$length : int

number of bytes to copy

$timeout_msg : string = ""

message to print if taking more than 30 seconds

Return values
mixed

vByteEncode()

Encodes an integer using variable byte coding.

vByteEncode(int $pos_int) : string
Parameters
$pos_int : int

integer to encode

Return values
string

a string of 1-5 chars depending on how big $pos_int was

vByteDecode()

Decodes from a string using variable byte coding an integer.

vByteDecode(string $str, int &$offset) : int
Parameters
$str : string

string to use for decoding

$offset : int

byte offset into string where var int stored

Return values
int

the decoded integer
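
To make variable byte coding concrete, here is a minimal sketch of one common variant: 7 payload bits per byte, low-order group first, with the high bit set on every byte except the last. The function names are hypothetical and the byte order and flag convention are assumptions; Yioop's vByteEncode() and vByteDecode() may lay out the bytes differently. The 1-5 byte range matches 32-bit values, since 5 x 7 = 35 bits.

```php
<?php
/** Hypothetical sketch: variable byte encode a nonnegative integer. */
function sketchVByteEncode(int $pos_int): string
{
    $out = "";
    while ($pos_int >= 0x80) {
        // Low 7 bits plus a continuation flag in the high bit
        $out .= chr(0x80 | ($pos_int & 0x7F));
        $pos_int >>= 7;
    }
    return $out . chr($pos_int); // final byte has the high bit clear
}

/** Hypothetical sketch: decode one integer and advance $offset past it. */
function sketchVByteDecode(string $str, int &$offset): int
{
    $value = 0;
    $shift = 0;
    $len = strlen($str);
    while ($offset < $len) {
        $byte = ord($str[$offset++]);
        $value |= ($byte & 0x7F) << $shift;
        if ($byte < 0x80) {
            break; // no continuation flag: last byte of this integer
        }
        $shift += 7;
    }
    return $value;
}
```

Because $offset is passed by reference, repeated calls walk through a string holding several encoded integers back to back.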

appendUnary()

Appends a number re-encoded in unary to the end of an input string starting at a given bit offset into the string. Here n in unary has bit representation n-1 0's followed by a 1.

appendUnary(int $number, mixed $input, mixed &$start_bit_offset[, mixed $just_bit_offset = false ]) : mixed
Parameters
$number : int

number to append

$input : mixed
$start_bit_offset : mixed
$just_bit_offset : mixed = false
Return values
mixed

either the resulting string or its length

decodeUnary()

Decodes a unary number from an input string at a given bit offset. Here n in unary has bit representation n-1 0's followed by a 1.

decodeUnary(string $input, int &$start_bit_offset) : int
Parameters
$input : string

the string that we want to decode a unary number from

$start_bit_offset : int

the starting bit offset in $input to start decoding from. After the call it will be the position after the decode

Return values
int

the decoded unary number
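
The unary convention stated above (n is written as n-1 0's followed by a 1) can be shown on readable strings of "0" and "1" characters. This is only a sketch with hypothetical function names; the documented functions operate on packed bits inside a byte string, not on character strings.

```php
<?php
/** Hypothetical sketch: unary code of $n >= 1 as a "0"/"1" string. */
function sketchUnaryBits(int $n): string
{
    if ($n < 1) {
        throw new InvalidArgumentException("unary code needs n >= 1");
    }
    return str_repeat("0", $n - 1) . "1";
}

/**
 * Hypothetical sketch: decode one unary number starting at $bit_offset
 * and move $bit_offset to the position after it.
 */
function sketchDecodeUnaryBits(string $bits, int &$bit_offset): int
{
    $one_pos = strpos($bits, "1", $bit_offset);
    if ($one_pos === false) {
        throw new InvalidArgumentException("no terminating 1 bit found");
    }
    $n = $one_pos - $bit_offset + 1; // zeros skipped, plus the 1 itself
    $bit_offset = $one_pos + 1;
    return $n;
}
```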

appendBits()

Appends $num_bits bits from the start of the binary rep of $number beginning at offset $start_bit_offset of $input string overwriting any bits present. If $num_bits == -1, then appends all of $number.

appendBits(int $number, string $input, int &$start_bit_offset[,  $num_bits = -1 ]) : string
Parameters
$number : int

to append

$input : string

the string to append to.

$start_bit_offset : int

starting location to begin append from

$num_bits : = -1

number of bits of $number to append.

Return values
string

resulting string

decodeBits()

Decodes $num_bits many bits from the $input string beginning at offset $start_bit_offset. As a side effect of this operation, $start_bit_offset is increased by the number of bits that were able to be decoded.

decodeBits(string $input, int &$start_bit_offset, int $num_bits) : int
Parameters
$input : string

string to decode bits from

$start_bit_offset : int

bit offset to start decoding from in $input

$num_bits : int

number of bits to try to decode

Return values
int

the number decoded

appendGamma()

Appends gamma code of $number beginning at offset $start_bit_offset of $input string overwriting any bits present. $start_bit_offset is updated to bit position after append.

appendGamma(int $number, string $input, int &$start_bit_offset) : string
Parameters
$number : int

to append

$input : string

the string to append to.

$start_bit_offset : int

starting bit location to begin append from

Return values
string

resulting string

decodeGammaList()

Decodes up to $num_decode gamma encoded integers beginning at $start_bit_offset. $start_bit_offset is updated to the bit position after the decoded integers.

decodeGammaList(string $input, int &$start_bit_offset, int $num_decode) : array<string|int, mixed>
Parameters
$input : string

the string to decode from

$start_bit_offset : int

starting bit location to decode from

$num_decode : int

number of int's to decode

Return values
array<string|int, mixed>

decoded int's
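
For readers unfamiliar with gamma codes, the sketch below shows the standard Elias gamma code on strings of "0" and "1" characters: a number x >= 1 whose binary form has L digits is written as L-1 zeros followed by those L digits. The function names are hypothetical, and this is the textbook scheme, not necessarily the exact bit layout used by appendGamma() and decodeGammaList().

```php
<?php
/** Hypothetical sketch: Elias gamma code of $x >= 1 as a bit string. */
function sketchGammaBits(int $x): string
{
    if ($x < 1) {
        throw new InvalidArgumentException("gamma code needs x >= 1");
    }
    $binary = decbin($x);
    return str_repeat("0", strlen($binary) - 1) . $binary;
}

/**
 * Hypothetical sketch: decode up to $num_decode gamma coded integers,
 * advancing $bit_offset past everything that was decoded.
 */
function sketchDecodeGammaList(string $bits, int &$bit_offset,
    int $num_decode): array
{
    $out = [];
    $total = strlen($bits);
    while (count($out) < $num_decode && $bit_offset < $total) {
        $one_pos = strpos($bits, "1", $bit_offset);
        if ($one_pos === false) {
            break; // only padding zeros remain
        }
        $len = $one_pos - $bit_offset + 1; // digit count of the number
        if ($one_pos + $len > $total) {
            break; // truncated code
        }
        $out[] = bindec(substr($bits, $one_pos, $len));
        $bit_offset = $one_pos + $len;
    }
    return $out;
}
```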

appendRiceSequence()

Appends using a Rice coding a sequence of integers $int_sequence at offset $start_bit_offset to the string $output, overwriting any bits present at that location. $start_bit_offset is updated to bit position after append.

appendRiceSequence(array<string|int, mixed> $int_sequence, int $modulus, string $output, int &$start_bit_offset[, int $delta_start = -1 ]) : string

Encoding is done as a difference list. If $delta_start is set to a value >= 0 then the first gap is assumed to be from int $delta_start

Parameters
$int_sequence : array<string|int, mixed>

int's to append

$modulus : int

i in the 2^i modulus to use for Rice code

$output : string

the string to append to.

$start_bit_offset : int

starting bit location to begin append from

$delta_start : int = -1

if >= 0 previous int to use for difference list otherwise the first integer is encoded as itself rather than a difference

Return values
string

resulting string

decodeRiceSequence()

Decodes up to $num_decode rice encoded difference list of integers beginning at $start_bit_offset. $start_bit_offset is updated to the bit position after the decoded integers. If $delta_start >= 0 then the first int is assumed to be the difference from $delta_start;

decodeRiceSequence(string $input, int &$start_bit_offset, int $num_decode[, int $delta_start = -1 ]) : array<string|int, mixed>
Parameters
$input : string

the string to decode from

$start_bit_offset : int

starting bit location to decode from

$num_decode : int

number of int's to decode

$delta_start : int = -1

if >= 0 previous int to use for difference list otherwise the first integer is decoded as itself rather than a difference

Return values
array<string|int, mixed>

decoded int's
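
The combination of a difference list with Rice coding can be sketched as follows, again on "0"/"1" character strings rather than packed bits. With modulus 2^i, each gap is split into a quotient (gap >> i), written as that many zeros followed by a 1, and a remainder written in i bits. The function names are hypothetical, the input is assumed to be a nondecreasing list of nonnegative integers, and the exact bit conventions of Yioop's appendRiceSequence() may differ.

```php
<?php
/** Hypothetical sketch: Rice code the gaps of $int_sequence. */
function sketchRiceBits(array $int_sequence, int $i,
    int $delta_start = -1): string
{
    $bits = "";
    // With no $delta_start the first integer is coded as itself
    $previous = ($delta_start >= 0) ? $delta_start : 0;
    foreach ($int_sequence as $value) {
        $gap = $value - $previous;
        $previous = $value;
        $bits .= str_repeat("0", $gap >> $i) . "1"; // quotient part
        if ($i > 0) { // remainder in exactly $i bits
            $bits .= str_pad(decbin($gap & ((1 << $i) - 1)), $i, "0",
                STR_PAD_LEFT);
        }
    }
    return $bits;
}

/** Hypothetical sketch: inverse of sketchRiceBits(). */
function sketchDecodeRiceBits(string $bits, int $i, int $num_decode,
    int $delta_start = -1): array
{
    $out = [];
    $offset = 0;
    $previous = ($delta_start >= 0) ? $delta_start : 0;
    while (count($out) < $num_decode) {
        $one_pos = strpos($bits, "1", $offset);
        if ($one_pos === false || $one_pos + 1 + $i > strlen($bits)) {
            break; // nothing left, or a truncated code
        }
        $quotient = $one_pos - $offset;
        $remainder = ($i > 0) ?
            bindec(substr($bits, $one_pos + 1, $i)) : 0;
        $offset = $one_pos + 1 + $i;
        $previous += ($quotient << $i) | $remainder; // undo the delta
        $out[] = $previous;
    }
    return $out;
}
```

For example, [3, 7, 20] has gaps 3, 4, 13; with i = 2 these split into (quotient, remainder) pairs (0, 3), (1, 0), (3, 1).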

encodePositionList()

Encodes a list of integer positions of a term in a document. This is done as a gamma code of the first integer followed by the Rice coding of the remaining integers using a modulus based on the average gap between integers. If the number of positions is 1 or 2 then a gamma of each position only is used.

encodePositionList(array<string|int, mixed> $positions) : string
Parameters
$positions : array<string|int, mixed>

integer term positions

Return values
string

encoded position list

decodePositionList()

Decodes up to $num_decode term in document position integers from string $input under the assumption $input is encoded as per encodePositionList.

decodePositionList(string $input, int $num_decode) : array<string|int, mixed>
Parameters
$input : string

string to decode from

$num_decode : int

number of integer to decode

Tags
see
encodePositionList

Return values
array<string|int, mixed>

decoded positions

encode255()

Recodes a string in a 1-1 fashion to a string not involving \xFF (255). I.e., it maps characters \xFE -> \xFE\xFD and \xFF -> \xFE\xFE

encode255(string $str) : string
Parameters
$str : string

to be encoded

Return values
string

encoded string without \xFF

decode255()

Decodes a string in a 1-1 fashion from a string not involving \xFF (255). I.e., it maps characters \xFE\xFE -> \xFF and \xFE\xFD -> \xFE

decode255(string $str) : string
Parameters
$str : string

to be decoded

Return values
string

decoded string

encodeUnderscore()

Recodes a string in a 1-1 fashion to a string not involving underscore (_). I.e., it maps characters - -> -- and _ -> -=

encodeUnderscore(string $str) : string
Parameters
$str : string

to be encoded

Return values
string

encoded string without _

decodeUnderscore()

Decodes a string in a 1-1 fashion from a string not involving underscore (_). I.e., it maps characters -= -> _ and -- -> -

decodeUnderscore(string $str) : string
Parameters
$str : string

to be decoded

Return values
string

decoded string
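
The underscore mapping described above (- becomes -- and _ becomes -=) is easy to express with PHP's strtr(), which replaces all pairs in a single left-to-right pass and so never re-processes its own output. This is a sketch with hypothetical function names, not Yioop's implementation.

```php
<?php
/** Hypothetical sketch: remove underscores reversibly. */
function sketchEncodeUnderscore(string $str): string
{
    return strtr($str, ["-" => "--", "_" => "-="]);
}

/** Hypothetical sketch: inverse of sketchEncodeUnderscore(). */
function sketchDecodeUnderscore(string $str): string
{
    // Every "-" in encoded text starts a two character escape
    return strtr($str, ["--" => "-", "-=" => "_"]);
}
```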

packEncode255()

Encodes a list of strings as their @see encode255 versions separated by \xFF's

packEncode255(array<string|int, mixed> $strs) : string
Parameters
$strs : array<string|int, mixed>

strings to encode as a single string

Return values
string

encoded list

unpackDecode255()

Decodes a list of strings from a string that was encoded as the @see encode255 versions of its elements separated by \xFF's

unpackDecode255(string $encoded_strs) : array<string|int, mixed>
Parameters
$encoded_strs : string

string to decode into a list of strings

Return values
array<string|int, mixed>

decoded list
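
The \xFF-free recoding and the packing of several strings with \xFF separators can be sketched together. The mapping used is \xFE to \xFE\xFD and \xFF to \xFE\xFE, inverted on decode. All four function names are hypothetical, and the handling of an empty list is an artifact of this sketch (explode() on an empty string yields one empty string), not a statement about Yioop's functions.

```php
<?php
/** Hypothetical sketch: recode so the result contains no \xFF byte. */
function sketchEncode255(string $str): string
{
    return strtr($str, ["\xFE" => "\xFE\xFD", "\xFF" => "\xFE\xFE"]);
}

/** Hypothetical sketch: inverse of sketchEncode255(). */
function sketchDecode255(string $str): string
{
    return strtr($str, ["\xFE\xFD" => "\xFE", "\xFE\xFE" => "\xFF"]);
}

/** Hypothetical sketch: join encoded strings with \xFF separators. */
function sketchPackEncode255(array $strs): string
{
    return implode("\xFF", array_map('sketchEncode255', $strs));
}

/** Hypothetical sketch: split on \xFF and decode each piece. */
function sketchUnpackDecode255(string $encoded_strs): array
{
    return array_map('sketchDecode255', explode("\xFF", $encoded_strs));
}
```

Since no encoded piece contains \xFF, that byte is safe to use as a separator even when the original strings hold arbitrary binary data.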

packPosting()

Makes a packed integer string from a docindex and the number of occurrences of a word in the document with that docindex.

packPosting(int $doc_index, array<string|int, mixed> $position_list[, bool $delta = true ]) : string
Parameters
$doc_index : int

index (i.e., a count of which document it is rather than a byte offset) of a document in the document string

$position_list : array<string|int, mixed>

integer positions word occurred in that doc

$delta : bool = true

if true then stores the position_list as a sequence of differences (a delta list)

Return values
string

a modified9 (our compression scheme) packed string containing this info.

unpackPosting()

Given a packed integer string, uses the top three bytes to calculate a doc_index of a document in the shard, and uses the low order byte to compute a number of occurrences of a word in that document.

unpackPosting(string $posting, int &$offset[, bool $dedelta = true ]) : array<string|int, mixed>
Parameters
$posting : string

a string containing a doc index, position list pair encoded using modified9

$offset : int

an offset into the string where the modified9 posting is encoded

$dedelta : bool = true

if true then assumes the list is a sequence of differences (a delta list) and undoes the difference to get the original sequence

Return values
array<string|int, mixed>

consisting of integer doc_index and a subarray consisting of integer positions of word in doc.

addDocIndexPostings()

This method is used while appending one index shard to another.

addDocIndexPostings(string &$postings, int $add_offset) : string

Given a string of postings, adds $add_offset to each offset to the document map in each posting.

Parameters
$postings : string

a string of index shard postings

$add_offset : int

a fixed amount to add to each posting's doc map offset

Return values
string

$new_postings where each doc offset has had $add_offset added to it

deltaList()

Computes the difference of a list of integers.

deltaList(array<string|int, mixed> $list) : array<string|int, mixed>

i.e., (a1, a2, a3, a4) becomes (a1, a2-a1, a3-a2, a4-a3)

Parameters
$list : array<string|int, mixed>

a nondecreasing list of integers

Return values
array<string|int, mixed>

the corresponding list of differences of adjacent integers

deDeltaList()

Given an array of differences of integers reconstructs the original list. This computes the inverse of the deltaList function

deDeltaList(array<string|int, mixed> &$delta_list) : array<string|int, mixed>
Parameters
$delta_list : array<string|int, mixed>

a list of nonnegative integers

Tags
see
deltaList
Return values
array<string|int, mixed>

a nondecreasing list of integers
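
The difference list and its inverse, (a1, a2, a3, a4) to (a1, a2-a1, a3-a2, a4-a3) and back, can be sketched in a few lines. The function names are hypothetical; unlike deDeltaList(), which takes its argument by reference, this sketch simply returns new arrays.

```php
<?php
/** Hypothetical sketch: gaps between adjacent integers of $list. */
function sketchDeltaList(array $list): array
{
    $deltas = [];
    $previous = 0; // so the first entry is kept as itself
    foreach ($list as $value) {
        $deltas[] = $value - $previous;
        $previous = $value;
    }
    return $deltas;
}

/** Hypothetical sketch: running sums undo sketchDeltaList(). */
function sketchDeDeltaList(array $delta_list): array
{
    $list = [];
    $sum = 0;
    foreach ($delta_list as $delta) {
        $sum += $delta;
        $list[] = $sum;
    }
    return $list;
}
```

Gaps of a nondecreasing list are small nonnegative numbers, which is what makes the bit-level codes in this file compress well.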

encodeModified9()

Encodes a sequence of integers x, such that 1 <= x <= 2<<28-1 as a string. NOTICE x>=1.

encodeModified9(array<string|int, mixed> $list) : string

The encoded string is a sequence of 4 byte words (packed int's). The high order 2 bits of a given word indicate whether or not to look at the next word. The codes are as follows: 11 start of encoded string, 10 continue four more bytes, 01 end of encoded, and 00 indicates whole sequence encoded in one word.

After the high order 2 bits, the next most significant bits indicate the format of the current word. There are nine possibilities: 00 - 1 28 bit number, 01 - 2 14 bit numbers, 10 - 3 9 bit numbers, 1100 - 4 6 bit numbers, 1101 - 5 5 bit numbers, 1110 - 6 4 bit numbers, 11110 - 7 3 bit numbers, 111110 - 12 2 bit numbers, 111111 - 24 1 bit numbers.

Parameters
$list : array<string|int, mixed>

a list of positive integers satisfying above

Return values
string

encoded string
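
As a quick sanity check on the nine formats listed above, the following sketch tabulates them and computes how many of a word's 32 bits each one uses: 2 continue bits, the format prefix, and count x width payload bits. The table is copied from the description; the function names are hypothetical and nothing here encodes actual data.

```php
<?php
/** Format table from the description: [prefix bits, count, bit width]. */
function sketchModified9Formats(): array
{
    return [
        ["00", 1, 28], ["01", 2, 14], ["10", 3, 9],
        ["1100", 4, 6], ["1101", 5, 5], ["1110", 6, 4],
        ["11110", 7, 3], ["111110", 12, 2], ["111111", 24, 1],
    ];
}

/** Bits used in a 4 byte word: continue bits + prefix + payload. */
function sketchModified9BitsUsed(string $prefix, int $count,
    int $width): int
{
    return 2 + strlen($prefix) + $count * $width;
}
```

Every format comes out at 32 bits or fewer; for instance the 7 x 3 bit format uses 2 + 5 + 21 = 28 bits, leaving some bits of the word unused.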

packListModified9()

Packs the contents of a single word of a sequence being encoded using Modified9.

packListModified9(int $continue_bits, int $cnt, array<string|int, mixed> $pack_list) : string
Parameters
$continue_bits : int

the high order 2 bits of the word

$cnt : int

the number of elements that will be packed in this word

$pack_list : array<string|int, mixed>

a list of positive integers to pack into word

Tags
see
encodeModified9
Return values
string

encoded 4 byte string

nextPostString()

Returns the next complete posting string from $input_string beginning at offset.

nextPostString(string &$input_string, int &$offset) : string

Does not do any decoding.

Parameters
$input_string : string

a string of postings

$offset : int

an offset to this string which will be updated after call

Return values
string

undecoded posting

decodeModified9()

Decodes a sequence of positive integers from a string that has been encoded using Modified 9

decodeModified9(string $input_string, int &$offset) : array<string|int, mixed>
Parameters
$input_string : string

string to decode from

$offset : int

where to start in the string; after the decode it points to the position just after the decoded data.

Tags
see
encodeModified9
Return values
array<string|int, mixed>

sequence of positive integers that were decoded

unpackListModified9()

Decode a single word with high two bits off according to modified 9

unpackListModified9(string $encoded_list) : array<string|int, mixed>
Parameters
$encoded_list : string

four byte string to decode

Return values
array<string|int, mixed>

sequence of integers that results from the decoding.

docIndexModified9()

Given an int encoding a doc_index followed by a position list using Modified 9, extracts just the doc_index.

docIndexModified9(int $encoded_list) : int
Parameters
$encoded_list : int

in the just described format

Return values
int

a doc index into an index shard document map.

unpackInt()

Unpacks an int from a 4 char string

unpackInt(string $str) : int
Parameters
$str : string

where to extract int from

Return values
int

extracted integer

packInt()

Packs an int into a 4 char string

packInt(int $my_int) : string
Parameters
$my_int : int

the integer to pack

Return values
string

the packed string

unpackFloat()

Unpacks a float from a 4 char string

unpackFloat(string $str) : float
Parameters
$str : string

where to extract float from

Return values
float

extracted float

packFloat()

Packs a float into a four char string

packFloat(float $my_float) : string
Parameters
$my_float : float

the float to pack

Return values
string

the packed string
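
Fixed-width packing of this kind is typically done with PHP's pack() and unpack(). The sketch below uses the big-endian format codes "N" (unsigned 32-bit integer) and "G" (32-bit float); those codes, and the function names, are assumptions for illustration, and Yioop's packInt() and packFloat() may use different codes or byte order.

```php
<?php
/** Hypothetical sketch: 4 byte big-endian string for an integer. */
function sketchPackInt(int $my_int): string
{
    return pack("N", $my_int);
}

/** Hypothetical sketch: inverse of sketchPackInt(). */
function sketchUnpackInt(string $str): int
{
    return unpack("N", $str)[1]; // unpack() results are indexed from 1
}

/** Hypothetical sketch: 4 byte big-endian single precision float. */
function sketchPackFloat(float $my_float): string
{
    return pack("G", $my_float);
}

/** Hypothetical sketch: inverse of sketchPackFloat(). */
function sketchUnpackFloat(string $str): float
{
    return unpack("G", $str)[1];
}
```

Four bytes hold only single precision, so a float that is not exactly representable in 32 bits will come back slightly changed after a pack and unpack round trip.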

renameSerializedObject()

Used to change the namespace of a serialized php object (assumes doesn't have nested subobjects)

renameSerializedObject(string $class_name, string $object_string) : string
Parameters
$class_name : string

new fully qualified name with namespace

$object_string : string

serialized object

Return values
string

serialized object with new name
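
PHP serializes an object as O:<name length>:"<class name>":<property count>:{...}, so renaming amounts to rewriting that header. The sketch below shows the idea with hypothetical names (sketchRenameSerializedObject() and the SketchRenamedPoint class). Like the documented function it assumes no nested subobjects, and it additionally assumes only public properties, since private and protected property keys embed the class name.

```php
<?php
/** Hypothetical target class used only for this illustration. */
class SketchRenamedPoint
{
    public $x = 0;
}

/** Hypothetical sketch: swap the class name in a serialized object. */
function sketchRenameSerializedObject(string $class_name,
    string $object_string): string
{
    // Match the leading O:<len>:"<old name>": header
    if (!preg_match('/^O:\d+:"[^"]*":/', $object_string, $matches)) {
        return $object_string; // not a serialized object; leave as is
    }
    $rest = substr($object_string, strlen($matches[0]));
    return 'O:' . strlen($class_name) . ':"' . $class_name . '":' . $rest;
}
```

Unserializing the result then produces an instance of the new class with the same public property values.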

getDomFromString()

Parses a provided string to make a DOM object. First tries to parse using XML and if this fails uses the more robust HTML Dom parser and manipulates the resulting DOM tree to make it correspond to the original tags for XML that isn't HTML

getDomFromString(string $to_parse) : DOMDocument
Parameters
$to_parse : string

the string to parse a DOMDocument from

Return values
DOMDocument

computed based on the provided string

getTags()

Returns an array of DOMDocuments for the nodes that match an xpath query on $dom, a DOMDocument

getTags(DOMDocument $dom, string $query) : array<string|int, mixed>
Parameters
$dom : DOMDocument

document to run xpath query on

$query : string

xpath query to run

Return values
array<string|int, mixed>

of DOMDocuments one for each node matching the xpath query in the original DOMDocument

toHexString()

Converts a string to string where each char has been replaced by its hexadecimal equivalent

toHexString(string $str) : string
Parameters
$str : string

what we want rewritten in hex

Return values
string

the hexified string

toIntString()

Converts a string to a string where each char has been replaced by its integer equivalent

toIntString(string $str) : string
Parameters
$str : string

what we want rewritten as integers

Return values
string

the string of integer values

toBinString()

Converts a string to string where each char has been replaced by its binary equivalent

toBinString(string $str) : string
Parameters
$str : string

what we want rewritten in binary

Return values
string

the binary string

metricToInt()

Converts a string of the form some int followed by K, M, or G.

metricToInt(string $metric_num) : int

into its integer equivalent. For example, 4K would become 4000, 16M would become 16000000, and 1G would become 1000000000. Note: not using base 2 for K, M, G.

Parameters
$metric_num : string

metric number to convert

Return values
int

number the metric string corresponded to
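The conversion can be sketched in a few lines. The name sketchMetricToInt is hypothetical, and accepting lowercase suffixes and surrounding whitespace are assumptions of this sketch rather than documented behavior.

```php
<?php
// Hypothetical stand-in for metricToInt(); not Yioop's code.
function sketchMetricToInt(string $metric_num): int
{
    // Decimal (power of 10) multipliers, as the description specifies
    $multipliers = ["K" => 1000, "M" => 1000000, "G" => 1000000000];
    $metric_num = trim($metric_num);
    $suffix = strtoupper(substr($metric_num, -1));
    if (isset($multipliers[$suffix])) {
        // Strip the suffix and scale the remaining integer part
        return intval(substr($metric_num, 0, -1)) * $multipliers[$suffix];
    }
    return intval($metric_num); // no recognized suffix
}
```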

intToMetric()

Converts a number to a string followed by nothing, K, M, G, T depending on whether number is < 1000, < 10^6, < 10^9, or < 10^(12)

intToMetric(int $num) : string
Parameters
$num : int

number to convert

Return values
string

metric string corresponding to the number

crawlLog()

Logs a message to a log file or the screen. The super-global field $_SERVER['LOG_TO_FILES'] determines whether this will log to a file. If not, then in CLI mode it logs to stdout; otherwise it uses error_log. When logging to a file, $_SERVER["NO_ROTATE_LOGS"] controls whether or not there will be log file rotation. The first call to this method is typically used to set up a process to check for liveness. For example, the call crawlLog("\n\nInitialize logger..", $this->process_name, true); says $this->process_name should be checked for liveness as part of any subsequent logging activity, such as the call crawlLog("Another Message"); (note that subsequent calls don't need to specify the process name).

crawlLog(string $msg[, string $lname = null ][, bool $check_process_handler = false ]) : mixed
Parameters
$msg : string

message to log. If empty then no message written

$lname : string = null

name of the log file in the LOG_DIR directory. Rotated logs also use this as their basename, followed by a number, and are gzipped. (Older versions of Yioop used bzip. Some distros don't have bzip but do have gzip, and gzip was already being used elsewhere in Yioop, so bzip was replaced to remove the dependency.)

$check_process_handler : bool = false

false by default. Once a call has been made with this set to true, subsequent calls with it set to false will invoke processHandler to check how long the code has run since processHandler was last called.

Return values
mixed

makeTimestamp()

Used to make a log file entry time string of format: entry number, time in r format.

makeTimestamp([int $time = -1 ]) : string
Parameters
$time : int = -1

a unix timestamp

Return values
string

[line_count_in_log r_formatted_date]

crawlTimeoutLog()

Writes a log message $msg if more than LOG_TIMEOUT time has passed since the last time crawlTimeoutLog was called. Useful in loops to write a message as progress is made through the loop (not on every iteration, but, say, every 30 seconds).

crawlTimeoutLog(mixed $msg) : bool
Parameters
$msg : mixed

usually a string with what is to be printed out after the timeout period. If $msg === true, then the timeout cache is cleared

Return values
bool

whether a log message was written
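The rate-limiting idea can be sketched with a static variable that remembers when a message was last written. Everything here is illustrative: sketchTimeoutLog, its $timeout and $now parameters (the latter added only so the clock can be controlled), and the choice that the first call merely starts the clock are assumptions, not Yioop's behavior.

```php
<?php
// Hypothetical stand-in for crawlTimeoutLog(); not Yioop's code.
function sketchTimeoutLog($msg, int $timeout = 30, ?int $now = null): bool
{
    static $last_logged = null; // persists between calls
    $now ??= time();
    if ($msg === true) {
        $last_logged = null; // clear the remembered time
        return false;
    }
    if ($last_logged === null) {
        $last_logged = $now; // assumption: first call only starts the clock
        return false;
    }
    if ($now - $last_logged <= $timeout) {
        return false; // not enough time has passed, stay quiet
    }
    echo $msg, "\n";
    $last_logged = $now;
    return true;
}
```

Inside a long loop one would call sketchTimeoutLog("..still processing item $i") on every iteration and rely on the function to stay quiet most of the time.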

crawlHash()

Computes an 8 byte hash of a string for use in storing documents.

crawlHash(string $string[, bool $raw = false ]) : string

An eight byte hash was chosen so that the odds of collision even for a few billion documents via the birthday problem are still reasonable. If the raw flag is set to false then an 11 byte base64 encoding of the 8 byte hash is returned. The hash is calculated as the xor of the two halves of the 16 byte md5 of the string. (8 bytes takes less storage which is useful for keeping more doc info in memory)

Parameters
$string : string

the string to hash

$raw : bool = false

whether to leave raw or base 64 encode

Return values
string

the hash of $string
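The construction described above (XOR of the two halves of the raw md5, optionally base64 encoded to 11 characters) can be sketched as follows. The name sketchCrawlHash is hypothetical, and the URL-safe alphabet with the padding removed is an assumption about how the 11 character form is produced.

```php
<?php
// Hypothetical stand-in for crawlHash(); not Yioop's code.
function sketchCrawlHash(string $string, bool $raw = false): string
{
    $md5 = md5($string, true); // 16 raw bytes
    // PHP's ^ operator on two strings XORs them byte by byte
    $hash = substr($md5, 0, 8) ^ substr($md5, 8, 8);
    if ($raw) {
        return $hash;
    }
    // 8 bytes base64-encode to 12 characters, one of them "=" padding;
    // dropping the padding leaves 11 (URL-safe alphabet is an assumption)
    return rtrim(strtr(base64_encode($hash), "+/", "-_"), "=");
}
```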

crawlHashWord()

Used to create a 20 byte hash of a string (typically a word or phrase with a wikipedia page). Format is the 8 byte crawlHash of the term (md5 of term, two halves XOR'd), followed by a \x00, followed by the first 11 characters of the term. If there are not enough characters to make 20 bytes, then the string is padded with \x00s to 20 bytes.

crawlHashWord(string $string[, bool $raw = false ]) : string
Parameters
$string : string

word to hash

$raw : bool = false

whether to base64Hash the result

Return values
string

first 8 bytes of md5 of $string concatenated with \x00 to indicate the hash is of a word not a phrase concatenated with the padded to 11 byte $meta_string.

canonicalTerm()

Takes a $term that might have come from a document and converts it to a string of 16 bytes, which is either the original term padded by underscores or the first seven chars of the term followed by an underscore followed by the base64 encoding of the first 6 chars of its md5 hash.

canonicalTerm(string $term) : string

Base64 used to make this all nice and printable.

Parameters
$term : string

term to be made into a canonical form

Return values
string

version of the term canonicalized as described above.
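The two cases of the canonical form can be sketched as below. The name sketchCanonicalTerm is hypothetical, and several details are assumptions: terms of at most 16 bytes are padded, the md5 is taken in raw form, and the standard base64 alphabet is used. Lengths are counted in bytes, so a multi-byte character could be split.

```php
<?php
// Hypothetical stand-in for canonicalTerm(); not Yioop's code.
function sketchCanonicalTerm(string $term): string
{
    if (strlen($term) <= 16) {
        // Short terms are kept and right-padded with underscores
        return str_pad($term, 16, "_");
    }
    // 6 raw md5 bytes base64-encode to exactly 8 characters, no padding,
    // so 7 + 1 + 8 gives the required 16 bytes
    $hash_part = base64_encode(substr(md5($term, true), 0, 6));
    return substr($term, 0, 7) . "_" . $hash_part;
}
```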

compareWordHashes()

Used to compare two ids for index dictionary lookup. Ids are an 8 byte crawlHash together with a 12 byte non-hash suffix.

compareWordHashes(string $id1, string $id2) : int
Parameters
$id1 : string

20 byte word id to compare

$id2 : string

20 byte word id to compare

Return values
int

negative if $id1 smaller, positive if bigger, and 0 if same

base64Hash()

Converts a crawl hash number to something close to base64 encoding, modified so that it doesn't get confused in URLs or DBs

base64Hash(string $string) : string
Parameters
$string : string

a hash to base64 encode

Return values
string

the encoded hash

unbase64Hash()

Decodes a crawl hash number from base64 to raw ASCII

unbase64Hash(string $base64) : string
Parameters
$base64 : string

a hash to decode

Return values
string

the decoded hash

webencode()

Encodes a string in a format suitable for post data (mainly, base64, but str_replace data that might mess up post in result)

webencode(string $str) : string
Parameters
$str : string

string to encode

Return values
string

encoded string

webdecode()

Decodes a string encoded by webencode

webdecode(string $str) : string
Parameters
$str : string

string to decode

Return values
string

decoded string

crawlCrypt()

The crawlCrypt function is used to encrypt passwords stored in the database.

crawlCrypt(string $string[, int $salt = null ]) : string

It tries to use the best version of the Blowfish variant of PHP's crypt function available on the current system.

Parameters
$string : string

the string to encrypt

$salt : int = null

salt value to be used (needed to verify if a password is valid)

Return values
string

the crypted string where crypting is done using crawlHash

partitionByHash()

Used by a controller to take a table and return those rows in the table that a given queue_server would be responsible for handling

partitionByHash(array<string|int, mixed> $table, string $field, int $num_partition, int $instance[, object $callback = null ]) : array<string|int, mixed>
Parameters
$table : array<string|int, mixed>

an array of rows of associative arrays which a queue_server might need to process

$field : string

column of $table whose values should be used for partitioning

$num_partition : int

number of queue_servers to choose between

$instance : int

the id of the particular server we are interested in

$callback : object = null

function or static method that might be applied to input before deciding the responsible queue_server. For example, if input was a url we might want to get the host before deciding on the queue_server

Return values
array<string|int, mixed>

the reduced table that the $instance queue_server is responsible for

calculatePartition()

Used by a controller to say which queue_server should receive a given input

calculatePartition(string $input, int $num_partition[, object $callback = null ]) : int
Parameters
$input : string

can be viewed as a key that might be processed by a queue_server. For example, in some cases input might be a url and we want to determine which queue_server should be responsible for queuing that url

$num_partition : int

number of queue_servers to choose between

$callback : object = null

function or static method that might be applied to input before deciding the responsible queue_server. For example, if the input was a url we might want to get the host before deciding on the queue_server

Return values
int

id of server responsible for input
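The relationship between calculatePartition and partitionByHash can be sketched as hashing a key, reducing it modulo the number of servers, and filtering rows by the result. The function names are hypothetical, and using the leading hex digits of md5 as the hash is a stand-in chosen for this sketch, not necessarily what Yioop does.

```php
<?php
// Hypothetical stand-ins for calculatePartition() and partitionByHash().
function sketchCalculatePartition(string $input, int $num_partition,
    ?callable $callback = null): int
{
    if ($callback !== null) {
        $input = $callback($input); // e.g. reduce a url to its host
    }
    // 7 hex digits give a non-negative 28 bit integer on any platform
    return hexdec(substr(md5($input), 0, 7)) % $num_partition;
}

function sketchPartitionByHash(array $table, string $field,
    int $num_partition, int $instance, ?callable $callback = null): array
{
    $rows = [];
    foreach ($table as $row) {
        $owner = sketchCalculatePartition($row[$field], $num_partition,
            $callback);
        if ($owner === $instance) {
            $rows[] = $row; // this server is responsible for the row
        }
    }
    return $rows;
}

// Partition by host so all urls of one site go to the same server
$host = fn(string $url) => (string) parse_url($url, PHP_URL_HOST);
$table = [["URL" => "https://a.example/1"], ["URL" => "https://a.example/2"],
    ["URL" => "https://b.example/"]];
```

Because the partition depends only on the (possibly transformed) key, every controller computes the same owner for a given url without any coordination.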

changeInMicrotime()

Measures the change in time in seconds between two timestamps to microsecond precision

changeInMicrotime(string $start[, string $end = null ]) : float
Parameters
$start : string

starting time with microseconds

$end : string = null

ending time with microseconds, if null use current time

Return values
float

time difference in seconds
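A minimal sketch of the computation, assuming the timestamps are strings in the "fraction seconds" format that PHP's microtime() returns when called without arguments. The name sketchChangeInMicrotime is hypothetical.

```php
<?php
// Hypothetical stand-in for changeInMicrotime(); not Yioop's code.
function sketchChangeInMicrotime(string $start, ?string $end = null): float
{
    $end ??= microtime(); // e.g. "0.12345600 1700000000"
    [$start_fraction, $start_seconds] = explode(" ", $start);
    [$end_fraction, $end_seconds] = explode(" ", $end);
    return ((float) $end_fraction + (float) $end_seconds)
        - ((float) $start_fraction + (float) $start_seconds);
}

$elapsed = sketchChangeInMicrotime("0.25 100", "0.75 101");
```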

microTimestamp()

Timestamp of current epoch with microsecond precision useful for situations where time() might cause too many collisions (account creation, etc)

microTimestamp() : string
Return values
string

timestamp, to microsecond precision, of the time in seconds since the start of the current epoch

checkTimeInterval()

Checks that a timestamp is within the time interval given by a start time (HH:mm) and a duration

checkTimeInterval(string $start_time, string $duration[, int $time = -1 ]) : int
Parameters
$start_time : string

string of the form (HH:mm)

$duration : string

string containing an int in seconds

$time : int = -1

a Unix timestamp.

Return values
int

-1 if the time of day of $time is not within the given interval. Otherwise, the Unix timestamp at which the interval will be over for the same day as $time.
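The check can be sketched as follows. The name sketchCheckTimeInterval is hypothetical, and two things are assumed: a $time of -1 means the current time, and intervals are only considered from their start on the same calendar day as $time (an interval that began the previous day and runs past midnight is not handled).

```php
<?php
// Hypothetical stand-in for checkTimeInterval(); not Yioop's code.
function sketchCheckTimeInterval(string $start_time, string $duration,
    int $time = -1): int
{
    if ($time === -1) {
        $time = time(); // assumption: -1 means "now"
    }
    [$hour, $minute] = array_map("intval", explode(":", $start_time));
    // Start of the interval on the same calendar day as $time
    $start = mktime($hour, $minute, 0, (int) date("n", $time),
        (int) date("j", $time), (int) date("Y", $time));
    $end = $start + (int) $duration;
    return ($time >= $start && $time < $end) ? $end : -1;
}
```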

convertPixels()

Converts a CSS unit string into its equivalent in pixels. This is used by SvgProcessor.

convertPixels(string $value) : int
Parameters
$value : string

a number followed by a legal CSS unit

Return values
int

a number in pixels
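A sketch of such a conversion using the standard CSS absolute-length ratios (96px per inch, 72pt per inch). The name sketchConvertPixels is hypothetical, and which units the real function accepts, and how it treats unknown ones, is not stated here, so the unit table and the fall back to pixels are assumptions.

```php
<?php
// Hypothetical stand-in for convertPixels(); not Yioop's code.
function sketchConvertPixels(string $value): int
{
    // Pixels per unit at the CSS reference resolution of 96px per inch
    $pixels_per_unit = ["px" => 1.0, "in" => 96.0, "pt" => 96.0 / 72,
        "pc" => 16.0, "cm" => 96.0 / 2.54, "mm" => 96.0 / 25.4];
    $pattern = '/^\s*(-?\d*\.?\d+)\s*([a-z]*)\s*$/i';
    if (!preg_match($pattern, $value, $matches)) {
        return 0; // not a number followed by a unit
    }
    $unit = strtolower($matches[2]);
    // Assumption: a missing or unknown unit is treated as pixels
    $factor = $pixels_per_unit[$unit] ?? 1.0;
    return (int) round(floatval($matches[1]) * $factor);
}
```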

countFiles()

Returns the number of files in a folder

countFiles(string $folder) : int
Parameters
$folder : string

path to folder to count

Return values
int

number of files

makePath()

Creates folders along a filesystem path if they don't exist

makePath(string $path) : bool
Parameters
$path : string

a file system path

Return values
bool

success or failure

deleteFileOrDir()

This is a callback function used in the process of recursively deleting a directory

deleteFileOrDir(string $file_or_dir) : mixed
Parameters
$file_or_dir : string

the filename or directory name to be deleted

Tags
see
DatasourceManager::unlinkRecursive()
Return values
mixed

setWorldPermissions()

This is a callback function used in the process of recursively chmoding to 777 all files in a folder

setWorldPermissions(string $file) : mixed
Parameters
$file : string

the filename or directory name to be chmod

Tags
see
DatasourceManager::setWorldPermissionsRecursive()
Return values
mixed

fileInfo()

This is a callback function used in the process of recursively calculating an array of file modification times and files sizes for a directory

fileInfo(string $file) : array
Parameters
$file : string

a name of a file in the file system

Return values
array

an array whose single element contains an associative array with the size and modification time of the file

orderCallback()

Callback function used to sort documents by a field

orderCallback(string $word_doc_a, string $word_doc_b[, string $order_field = null ]) : int

Should be initialized before using in usort with a call like: orderCallback($tmp, $tmp, "field_want");

Parameters
$word_doc_a : string

doc id of first document to compare

$word_doc_b : string

doc id of second document to compare

$order_field : string = null

which field of these associative arrays to sort by

Return values
int

-1 if first doc bigger, 1 otherwise
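The "initialize, then pass to usort" pattern relies on a static variable that remembers the sort field between calls. The sketch below is illustrative only: sketchOrderCallback is a hypothetical name, the documents are taken to be associative arrays, and returning 0 from the initializing call is an assumption.

```php
<?php
// Hypothetical stand-in for orderCallback(); not Yioop's code.
function sketchOrderCallback(array $word_doc_a, array $word_doc_b,
    ?string $order_field = null): int
{
    static $field = ""; // remembered between calls
    if ($order_field !== null) {
        $field = $order_field; // initializing call, no comparison made
        return 0;
    }
    // -1 if the first doc is bigger, 1 otherwise (descending order)
    return ((float) $word_doc_a[$field] > (float) $word_doc_b[$field])
        ? -1 : 1;
}

$tmp = [];
sketchOrderCallback($tmp, $tmp, "score"); // choose the field once
$docs = [["score" => 1.5], ["score" => 3.0], ["score" => 2.0]];
usort($docs, "sketchOrderCallback"); // usort supplies only two arguments
```

usort only ever passes two arguments, which is why the field has to be set in a separate call beforehand.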

stringOrderCallback()

Callback function used to sort documents by a field where the field is assumed to be a string

stringOrderCallback(string $word_doc_a, string $word_doc_b[, string $order_field = null ]) : int

Should be initialized before using in usort with a call like: stringOrderCallback($tmp, $tmp, "field_want");

Parameters
$word_doc_a : string

doc id of first document to compare

$word_doc_b : string

doc id of second document to compare

$order_field : string = null

which field of these associative arrays to sort by

Return values
int

-1 if first doc smaller, 1 otherwise

stringROrderCallback()

Callback function used to sort documents by a field where the field is assumed to be a string

stringROrderCallback(string $word_doc_a, string $word_doc_b[, string $order_field = null ]) : int

Should be initialized before using in usort with a call like: stringROrderCallback($tmp, $tmp, "field_want");

Parameters
$word_doc_a : string

doc id of first document to compare

$word_doc_b : string

doc id of second document to compare

$order_field : string = null

which field of these associative arrays to sort by

Return values
int

-1 if first doc bigger, 1 otherwise

rorderCallback()

Callback function used to sort documents by a field in reverse order

rorderCallback(string $word_doc_a, string $word_doc_b[, string $order_field = null ]) : int

Should be initialized before using in usort with a call like: rorderCallback($tmp, $tmp, "field_want");

Parameters
$word_doc_a : string

doc id of first document to compare

$word_doc_b : string

doc id of second document to compare

$order_field : string = null

which field of these associative arrays to sort by

Return values
int

1 if first doc bigger, -1 otherwise

lessThan()

Callback to check if $a is less than $b

lessThan(float $a, float $b) : int

Used to help sort document results returned in PhraseModel called in IndexArchiveBundle

Parameters
$a : float

first value to compare

$b : float

second value to compare

Tags
see
IndexArchiveBundle::getSelectiveWords()
see
PhraseModel::getPhrasePageResults()
Return values
int

-1 if $a is less than $b; 1 otherwise

greaterThan()

Callback to check if $a is greater than $b

greaterThan(float $a, float $b) : int

Used to help sort document results returned in PhraseModel called in IndexArchiveBundle

Parameters
$a : float

first value to compare

$b : float

second value to compare

Tags
see
IndexArchiveBundle::getSelectiveWords()
see
PhraseModel::getTopPhrases()
Return values
int

-1 if $a is greater than $b; 1 otherwise

e()

shorthand for echo

e(string $text) : mixed
Parameters
$text : string

string to send to the current output

Return values
mixed

remoteAddress()

Compute the real remote address of the incoming connection including forwarding

remoteAddress() : mixed
Return values
mixed

readInput()

Used to read a line of input from the command-line

readInput() : string
Return values
string

from the command-line

readPassword()

Used to read a line of input from the command-line (on unix machines without echoing it)

readPassword() : string
Return values
string

from the command-line

readMessage()

Used to read several lines from the terminal up until a last line consisting of just a "."

readMessage() : string
Return values
string

from the command-line

mimeType()

Returns the mime type of the provided file name if it can be determined.

mimeType(string $file_name[, bool $use_extension = false ]) : string
Parameters
$file_name : string

(name of file including path to figure out mime type for)

$use_extension : bool = false

whether to just try to guess from the file extension rather than looking at the file

Return values
string

mime type or unknown if can't be determined

generalIsA()

Checks if class_1 is the same as class_2 or has class_2 as a parent. Behaves like the 3 param version (last param true) of the PHP is_a function that came into being with Version 5.3.9.

generalIsA(mixed $class_1, mixed $class_2) : bool
Parameters
$class_1 : mixed

object or string class name to see if in class2

$class_2 : mixed

object or string class name to see if contains class1

Return values
bool

equal or contains class

stripAttributes()

Given the contents of a start XML/HTML tag, strips out all the attributes not listed in $safe_attribute_list

stripAttributes(string $start_tag_contents[, array<string|int, mixed> $safe_attribute_list = [] ]) : string
Parameters
$start_tag_contents : string

the contents of an HTML/XML tag. I.e., if the tag was <tag stuff> then $start_tag_contents could be stuff

$safe_attribute_list : array<string|int, mixed> = []

a list of attributes which should be kept

Return values
string

containing only safe attributes and their values

parseCsv()

Used to parse into a two dimensional array a string that contains CSV data.

parseCsv(string $csv_string) : array<string|int, mixed>
Parameters
$csv_string : string

string with csv data

Return values
array<string|int, mixed>

two dimensional array of elements from csv

arraytoCsv()

Converts an array of values to a comma separated value formatted string.

arraytoCsv(array<string|int, mixed> $arr) : string
Parameters
$arr : array<string|int, mixed>

values to convert

Return values
string

CSV string after conversion

diff()

Computes a Unix-style diff of two strings. That is it only outputs lines which disagree between the two strings. It outputs +line if a line occurs in the second but not first string and -line if a line occurs in the first string but not the second.

diff(string $data1, string $data2[, bool $html = false ]) : string
Parameters
$data1 : string

first string to compare

$data2 : string

second string to compare

$html : bool = false

whether to output html highlighting

Return values
string

representing info about where $data1 and $data2 don't match

computeLCS()

Computes the longest common subsequence of two arrays

computeLCS(array<string|int, mixed> $lines1, array<string|int, mixed> $lines2, int $offset) : mixed
Parameters
$lines1 : array<string|int, mixed>

an array of lines to compute LCS of

$lines2 : array<string|int, mixed>

an array of lines to compute LCS of

$offset : int

an offset to shift over array addresses in output by

Return values
mixed

extractLCSFromTable()

Extracts from a table of longest common subsequence moves (probably calculated by computeLCS) and a starting coordinate $i, $j in that table, a longest common subsequence

extractLCSFromTable(array<string|int, mixed> $lcs_moves, array<string|int, mixed> $lines, int $i, int $j, int $offset, array<string|int, mixed> &$lcs) : mixed
Parameters
$lcs_moves : array<string|int, mixed>

a table of moves computed by computeLCS

$lines : array<string|int, mixed>

from first of the two arrays computing LCS of

$i : int

a line number in string 1

$j : int

a line number in string 2

$offset : int

a number to add to each line number output into $lcs. This is useful if we have trimmed off the initially common lines from our two strings we are trying to compute the LCS of

$lcs : array<string|int, mixed>

an array of triples (index_string1, index_string2, line). The indexes indicate the line number in each string; line is the line common to the two strings

Return values
mixed
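The two-phase approach used by computeLCS and extractLCSFromTable (fill a table, then walk back through it) can be sketched with the textbook dynamic program below. It is a simplified illustration: sketchLcs is a hypothetical name, it stores subsequence lengths rather than a table of moves, and it combines both phases in one function.

```php
<?php
// Hypothetical illustration of the LCS technique; not Yioop's code.
function sketchLcs(array $lines1, array $lines2, int $offset = 0): array
{
    $m = count($lines1);
    $n = count($lines2);
    // $len[$i][$j] = LCS length of the first $i lines of $lines1
    // and the first $j lines of $lines2
    $len = array_fill(0, $m + 1, array_fill(0, $n + 1, 0));
    for ($i = 1; $i <= $m; $i++) {
        for ($j = 1; $j <= $n; $j++) {
            $len[$i][$j] = ($lines1[$i - 1] === $lines2[$j - 1])
                ? $len[$i - 1][$j - 1] + 1
                : max($len[$i - 1][$j], $len[$i][$j - 1]);
        }
    }
    // Walk back from the bottom-right corner to recover one LCS as
    // triples (index in $lines1, index in $lines2, common line)
    $lcs = [];
    $i = $m;
    $j = $n;
    while ($i > 0 && $j > 0) {
        if ($lines1[$i - 1] === $lines2[$j - 1]) {
            array_unshift($lcs,
                [$i - 1 + $offset, $j - 1 + $offset, $lines1[$i - 1]]);
            $i--;
            $j--;
        } elseif ($len[$i - 1][$j] >= $len[$i][$j - 1]) {
            $i--;
        } else {
            $j--;
        }
    }
    return $lcs;
}

$common = sketchLcs(["a", "b", "c", "d"], ["a", "c", "d", "e"]);
```

Lines that are not part of the common subsequence are exactly the ones a diff would report as removed (first array only) or added (second array only).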

tail()

Returns an array of the last $num_lines many lines out of a file

tail(string $file_name, string $num_lines) : array<string|int, mixed>
Parameters
$file_name : string

name of file to return lines from

$num_lines : string

number of lines to retrieve

Return values
array<string|int, mixed>

retrieved lines

lineFilter()

Given an array of lines returns a subarray of those lines containing the filter string or filter array

lineFilter(string $lines, mixed $filters[, bool $case_insensitive = true ]) : array<string|int, mixed>
Parameters
$lines : string

to search

$filters : mixed

either string to filter lines with or an array of strings (any of which can be present to pass the filter)

$case_insensitive : bool = true

whether search should be done case insensitively or not.

Return values
array<string|int, mixed>

lines containing the string

logLineTimestamp()

Tries to extract a timestamp from a line which is presumed to come from a Yioop log file

logLineTimestamp(string $line) : int
Parameters
$line : string

to search

Return values
int

timestamp of that log entry

isPositiveInteger()

Returns whether an input can be parsed to a positive integer

isPositiveInteger(mixed $input) : bool
Parameters
$input : mixed
Return values
bool

whether $input can be parsed to a positive integer.

measureCall()

Used to measure the memory footprint in bytes and time spent calling a method of an object. It also records the number of times the method has been called.

measureCall(object $object, string $method[, mixed $arguments = [] ][, string $call_name = "" ]) : mixed

Just calls the method without any recording or timing until an initial call to the function measureCall(null, save_statistics_file), where save_statistics_file is the name of the file you want to store statistics to.

Parameters
$object : object

name of object whose method we want to call and measure

$method : string

method we're calling

$arguments : mixed = []
$call_name : string = ""

name to use when outputting stats for this call, defaults to $method.

Return values
mixed

whatever the method would normally return when called as above

measureObject()

Used to measure the memory footprint of an object in Yioop and save it to a statistics file. No recording is done until an initial call to the function measureCall(null, save_statistics_file), where save_statistics_file is the name of the file you want to store statistics to.

measureObject(object $object[, string $save_file = "" ][, mixed $class_name = "" ]) : mixed
Parameters
$object : object

name of object whose size we want to measure

$save_file : string = ""

statistics file to write info to

$class_name : mixed = ""
Return values
mixed

measureObjectCall()

General method called by measureCall and measureObject. Used to measure the memory footprint in bytes of an object, or the memory and time spent calling a method of an object. It also records the number of times the method has been called. When used to call a method before initialization, it just calls the method without any recording or timing. To initialize, make an initial call to the function measureCall(null, save_statistics_file), where save_statistics_file is the name of the file you want to store statistics to.

measureObjectCall(object $object, string $method[, mixed $arguments = [] ][, string $call_name = "" ]) : mixed
Parameters
$object : object

name of object whose method we want to call and measure

$method : string

method we're calling

$arguments : mixed = []
$call_name : string = ""

name to use when outputting stats for this call, defaults to $method.

Return values
mixed

whatever the method would normally return when called as above

variableClone()

Makes a deep copy of a variable regardless of its type

variableClone(mixed $var) : mixed
Parameters
$var : mixed

variable to deep copy

Return values
mixed

the deep copy
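One common way to deep copy an arbitrary PHP value is a serialize/unserialize round trip, sketched below. Whether variableClone works this way is an assumption; sketchVariableClone is a hypothetical name, and the approach does not work for values that cannot be serialized, such as closures or resources.

```php
<?php
// Hypothetical stand-in for variableClone(); not Yioop's code.
function sketchVariableClone($var)
{
    // The round trip rebuilds nested arrays and objects from scratch,
    // so the copy shares no object handles with the original
    return unserialize(serialize($var));
}

$original = ["config" => (object) ["depth" => 1], "tags" => ["a", "b"]];
$copy = sketchVariableClone($original);
$copy["config"]->depth = 2; // must not affect $original
```

A plain assignment would copy the array but still share the nested object, which is the situation a deep copy is meant to avoid.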

garbageCollect()

Runs various system garbage collection functions and returns number of bytes freed.

garbageCollect() : int
Return values
int

number of bytes freed

utf8SafeSaveHtml()

The DOM method saveHTML has a tendency to replace UTF-8, non-ASCII characters with HTML entities. This function is supposed to save the DOM while avoiding that replacement.

utf8SafeSaveHtml(DOMDocument $dom) : string

What it does is to first save the dom, then it replaces htmlentities of the form &single_char; or &#some_number; with the UTF-8 they correspond to. It leaves all other entities as they are

Parameters
$dom : DOMDocument
Return values
string

output of saving html

utf8WordWrap()

A UTF-8 safe version of PHP's wordwrap function that wraps a string to a given number of characters

utf8WordWrap(string $string[, int $width = 75 ][, string $break = " " ][, bool $cut = false ]) : string
Parameters
$string : string

the input string

$width : int = 75

the number of characters at which the string will be wrapped

$break : string = " "

string used to break a line into two

$cut : bool = false

whether to always force wrap at $width characters even if word hasn't ended

Return values
string

the given string wrapped at the specified length

upgradeDatabaseVersion1()

Upgrades a Version 0 version of the Yioop database to a Version 1 version

upgradeDatabaseVersion1(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion2()

Upgrades a Version 1 version of the Yioop database to a Version 2 version

upgradeDatabaseVersion2(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion3()

Upgrades a Version 2 version of the Yioop database to a Version 3 version

upgradeDatabaseVersion3(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion4()

Upgrades a Version 3 version of the Yioop database to a Version 4 version

upgradeDatabaseVersion4(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion5()

Upgrades a Version 4 version of the Yioop database to a Version 5 version

upgradeDatabaseVersion5(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion6()

Upgrades a Version 5 version of the Yioop database to a Version 6 version

upgradeDatabaseVersion6(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion7()

Upgrades a Version 6 version of the Yioop database to a Version 7 version

upgradeDatabaseVersion7(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion8()

Upgrades a Version 7 version of the Yioop database to a Version 8 version

upgradeDatabaseVersion8(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion9()

Upgrades a Version 8 version of the Yioop database to a Version 9 version

upgradeDatabaseVersion9(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion10()

Upgrades a Version 9 version of the Yioop database to a Version 10 version

upgradeDatabaseVersion10(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion11()

Upgrades a Version 10 version of the Yioop database to a Version 11 version

upgradeDatabaseVersion11(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion12()

Upgrades a Version 11 version of the Yioop database to a Version 12 version

upgradeDatabaseVersion12(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion13()

Upgrades a Version 12 version of the Yioop database to a Version 13 version

upgradeDatabaseVersion13(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion14()

Upgrades a Version 13 version of the Yioop database to a Version 14 version

upgradeDatabaseVersion14(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion15()

Upgrades a Version 14 version of the Yioop database to a Version 15 version

upgradeDatabaseVersion15(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion16()

Upgrades a Version 15 version of the Yioop database to a Version 16 version

upgradeDatabaseVersion16(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion17()

Upgrades a Version 16 version of the Yioop database to a Version 17 version

upgradeDatabaseVersion17(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion18()

Upgrades a Version 17 version of the Yioop database to a Version 18 version

upgradeDatabaseVersion18(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion19()

Upgrades a Version 18 version of the Yioop database to a Version 19 version. This update has been superseded by the Version20 update and so its contents have been eliminated.

upgradeDatabaseVersion19(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion20()

Upgrades a Version 19 version of the Yioop database to a Version 20 version. This is a major upgrade as the user table has changed. This also acts as a cumulative upgrade since version 0.98. It involves a web form that has only been localized to English.

upgradeDatabaseVersion20(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion21()

Upgrades a Version 20 version of the Yioop database to a Version 21 version

upgradeDatabaseVersion21(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion22()

Upgrades a Version 21 version of the Yioop database to a Version 22 version

upgradeDatabaseVersion22(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion23()

Upgrades a Version 22 version of the Yioop database to a Version 23 version

upgradeDatabaseVersion23(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion24()

Upgrades a Version 23 version of the Yioop database to a Version 24 version

upgradeDatabaseVersion24(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion25()

Upgrades a Version 24 version of the Yioop database to a Version 25 version. This version upgrade includes creation of the Help group that holds help pages.

upgradeDatabaseVersion25(object &$db) : mixed

The Help group is created with GROUP_ID=HELP_GROUP_ID. If a group with GROUP_ID=HELP_GROUP_ID already exists, then that group is moved to the end of the GROUPS table (the max group id is used).

Parameters
$db : object

data source to use to upgrade

Return values
mixed

upgradeDatabaseVersion26()

Upgrades a Version 25 version of the Yioop database to a Version 26 version. This version upgrade includes updating the Help pages in the database to work with the changes to the way hyperlinks are specified in wiki markup.

upgradeDatabaseVersion26(object &$db) : mixed

The changes were implemented so that all articles with page names containing %20 also work with '_' and vice versa.

Parameters
$db : object

data source to use to upgrade

Return values
mixed

upgradeDatabaseVersion27()

Upgrades a Version 26 version of the Yioop database to a Version 27 version

upgradeDatabaseVersion27(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion28()

Upgrades a Version 27 version of the Yioop database to a Version 28 version

upgradeDatabaseVersion28(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion29()

Upgrades a Version 28 version of the Yioop database to a Version 29 version

upgradeDatabaseVersion29(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion30()

Upgrades a Version 29 version of the Yioop database to a Version 30 version

upgradeDatabaseVersion30(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion31()

Upgrades a Version 30 version of the Yioop database to a Version 31 version

upgradeDatabaseVersion31(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion32()

Upgrades a Version 31 version of the Yioop database to a Version 32 version

upgradeDatabaseVersion32(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion33()

Upgrades a Version 32 version of the Yioop database to a Version 33 version

upgradeDatabaseVersion33(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion34()

Upgrades a Version 33 version of the Yioop database to a Version 34 version

upgradeDatabaseVersion34(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion35()

Upgrades a Version 34 version of the Yioop database to a Version 35 version

upgradeDatabaseVersion35(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion36()

Upgrades a Version 35 version of the Yioop database to a Version 36 version

upgradeDatabaseVersion36(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion37()

Upgrades a Version 36 version of the Yioop database to a Version 37 version

upgradeDatabaseVersion37(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion38()

Upgrades a Version 37 version of the Yioop database to a Version 38 version

upgradeDatabaseVersion38(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion39()

Upgrades a Version 38 version of the Yioop database to a Version 39 version

upgradeDatabaseVersion39(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion40()

Upgrades a Version 39 version of the Yioop database to a Version 40 version

upgradeDatabaseVersion40(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion41()

Upgrades a Version 40 version of the Yioop database to a Version 41 version

upgradeDatabaseVersion41(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion42()

Upgrades a Version 41 version of the Yioop database to a Version 42 version

upgradeDatabaseVersion42(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion43()

Upgrades a Version 42 version of the Yioop database to a Version 43 version

upgradeDatabaseVersion43(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion44()

Upgrades a Version 43 version of the Yioop database to a Version 44 version

upgradeDatabaseVersion44(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion45()

Upgrades a Version 44 version of the Yioop database to a Version 45 version

upgradeDatabaseVersion45(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion46()

Upgrades a Version 45 version of the Yioop database to a Version 46 version

upgradeDatabaseVersion46(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion47()

Upgrades a Version 46 version of the Yioop database to a Version 47 version

upgradeDatabaseVersion47(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion48()

Upgrades a Version 47 version of the Yioop database to a Version 48 version

upgradeDatabaseVersion48(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion49()

Upgrades a Version 48 version of the Yioop database to a Version 49 version

upgradeDatabaseVersion49(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion50()

Upgrades a Version 49 version of the Yioop database to a Version 50 version

upgradeDatabaseVersion50(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion51()

Upgrades a Version 50 version of the Yioop database to a Version 51 version

upgradeDatabaseVersion51(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion52()

Upgrades a Version 51 version of the Yioop database to a Version 52 version

upgradeDatabaseVersion52(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion53()

Upgrades a Version 52 version of the Yioop database to a Version 53 version

upgradeDatabaseVersion53(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion54()

Upgrades a Version 53 version of the Yioop database to a Version 54 version

upgradeDatabaseVersion54(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion55()

Upgrades a Version 54 version of the Yioop database to a Version 55 version

upgradeDatabaseVersion55(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion57()

Upgrades a Version 56 version of the Yioop database to a Version 57 version

upgradeDatabaseVersion57(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion58()

Upgrades a Version 57 version of the Yioop database to a Version 58 version

upgradeDatabaseVersion58(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion59()

Upgrades a Version 58 version of the Yioop database to a Version 59 version

upgradeDatabaseVersion59(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion60()

Upgrades a Version 59 version of the Yioop database to a Version 60 version

upgradeDatabaseVersion60(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion61()

Upgrades a Version 60 version of the Yioop database to a Version 61 version

upgradeDatabaseVersion61(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion62()

Upgrades a Version 61 version of the Yioop database to a Version 62 version

upgradeDatabaseVersion62(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion64()

Upgrades a Version 63 version of the Yioop database to a Version 64 version

upgradeDatabaseVersion64(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion65()

Upgrades a Version 64 version of the Yioop database to a Version 65 version

upgradeDatabaseVersion65(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion66()

Upgrades a Version 65 version of the Yioop database to a Version 66 version

upgradeDatabaseVersion66(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion67()

Upgrades a Version 66 version of the Yioop database to a Version 67 version

upgradeDatabaseVersion67(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion68()

Upgrades a Version 67 version of the Yioop database to a Version 68 version

upgradeDatabaseVersion68(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion69()

Upgrades a Version 68 version of the Yioop database to a Version 69 version

upgradeDatabaseVersion69(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion70()

Upgrades a Version 69 version of the Yioop database to a Version 70 version

upgradeDatabaseVersion70(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade.

Return values
mixed

upgradeDatabaseVersion71()

Upgrades a Version 70 version of the Yioop database to a Version 71 version

upgradeDatabaseVersion71(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion72()

Upgrades a Version 71 version of the Yioop database to a Version 72 version

upgradeDatabaseVersion72(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion73()

Upgrades a Version 72 version of the Yioop database to a Version 73 version

upgradeDatabaseVersion73(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion74()

Upgrades a Version 73 version of the Yioop database to a Version 74 version

upgradeDatabaseVersion74(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion75()

Upgrades a Version 74 version of the Yioop database to a Version 75 version

upgradeDatabaseVersion75(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion76()

Upgrades a Version 75 version of the Yioop database to a Version 76 version

upgradeDatabaseVersion76(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion77()

Upgrades a Version 76 version of the Yioop database to a Version 77 version

upgradeDatabaseVersion77(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion78()

Upgrades a Version 77 version of the Yioop database to a Version 78 version

upgradeDatabaseVersion78(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion79()

Upgrades a Version 78 version of the Yioop database to a Version 79 version

upgradeDatabaseVersion79(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion80()

Upgrades a Version 79 version of the Yioop database to a Version 80 version

upgradeDatabaseVersion80(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed

upgradeDatabaseVersion81()

Upgrades a Version 80 version of the Yioop database to a Version 81 version

upgradeDatabaseVersion81(object &$db) : mixed
Parameters
$db : object

datasource to use to upgrade

Return values
mixed
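
Because each upgrade function is named after the version it produces and takes the datasource by reference, a caller could in principle chain them by building the function name from a version number. The sketch below is only an illustration of that pattern under stated assumptions: the `sketchUpgradeVersionN` stubs, the `version` property, and `sketchRunUpgrades` are all hypothetical, and skipping a missing step is a choice made for this sketch rather than documented Yioop behavior.

```php
<?php
// Stand-in upgrade steps (hypothetical); each records the version it reached.
function sketchUpgradeVersion2(object &$db) { $db->version = 2; }
function sketchUpgradeVersion3(object &$db) { $db->version = 3; }
// No step 4 is defined, to show how a gap in the numbering is handled.
function sketchUpgradeVersion5(object &$db) { $db->version = 5; }

/**
 * Runs every available step above $db->version up to $target, in order.
 * Returns the list of versions whose step was actually run.
 */
function sketchRunUpgrades(object &$db, int $target): array
{
    $ran = [];
    for ($v = $db->version + 1; $v <= $target; $v++) {
        $step = "sketchUpgradeVersion" . $v;
        if (function_exists($step)) {
            $step($db); // variable function call; $db is passed by reference
            $ran[] = $v;
        }
    }
    return $ran;
}

$db = new stdClass();
$db->version = 1;
$ran = sketchRunUpgrades($db, 5);
echo implode(",", $ran), "\n";
```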

webExit()

Function to call instead of exit() to indicate that the script processing the current web page is done processing. Use this rather than exit(), as exit() will also terminate WebSite.

webExit([string $err_msg = "" ]) : mixed
Parameters
$err_msg : string = ""

error message to send on exiting

Tags
throws
WebException
Return values
mixed
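
The reason for throwing rather than exiting can be shown with a small sketch. It assumes a long-running server loop that catches the exception at the request boundary. `SketchWebException`, `sketchWebExit` and `sketchHandleRequest` are hypothetical names for this illustration and do not reproduce Yioop's own classes.

```php
<?php
// Hypothetical exception type standing in for the documented WebException.
class SketchWebException extends Exception {}

// Ends the current page script by throwing instead of calling exit().
function sketchWebExit(string $err_msg = ""): void
{
    throw new SketchWebException($err_msg);
}

// Stand-in for a server's request handler: the catch block keeps the
// surrounding process alive after the page script stops.
function sketchHandleRequest(callable $page_script): string
{
    try {
        $page_script();
        return "completed";
    } catch (SketchWebException $e) {
        return "stopped: " . $e->getMessage();
    }
}

$result = sketchHandleRequest(function () {
    sketchWebExit("done early");
    echo "never reached";
});
echo $result, "\n";
```

A plain exit() inside the closure would have ended the whole PHP process, so the final echo would never run.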

makeTableCallback()

Callback used by a preg_replace_callback in nextPage to make a table

makeTableCallback(array<string|int, mixed> $matches) : mixed
Parameters
$matches : array<string|int, mixed>

of table cells

Return values
mixed

citeCallback()

Used to convert {{cite }} to a numbered link to a citation

citeCallback(array<string|int, mixed> $matches[, int $init = -1 ]) : string
Parameters
$matches : array<string|int, mixed>

from regular expression to check for {{cite }}

$init : int = -1

used to initialize counter for citations

Return values
string

an HTML link to the citation in the current document
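
A counter-based callback of this shape might look like the following sketch. The function name, the regular expression, and the `#cite_N` anchor format are assumptions made for illustration. Only the signature (a matches array plus an optional `$init`) follows the entry above.

```php
<?php
// Hypothetical sketch: numbers each {{cite ...}} occurrence in order.
function sketchCiteCallback(array $matches, int $init = -1): string
{
    static $counter = 1;
    if ($init >= 0) {
        // Called directly with $init to reset the counter before a new page.
        $counter = $init;
        return "";
    }
    $n = $counter;
    $counter++;
    return '<a href="#cite_' . $n . '">[' . $n . ']</a>';
}

sketchCiteCallback([], 1); // start numbering at 1
$wiki = "Alpha{{cite first source}} and beta{{cite second source}}.";
$html = preg_replace_callback('/\{\{cite\s+.*?\}\}/s',
    'sketchCiteCallback', $wiki);
echo $html, "\n";
```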

fixLinksCallback()

Used to change spaces to underscores in links generated from our earlier matching rules

fixLinksCallback(array<string|int, mixed> $matches) : string
Parameters
$matches : array<string|int, mixed>

from regular expression to check for links

Return values
string

result of correcting link
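
As a rough illustration of this kind of callback, the sketch below rewrites only the href value of a link and leaves the visible link text alone. The function name and the pattern are hypothetical, and the real matching rules may capture different groups.

```php
<?php
// Hypothetical sketch: $matches[1] is 'href="', $matches[2] the link
// target, $matches[3] the closing quote.
function sketchFixLinksCallback(array $matches): string
{
    return $matches[1] . str_replace(' ', '_', $matches[2]) . $matches[3];
}

$link = '<a href="My Wiki Page">My Wiki Page</a>';
$fixed = preg_replace_callback('/(href=")([^"]*)(")/',
    'sketchFixLinksCallback', $link);
echo $fixed, "\n";
```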

base64EncodeCallback()

Callback used to base64 encode the contents of nowiki tags so they won't be manipulated by wiki replacements.

base64EncodeCallback(array<string|int, mixed> $matches) : string
Parameters
$matches : array<string|int, mixed>

$matches[1] should contain the contents of a nowiki tag

Return values
string

base 64 encoded contents surrounded by an escaped nowiki tag.

spaceEncodeCallback()

Callback used to encode the contents of pre tags so they won't accidentally get sub-pre tags because a bunch of leading lines have spaces

spaceEncodeCallback(array<string|int, mixed> $matches) : string
Parameters
$matches : array<string|int, mixed>

$matches[1] should contain the contents of a pre tag

Return values
string

encoded contents surrounded by an escaped pre tag.
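
One way such an encoding could work is to replace the leading spaces of each line with non-breaking-space entities, so that later rules keyed on leading spaces no longer fire inside the block. This is a sketch under that assumption. The function name, the pattern, and the choice of `&nbsp;` are hypothetical and may differ from what Yioop does.

```php
<?php
// Hypothetical sketch: $matches[1] holds the contents of a pre tag.
function sketchSpaceEncode(array $matches): string
{
    // Swap each line's run of leading spaces for the same number of &nbsp;.
    $body = preg_replace_callback('/^ +/m', function ($m) {
        return str_repeat('&nbsp;', strlen($m[0]));
    }, $matches[1]);
    // Escape the surrounding tag so later rules do not treat it as markup.
    return "&lt;pre&gt;" . $body . "&lt;/pre&gt;";
}

$wiki = "<pre>\n  indented\nflush</pre>";
$encoded = preg_replace_callback('/<pre>(.*?)<\/pre>/s',
    'sketchSpaceEncode', $wiki);
echo $encoded, "\n";
```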

spanEncodeCallback()

Callback used to encode the contents of span tags so that newlines within them don't accidentally get treated as new wiki paragraphs

spanEncodeCallback(array<string|int, mixed> $matches) : string
Parameters
$matches : array<string|int, mixed>

$matches[1] should contain the contents of a span tag

Return values
string

encoded contents surrounded by an escaped span tag.

base64DecodeCallback()

Callback used to base64 decode the contents of previously base64 encoded (@see base64EncodeCallback) nowiki tags after all mediawiki substitutions have been done

base64DecodeCallback(array<string|int, mixed> $matches) : string
Parameters
$matches : array<string|int, mixed>

$matches[1] should contain the contents of a nowiki tag

Return values
string

base 64 decoded, entity decoded contents.
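
The encode-then-decode round trip for nowiki content can be sketched as follows. The function names, the patterns, and the single bold rule standing in for the wiki substitutions are assumptions for illustration. The sketch also omits the entity decoding that the return value above mentions.

```php
<?php
// Hypothetical sketch: hide nowiki contents from later wiki rules.
function sketchBase64Encode(array $matches): string
{
    return "&lt;nowiki&gt;" . base64_encode($matches[1]) . "&lt;/nowiki&gt;";
}

// Hypothetical sketch: restore the hidden contents once the rules have run.
function sketchBase64Decode(array $matches): string
{
    return base64_decode($matches[1]);
}

$text = "'''bold''' <nowiki>'''literal'''</nowiki>";

// 1. Protect nowiki contents.
$protected = preg_replace_callback('/<nowiki>(.*?)<\/nowiki>/s',
    'sketchBase64Encode', $text);
// 2. Apply a stand-in wiki rule (triple quotes become bold).
$converted = preg_replace("/'''(.*?)'''/s", '<b>$1</b>', $protected);
// 3. Restore the protected contents.
$final = preg_replace_callback('/&lt;nowiki&gt;(.*?)&lt;\/nowiki&gt;/s',
    'sketchBase64Decode', $converted);
echo $final, "\n";
```

The base64 alphabet contains no quote characters, so the bold rule in step 2 cannot match inside the protected span.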

spaceDecodeCallback()

Cleans up pre tags after other wiki rules have been applied

spaceDecodeCallback(array<string|int, mixed> $matches) : string
Parameters
$matches : array<string|int, mixed>

$matches[1] should contain the contents of a pre tag

Return values
string

cleaned contents surrounded by a pre-formatted tag.
