SuffixTree
in package
Data structure used to maintain a suffix tree for a passage of words.
The suffix tree is constructed using the linear time algorithm of Ukkonen, E. (1995). "On-line construction of suffix trees". Algorithmica 14 (3): 249–260.
Tags
Table of Contents
- INFTY = 2000000000
- Upper bound on the length of any path in the tree
- $active_edge_index : int
- Index into $this->text of starting word of active edge
- $active_index : int
- Node which represents the left hand the start of the active edge This is the edge that contains the last suffix inserted
- $active_len : int
- How many words from the start of the active edge label to get the last suffix. If active edge label was: "a black cat a black" and $active_len was 2, then would have "a black" from the first two chars.
- $last_added : int
- Index of last node added to the suffix tree in the array used to hold the suffix tree data structures
- $need_sym_link : int
- If in a given step in constructing the suffix tree we split the active edge and insert a new node and then have to do this again in the same step, then we need to create a sym_link between the suffix trees represented by these new nodes. This variable keeps track of the index of the first node so we can do this.
- $pos : int
- Position in the $this->text up to which we have created a suffix tree so far
- $remainder : int
- At a given stage in building the suffix tree how many new suffixes we need to insert
- $root : array<string|int, mixed>
- The root node of the suffix trees
- $size : int
- Number of elements in $this->text. i.e., count($this->text)
- $text : array<string|int, mixed>
- The sequence of terms, one array entry per term, that a suffix tree is to be made from
- $tree : array<string|int, mixed>
- Used to hold the suffix tree data structure (represented as a sequence of nodes)
- __construct() : mixed
- Initializes a suffix tree based on the supplied array of terms.
- addSuffixLink() : mixed
- If in a given step in constructing the suffix tree we split the active edge and insert a new node and then have to do this again in the same step, then we need to create a sym_link between the suffix trees represented by these new nodes. If in the current step it is necessary to add a sym_link this method sets the $this->need_sym_link node's "sym_link" field to $index which is supposed be the index of the second created node.
- buildTree() : mixed
- Builds the complete suffix tree for the text currently stored in $this->text. If you change this text and call this method again, it build a new tree based on the new text. Uses Ukkonen
- edgeLength() : mixed
- The number of elements out of $this->text that this node is currently responsible for
- makeNode() : mixed
- Makes a new node for the suffix tree structure. This node is inserted at the end of the tree so far. A node is associative array consisting of the fields "start" whose value is the starting location in $this->text for this node, "end" location in $this->text up to which this node is responsible, "sym_link" is a link to an isomorphic subtree for the purposes of building the suffix tree, and "next" is an array of next children in the tree.
- outputMaximal() : mixed
- Recursive function used to compute the maximal phrases in a document as well as their conditional maximal subphrases.
- suffixTreeExtend() : mixed
- Given a suffix tree of the array of terms in $this->text up to $this->pos, adds one to pos and build the suffix tree up to this new value. i.e., the text with one more term added.
- walkDown() : if
- Used to set the active point to the node given by $index
Constants
INFTY
Upper bound on the length of any path in the tree
public
mixed
INFTY
= 2000000000
Properties
$active_edge_index
Index into $this->text of starting word of active edge
public
int
$active_edge_index
$active_index
Node which represents the left hand the start of the active edge This is the edge that contains the last suffix inserted
public
int
$active_index
$active_len
How many words from the start of the active edge label to get the last suffix. If active edge label was: "a black cat a black" and $active_len was 2, then would have "a black" from the first two chars.
public
int
$active_len
$last_added
Index of last node added to the suffix tree in the array used to hold the suffix tree data structures
public
int
$last_added
$need_sym_link
If in a given step in constructing the suffix tree we split the active edge and insert a new node and then have to do this again in the same step, then we need to create a sym_link between the suffix trees represented by these new nodes. This variable keeps track of the index of the first node so we can do this.
public
int
$need_sym_link
$pos
Position in the $this->text up to which we have created a suffix tree so far
public
int
$pos
$remainder
At a given stage in building the suffix tree how many new suffixes we need to insert
public
int
$remainder
$root
The root node of the suffix trees
public
array<string|int, mixed>
$root
$size
Number of elements in $this->text. i.e., count($this->text)
public
int
$size
$text
The sequence of terms, one array entry per term, that a suffix tree is to be made from
public
array<string|int, mixed>
$text
$tree
Used to hold the suffix tree data structure (represented as a sequence of nodes)
public
array<string|int, mixed>
$tree
Methods
__construct()
Initializes a suffix tree based on the supplied array of terms.
public
__construct(array<string|int, mixed> $text) : mixed
Parameters
- $text : array<string|int, mixed>
-
a sequence of terms to build the suffix tree for
Return values
mixed —addSuffixLink()
If in a given step in constructing the suffix tree we split the active edge and insert a new node and then have to do this again in the same step, then we need to create a sym_link between the suffix trees represented by these new nodes. If in the current step it is necessary to add a sym_link this method sets the $this->need_sym_link node's "sym_link" field to $index which is supposed be the index of the second created node.
public
addSuffixLink(int $index) : mixed
Parameters
- $index : int
-
the index of the a created node in a given step. ($this->need_sym_link will be greater than 0 if it is the second created node of the step)
Return values
mixed —buildTree()
Builds the complete suffix tree for the text currently stored in $this->text. If you change this text and call this method again, it build a new tree based on the new text. Uses Ukkonen
public
buildTree() : mixed
Return values
mixed —edgeLength()
The number of elements out of $this->text that this node is currently responsible for
public
edgeLength(array<string|int, mixed> &$node) : mixed
Parameters
- $node : array<string|int, mixed>
-
the node to compute the length of
Return values
mixed —makeNode()
Makes a new node for the suffix tree structure. This node is inserted at the end of the tree so far. A node is associative array consisting of the fields "start" whose value is the starting location in $this->text for this node, "end" location in $this->text up to which this node is responsible, "sym_link" is a link to an isomorphic subtree for the purposes of building the suffix tree, and "next" is an array of next children in the tree.
public
makeNode(int $start[, int $end = self::INFTY ]) : mixed
Parameters
- $start : int
-
what to use as the start value mentioned above
- $end : int = self::INFTY
-
what to use as the start value mentioned above
Return values
mixed —outputMaximal()
Recursive function used to compute the maximal phrases in a document as well as their conditional maximal subphrases.
public
outputMaximal(int $index, string $path, int $len, array<string|int, mixed> &$maximal) : mixed
Parameters
- $index : int
-
a node in the suffix tree
- $path : string
-
from root to current node
- $len : int
-
number of nodes from root to current node in suffix tree
- $maximal : array<string|int, mixed>
-
assoc array of phrase => (cond_max => pos of conditional maximal subphrase, [0] => pos_1st_occurrence of phrase, [1]=>pos_2nd_occurrence of phrase, etc)
Return values
mixed —suffixTreeExtend()
Given a suffix tree of the array of terms in $this->text up to $this->pos, adds one to pos and build the suffix tree up to this new value. i.e., the text with one more term added.
public
suffixTreeExtend() : mixed
Return values
mixed —walkDown()
Used to set the active point to the node given by $index
public
walkDown(int $index) : if
Parameters
- $index : int
-
which node to use for setting
Return values
if —the current active edge is longer than $index's edge length then don't update and return false; otherwise, return true