Yioop_V9.5_Source_Code_Documentation

SuffixTree
in package

Data structure used to maintain a suffix tree for a passage of words.

The suffix tree is constructed using the linear time algorithm of Ukkonen, E. (1995). "On-line construction of suffix trees". Algorithmica 14 (3): 249–260.

Tags
author

Chris Pollett

Table of Contents

INFTY  = 2000000000
Upper bound on the length of any path in the tree
$active_edge_index  : int
Index into $this->text of starting word of active edge
$active_index  : int
Node which represents the left hand the start of the active edge This is the edge that contains the last suffix inserted
$active_len  : int
How many words from the start of the active edge label to get the last suffix. If active edge label was: "a black cat a black" and $active_len was 2, then would have "a black" from the first two chars.
$last_added  : int
Index of last node added to the suffix tree in the array used to hold the suffix tree data structures
$need_sym_link  : int
If in a given step in constructing the suffix tree we split the active edge and insert a new node and then have to do this again in the same step, then we need to create a sym_link between the suffix trees represented by these new nodes. This variable keeps track of the index of the first node so we can do this.
$pos  : int
Position in the $this->text up to which we have created a suffix tree so far
$remainder  : int
At a given stage in building the suffix tree how many new suffixes we need to insert
$root  : array<string|int, mixed>
The root node of the suffix trees
$size  : int
Number of elements in $this->text. i.e., count($this->text)
$text  : array<string|int, mixed>
The sequence of terms, one array entry per term, that a suffix tree is to be made from
$tree  : array<string|int, mixed>
Used to hold the suffix tree data structure (represented as a sequence of nodes)
__construct()  : mixed
Initializes a suffix tree based on the supplied array of terms.
addSuffixLink()  : mixed
If in a given step in constructing the suffix tree we split the active edge and insert a new node and then have to do this again in the same step, then we need to create a sym_link between the suffix trees represented by these new nodes. If in the current step it is necessary to add a sym_link this method sets the $this->need_sym_link node's "sym_link" field to $index which is supposed be the index of the second created node.
buildTree()  : mixed
Builds the complete suffix tree for the text currently stored in $this->text. If you change this text and call this method again, it build a new tree based on the new text. Uses Ukkonen
edgeLength()  : mixed
The number of elements out of $this->text that this node is currently responsible for
makeNode()  : mixed
Makes a new node for the suffix tree structure. This node is inserted at the end of the tree so far. A node is associative array consisting of the fields "start" whose value is the starting location in $this->text for this node, "end" location in $this->text up to which this node is responsible, "sym_link" is a link to an isomorphic subtree for the purposes of building the suffix tree, and "next" is an array of next children in the tree.
outputMaximal()  : mixed
Recursive function used to compute the maximal phrases in a document as well as their conditional maximal subphrases.
suffixTreeExtend()  : mixed
Given a suffix tree of the array of terms in $this->text up to $this->pos, adds one to pos and build the suffix tree up to this new value. i.e., the text with one more term added.
walkDown()  : if
Used to set the active point to the node given by $index

Constants

INFTY

Upper bound on the length of any path in the tree

public mixed INFTY = 2000000000

Properties

$active_edge_index

Index into $this->text of starting word of active edge

public int $active_edge_index

$active_index

Node which represents the left hand the start of the active edge This is the edge that contains the last suffix inserted

public int $active_index

$active_len

How many words from the start of the active edge label to get the last suffix. If active edge label was: "a black cat a black" and $active_len was 2, then would have "a black" from the first two chars.

public int $active_len

$last_added

Index of last node added to the suffix tree in the array used to hold the suffix tree data structures

public int $last_added

If in a given step in constructing the suffix tree we split the active edge and insert a new node and then have to do this again in the same step, then we need to create a sym_link between the suffix trees represented by these new nodes. This variable keeps track of the index of the first node so we can do this.

public int $need_sym_link

$pos

Position in the $this->text up to which we have created a suffix tree so far

public int $pos

$remainder

At a given stage in building the suffix tree how many new suffixes we need to insert

public int $remainder

$root

The root node of the suffix trees

public array<string|int, mixed> $root

$size

Number of elements in $this->text. i.e., count($this->text)

public int $size

$text

The sequence of terms, one array entry per term, that a suffix tree is to be made from

public array<string|int, mixed> $text

$tree

Used to hold the suffix tree data structure (represented as a sequence of nodes)

public array<string|int, mixed> $tree

Methods

__construct()

Initializes a suffix tree based on the supplied array of terms.

public __construct(array<string|int, mixed> $text) : mixed
Parameters
$text : array<string|int, mixed>

a sequence of terms to build the suffix tree for

Return values
mixed

If in a given step in constructing the suffix tree we split the active edge and insert a new node and then have to do this again in the same step, then we need to create a sym_link between the suffix trees represented by these new nodes. If in the current step it is necessary to add a sym_link this method sets the $this->need_sym_link node's "sym_link" field to $index which is supposed be the index of the second created node.

public addSuffixLink(int $index) : mixed
Parameters
$index : int

the index of the a created node in a given step. ($this->need_sym_link will be greater than 0 if it is the second created node of the step)

Return values
mixed

buildTree()

Builds the complete suffix tree for the text currently stored in $this->text. If you change this text and call this method again, it build a new tree based on the new text. Uses Ukkonen

public buildTree() : mixed
Return values
mixed

edgeLength()

The number of elements out of $this->text that this node is currently responsible for

public edgeLength(array<string|int, mixed> &$node) : mixed
Parameters
$node : array<string|int, mixed>

the node to compute the length of

Return values
mixed

makeNode()

Makes a new node for the suffix tree structure. This node is inserted at the end of the tree so far. A node is associative array consisting of the fields "start" whose value is the starting location in $this->text for this node, "end" location in $this->text up to which this node is responsible, "sym_link" is a link to an isomorphic subtree for the purposes of building the suffix tree, and "next" is an array of next children in the tree.

public makeNode(int $start[, int $end = self::INFTY ]) : mixed
Parameters
$start : int

what to use as the start value mentioned above

$end : int = self::INFTY

what to use as the start value mentioned above

Return values
mixed

outputMaximal()

Recursive function used to compute the maximal phrases in a document as well as their conditional maximal subphrases.

public outputMaximal(int $index, string $path, int $len, array<string|int, mixed> &$maximal) : mixed
Parameters
$index : int

a node in the suffix tree

$path : string

from root to current node

$len : int

number of nodes from root to current node in suffix tree

$maximal : array<string|int, mixed>

assoc array of phrase => (cond_max => pos of conditional maximal subphrase, [0] => pos_1st_occurrence of phrase, [1]=>pos_2nd_occurrence of phrase, etc)

Return values
mixed

suffixTreeExtend()

Given a suffix tree of the array of terms in $this->text up to $this->pos, adds one to pos and build the suffix tree up to this new value. i.e., the text with one more term added.

public suffixTreeExtend() : mixed
Return values
mixed

walkDown()

Used to set the active point to the node given by $index

public walkDown(int $index) : if
Parameters
$index : int

which node to use for setting

Return values
if

the current active edge is longer than $index's edge length then don't update and return false; otherwise, return true


        

Search results