WeightedFeatures
extends Features
in package
A concrete Features subclass that represents a document as a vector of feature weights, where the weights are computed using a modified form of TF*IDF. This feature mapping is experimental and may not work correctly.
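The following sketch shows plain, unmodified TF*IDF for reference. Each term's within-document count is multiplied by the log of the total number of documents divided by the number of documents that contain the term. The modification this class applies is not described on this page, so the helper tfIdf() and its argument layout are hypothetical.

```php
<?php
// A minimal sketch of plain TF*IDF weighting, for illustration only.
// WeightedFeatures uses a *modified* TF*IDF whose exact formula is not
// documented here; tfIdf() is a hypothetical helper, not part of the class.

/**
 * @param array<string,int> $doc_counts term => count within one document
 * @param array<string,int> $doc_freqs  term => number of documents containing it
 * @param int $num_docs                 total number of documents (compare $D)
 * @return array<string,float>          term => weight
 */
function tfIdf(array $doc_counts, array $doc_freqs, int $num_docs): array
{
    $weights = [];
    foreach ($doc_counts as $term => $count) {
        // Treat unseen terms as occurring in one document to avoid division by zero.
        $df = max(1, $doc_freqs[$term] ?? 0);
        $weights[$term] = $count * log($num_docs / $df);
    }
    return $weights;
}

$weights = tfIdf(['cat' => 2, 'the' => 5], ['cat' => 1, 'the' => 4], 4);
// 'the' occurs in every document, so its IDF factor, and therefore its weight, is 0.
```

Terms that appear in every document contribute nothing, which is the usual motivation for the IDF factor.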
Table of Contents
- $D : int
- Number of training examples
- $feature_map : array<string|int, mixed>
- Maps old feature indices to new ones when a feature subset operation has been applied to restrict the number of features.
- $label_freqs : array<string|int, mixed>
- Maps labels to the number of documents they're assigned to.
- $n : int
- Number of elements in the vocabulary
- $top_terms : array<string|int, mixed>
- A list of the top terms according to the last feature subset operation, if any.
- $var_freqs : array<string|int, mixed>
- Maps terms to how often they occur in documents by label.
- $vocab : array<string|int, mixed>
- Maps terms to their feature indices, which start at 1.
- addExample() : array<string|int, mixed>
- Maps a new example to a feature vector, adding any new terms to the vocabulary, and updating term and label statistics. The example should be an array of terms and their counts, and the output simply replaces terms with feature indices.
- labelStats() : array<string|int, mixed>
- Returns the positive and negative label counts for the training set.
- mapDocument() : array<string|int, mixed>
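- Maps a document, given as terms and their within-document counts, to a feature vector (documentation inherited from Features).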
- mapToRestrictedFeatures() : array<string|int, mixed>
- Maps the indices of a feature vector to those used by a restricted feature set, dropping any features that aren't in the map. If this Features instance isn't restricted, then the passed-in features are returned unmodified.
- mapTrainingSet() : object
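- Maps a training set of per-example count vectors to a SparseMatrix whose rows are the transformed feature vectors (documentation inherited from Features).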
- numFeatures() : int
- Returns the number of features, not including the intercept term represented by feature zero. For example, if we had features 0..10, this function would return 10.
- restrict() : object
- Given a FeatureSelection instance, return a new clone of this Features instance using a restricted feature subset. The new Features instance is augmented with a feature map that it can use to convert feature indices from the larger feature set to indices for the reduced set.
- updateExampleLabel() : mixed
- Updates the label and term statistics to reflect a label change for an example from the training set. A new label of 0 indicates that the example is being removed entirely. Note that term statistics only count one occurrence of a term per example.
- varStats() : array<string|int, mixed>
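- Returns the statistics for a particular feature and label in the training set. The statistics are counts of how often the term appears or fails to appear in examples with or without the target label. They are returned in a flat array in the following order: feature present and label matches, feature present and label doesn't match, feature absent and label matches, feature absent and label doesn't match.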
Properties
$D
Number of training examples
public
int
$D
= 0
$feature_map
Maps old feature indices to new ones when a feature subset operation has been applied to restrict the number of features.
public
array<string|int, mixed>
$feature_map
$label_freqs
Maps labels to the number of documents they're assigned to.
public
array<string|int, mixed>
$label_freqs
= [-1 => 0, 1 => 0]
$n
Number of elements in the vocabulary
public
int
$n
= []
$top_terms
A list of the top terms according to the last feature subset operation, if any.
public
array<string|int, mixed>
$top_terms
= []
$var_freqs
Maps terms to how often they occur in documents by label.
public
array<string|int, mixed>
$var_freqs
= []
$vocab
Maps terms to their feature indices, which start at 1.
public
array<string|int, mixed>
$vocab
= []
Methods
addExample()
Maps a new example to a feature vector, adding any new terms to the vocabulary, and updating term and label statistics. The example should be an array of terms and their counts, and the output simply replaces terms with feature indices.
public
addExample(array<string|int, mixed> $terms, int $label) : array<string|int, mixed>
Parameters
- $terms : array<string|int, mixed>
  array of terms mapped to the number of times they occur in the example
- $label : int
  label for this example, either -1 or 1
Return values
array<string|int, mixed> —input example with terms replaced by feature indices
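The bookkeeping described for addExample() and labelStats() can be modeled with a small stand-alone class. This is a simplified sketch, not the real implementation: the property names mirror those listed on this page, but the class FeatureStatsSketch and the layout of $var_freqs (feature index => [label => number of examples]) are assumptions.

```php
<?php
// Hypothetical, simplified model of the documented addExample() bookkeeping.
// Property names mirror this page; the $var_freqs layout is an assumption.
final class FeatureStatsSketch
{
    public int $D = 0;                             // number of training examples
    public int $n = 0;                             // vocabulary size
    public array $vocab = [];                      // term => feature index (starting at 1)
    public array $label_freqs = [-1 => 0, 1 => 0]; // label => number of examples
    public array $var_freqs = [];                  // feature index => [label => count]

    /** @param array<string,int> $terms term => count; $label is -1 or 1 */
    public function addExample(array $terms, int $label): array
    {
        $features = [];
        foreach ($terms as $term => $count) {
            if (!isset($this->vocab[$term])) {
                // New term: assign the next index. Indices start at 1
                // because feature 0 is the intercept term.
                $this->vocab[$term] = ++$this->n;
                $this->var_freqs[$this->n] = [-1 => 0, 1 => 0];
            }
            $j = $this->vocab[$term];
            $features[$j] = $count; // replace the term with its feature index
            // Count each term once per example, regardless of $count.
            $this->var_freqs[$j][$label]++;
        }
        $this->label_freqs[$label]++;
        $this->D++;
        return $features;
    }

    /** Positive and negative label counts, indexed by label. */
    public function labelStats(): array
    {
        return $this->label_freqs;
    }
}

$stats = new FeatureStatsSketch();
$v1 = $stats->addExample(['cat' => 2, 'dog' => 1], 1);   // [1 => 2, 2 => 1]
$v2 = $stats->addExample(['dog' => 3, 'fish' => 1], -1); // [2 => 3, 3 => 1]
```

Only terms not yet in the vocabulary receive new indices, so `dog` keeps index 2 in the second example.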
labelStats()
Returns the positive and negative label counts for the training set.
public
labelStats() : array<string|int, mixed>
Return values
array<string|int, mixed> —positive and negative label counts indexed by label, either 1 or -1
mapDocument()
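Maps a document, given as an associative array of terms and their within-document counts, to a feature vector (documentation inherited from Features).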
public
mapDocument(array<string|int, mixed> $tokens) : array<string|int, mixed>
Parameters
- $tokens : array<string|int, mixed>
  associative array of terms mapped to their within-document counts
Return values
array<string|int, mixed> —feature vector corresponding to the tokens, mapped according to the implementation of a particular Features subclass
mapToRestrictedFeatures()
Maps the indices of a feature vector to those used by a restricted feature set, dropping any features that aren't in the map. If this Features instance isn't restricted, then the passed-in features are returned unmodified.
public
mapToRestrictedFeatures(array<string|int, mixed> $features) : array<string|int, mixed>
Parameters
- $features : array<string|int, mixed>
  feature vector mapping feature indices to frequencies
Return values
array<string|int, mixed> —original feature vector with indices mapped according to the feature_map property, and any features that don't occur in feature_map dropped
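The remapping rule can be illustrated with a free function. This is a minimal sketch: mapToRestricted() is hypothetical, and it assumes an unrestricted instance is signalled by a null feature map (the $feature_map property has no documented default) whose entries map old indices to new ones.

```php
<?php
// A minimal sketch of the documented mapToRestrictedFeatures() behaviour.
// mapToRestricted() is hypothetical; $feature_map is assumed to map
// old feature index => new feature index, or be null when unrestricted.
function mapToRestricted(array $features, ?array $feature_map): array
{
    // An unrestricted instance returns the vector unchanged.
    if ($feature_map === null) {
        return $features;
    }
    $mapped = [];
    foreach ($features as $j => $value) {
        // Keep only features present in the map, renumbered for the reduced set.
        if (isset($feature_map[$j])) {
            $mapped[$feature_map[$j]] = $value;
        }
    }
    return $mapped;
}

$restricted = mapToRestricted([1 => 2, 5 => 1, 9 => 4], [5 => 1, 9 => 2]);
// Feature 1 is not in the map, so it is dropped.
```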
mapTrainingSet()
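Maps a training set of per-example count vectors to a SparseMatrix whose rows are the transformed feature vectors (documentation inherited from Features).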
public
mapTrainingSet(array<string|int, mixed> $docs) : object
Parameters
- $docs : array<string|int, mixed>
  array of training examples represented as feature vectors where the values are per-example counts
Return values
object —SparseMatrix instance whose rows are the transformed feature vectors
numFeatures()
Returns the number of features, not including the intercept term represented by feature zero. For example, if we had features 0..10, this function would return 10.
public
numFeatures() : int
Return values
int —the number of features in the training set
restrict()
Given a FeatureSelection instance, return a new clone of this Features instance using a restricted feature subset. The new Features instance is augmented with a feature map that it can use to convert feature indices from the larger feature set to indices for the reduced set.
public
restrict(object $fs) : object
Parameters
- $fs : object
  FeatureSelection instance to be used to select the most informative terms
Return values
object —new Features instance using the restricted feature set
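One plausible way to build the feature map that a restricted instance carries is to number the selected features consecutively. The sketch below is hypothetical: buildFeatureMap() is not part of the class, and it assumes the FeatureSelection step yields an ordered list of the old feature indices to keep.

```php
<?php
// Hypothetical sketch of building a restricted feature map.
// $selected is an assumed input: old feature indices chosen by a
// feature selection step, in rank order.
function buildFeatureMap(array $selected): array
{
    $feature_map = [];
    $next = 1; // reduced indices also start at 1; 0 stays the intercept
    foreach ($selected as $old_index) {
        $feature_map[$old_index] = $next++;
    }
    return $feature_map;
}

$map = buildFeatureMap([9, 5, 42]);
```

Numbering the kept features consecutively keeps the reduced vectors dense in index space, which is what lets a smaller model ignore the dropped features entirely.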
updateExampleLabel()
Updates the label and term statistics to reflect a label change for an example from the training set. A new label of 0 indicates that the example is being removed entirely. Note that term statistics only count one occurrence of a term per example.
public
updateExampleLabel(array<string|int, mixed> $features, int $old_label, int $new_label) : mixed
Parameters
- $features : array<string|int, mixed>
  feature vector from when the example was originally added
- $old_label : int
  old example label in {-1, 1}
- $new_label : int
  new example label in {-1, 0, 1}, where 0 indicates that the example should be removed entirely
Return values
mixed
varStats()
Returns the statistics for a particular feature and label in the training set. The statistics are counts of how often the term appears or fails to appear in examples with or without the target label. They are returned in a flat array, in the following order:
0 => # examples where feature present, label matches
1 => # examples where feature present, label doesn't match
2 => # examples where feature absent, label matches
3 => # examples where feature absent, label doesn't match
public
varStats(int $j, int $label) : array<string|int, mixed>
Parameters
- $j : int
  feature index
- $label : int
  target label
Return values
array<string|int, mixed> —feature statistics in 4-element flat array
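The four varStats() counts can be derived from per-label statistics, and updateExampleLabel() amounts to moving one example's counts between labels. This is a minimal sketch under stated assumptions: varStatsSketch() and updateLabelSketch() are hypothetical, labels are -1 and 1, $var_freqs maps feature index => [label => number of examples containing the feature], and $label_freqs maps label => number of examples.

```php
<?php
// Hypothetical sketch of the four documented varStats() counts.
// Assumed layouts: $var_freqs[j][label] = examples with feature j and that
// label; $label_freqs[label] = examples with that label. Labels are -1 and 1.
function varStatsSketch(array $var_freqs, array $label_freqs, int $j, int $label): array
{
    $other = -$label;
    $present_match = $var_freqs[$j][$label] ?? 0;
    $present_other = $var_freqs[$j][$other] ?? 0;
    return [
        $present_match,                        // 0: present, label matches
        $present_other,                        // 1: present, label doesn't match
        $label_freqs[$label] - $present_match, // 2: absent, label matches
        $label_freqs[$other] - $present_other, // 3: absent, label doesn't match
    ];
}

// Hypothetical counterpart of updateExampleLabel(): move one example's
// statistics from $old_label to $new_label; a new label of 0 removes it.
function updateLabelSketch(array &$var_freqs, array &$label_freqs, array $features,
    int $old_label, int $new_label): void
{
    foreach (array_keys($features) as $j) {
        // Each feature counts once per example, whatever its frequency.
        $var_freqs[$j][$old_label]--;
        if ($new_label !== 0) {
            $var_freqs[$j][$new_label]++;
        }
    }
    $label_freqs[$old_label]--;
    if ($new_label !== 0) {
        $label_freqs[$new_label]++;
    }
}

$var_freqs = [1 => [-1 => 1, 1 => 3], 2 => [-1 => 2, 1 => 0]];
$label_freqs = [-1 => 4, 1 => 5];
$before = varStatsSketch($var_freqs, $label_freqs, 1, 1); // [3, 1, 2, 3]
updateLabelSketch($var_freqs, $label_freqs, [1 => 2], 1, -1);
$after = varStatsSketch($var_freqs, $label_freqs, 1, 1);  // [2, 2, 2, 3]
```

Relabeling one example that contains feature 1 from 1 to -1 shifts one count from the "present, matches" cell to the "present, doesn't match" cell, and the label totals shift with it.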