NaiveBayes
extends ClassifierAlgorithm
in package
Implements the Naive Bayes text classification algorithm.
This class also provides a method to sample a beta vector from a dataset, making it easy to generate several slightly-different classifiers for the same dataset in order to form classifier committees.
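A rough usage sketch follows, assuming that a SparseMatrix of training examples $X, a label array $y, and a new document's feature vector $x have already been prepared by the surrounding classifier code (none of that preparation is part of this class):

```php
// Hypothetical workflow; train(), classify(), and the weighting properties
// are documented below, everything else here is assumed context.
$nb = new NaiveBayes();
$nb->gamma = 1.0;    // weight on positive examples
$nb->epsilon = 1.0;  // weight on negative examples

$nb->train($X, $y);          // $X: SparseMatrix of examples, $y: 1/-1 labels
$score = $nb->classify($x);  // pseudo-probability that $x is a positive example

// For committees, sampleBeta() re-draws the beta vector from the training
// counts, so repeated calls yield slightly different classifiers.
```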
Table of Contents
- $beta : array<string|int, mixed>
- Beta vector of feature weights resulting from the training phase. The dot product of this vector with a feature vector yields the log likelihood that the feature vector describes a document belonging to the trained-for class.
- $debug : int
- Flag used to control the level of debug messages. For now, 0 means no messages; any other value causes messages to be output.
- $epsilon : float
- Parameter used to weight negative examples.
- $gamma : float
- Parameter used to weight positive examples.
- classify() : mixed
- Returns the pseudo-probability that a new instance is a positive example of the class the beta vector was trained to recognize. It only makes sense to try classification after at least some training has been done on a dataset that includes both positive and negative examples of the target class.
- log() : mixed
- Write a message to the log file, depending on the debug level for this subpackage.
- logit() : float
- Computes the log odds of a numerator and denominator, corresponding to the number of positive and negative examples exhibiting some feature.
- sampleBeta() : mixed
- Constructs beta by sampling from the Gamma distribution for each feature, parameterized by the number of times the feature appears in positive examples, with a scale/rate of 1. This function is used to construct classifier committees.
- sampleGammaDeviate() : float
- Computes a Gamma deviate with beta = 1 and an integral, small alpha. Under these assumptions, the deviate is just the sum of alpha exponential deviates. Each exponential deviate is the negative log of a uniform deviate, so the sum of the logs is the negative log of the product of the uniform deviates.
- train() : mixed
- Computes the beta vector from the given examples and labels. The examples are represented as a sparse matrix where each row is an example and each column a feature, and the labels as an array where each value is either 1 or -1, corresponding to a positive or negative example. Note that the first feature (column 0) corresponds to an intercept term, and is equal to 1 for every example.
Properties
$beta
Beta vector of feature weights resulting from the training phase. The dot product of this vector with a feature vector yields the log likelihood that the feature vector describes a document belonging to the trained-for class.
public
array<string|int, mixed>
$beta
$debug
Flag used to control the level of debug messages. For now, 0 means no messages; any other value causes messages to be output.
public
int
$debug
= 0
$epsilon
Parameter used to weight negative examples.
public
float
$epsilon
= 1.0
$gamma
Parameter used to weight positive examples.
public
float
$gamma
= 1.0
Methods
classify()
Returns the pseudo-probability that a new instance is a positive example of the class the beta vector was trained to recognize. It only makes sense to try classification after at least some training has been done on a dataset that includes both positive and negative examples of the target class.
public
classify(array<string|int, mixed> $x) : mixed
Parameters
- $x : array<string|int, mixed>
  feature vector represented by an associative array mapping features to their weights
Return values
mixed
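The $beta documentation above says the score underlying this value comes from the dot product of $beta with the feature vector. The following is only an illustration of how such a dot product could be turned into a pseudo-probability with the logistic function; the exact transformation classify() applies is not spelled out here:

```php
// Illustration only: dot product of a beta vector with a sparse feature
// vector, squashed into (0, 1) with the logistic function (assumed).
function pseudoProbability(array $beta, array $x): float
{
    $dot = 0.0;
    foreach ($x as $feature => $weight) {
        $dot += ($beta[$feature] ?? 0.0) * $weight;
    }
    return 1.0 / (1.0 + exp(-$dot));
}
```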
log()
Write a message to the log file, depending on the debug level for this subpackage.
public
log(string $message) : mixed
Parameters
- $message : string
  what to write to the log
Return values
mixed
logit()
Computes the log odds of a numerator and denominator, corresponding to the number of positive and negative examples exhibiting some feature.
public
logit(int $pos, int $neg) : float
Parameters
- $pos : int
  count of positive examples exhibiting some feature
- $neg : int
  count of negative examples exhibiting the same feature
Return values
float —log odds of seeing the feature in a positive example
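For example, a feature seen in 30 positive and 10 negative examples has log odds log(30/10), roughly 1.10. The plain, unsmoothed version of this computation is sketched below; the actual method may weight or smooth the counts (see $gamma and $epsilon above), so treat this only as the underlying idea:

```php
// Unsmoothed log odds of seeing a feature in a positive example.
// Assumes both counts are positive; real implementations typically smooth.
function plainLogit(int $pos, int $neg): float
{
    return log($pos / $neg);
}
echo plainLogit(30, 10); // ~1.0986, the feature favors the positive class
```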
sampleBeta()
Constructs beta by sampling from the Gamma distribution for each feature, parameterized by the number of times the feature appears in positive examples, with a scale/rate of 1. This function is used to construct classifier committees.
public
sampleBeta(object $features) : mixed
Parameters
- $features : object
  Features instance for the training set, used to determine how often a given feature occurs in positive and negative examples
Return values
mixed
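A sketch of how a committee might be assembled with this method, assuming $features is the Features instance for the training set, $x is a new document's feature vector, and sampleBeta() updates the classifier's $beta in place (the surrounding bookkeeping is not part of the documented API):

```php
// Hypothetical committee: each member re-samples beta, then scores the document.
$committee_size = 5;
$scores = [];
for ($i = 0; $i < $committee_size; $i++) {
    $nb->sampleBeta($features);   // assumed to re-draw $nb->beta from feature counts
    $scores[] = $nb->classify($x);
}
$committee_score = array_sum($scores) / $committee_size; // simple average vote
```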
sampleGammaDeviate()
Computes a Gamma deviate with beta = 1 and an integral, small alpha. Under these assumptions, the deviate is just the sum of alpha exponential deviates. Each exponential deviate is the negative log of a uniform deviate, so the sum of the logs is the negative log of the product of the uniform deviates.
public
sampleGammaDeviate(int $alpha) : float
Parameters
- $alpha : int
  parameter to the Gamma distribution (in practice, a count of occurrences of some feature)
Return values
float —a deviate from the Gamma distribution parameterized by $alpha
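The description above translates almost directly into code. A minimal sketch of the same idea, not necessarily the exact implementation used here:

```php
// Gamma(alpha, 1) deviate for a small integer alpha: the negative log of the
// product of alpha uniform deviates (equivalently, the sum of alpha
// exponential deviates).
function gammaDeviateSketch(int $alpha): float
{
    $product = 1.0;
    for ($i = 0; $i < $alpha; $i++) {
        $product *= (mt_rand() + 1) / (mt_getrandmax() + 1); // uniform in (0, 1]
    }
    return -log($product);
}
```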
train()
Computes the beta vector from the given examples and labels. The examples are represented as a sparse matrix where each row is an example and each column a feature, and the labels as an array where each value is either 1 or -1, corresponding to a positive or negative example. Note that the first feature (column 0) corresponds to an intercept term, and is equal to 1 for every example.
public
train(object $X, array<string|int, mixed> $y) : mixed
Parameters
- $X : object
  SparseMatrix of training examples
- $y : array<string|int, mixed>
  example labels, each either 1 (positive) or -1 (negative)
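As a sketch of the expected shapes, with plain arrays standing in for the classifier package's SparseMatrix (only the layout is illustrated here; column 0 is the intercept and is 1 for every example):

```php
// Conceptual layout of three training examples with two real features.
$rows = [
    [1, 2, 0],   // positive example (column 0 is the intercept)
    [1, 0, 3],   // positive example
    [1, 1, 1],   // negative example
];
$y = [1, 1, -1]; // labels: 1 = positive, -1 = negative

// In the real call, $X would be a SparseMatrix built from rows like these:
// $nb->train($X, $y);
```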