Yioop V9.5 Source Code Documentation

NaiveBayes extends ClassifierAlgorithm
in package

Implements the Naive Bayes text classification algorithm.

This class also provides a method to sample a beta vector from a dataset, making it easy to generate several slightly-different classifiers for the same dataset in order to form classifier committees.

Tags
author

Shawn Tice

Table of Contents

$beta  : array<string|int, mixed>
Beta vector of feature weights resulting from the training phase. The dot product of this vector with a feature vector yields the log likelihood that the feature vector describes a document belonging to the trained-for class.
$debug  : int
Flag used to control the level of debug messages; for now, 0 means no messages and any other value causes messages to be output
$epsilon  : float
Parameter used to weight negative examples.
$gamma  : float
Parameter used to weight positive examples.
classify()  : mixed
Returns the pseudo-probability that a new instance is a positive example of the class the beta vector was trained to recognize. It only makes sense to try classification after at least some training has been done on a dataset that includes both positive and negative examples of the target class.
log()  : mixed
Write a message to log file depending on debug level for this subpackage
logit()  : float
Computes the log odds of a numerator and denominator, corresponding to the number of positive and negative examples exhibiting some feature.
sampleBeta()  : mixed
Constructs beta by sampling from the Gamma distribution for each feature, parameterized by the number of times the feature appears in positive examples, with a scale/rate of 1. This function is used to construct classifier committees.
sampleGammaDeviate()  : float
Computes a Gamma deviate with beta = 1 and a small, integral alpha. With these assumptions, the deviate is just the sum of alpha exponential deviates. Each exponential deviate is just the negative log of a uniform deviate, so the sum of the logs is just the negative log of the product of the uniform deviates.
train()  : mixed
Computes the beta vector from the given examples and labels. The examples are represented as a sparse matrix where each row is an example and each column a feature, and the labels as an array where each value is either 1 or -1, corresponding to a positive or negative example. Note that the first feature (column 0) corresponds to an intercept term, and is equal to 1 for every example.

Properties

$beta

Beta vector of feature weights resulting from the training phase. The dot product of this vector with a feature vector yields the log likelihood that the feature vector describes a document belonging to the trained-for class.

public array<string|int, mixed> $beta
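
As a small illustration (with made-up weights, not taken from any real training run), the log likelihood for an example is just the sum of the beta weights multiplied by the corresponding feature values:

<?php
// Illustrative only: a tiny beta vector and feature vector with made-up
// weights. Index 0 is the intercept term, which is 1 for every example.
$beta = [0 => -0.4, 3 => 1.2, 7 => 0.8];
$x = [0 => 1, 3 => 1, 7 => 1];
$logLikelihood = 0.0;
foreach ($x as $feature => $value) {
    $logLikelihood += ($beta[$feature] ?? 0.0) * $value;
}
echo $logLikelihood; // -0.4 + 1.2 + 0.8 = 1.6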

$debug

Flag used to control the level of debug messages; for now, 0 means no messages and any other value causes messages to be output

public int $debug = 0

$epsilon

Parameter used to weight negative examples.

public float $epsilon = 1.0

$gamma

Parameter used to weight positive examples.

public float $gamma = 1.0

Methods

classify()

Returns the pseudo-probability that a new instance is a positive example of the class the beta vector was trained to recognize. It only makes sense to try classification after at least some training has been done on a dataset that includes both positive and negative examples of the target class.

public classify(array<string|int, mixed> $x) : mixed
Parameters
$x : array<string|int, mixed>

feature vector represented by an associative array mapping features to their weights

Return values
mixed
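
A minimal sketch of this step, assuming $beta and $x are associative arrays mapping feature indices to weights; the logistic squashing shown here is one common way to turn the log likelihood into a pseudo-probability and is illustrative rather than a copy of Yioop's implementation:

<?php
// Sketch: dot product of the trained beta vector with the new example's
// feature vector, squashed into (0, 1) with the logistic function.
function classifySketch(array $beta, array $x): float
{
    $l = 0.0;
    foreach ($x as $feature => $weight) {
        $l += ($beta[$feature] ?? 0.0) * $weight;
    }
    return 1.0 / (1.0 + exp(-$l)); // pseudo-probability of the positive class
}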

log()

Write a message to log file depending on debug level for this subpackage

public log(string $message) : mixed
Parameters
$message : string

what to write to the log

Return values
mixed

logit()

Computes the log odds of a numerator and denominator, corresponding to the number of positive and negative examples exhibiting some feature.

public logit(int $pos, int $neg) : float
Parameters
$pos : int

count of positive examples exhibiting some feature

$neg : int

count of negative examples exhibiting the same feature

Return values
float

log odds of seeing the feature in a positive example
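
For example (an illustrative sketch; the +1 additive smoothing used here to keep the ratio finite is an assumption, not necessarily the constant Yioop uses):

<?php
// Log odds of a feature appearing in a positive versus a negative example,
// with +1 additive smoothing (an assumption) so neither count can be zero.
function logitSketch(int $pos, int $neg): float
{
    return log(($pos + 1.0) / ($neg + 1.0));
}
echo logitSketch(30, 10); // log(31 / 11) ≈ 1.036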

sampleBeta()

Constructs beta by sampling from the Gamma distribution for each feature, parameterized by the number of times the feature appears in positive examples, with a scale/rate of 1. This function is used to construct classifier committees.

public sampleBeta(object $features) : mixed
Parameters
$features : object

Features instance for the training set, used to determine how often a given feature occurs in positive and negative examples

Return values
mixed
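
A hypothetical sketch of the idea: draw one Gamma deviate per feature for the positive count and one for the negative count, and take the log odds of the two. The numFeatures(), countInPositive(), and countInNegative() accessors are invented here for illustration and are not Yioop's actual Features API:

<?php
// Hypothetical sketch; $features is assumed to expose per-feature counts of
// positive and negative occurrences, and $nb is a NaiveBayes instance whose
// documented sampleGammaDeviate() supplies the Gamma(count, 1) draws.
function sampleBetaSketch($features, $nb): array
{
    $beta = [];
    for ($j = 0; $j < $features->numFeatures(); $j++) {
        $pos = $nb->sampleGammaDeviate($features->countInPositive($j));
        $neg = $nb->sampleGammaDeviate($features->countInNegative($j));
        // +1.0 smoothing is an assumption to keep the log odds finite when a
        // deviate comes out zero (e.g. for a count of zero).
        $beta[$j] = log(($pos + 1.0) / ($neg + 1.0));
    }
    return $beta;
}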

sampleGammaDeviate()

Computes a Gamma deviate with beta = 1 and a small, integral alpha. With these assumptions, the deviate is just the sum of alpha exponential deviates. Each exponential deviate is just the negative log of a uniform deviate, so the sum of the logs is just the negative log of the product of the uniform deviates.

public sampleGammaDeviate(int $alpha) : float
Parameters
$alpha : int

parameter to Gamma distribution (in practice, a count of occurrences of some feature)

Return values
float

a deviate from the Gamma distribution parameterized by $alpha
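
A sketch that follows the description above directly: sum alpha exponential deviates, which is the same as taking the negative log of the product of alpha uniform deviates (the particular mapping from mt_rand() to a uniform deviate is an illustrative choice):

<?php
// Gamma(alpha, 1) deviate for small integral alpha: negative log of the
// product of alpha uniform deviates.
function sampleGammaDeviateSketch(int $alpha): float
{
    $product = 1.0;
    for ($i = 0; $i < $alpha; $i++) {
        // Strictly positive uniform deviate in (0, 1] so log() stays finite.
        $product *= (mt_rand() + 1) / (mt_getrandmax() + 1);
    }
    return -log($product);
}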

train()

Computes the beta vector from the given examples and labels. The examples are represented as a sparse matrix where each row is an example and each column a feature, and the labels as an array where each value is either 1 or -1, corresponding to a positive or negative example. Note that the first feature (column 0) corresponds to an intercept term, and is equal to 1 for every example.

public train(object $X, array<string|int, mixed> $y) : mixed
Parameters
$X : object

SparseMatrix of training examples

$y : array<string|int, mixed>

example labels

Return values
mixed
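
A hypothetical sketch of the overall procedure: accumulate per-feature counts over positive and negative examples, weighted by $gamma and $epsilon respectively, then set each beta weight to the log odds of those counts. The row iteration interface assumed here (each row as a feature => value map) is for illustration only and need not match Yioop's SparseMatrix:

<?php
// Hypothetical training sketch. $rows iterates training examples as
// feature => value maps (column 0 is the intercept, always 1), $y holds the
// 1 / -1 labels, and $gamma / $epsilon weight positive / negative examples.
function trainSketch(iterable $rows, array $y, float $gamma, float $epsilon): array
{
    $posCounts = [];
    $negCounts = [];
    $i = 0;
    foreach ($rows as $row) {
        $isPositive = ($y[$i] == 1);
        // Presence counting: only which features occur in the example matters.
        foreach (array_keys($row) as $j) {
            if ($isPositive) {
                $posCounts[$j] = ($posCounts[$j] ?? 0.0) + $gamma;
            } else {
                $negCounts[$j] = ($negCounts[$j] ?? 0.0) + $epsilon;
            }
        }
        $i++;
    }
    $beta = [];
    foreach (array_keys($posCounts + $negCounts) as $j) {
        // +1.0 smoothing is an assumption so the log odds stay finite.
        $beta[$j] = log((($posCounts[$j] ?? 0.0) + 1.0)
            / (($negCounts[$j] ?? 0.0) + 1.0));
    }
    return $beta;
}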

        
