BloomFilterFile
extends PersistentStructure
in package
Code used to manage a Bloom filter both in memory and in a file.
A Bloom filter stores a set of objects compactly. It supports inserting items into the set and testing whether an item is a member; a membership test may return a false positive but never a false negative.
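For reference, the standard estimate of a Bloom filter's false-positive rate after inserting n items into m bits, with k bit positions set per item (in this class, presumably `$filter_size` and `$num_keys`), together with the k that minimizes it for fixed m and n:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\text{opt}} = \frac{m}{n}\ln 2
\;\Rightarrow\;
p \approx \left(\tfrac{1}{2}\right)^{k}
```

At the optimum, each additional bit position per item halves the false-positive rate.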
Table of Contents
- DEFAULT_SAVE_FREQUENCY = 50000
- If not specified in the constructor, this will be the number of operations between saves
- $count : int
- Number of items currently stored in this filter
- $filename : string
- Name of the file in which to store the PersistentStructure
- $filter : string
- Packed string used to store the Bloom filter's bits
- $filter_size : int
- Size in bits of the packed string array used to store the filter's contents
- $max_gram_len : int
- Maximum length for an n-gram (only used when the Bloom filter stores n-grams)
- $num_keys : int
- Number of bit positions used to record that an item is in the filter
- $save_frequency : int
- Number of operations between saves. If -1, checkSave() never saves
- $unsaved_operations : int
- Number of operations since the last save
- __construct() : mixed
- Initializes the fields of the BloomFilter and its base PersistentStructure.
- add() : mixed
- Inserts the provided item into the Bloom filter
- checkSave() : mixed
- Adds one to the unsaved_operations count. If this goes above save_frequency, saves the PersistentStructure to secondary storage
- contains() : bool
- Checks if the BloomFilter contains the provided $value
- getBit() : bool
- Looks up the value of the ith bit position in the filter
- getHashBitPositionArray() : int
- Hashes $value to a bit position in the BloomFilter
- load() : object
- Load a PersistentStructure from a file
- save() : mixed
- Saves the PersistentStructure to its filename. This method is generic but very memory inefficient, so subclasses should reimplement it as needed
- setBit() : mixed
- Sets the ith bit position in the filter to true.
Constants
DEFAULT_SAVE_FREQUENCY
If not specified in the constructor, this will be the number of operations between saves
public
int
DEFAULT_SAVE_FREQUENCY
= 50000
Properties
$count
Number of items currently stored in this filter
public
int
$count
$filename
Name of the file in which to store the PersistentStructure
public
string
$filename
$filter
Packed string used to store the Bloom filter's bits
public
string
$filter
$filter_size
Size in bits of the packed string array used to store the filter's contents
public
int
$filter_size
$max_gram_len
Maximum length for an n-gram (only used when the Bloom filter stores n-grams)
public
int
$max_gram_len
$num_keys
Number of bit positions used to record that an item is in the filter
public
int
$num_keys
$save_frequency
Number of operations between saves. If -1, checkSave() never saves
public
int
$save_frequency
$unsaved_operations
Number of operations since the last save
public
int
$unsaved_operations
Methods
__construct()
Initializes the fields of the BloomFilter and its base PersistentStructure.
public
__construct(string $fname, int $num_values[, int $save_frequency = self::DEFAULT_SAVE_FREQUENCY ]) : mixed
Parameters
- $fname : string
  name of the file in which to store the BloomFilter data
- $num_values : int
  the maximum number of values that will be stored in the BloomFilter. The filter is sized so that the probability of a false positive is roughly one over this value
- $save_frequency : int = self::DEFAULT_SAVE_FREQUENCY
  number of operations between automatic saves of the BloomFilter to disk
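The `$num_values` description implies a sizing rule. A minimal sketch, assuming the filter uses k = ceil(log2 n) bit positions per item and m = ceil(k * n / ln 2) bits, the size for which that k is optimal, so the false-positive rate is about 2^-k, or roughly 1/n. The function `bloomSizing` is hypothetical, and the real constructor may round differently.

```php
<?php
/**
 * Hypothetical helper (not part of BloomFilterFile): choose the number of
 * bit positions per item ($num_keys) and the filter size in bits
 * ($filter_size) for $num_values items, targeting a false-positive rate of
 * roughly 1 / $num_values. Assumes $num_values >= 2.
 */
function bloomSizing(int $num_values): array
{
    // k = ceil(log2(n)) gives a false-positive rate of about 2^-k <= 1/n
    $num_keys = (int) ceil(log($num_values, 2));
    // m = k * n / ln 2 is the bit count for which this k is optimal
    $filter_size = (int) ceil($num_keys * $num_values / log(2));
    return [$num_keys, $filter_size];
}

[$k, $m] = bloomSizing(1000000);
printf("num_keys=%d filter_size=%d bits (%d bytes)\n", $k, $m, (int) ceil($m / 8));
```

For a million values this gives 20 bit positions per item and about 28.9 million bits, or roughly 3.6 MB of packed string.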
Return values
mixed
add()
Inserts the provided item into the Bloom filter
public
add(string $value) : mixed
Parameters
- $value : string
  item to add to the filter
Return values
mixed
checkSave()
Adds one to the unsaved_operations count. If this goes above save_frequency, saves the PersistentStructure to secondary storage
public
checkSave() : mixed
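A minimal sketch of the counting rule described for `checkSave()`. The class `SaveCounter` is hypothetical, its `save()` only counts calls instead of writing to `$filename`, and resetting `$unsaved_operations` after a save is an assumption.

```php
<?php
/** Hypothetical stand-in for the save-counting part of PersistentStructure. */
class SaveCounter
{
    public int $unsaved_operations = 0;
    public int $saves = 0; // number of times save() ran (illustration only)

    public function __construct(public int $save_frequency = 50000) {}

    public function checkSave(): void
    {
        $this->unsaved_operations++;
        // -1 disables automatic saving; otherwise save once the count
        // goes above the frequency, then start counting again (assumed)
        if ($this->save_frequency !== -1 &&
            $this->unsaved_operations > $this->save_frequency) {
            $this->save();
            $this->unsaved_operations = 0;
        }
    }

    public function save(): void
    {
        $this->saves++; // a real implementation would write to disk here
    }
}

$c = new SaveCounter(3);
for ($n = 0; $n < 10; $n++) {
    $c->checkSave(); // saves on the 4th and 8th calls
}
```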
Return values
mixed
contains()
Checks if the BloomFilter contains the provided $value
public
contains(string $value) : bool
Parameters
- $value : string
  item to check for membership in the BloomFilter
Return values
bool —whether $value was in the filter or not
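The `add()`/`contains()` contract can be sketched with a tiny in-memory filter. This is a hypothetical class (`TinyBloom`), not this class's code: it keeps bits in a packed binary string like `$filter`, but derives bit positions from `crc32()` of a key-index prefix, an assumption chosen for brevity. `contains()` returns `false` as soon as one required bit is clear, so it never misses an added item but can report a false positive.

```php
<?php
/**
 * Hypothetical in-memory Bloom filter mirroring the add()/contains() contract.
 * Bits live in a packed binary string; positions come from crc32("$j:$value")
 * (an assumed hashing scheme; expects 64-bit PHP so crc32() is non-negative).
 */
class TinyBloom
{
    private string $filter;

    public function __construct(private int $filter_size, private int $num_keys)
    {
        // one byte holds 8 bits; start with every bit cleared
        $this->filter = str_repeat("\0", intdiv($filter_size + 7, 8));
    }

    /** $num_keys bit positions in [0, $filter_size) for $value. */
    private function positions(string $value): array
    {
        $pos = [];
        for ($j = 0; $j < $this->num_keys; $j++) {
            $pos[] = crc32("$j:$value") % $this->filter_size;
        }
        return $pos;
    }

    public function add(string $value): void
    {
        foreach ($this->positions($value) as $i) {
            $byte = $i >> 3; // 8 bits per byte
            $this->filter[$byte] = chr(ord($this->filter[$byte]) | (1 << ($i & 7)));
        }
    }

    public function contains(string $value): bool
    {
        foreach ($this->positions($value) as $i) {
            // a single cleared bit proves $value was never added
            if ((ord($this->filter[$i >> 3]) & (1 << ($i & 7))) === 0) {
                return false;
            }
        }
        return true; // all bits set: probably added (false positives possible)
    }
}

$bf = new TinyBloom(1024, 7);
foreach (["apple", "banana", "cherry"] as $word) {
    $bf->add($word);
}
```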
getBit()
Looks up the value of the ith bit position in the filter
public
getBit(int $i) : bool
Parameters
- $i : int
  the position to look up
Return values
bool —the value of the looked up position
getHashBitPositionArray()
Hashes $value to a bit position in the BloomFilter
public
getHashBitPositionArray(string $value, int $num_keys) : int
Parameters
- $value : string
  value to map to a bit position in the filter
- $num_keys : int
  number of bit positions used to record that an item is in the filter
Return values
int —the bit position mapped to
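This section does not say how `$value` is hashed. One common way to derive several bit positions from a single digest is double hashing (Kirsch and Mitzenmacher), shown here as a swapped-in technique that may differ from what this class does. `hashPositions` and its `$filter_size` parameter are hypothetical; it returns all `$num_keys` positions as an array.

```php
<?php
/**
 * Hypothetical: map $value to $num_keys bit positions in a filter of
 * $filter_size bits using double hashing, g_j = (h1 + j * h2) mod m.
 */
function hashPositions(string $value, int $num_keys, int $filter_size): array
{
    // raw md5 is 16 bytes; read two unsigned 32-bit big-endian words
    $words = unpack('N2', md5($value, true));
    $h1 = $words[1];
    $h2 = $words[2] | 1; // force an odd, non-zero step
    $positions = [];
    for ($j = 0; $j < $num_keys; $j++) {
        $positions[] = ($h1 + $j * $h2) % $filter_size;
    }
    return $positions;
}

$pos = hashPositions("example", 5, 1000);
```

Only one digest is computed per value, however many bit positions are needed.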
load()
Load a PersistentStructure from a file
public
static load(string $fname) : object
Parameters
- $fname : string
  the name of the file to load the PersistentStructure from
Return values
object —the PersistentStructure loaded
save()
Saves the PersistentStructure to its filename. This method is generic but very memory inefficient, so subclasses should reimplement it as needed
public
save() : mixed
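One generic way to implement `save()` and `load()` is to serialize the whole object, which also illustrates the memory cost the description warns about: `serialize()` builds a complete string copy of every field, including a large packed `$filter`, before anything is written. A sketch under that assumption; the class `Persisted` is hypothetical and not necessarily how PersistentStructure works.

```php
<?php
/**
 * Hypothetical generic persistence: serialize the whole object to a file
 * and read it back. Simple, but serialize() first builds an in-memory copy
 * of every field (including the packed $filter string).
 */
class Persisted
{
    public string $filter;

    public function __construct(public string $filename, int $bytes)
    {
        $this->filter = str_repeat("\0", $bytes);
    }

    public function save(): void
    {
        file_put_contents($this->filename, serialize($this));
    }

    public static function load(string $fname): object
    {
        return unserialize(file_get_contents($fname));
    }
}

$path = tempnam(sys_get_temp_dir(), 'bloom');
$p = new Persisted($path, 16);
$p->filter[0] = "\x05"; // set bits 0 and 2 of byte 0
$p->save();
$copy = Persisted::load($path);
unlink($path);
```

A subclass can avoid the extra copy by writing only what it needs, for example `$filter` plus a small header holding the sizes.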
Return values
mixed
setBit()
Sets the ith bit position in the filter to true.
public
setBit(int $i) : mixed
Parameters
- $i : int
  the position to set to true
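`setBit()` and `getBit()` address single bits inside the packed string: bit i lives in byte i >> 3 at offset i & 7. A sketch, assuming least-significant-bit-first order within each byte; the free functions below are hypothetical stand-ins for the methods.

```php
<?php
/** Hypothetical helper: set bit $i (byte $i >> 3, bit $i & 7) to 1. */
function setBit(string &$filter, int $i): void
{
    $byte = $i >> 3;
    $filter[$byte] = chr(ord($filter[$byte]) | (1 << ($i & 7)));
}

/** Hypothetical helper: read bit $i from the packed string. */
function getBit(string $filter, int $i): bool
{
    return (ord($filter[$i >> 3]) & (1 << ($i & 7))) !== 0;
}

$filter = str_repeat("\0", 4); // 32 bits, all false
setBit($filter, 0);            // byte 0, bit 0
setBit($filter, 13);           // byte 1, bit 5
```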