summarizers
Interfaces, Classes, Traits and Enums
- CentroidSummarizer
- Class which may be used by TextProcessors to get a summary for a text
document that may later be used for indexing. This is done by
the @see getSummary method. getSummary does this by splitting
the document into sentences and computing inverse sentence frequency
(should be ISF, but we call it IDF) scores for each term. It then computes
an average document vector (which we call the centroid) with components
(total number of occurrences of term) * (IDF score of term).
- CentroidWeightedSummarizer
- Class which may be used by TextProcessors to get a summary for a text
document that may later be used for indexing. This is done by
the @see getSummary method. To generate a summary, a normalized
term frequency vector is computed for each sentence. An average
vector is then computed by summing these and renormalizing the result.
- GraphBasedSummarizer
- Class which may be used by TextProcessors to get a summary for a text
document that may later be used for indexing. The method @see getSummary
is used to obtain such a summary. In GraphBasedSummarizer's implementation
of this method, sentences are ranked using a PageRank-style algorithm
based on sentence adjacencies calculated using a distortion score between
pairs of sentences (@see LinearAlgebra::distortion for details on this).
- ScrapeSummarizer
- Class which may be used by TextProcessors to get a summary for a text
document that may later be used for indexing.
- Summarizer
- Base class for all summarizers. A summarizer's chief method is
getSummary, which is supposed to take a text or XML
document and produce a summary of that document up to
PageProcessor::$max_description_len many characters. Summarizers
also contain various methods to generate a word cloud from such a summary.
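
The centroid computation described for CentroidSummarizer can be sketched roughly as follows. This is an illustrative sketch in Python, not the PHP implementation of these classes: the function names (`split_sentences`, `tokenize`, `centroid_vector`, `summarize`), the naive sentence splitter, and the `max_len` default are all hypothetical. The description above only specifies the centroid components, so the final step (scoring each sentence by the average centroid weight of its distinct terms and keeping the best ones that fit the length limit) is an assumption made to complete the example.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase alphanumeric terms; a stand-in for real tokenization."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def centroid_vector(sentences):
    """Component per term: (total occurrences) * (inverse sentence frequency)."""
    n = len(sentences)
    totals = Counter(t for s in sentences for t in tokenize(s))
    # Number of sentences in which each term appears at least once.
    sentence_freq = Counter(t for s in sentences for t in set(tokenize(s)))
    return {t: totals[t] * math.log(n / sentence_freq[t]) for t in totals}


def summarize(text, max_len=200):
    """Pick high-scoring sentences, in document order, within max_len characters."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    centroid = centroid_vector(sentences)
    scored = []
    for i, s in enumerate(sentences):
        terms = tokenize(s)
        # Assumed scoring rule: centroid weight of distinct terms, length-normalized.
        score = sum(centroid[t] for t in set(terms)) / (len(terms) or 1)
        scored.append((score, i))
    scored.sort(key=lambda p: (-p[0], p[1]))
    chosen, used = [], 0
    for _, i in scored:
        cost = len(sentences[i]) + (1 if chosen else 0)  # +1 for the joining space
        if used + cost <= max_len:
            chosen.append(i)
            used += cost
    return " ".join(sentences[i] for i in sorted(chosen))
```

Note that a term occurring in every sentence gets an inverse sentence frequency of log(1) = 0, so it contributes nothing to the centroid; this mirrors how IDF discounts terms that appear in every document.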