archive_bundle_iterators
Interfaces, Classes, Traits and Enums
- ArcArchiveBundleIterator
- Used to iterate through the records of a collection of arc files stored in
a WebArchiveBundle folder. Arc is the file format of the Internet Archive:
http://www.archive.org/web/researcher/ArcFileFormat.php. Iteration would be
for the purpose of making an index of these records.
- ArchiveBundleIterator
- Abstract class used to model iterating over documents indexed in
a WebArchiveBundle or a set of such bundles.
- DatabaseBundleIterator
- Used to iterate through the records that result from an SQL query to a
database
- MediaWikiArchiveBundleIterator
- Used to iterate through a collection of .xml.bz2 MediaWiki files
stored in a WebArchiveBundle folder. These MediaWiki files contain the
kinds of documents used by Wikipedia. Iteration would be
for the purpose of making an index of these records.
- MixArchiveBundleIterator
- Used to do an archive crawl based on the results of a crawl mix.
- OdpRdfArchiveBundleIterator
- Used to iterate through the records of a collection of one or more Open
Directory RDF files stored in a WebArchiveBundle folder. Open Directory
files can be found at http://rdf.dmoz.org/ . Iteration would be
for the purpose of making an index of these records.
- TextArchiveBundleIterator
- Used to iterate through the records of a collection of text or compressed
text-oriented records
- WarcArchiveBundleIterator
- Used to iterate through the records of a collection of warc files stored in
a WebArchiveBundle folder. Warc is the newer file format used by the
Internet Archive and others for digital preservation:
http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml
http://archive-access.sourceforge.net/warc/
Iteration is done for the purpose of making an index of these records.
- WebArchiveBundleIterator
- Class used to model iterating over documents indexed in
a WebArchiveBundle. This would typically be for the purpose
of re-indexing these documents.
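
The classes above share one pattern: an abstract iterator yields records from some archive source, and an indexer consumes them in batches. The following is a minimal Python sketch of that pattern only, not the actual API of these classes. Every name in it (BundleIterator, LineRecordIterator, next_page, next_pages, reset, end_of_iterator, build_index) is a hypothetical stand-in chosen for illustration, and the one-record-per-line format is an assumption.

```python
from abc import ABC, abstractmethod
import io


class BundleIterator(ABC):
    """Hypothetical abstract iterator over records in an archive source."""

    def __init__(self):
        # Becomes True once the source has no more records.
        self.end_of_iterator = False

    @abstractmethod
    def next_page(self):
        """Return the next record as a dict, or None when exhausted."""

    @abstractmethod
    def reset(self):
        """Rewind the iterator to the first record."""

    def next_pages(self, num):
        """Return up to num records; mark the iterator as ended if it runs out."""
        pages = []
        for _ in range(num):
            page = self.next_page()
            if page is None:
                self.end_of_iterator = True
                break
            pages.append(page)
        return pages


class LineRecordIterator(BundleIterator):
    """Hypothetical concrete iterator: one text record per line of a stream."""

    def __init__(self, stream):
        super().__init__()
        self.stream = stream

    def reset(self):
        self.stream.seek(0)
        self.end_of_iterator = False

    def next_page(self):
        line = self.stream.readline()
        if line == "":  # readline returns "" only at end of stream
            return None
        return {"content": line.rstrip("\n")}


def build_index(iterator, batch_size=2):
    """Build a toy inverted index (word -> list of record numbers)."""
    index = {}
    doc_id = 0
    while not iterator.end_of_iterator:
        for page in iterator.next_pages(batch_size):
            for word in page["content"].lower().split():
                index.setdefault(word, []).append(doc_id)
            doc_id += 1
    return index
```

In this sketch a caller would wrap a source, for example `LineRecordIterator(io.StringIO("alpha beta\nbeta gamma\n"))`, and pass it to `build_index`. Supporting a different archive format would then mean writing another subclass with its own `next_page` and `reset`, which is the role the format-specific iterators listed above appear to play.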