Package Martel :: Module Iterator

Module Iterator

Iterate over records of an XML parse tree.

The standard parser is callback based over all the elements of a file. If the file contains records, many people would like to be able to iterate over each record and only use the callback parser to analyze the record.

If the expression is a 'ParseRecords', then the code to do this is easy: use its make_reader to grab records and its record_expression to parse them. However, this isn't general enough. The use of a ParseRecords in the format definition should be strictly an implementation decision for better memory use. So there needs to be an API which allows both full and record-oriented parsers.

Here's an example use of the API:

>>> import sys
>>> import swissprot38   # one is in Martel/test/testformats
>>> from xml.dom import pulldom
>>> iterator = swissprot38.format.make_iterator("swissprot38_record")
>>> text = open("sample.swissprot").read()
>>> for record in iterator.iterateString(text, pulldom.SAX2DOM()):
...     print "Read a record with the following AC numbers:"
...     for acc in record.document.getElementsByTagName("ac_number"):
...         acc.writexml(sys.stdout)
...         sys.stdout.write(" ")
...

There are several parts to this API. First is the 'Iterator

There are two parts to the API. One is the EventStream. It has a single method, "next()", which returns a list of SAX events, each a 2-tuple (event_name, args). It is called repeatedly to return successive event lists and returns None when no more events are available.
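The EventStream contract described above can be illustrated with a minimal, self-contained sketch. It does not use Martel and is not the module's implementation: the names ListEventStream and drain, and the sample event data, are hypothetical and exist only to show the "call next() until it returns None" protocol.

```python
class ListEventStream:
    """Hypothetical EventStream that hands out pre-built batches of SAX events."""

    def __init__(self, batches):
        # Each batch is a list of (event_name, args) 2-tuples.
        self._batches = iter(batches)

    def next(self):
        # Return the next list of events, or None once the batches run out.
        return next(self._batches, None)


def drain(stream):
    """Collect every event by calling stream.next() until it returns None."""
    events = []
    while True:
        batch = stream.next()
        if batch is None:
            break
        events.extend(batch)
    return events


# Made-up sample data: two records, three SAX-style events each.
batches = [
    [("startElement", ("record", {})),
     ("characters", ("P12345",)),
     ("endElement", ("record",))],
    [("startElement", ("record", {})),
     ("characters", ("Q67890",)),
     ("endElement", ("record",))],
]

stream = ListEventStream(batches)
all_events = drain(stream)
```

A consumer written this way only depends on next() and the None sentinel, so it would not need to know whether the events came from a full parse or from a record-at-a-time reader.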

The other is the Iterator

Sean McGrath has a RAX parser (Record API for XML) which uses a concept similar to this.

Classes
  StoreEvents
  EventStream
  Iterator
  RecordEventStream
  IteratorRecords
  HeaderFooterEventStream
  IteratorHeaderFooter
  Iterate
Functions
  _get_next_text(reader)