Sometimes the DOM tree output is just too large to fit reasonably into
memory. In that case (and if you don't expect to save back the XML document
loaded using libxml), it's better to use the SAX interface of libxml. SAX is
a callback-based interface to the parser. Before parsing,
the application layer registers a customized set of callbacks which are
called by the library as it progresses through the XML input.
For more detailed step-by-step guidance on using the SAX interface of
libxml, see the nice documentation written by James Henstridge.
You can debug the SAX behaviour by using the testSAX
program located in the gnome-xml module (it's usually not shipped in the
binary packages of libxml, but you can find it in the tar source
distribution). Here is the sequence of callbacks that would be reported by
testSAX when parsing the example XML document shown earlier:
SAX.setDocumentLocator()
SAX.startDocument()
SAX.getEntity(amp)
SAX.startElement(EXAMPLE, prop1='gnome is great', prop2='& linux too')
SAX.characters( , 3)
SAX.startElement(head)
SAX.characters( , 4)
SAX.startElement(title)
SAX.characters(Welcome to Gnome, 16)
SAX.endElement(title)
SAX.characters( , 3)
SAX.endElement(head)
SAX.characters( , 3)
SAX.startElement(chapter)
SAX.characters( , 4)
SAX.startElement(title)
SAX.characters(The Linux adventure, 19)
SAX.endElement(title)
SAX.characters( , 4)
SAX.startElement(p)
SAX.characters(bla bla bla ..., 15)
SAX.endElement(p)
SAX.characters( , 4)
SAX.startElement(image, href='linus.gif')
SAX.endElement(image)
SAX.characters( , 4)
SAX.startElement(p)
SAX.characters(..., 3)
SAX.endElement(p)
SAX.characters( , 3)
SAX.endElement(chapter)
SAX.characters( , 1)
SAX.endElement(EXAMPLE)
SAX.endDocument()
Most of the other interfaces of libxml are based on the DOM tree-building
facility, so nearly everything up to the end of this document presupposes the
use of the standard DOM tree build. Note that the DOM tree itself is built by
a set of registered default SAX callbacks; the parser has no separate
internal interface for tree building.
Daniel Veillard