The XML C parser and toolkit of Gnome

The parser interfaces

This section is intended to help programmers get started with the XML toolkit from the C language. It is not intended to be exhaustive. I hope the automatically generated documents will provide the completeness required, but as a separate set of documents. The interfaces of the XML parser are low level by design; those interested in a higher-level API should look at DOM.

The parser interfaces for XML are separated from the HTML parser interfaces. Let's have a look at how the XML parser can be called:

Invoking the parser: the pull method

Usually, the first thing to do is to read an XML input. The parser accepts documents either from in-memory buffers or from files. The functions are defined in "parser.h":

xmlDocPtr xmlParseMemory(const char *buffer, int size);: Parse a document held in a memory buffer of size bytes.

xmlDocPtr xmlParseFile(const char *filename);: Parse an XML document contained in a (possibly compressed) file.

The parser returns a pointer to the document structure (or NULL in case of failure).

Invoking the parser: the push method

In order for the application to keep control while the document is being fetched (which is common for GUI-based programs), libxml2 also provides a push interface, as of version 1.8.3. Here are the interface functions:

xmlParserCtxtPtr xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax,
                                         void *user_data,
                                         const char *chunk,
                                         int size,
                                         const char *filename);
int              xmlParseChunk          (xmlParserCtxtPtr ctxt,
                                         const char *chunk,
                                         int size,
                                         int terminate);

and here is a simple example showing how to use the interface:

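            #include <stdio.h>
            #include <libxml/parser.h>

            xmlDocPtr doc = NULL;
            FILE *f;

            f = fopen(filename, "rb");
            if (f != NULL) {
                int res, size = 1024;
                char chars[1024];
                xmlParserCtxtPtr ctxt;

                /* The first 4 bytes let the parser detect the encoding. */
                res = (int) fread(chars, 1, 4, f);
                if (res > 0) {
                    ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                                chars, res, filename);
                    if (ctxt != NULL) {
                        /* Feed the rest of the file chunk by chunk. */
                        while ((res = (int) fread(chars, 1, size, f)) > 0) {
                            xmlParseChunk(ctxt, chars, res, 0);
                        }
                        /* An empty chunk with terminate set ends the parse. */
                        xmlParseChunk(ctxt, chars, 0, 1);
                        doc = ctxt->myDoc;
                        xmlFreeParserCtxt(ctxt);
                    }
                }
                fclose(f);
            }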

The HTML parser embedded into libxml2 also has a push interface; the functions are just prefixed by "html" rather than "xml".

Invoking the parser: the SAX interface

The tree-building interface makes the parser memory-hungry, first loading the document in memory and then building the tree itself. Reading a document without building the tree is possible using the SAX interfaces (see SAX.h and James Henstridge's documentation). Note also that the push interface can be limited to SAX: just use the first two arguments of xmlCreatePushParserCtxt().

Building a tree from scratch

The other way to get an XML tree in memory is by building it. Basically there is a set of functions dedicated to building new elements. (These are also described in <libxml/tree.h>.) For example, here is a piece of code that produces the XML document used in the previous examples:

    #include <libxml/tree.h>
    xmlDocPtr doc;
    xmlNodePtr root, tree, subtree;

    /* libxml2 strings are xmlChar *, so C literals go through BAD_CAST. */
    doc = xmlNewDoc(BAD_CAST "1.0");
    root = xmlNewDocNode(doc, NULL, BAD_CAST "EXAMPLE", NULL);
    xmlDocSetRootElement(doc, root);   /* doc->children now points to root */
    xmlSetProp(root, BAD_CAST "prop1", BAD_CAST "gnome is great");
    xmlSetProp(root, BAD_CAST "prop2", BAD_CAST "& linux too");
    tree = xmlNewChild(root, NULL, BAD_CAST "head", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title",
                          BAD_CAST "Welcome to Gnome");
    tree = xmlNewChild(root, NULL, BAD_CAST "chapter", NULL);
    subtree = xmlNewChild(tree, NULL, BAD_CAST "title",
                          BAD_CAST "The Linux adventure");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "p", BAD_CAST "bla bla bla ...");
    subtree = xmlNewChild(tree, NULL, BAD_CAST "image", NULL);
    xmlSetProp(subtree, BAD_CAST "href", BAD_CAST "linus.gif");
Not really rocket science ...

Traversing the tree

Basically, by including "tree.h" your code has access to the internal structure of all the elements of the tree. The names should be fairly self-explanatory: parent, children, next, prev, properties, etc. For example, still with the previous example:

doc->children->children->children

points to the title element,

doc->children->children->next->children->children

points to the text node containing the chapter title "The Linux adventure".

NOTE: XML allows PIs and comments to be present before the document root, so doc->children may point to a node which is not the document Root Element; a function xmlDocGetRootElement() was added for this purpose.

Modifying the tree

Functions are provided for reading and writing the document content. Here is an excerpt from the tree API:

xmlAttrPtr xmlSetProp(xmlNodePtr node, const xmlChar *name, const xmlChar *value);: This sets (or changes) an attribute carried by an ELEMENT node. The value can be NULL.

xmlChar *xmlGetProp(xmlNodePtr node, const xmlChar *name);: This function returns a pointer to a new copy of the property content. Note that the user must deallocate the result with xmlFree().

Two functions are provided for reading and writing the text associated with elements:

xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const xmlChar *value);: This function takes an "external" string and converts it to one text node or possibly to a list of entity and text nodes. All non-predefined entity references like &Gnome; will be stored internally as entity nodes, hence the result of the function may not be a single node.

xmlChar *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int inLine);: This function is the inverse of xmlStringGetNodeList(). It generates a new string containing the content of the text and entity nodes. Note the extra argument inLine. If this argument is set to 1, the function will expand entity references. For example, instead of returning the &Gnome; XML encoding in the string, it will substitute it with its value (say, "GNU Network Object Model Environment").

Saving a tree

Basically 3 options are possible:

void xmlDocDumpMemory(xmlDocPtr cur, xmlChar **mem, int *size);: Stores in *mem a newly allocated buffer into which the document has been saved, and its length in *size. The caller must free the buffer with xmlFree().

int xmlDocDump(FILE *f, xmlDocPtr doc);: Dumps a document to an open file stream.

int xmlSaveFile(const char *filename, xmlDocPtr cur);: Saves the document to a file. In this case, the compression interface is triggered if it has been turned on.

Compression

The library transparently handles compression when doing file-based accesses. The level of compression on saves can be set either globally or individually for one file:

int xmlGetDocCompressMode (xmlDocPtr doc);: Gets the document compression ratio (0-9).

void xmlSetDocCompressMode (xmlDocPtr doc, int mode);: Sets the document compression ratio.

int xmlGetCompressMode(void);: Gets the default compression ratio.

void xmlSetCompressMode(int mode);: Sets the default compression ratio.

Daniel Veillard