HTMLparser

Gnome XML Library Reference Manual

Details

htmlParserCtxt

typedef xmlParserCtxt htmlParserCtxt;

htmlParserCtxtPtr

typedef xmlParserCtxtPtr htmlParserCtxtPtr;

htmlParserNodeInfo

typedef xmlParserNodeInfo htmlParserNodeInfo;

htmlSAXHandler

typedef xmlSAXHandler htmlSAXHandler;

htmlSAXHandlerPtr

typedef xmlSAXHandlerPtr htmlSAXHandlerPtr;

htmlParserInput

typedef xmlParserInput htmlParserInput;

htmlParserInputPtr

typedef xmlParserInputPtr htmlParserInputPtr;

htmlDocPtr

typedef xmlDocPtr htmlDocPtr;

htmlNodePtr

typedef xmlNodePtr htmlNodePtr;

struct htmlElemDesc

struct htmlElemDesc {
    const char *name;   /* The tag name */
    char startTag;      /* Whether the start tag can be implied */
    char endTag;        /* Whether the end tag can be implied */
    char saveEndTag;    /* Whether the end tag should be saved */
    char empty;         /* Is this an empty element ? */
    char depr;          /* Is this a deprecated element ? */
    char dtd;           /* 1: only in Loose DTD, 2: only Frameset one */
    char isinline;      /* is this a block 0 or inline 1 element */
    const char *desc;   /* the description */
};

htmlElemDescPtr

typedef htmlElemDesc *htmlElemDescPtr;

struct htmlEntityDesc

struct htmlEntityDesc {
    unsigned int value; /* the Unicode value for the character */
    const char *name;   /* The entity name */
    const char *desc;   /* the description */
};

htmlEntityDescPtr

typedef htmlEntityDesc *htmlEntityDescPtr;

htmlTagLookup ()

const htmlElemDesc* htmlTagLookup           (const xmlChar *tag);

Look up the HTML tag in the ElementTable.

`tag` :	The tag name in lowercase
Returns :	the related htmlElemDescPtr or NULL if not found.
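
As a rough illustration of what a lookup over an element table involves, here is a minimal standalone sketch. It does not call libxml: `demoElemDesc`, `demo_elements` and `demo_tag_lookup` are hypothetical names, the struct mirrors only a few `htmlElemDesc` fields, and the five entries are a hand-picked sample rather than the library's table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced mirror of htmlElemDesc (illustration only). */
typedef struct {
    const char *name;   /* tag name, lowercase */
    char empty;         /* 1 if the element has no content */
    char isinline;      /* 0 = block, 1 = inline */
    const char *desc;   /* short description */
} demoElemDesc;

/* Hand-picked sample entries, not the contents of the real table. */
static const demoElemDesc demo_elements[] = {
    { "br",  1, 1, "forced line break" },
    { "em",  0, 1, "emphasis" },
    { "hr",  1, 0, "horizontal rule" },
    { "img", 1, 1, "embedded image" },
    { "p",   0, 0, "paragraph" },
};

/* Linear scan by exact name; the caller passes the tag in lowercase. */
static const demoElemDesc *demo_tag_lookup(const char *tag)
{
    size_t i;

    assert(tag != NULL);
    for (i = 0; i < sizeof(demo_elements) / sizeof(demo_elements[0]); i++)
        if (strcmp(demo_elements[i].name, tag) == 0)
            return &demo_elements[i];
    return NULL;
}
```

Because the sketch compares names exactly, an uppercase name such as "HR" is not found, which is why the `tag` argument is documented as lowercase.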

htmlEntityLookup ()

const htmlEntityDesc* htmlEntityLookup      (const xmlChar *name);

Look up the given entity in EntitiesTable.

TODO: the linear scan is really ugly, a hash table is really needed.

`name` :	the entity name
Returns :	the associated htmlEntityDescPtr if found, NULL otherwise.
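
The TODO above asks for something faster than a linear scan. One simple alternative, shown here as a standalone sketch, is to keep the table sorted by name and use the standard `bsearch`. This is a binary search, not the hash table the TODO mentions, and it is not how the library is implemented. `demoEntityDesc`, `demo_entities` and `demo_entity_lookup` are hypothetical names, and the six entries are a small sample of the HTML 4 entity set.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of htmlEntityDesc (illustration only). */
typedef struct {
    unsigned int value; /* Unicode value of the character */
    const char *name;   /* entity name */
    const char *desc;   /* description */
} demoEntityDesc;

/* Sample entries; must stay sorted by name (strcmp order) for bsearch. */
static const demoEntityDesc demo_entities[] = {
    { 38,  "amp",  "ampersand" },
    { 169, "copy", "copyright sign" },
    { 62,  "gt",   "greater-than sign" },
    { 60,  "lt",   "less-than sign" },
    { 160, "nbsp", "no-break space" },
    { 34,  "quot", "quotation mark" },
};

/* bsearch comparator: key is a C string, elem is a table entry. */
static int demo_entity_cmp(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const demoEntityDesc *)elem)->name);
}

/* Returns the matching entry, or NULL if the name is unknown. */
static const demoEntityDesc *demo_entity_lookup(const char *name)
{
    assert(name != NULL);
    return bsearch(name, demo_entities,
                   sizeof(demo_entities) / sizeof(demo_entities[0]),
                   sizeof(demo_entities[0]), demo_entity_cmp);
}
```

A lookup by value would need either a second table sorted by value or a separate index, since one array can only be ordered by one key.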

htmlEntityValueLookup ()

const htmlEntityDesc* htmlEntityValueLookup (unsigned int value);

Look up the given entity in EntitiesTable.

TODO: the linear scan is really ugly, a hash table is really needed.

`value` :	the entity's unicode value
Returns :	the associated htmlEntityDescPtr if found, NULL otherwise.

htmlIsAutoClosed ()

int         htmlIsAutoClosed                (htmlDocPtr doc,
                                             htmlNodePtr elem);

The HTML DTD allows a tag to implicitly close other tags. The list is kept in the htmlStartClose array. This function checks whether an element is autoclosed by one of its children.

`doc` :	the HTML document
`elem` :	the HTML element
Returns :	1 if autoclosed, 0 otherwise

htmlAutoCloseTag ()

int         htmlAutoCloseTag                (htmlDocPtr doc,
                                             const xmlChar *name,
                                             htmlNodePtr elem);

The HTML DTD allows a tag to implicitly close other tags. The list is kept in the htmlStartClose array. This function checks whether the element or one of its children would autoclose the given tag.

`doc` :	the HTML document
`name` :	The tag name
`elem` :	the HTML element
Returns :	1 if autoclose, 0 otherwise
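
To make the idea of a start/close table concrete, the following standalone sketch stores pairs of (new start tag, open tag it implicitly closes) and checks a single pair. It does not reproduce the layout or contents of htmlStartClose and does not walk a document tree. `demoStartClose`, `demo_start_close` and `demo_start_closes` are hypothetical names, and the pairs are a few well-known HTML 4 cases chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pair: seeing <newtag> implicitly closes an open <closes>. */
typedef struct {
    const char *newtag;
    const char *closes;
} demoStartClose;

/* Illustrative sample only, not the library's table. */
static const demoStartClose demo_start_close[] = {
    { "li",  "li" },   /* a new list item ends the previous one */
    { "p",   "p"  },   /* a new paragraph ends the previous one */
    { "div", "p"  },   /* a block element ends an open paragraph */
    { "tr",  "td" },   /* a new row ends an open cell */
    { "tr",  "tr" },   /* and ends the previous row */
    { "td",  "td" },   /* a new cell ends the previous cell */
    { "dd",  "dt" },   /* a definition ends the open term */
    { "dt",  "dd" },   /* a new term ends the open definition */
};

/* Returns 1 if starting <newtag> would close an open <open>, else 0. */
static int demo_start_closes(const char *newtag, const char *open)
{
    size_t i;

    assert(newtag != NULL && open != NULL);
    for (i = 0; i < sizeof(demo_start_close) / sizeof(demo_start_close[0]); i++)
        if (strcmp(demo_start_close[i].newtag, newtag) == 0 &&
            strcmp(demo_start_close[i].closes, open) == 0)
            return 1;
    return 0;
}
```

A tree-based check in the spirit of the two functions above would apply a test like this to an element's children, recursing as needed.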

htmlParseEntityRef ()

const htmlEntityDesc* htmlParseEntityRef    (htmlParserCtxtPtr ctxt,
                                             xmlChar **str);

Parse an HTML entity reference.

[68] EntityRef ::= '&' Name ';'

`ctxt` :	an HTML parser context
`str` :	location to store the entity name
Returns :	the associated htmlEntityDescPtr if found, NULL otherwise. If `*str` is non-NULL on return, the caller must free it.
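
The ownership rule for the stored name can be shown with a small standalone sketch of production [68]. It does not use a parser context: `demo_parse_entity_ref` is a hypothetical helper that works on a plain C string, and it simplifies Name to a run of ASCII letters and digits, which is narrower than the real grammar.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse '&' Name ';' at the start of s.
 * On success, stores a malloc'ed copy of Name in *name and returns the
 * number of bytes consumed (including '&' and ';').
 * On failure, sets *name to NULL and returns 0.
 * The caller owns *name and must free() it.
 */
static size_t demo_parse_entity_ref(const char *s, char **name)
{
    size_t n = 0;

    assert(s != NULL && name != NULL);
    *name = NULL;
    if (s[0] != '&')
        return 0;
    /* Simplification: Name is restricted to ASCII alphanumerics. */
    while (isalnum((unsigned char)s[1 + n]))
        n++;
    if (n == 0 || s[1 + n] != ';')
        return 0;
    *name = malloc(n + 1);
    if (*name == NULL)
        return 0;
    memcpy(*name, s + 1, n);
    (*name)[n] = '\0';
    return n + 2;
}
```

A caller would typically pass the extracted name to an entity lookup and then release it with `free`, whether or not the entity was known.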

htmlParseCharRef ()

int         htmlParseCharRef                (htmlParserCtxtPtr ctxt);

Parse a character reference.

[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

`ctxt` :	an HTML parser context
Returns :	the value parsed (as an int)
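
The CharRef production can be exercised with a short standalone sketch. It assumes the hexadecimal form is introduced by `&#x`, works on a plain C string instead of a parser context, and uses a hypothetical helper name, `demo_parse_charref`. The -1 error value and the U+10FFFF upper bound are choices made for this sketch, not documented behavior of htmlParseCharRef.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: parse a character reference at the start of s.
 * Returns the code point, or -1 if s does not start with a well-formed
 * reference. On success, *end (if non-NULL) points just past the ';'.
 */
static long demo_parse_charref(const char *s, const char **end)
{
    long val = 0;
    int base = 10;
    int digits = 0;

    assert(s != NULL);
    if (s[0] != '&' || s[1] != '#')
        return -1;
    s += 2;
    if (*s == 'x') {            /* '&#x' selects hexadecimal digits */
        base = 16;
        s++;
    }
    for (;; s++) {
        int d;

        if (*s >= '0' && *s <= '9')
            d = *s - '0';
        else if (base == 16 && *s >= 'a' && *s <= 'f')
            d = *s - 'a' + 10;
        else if (base == 16 && *s >= 'A' && *s <= 'F')
            d = *s - 'A' + 10;
        else
            break;
        val = val * base + d;
        if (val > 0x10FFFF)     /* beyond the Unicode range: reject */
            return -1;
        digits++;
    }
    if (digits == 0 || *s != ';')
        return -1;
    if (end != NULL)
        *end = s + 1;
    return val;
}
```

Both `&#65;` and `&#x41;` denote the same character under this grammar; the sketch rejects input with no digits or no terminating semicolon.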

htmlParseElement ()

void        htmlParseElement                (htmlParserCtxtPtr ctxt);

Parse an HTML element. This is highly recursive.

[39] element ::= EmptyElemTag | STag content ETag

[41] Attribute ::= Name Eq AttValue

`ctxt` :	an HTML parser context

htmlParseDocument ()

int         htmlParseDocument               (htmlParserCtxtPtr ctxt);

parse an HTML document (and build a tree if using the standard SAX interface).

`ctxt` :	an HTML parser context
Returns :	0 on success, -1 in case of error. The parser context is augmented as a result of the parsing.

htmlSAXParseDoc ()

htmlDocPtr  htmlSAXParseDoc                 (xmlChar *cur,
                                             const char *encoding,
                                             htmlSAXHandlerPtr sax,
                                             void *userData);

Parse an HTML in-memory document. If sax is not NULL, use the SAX callbacks to handle parse events. If sax is NULL, fallback to the default DOM behavior and return a tree.

`cur` :	a pointer to an array of xmlChar
`encoding` :	a free form C string describing the HTML document encoding, or NULL
`sax` :	the SAX handler block
`userData` :	if using SAX, this pointer will be provided on callbacks.
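Returns :	the resulting document tree when `sax` is NULL and the default tree-building handlers are used, or NULL on failure. With a caller-supplied SAX handler, no tree is built unless the handler builds one.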
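
The callback style described above, where a handler block receives parse events together with a caller-supplied `userData` pointer, can be illustrated without libxml. In this standalone sketch `demoHandler`, `demo_scan` and `demo_count_start` are hypothetical names, the handler block has only two callbacks, and the scanner is a toy that recognizes `<name>` and `</name>` only. It is not an HTML parser and says nothing about the real htmlSAXHandler layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal handler block; NULL members are skipped. */
typedef struct {
    void (*startElement)(void *userData, const char *name);
    void (*endElement)(void *userData, const char *name);
} demoHandler;

/* Toy scanner: reports each <name> and </name> found in cur. */
static void demo_scan(const char *cur, const demoHandler *h, void *userData)
{
    char name[32];

    assert(cur != NULL && h != NULL);
    while ((cur = strchr(cur, '<')) != NULL) {
        int closing = (cur[1] == '/');
        const char *p = cur + 1 + closing;
        size_t n = strcspn(p, " />");   /* name ends at space, '/' or '>' */

        if (n > 0 && n < sizeof(name)) {
            memcpy(name, p, n);
            name[n] = '\0';
            if (closing) {
                if (h->endElement != NULL)
                    h->endElement(userData, name);
            } else if (h->startElement != NULL) {
                h->startElement(userData, name);
            }
        }
        cur = p + n;
    }
}

/* Example callback: userData points at an int counter of start tags. */
static void demo_count_start(void *userData, const char *name)
{
    (void)name;
    ++*(int *)userData;
}
```

The point of `userData` is that the callbacks need no global state: the caller passes a pointer to its own counter (or any other structure) and gets it back on every event.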

htmlParseDoc ()

htmlDocPtr  htmlParseDoc                    (xmlChar *cur,
                                             const char *encoding);

parse an HTML in-memory document and build a tree.

`cur` :	a pointer to an array of xmlChar
`encoding` :	a free form C string describing the HTML document encoding, or NULL
Returns :	the resulting document tree

htmlSAXParseFile ()

htmlDocPtr  htmlSAXParseFile                (const char *filename,
                                             const char *encoding,
                                             htmlSAXHandlerPtr sax,
                                             void *userData);

Parse an HTML file and build a tree. Automatic support for ZLIB/Compress compressed documents is provided by default if found at compile-time. It uses the given SAX function block to handle the parsing callbacks. If sax is NULL, it falls back to the default DOM tree building routines.

`filename` :	the filename
`encoding` :	a free form C string describing the HTML document encoding, or NULL
`sax` :	the SAX handler block
`userData` :	if using SAX, this pointer will be provided on callbacks.
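Returns :	the resulting document tree when `sax` is NULL and the default tree-building handlers are used, or NULL on failure. With a caller-supplied SAX handler, no tree is built unless the handler builds one.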

htmlParseFile ()

htmlDocPtr  htmlParseFile                   (const char *filename,
                                             const char *encoding);

Parse an HTML file and build a tree. Automatic support for ZLIB/Compress compressed documents is provided by default if found at compile-time.

`filename` :	the filename
`encoding` :	a free form C string describing the HTML document encoding, or NULL
Returns :	the resulting document tree

UTF8ToHtml ()

int         UTF8ToHtml                      (unsigned char *out,
                                             int *outlen,
                                             unsigned char *in,
                                             int *inlen);

Take a block of UTF-8 chars in and try to convert it to an ASCII plus HTML entities block of chars out.

`out` :	a pointer to an array of bytes to store the result
`outlen` :	the length of `out`
`in` :	a pointer to an array of UTF-8 chars
`inlen` :	the length of `in`
Returns :	0 on success, -2 if the transcoding fails, or -1 otherwise. After return, the value of `inlen` is the number of octets consumed if the call succeeded, and unpredictable otherwise. The value of `outlen` is the number of octets written to `out`.
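
The in/out length convention is easier to follow with a worked example. The standalone sketch below converts UTF-8 to ASCII and updates both lengths on return. It is not the library routine: `demo_utf8_to_ascii_refs` is a hypothetical name, it always emits numeric references such as `&#233;` instead of looking up named entities, and its handling of a full output buffer or a truncated trailing sequence (stop early and return 0) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: copy ASCII bytes through and write every other
 * UTF-8 encoded character as a decimal character reference.
 * In:  *outlen = capacity of out, *inlen = number of input bytes.
 * Out: *outlen = bytes written,   *inlen = bytes consumed.
 * Returns 0 on success, -2 on malformed UTF-8.
 */
static int demo_utf8_to_ascii_refs(unsigned char *out, int *outlen,
                                   const unsigned char *in, int *inlen)
{
    int i = 0, o = 0;

    assert(out != NULL && outlen != NULL && in != NULL && inlen != NULL);
    while (i < *inlen) {
        unsigned int c = in[i];
        char ref[16];
        int extra, k, n;

        /* The lead byte tells how many continuation bytes follow. */
        if (c < 0x80)                  extra = 0;
        else if ((c & 0xE0) == 0xC0) { c &= 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { c &= 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { c &= 0x07; extra = 3; }
        else { *inlen = i; *outlen = o; return -2; }

        if (i + extra >= *inlen)
            break;                     /* sequence cut off: stop here */
        for (k = 1; k <= extra; k++) {
            if ((in[i + k] & 0xC0) != 0x80) {
                *inlen = i; *outlen = o;
                return -2;             /* not a continuation byte */
            }
            c = (c << 6) | (in[i + k] & 0x3F);
        }

        if (extra == 0) {
            ref[0] = (char)c;
            n = 1;
        } else {
            n = snprintf(ref, sizeof(ref), "&#%u;", c);
        }
        if (o + n > *outlen)
            break;                     /* output full: stop here */
        memcpy(out + o, ref, (size_t)n);
        o += n;
        i += extra + 1;
    }
    *inlen = i;
    *outlen = o;
    return 0;
}
```

After a successful call the caller can compare the updated `*inlen` with the original input length to see whether everything was converted, and use the updated `*outlen` as the length of the result, which is not NUL-terminated.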

htmlEncodeEntities ()

int         htmlEncodeEntities              (unsigned char *out,
                                             int *outlen,
                                             unsigned char *in,
                                             int *inlen,
                                             int quoteChar);