Details
enum xmlCharEncoding
typedef enum {
XML_CHAR_ENCODING_ERROR= -1, /* No char encoding detected */
XML_CHAR_ENCODING_NONE= 0, /* No char encoding detected */
XML_CHAR_ENCODING_UTF8= 1, /* UTF-8 */
XML_CHAR_ENCODING_UTF16LE= 2, /* UTF-16 little endian */
XML_CHAR_ENCODING_UTF16BE= 3, /* UTF-16 big endian */
XML_CHAR_ENCODING_UCS4LE= 4, /* UCS-4 little endian */
XML_CHAR_ENCODING_UCS4BE= 5, /* UCS-4 big endian */
XML_CHAR_ENCODING_EBCDIC= 6, /* EBCDIC uh! */
XML_CHAR_ENCODING_UCS4_2143=7, /* UCS-4 unusual ordering */
XML_CHAR_ENCODING_UCS4_3412=8, /* UCS-4 unusual ordering */
XML_CHAR_ENCODING_UCS2= 9, /* UCS-2 */
XML_CHAR_ENCODING_8859_1= 10,/* ISO-8859-1 ISO Latin 1 */
XML_CHAR_ENCODING_8859_2= 11,/* ISO-8859-2 ISO Latin 2 */
XML_CHAR_ENCODING_8859_3= 12,/* ISO-8859-3 */
XML_CHAR_ENCODING_8859_4= 13,/* ISO-8859-4 */
XML_CHAR_ENCODING_8859_5= 14,/* ISO-8859-5 */
XML_CHAR_ENCODING_8859_6= 15,/* ISO-8859-6 */
XML_CHAR_ENCODING_8859_7= 16,/* ISO-8859-7 */
XML_CHAR_ENCODING_8859_8= 17,/* ISO-8859-8 */
XML_CHAR_ENCODING_8859_9= 18,/* ISO-8859-9 */
XML_CHAR_ENCODING_2022_JP= 19,/* ISO-2022-JP */
XML_CHAR_ENCODING_SHIFT_JIS=20,/* Shift_JIS */
XML_CHAR_ENCODING_EUC_JP= 21,/* EUC-JP */
XML_CHAR_ENCODING_ASCII= 22 /* pure ASCII */
} xmlCharEncoding;
Predefined values for some standard encodings.
Libxml does not translate UTF-8 or ISO Latin X input beforehand.
It also supports UTF-16 (LE and BE) by default.
Anything else has to be translated to UTF-8 before being
given to the parser itself. The BOM for UTF-16 and the encoding
declaration are examined, and a converter is looked up at that
point. If none is found, the parser stops there, as required by the XML REC.
Converters can be registered by the user with xmlRegisterCharEncodingHandler,
but the current form does not allow stateful transcoding (a serious
problem, agreed!). If iconv has been found, it is used
automatically and allows stateful transcoding; the simplest approach is then
to make sure iconv is enabled and to provide the iconv libraries for the
encoding support needed.
xmlCharEncodingInputFunc ()
int (*xmlCharEncodingInputFunc) (unsigned char *out,
int *outlen,
unsigned char *in,
int *inlen);
Take a block of chars in the original encoding and try to convert
it to a UTF-8 block of chars out.
xmlCharEncodingOutputFunc ()
int (*xmlCharEncodingOutputFunc) (unsigned char *out,
int *outlen,
unsigned char *in,
int *inlen);
Take a block of UTF-8 chars in and try to convert it to another
encoding.
Note: a first call designed to produce heading info is made with
in = NULL. If stateful, this should also initialize the encoder state.
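As a rough illustration of the in = NULL convention, the sketch below is a hypothetical output function (demo_utf8_to_utf16le is not part of libxml) with the same parameter list as xmlCharEncodingOutputFunc. It assumes that the return value is the number of bytes written or a negative value on error, and that *outlen and *inlen are updated to the bytes produced and consumed; neither convention is stated above. To stay short it converts 7-bit characters only.

```c
#include <stddef.h>

/* Hypothetical encoder, not part of libxml: UTF-8 in, UTF-16LE out.
 * Only 7-bit characters are handled to keep the sketch short.
 * Assumed conventions: returns bytes written, or a negative value on
 * error; *outlen and *inlen are updated to bytes produced/consumed. */
int demo_utf8_to_utf16le(unsigned char *out, int *outlen,
                         unsigned char *in, int *inlen)
{
    int written = 0;
    int consumed = 0;

    if (in == NULL) {
        /* First call: emit the heading info, here the UTF-16LE BOM. */
        if (inlen != NULL)
            *inlen = 0;
        if (*outlen < 2) {
            *outlen = 0;
            return -1;
        }
        out[0] = 0xFF;
        out[1] = 0xFE;
        *outlen = 2;
        return 2;
    }

    while (consumed < *inlen && written + 2 <= *outlen) {
        unsigned char c = in[consumed];
        if (c >= 0x80) {
            /* Multi-byte UTF-8 is out of scope for this sketch. */
            *outlen = written;
            *inlen = consumed;
            return -2;
        }
        out[written++] = c;    /* low byte first: little endian */
        out[written++] = 0x00;
        consumed++;
    }
    *outlen = written;
    *inlen = consumed;
    return written;
}
```

A caller would invoke it once with in set to NULL to obtain the BOM, then repeatedly with real input blocks.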
struct xmlCharEncodingHandler
struct xmlCharEncodingHandler {
char *name;
xmlCharEncodingInputFunc input;
xmlCharEncodingOutputFunc output;
#ifdef LIBXML_ICONV_ENABLED
iconv_t iconv_in;
iconv_t iconv_out;
#endif /* LIBXML_ICONV_ENABLED */
};
xmlCharEncodingHandlerPtr
typedef xmlCharEncodingHandler *xmlCharEncodingHandlerPtr;
xmlInitCharEncodingHandlers ()
void xmlInitCharEncodingHandlers (void);
Initialize the char encoding support; this registers the default
supported encodings.
NOTE: while public, this function usually doesn't need to be called
in normal processing.
xmlCleanupCharEncodingHandlers ()
void xmlCleanupCharEncodingHandlers (void);
Clean up the memory allocated for the char encoding support; this
unregisters all the encoding handlers and the aliases.
xmlRegisterCharEncodingHandler ()
Register the char encoding handler, surprising, isn't it?
xmlGetCharEncodingHandler ()
Search the registered set for the handler able to read/write that encoding.
xmlFindCharEncodingHandler ()
Search the registered set for the handler able to read/write that encoding.
xmlAddEncodingAlias ()
int xmlAddEncodingAlias (const char *name,
const char *alias);
Registers an alias alias for an encoding named name. An existing alias
will be overwritten.
xmlDelEncodingAlias ()
int xmlDelEncodingAlias (const char *alias);
Unregisters an encoding alias alias
xmlGetEncodingAlias ()
const char* xmlGetEncodingAlias (const char *alias);
Lookup an encoding name for the given alias.
xmlCleanupEncodingAliases ()
void xmlCleanupEncodingAliases (void);
Unregisters all aliases
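The add, delete, lookup and cleanup operations described for aliases can be pictured with a small table. The sketch below is only an illustration under assumptions: the demo_ names, the fixed-size table, the exact-match comparison and the 0 / -1 return codes are all hypothetical and are not libxml's internals.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ALIASES 8
#define DEMO_NAME_LEN 32

/* Hypothetical alias table, not libxml's internal structure. */
static struct {
    char alias[DEMO_NAME_LEN];
    char name[DEMO_NAME_LEN];
} demo_aliases[DEMO_MAX_ALIASES];
static int demo_alias_count = 0;

/* Register alias for name; an existing alias is overwritten.
 * Returns 0 on success, -1 on bad arguments or a full table. */
int demo_add_alias(const char *name, const char *alias)
{
    int i;
    if (name == NULL || alias == NULL)
        return -1;
    if (strlen(name) >= DEMO_NAME_LEN || strlen(alias) >= DEMO_NAME_LEN)
        return -1;
    for (i = 0; i < demo_alias_count; i++) {
        if (strcmp(demo_aliases[i].alias, alias) == 0) {
            strcpy(demo_aliases[i].name, name);   /* overwrite */
            return 0;
        }
    }
    if (demo_alias_count == DEMO_MAX_ALIASES)
        return -1;
    strcpy(demo_aliases[demo_alias_count].alias, alias);
    strcpy(demo_aliases[demo_alias_count].name, name);
    demo_alias_count++;
    return 0;
}

/* Look up the encoding name for alias, or NULL if unknown. */
const char *demo_get_alias(const char *alias)
{
    int i;
    if (alias == NULL)
        return NULL;
    for (i = 0; i < demo_alias_count; i++)
        if (strcmp(demo_aliases[i].alias, alias) == 0)
            return demo_aliases[i].name;
    return NULL;
}

/* Remove alias. Returns 0 if it was found, -1 otherwise. */
int demo_del_alias(const char *alias)
{
    int i;
    if (alias == NULL)
        return -1;
    for (i = 0; i < demo_alias_count; i++) {
        if (strcmp(demo_aliases[i].alias, alias) == 0) {
            /* Fill the hole with the last entry. */
            demo_aliases[i] = demo_aliases[demo_alias_count - 1];
            demo_alias_count--;
            return 0;
        }
    }
    return -1;
}

/* Forget every alias. */
void demo_cleanup_aliases(void)
{
    demo_alias_count = 0;
}
```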
xmlParseCharEncoding ()
Compare the string to the known encoding schemes. Note
that the comparison is case insensitive, according to section
[XML] 4.3.3 Character Encoding in Entities.
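A minimal sketch of such a case-insensitive lookup follows. The demo_ names, the short name table and the choice of result for NULL or unknown names are assumptions made for illustration; only the numeric values are taken from the xmlCharEncoding enum above.

```c
#include <ctype.h>
#include <stddef.h>

/* Local stand-ins for a few xmlCharEncoding values. */
enum demo_encoding {
    DEMO_ENC_ERROR = -1,
    DEMO_ENC_NONE = 0,
    DEMO_ENC_UTF8 = 1,
    DEMO_ENC_8859_1 = 10,
    DEMO_ENC_ASCII = 22
};

/* Portable case-insensitive string equality (ASCII letters). */
static int demo_equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Hypothetical lookup: NULL gives NONE, unknown names give ERROR
 * (both choices are assumptions of this sketch). */
int demo_parse_encoding(const char *name)
{
    static const struct {
        const char *name;
        int enc;
    } table[] = {
        { "UTF-8", DEMO_ENC_UTF8 },
        { "UTF8", DEMO_ENC_UTF8 },
        { "ISO-8859-1", DEMO_ENC_8859_1 },
        { "US-ASCII", DEMO_ENC_ASCII },
        { "ASCII", DEMO_ENC_ASCII }
    };
    size_t i;

    if (name == NULL)
        return DEMO_ENC_NONE;
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (demo_equal_nocase(name, table[i].name))
            return table[i].enc;
    return DEMO_ENC_ERROR;
}
```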
xmlGetCharEncodingName ()
The "canonical" name for XML encoding.
Cf. http://www.w3.org/TR/REC-xml#charencoding
Section 4.3.3 Character Encoding in Entities
xmlDetectCharEncoding ()
Guess the encoding of the entity using the first bytes of the entity content,
according to the non-normative appendix F of the XML-1.0 recommendation.
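The kind of first-bytes test this relies on can be sketched as below. This is a partial illustration, not libxml's code: demo_detect_encoding is a hypothetical name, only some of the byte patterns listed in appendix F are covered, and the returned constants simply reuse the numeric values of the xmlCharEncoding enum above.

```c
#include <stddef.h>

/* Local stand-ins for a few xmlCharEncoding values. */
enum demo_enc {
    DEMO_NONE = 0,
    DEMO_UTF8 = 1,
    DEMO_UTF16LE = 2,
    DEMO_UTF16BE = 3,
    DEMO_UCS4LE = 4,
    DEMO_UCS4BE = 5
};

/* Hypothetical detector: inspect up to the first 4 bytes. */
int demo_detect_encoding(const unsigned char *in, int len)
{
    if (in == NULL)
        return DEMO_NONE;
    if (len >= 4) {
        /* '<' encoded as a 4-byte unit, big or little endian. */
        if (in[0] == 0x00 && in[1] == 0x00 && in[2] == 0x00 && in[3] == 0x3C)
            return DEMO_UCS4BE;
        if (in[0] == 0x3C && in[1] == 0x00 && in[2] == 0x00 && in[3] == 0x00)
            return DEMO_UCS4LE;
        /* "<?xm" in an ASCII-compatible encoding, taken as UTF-8. */
        if (in[0] == 0x3C && in[1] == 0x3F && in[2] == 0x78 && in[3] == 0x6D)
            return DEMO_UTF8;
    }
    if (len >= 3) {
        /* UTF-8 byte order mark. */
        if (in[0] == 0xEF && in[1] == 0xBB && in[2] == 0xBF)
            return DEMO_UTF8;
    }
    if (len >= 2) {
        /* UTF-16 byte order marks. */
        if (in[0] == 0xFE && in[1] == 0xFF)
            return DEMO_UTF16BE;
        if (in[0] == 0xFF && in[1] == 0xFE)
            return DEMO_UTF16LE;
    }
    return DEMO_NONE;
}
```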
xmlCharEncOutFunc ()
Generic front-end for the encoding handler output function.
A call with in == NULL has to be made first to initiate the
output, in case a non-stateless encoding needs to initialize its
state or the output (like the BOM in UTF-16).
In case of UTF-8 sequence conversion errors for the given encoder,
the content will be automatically remapped to a CharRef sequence.
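To make the CharRef fallback concrete, the hypothetical helper below formats one code point as a numeric character reference. The name demo_charref and the decimal form are assumptions of this sketch (a hexadecimal reference would be equally valid XML); it does not claim to reproduce the library's own code.

```c
#include <stdio.h>

/* Hypothetical helper, not part of libxml: write the numeric
 * character reference for code point cp into buf (size bytes).
 * Returns the length written, or -1 if buf is too small. */
int demo_charref(unsigned long cp, char *buf, size_t size)
{
    int n = snprintf(buf, size, "&#%lu;", cp);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

For example, a euro sign (U+20AC) that cannot be represented in the target encoding would be emitted as the seven characters `&#8364;`.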
xmlCharEncInFunc ()
Generic front-end for the encoding handler input function
xmlCharEncFirstLine ()
Front-end for the encoding handler input function, but handles only
the very first line, i.e. limits itself to 45 chars.
xmlCharEncCloseFunc ()
Generic front-end for the encoding handler close function
UTF8Toisolat1 ()
int UTF8Toisolat1 (unsigned char *out,
int *outlen,
unsigned char *in,
int *inlen);
Take a block of UTF-8 chars in and try to convert it to an ISO Latin 1
block of chars out.
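The conversion in this direction can be sketched as follows. demo_utf8_to_latin1 is a hypothetical stand-in, not the library routine; the return convention (bytes written, or -2 when a character does not fit in ISO Latin 1) and the updating of *outlen and *inlen are assumptions of the sketch.

```c
#include <stddef.h>

/* Hypothetical converter, not part of libxml: UTF-8 in, Latin 1 out.
 * Assumed conventions: returns bytes written, or -2 on a character
 * that cannot be represented; *outlen and *inlen are updated to the
 * bytes produced and consumed. A sequence truncated at the end of
 * the input block is treated as an error here for simplicity. */
int demo_utf8_to_latin1(unsigned char *out, int *outlen,
                        const unsigned char *in, int *inlen)
{
    int written = 0;
    int consumed = 0;
    int ret = 0;

    while (consumed < *inlen && written < *outlen) {
        unsigned char c = in[consumed];
        if (c < 0x80) {
            /* 7-bit characters are copied unchanged. */
            out[written++] = c;
            consumed++;
        } else if ((c == 0xC2 || c == 0xC3) &&
                   consumed + 1 < *inlen &&
                   (in[consumed + 1] & 0xC0) == 0x80) {
            /* Two-byte sequences 0xC2/0xC3 cover U+0080..U+00FF. */
            out[written++] = (unsigned char)(((c & 0x03) << 6) |
                                             (in[consumed + 1] & 0x3F));
            consumed += 2;
        } else {
            ret = -2;   /* outside Latin 1, or malformed */
            break;
        }
    }
    *outlen = written;
    *inlen = consumed;
    return ret < 0 ? ret : written;
}
```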
isolat1ToUTF8 ()
int isolat1ToUTF8 (unsigned char *out,
int *outlen,
unsigned char *in,
int *inlen);
Take a block of ISO Latin 1 chars in and try to convert it to a UTF-8
block of chars out.
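Since every ISO Latin 1 byte maps to the Unicode code point of the same value, the conversion needs at most two output bytes per input byte. The sketch below shows the idea; demo_latin1_to_utf8 is a hypothetical name, and the return convention (bytes written, with *outlen and *inlen updated to the bytes produced and consumed) is an assumption rather than something stated above.

```c
#include <stddef.h>

/* Hypothetical converter, not part of libxml: Latin 1 in, UTF-8 out.
 * Stops early, without error, when the output buffer is full. */
int demo_latin1_to_utf8(unsigned char *out, int *outlen,
                        const unsigned char *in, int *inlen)
{
    int written = 0;
    int consumed = 0;

    while (consumed < *inlen) {
        unsigned char c = in[consumed];
        int need = (c < 0x80) ? 1 : 2;

        if (written + need > *outlen)
            break;                      /* no room for this character */
        if (c < 0x80) {
            out[written++] = c;         /* ASCII range: one byte */
        } else {
            /* U+0080..U+00FF: 110000xx 10xxxxxx */
            out[written++] = (unsigned char)(0xC0 | (c >> 6));
            out[written++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        consumed++;
    }
    *outlen = written;
    *inlen = consumed;
    return written;
}
```

Sizing the output buffer at twice the input length guarantees the whole block fits.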
xmlCheckUTF8 ()
int xmlCheckUTF8 (unsigned char *utf);
Checks utf for being valid UTF-8. utf is assumed to be
null-terminated. This function is not super-strict, as it will
allow longer UTF-8 sequences than necessary. Note that Java is
capable of producing these sequences if provoked. Also note, this
routine checks for the 4-byte maximum size, but does not check for
the 0x10ffff maximum value.
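A check with the leniency described above might look like the following sketch. demo_check_utf8 is a hypothetical name and the 1 / 0 result is an assumption; the code accepts overlong sequences, rejects lead bytes that announce more than 4 bytes, and does not compare the decoded value against 0x10ffff.

```c
#include <stddef.h>

/* Hypothetical lenient validator, not part of libxml.
 * Returns 1 if utf looks like UTF-8, 0 otherwise. */
int demo_check_utf8(const unsigned char *utf)
{
    if (utf == NULL)
        return 0;
    while (*utf != 0) {
        unsigned char c = *utf;
        int trail;

        if (c < 0x80)
            trail = 0;                  /* 0xxxxxxx */
        else if ((c & 0xE0) == 0xC0)
            trail = 1;                  /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0)
            trail = 2;                  /* 1110xxxx */
        else if ((c & 0xF8) == 0xF0)
            trail = 3;                  /* 11110xxx */
        else
            return 0;   /* stray continuation byte or 5+ byte lead */
        utf++;
        while (trail-- > 0) {
            /* A NUL here also fails: the sequence was truncated. */
            if ((*utf & 0xC0) != 0x80)
                return 0;
            utf++;
        }
    }
    return 1;
}
```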
xmlUTF8Strsize ()
int xmlUTF8Strsize (const xmlChar *utf,
int len);
Storage size of a UTF-8 string.
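One plausible reading of this entry is "the number of bytes occupied by the first len characters". The sketch below implements that reading; the interpretation, the name demo_utf8_strsize and the use of unsigned char in place of xmlChar are all assumptions, and the input is assumed to be well-formed.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libxml: bytes used by the first
 * len characters of a well-formed, NUL-terminated UTF-8 string. */
int demo_utf8_strsize(const unsigned char *utf, int len)
{
    const unsigned char *p = utf;

    if (utf == NULL || len <= 0)
        return 0;
    while (len-- > 0 && *p != 0) {
        p++;                                /* lead byte */
        while ((*p & 0xC0) == 0x80)
            p++;                            /* continuation bytes */
    }
    return (int)(p - utf);
}
```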
xmlUTF8Strndup ()
A strndup for an array of UTF-8 characters.
xmlUTF8Strpos ()
a function to provide the equivalent of fetching a
character from a string array
xmlUTF8Strloc ()
a function to provide relative location of a UTF8 char
xmlUTF8Strsub ()
Note: positions are given in units of UTF-8 chars
xmlUTF8Strlen ()
int xmlUTF8Strlen (const xmlChar *utf);
Compute the length of a UTF-8 string; it does not do a full UTF-8
check of the content of the string.
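Counting characters without full validation can be done by counting every byte that is not a continuation byte. The sketch below illustrates that technique; demo_utf8_strlen is a hypothetical name, unsigned char stands in for xmlChar, and returning -1 for a NULL argument is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libxml: number of characters in a
 * NUL-terminated UTF-8 string, without validating the content. */
int demo_utf8_strlen(const unsigned char *utf)
{
    int n = 0;

    if (utf == NULL)
        return -1;
    for (; *utf != 0; utf++) {
        /* Bytes of the form 10xxxxxx continue a previous character. */
        if ((*utf & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```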