mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-26 20:25:14 +03:00
21 Commits

Author SHA1 Message Date
Nick Wellnhofer
8d0aaf4b95 parser: Remove xmlErrEncoding
Use xmlFatalErr or xmlCtxtErrIO.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
c2bbeed1fd io: Fix memory lifetime issue with input buffers
xmlParserInputBufferCreateMem must make a copy of the buffer.

This fixes a regression from 2.11 which could cause reads from freed
memory depending on the use case.

Undeprecate xmlParserInputBufferCreateStatic which can avoid copying
the whole buffer.
2023-12-12 23:51:32 +01:00
Nick Wellnhofer
61034116d0 error: Make more xmlError structs constant
Prepare for future changes, see 45470611.
2023-10-24 15:02:36 +02:00
Nick Wellnhofer
5221fcd42d tests: Also test xmlNextChar in testchar.c 2023-10-22 16:32:54 +02:00
Nick Wellnhofer
028566745c parser: Remove redundant IS_CHAR check in xmlCurrentChar 2023-10-22 16:32:54 +02:00
Nick Wellnhofer
a0462e2d54 test: Add push parser test with overridden encoding
After recent changes, it should work to call xmlSwitchEncoding to
override the encoding for the push parser. This was never properly
supported, so Chromium and WebKit added a hack to reset the encoding in
the startDocument SAX handler.
2023-08-08 15:19:49 +02:00
Nick Wellnhofer
ec7be50662 parser: Rework encoding detection
Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set
when xmlSwitchEncoding is called. The parser can use the flag to
reliably detect whether an encoding was already set via user override,
BOM or other auto-detection. In this case, the encoding declaration
won't be used to switch the encoding.

Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding
and ctxt->input->buf->encoder was used.

Introduce private helper functions to switch encodings used by both the
XML and HTML parser:

- xmlDetectEncoding which skips over the BOM, allowing to remove the
  BOM checks from other encoding functions.
- xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns
  about encoding mismatches.

If users override the encoding, store the declared instead of the actual
encoding in xmlDoc. In this case, the actual encoding is known and the
raw value from the doc is more useful.

Also use the input flags to store the ISO-8859-1 fallback state.
Restrict the fallback to cases where no encoding was specified. (The
fallback is only useful in recovery mode and these days broken UTF-8 is
probably more likely than ISO-8859-1, so it might eventually be removed
completely.)

The 'charset' member of xmlParserCtxt is now unused. The 'encoding'
member of xmlParserInput is now unused.

The 'standalone' member of xmlParserInput is renamed to 'flags'.

A new parser state XML_PARSER_XML_DECL is added for the push parser.
2023-08-08 15:19:46 +02:00
Nick Wellnhofer
981093abd1 test: Add push parser tests for split UTF-8 sequences 2023-05-18 19:35:16 +02:00
Nick Wellnhofer
886bf4e63b Stop calling xmlMemoryDump
This was used to check for memory leaks but could potentially create a
.memdump file. These days, there are better ways to check for memory
leaks.
2023-04-30 15:48:41 +02:00
Nick Wellnhofer
3ffcc03b16 parser: Deprecate more internal functions 2023-04-26 20:23:23 +02:00
Nick Wellnhofer
b51b99ef83 testchar: Fix return value in testUserEncoding 2023-04-21 02:56:10 +02:00
Nick Wellnhofer
eca1116b81 testchar: Add test for memory pull parser with encoding 2023-04-20 15:31:20 +02:00
Nick Wellnhofer
59b3366178 error: Limit number of parser errors
Reporting errors is expensive and some abusive test cases can generate
an error for each invalid input byte. This causes the parser to spend
most of the time with error handling. Limit the number of errors and
warnings to 100.
2022-12-27 14:41:19 +01:00
Nick Wellnhofer
2059df5358 buf: Deprecate static/immutable buffers 2022-11-20 21:16:03 +01:00
Nick Wellnhofer
0f568c0b73 Consolidate private header files
Private functions were previously declared

- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.

Consolidate all private header files in include/private.
2022-08-26 02:11:56 +02:00
Tony Tascioglu
41a1943057 Make testchar return an error on failure 2022-04-03 17:54:23 +02:00
Nick Wellnhofer
5948abfe99 Add explicit casts in testchar.c
Avoids integer conversion warnings with UBSan.
2022-01-25 01:59:03 +01:00
Jared Yanovich
2a350ee9b4 Large batch of typo fixes
Closes #109.
2019-09-30 18:04:38 +02:00
Daniel Veillard
f8e3db0445 Big space and tab cleanup
Remove all space before tabs and space and tabs at end of lines.
2012-09-11 13:26:36 +08:00
Daniel Veillard
145477d8ab Switch the test program for characters to new input buffers
it was manipulating the buffer content and structures directly
this cleans it up
2012-07-23 14:24:27 +08:00
Daniel Veillard
abade01334 add a new regression test program for testing character ranges and UTF8
* Makefile.am testchar.c Makefile.tests README.tests: add a
  new regression test program for testing character ranges and
  UTF8 encoding/decoding
Daniel

svn path=/trunk/; revision=3754
2008-07-24 15:05:38 +00:00