libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-04-09 14:50:07 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	bbd918b2e7	parser: Fix detection of null bytes Also suppress misleading extra errors. Fixes #122.	2023-08-29 18:43:10 +02:00
Nick Wellnhofer	c6083a32d6	parser: Improve error handling in push parser - Report errors earlier - Align error messages with pull parser	2023-08-29 18:41:05 +02:00
Nick Wellnhofer	1edae30f82	parser: Don't check inputNr in xmlParseTryOrFinish There's no apparent reason for this check. inputNr should always be 1 here.	2023-08-29 18:17:14 +02:00
Nick Wellnhofer	e48f2695fe	parser: Remove push parser debugging code	2023-08-29 18:17:09 +02:00
Nick Wellnhofer	cde4499778	SAX2: Allow multiple top-level elements When parsing with HTML_PARSE_NOIMPLIED, the result document can contain multiple top-level elements. Rework xmlSAX2StartElement to simply add the element as a child of ctxt->node or ctxt->myDoc. Don't invoke xmlAddSibling for non-element parents. The context node should always be an element node. Fixes #584.	2023-08-27 16:35:23 +02:00
Nick Wellnhofer	d39f78069d	tree: Fix copying of DTDs - Don't create multiple DTD nodes. - Fix UAF if malloc fails. - Skip DTD nodes if tree module is disabled. Fixes #583.	2023-08-23 20:43:14 +02:00
Nick Wellnhofer	4e4c89a4bc	doc: Improve documentation of configuration options	2023-08-21 11:13:33 +02:00
Nick Wellnhofer	778cca386d	legacy: Add stubs for disabled modules When legacy support is requested, always enable stubs for FTP and XPointer location modules which were removed from the standard configuration. Going forward, the --with-legacy configuration option should be used to provide maximum ABI compatibility. Fixes #433.	2023-08-20 23:16:12 +02:00
Nick Wellnhofer	ed3bd05284	parser: Allow to set maximum amplification factor	2023-08-20 20:49:16 +02:00
Nick Wellnhofer	9d80a2b134	entities: Don't change doc when encoding entities doc->encoding shouldn't be touched by xmlEncodeEntitiesInternal.	2023-08-17 12:47:14 +02:00
Nick Wellnhofer	f1c1f5c6b4	parser: Revert change to doc->encoding Fixes #579.	2023-08-17 12:47:14 +02:00
Nick Wellnhofer	61b8e097b9	parser: Never use UTF-8 encoding handler	2023-08-16 19:50:36 +02:00
Nick Wellnhofer	507f11edf0	encoding: Remove debugging code	2023-08-16 19:50:36 +02:00
Nick Wellnhofer	138213acdf	python: Fix tests on MinGW Add the directory containing libxml2.dll with os.add_dll_directory to make tests work on MinGW. This has changed in Python 3.8 but for some reason, the issue only turned up with Python 3.11 on MinGW. Contrary to documentation, copying libxml2.dll into the directory containing the .pyd file doesn't work.	2023-08-15 12:55:35 +02:00
Nick Wellnhofer	e2ab48b9b5	malloc-fail: Fix unsigned integer overflow in xmlTextReaderPushData Return immediately if xmlParserInputBufferRead fails. Found by OSS-Fuzz, see #344.	2023-08-14 15:06:31 +02:00
Nick Wellnhofer	0d24fc0a47	html: Remove encoding hack in htmlCreateFileParserCtxt Switch encoding directly instead of calling htmlCheckEncoding with faked content.	2023-08-14 12:53:49 +02:00
Nick Wellnhofer	5db5a704eb	html: Fix UAF in htmlCurrentChar Short-lived regression found by OSS-Fuzz.	2023-08-09 18:40:25 +02:00
Nick Wellnhofer	b973ceaf2f	parser: Fix mistake in xmlDetectEncoding Short-lived regression.	2023-08-09 18:40:25 +02:00
Nick Wellnhofer	cb717d7e02	parser: Update line number after coalescing text nodes This should make the line number of text nodes deterministic. Before, it depended on the callback sequence which depends on the size of chunks fed to the parser.	2023-08-09 16:58:33 +02:00
Nick Wellnhofer	855818bd2b	parser: Check for truncated multi-byte sequences When decoding input data, check whether the "raw" buffer is empty after parsing the document. Otherwise, the input ends with a truncated multi-byte sequence which shouldn't be silently ignored.	2023-08-08 15:21:37 +02:00
Nick Wellnhofer	95e81a360c	parser: Decode all data in xmlCharEncInput Even with flush set to true, xmlCharEncInput didn't guarantee to decode all data. This complicated the push parser. Remove the flush flag and always decode all available data. Also fix ICU code where the flush flag has a different meaning. Always set flush to false and retry even with empty input buffers.	2023-08-08 15:21:31 +02:00
Nick Wellnhofer	834b8123ef	parser: Stream data when reading from memory Don't create a copy of the whole input buffer. Read the data chunk by chunk to save memory. Historically, it was probably envisioned to read data from memory without additional copying. This doesn't work reliably with the current design of the XML parser which requires a terminating null byte at the end of input buffers. This led to xmlReadMemory interfaces, which expect pointer and size arguments, being changed to make a zero-terminated copy of the input buffer. Interfaces based on xmlReadDoc, which actually expect a zero-terminated string and would make zero-copy operation work, were then simplified to rely on xmlReadMemory, resulting in an unnecessary copy. To avoid copying (possibly gigabytes) of memory temporarily, we now stream in-memory input just like content read from files in a chunk-by-chunk fashion (using a somewhat outdated INPUT_CHUNK size of 250 bytes). As a side effect, we also avoid another copy of the whole input when handling non-UTF-8 data which was made possible by some earlier commits. Interfaces expecting zero-terminated strings now make use of strnlen which unfortunately isn't part of the standard C library and only mandated since POSIX 2008.	2023-08-08 15:21:28 +02:00
Nick Wellnhofer	5aff27ae78	parser: Optimize xmlLoadEntityContent Load entity content via xmlParserInputBufferGrow, avoiding a copy. This also fixes an entity size accounting error.	2023-08-08 15:21:25 +02:00
Nick Wellnhofer	facc2a06da	parser: Don't overwrite EOF parser state	2023-08-08 15:21:21 +02:00
Nick Wellnhofer	59fa0bb383	parser: Simplify input pointer updates The base member always points to the beginning of the buffer.	2023-08-08 15:21:14 +02:00
Nick Wellnhofer	c88ab7e329	parser: Don't reinitialize parser input members The parser input struct should already be initialized.	2023-08-08 15:19:54 +02:00
Nick Wellnhofer	4ee0815514	encoding: Move rawconsumed accounting to xmlCharEncInput	2023-08-08 15:19:51 +02:00
Nick Wellnhofer	a0462e2d54	test: Add push parser test with overridden encoding After recent changes, it should work to call xmlSwitchEncoding to override the encoding for the push parser. This was never properly supported, so Chromium and WebKit added a hack to reset the encoding in the startDocument SAX handler.	2023-08-08 15:19:49 +02:00
Nick Wellnhofer	ec7be50662	parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser: - xmlDetectEncoding which skips over the BOM, allowing to remove the BOM checks from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.	2023-08-08 15:19:46 +02:00
Nick Wellnhofer	d38e73f91e	parser: Always create UTF-8 in xmlParseReference It seems that this code path could only be triggered after an encoding error in recovery mode. Creating char-ref nodes is unnecessary and typically unexpected.	2023-08-08 15:19:44 +02:00
Nick Wellnhofer	131d0dc0a7	parser: Don't use 'standalone' member of xmlParserInput The standalone declaration is only parsed in the main input stream.	2023-08-08 15:19:39 +02:00
Nick Wellnhofer	d9ec182b65	parser: Don't detect encoding in xmlCtxtResetPush The encoding will be detected in xmlParseTryOrFinish.	2023-08-08 15:19:36 +02:00
Nick Wellnhofer	3a64f39448	html: Remove some debugging code in htmlParseTryOrFinish	2023-08-08 15:19:25 +02:00
Nick Wellnhofer	58de9d31da	valid: Fix c1->parent pointer in xmlCopyDocElementContent Fixes #572.	2023-08-03 12:00:55 +02:00
Nick Wellnhofer	7569328138	malloc-fail: Fix memory leak in xmlCompileAttributeTest Found by OSS-Fuzz, see #344.	2023-07-21 14:50:30 +02:00
Nick Wellnhofer	90bcbcfcc7	parser: Fix potential use-after-free in xmlParseCharDataInternal Return immediately if a SAX handler stops the parser. Fixes #569.	2023-07-20 21:40:57 +02:00
Nick Wellnhofer	8844744772	parser: Fix typo in previous commit	2023-06-23 23:04:30 +02:00
Nick Wellnhofer	9d0541dd2f	parser: Make xmlSwitchEncoding always skip the BOM Chromium calls xmlSwitchEncoding from the start document handler and relies on this function to skip the BOM. Commit 98840d40 changed the behavior when switching to UTF-16 since inspecting the input buffer at this point is fragile. Revert part of the commit to also skip a potential (decoded UTF-8) BOM when switching to UTF-16. Make sure that we do this only at the start of an input stream to avoid U+FEFF characters being lost. BOM handling should ultimately be moved to the parsing code to avoid such bugs. See https://bugs.chromium.org/p/chromium/issues/detail?id=1451026	2023-06-22 18:22:32 +02:00
Christoph Reiter	2473b4855e	autotools: fix Python module file ext for cygwin/msys2 both use .dll, not .pyd	2023-06-21 14:38:38 +02:00
David Kilzer	5f54bac9eb	testapi: test_xmlSAXDefaultVersion() leaves xmlSAX2DefaultVersionValue set to 1 with LIBXML_SAX1_ENABLED Add code to save and to restore the default value of xmlSAX2DefaultVersionValue. Fixes #554.	2023-06-10 10:55:38 -07:00
Nick Wellnhofer	b236b7a588	parser: Halt parser when growing buffer results in OOM Fix short-lived regression from previous commit. It might be safer to make xmlBufSetInputBaseCur use the original buffer even in case of errors. Found by OSS-Fuzz.	2023-06-08 21:59:20 +02:00
Nick Wellnhofer	20f5c73457	parser: Recover more input from encoding errors Don't halt the parser in xmlParserGrow to allow more input to be recovered in case of encoding errors. Fixes #543.	2023-06-07 14:05:34 +02:00
Nick Wellnhofer	db21cd5db9	malloc-fail: Handle malloc failures in xmlAddEncodingAlias Avoid memory errors if an allocation fails. See #344. Fixes #553.	2023-06-06 14:25:30 +02:00
Nick Wellnhofer	305a75ccbe	malloc-fail: Fix null-deref with xmllint --copy See #344. Fixes #552.	2023-06-06 13:15:46 +02:00
Nick Wellnhofer	6273df6c6d	xpath: Ignore entity ref nodes when computing node hash XPath queries only work reliably if entities are substituted. Nevertheless, it's possible to query a document with entity reference nodes. xmllint even deletes entities when the `--dropdtd` option is passed, resulting in dangling pointers, so it's best to skip entity reference nodes to avoid a use-after-free. Fixes #550.	2023-05-30 12:30:27 +02:00
Nick Wellnhofer	e2f21c22d3	win32: Deprecate old Windows build system	2023-05-30 12:03:45 +02:00
Nick Wellnhofer	1e8ab6977d	gitlab-ci: Lower _XOPEN_SOURCE value	2023-05-25 03:25:48 +02:00
Nick Wellnhofer	cb8ccb1078	testapi: Don't set http_proxy environment variable We already disable network access, so this has no effect.	2023-05-25 03:17:45 +02:00
Nick Wellnhofer	9fd57df815	autotools: Improve iconv check Use a custom test program which includes iconv.h, so we can check whether the possibly redefined symbols in this header file match the symbols in the iconv library. Should fix #547.	2023-05-25 02:47:27 +02:00
Nick Wellnhofer	c3c6cc6202	runtest: Fix compilation without LIBXML_HTML_ENABLED Fixes #545.	2023-05-24 20:08:56 +02:00

6202 Commits