libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-04-11 22:50:08 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	4365a5e115	xmlreader: Fix xmlTextReaderConstEncoding Regression from commit f1c1f5c6. Fixes #697.	2024-02-26 16:02:52 +01:00
Nick Wellnhofer	f76ee97a81	parser: Fix invalid free in xmlParseBalancedChunkMemoryRecover Set the dictionary for newDoc in xmlParseBalancedChunkMemoryRecover. This is a long-standing bug which was masked by - xmlParseBalancedChunkMemoryRecover changing the document of the root node. This is a really bad idea, resulting in a mismatch between ctxt->myDoc and ctxt->node->doc. - SAX2.c preferring ctxt->node->doc over ctxt->myDoc until commit a31e1b06. Fixes #641.	2023-12-01 20:20:07 +01:00
Nick Wellnhofer	a31e1b0665	SAX2: Fix quadratic behavior in xmlSAX2AttributeNs The last missing piece to make parsing of attributes O(n).	2023-11-04 20:21:54 +01:00
Nick Wellnhofer	e0dd330b8f	parser: Use hash tables to avoid quadratic behavior Use a hash table to lookup namespaces by prefix. The hash table stores an index into the namespace table. Auxiliary data for namespaces is stored in a separate array along the main namespace table. Use a hash table to verify attribute uniqueness. The hash table stores an index into the attribute table. Reuse hash value from the dictionary to avoid computing them twice. See #346.	2023-09-29 12:43:22 +02:00
Nick Wellnhofer	da274bfa55	build: Fix build when certain modules are disabled	2023-09-21 02:26:43 +02:00
Nick Wellnhofer	9b5cce7a71	include: Remove more unnecessary includes	2023-09-21 01:50:53 +02:00
Nick Wellnhofer	699299cae3	globals: Stop including globals.h	2023-09-20 22:07:40 +02:00
Nick Wellnhofer	a77f9ab84c	globals: Don't include SAX2.h from globals.h	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	4e1c13ebfd	debug: Remove debugging code This is barely useful these days and only clutters the code base.	2023-09-19 17:35:09 +02:00
Nick Wellnhofer	cde4499778	SAX2: Allow multiple top-level elements When parsing with HTML_PARSE_NOIMPLIED, the result document can contain multiple top-level elements. Rework xmlSAX2StartElement to simply add the element as a child of ctxt->node or ctxt->myDoc. Don't invoke xmlAddSibling for non-element parents. The context node should always be an element node. Fixes #584.	2023-08-27 16:35:23 +02:00
Nick Wellnhofer	f1c1f5c6b4	parser: Revert change to doc->encoding Fixes #579.	2023-08-17 12:47:14 +02:00
Nick Wellnhofer	cb717d7e02	parser: Update line number after coalescing text nodes This should make the line number of text nodes deterministic. Before, it depended on the callback sequence which depends on the size of chunks fed to the parser.	2023-08-09 16:58:33 +02:00
Nick Wellnhofer	ec7be50662	parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser: - xmlDetectEncoding which skips over the BOM, allowing to remove the BOM checks from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.	2023-08-08 15:19:46 +02:00
Nick Wellnhofer	d38e73f91e	parser: Always create UTF-8 in xmlParseReference It seems that this code path could only be triggered after an encoding error in recovery mode. Creating char-ref nodes is unnecessary and typically unexpected.	2023-08-08 15:19:44 +02:00
Nick Wellnhofer	b8961df65d	SAX: Always validate xml:ids The behavior shouldn't depend on mostly random configuration options.	2023-05-09 03:25:24 +02:00
Nick Wellnhofer	235b15a590	SAX: Always initialize SAX1 element handlers Follow-up to commit d0c3f01e. A parser context will be initialized to SAX version 2, but this can be overridden with XML_PARSE_SAX1 later, so we must initialize the SAX1 element handlers as well. Change the check in xmlDetectSAX2 to only look for XML_SAX2_MAGIC, so we don't switch to SAX1 if the SAX2 element handlers are NULL.	2023-05-08 19:15:44 +02:00
Nick Wellnhofer	250faf3c83	parser: Fix regression in xmlParserNodeInfo accounting Commit 62150ed2 broke begin_pos and begin_line when extra node info was recorded. Fixes #523.	2023-04-20 15:38:00 +02:00
Nick Wellnhofer	d7d0bc6581	SAX2: Ignore namespaces in HTML documents In commit 21ca8829, we started to ignore namespaces in HTML element names but we still called xmlSplitQName, effectively stripping the namespace prefix. This would cause elements like <o:p> being parsed as <p>. Now we leave the name untouched. Fixes #508.	2023-03-31 17:08:43 +02:00
Nick Wellnhofer	cb4334b7ab	malloc-fail: Fix memory leak in xmlSAX2StartElementNs Found with libFuzzer, see #344.	2023-02-17 17:16:51 +01:00
Nick Wellnhofer	0c5f40b788	malloc-fail: Fix null deref in xmlSAX2AttributeInternal Found with libFuzzer, see #344.	2023-01-24 11:32:15 +01:00
Nick Wellnhofer	b3b53dcce4	malloc-fail: Fix null deref in xmlSAX2Text Found with libFuzzer, see #344.	2023-01-24 11:32:15 +01:00
Nick Wellnhofer	463bbeeca1	entities: Rework entity amplification checks This commit implements robust detection of entity amplification attacks, better known as the "billion laughs" attack. We now limit the size of the document after substitution of entities to 10 times the size before expansion. This guarantees linear behavior by definition. There already was a similar check before, but the accounting of "sizeentities" (size of external entities) and "sizeentcopy" (size of all copies created by entity references) wasn't accurate. We also need saturation arithmetic since we're historically limited to "unsigned long" which is 32-bit on many platforms. A maximum of 10 MB of substitutions is always allowed. This should make use cases like DITA work which have caused problems in the past. The old checks based on the number of entities were removed. This is accounted for by adding a fixed cost to each entity reference. Entity amplification checks are now enabled even if XML_PARSE_HUGE is set. This option is mainly used to allow larger text nodes. Most users were unaware that it also disabled entity expansion checks. Some of the limits might be adjusted later. If this change turns out to affect legitimate use cases, we can add a separate parser option to disable the checks. Fixes #294. Fixes #345.	2022-12-21 20:19:10 +01:00
Nick Wellnhofer	cecd364dd2	parser: Don't call *DefaultSAXHandlerInit from xmlInitParser Change the default handler definitions to match the result after calling the initialization functions. This makes sure that no thread-local variables are accessed when calling xmlInitParser.	2022-11-25 15:02:04 +01:00
Nick Wellnhofer	68a6518c45	parser: Rewrite push parser boundary checks Remove inaccurate xmlParseCheckTransition check. Remove non-incremental xmlParseGetLasts check. Add functions that check for several boundary constructs more accurately, keeping track of progress in ctxt->checkIndex. Fixes #439.	2022-11-20 21:27:08 +01:00
Nick Wellnhofer	7ceaee9430	malloc-fail: Fix memory leak in xmlSAX2ExternalSubset Found with libFuzzer, see #344.	2022-11-02 16:05:05 +01:00
Nick Wellnhofer	81621b1fe4	Fix compiler warnings in SAX2.c	2022-09-02 18:44:59 +02:00
Nick Wellnhofer	ad338ca737	Remove explicit integer casts Remove explicit integer casts as final operation - in assignments - when passing arguments - when returning values Remove casts - to the same type - from certain range-bound values The main motivation is that these explicit casts don't change the result of operations and only render UBSan's implicit-conversion checks useless. Removing these casts allows UBSan to detect cases where truncation or sign-changes occur unexpectedly. Document some explicit casts as truncating and add a few missing ones.	2022-09-01 02:33:57 +02:00
Nick Wellnhofer	aeb69fd357	Fix overflow check in SAX2.c	2022-09-01 02:33:57 +02:00
Nick Wellnhofer	0f568c0b73	Consolidate private header files Private functions were previously declared - in header files in the root directory - in public headers guarded with IN_LIBXML - in libxml.h - redundantly in source files that used them. Consolidate all private header files in include/private.	2022-08-26 02:11:56 +02:00
Nick Wellnhofer	0e49f8826a	Mark most SAX1 functions as deprecated No compiler warnings generated yet.	2022-08-24 14:07:57 +02:00
Nick Wellnhofer	4b184240be	Remove htmlDefaultSAXHandler from non-SAX1 build This matches long-standing behavior of the XML counterpart.	2022-08-22 14:24:25 +02:00
Nick Wellnhofer	3e7b4f37aa	Avoid calling xmlSetTreeDoc Create text nodes with xmlNewDocText or set the document directly to avoid xmlSetTreeDoc being called when the node is inserted.	2022-06-20 01:49:39 +02:00
Nick Wellnhofer	40483d0ce2	Deprecate module init and cleanup functions These functions shouldn't be part of the public API. Most init functions are only thread-safe when called from xmlInitParser. Global variables should only be cleaned up by calling xmlCleanupParser.	2022-03-06 15:59:43 +01:00
Nick Wellnhofer	4a8c71eb7c	Remove DOCBparser This code has been broken and deprecated since version 2.6.0, released in 2003. Because of a bug in commit 961b535c, DOCBparser.c was never compiled since 2012. I couldn't find a Debian package using any of its symbols, so it seems safe to remove this module.	2022-03-04 22:56:21 +01:00
Nick Wellnhofer	c41bc10da3	Fix unused variable warnings with disabled features	2022-02-22 19:57:12 +01:00
Nick Wellnhofer	346c3a930c	Remove elfgcchack.h The same optimization can be enabled with -fno-semantic-interposition since GCC 5. clang has always used this option by default.	2022-02-20 21:49:04 +01:00
Nick Wellnhofer	e03590c9ad	Don't add IDs containing unexpanded entity references When parsing without entity substitution, IDs or IDREFs containing unexpanded entity references like "abc&x;def" could be created. We could try to expand these entities like in validation mode, but it seems safer to honor the request not to expand entities. We silently ignore such IDs for now.	2022-02-20 21:49:04 +01:00
Nick Wellnhofer	d7cb33cf44	Rework validation context flags Use a bitmask instead of magic values to - keep track whether the validation context is part of a parser context - keep track whether xmlValidateDtdFinal was called This allows adding additional flags later. Note that this deliberately changes the name of a public struct member, assuming that this was always private data never to be used by client code.	2022-02-20 21:49:04 +01:00
Nick Wellnhofer	a647e43025	Fix casting of line numbers in SAX2.c The line member is an unsigned short. Avoids integer conversion warnings with UBSan. Also use USHRT_MAX instead of hard-coded constant.	2022-01-25 03:20:28 +01:00
David King	92bce68c0d	Fix memory leak in xmlSAX2AttributeDecl Found by Coverity. https://bugzilla.redhat.com/show_bug.cgi?id=1938806	2022-01-16 14:11:28 +01:00
Nick Wellnhofer	acb3566739	Fix quadratic runtime when parsing CDATA sections Use optimized concatenation for CDATA sections in addition to normal text. This also affects HTML script content. Found by OSS-Fuzz.	2021-02-03 13:57:26 +01:00
Nick Wellnhofer	21ca8829a7	Don't try to handle namespaces when building HTML documents Don't try to resolve namespace in xmlSAX2StartElement when parsing HTML documents. This useless operation could slow down the parser considerably. Found by OSS-Fuzz.	2020-07-25 17:57:29 +02:00
Nick Wellnhofer	20c60886e4	Fix typos Resolves #133.	2020-03-08 17:41:53 +01:00
Nick Wellnhofer	eddfbc38fa	Don't load external entity from xmlSAX2GetEntity Despite the comment, I can't see a reason why external entities must be loaded in the SAX handler. For external entities, the handler is typically first invoked via xmlParseReference which will later load the entity on its own if it wasn't loaded yet. The old code also led to duplicated SAX events which makes it basically impossible to reuse xmlSAX2GetEntity for a custom SAX parser. See the change to the expected test output. Note that xmlSAX2GetEntity was loading the entity via xmlParseCtxtExternalEntity while xmlParseReference uses xmlParseExternalEntityPrivate. In the previous commit, the two functions were merged, trying to compensate for some slight differences between the two mostly identical implementations. But the more urgent reason for this change is that xmlParseReference has the facility to abort early when recursive entities are detected, avoiding what could practically amount to an infinite loop. If you want to backport this change, note that the previous three commits are required as well: f9ea1a24 Fix copying of entities in xmlParseReference 5c7e0a9a Copy some XMLReader option flags to parser context 1a3e584a Merge code paths loading external entities Found by OSS-Fuzz.	2020-02-11 17:35:42 +01:00
Jared Yanovich	2a350ee9b4	Large batch of typo fixes Closes #109.	2019-09-30 18:04:38 +02:00
Nick Wellnhofer	6b49db2cb2	Fix memory leak in xmlSAX2StartElement Introduced by a recent commit. Only happens if max depth is exceeded in SAX1 mode. Found by OSS-Fuzz.	2019-01-07 18:07:00 +01:00
Nick Wellnhofer	1567b55b72	Set doc on element obtained from freeElems In commit 8c9daf79, a call to xmlFreeNode was added in xmlSAX2StartElementNs. If a node was obtained from the freeElems list, make sure to set the doc, otherwise xmlFreeNode wouldn't realize that the node name might be in the dictionary, causing an invalid free. Note that the issue fixed in commit 8c9daf79 requires commit 0ed6addb and this one to work properly. Found by OSS-Fuzz.	2018-11-22 16:28:46 +01:00
Nick Wellnhofer	0ed6addb8f	Unlink node before freeing it in xmlSAX2StartElement The node may have been added to the document already, so it must be unlinked first. Thanks to David Kilzer for spotting this.	2018-09-22 15:41:01 +02:00
Nick Wellnhofer	8c9daf790a	Check return value of nodePush in xmlSAX2StartElement If the maximum depth is exceeded, nodePush halts the parser which results in freeing the input buffer since the previous commit. This invalidates the attribute pointers, so the error condition must be checked. Found by OSS-Fuzz.	2018-09-12 13:52:47 +02:00
Nick Wellnhofer	d422b954be	Fix pointer/int cast warnings on 64-bit Windows On 64-bit Windows, `long` is 32 bits wide and can't hold a pointer. Switch to ptrdiff_t instead which should be the same size as a pointer on every somewhat sane platform without requiring C99 types like intptr_t. Fixes bug 788312. Thanks to J. Peter Mugaas for the report and initial patch.	2017-10-09 13:47:49 +02:00

144 Commits