libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-26 12:25:09 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	1e5375c1b4	SAX2: Check return value of xmlPushInput Fix null deref in case of malloc failure.	2024-07-06 15:33:06 +02:00
Nick Wellnhofer	80aabea1d6	SAX2: Reenable 'directory' as base URI fallback Apparently, some users overwrite this member manually to set a base URI for memory streams. Fixes #753.	2024-07-03 11:55:38 +02:00
Nick Wellnhofer	842a044831	valid: Restore ID lookup Revert a change from `d025cfbb` and don't overwrite ID table entries, so that the first attribute will be returned if there are duplicate IDs. This requires two other changes: - Attributes in entity content are never added to the ID table. This seems reasonable. - Remove the optimization to skip ID lookup when copying and the target document has an empty ID table. This also seems more correct since the document could have ID declarations nevertheless or we could be copying xml:ids into the document for the first time. Fixes #757.	2024-07-03 11:46:06 +02:00
Nick Wellnhofer	f906526175	SAX2: Fix HTML IDs Short-lived regression. Fixes #755.	2024-07-02 23:59:28 +02:00
Nick Wellnhofer	866be54e22	parser: Don't use deprecated xmlSplitQName	2024-07-02 13:34:11 +02:00
Nick Wellnhofer	16e7ecd478	xinclude: Check URI length Don't report long URIs as OOM errors.	2024-07-01 18:03:06 +02:00
Nick Wellnhofer	f505dcaea0	tree: Remove underscores from xmlRegisterCallbacks	2024-06-27 14:45:35 +02:00
Nick Wellnhofer	8b1f79cea0	SAX2: Make xmlSAXDefaultVersion a no-op	2024-06-27 14:44:55 +02:00
Nick Wellnhofer	5cf5b542d9	SAX2: Deprecate xmlSAX2StartElement	2024-06-27 14:44:55 +02:00
Nick Wellnhofer	860fb460ea	SAX2: Fix null deref after malloc failure Short-lived regression.	2024-06-18 20:00:45 +02:00
Nick Wellnhofer	faae3a91ce	SAX2: Split out legacy SAX1 handling Split xmlSAX2StartElement into two functions handling legacy SAX1 and HTML.	2024-06-17 00:54:47 +02:00
Nick Wellnhofer	11c3f84b6c	SAX2: Always make xmlSAX2{Start,End}Element public Simplify symbol availability logic.	2024-06-16 18:47:12 +02:00
Nick Wellnhofer	5238404325	parser: Pass resource type to resource loader	2024-06-12 16:36:12 +02:00
Nick Wellnhofer	64ad272525	parser: Introduce per-context resource loader	2024-06-12 16:22:52 +02:00
Nick Wellnhofer	4ff2dccf9f	SAX2: Warn if URI resolution failed	2024-05-13 12:50:08 +02:00
Nick Wellnhofer	71a7a33e18	parser: Fix base URI of internal parameter entities Search parent inputs of internal parameter entities for base URI. Fixes a long-standing bug, which manifested in a different way after commit `955c177f`. Reproduce with xmllint --noent xmlconf/eduni/errata-2e/E18.xml	2024-05-03 11:53:45 +02:00
Nick Wellnhofer	af2bda4e87	SAX2: Also check URI length before resolving We don't want to exceed the size limit of 1 MB in uri.c. Such errors can't be distinguished from malloc failures.	2024-04-05 13:09:45 +02:00
Nick Wellnhofer	2cc7f71016	SAX2: Fix xmlSAX2EntityDecl with empty base Short-lived regression.	2024-03-29 13:44:28 +01:00
Nick Wellnhofer	730de88b16	SAX2: Optimize appending children xmlSAX2AppendChild can make several assumptions which make appending nodes more efficient. Also handle line numbers in xmlSAX2AppendChild.	2024-03-29 12:59:20 +01:00
Nick Wellnhofer	05c147c3ef	SAX2: Report malloc failure in xmlSAX2AttributeNs	2024-03-22 13:03:37 +01:00
Nick Wellnhofer	6a49bb777c	tree: Introduce xmlSearchNsSafe After the failed experiment with a static XML namespace, introduce versions of xmlSearchNs that report malloc failures. Optimize the no-document case by only adding the XML namespace declaration if it wasn't found in an ancestor.	2024-03-17 21:07:46 +01:00
Nick Wellnhofer	047ea3ecb3	Revert "tree: Allocate XML namespace statically" This reverts commit `2840e33c5e`.	2024-03-17 21:04:40 +01:00
Nick Wellnhofer	9f049afa6d	tree: Refactor element creation and parsing of attribute values Replace xmlStringGetNodeList and xmlStringLenGetNodeList with xmlNodeParseContentInternal which also updates an optional parent node. Don't look up entities a second time via xmlNewReference.	2024-03-15 19:54:26 +01:00
Nick Wellnhofer	2840e33c5e	tree: Allocate XML namespace statically	2024-03-15 19:47:07 +01:00
Nick Wellnhofer	84a71860a8	xmlreader: Fix xmlTextReaderConstEncoding Regression from commit `f1c1f5c6`. Fixes #697.	2024-02-26 15:33:06 +01:00
Nick Wellnhofer	7dc8600a7f	SAX2: Report malloc failure in xmlCheckDefaultedAttributes	2024-02-20 12:32:17 +01:00
Nick Wellnhofer	2e19d0ef04	SAX2: Make sure that OOM errors aren't overwritten	2024-01-26 11:39:51 +01:00
Nick Wellnhofer	57c687592f	SAX2: Limit entity URI length to 2000 bytes Avoid quadratic behavior when loading entities with long URIs multiple times. This limitation could be dropped if we cached external entities.	2024-01-10 15:58:23 +01:00
Nick Wellnhofer	02cc5c3609	parser: Add XML_PARSE_NO_XXE parser option	2024-01-05 20:39:40 +01:00
Nick Wellnhofer	9912c36904	SAX2: Enforce size limit in xmlSAX2Text with XML_PARSE_HUGE	2024-01-02 19:48:23 +01:00
Nick Wellnhofer	37c6618be5	parser: Rework parsing of attribute and entity values Don't use a separate function to handle "complex" attributes. Validate UTF-8 byte sequences without decoding. This should improve performance considerably when parsing multi-byte UTF-8 sequences. Use a string buffer to avoid unnecessary allocations and copying when expanding entities. Normalize attribute values in a single pass while expanding entities. Be more lenient in recovery mode. If no entity substitution was requested, validate entities without expanding. Fixes #596. Also fixes #655.	2024-01-02 15:42:03 +01:00
Nick Wellnhofer	6a9a88a17f	parser: Move progressive flag into input struct	2023-12-29 01:20:08 +01:00
Nick Wellnhofer	d944a41515	parser: Fix in-parameter-entity and in-external-dtd checks Use ctxt->input->entity instead of ctxt->inputNr to determine whether we are inside a parameter entity. Stop using ctxt->external to check whether we're in an external DTD. This is signaled by ctxt->inSubset == 2.	2023-12-29 01:19:56 +01:00
Nick Wellnhofer	5f319304c8	SAX2: Fix error code Today I learned that the TSCII character encoding [1] can blow up the size of text 12 times when converted to UTF-8: `$ printf '\x82' | iconv -f TSCII -t UTF-8 | hexdump -C` prints `00000000 e0 ae b8 e0 af 8d e0 ae b0 e0 af 80` and `0000000c`. [1] https://en.wikipedia.org/wiki/Tamil_Script_Code_for_Information_Interchange	2023-12-28 19:43:48 +01:00
Nick Wellnhofer	955c177f69	parser: Stop using 'directory' struct member This was only used as a pointless fallback for URI resolution.	2023-12-25 23:38:40 +01:00
Nick Wellnhofer	130436917c	parser: Rename xmlErrParser to xmlCtxtErr	2023-12-21 15:02:24 +01:00
Nick Wellnhofer	54c70ed57f	parser: Improve error handling Introduce xmlCtxtSetErrorHandler allowing to set a structured error for a parser context. There already was the "serror" SAX handler but this always receives the parser context as argument. Start to use xmlRaiseMemoryError. Remove useless arguments from memory error functions. Rename xmlErrMemory to xmlCtxtErrMemory. Remove a few calls to xmlGenericError. Remove support for runtime entity debugging.	2023-12-21 02:46:27 +01:00
Nick Wellnhofer	e58ea29f17	SAX2: Report malloc failures Fix many places where malloc failures aren't reported. Improve error handling when parsing entity declarations. Fixes #308.	2023-12-11 22:13:05 +01:00
Nick Wellnhofer	7f00273cf0	parser: Fix invalid free in xmlParseBalancedChunkMemoryRecover Set the dictionary for newDoc in xmlParseBalancedChunkMemoryRecover. This is a long-standing bug which was masked by - xmlParseBalancedChunkMemoryRecover changing the document of the root node. This is a really bad idea, resulting in a mismatch between ctxt->myDoc and ctxt->node->doc. - SAX2.c preferring ctxt->node->doc over ctxt->myDoc until commit `a31e1b06`. Fixes #641.	2023-12-01 19:44:37 +01:00
Nick Wellnhofer	a31e1b0665	SAX2: Fix quadratic behavior in xmlSAX2AttributeNs The last missing piece to make parsing of attributes O(n).	2023-11-04 20:21:54 +01:00
Nick Wellnhofer	e0dd330b8f	parser: Use hash tables to avoid quadratic behavior Use a hash table to lookup namespaces by prefix. The hash table stores an index into the namespace table. Auxiliary data for namespaces is stored in a separate array along the main namespace table. Use a hash table to verify attribute uniqueness. The hash table stores an index into the attribute table. Reuse hash value from the dictionary to avoid computing them twice. See #346.	2023-09-29 12:43:22 +02:00
Nick Wellnhofer	da274bfa55	build: Fix build when certain modules are disabled	2023-09-21 02:26:43 +02:00
Nick Wellnhofer	9b5cce7a71	include: Remove more unnecessary includes	2023-09-21 01:50:53 +02:00
Nick Wellnhofer	699299cae3	globals: Stop including globals.h	2023-09-20 22:07:40 +02:00
Nick Wellnhofer	a77f9ab84c	globals: Don't include SAX2.h from globals.h	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	4e1c13ebfd	debug: Remove debugging code This is barely useful these days and only clutters the code base.	2023-09-19 17:35:09 +02:00
Nick Wellnhofer	cde4499778	SAX2: Allow multiple top-level elements When parsing with HTML_PARSE_NOIMPLIED, the result document can contain multiple top-level elements. Rework xmlSAX2StartElement to simply add the element as a child of ctxt->node or ctxt->myDoc. Don't invoke xmlAddSibling for non-element parents. The context node should always be an element node. Fixes #584.	2023-08-27 16:35:23 +02:00
Nick Wellnhofer	f1c1f5c6b4	parser: Revert change to doc->encoding Fixes #579.	2023-08-17 12:47:14 +02:00
Nick Wellnhofer	cb717d7e02	parser: Update line number after coalescing text nodes This should make the line number of text nodes deterministic. Before, it depended on the callback sequence which depends on the size of chunks fed to the parser.	2023-08-09 16:58:33 +02:00
Nick Wellnhofer	ec7be50662	parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser: - xmlDetectEncoding which skips over the BOM, allowing to remove the BOM checks from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.	2023-08-08 15:19:46 +02:00
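
Commit `e0dd330b8f` above describes verifying attribute uniqueness with a hash table that stores an index into the attribute table. The following is a minimal, standalone sketch of that idea, not libxml2's actual code: the names `AttrSet`, `hashName`, `attrSetInit` and `attrSetAdd`, the fixed sizes, and the FNV-1a hash are all assumptions made for illustration. The real parser reuses hash values from its dictionary instead of hashing each name again.

```c
#include <assert.h>
#include <string.h>

#define MAX_ATTRS   64      /* hypothetical cap, chosen for this sketch only */
#define TABLE_SIZE  128     /* power of two and larger than MAX_ATTRS */
#define EMPTY_SLOT  (-1)

/* Hypothetical container: an attribute table plus a hash index over it. */
typedef struct {
    const char *names[MAX_ATTRS];   /* attribute table, in document order */
    unsigned hashes[MAX_ATTRS];     /* cached hash value of each name */
    int nattrs;
    int slots[TABLE_SIZE];          /* index into names[], or EMPTY_SLOT */
} AttrSet;

/* FNV-1a string hash, an assumption of this sketch. */
static unsigned
hashName(const char *s) {
    unsigned h = 2166136261u;

    while (*s != '\0') {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return h;
}

static void
attrSetInit(AttrSet *set) {
    int i;

    /* The index mask below only works for power-of-two table sizes. */
    assert((TABLE_SIZE & (TABLE_SIZE - 1)) == 0);

    set->nattrs = 0;
    for (i = 0; i < TABLE_SIZE; i++)
        set->slots[i] = EMPTY_SLOT;
}

/*
 * Returns 1 if the name was added, 0 if it is a duplicate, and -1 if the
 * attribute table is full. Each lookup is expected O(1), so checking n
 * attributes costs O(n) overall instead of O(n^2) pairwise comparisons.
 */
static int
attrSetAdd(AttrSet *set, const char *name) {
    unsigned h = hashName(name);
    unsigned pos = h & (TABLE_SIZE - 1);

    /* Linear probing. An empty slot always exists since MAX_ATTRS < TABLE_SIZE. */
    while (set->slots[pos] != EMPTY_SLOT) {
        int idx = set->slots[pos];

        if (set->hashes[idx] == h && strcmp(set->names[idx], name) == 0)
            return 0;
        pos = (pos + 1) & (TABLE_SIZE - 1);
    }

    if (set->nattrs >= MAX_ATTRS)
        return -1;

    set->names[set->nattrs] = name;
    set->hashes[set->nattrs] = h;
    set->slots[pos] = set->nattrs++;
    return 1;
}
```

Storing an index rather than a copy of the attribute keeps the hash table small and leaves the attribute table itself in document order. After `attrSetInit`, a caller would invoke `attrSetAdd` once per parsed attribute name and report an error whenever it returns 0.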


181 Commits