libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-01-13 13:17:36 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	4fefba4cf6	parser: Rework handling of undeclared entities Throw an error if entity substitution was requested. Now we only downgrade to a warning if - XML_PARSE_DTDLOAD wasn't specified, and - entities aren't substituted or XML_PARSE_NO_XXE was specified. Should fix #724.	2024-05-15 17:58:48 +02:00
Nick Wellnhofer	fdc5ff3657	parser: Always throw entity errors if external DTD is loaded When parsing with XML_PARSE_DTDLOAD, missing entities are always an error. Also consolidate behavior when validating. See `b717abdd`.	2024-05-03 11:52:54 +02:00
Nick Wellnhofer	39e5b35bd0	parser: Don't create undeclared entity refs in substitution mode We never want to create entity reference nodes if entity substitution is enabled. This also applies to undeclared entities.	2024-05-03 11:46:01 +02:00
Nick Wellnhofer	45fe9924f0	parser: Don't create reference in xmlLookupGeneralEntity This should only be done in xmlParseReference. The handling of undeclared entities is still somewhat inconsistent. In element content we create references even if entity substitution is enabled. In attribute values undeclared entities are always ignored.	2024-04-23 18:36:15 +02:00
Nick Wellnhofer	b717abdd09	parser: Consolidate error handling for undeclared entities Always use XML_WAR_UNDECLARED_ENTITY with warning error level in documents with external subset or parameter entities. Use XML_ERR_UNDECLARED_ENTITY otherwise.	2024-04-23 18:36:15 +02:00
Nick Wellnhofer	f506ec6654	parser: Always decode entities in namespace URIs Also decode entities in namespace URIs if entity substitution wasn't requested. This should fix some corner cases when comparing namespace URIs. The Namespaces in XML 1.0 spec says: > In a namespace declaration, the URI reference is the normalized value > of the attribute, so replacement of XML character and entity > references has already been done before any comparison. Make the serialization code escape special characters in namespace URIs like in attribute values. This fixes serialization if entities were substituted when parsing. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106	2024-04-15 12:34:26 +02:00
Nick Wellnhofer	186562a182	parser: Fix detection of duplicate attributes in XML namespace Fixes a regression from commit `e0dd330b`, resulting in duplicate attributes in the predefined XML namespace not being detected or extraneous default attributes being passed. Fixes #704.	2024-03-12 20:02:52 +01:00
Nick Wellnhofer	37c6618be5	parser: Rework parsing of attribute and entity values Don't use a separate function to handle "complex" attributes. Validate UTF-8 byte sequences without decoding. This should improve performance considerably when parsing multi-byte UTF-8 sequences. Use a string buffer to avoid unnecessary allocations and copying when expanding entities. Normalize attribute values in a single pass while expanding entities. Be more lenient in recovery mode. If no entity substitution was requested, validate entities without expanding. Fixes #596. Also fixes #655.	2024-01-02 15:42:03 +01:00
Nick Wellnhofer	f0dc52d09c	parser: Move cleanup of element stacks to xmlParseContent	2024-01-02 14:17:27 +01:00
Nick Wellnhofer	f3fa34dcad	parser: Fix general entity parsing Clear namespace database. Ignore non-fatal errors.	2023-12-28 16:47:41 +01:00
Nick Wellnhofer	ecfbcc8a52	parser: Rework general entity parsing Don't create a new parser context but reuse the existing one. This exposes bug #601 in a more obvious way.	2023-12-25 23:38:40 +01:00
Nick Wellnhofer	7d446e9736	parser: Fix namespaces redefined from default attributes This regressed in commit `e0dd330b`. Also fixes a long-standing issue where namespaces from default attributes weren't added if they match an existing namespace. Fixes #643.	2023-12-08 12:19:16 +01:00
Nick Wellnhofer	a2b5c90a44	hash: Fix deletion of entries during scan Functions like xmlCleanSpecialAttr scan a hash table and possibly delete entries in the callback. xmlHashScanFull must detect such deletions and rescan the entry. This regressed when rewriting the hash table code in `4a513d56`. Fixes #626.	2023-11-21 15:28:59 +01:00
Nick Wellnhofer	7a2d412f68	parser: Copy default namespace in xmlParseBalancedChunkMemory	2023-10-31 20:19:27 +01:00
Nick Wellnhofer	e0c2f14d83	parser: Copy namespaces in xmlParseBalancedChunkMemory Reenable copying of namespaces but don't set SAX data. This should match the old behavior.	2023-10-31 14:04:57 +01:00
Nick Wellnhofer	6337a14a6b	tests: Handle entities in SAX tests	2023-10-06 12:28:59 +02:00
Nick Wellnhofer	9c63cea5a6	test: Add test for push parser boundaries	2022-11-20 21:27:59 +01:00
David Kilzer	03bb929390	Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8(). * encoding.c: (UTF16LEToUTF8): - Fix comment to describe what the code does. (UTF16BEToUTF8): - Fix undefined behavior which was applied to UTF16LEToUTF8() in `2f9382033e`. - Add bounds check to while() loop which was applied to UTF16LEToUTF8() in `be803967db`. - Do not return -2 when (in >= inend) to fix the bug. This was applied to UTF16LEToUTF8() in `496a1cf592`. - Inline (<< 8) statements to match UTF16LEToUTF8(). Add the following tests and results: test/text-4-byte-UTF-16-BE-offset.xml test/text-4-byte-UTF-16-BE.xml test/text-4-byte-UTF-16-LE-offset.xml test/text-4-byte-UTF-16-LE.xml	2022-01-16 14:07:17 +01:00
Nick Wellnhofer	01411e7c5e	Check for invalid redeclarations of predefined entities Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions. Note that some test cases declared <!ENTITY lt "<"> But the XML spec clearly states that this is illegal: > If the entities lt or amp are declared, they MUST be declared as > internal entities whose replacement text is a character reference to > the respective character (less-than sign or ampersand) being escaped; > the double escaping is REQUIRED for these entities so that references > to them produce a well-formed result. Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference. As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.	2021-02-08 21:51:26 +01:00
Nick Wellnhofer	eddfbc38fa	Don't load external entity from xmlSAX2GetEntity Despite the comment, I can't see a reason why external entities must be loaded in the SAX handler. For external entities, the handler is typically first invoked via xmlParseReference which will later load the entity on its own if it wasn't loaded yet. The old code also led to duplicated SAX events which makes it basically impossible to reuse xmlSAX2GetEntity for a custom SAX parser. See the change to the expected test output. Note that xmlSAX2GetEntity was loading the entity via xmlParseCtxtExternalEntity while xmlParseReference uses xmlParseExternalEntityPrivate. In the previous commit, the two functions were merged, trying to compensate for some slight differences between the two mostly identical implementations. But the more urgent reason for this change is that xmlParseReference has the facility to abort early when recursive entities are detected, avoiding what could practically amount to an infinite loop. If you want to backport this change, note that the previous three commits are required as well: `f9ea1a24` Fix copying of entities in xmlParseReference `5c7e0a9a` Copy some XMLReader option flags to parser context `1a3e584a` Merge code paths loading external entities Found by OSS-Fuzz.	2020-02-11 17:35:42 +01:00
Nick Wellnhofer	7218255092	Add test for ICU flush and pivot buffer	2017-11-04 15:38:58 +01:00
Nick Wellnhofer	dbaab1f369	Test SAX2 callbacks with entity substitution This detects regressions like bug 760367.	2017-06-16 21:38:57 +02:00
David Kilzer	4f8606c13c	Bug 760183: REGRESSION (v2.9.3): XML push parser fails with bogus UTF-8 encoding error when multi-byte character in large CDATA section is split across buffer <https://bugzilla.gnome.org/show_bug.cgi?id=760183> * parser.c: (xmlCheckCdataPush): Add 'complete' argument to describe whether the buffer passed in is the whole CDATA buffer, or if there is more data to parse. If there is more data to parse, don't return a negative value for an invalid multi-byte UTF-8 character that is split between buffers. (xmlParseTryOrFinish): Pass 'complete' argument to xmlCheckCdataPush() as appropriate. * result/cdata-2-byte-UTF-8.xml: Added. * result/cdata-2-byte-UTF-8.xml.rde: Added. * result/cdata-2-byte-UTF-8.xml.rdr: Added. * result/cdata-2-byte-UTF-8.xml.sax: Added. * result/cdata-2-byte-UTF-8.xml.sax2: Added. * result/cdata-3-byte-UTF-8.xml: Added. * result/cdata-3-byte-UTF-8.xml.rde: Added. * result/cdata-3-byte-UTF-8.xml.rdr: Added. * result/cdata-3-byte-UTF-8.xml.sax: Added. * result/cdata-3-byte-UTF-8.xml.sax2: Added. * result/cdata-4-byte-UTF-8.xml: Added. * result/cdata-4-byte-UTF-8.xml.rde: Added. * result/cdata-4-byte-UTF-8.xml.rdr: Added. * result/cdata-4-byte-UTF-8.xml.sax: Added. * result/cdata-4-byte-UTF-8.xml.sax2: Added. * result/noent/cdata-2-byte-UTF-8.xml: Added. * result/noent/cdata-3-byte-UTF-8.xml: Added. * result/noent/cdata-4-byte-UTF-8.xml: Added. * test/cdata-2-byte-UTF-8.xml: Added. * test/cdata-3-byte-UTF-8.xml: Added. * test/cdata-4-byte-UTF-8.xml: Added. - Add tests and results. Only 'make Readertests XMLPushtests' fails prior to the fix.	2016-04-08 10:18:52 +08:00
Daniel Veillard	df23f584fd	Adding example from bug 738805 to regression tests For https://bugzilla.gnome.org/show_bug.cgi?id=738805 Tortuous test case provided by pierre.labastie@neuf.fr	2014-10-23 13:52:47 +08:00
Daniel Veillard	dcc1950319	Fix a parsing bug on non-ascii element and CR/LF usage https://bugzilla.gnome.org/show_bug.cgi?id=698550 Somehow the behaviour of the internal parser routine changed slightly when encountering CR/LF, which led to a bug when parsing document with non-ascii Names	2013-05-22 22:56:45 +02:00
Daniel Veillard	a6c76a26ca	566012 part 2 fix regression tests and push mode * test/utf16bebom.xml: regression test showed that this test case was broken but previous behaviour would not detect it ! * parser.c: fix 566012 for the push mode of the parser, tricky ! * test/ebcdic_566012.xml result//ebcdic_566012.xml*: add the test to the regression suite	2009-08-26 14:37:00 +02:00
Daniel Veillard	283d50279d	587663 Incorrect Attribute-Value Normalization * parser.c: when replacing entities and that the entity is CDATA and reference entities then white space character in replacement text need to be replaced by 0x20 * result/noent/att10: correct the output of the associated regression test	2009-08-25 17:18:39 +02:00
Daniel Veillard	7f4547cdbd	preparing the release of 2.7.2 fix the Solaris portability issue * configure.in doc/* NEWS: preparing the release of 2.7.2 * dict.c: fix the Solaris portability issue * parser.c: additional cleanup on #554660 fix * test/ent13 result/ent13* result/noent/ent13: added the example in the regression test suite. HTMLparser.c: handle leading BOM in htmlParseElement() Daniel svn path=/trunk/; revision=3799	2008-10-03 07:58:23 +00:00
Daniel Veillard	97c9ce2e99	fix various attribute normalisation problems reported by Ashwin this * parser.c: fix various attribute normalisation problems reported by Ashwin * result/c14n/without-comments/example-4 result/c14n/with-comments/example-4: this impacted the result of two c14n tests :-\ * test/att9 test/att10 test/att11 result//att9* result//att10* result//att11*: added 3 specific regression tests coming from the XML spec revision and from Ashwin Daniel svn path=/trunk/; revision=3715	2008-03-25 16:52:41 +00:00
Daniel Veillard	d0d2f090dc	fix handling of empty CDATA nodes as reported and discussed around #514181 * xmlsave.c parser.c: fix handling of empty CDATA nodes as reported and discussed around #514181 and associated patches * test/emptycdata.xml result/emptycdata.xml* result/noent/emptycdata.xml: added a specific test in the regression suite. Daniel svn path=/trunk/; revision=3701	2008-03-07 16:50:21 +00:00
Daniel Veillard	dfac946c3d	fixed the push mode when a big comment occurs before an internal subset, * parser.c: fixed the push mode when a big comment occurs before an internal subset, should close bug #438835 * test/comment6.xml result//comment6.xml*: added a special test in the regression suite Daniel svn path=/trunk/; revision=3635	2007-06-12 14:44:32 +00:00
Daniel Veillard	166e1a9b59	Adding extra test files, just in case ... Daniel	2006-10-10 20:12:24 +00:00
Kasimier T. Buchcik	7b4e2e20fd	Removed the automatic generation of CDATA sections for the content of the * xmlsave.c: Removed the automatic generation of CDATA sections for the content of the "script" and "style" elements when serializing XHTML. The issue was reported by Vincent Lefevre, bug #345147. * result/xhtml1 result/noent/xhtml1: Adjusted regression test results due to the serialization change described above.	2006-07-13 13:07:11 +00:00
Daniel Veillard	6974feb0cf	fixed the comment streaming bug raised by Graham Bennett added to the * parser.c: fixed the comment streaming bug raised by Graham Bennett * test/badcomment.xml result//badcomment.xml*: added to the regression suite. Daniel	2006-02-05 02:43:36 +00:00
Daniel Veillard	a617e24f32	reverted first patches for #319279 which led to #326295 and fixed the * parser.c: reverted first patches for #319279 which led to #326295 and fixed the problem in xmlParseChunk() instead * test/ent11 result//ent11*: added test for #326295 to the regression suite Daniel	2006-01-09 14:38:44 +00:00
Daniel Veillard	6977c6c437	fix bug #324432 with <xml:foo/> added to the regression tests Daniel * SAX2.c: fix bug #324432 with <xml:foo/> * test/ns7 result//ns7*: added to the regression tests Daniel	2006-01-04 14:03:10 +00:00
Daniel Veillard	dbd6105321	applied second patch from David Madore to be less intrusive when handling * xmlsave.c: applied second patch from David Madore to be less intrusive when handling scripts and style elements in XHTML1 should fix #316041 * test/xhtml1 result//xhtml1\*: updated the test accordingly Daniel	2005-09-12 14:03:26 +00:00
Daniel Veillard	abac41e829	fixing bug #166777 (and #169838), it was a heuristic in areBlanks which * parser.c: fixing bug #166777 (and #169838), it was a heuristic in areBlanks which failed. * result/winblanks.xml* result/noent/winblanks.xml test/winblanks.xml: added the input file to the regression tests Daniel	2005-07-06 15:17:38 +00:00
Daniel Veillard	365cf67ff8	applied patch from Malcolm Rowe to avoid namespace troubles on rollback * parser.c: applied patch from Malcolm Rowe to avoid namespace troubles on rollback parsing of elements start #304761 * test/nsclean.xml result/noent/nsclean.xml result/nsclean.xml*: added it to the regression tests. Daniel	2005-06-09 08:18:24 +00:00
Daniel Veillard	8f8a9dd7f1	found and fixed 2 problems in the internal subset scanning code affecting * parser.c: found and fixed 2 problems in the internal subset scanning code affecting the push parser (and the reader), fixes #165126 * test/intsubset2.xml result//intsubset2.xml*: added the test case to the regression tests. Daniel	2005-01-25 21:41:42 +00:00
Daniel Veillard	4c778d8b96	boosting common comment parsing code, it was really slow. added specific * parser.c: boosting common comment parsing code, it was really slow. * test/comment[3-5].xml result//comment[3-5].xml*: added specific regression tests Daniel	2005-01-23 17:37:44 +00:00
Daniel Veillard	48df9613ba	fixed namespace bug in push mode reported by Rob Richards added it to the * parser.c: fixed namespace bug in push mode reported by Rob Richards * test/ns6 result//ns6: added it to the regression tests xmlmodule.c testModule.c include/libxml/xmlmodule.h: added an extra option argument to module opening and defined a couple of flags to the API. Daniel	2005-01-04 21:50:05 +00:00
Daniel Veillard	370ba3d231	fixed the leak reported by Volker Roth on the list added a specific test * parser.c: fixed the leak reported by Volker Roth on the list * test/ent10 result//ent10*: added a specific test for the problem Daniel	2004-10-25 16:23:56 +00:00
Daniel Veillard	f34a20e69d	"" is a valid hexbinary string dixit xmlschema-dev update the test. added * xmlschemastypes.c: "" is a valid hexbinary string dixit xmlschema-dev * result/schemas/hexbinary_0_1.err test/schemas/hexbinary_1.xml: update the test. * test/ns5 result//ns5: added a test for the namespace bug fixed in previous commit. Makefile.am: added a message in the regression tests Daniel	2004-08-31 08:42:17 +00:00
Daniel Veillard	0df3bc3f28	fixed a serious problem when substituting entities using the Reader, the * parser.c xmlreader.c include/libxml/parser.h: fixed a serious problem when substituting entities using the Reader, the entities content might be freed and if rereferenced would crash * Makefile.am test/* result/*: added a new test case and a new test operation for the reader with substitution of entities. Daniel	2004-06-08 12:03:41 +00:00
Daniel Veillard	f0244cea96	apply fix for XHTML1 formatting from Nick Wellnhofer fixes bug #141266 * xmlsave.c: apply fix for XHTML1 formatting from Nick Wellnhofer fixes bug #141266 * test/xhtmlcomp result//xhtmlcomp*: added the specific regression test Daniel	2004-05-09 23:48:39 +00:00
Daniel Veillard	d3999c7ac6	fix bug reported by Holger Rauch added the test to the regression suite * parser.c: fix bug reported by Holger Rauch * test/att8 result/noent/att8 result/att8 result/att8.rdr result/att8.sax: added the test to the regression suite Daniel	2004-03-10 16:27:03 +00:00
Daniel Veillard	cb35f01d94	xmlAttrSerializeTxtContent don't segfault if NULL is passed. adding an old * tree.c: xmlAttrSerializeTxtContent don't segfault if NULL is passed. * test/att7 result//att7*: adding an old regression test lying around on my laptop Daniel	2004-02-20 08:18:58 +00:00
Daniel Veillard	b37440047e	fixed a problem in push mode when attribute contains unescaped '>' * parser.c: fixed a problem in push mode when attribute contains unescaped '>' characters, fixes bug #134566 * test/att6 result//att6*: added the test to the regression suite Daniel	2004-02-18 14:28:22 +00:00
Daniel Veillard	036143bb53	fixed bug #132575 about finding the end of the internal subset in push * parser.c: fixed bug #132575 about finding the end of the internal subset in push mode. * test/intsubset.xml result/intsubset.xml* result/noent/intsubset.xml: added the test to the regression suite Daniel	2004-02-12 11:57:52 +00:00
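Several commits above (`03bb929390`, `4f8606c13c`) fix the same class of push-parser bug: a multi-byte character cut in half at a chunk boundary was reported as an encoding error instead of being held back until the next chunk arrived. The sketch below illustrates that rule for UTF-16BE under stated assumptions. `utf16be_to_utf8` is a hypothetical standalone function, not libxml2's `UTF16BEToUTF8()`, and its signature and return convention are illustrative only: it converts every complete code unit, stops before a high surrogate whose low half is missing, and reports how many input bytes it consumed.

```c
#include <stddef.h>

/* Hypothetical sketch, not libxml2's UTF16BEToUTF8().
 * On entry *inlen is the number of input bytes available; on success it is
 * set to the number of bytes consumed. A trailing odd byte, or a high
 * surrogate whose low half is not yet in the buffer, is left unconsumed
 * rather than treated as an error, so the caller can retry with more data.
 * Returns the number of UTF-8 bytes written, or -1 on an unpaired
 * surrogate or if `out` is too small (*inlen is then left unchanged). */
int utf16be_to_utf8(unsigned char *out, size_t outlen,
                    const unsigned char *in, size_t *inlen)
{
    size_t i = 0, o = 0, n = *inlen;

    while (i + 1 < n) {
        unsigned long c = ((unsigned long)in[i] << 8) | in[i + 1];
        size_t used = 2, len;

        if (c >= 0xD800 && c <= 0xDBFF) {       /* high surrogate */
            unsigned long d;
            if (i + 3 >= n)
                break;                          /* pair split across chunks: wait */
            d = ((unsigned long)in[i + 2] << 8) | in[i + 3];
            if (d < 0xDC00 || d > 0xDFFF)
                return -1;                      /* high surrogate without low half */
            c = 0x10000 + ((c - 0xD800) << 10) + (d - 0xDC00);
            used = 4;
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return -1;                          /* stray low surrogate */
        }

        /* Encode code point c as 1 to 4 UTF-8 bytes. */
        len = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
        if (o + len > outlen)
            return -1;
        if (len == 1) {
            out[o++] = (unsigned char)c;
        } else if (len == 2) {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (len == 3) {
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (c >> 18));
            out[o++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        i += used;
    }
    *inlen = i;
    return (int)o;
}
```

For example, given the bytes `00 41 D8 3D` (an "A" followed by the first half of U+1F600), the sketch writes one byte and consumes two, leaving `D8 3D` for the caller to prepend to the next chunk. Returning the consumed byte count, rather than an error, is what lets a push parser carry the incomplete tail forward.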


82 Commits