1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-01-13 13:17:36 +03:00
Commit Graph

82 Commits

Author SHA1 Message Date
Nick Wellnhofer
4fefba4cf6 parser: Rework handling of undeclared entities
Throw an error if entity substitution was requested.

Now we only downgrade to a warning if

- XML_PARSE_DTDLOAD wasn't specified, and
- entity aren't substituted or XML_PARSE_NO_XXE was specified.

Should fix #724.
2024-05-15 17:58:48 +02:00
Nick Wellnhofer
fdc5ff3657 parser: Always throw entity errors if external DTD is loaded
When parsing with XML_PARSE_DTDLOAD, missing entities are always an
error.

Also consolidate behavior when validating. See b717abdd.
2024-05-03 11:52:54 +02:00
Nick Wellnhofer
39e5b35bd0 parser: Don't create undeclared entity refs in substitution mode
We never want to create entity reference nodes if entity substitution
is enabled. This also applies to undeclared entities.
2024-05-03 11:46:01 +02:00
Nick Wellnhofer
45fe9924f0 parser: Don't create reference in xmlLookupGeneralEntity
This should only be done in xmlParseReference.

The handling of undeclared entities is still somewhat inconsistent. In
element content we create references even if entity substitution is
enabled. In attribute values undeclared entities are always ignored.
2024-04-23 18:36:15 +02:00
Nick Wellnhofer
b717abdd09 parser: Consolidate error handling for undeclared entities
Always use XML_WAR_UNDECLARED_ENTITY with warning error level in
documents with external subset or parameter entities. Use
XML_ERR_UNDECLARED_ENTITY otherwise.
2024-04-23 18:36:15 +02:00
Nick Wellnhofer
f506ec6654 parser: Always decode entities in namespace URIs
Also decode entities in namespace URIs if entity substitution wasn't
requested. This should fix some corner cases when comparing namespace
URIs. The Namespaces in XML 1.0 spec says:

> In a namespace declaration, the URI reference is the normalized value
> of the attribute, so replacement of XML character and entity
> references has already been done before any comparison.

Make the serialization code escape special characters in namespace URIs
like in attribute values. This fixes serialization if entities were
substituted when parsing.

Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
2024-04-15 12:34:26 +02:00
Nick Wellnhofer
186562a182 parser: Fix detection of duplicate attributes in XML namespace
Fixes a regression from commit e0dd330b, resulting in duplicate
attributes in the predefined XML namespace not being detected or
extraneous default attributes being passed.

Fixes #704.
2024-03-12 20:02:52 +01:00
Nick Wellnhofer
37c6618be5 parser: Rework parsing of attribute and entity values
Don't use a separate function to handle "complex" attributes. Validate
UTF-8 byte sequences without decoding. This should improve performance
considerably when parsing multi-byte UTF-8 sequences.

Use a string buffer to avoid unnecessary allocations and copying when
expanding entities.

Normalize attribute values in a single pass while expanding entities.

Be more lenient in recovery mode.

If no entity substitution was requested, validate entities without
expanding. Fixes #596.

Also fixes #655.
2024-01-02 15:42:03 +01:00
Nick Wellnhofer
f0dc52d09c parser: Move cleanup of element stacks to xmlParseContent 2024-01-02 14:17:27 +01:00
Nick Wellnhofer
f3fa34dcad parser: Fix general entity parsing
Clear namespace database.

Ignore non-fatal errors.
2023-12-28 16:47:41 +01:00
Nick Wellnhofer
ecfbcc8a52 parser: Rework general entity parsing
Don't create a new parser context but reuse the existing one.

This exposes bug #601 in a more obvious way.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
7d446e9736 parser: Fix namespaces redefined from default attributes
This regressed in commit e0dd330b.

Also fixes a long-standing issue where namespaces from default
attributes weren't added if they match an existing namespace.

Fixes #643.
2023-12-08 12:19:16 +01:00
Nick Wellnhofer
a2b5c90a44 hash: Fix deletion of entries during scan
Functions like xmlCleanSpecialAttr scan a hash table and possibly delete
entries in the callback. xmlHashScanFull must detect such deletions and
rescan the entry.

This regressed when rewriting the hash table code in 4a513d56.

Fixes #626.
2023-11-21 15:28:59 +01:00
Nick Wellnhofer
7a2d412f68 parser: Copy default namespace in xmlParseBalancedChunkMemory 2023-10-31 20:19:27 +01:00
Nick Wellnhofer
e0c2f14d83 parser: Copy namespaces in xmlParseBalancedChunkMemory
Reenable copying of namespaces but don't set SAX data. This should
match the old behavior.
2023-10-31 14:04:57 +01:00
Nick Wellnhofer
6337a14a6b tests: Handle entities in SAX tests 2023-10-06 12:28:59 +02:00
Nick Wellnhofer
9c63cea5a6 test: Add test for push parser boundaries 2022-11-20 21:27:59 +01:00
David Kilzer
03bb929390 Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk
This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8().

* encoding.c:
(UTF16LEToUTF8):
- Fix comment to describe what the code does.
(UTF16BEToUTF8):
- Fix undefined behavior which was applied to UTF16LEToUTF8() in
  2f9382033e.
- Add bounds check to while() loop which was applied to
  UTF16LEToUTF8() in be803967db.
- Do not return -2 when (in >= inend) to fix the bug.  This was
  applied to UTF16LEToUTF8() in 496a1cf592.
- Inline (<< 8) statements to match UTF16LEToUTF8().

Add the following tests and results:

  test/text-4-byte-UTF-16-BE-offset.xml
  test/text-4-byte-UTF-16-BE.xml
  test/text-4-byte-UTF-16-LE-offset.xml
  test/text-4-byte-UTF-16-LE.xml
2022-01-16 14:07:17 +01:00
Nick Wellnhofer
01411e7c5e Check for invalid redeclarations of predefined entities
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.

Note that some test cases declared

    <!ENTITY lt "<">

But the XML spec clearly states that this is illegal:

> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.

Also fixes #217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.

As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
2021-02-08 21:51:26 +01:00
Nick Wellnhofer
eddfbc38fa Don't load external entity from xmlSAX2GetEntity
Despite the comment, I can't see a reason why external entities must be
loaded in the SAX handler. For external entities, the handler is
typically first invoked via xmlParseReference which will later load the
entity on its own if it wasn't loaded yet.

The old code also lead to duplicated SAX events which makes it
basically impossible to reuse xmlSAX2GetEntity for a custom SAX parser.
See the change to the expected test output.

Note that xmlSAX2GetEntity was loading the entity via
xmlParseCtxtExternalEntity while xmlParseReference uses
xmlParseExternalEntityPrivate. In the previous commit, the two
functions were merged, trying to compensate for some slight differences
between the two mostly identical implementations.

But the more urgent reason for this change is that xmlParseReference
has the facility to abort early when recursive entities are detected,
avoiding what could practically amount to an infinite loop.

If you want to backport this change, note that the previous three
commits are required as well:

f9ea1a24 Fix copying of entities in xmlParseReference
5c7e0a9a Copy some XMLReader option flags to parser context
1a3e584a Merge code paths loading external entities

Found by OSS-Fuzz.
2020-02-11 17:35:42 +01:00
Nick Wellnhofer
7218255092 Add test for ICU flush and pivot buffer 2017-11-04 15:38:58 +01:00
Nick Wellnhofer
dbaab1f369 Test SAX2 callbacks with entity substitution
This detects regressions like bug 760367.
2017-06-16 21:38:57 +02:00
David Kilzer
4f8606c13c Bug 760183: REGRESSION (v2.9.3): XML push parser fails with bogus UTF-8 encoding error when multi-byte character in large CDATA section is split across buffer <https://bugzilla.gnome.org/show_bug.cgi?id=760183>
* parser.c:
(xmlCheckCdataPush): Add 'complete' argument to describe whether
the buffer passed in is the whole CDATA buffer, or if there is
more data to parse.  If there is more data to parse, don't
return a negative value for an invalid multi-byte UTF-8
character that is split between buffers.
(xmlParseTryOrFinish): Pass 'complete' argument to
xmlCheckCdataPush() as appropriate.

* result/cdata-2-byte-UTF-8.xml: Added.
* result/cdata-2-byte-UTF-8.xml.rde: Added.
* result/cdata-2-byte-UTF-8.xml.rdr: Added.
* result/cdata-2-byte-UTF-8.xml.sax: Added.
* result/cdata-2-byte-UTF-8.xml.sax2: Added.
* result/cdata-3-byte-UTF-8.xml: Added.
* result/cdata-3-byte-UTF-8.xml.rde: Added.
* result/cdata-3-byte-UTF-8.xml.rdr: Added.
* result/cdata-3-byte-UTF-8.xml.sax: Added.
* result/cdata-3-byte-UTF-8.xml.sax2: Added.
* result/cdata-4-byte-UTF-8.xml: Added.
* result/cdata-4-byte-UTF-8.xml.rde: Added.
* result/cdata-4-byte-UTF-8.xml.rdr: Added.
* result/cdata-4-byte-UTF-8.xml.sax: Added.
* result/cdata-4-byte-UTF-8.xml.sax2: Added.
* result/noent/cdata-2-byte-UTF-8.xml: Added.
* result/noent/cdata-3-byte-UTF-8.xml: Added.
* result/noent/cdata-4-byte-UTF-8.xml: Added.
* test/cdata-2-byte-UTF-8.xml: Added.
* test/cdata-3-byte-UTF-8.xml: Added.
* test/cdata-4-byte-UTF-8.xml: Added.
- Add tests and results.  Only 'make Readertests XMLPushtests'
  fails prior to the fix.
2016-04-08 10:18:52 +08:00
Daniel Veillard
df23f584fd Adding example from bugs 738805 to regression tests
For https://bugzilla.gnome.org/show_bug.cgi?id=738805

Tortuous test case provided by pierre.labastie@neuf.fr
2014-10-23 13:52:47 +08:00
Daniel Veillard
dcc1950319 Fix a parsing bug on non-ascii element and CR/LF usage
https://bugzilla.gnome.org/show_bug.cgi?id=698550

Somehow the behaviour of the internal parser routine changed
slightly when encountering CR/LF, which led to a bug when
parsing document with non-ascii Names
2013-05-22 22:56:45 +02:00
Daniel Veillard
a6c76a26ca 566012 part 2 fix regresion tests and push mode
* test/utf16bebom.xml: regression test showed that this test case was
  broken but previous behaviour would not detect it !
* parser.c: fix 566012 for the push mode of the parser, tricky !
* test/ebcdic_566012.xml result//ebcdic_566012.xml*: add the test to the
  regression suite
2009-08-26 14:37:00 +02:00
Daniel Veillard
283d50279d 587663 Incorrect Attribute-Value Normalization
* parser.c: when replacing entities and that the entity is CDATA and
  reference entities then white space character in replacement text
  need to be replaced by 0x20
* result/noent/att10: correct the output of the associated regression
  test
2009-08-25 17:18:39 +02:00
Daniel Veillard
7f4547cdbd preparing the release of 2.7.2 fix the Solaris portability issue
* configure.in doc/* NEWS: preparing the release of 2.7.2
* dict.c: fix the Solaris portability issue
* parser.c: additional cleanup on #554660 fix
* test/ent13 result/ent13* result/noent/ent13*: added the
  example in the regression test suite.
* HTMLparser.c: handle leading BOM in htmlParseElement()
Daniel

svn path=/trunk/; revision=3799
2008-10-03 07:58:23 +00:00
Daniel Veillard
97c9ce2e99 fix various attribute normalisation problems reported by Ashwin this
* parser.c: fix various attribute normalisation problems reported
  by Ashwin
* result/c14n/without-comments/example-4
  result/c14n/with-comments/example-4: this impacted the result of
  two c14n tests :-\
* test/att9 test/att10 test/att11 result//att9* result//att10*
  result//att11*: added 3 specific regression tests coming from the
  XML spec revision and from Ashwin
Daniel

svn path=/trunk/; revision=3715
2008-03-25 16:52:41 +00:00
Daniel Veillard
d0d2f090dc fix handling of empty CDATA nodes as reported and discussed around #514181
* xmlsave.c parser.c: fix handling of empty CDATA nodes as 
  reported and discussed around #514181 and associated patches
* test/emptycdata.xml result/emptycdata.xml* 
  result/noent/emptycdata.xml: added a specific test in the
  regression suite.
Daniel

svn path=/trunk/; revision=3701
2008-03-07 16:50:21 +00:00
Daniel Veillard
dfac946c3d fixed the push mode when a big comment occurs before an internal subset,
* parser.c: fixed the push mode when a big comment occurs before
  an internal subset, should close bug #438835
* test/comment6.xml result//comment6.xml*: added a special
  test in the regression suite
Daniel

svn path=/trunk/; revision=3635
2007-06-12 14:44:32 +00:00
Daniel Veillard
166e1a9b59 Adding extra test files, just in case ... Daniel 2006-10-10 20:12:24 +00:00
Kasimier T. Buchcik
7b4e2e20fd Removed the automatic generation of CDATA sections for the content of the
* xmlsave.c: Removed the automatic generation of CDATA sections
  for the content of the "script" and "style" elements when
  serializing XHTML. The issue was reported by Vincent Lefevre,
  bug #345147.
* result/xhtml1 result/noent/xhtml1: Adjusted regression test
  results due to the serialization change described above.
2006-07-13 13:07:11 +00:00
Daniel Veillard
6974feb0cf fixed the comment streaming bug raised by Graham Bennett added to the
* parser.c: fixed the comment streaming bug raised by Graham Bennett
* test/badcomment.xml result//badcomment.xml*: added to the regression suite.
Daniel
2006-02-05 02:43:36 +00:00
Daniel Veillard
a617e24f32 reverted first patches for #319279 which led to #326295 and fixed the
* parser.c: reverted first patches for #319279 which led to #326295
  and fixed the problem in xmlParseChunk() instead
* test/ent11 result//ent11*: added test for #326295 to the regression
  suite
Daniel
2006-01-09 14:38:44 +00:00
Daniel Veillard
6977c6c437 fix bug #324432 with <xml:foo/> added to the regression tests Daniel
* SAX2.c: fix bug #324432 with <xml:foo/>
* test/ns7 resul//ns7*: added to the regression tests
Daniel
2006-01-04 14:03:10 +00:00
Daniel Veillard
dbd6105321 applied second patch from David Madore to be less intrusive when handling
* xmlsave.c: applied second patch from David Madore to be less intrusive
  when handling scripts and style elements in XHTML1 should fix #316041
* test/xhtml1 result//xhtml1\*: updated the test accordingly
Daniel
2005-09-12 14:03:26 +00:00
Daniel Veillard
abac41e829 fixing bug #166777 (and #169838), it was an heuristic in areBlanks which
* parser.c: fixing bug #166777 (and #169838), it was an heuristic
  in areBlanks which failed.
* result/winblanks.xml* result/noent/winblanks.xml test/winblanks.xml:
  added the input file to the regression tests
Daniel
2005-07-06 15:17:38 +00:00
Daniel Veillard
365cf67ff8 applied patch from Malcolm Rowe to avoid namespace troubles on rollback
* parser.c: applied patch from Malcolm Rowe to avoid namespace
  troubles on rollback parsing of elements start #304761
* test/nsclean.xml result/noent/nsclean.xml result/nsclean.xml*:
  added it to the regression tests.
Daniel
2005-06-09 08:18:24 +00:00
Daniel Veillard
8f8a9dd7f1 found and fixed 2 problems in the internal subset scanning code affecting
* parser.c: found and fixed 2 problems in the internal subset scanning
  code affecting the push parser (and the reader), fixes #165126
* test/intsubset2.xml result//intsubset2.xml*: added the test case
  to the regression tests.
Daniel
2005-01-25 21:41:42 +00:00
Daniel Veillard
4c778d8b96 boosting common commnent parsing code, it was really slow. added sprecific
* parser.c: boosting common commnent parsing code, it was really
  slow.
* test/comment[3-5].xml result//comment[3-5].xml*: added sprecific
  regression tests
Daniel
2005-01-23 17:37:44 +00:00
Daniel Veillard
48df9613ba fixed namespace bug in push mode reported by Rob Richards added it to the
* parser.c: fixed namespace bug in push mode reported by
  Rob Richards
* test/ns6 result//ns6*: added it to the regression tests
* xmlmodule.c testModule.c include/libxml/xmlmodule.h:
  added an extra option argument to module opening and defined
  a couple of flags to the API.
Daniel
2005-01-04 21:50:05 +00:00
Daniel Veillard
370ba3d231 fixed the leak reported by Volker Roth on the list added a specific test
* parser.c: fixed the leak reported by Volker Roth on the list
* test/ent10 result//ent10*: added a specific test for the problem
Daniel
2004-10-25 16:23:56 +00:00
Daniel Veillard
f34a20e69d "" is a valid hexbinary string dixit xmlschema-dev update the test. added
* xmlschemastypes.c: "" is a valid hexbinary string dixit xmlschema-dev
* result/schemas/hexbinary_0_1.err test/schemas/hexbinary_1.xml:
  update the test.
* test/ns5 result//ns5*: added a test for the namespace bug fixed
  in previous commit.
* Makefile.am: added a message in the regression tests
Daniel
2004-08-31 08:42:17 +00:00
Daniel Veillard
0df3bc3f28 fixed a serious problem when substituing entities using the Reader, the
* parser.c xmlreader.c include/libxml/parser.h: fixed a serious
  problem when substituing entities using the Reader, the entities
  content might be freed and if rereferenced would crash
* Makefile.am test/* result/*: added a new test case and a new
  test operation for the reader with substitution of entities.
Daniel
2004-06-08 12:03:41 +00:00
Daniel Veillard
f0244cea96 apply fix for XHTML1 formating from Nick Wellnhofer fixes bug #141266
* xmlsave.c: apply fix for XHTML1 formating from Nick Wellnhofer
  fixes bug #141266
* test/xhtmlcomp result//xhtmlcomp*: added the specific regression
  test
Daniel
2004-05-09 23:48:39 +00:00
Daniel Veillard
d3999c7ac6 fix bug reported by Holger Rauch added the test to th regression suite
* parser.c: fix bug reported by Holger Rauch
* test/att8 result/noent/att8 result/att8 result/att8.rdr
  result/att8.sax: added the test to th regression suite
Daniel
2004-03-10 16:27:03 +00:00
Daniel Veillard
cb35f01d94 xmlAttrSerializeTxtContent don't segfault if NULL is passed. adding an old
* tree.c: xmlAttrSerializeTxtContent don't segfault if NULL
  is passed.
* test/att7 result//att7*: adding an old regression test
  laying around on my laptop
Daniel
2004-02-20 08:18:58 +00:00
Daniel Veillard
b37440047e fixed a problem in push mode when attribute contains unescaped '>'
* parser.c: fixed a problem in push mode when attribute contains
  unescaped '>' characters, fixes bug #134566
* test/att6 result//att6*: added the test to the regression suite
Daniel
2004-02-18 14:28:22 +00:00
Daniel Veillard
036143bb53 fixed bug #132575 about finding the end of the internal subset in push
* parser.c: fixed bug #132575 about finding the end of the
  internal subset in push mode.
* test/intsubset.xml result/intsubset.xml* result/noent/intsubset.xml:
  added the test to the regression suite
Daniel
2004-02-12 11:57:52 +00:00