mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-26 20:25:14 +03:00

65 Commits

Author SHA1 Message Date
David Kilzer
03bb929390 Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk
This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8().

* encoding.c:
(UTF16LEToUTF8):
- Fix comment to describe what the code does.
(UTF16BEToUTF8):
- Fix undefined behavior which was applied to UTF16LEToUTF8() in
  2f9382033e.
- Add bounds check to while() loop which was applied to
  UTF16LEToUTF8() in be803967db.
- Do not return -2 when (in >= inend) to fix the bug.  This was
  applied to UTF16LEToUTF8() in 496a1cf592.
- Inline (<< 8) statements to match UTF16LEToUTF8().

Add the following tests and results:

  test/text-4-byte-UTF-16-BE-offset.xml
  test/text-4-byte-UTF-16-BE.xml
  test/text-4-byte-UTF-16-LE-offset.xml
  test/text-4-byte-UTF-16-LE.xml
2022-01-16 14:07:17 +01:00
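The split-character case fixed above can be illustrated outside libxml2. The following is a minimal sketch using Python's standard incremental decoder, not the `UTF16BEToUTF8()` code itself; the helper name `decode_chunks` is hypothetical. It shows the behaviour a chunked converter needs: when a chunk ends inside a surrogate pair, the leftover bytes are carried into the next call instead of being reported as an error.

```python
import codecs


def decode_chunks(chunks, encoding="utf-16-be"):
    """Decode an iterable of byte chunks, carrying partial characters over."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True raises if the input ends in the middle of a character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# U+1F600 needs a surrogate pair (4 bytes) in UTF-16.
text = "a\U0001F600b"
data = text.encode("utf-16-be")  # 2 + 4 + 2 = 8 bytes

# Cut at every byte offset, including the three offsets inside the pair.
for cut in range(len(data) + 1):
    assert decode_chunks([data[:cut], data[cut:]]) == text
```

The loop covers the "offset" variants of the added test files in spirit: the same document must decode identically wherever the chunk boundary falls.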
Nick Wellnhofer
01411e7c5e Check for invalid redeclarations of predefined entities
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.

Note that some test cases declared

    <!ENTITY lt "<">

But the XML spec clearly states that this is illegal:

> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.

Also fixes #217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.

As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
2021-02-08 21:51:26 +01:00
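The rule quoted from section 4.6 can be sketched as a small check. This is an illustration in Python, not the check added to libxml2; the names `PREDEFINED`, `char_ref_value` and `redeclaration_ok` are hypothetical, and the function assumes it receives the entity's replacement text, i.e. the literal after its character references have been expanded once.

```python
import re

# Code points of the five predefined entities (XML 1.0, section 4.6).
PREDEFINED = {"lt": 0x3C, "gt": 0x3E, "amp": 0x26, "apos": 0x27, "quot": 0x22}

_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def char_ref_value(text):
    """Return the code point if text is exactly one character reference."""
    m = _CHAR_REF.fullmatch(text)
    if m is None:
        return None
    return int(m.group(1), 16) if m.group(1) else int(m.group(2))


def redeclaration_ok(name, replacement_text):
    """Check a redeclaration against the predefined definition.

    replacement_text is e.g. '&#60;' for <!ENTITY lt "&#38;#60;">.
    """
    if name not in PREDEFINED:
        return True  # not a predefined entity, nothing to check here
    code = PREDEFINED[name]
    if char_ref_value(replacement_text) == code:
        return True
    # Only gt, apos and quot may also be declared as the bare character;
    # for lt and amp the double escaping is required.
    return name not in ("lt", "amp") and replacement_text == chr(code)
```

With this sketch, `redeclaration_ok("lt", "<")` is false, which corresponds to the illegal declaration shown in the commit message, while `redeclaration_ok("lt", "&#60;")` is true.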
Nick Wellnhofer
eddfbc38fa Don't load external entity from xmlSAX2GetEntity
Despite the comment, I can't see a reason why external entities must be
loaded in the SAX handler. For external entities, the handler is
typically first invoked via xmlParseReference which will later load the
entity on its own if it wasn't loaded yet.

The old code also led to duplicated SAX events, which makes it
basically impossible to reuse xmlSAX2GetEntity for a custom SAX parser.
See the change to the expected test output.

Note that xmlSAX2GetEntity was loading the entity via
xmlParseCtxtExternalEntity while xmlParseReference uses
xmlParseExternalEntityPrivate. In the previous commit, the two
functions were merged, trying to compensate for some slight differences
between the two mostly identical implementations.

But the more urgent reason for this change is that xmlParseReference
has the facility to abort early when recursive entities are detected,
avoiding what could practically amount to an infinite loop.

If you want to backport this change, note that the previous three
commits are required as well:

f9ea1a24 Fix copying of entities in xmlParseReference
5c7e0a9a Copy some XMLReader option flags to parser context
1a3e584a Merge code paths loading external entities

Found by OSS-Fuzz.
2020-02-11 17:35:42 +01:00
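Why early detection of recursive entities matters can be shown with a toy expander. This is a sketch in Python under simplifying assumptions (general entities only, a plain dictionary of replacement texts); `expand` is a hypothetical helper and does not mirror how xmlParseReference tracks recursion.

```python
import re

_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expand(text, entities, active=()):
    """Expand entity references, failing fast on a reference loop.

    active holds the names currently being expanded, outermost first.
    """
    def repl(match):
        name = match.group(1)
        if name in active:
            chain = " -> ".join(active + (name,))
            raise ValueError("entity reference loop: " + chain)
        return expand(entities[name], entities, active + (name,))

    return _REF.sub(repl, text)
```

For example, `expand("&a;", {"a": "x&b;", "b": "y"})` yields `"xy"`, while a pair of entities that reference each other raises `ValueError` instead of recursing without end.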
Nick Wellnhofer
7218255092 Add test for ICU flush and pivot buffer 2017-11-04 15:38:58 +01:00
Nick Wellnhofer
dbaab1f369 Test SAX2 callbacks with entity substitution
This detects regressions like bug 760367.
2017-06-16 21:38:57 +02:00
David Kilzer
4f8606c13c Bug 760183: REGRESSION (v2.9.3): XML push parser fails with bogus UTF-8 encoding error when multi-byte character in large CDATA section is split across buffer <https://bugzilla.gnome.org/show_bug.cgi?id=760183>
* parser.c:
(xmlCheckCdataPush): Add 'complete' argument to describe whether
the buffer passed in is the whole CDATA buffer, or if there is
more data to parse.  If there is more data to parse, don't
return a negative value for an invalid multi-byte UTF-8
character that is split between buffers.
(xmlParseTryOrFinish): Pass 'complete' argument to
xmlCheckCdataPush() as appropriate.

* result/cdata-2-byte-UTF-8.xml: Added.
* result/cdata-2-byte-UTF-8.xml.rde: Added.
* result/cdata-2-byte-UTF-8.xml.rdr: Added.
* result/cdata-2-byte-UTF-8.xml.sax: Added.
* result/cdata-2-byte-UTF-8.xml.sax2: Added.
* result/cdata-3-byte-UTF-8.xml: Added.
* result/cdata-3-byte-UTF-8.xml.rde: Added.
* result/cdata-3-byte-UTF-8.xml.rdr: Added.
* result/cdata-3-byte-UTF-8.xml.sax: Added.
* result/cdata-3-byte-UTF-8.xml.sax2: Added.
* result/cdata-4-byte-UTF-8.xml: Added.
* result/cdata-4-byte-UTF-8.xml.rde: Added.
* result/cdata-4-byte-UTF-8.xml.rdr: Added.
* result/cdata-4-byte-UTF-8.xml.sax: Added.
* result/cdata-4-byte-UTF-8.xml.sax2: Added.
* result/noent/cdata-2-byte-UTF-8.xml: Added.
* result/noent/cdata-3-byte-UTF-8.xml: Added.
* result/noent/cdata-4-byte-UTF-8.xml: Added.
* test/cdata-2-byte-UTF-8.xml: Added.
* test/cdata-3-byte-UTF-8.xml: Added.
* test/cdata-4-byte-UTF-8.xml: Added.
- Add tests and results.  Only 'make Readertests XMLPushtests'
  fails prior to the fix.
2016-04-08 10:18:52 +08:00
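The role of the 'complete' argument can be sketched as follows. This is a simplified Python illustration, not xmlCheckCdataPush(): the function name `check_utf8_push` and its return convention (valid prefix length, or -1) are assumptions, and the check does not reject every ill-formed sequence (overlong three- and four-byte forms and surrogates are not tested).

```python
def check_utf8_push(buf: bytes, complete: bool) -> int:
    """Return how many leading bytes of buf form whole UTF-8 characters.

    Returns -1 for a definitely invalid sequence. A multi-byte character
    cut off at the end of buf is an error only when complete is True.
    """
    i, n = 0, len(buf)
    while i < n:
        lead = buf[i]
        if lead < 0x80:
            need = 1
        elif 0xC2 <= lead <= 0xDF:
            need = 2
        elif 0xE0 <= lead <= 0xEF:
            need = 3
        elif 0xF0 <= lead <= 0xF4:
            need = 4
        else:
            return -1  # stray continuation byte or invalid lead byte
        have = min(need, n - i)
        # Every byte after the lead must be a continuation byte (10xxxxxx).
        if any((b & 0xC0) != 0x80 for b in buf[i + 1:i + have]):
            return -1
        if have < need:
            # Truncated character: wait for more data unless this is the end.
            return -1 if complete else i
        i += need
    return i
```

A push parser built this way would consume the returned number of bytes and keep the remainder until the next buffer arrives, so a character split between buffers is no longer reported as an encoding error.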
Daniel Veillard
df23f584fd Adding example from bug 738805 to regression tests
For https://bugzilla.gnome.org/show_bug.cgi?id=738805

Tortuous test case provided by pierre.labastie@neuf.fr
2014-10-23 13:52:47 +08:00
Daniel Veillard
dcc1950319 Fix a parsing bug on non-ascii element and CR/LF usage
https://bugzilla.gnome.org/show_bug.cgi?id=698550

Somehow the behaviour of the internal parser routine changed
slightly when encountering CR/LF, which led to a bug when
parsing documents with non-ASCII Names
2013-05-22 22:56:45 +02:00
Daniel Veillard
a6c76a26ca 566012 part 2 fix regression tests and push mode
* test/utf16bebom.xml: regression test showed that this test case was
  broken but previous behaviour would not detect it!
* parser.c: fix 566012 for the push mode of the parser, tricky!
* test/ebcdic_566012.xml result//ebcdic_566012.xml*: add the test to the
  regression suite
2009-08-26 14:37:00 +02:00
Daniel Veillard
283d50279d 587663 Incorrect Attribute-Value Normalization
* parser.c: when replacing entities, if the entity is CDATA and
  references entities, then white space characters in the replacement
  text need to be replaced by 0x20
* result/noent/att10: correct the output of the associated regression
  test
2009-08-25 17:18:39 +02:00
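The normalization rule involved here (XML 1.0, section 3.3.3) can be sketched in a few lines. This is a Python illustration of the rule for CDATA attributes, not the parser.c fix; `normalize_attr` and the `entities` dictionary of replacement texts are hypothetical, and the sketch does no loop detection or error reporting.

```python
import re

_TOKEN = re.compile(
    r"&#x([0-9A-Fa-f]+);"       # hexadecimal character reference
    r"|&#([0-9]+);"             # decimal character reference
    r"|&([A-Za-z_][\w.-]*);"    # general entity reference
    r"|([\t\n\r ])"             # literal white space
    r"|(.)",                    # any other character
    re.S,
)


def normalize_attr(value, entities):
    """Normalize a CDATA attribute value given entity replacement texts."""
    out = []
    for m in _TOKEN.finditer(value):
        hexref, decref, name, space, other = m.groups()
        if hexref is not None:
            out.append(chr(int(hexref, 16)))  # referenced char, kept as-is
        elif decref is not None:
            out.append(chr(int(decref)))
        elif name is not None:
            # White space inside the replacement text is normalized too.
            out.append(normalize_attr(entities[name], entities))
        elif space is not None:
            out.append(" ")                   # TAB, LF, CR, space -> 0x20
        else:
            out.append(other)
    return "".join(out)
```

For instance, `normalize_attr("a\tb&nl;c", {"nl": "x\ny"})` gives `"a bx yc"`: the newline that arrives through the entity becomes a space, whereas a newline written as `&#10;` in the attribute itself is preserved.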
Daniel Veillard
7f4547cdbd preparing the release of 2.7.2 fix the Solaris portability issue
* configure.in doc/* NEWS: preparing the release of 2.7.2
* dict.c: fix the Solaris portability issue
* parser.c: additional cleanup on #554660 fix
* test/ent13 result/ent13* result/noent/ent13*: added the
  example in the regression test suite.
* HTMLparser.c: handle leading BOM in htmlParseElement()
Daniel

svn path=/trunk/; revision=3799
2008-10-03 07:58:23 +00:00
Daniel Veillard
97c9ce2e99 fix various attribute normalisation problems reported by Ashwin this
* parser.c: fix various attribute normalisation problems reported
  by Ashwin
* result/c14n/without-comments/example-4
  result/c14n/with-comments/example-4: this impacted the result of
  two c14n tests :-\
* test/att9 test/att10 test/att11 result//att9* result//att10*
  result//att11*: added 3 specific regression tests coming from the
  XML spec revision and from Ashwin
Daniel

svn path=/trunk/; revision=3715
2008-03-25 16:52:41 +00:00
Daniel Veillard
d0d2f090dc fix handling of empty CDATA nodes as reported and discussed around #514181
* xmlsave.c parser.c: fix handling of empty CDATA nodes as 
  reported and discussed around #514181 and associated patches
* test/emptycdata.xml result/emptycdata.xml* 
  result/noent/emptycdata.xml: added a specific test in the
  regression suite.
Daniel

svn path=/trunk/; revision=3701
2008-03-07 16:50:21 +00:00
Daniel Veillard
dfac946c3d fixed the push mode when a big comment occurs before an internal subset,
* parser.c: fixed the push mode when a big comment occurs before
  an internal subset, should close bug #438835
* test/comment6.xml result//comment6.xml*: added a special
  test in the regression suite
Daniel

svn path=/trunk/; revision=3635
2007-06-12 14:44:32 +00:00
Daniel Veillard
166e1a9b59 Adding extra test files, just in case ... Daniel 2006-10-10 20:12:24 +00:00
Kasimier T. Buchcik
7b4e2e20fd Removed the automatic generation of CDATA sections for the content of the
* xmlsave.c: Removed the automatic generation of CDATA sections
  for the content of the "script" and "style" elements when
  serializing XHTML. The issue was reported by Vincent Lefevre,
  bug #345147.
* result/xhtml1 result/noent/xhtml1: Adjusted regression test
  results due to the serialization change described above.
2006-07-13 13:07:11 +00:00
Daniel Veillard
6974feb0cf fixed the comment streaming bug raised by Graham Bennett added to the
* parser.c: fixed the comment streaming bug raised by Graham Bennett
* test/badcomment.xml result//badcomment.xml*: added to the regression suite.
Daniel
2006-02-05 02:43:36 +00:00
Daniel Veillard
a617e24f32 reverted first patches for #319279 which led to #326295 and fixed the
* parser.c: reverted first patches for #319279 which led to #326295
  and fixed the problem in xmlParseChunk() instead
* test/ent11 result//ent11*: added test for #326295 to the regression
  suite
Daniel
2006-01-09 14:38:44 +00:00
Daniel Veillard
6977c6c437 fix bug #324432 with <xml:foo/> added to the regression tests Daniel
* SAX2.c: fix bug #324432 with <xml:foo/>
* test/ns7 result//ns7*: added to the regression tests
Daniel
2006-01-04 14:03:10 +00:00
Daniel Veillard
dbd6105321 applied second patch from David Madore to be less intrusive when handling
* xmlsave.c: applied second patch from David Madore to be less intrusive
  when handling scripts and style elements in XHTML1 should fix #316041
* test/xhtml1 result//xhtml1*: updated the test accordingly
Daniel
2005-09-12 14:03:26 +00:00
Daniel Veillard
abac41e829 fixing bug #166777 (and #169838), it was an heuristic in areBlanks which
* parser.c: fixing bug #166777 (and #169838), it was an heuristic
  in areBlanks which failed.
* result/winblanks.xml* result/noent/winblanks.xml test/winblanks.xml:
  added the input file to the regression tests
Daniel
2005-07-06 15:17:38 +00:00
Daniel Veillard
365cf67ff8 applied patch from Malcolm Rowe to avoid namespace troubles on rollback
* parser.c: applied patch from Malcolm Rowe to avoid namespace
  troubles on rollback parsing of elements start #304761
* test/nsclean.xml result/noent/nsclean.xml result/nsclean.xml*:
  added it to the regression tests.
Daniel
2005-06-09 08:18:24 +00:00
Daniel Veillard
8f8a9dd7f1 found and fixed 2 problems in the internal subset scanning code affecting
* parser.c: found and fixed 2 problems in the internal subset scanning
  code affecting the push parser (and the reader), fixes #165126
* test/intsubset2.xml result//intsubset2.xml*: added the test case
  to the regression tests.
Daniel
2005-01-25 21:41:42 +00:00
Daniel Veillard
4c778d8b96 boosting common comment parsing code, it was really slow. added specific
* parser.c: boosting common comment parsing code, it was really
  slow.
* test/comment[3-5].xml result//comment[3-5].xml*: added specific
  regression tests
Daniel
2005-01-23 17:37:44 +00:00
Daniel Veillard
48df9613ba fixed namespace bug in push mode reported by Rob Richards added it to the
* parser.c: fixed namespace bug in push mode reported by
  Rob Richards
* test/ns6 result//ns6*: added it to the regression tests
* xmlmodule.c testModule.c include/libxml/xmlmodule.h:
  added an extra option argument to module opening and defined
  a couple of flags to the API.
Daniel
2005-01-04 21:50:05 +00:00
Daniel Veillard
370ba3d231 fixed the leak reported by Volker Roth on the list added a specific test
* parser.c: fixed the leak reported by Volker Roth on the list
* test/ent10 result//ent10*: added a specific test for the problem
Daniel
2004-10-25 16:23:56 +00:00
Daniel Veillard
f34a20e69d "" is a valid hexbinary string dixit xmlschema-dev update the test. added
* xmlschemastypes.c: "" is a valid hexbinary string dixit xmlschema-dev
* result/schemas/hexbinary_0_1.err test/schemas/hexbinary_1.xml:
  update the test.
* test/ns5 result//ns5*: added a test for the namespace bug fixed
  in previous commit.
* Makefile.am: added a message in the regression tests
Daniel
2004-08-31 08:42:17 +00:00
Daniel Veillard
0df3bc3f28 fixed a serious problem when substituting entities using the Reader, the
* parser.c xmlreader.c include/libxml/parser.h: fixed a serious
  problem when substituting entities using the Reader, the entities
  content might be freed and if rereferenced would crash
* Makefile.am test/* result/*: added a new test case and a new
  test operation for the reader with substitution of entities.
Daniel
2004-06-08 12:03:41 +00:00
Daniel Veillard
f0244cea96 apply fix for XHTML1 formatting from Nick Wellnhofer fixes bug #141266
* xmlsave.c: apply fix for XHTML1 formatting from Nick Wellnhofer
  fixes bug #141266
* test/xhtmlcomp result//xhtmlcomp*: added the specific regression
  test
Daniel
2004-05-09 23:48:39 +00:00
Daniel Veillard
d3999c7ac6 fix bug reported by Holger Rauch added the test to the regression suite
* parser.c: fix bug reported by Holger Rauch
* test/att8 result/noent/att8 result/att8 result/att8.rdr
  result/att8.sax: added the test to the regression suite
Daniel
2004-03-10 16:27:03 +00:00
Daniel Veillard
cb35f01d94 xmlAttrSerializeTxtContent don't segfault if NULL is passed. adding an old
* tree.c: xmlAttrSerializeTxtContent don't segfault if NULL
  is passed.
* test/att7 result//att7*: adding an old regression test
  laying around on my laptop
Daniel
2004-02-20 08:18:58 +00:00
Daniel Veillard
b37440047e fixed a problem in push mode when attribute contains unescaped '>'
* parser.c: fixed a problem in push mode when attribute contains
  unescaped '>' characters, fixes bug #134566
* test/att6 result//att6*: added the test to the regression suite
Daniel
2004-02-18 14:28:22 +00:00
Daniel Veillard
036143bb53 fixed bug #132575 about finding the end of the internal subset in push
* parser.c: fixed bug #132575 about finding the end of the
  internal subset in push mode.
* test/intsubset.xml result/intsubset.xml* result/noent/intsubset.xml:
  added the test to the regression suite
Daniel
2004-02-12 11:57:52 +00:00
William M. Brack
f9415e4989 Enhanced the handling of UTF-16, UTF-16LE and UTF-16BE encodings. Now
* encoding.c, include/libxml/encoding.h: Enhanced the handling of UTF-16,
  UTF-16LE and UTF-16BE encodings.  Now UTF-16 output is handled internally
  by default, with proper BOM and UTF-16LE encoding.  Native UTF-16LE and
  UTF-16BE encoding will not generate a BOM on output, and will be
  automatically recognized on input.
* test/utf16lebom.xml, test/utf16bebom.xml, result/utf16?ebom*: added
  regression tests for above.
2003-11-28 09:39:10 +00:00
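The BOM convention described in this commit can be demonstrated with Python's codecs as an analogy. This is only a sketch: Python's generic "utf-16" codec writes a BOM in the platform's native byte order, whereas the commit states that libxml2 uses UTF-16LE for generic UTF-16 output. The helper `sniff_utf16` is hypothetical.

```python
import codecs


def sniff_utf16(data: bytes):
    """Guess the UTF-16 byte order from a leading BOM, or return None."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


text = "<a/>"
generic = text.encode("utf-16")      # generic name: output starts with a BOM
le_only = text.encode("utf-16-le")   # explicit byte order: no BOM written
be_only = text.encode("utf-16-be")

assert sniff_utf16(generic) is not None
assert sniff_utf16(le_only) is None and sniff_utf16(be_only) is None
# On input, the generic decoder uses the BOM to pick the byte order.
assert (codecs.BOM_UTF16_BE + be_only).decode("utf-16") == text
assert (codecs.BOM_UTF16_LE + le_only).decode("utf-16") == text
```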
Daniel Veillard
d45325589d fixed #127877, never output &quot; in element content this changes the
* entities.c: fixed #127877, never output &quot; in element content
* result/isolat3 result/slashdot16.xml result/noent/isolat3
  result/noent/slashdot16.xml result/valid/REC-xml-19980210.xml
  result/valid/index.xml result/valid/xlink.xml: this changes the
  output of a few tests
Daniel
2003-11-25 18:29:55 +00:00
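The distinction this change relies on is that a double quote only needs escaping inside a quoted attribute value, never in element content. A minimal Python sketch of the two escaping contexts follows; `escape_text` and `escape_attr` are hypothetical helpers, not the entities.c functions.

```python
def escape_text(s):
    """Escape character data for element content; quotes are left alone."""
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_attr(s):
    """Escape a value for use inside a double-quoted attribute."""
    return escape_text(s).replace('"', "&quot;")


value = 'say "hi" & <go>'
element = '<p title="%s">%s</p>' % (escape_attr(value), escape_text(value))
print(element)
```

Only the attribute value carries `&quot;`; the element content keeps the literal quotes, which is why the expected output of several regression tests changed.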
Daniel Veillard
d96f6d3429 cleaning up XPath error reporting that time. applied the two patches for
* error.c include/libxml/xmlerror.h include/libxml/xpath.h
  include/libxml/xpathInternals.h xpath.c: cleaning up XPath
  error reporting that time.
* threads.c: applied the two patches for TLS threads
  on Windows from Jesse Pelton
* parser.c: tiny safety patch for xmlStrPrintf(): make sure the
  result is always zero-terminated. Should also make it easier to
  detect a wrong buffer size being passed.
* result/VC/* result/valid/rss.xml.err result/valid/xlink.xml.err:
  updated the results to follow the errors string generated by
  last commit.
Daniel
2003-10-07 21:25:12 +00:00
Daniel Veillard
9475a352bd added the same htmlRead APIs as their XML counterparts new parser
* HTMLparser.c testHTML.c xmllint.c include/libxml/HTMLparser.h:
  added the same htmlRead APIs as their XML counterparts
* include/libxml/parser.h: new parser options, not yet implemented,
  added an options field to the context.
* tree.c: patch from Shaun McCance to fix bug #123238 when ]]>
  is found within a cdata section.
* result/noent/cdata2 result/cdata2 result/cdata2.rdr
  result/cdata2.sax test/cdata2: add one more cdata test
Daniel
2003-09-26 12:47:50 +00:00
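A common way to serialize text containing `]]>` as CDATA is to split it across two sections so the terminator never appears inside one. The sketch below shows that general technique in Python; it is not taken from the tree.c patch, and `cdata_sections` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def cdata_sections(text):
    """Serialize text as CDATA, splitting wherever ']]>' would end it early."""
    # "]]>" becomes "]]" closing one section and ">" opening the next.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


original = "a]]>b"
serialized = cdata_sections(original)
# Parsing the result back yields the original text, with the sections joined.
assert ET.fromstring("<r>" + serialized + "</r>").text == original
```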
Daniel Veillard
3b7840cd6c adding namespace checks while making sure they still parse as
* parser.c parserInternals.c tree.c include/libxml/parser.h
  include/libxml/xmlerror.h: adding namespace checks
  while making sure they still parse as well-formed documents.
  Add an nsWellFormed status report to the context, and
  provide new appropriate error codes.
* Makefile.am result/namespaces/* test/namespaces/*: add
  specific regression testing for the new namespace support
* test/att5 result/noent/att5 result/att5 result/att5.sax:
  add more coverage for the attribute parsing and normalization
  code.
Daniel
2003-09-11 23:42:01 +00:00
Daniel Veillard
07cb8226c0 Time to commit 3 days of work rewriting the parser internals,
fixing bugs and migrating to SAX2 interface by default. There
is some work left TODO, like namespace validation and attributes
normalization (this breaks C14N right now)
* Makefile.am: fixed the test rules
* include/libxml/SAX2.h include/libxml/parser.h
  include/libxml/parserInternals.h SAX2.c parser.c
  parserInternals.c: changing the parser, migrating to SAX2,
  adding new interface to switch back to SAX1 or initialize a
  SAX block for v1 or v2. Most of the namespace work is done
  below SAX, as well as attribute defaulting
* globals.c: changed initialization of the default SAX handlers
* hash.c tree.c include/libxml/hash.h: added QName specific handling
* xmlIO.c: small fix
* xmllint.c testSAX.c: provide a --sax1 switch to test the old
  version code path
* result/p3p result/p3p.sax result/noent/p3p test/p3p: the new code
  pointed out a typo in a very old test namespace
Daniel
2003-09-10 10:51:05 +00:00
Daniel Veillard
67906944fc fixed a namespace error on attribute reporting bug pointed out by Tobias
* SAX2.c: fixed a namespace error on attribute reporting bug
  pointed out by Tobias Reif
* test/p3p result/p3p result/noent/p3p: this test case was wrong
  using xmlsn instead of xmlns...
Daniel
2003-08-28 21:13:25 +00:00
Daniel Veillard
2dcb937a9a patch from Dodji Seketeli about UTF16 BOM when using the push XML parser.
* parserInternals.c: patch from Dodji Seketeli about UTF16 BOM
  when using the push XML parser.
* result/utf16bom.xml result/noent/utf16bom.xml test/utf16bom.xml:
  added the test to the regression suite.
Daniel
2003-07-16 21:18:19 +00:00
Daniel Veillard
8265a18a6a do not generate &quot; for " outside of attributes this changes the output
* entities.c: do not generate &quot; for " outside of attributes
* result//*: this changes the output of some tests
Daniel
2003-06-13 10:05:56 +00:00
Daniel Veillard
67df809c3a Vyacheslav Pindyura managed to trigger a bug in parseStartTag, fixing it.
* parser.c: Vyacheslav Pindyura managed to trigger a bug in
  parseStartTag, fixing it.
* test/att4 result/att4 result/noent/att4: adding the test
* xmlreader.c include/libxml/xmlreader.h doc/libxml2-api.xml: added
  more methods to XmlTextReader.
Daniel
2002-12-16 22:04:11 +00:00
Daniel Veillard
d5c2f92df4 modified the existing APIs to handle XHTML1 serialization rules
* tree.c include/libxml/tree.h: modified the existing APIs
  to handle XHTML1 serialization rules automatically, also add
  xmlIsXHTML() to libxml2 API. Some tweaking to make sure
  libxslt serialization uses it when needed without changing
  the library API.
* test/xhtml1 result/noent/xhtml1 result/valid/xhtml1.xhtml
  result/xhtml1: added a new test specifically for xhtml1 output
  and updated the result of one XHTML1 test
Daniel
2002-11-21 14:10:52 +00:00
Daniel Veillard
6f4561a49c Never commit without running "make tests" :-( fix a couple of stupidities
* valid.c SAX.c: Never commit without running "make tests" :-(
  fix a couple of stupidities in the previous commit
* result/*: a few changes in some attribute order result of previous
  commit.
Daniel
2002-03-25 12:10:14 +00:00
Daniel Veillard
a6d0538776 Fixing #71342 serializing '\n' in attribute values added a specific test.
* tree.c: Fixing #71342 serializing '\n' in attribute values
* result/noent/att3 result/att3 test/att3: added a specific
  test.
Daniel
2002-02-13 13:07:41 +00:00
Daniel Veillard
319a742a50 fixed bug #59981 related to handling of '&' in attributes when entities
* parser.c result/noent/wml.xml: fixed bug #59981 related
  to handling of '&' in attributes when entities are substituted
Daniel
2001-09-11 09:27:09 +00:00
Daniel Veillard
48da910097 allow to inherit attributes from the DTD directly in the tree, this is
* SAX.c testXPath.c valid.c xmllint.c include/libxml/valid.h:
  allow to inherit attributes from the DTD directly in the
  tree, this is needed for XPath and can be a useful feature.
  Inherited namespaces are always provided at the tree level now
* test/defattr* result/defattr* result/noent/defattr*: added a couple
  of tests for this feature (XSLT being the prime user).
Daniel
2001-08-07 01:10:10 +00:00
Daniel Veillard
0e4cd17b61 - parser.c: fixed UTF8 BOM support in push mode
- test/utf8bom.xml result/utf8bom.xml result/noent/utf8bom.xml:
  added a specific testcase
Daniel
2001-06-28 12:13:56 +00:00
Daniel Veillard
1731d6ae0a - xpath.c: trying to get 52979 solved
- tree.c result/ result/noent/: trying to get 52712 solved, this
  also made me clean up the fact that XML output in general should
  not add formatting blanks by default, this changed the output of
  a few tests
Daniel
2001-04-10 16:38:06 +00:00