1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-01-12 09:17:37 +03:00
Commit Graph

614 Commits

Author SHA1 Message Date
Nick Wellnhofer
c6083a32d6 parser: Improve error handling in push parser
- Report errors earlier
- Align error messages with pull parser
2023-08-29 18:41:05 +02:00
Nick Wellnhofer
855818bd2b parser: Check for truncated multi-byte sequences
When decoding input data, check whether the "raw" buffer is empty after
parsing the document. Otherwise, the input ends with a truncated
multi-byte sequence which shouldn't be silently ignored.
2023-08-08 15:21:37 +02:00
Nick Wellnhofer
0ffc2d82b5 runtest: Skip element name in schema error messages
This makes sure that memory and streaming tests will report the same
messages.
2023-04-30 21:45:39 +02:00
Nick Wellnhofer
e4f85f1bd2 [CVE-2023-28484] Fix null deref in xmlSchemaFixupComplexType
Fix a null pointer dereference when parsing (invalid) XML schemas.

Thanks to Robby Simpson for the report!

Fixes #491.
2023-04-11 14:29:50 +02:00
David Kilzer
cb1b8b8516 xmlValidatePopElement() can return invalid value (-1)
Covered by:  test/VC/ElementValid5

This only affects XML Reader API with LIBXML_REGEXP_ENABLED and
LIBXML_VALID_ENABLED turned on.

* result/VC/ElementValid5.rdr:
- Update result to add missing error message.

* python/tests/reader2.py:
* result/VC/ElementValid6.rdr:
* result/VC/ElementValid7.rdr:
* result/valid/781333.xml.err.rdr:
- Update result to fix grammar issue.

* valid.c:
(xmlValidatePopElement):
- Check return value of xmlRegExecPushString() to handle -1, and
  assign 'ret = 0;' to return 0 from xmlValidatePopElement().
  This change affects xmlTextReaderValidatePop() from
  xmlreader.c.
- Fix grammar of error message by changing 'child' to
  'children'.
2023-04-10 13:21:53 -07:00
Nick Wellnhofer
d7d0bc6581 SAX2: Ignore namespaces in HTML documents
In commit 21ca8829, we started to ignore namespaces in HTML element
names but we still called xmlSplitQName, effectively stripping the
namespace prefix. This would cause elements like <o:p> being parsed
as <p>. Now we leave the name untouched.

Fixes #508.
2023-03-31 17:08:43 +02:00
Nick Wellnhofer
e20f4d7a65 xinclude: Fix quadratic behavior in xmlXIncludeLoadTxt
Also make text inclusions work with memory buffers, for example when
using a custom entity loader, and fix a memory leak in case of invalid
characters.

Fixes #483.
2023-02-14 12:25:07 +01:00
Nick Wellnhofer
be0ec005f3 xinclude: Abort immediately if max depth was exceeded
Avoids resource exhaustion if the maximum recursion depth was exceeded.

Note that the XInclude engine offers no protection against other
"billion laughs"-style amplification attacks as long as they stay below
the maximum depth.
2023-02-13 11:29:26 +01:00
Nick Wellnhofer
74aa61e0bd parser: Halt parser on DTD errors
If we try to continue parsing after an error in the internal or external
subset, entity expansion accounting gets more complicated. Simply halt
the parser.

Found with libFuzzer.
2023-01-24 11:32:15 +01:00
Nick Wellnhofer
608c65bb8e xpath: number('-') should return NaN
Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/81
2023-01-18 15:15:41 +01:00
Nick Wellnhofer
d320a683d1 parser: Fix entity check in attributes
Don't set the "checked" flag when checking entities in default attribute
values. These entities could reference other entities which weren't
defined yet, so the check isn't reliable.

This fixes a short-lived regression which could lead to a call stack
overflow later in xmlStringGetNodeList.
2023-01-17 13:59:24 +01:00
Nick Wellnhofer
a41b09c739 parser: Improve detection of entity loops
Set a flag to detect entity loops at once instead of processing until
the depth limit is exceeded.
2022-12-23 22:11:18 +01:00
Nick Wellnhofer
d972393f30 parser: Only report a single entity error
Don't report errors multiple times for nested entity references.
2022-12-23 22:10:39 +01:00
Nick Wellnhofer
ae0c9cfa05 uri: Fix handling of port numbers
Allow port number without host, real fix for #71.

Also compare port numbers in xmlBuildRelativeURI.

Fix handling of port numbers in xmlUriEscape.
2022-12-13 01:43:49 +01:00
Nick Wellnhofer
76c6da4209 error: Make sure that error messages are valid UTF-8
This has caused issues with the Python bindings for a long time.

Should fix #64.
2022-12-04 23:34:19 +01:00
Nick Wellnhofer
9c63cea5a6 test: Add test for push parser boundaries 2022-11-20 21:27:59 +01:00
Nick Wellnhofer
68a6518c45 parser: Rewrite push parser boundary checks
Remove inaccurate xmlParseCheckTransition check.

Remove non-incremental xmlParseGetLasts check.

Add functions that check for several boundary constructs more
accurately, keeping track of progress in ctxt->checkIndex.

Fixes #439.
2022-11-20 21:27:08 +01:00
Nick Wellnhofer
76d6b0d768 html: Don't escape ASCII chars in href attributes
In several cases, href attributes can contain ASCII characters which are
illegal in URIs. Escaping them often does more harm than good.

Fixes #321.
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
f61b8a6233 parser: Fix DTD parser progress checks
This is another attempt at fixing parser progress checks. Instead of
relying on in->consumed, which could overflow, change some DTD parser
functions to make guaranteed progress on certain byte sequences.
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
b456e3bb42 xinclude: Always allow XPtr expressions in external documents 2022-10-31 16:49:36 +01:00
Nick Wellnhofer
eef0a7395c xinclude: Implement "streaming" mode
When using xmlreader, XPointer expressions in XIncludes simply cannot
work. Expressions can reference nodes which weren't parsed yet or which
were already deleted.

After fixing nested XIncludes, we reference includes which were parsed
previously. When streaming, these nodes could have been deleted, leading
to use-after-free errors.

Disallow XPointer expressions and truncate the include table in
streaming mode.
2022-10-30 14:12:55 +01:00
Nick Wellnhofer
20e2fb4c1c xinclude: Avoid creation of subcontexts
Don't create subcontext in xmlXIncludeRecurseDoc. Save and restore 'doc'
and 'incTab' instead.

Make xmlXIncludeLoadFallback call xmlXIncludeCopyNode which seems safer
than xmlXIncludeDoProcess since the latter may modify the document.
This should also be more performant since we need to copy the whole
fallback subtree anyway. Also make sure to avoid replacements in
fallback elements in xmlXIncludeDoProcess.
2022-10-25 19:34:38 +02:00
Nick Wellnhofer
d2ed1e4f99 xinclude: Limit recursion depth
This avoids call stack overflows.
2022-10-23 18:52:56 +02:00
Nick Wellnhofer
34496f26db xinclude: Test for inclusion loops 2022-10-23 14:27:05 +02:00
Nick Wellnhofer
bc267cb9bc xinclude: Expand includes in xmlXIncludeCopyNode
This should make nested includes work reliably.

Fixes #424.
2022-10-23 14:27:05 +02:00
Nick Wellnhofer
ea7c9fb5dd xinclude: Don't create result doc for test with errors 2022-10-23 14:27:05 +02:00
Nick Wellnhofer
c99cde3f21 xinclude: Also test error messages
The reader interface with XIncludes is somewhat broken and can generate
different error messages. Start to move tests which are sketchy with
reader to a separate directory.
2022-10-23 14:26:59 +02:00
Nick Wellnhofer
938105b572 Revert "xinclude: Fix regression with nested includes"
This reverts commit 7f04e29731 which
caused memory errors.

See #424.
2022-10-21 15:56:12 +02:00
Nick Wellnhofer
7f04e29731 xinclude: Fix regression with nested includes
This reverts commits 74dcc10b and 87d20b55.

Fixes #424.
2022-10-18 19:17:45 +02:00
Nick Wellnhofer
1d4f5d24ac schemas: Fix null-pointer-deref in xmlSchemaCheckCOSSTDerivedOK
Found by OSS-Fuzz.
2022-09-13 16:56:59 +02:00
Nick Wellnhofer
c714979293 Fix --with-valid --without-regexps build
This build config resulted in segfaults in 'runtest'  because a special
xmlElementContentPtr showed up in a few places. I'm not sure if this is
the right fix.

An error message was changed to conform to the --with-regexps build.

There are still a few missing validity errors, so the tests don't pass.
2022-09-02 18:33:35 +02:00
Nick Wellnhofer
e986d09cf5 Skip incorrectly opened HTML comments
Commit 4fd69f3e fixed handling of '<' characters not followed by an
ASCII letter. But a '<!' sequence followed by invalid characters should
be treated as bogus comment and skipped.

Fixes #380.
2022-08-02 14:38:09 +02:00
Nick Wellnhofer
145170125a Fix parsing of subtracted regex character classes
Fixes #370.
2022-04-23 19:22:42 +02:00
Nick Wellnhofer
4612ce3031 Implement xpath1() XPointer scheme
See https://www.w3.org/2005/04/xpointer-schemes/
2022-04-21 04:26:52 +02:00
Nick Wellnhofer
41afa89fc9 Fix short-lived regression in xmlStaticCopyNode
Commit 7618a3b1 didn't account for coalesced text nodes.

I think it would be better if xmlStaticCopyNode didn't try to coalesce
text nodes at all. This code path can only be triggered if some other
code doesn't coalesce text nodes properly. In this case, OSS-Fuzz found
such behavior in xinclude.c.
2022-04-10 14:17:31 +02:00
Nick Wellnhofer
4de7f2acfe Remove unused result files 2022-04-04 04:28:15 +02:00
Nick Wellnhofer
f1c32b4c78 Allow missing result files in runtest
Treat missing files as empty.
2022-04-04 04:28:15 +02:00
Nick Wellnhofer
95c7f315ab Move SVG tests to runtest.c
Also update the test results for the first time since 2000.
2022-04-04 04:18:07 +02:00
Nick Wellnhofer
48b03c8479 Remove major parts of old test suite
Remove all the parts of the old test suite which are covered by
runtest.c for quite some time.

The following test programs are removed:

- testC14N
- testHTML
- testReader
- testRelax
- testSAX
- testSchemas
- testURI
- testXPath

This also removes a few results of unimportant tests only run by the old
test suite.
2022-04-04 04:14:55 +02:00
Nick Wellnhofer
57b81c208c Normalize XPath strings in-place
Simplify the code and fix a potential memory leak.

Fixes #343.
2022-03-05 18:22:51 +01:00
Nick Wellnhofer
bc06a522c1 Fix recursion check in xinclude.c
Compare the included URL with the document's URL to detect local
inclusions.

Fixes #348.
2022-03-02 20:44:41 +01:00
Mike Dalessio
d7b287b94c htmlParseComment: handle abruptly-closed comments
See guidance provided on abrutply-closed comments here:

https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-closing-of-empty-comment
2022-03-02 14:42:47 +00:00
Mike Dalessio
24cdc89006 test coverage for abruptly-closed comments
These establish baseline behavior so that the subsequent commit is
clear about the behavior it will modify.
2022-03-02 14:42:47 +00:00
Nick Wellnhofer
ea6e8f998d Fix certain combinations of regex range quantifiers
Fix regex transitions that have both min/max and a counter. In this
case, we want to save the regex state before incrementing the counter.

Fixes #301 and the issue reported here:

https://mail.gnome.org/archives/xml/2016-April/msg00017.html
2022-02-28 16:56:02 +01:00
Nick Wellnhofer
382fb056b5 Fix range quantifier on subregex
Make sure to add counted exit transitions before other counter
transitions. Otherwise, we won't backtrack correctly.

Fixes #65.
2022-02-28 16:56:02 +01:00
Nick Wellnhofer
ce0871e15c Only warn on invalid redeclarations of predefined entities
Downgrade the error message to a warning since the error was ignored,
anyway. Also print the name of redeclared entity. For a proper fix that
also shows filename and line number of the invalid redeclaration, we'd
have to

- pass the parser context to the entity functions somehow, or
- make these functions return distinct error codes.

Partial fix for #308.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
652dd12a85 [CVE-2022-23308] Use-after-free of ID and IDREF attributes
If a document is parsed with XML_PARSE_DTDVALID and without
XML_PARSE_NOENT, the value of ID attributes has to be normalized after
potentially expanding entities in xmlRemoveID. Otherwise, later calls
to xmlGetID can return a pointer to previously freed memory.

ID attributes which are empty or contain only whitespace after
entity expansion are affected in a similar way. This is fixed by
not storing such attributes in the ID table.

The test to detect streaming mode when validating against a DTD was
broken. In connection with the defects above, this could result in a
use-after-free when using the xmlReader interface with validation.
Fix detection of streaming mode to avoid similar issues. (This changes
the expected result of a test case. But as far as I can tell, using the
XML reader with XIncludes referencing the root document never worked
properly, anyway.)

All of these issues can result in denial of service. Using xmlReader
with validation could result in disclosure of memory via the error
channel, typically stderr. The security impact of xmlGetID returning
a pointer to freed memory depends on the application. The typical use
case of calling xmlGetID on an unmodified document is not affected.
2022-02-19 19:26:42 +01:00
Nick Wellnhofer
9edc20c154 Fix double counting of CRLF in comments
Fixes #151.
2022-02-07 20:54:07 +01:00
Nick Wellnhofer
5408c10c37 Don't normalize namespace URIs in XPointer xmlns() scheme
Namespace URIs should be compared without escaping or unescaping:

https://www.w3.org/TR/REC-xml-names/#NSNameComparison

Fixes #289.
2022-02-04 14:00:09 +01:00
Nick Wellnhofer
1c7d91abe4 Fix handling of XSD with empty namespace
An empty namespace means no default namespace.

Fixes #303.
2022-02-03 23:31:19 +01:00
Nick Wellnhofer
f480f7509c Update NewsML DTD in test suite
Switch to version 1.2 which has a clearer license.

Fixes #291.
2022-02-03 14:43:17 +01:00
Nick Wellnhofer
d85245f934 Fix regression with PEs in external DTD
Fix a regression introduced with commit a28f7d87. In some cases,
parameter entity references in external DTDs wouldn't be expanded.

Fixes #306.
2022-01-16 21:56:10 +01:00
David Kilzer
03bb929390 Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk
This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8().

* encoding.c:
(UTF16LEToUTF8):
- Fix comment to describe what the code does.
(UTF16BEToUTF8):
- Fix undefined behavior which was applied to UTF16LEToUTF8() in
  2f9382033e.
- Add bounds check to while() loop which was applied to
  UTF16LEToUTF8() in be803967db.
- Do not return -2 when (in >= inend) to fix the bug.  This was
  applied to UTF16LEToUTF8() in 496a1cf592.
- Inline (<< 8) statements to match UTF16LEToUTF8().

Add the following tests and results:

  test/text-4-byte-UTF-16-BE-offset.xml
  test/text-4-byte-UTF-16-BE.xml
  test/text-4-byte-UTF-16-LE-offset.xml
  test/text-4-byte-UTF-16-LE.xml
2022-01-16 14:07:17 +01:00
Nick Wellnhofer
2732b23466 Fix regression parsing public IDs literals in HTML
Fix regression introduced when reworking htmlParsePubidLiteral in
commit 93ce33c2.

Fixes #318.
2022-01-10 13:37:59 +01:00
Nick Wellnhofer
de5b624f10 Fix handling of unexpected EOF in xmlParseContent
Readd the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was
removed in commit 62150ed2.

This commit also introduced a regression for direct users of
xmlParseContent. Unclosed tags weren't checked.
2021-05-08 20:47:36 +02:00
Nick Wellnhofer
3e80560d4b Fix line numbers in error messages for mismatched tags
Commit 62150ed2 introduced a small regression in the error messages for
mismatched tags. This typically only affected messages after the first
mismatch, but with custom SAX handlers all line numbers would be off.

This also fixes line numbers in the SAX push parser which were never
handled correctly.
2021-05-07 11:48:11 +02:00
Nick Wellnhofer
01411e7c5e Check for invalid redeclarations of predefined entities
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.

Note that some test cases declared

    <!ENTITY lt "<">

But the XML spec clearly states that this is illegal:

> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.

Also fixes #217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.

As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
2021-02-08 21:51:26 +01:00
Nick Wellnhofer
79301d3d5e Fix timeout when handling recursive entities
Abort parsing early to avoid an almost infinite loop in certain error
cases involving recursive entities.

Found with libFuzzer.
2020-12-18 14:13:46 +01:00
Mike Dalessio
a67b63d183 use new htmlParseLookupCommentEnd to find comment ends
Note that the caret in error messages generated during comment parsing
may have moved by one byte.

See guidance provided on incorrectly-closed comments here:

https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
2020-12-16 16:12:07 +01:00
Mike Dalessio
29f5d20e84 htmlParseComment: treat --!> as if it closed the comment
See guidance provided on incorrectly-closed comments here:

https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
2020-12-16 16:12:07 +01:00
Mike Dalessio
e28d9347bc add test coverage for incorrectly-closed comments
this establishes the baseline behavior so that subsequent commits
which modify this behavior are clear about what's being changed.
2020-12-16 16:12:07 +01:00
Nick Wellnhofer
87d20b554c Fix regression introduced with commit 74dcc10b
The code wasn't dead after all, but I can see no reason in delaying
the XPointer evaluation. This could lead to nodes included earlier
appearing in XPointer results.
2020-08-19 13:52:08 +02:00
Nick Wellnhofer
d88df4bd48 Fix corner case with empty xi:fallback
xi:fallback could become empty after recursive expansion. Use a flag
to track whether nodes should be skipped.
2020-08-17 01:17:39 +02:00
Nick Wellnhofer
1abf2967f9 Fix exponential runtime and memory in xi:fallback processing
When creating XML_XINCLUDE_START nodes, the children of the original
xi:include node must be freed, otherwise fallback content is copied
twice, doubling runtime and memory consumption for each nested
xi:fallback/xi:include pair.

Found with libFuzzer.
2020-08-07 19:59:07 +02:00
Nick Wellnhofer
0f9817c75b Don't recurse into xi:include children in xmlXIncludeDoProcess
Otherwise, nested xi:include nodes might result in a use-after-free
if XML_PARSE_NOXINCNODE is specified.

Found with libFuzzer and ASan.
2020-08-06 14:29:33 +02:00
Nick Wellnhofer
93ce33c2b8 Fix several quadratic runtime issues in HTML push parser
Fix a few remaining cases where the HTML push parser would scan more
content during lookahead than being parsed later.

Make sure that htmlParseDocTypeDecl consumes all content up to the
final '>' in case of errors. The old comment said "We shouldn't try to
resynchronize", but ignoring invalid content is also what the HTML5
spec mandates.

Likewise, make htmlParseEndTag skip to the final '>' in invalid end
tags even if not in recovery mode. This is probably the most visible
change in practice and leads to different output for some tests but is
also more in line with HTML5.

Make sure that htmlParsePI and htmlParseComment don't abort if invalid
characters are encountered but log an error and ignore the character.

Change some other end-of-buffer checks to test for a zero byte instead
of relying on IS_CHAR.

Fix usage of IS_CHAR macro in htmlParseScript.
2020-07-23 20:47:35 +02:00
David Kilzer
6b4717d61d Add regexp regression tests
- Bug 757711: heap-buffer-overflow in xmlFAParsePosCharGroup
  <https://bugzilla.gnome.org/show_bug.cgi?id=757711>
- Bug 783015 - Integer-overflow in xmlFAParseQuantExact
  <https://bugzilla.gnome.org/show_bug.cgi?id=783015>

(Regexptests): Add support for checking stderr output when
running regexp tests.  This makes it possible to check in test
cases that fail and not see false-positive error output when
running the tests.  Unlike other libxml2 test suites, if there
is no stderr output, no *.err file needs to be created.
2020-07-06 12:37:53 +02:00
Nick Wellnhofer
477c7f6aff Fix quadratic runtime in HTML parser
Commit eeb99329 removed an important optimization avoiding quadratic
runtime when repeatedly scanning the input buffer for terminating
characters in the HTML push parser. The related bug is

    https://bugzilla.gnome.org/show_bug.cgi?id=444994

Make sure that ctxt->checkIndex is always written and store additional
parser state in ctxt->inSubset which is unused in the HTML parser.

Found by OSS-Fuzz.
2020-07-06 12:17:20 +02:00
Nick Wellnhofer
32cb5dccda Add test case for recursive external parsed entities 2020-02-11 17:36:43 +01:00
Nick Wellnhofer
f20daa9e51 Enable error tests with entity substitution 2020-02-11 17:36:43 +01:00
Nick Wellnhofer
eddfbc38fa Don't load external entity from xmlSAX2GetEntity
Despite the comment, I can't see a reason why external entities must be
loaded in the SAX handler. For external entities, the handler is
typically first invoked via xmlParseReference which will later load the
entity on its own if it wasn't loaded yet.

The old code also lead to duplicated SAX events which makes it
basically impossible to reuse xmlSAX2GetEntity for a custom SAX parser.
See the change to the expected test output.

Note that xmlSAX2GetEntity was loading the entity via
xmlParseCtxtExternalEntity while xmlParseReference uses
xmlParseExternalEntityPrivate. In the previous commit, the two
functions were merged, trying to compensate for some slight differences
between the two mostly identical implementations.

But the more urgent reason for this change is that xmlParseReference
has the facility to abort early when recursive entities are detected,
avoiding what could practically amount to an infinite loop.

If you want to backport this change, note that the previous three
commits are required as well:

f9ea1a24 Fix copying of entities in xmlParseReference
5c7e0a9a Copy some XMLReader option flags to parser context
1a3e584a Merge code paths loading external entities

Found by OSS-Fuzz.
2020-02-11 17:35:42 +01:00
Nick Wellnhofer
f9ea1a24ed Fix copying of entities in xmlParseReference
Before, reader mode would end up in a branch that didn't handle
entities with multiple children and failed to update ent->last, so the
hack copying the "extra" reader data wouldn't trigger. Consequently,
some empty nodes in entities are correctly detected now in the test
suite. (The detection of empty nodes in entities is still buggy,
though.)
2020-02-11 16:37:52 +01:00
Jared Yanovich
2a350ee9b4 Large batch of typo fixes
Closes #109.
2019-09-30 18:04:38 +02:00
Nick Wellnhofer
c2f209c09f Disallow conditional sections in internal subset
Conditional sections are only allowed in *external* parameter entities
referenced from the internal subset.
2019-09-30 15:47:30 +02:00
Nick Wellnhofer
c51e38cb3a Make xmlParseConditionalSections non-recursive
Avoid call stack overflow in deeply nested conditional sections.

Found by OSS-Fuzz.
2019-09-30 15:47:30 +02:00
Nick Wellnhofer
99a864a1f7 Fix Regextests
- One of the bug316338 test cases is expected to succeed.
- Memory leak in testRegexp.c.
- Refcount handling in xmlExpHashGetEntry.
2019-09-25 15:27:45 +02:00
Nick Wellnhofer
c2b0a184a9 Fix empty branch in regex
Fixes bug 649244:
https://bugzilla.gnome.org/show_bug.cgi?id=649244

Closes #57.
2019-09-25 14:22:47 +02:00
Nick Wellnhofer
62150ed2ab Make xmlParseContent and xmlParseElement non-recursive
Split xmlParseElement into subfunctions. Use nameNsPush to store prefix,
URI and nsNr on the heap, similar to the push parser.

Closes #84.
2019-09-23 17:45:50 +02:00
Nick Wellnhofer
6705f4d28e Remove executable bit from non-executable files 2019-09-16 15:48:59 +02:00
Nick Wellnhofer
eee1dd5acf Fix expected output of test/schemas/any4
libxml2 correctly rejects any4_0.xsd as invalid schema. I can't figure
out what the intent behind this test case was. Simply adjust the
expected output to match the current behavior.

Closes #92.
2019-09-16 15:36:44 +02:00
Nick Wellnhofer
e8c9cd5c7a Fix Schema determinism check of ##other namespaces
Non-compound (##local) and compound string atoms are always disjoint
regardless of whether the compound atom is negated (##other).

Closes #40.
2019-09-16 15:36:02 +02:00
bettermanzzy
01d8cf07d9 Misleading error message with xs:{min|max}Inclusive
Closes #53.
2019-08-25 14:12:34 +02:00
Jan Pokorný
ea695ac0d6 Fix unability to RelaxNG-validate grammar with choice-based name class
Previously, test/relaxng/ambig_name-class2.xml would fail to validate
against test/relaxng/ambig_name-class2.rng:

> test/relaxng/ambig_name-class2.rng:4:
>   element attribute: Relax-NG parser error :
>       Found anyName attribute without oneOrMore ancestor
> Relax-NG schema test/relaxng/ambig_name-class2.rng failed to compile

Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
2019-08-25 13:29:04 +02:00
Jan Pokorný
8074b88179 Fix unability to validate ambiguously constructed interleave for RelaxNG
Previously, test/relaxng/ambig_name-class.xml would fail to validate
for a simple reason -- interleave within "open-name-class" context
is supposed to be fine with whatever else is pending the consumption,
since effectively, it's unrelated from a higher parsing perspective.

Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
2019-08-25 13:29:04 +02:00
Nick Wellnhofer
f9fce96313 Fix unsigned integer overflow
It's defined behavior but -fsanitize=unsigned-integer-overflow is
useful to discover bugs.
2019-05-20 13:38:22 +02:00
Nick Wellnhofer
c2f4da1a93 Improve XPath predicate and filter evaluation
Consolidate code paths evaluating XPath predicates and filters.

Don't push context node on stack when evaluating predicates. I have no
idea why this was done. It seems completely useless and trying to pop
the context node from a corrupted stack has already caused security
issues.

Filter nodesets in-place and don't create node sets with NULL gaps which
allows to simplify merging a great deal. Simply move matched nodes
backward and create a compact node set.

Merge xmlXPathCompOpEvalPositionalPredicate into
xmlXPathCompOpEvalPredicate.
2019-04-22 14:48:46 +02:00
Nick Wellnhofer
30a6533e01 Fix float casts in xmlXPathSubstringFunction
Rewrite conversion of double to int in xmlXPathSubstringFunction, adding
range checks to avoid undefined behavior. Make sure to add start and
length as floating-point numbers before converting to int. Fix a bug
when rounding negative start indices.

Remove unneeded calls to xmlXPathIs{Inf,NaN} and rely on IEEE math
instead. Avoid computing the string length. xmlUTF8Strsub works as
expected if the length of the requested substring exceeds the input.

Found with libFuzzer and UBSan.
2019-03-08 14:29:59 +01:00
Nikolai Weibull
c64d4efb31 Remove redefined starts and defines inside include elements
When including a grammar from another grammar, we need to make sure that any
redefines of starts and includes that that grammar does inside any of its
include elements are also removed.
2018-11-29 21:06:06 +01:00
Nikolai Weibull
46da8fc529 Allow choice within choice in nameClass in RELAX NG
The pattern nameClass allows for nested choice elements, for example

  <name>
    <choice>
      <choice>
        <name>a</name>
        <name>b</name>
      </choice>
      <name>c</name>
    </choice>
  </name>

which is semantically equivalent to

  <name>
    <choice>
      <name>a</name>
      <name>b</name>
      <name>c</name>
    </choice>
  </name>

The old code didn’t handle this correctly, as it never expected a choice inside
another choice.  This patch fixes this by flattening any nested choices.

This pattern of nested choice elements comes up in RELAX NG simplification,
where all choice elements are rewritten in this nested manner, see section 4.12
of the RELAX NG specification.
2018-11-29 21:03:11 +01:00
Nikolai Weibull
4338c310eb Look inside divs for starts and defines inside include
RELAX NG allows for div elements inside of include elements.  We need to look
inside those div elements for start and define elements that may be redefining
start and define elements in the included grammar.
2018-11-29 21:00:46 +01:00
Nick Wellnhofer
123234f2cf Free input buffer in xmlHaltParser
This avoids miscalculation of available bytes.

Thanks to Yunho Kim for the report.

Closes: #26
2018-09-11 15:06:17 +02:00
Nick Wellnhofer
7218255092 Add test for ICU flush and pivot buffer 2017-11-04 15:38:58 +01:00
Nick Wellnhofer
5af594d8bc Fix comparison of nodesets to strings
Fix two bugs in xmlXPathNodeValHash which could lead to errors when
comparing nodesets to strings:

- Only use contents of text nodes to compute the hash for element nodes.
  Comments, PIs, and other node types don't affect the string-value and
  must be ignored.
- Reset `string` to NULL for node types other than text.

Reported by Aleksei on the mailing list:

    https://mail.gnome.org/archives/xml/2017-September/msg00016.html
2017-10-07 15:22:57 +02:00
Nick Wellnhofer
69936b129f Revert "Print error messages for truncated UTF-8 sequences"
This reverts commit 79c8a6b which caused a serious regression in
streaming mode.

Also reverts part of commit 52ceced "Fix infinite loops with push
parser in recovery mode".

Fixes bug 786554.
2017-08-30 14:19:06 +02:00
Nick Wellnhofer
899a5d9f0e Detect infinite recursion in parameter entities
When expanding a parameter entity in a DTD, infinite recursion could
lead to an infinite loop or memory exhaustion.

Thanks to Wei Lei for the first of many reports.

Fixes bug 759579.
2017-07-25 15:21:12 +02:00
Nick Wellnhofer
872fea9485 Get rid of "blanks wrapper" for parameter entities
Now that replacement of parameter entities goes exclusively through
xmlSkipBlankChars, we can account for the surrounding space characters
there and remove the "blanks wrapper" hack.
2017-06-20 13:19:47 +02:00
Nick Wellnhofer
24246c7626 Fix xmlHaltParser
Pop all extra input streams before resetting the input. Otherwise,
a call to xmlPopInput could make input available again.

Also set input->end to input->cur.

Changes the test output for some error tests. Unfortunately, some
fuzzed test cases were added to the test suite without manual cleanup.
This makes it almost impossible to review the impact of later changes
on the test output.
2017-06-20 13:15:43 +02:00
Nick Wellnhofer
8bbe4508ef Spelling and grammar fixes
Fixes bug 743172, bug 743489, bug 769632, bug 782400 and a few other
misspellings.
2017-06-17 16:34:23 +02:00
Nick Wellnhofer
5f440d8cad Rework entity boundary checks
Make sure to finish all entities in the internal subset. Nevertheless,
readd a sanity check in xmlParseStartTag2 that was lost in my previous
commit. Also add a sanity check in xmlPopInput. Popping an input
unexpectedly was the source of many recent memory bugs. The check
doesn't mitigate such issues but helps with diagnosis.

Always base entity boundary checks on the input ID, not the input
pointer. The pointer could have been reallocated to the old address.

Always throw a well-formedness error if a boundary check fails. In a
few places, a validity error was thrown.

Fix a few error codes and improve indentation.
2017-06-17 13:25:53 +02:00
Nick Wellnhofer
dbaab1f369 Test SAX2 callbacks with entity substitution
This detects regressions like bug 760367.
2017-06-16 21:38:57 +02:00