mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-01-07 17:17:39 +03:00
Commit Graph

4964 Commits

Author SHA1 Message Date
Nick Wellnhofer
0d9da0290c Test fuzz targets with dummy driver
Run fuzz targets with files in seed corpus during test.
2020-08-24 03:57:03 +02:00
Nick Wellnhofer
3fcf319378 Fix regression introduced with commit d88df4b
Revert the commit and use a different approach.

Found by OSS-Fuzz.
2020-08-22 00:50:42 +02:00
Nick Wellnhofer
87d20b554c Fix regression introduced with commit 74dcc10b
The code wasn't dead after all, but I can see no reason for delaying
the XPointer evaluation. Delaying it could lead to nodes included
earlier appearing in XPointer results.
2020-08-19 13:52:08 +02:00
Nick Wellnhofer
fbb7fa9a9a Fix memory leak in xmlXIncludeAddNode error paths
Found by OSS-Fuzz.
2020-08-19 13:13:48 +02:00
Nick Wellnhofer
19cae17f5a Revert "Fix quadratic runtime in xi:fallback processing"
This reverts commit 27119ec33c.

Not copying fallback children didn't fix up namespaces and could lead
to use-after-free errors.

Found by OSS-Fuzz.
2020-08-19 13:13:41 +02:00
Nick Wellnhofer
d63cfeca35 Add TODO comment in xinclude.c
Add some thoughts on the major remaining problems with the XInclude
implementation.
2020-08-17 15:42:20 +02:00
Nick Wellnhofer
804c52978f Stop using maxParserDepth in xpath.c
Only use a single maxDepth value.
2020-08-17 03:39:51 +02:00
Nick Wellnhofer
74dcc10b55 Remove dead code in xinclude.c
'doc' is checked for NULL in xmlXIncludeLoadDoc, so several code
paths can be eliminated.
2020-08-17 03:24:56 +02:00
Nick Wellnhofer
0ff527482d Fix autotools warnings 2020-08-17 02:54:28 +02:00
Nick Wellnhofer
2c74712977 Fix error reporting with xi:fallback
When reporting errors, don't use href of xi:include if xi:fallback
was used. I think this can only be reproduced with
"xmllint --postvalid", see the original bug report:

https://bugzilla.gnome.org/show_bug.cgi?id=152623
2020-08-17 01:17:39 +02:00
Nick Wellnhofer
27119ec33c Fix quadratic runtime in xi:fallback processing
Copying the tree would lead to runtime quadratic in nested fallback
depth, similar to naive string concatenation.
2020-08-17 01:17:39 +02:00
Nick Wellnhofer
d88df4bd48 Fix corner case with empty xi:fallback
xi:fallback could become empty after recursive expansion. Use a flag
to track whether nodes should be skipped.
2020-08-17 01:17:39 +02:00
Nick Wellnhofer
00a86d414b Don't add formatting newlines to XInclude nodes 2020-08-17 01:17:39 +02:00
Nick Wellnhofer
dba82a8c04 Fix XInclude regression introduced with recent commit
The change to xmlXIncludeLoadFallback in commit 11b57459 could
process already freed nodes if text nodes were merged after deleting
nodes with an empty fallback.

Found by OSS-Fuzz.
2020-08-17 01:17:39 +02:00
Nick Wellnhofer
e1c2d0adf0 Fix memory leak in runtest.c 2020-08-17 01:17:39 +02:00
Nick Wellnhofer
2b4769a6bd Make "xmllint --push --recovery" work 2020-08-17 01:17:39 +02:00
Nick Wellnhofer
99fc048d7f Don't use SAX1 if all element handlers are NULL
Running xmllint with "--sax --noout" installs a SAX2 handler with all
callbacks set to NULL. In this case or similar situations, we don't want
to switch to SAX1 parsing.
2020-08-17 01:17:39 +02:00
Nick Wellnhofer
c1ba6f54d3 Revert "Do not URI escape in server side includes"
This reverts commit 960f0e2756.

This commit introduced

- an infinite loop, found by OSS-Fuzz, which could be easily fixed.
- an algorithm with quadratic runtime
- a security issue, see
  https://bugzilla.gnome.org/show_bug.cgi?id=769760

A better approach is to add an option not to escape URLs at all,
which libxml2 should possibly have done in the first place.
2020-08-15 18:32:29 +02:00
Nick Wellnhofer
b82fa3dd26 Fix column number accounting in xmlParse*NameAndCompare
Thanks to Frederic Vancraeyveldt for the report.
2020-08-09 15:02:01 +02:00
Nick Wellnhofer
438e595a8c Stop counting nbChars in parser context
The value was inaccurate and never used.
2020-08-09 15:01:45 +02:00
Nick Wellnhofer
f6a9541fb8 Remove unneeded progress checks in HTML parser
The HTML parser should now be guaranteed to make progress, so the
checks became unnecessary.
2020-08-09 14:54:37 +02:00
Nick Wellnhofer
9de7b94d4f Use strcmp when fuzzing
This should improve data-flow-guided fuzzing.
2020-08-08 20:37:30 +02:00
Nick Wellnhofer
10a0794878 Fix XPath fuzzer 2020-08-08 17:46:11 +02:00
Nick Wellnhofer
6c128fd58a Fuzz XInclude engine 2020-08-08 14:32:44 +02:00
Nick Wellnhofer
50f06b3efb Fix out-of-bounds read with 'xmllint --htmlout'
Make sure that truncated UTF-8 sequences don't cause an out-of-bounds
array access.

Thanks to @SuhwanSong and the Agency for Defense Development (ADD) for
the report.

Fixes #178.
2020-08-07 21:54:27 +02:00
Nick Wellnhofer
1abf2967f9 Fix exponential runtime and memory in xi:fallback processing
When creating XML_XINCLUDE_START nodes, the children of the original
xi:include node must be freed, otherwise fallback content is copied
twice, doubling runtime and memory consumption for each nested
xi:fallback/xi:include pair.

Found with libFuzzer.
2020-08-07 19:59:07 +02:00
Nick Wellnhofer
11b5745927 Don't process siblings of root in xmlXIncludeProcess
xmlXIncludeDoProcess would follow the siblings of the tree root and
also expand these nodes. When using an XML reader, this could lead to
siblings of the current node being expanded without having been parsed
completely.
2020-08-07 18:51:52 +02:00
Nick Wellnhofer
0f9817c75b Don't recurse into xi:include children in xmlXIncludeDoProcess
Otherwise, nested xi:include nodes might result in a use-after-free
if XML_PARSE_NOXINCNODE is specified.

Found with libFuzzer and ASan.
2020-08-06 14:29:33 +02:00
Nick Wellnhofer
5725c1153a Fix memory leak in xmlXIncludeIncludeNode error paths
Found with libFuzzer and ASan.
2020-08-06 14:29:24 +02:00
Nick Wellnhofer
ad26a60f95 Add XPath and XPointer fuzzer 2020-08-06 14:12:32 +02:00
Nick Wellnhofer
956534e02e Check for custom free function in global destructor
Calling a custom deallocation function in the global destructor could
cause all kinds of unexpected problems. See for example

    https://github.com/sparklemotion/nokogiri/issues/2059

Only clean up if memory is managed with malloc/free.
2020-08-04 19:27:13 +02:00
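The check described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the libxml2 source: `libFree`, `globalTable`, `libInit`, `libDestructor` and `countingFree` are hypothetical names standing in for a library's configurable deallocator, its global state, and a caller-installed allocator.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*FreeFunc)(void *);

/* Hypothetical stand-ins: the library's configurable deallocator
 * and one piece of global state it owns. */
static FreeFunc libFree = free;
static char *globalTable = NULL;

/* Hypothetical custom deallocator a host application might install. */
static int countingFreeCalls = 0;
static void countingFree(void *p) {
    countingFreeCalls++;
    free(p);
}

/* Allocate the global state; returns 1 on success, 0 on failure. */
static int libInit(void) {
    assert(globalTable == NULL);
    globalTable = malloc(64);
    return globalTable != NULL;
}

/* Cleanup intended to run at unload time. If a custom deallocator is
 * installed, the host runtime may already be shut down, so skip the
 * cleanup and leave the memory to the operating system.
 * Returns 1 if cleanup ran, 0 if it was skipped. */
static int libDestructor(void) {
    if (libFree != free)
        return 0;
    libFree(globalTable);
    globalTable = NULL;
    return 1;
}
```

With `libFree` set to `countingFree`, `libDestructor()` returns 0 and leaves `globalTable` untouched; with the default `free`, it releases the table and returns 1.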
Nick Wellnhofer
8e7c20a1af Fix integer overflow when comparing schema dates
Found by OSS-Fuzz.
2020-08-03 17:35:53 +02:00
Nick Wellnhofer
905820a44c Update fuzzing code
- Shorten timeouts
- Align options from Makefile and options files
- Add section headers to Makefile
- Skip invalid UTF-8 in regexp fuzzer
- Update regexp.dict
- Generate HTML seed corpus in correct format
2020-07-31 11:55:13 +02:00
Nick Wellnhofer
68eadabd00 Fix exponential runtime in xmlFARecurseDeterminism
In order to prevent visiting a state twice, states must be marked as
visited for the whole duration of graph traversal because states might
be reached by different paths. Otherwise state graphs like the
following can lead to exponential runtime:

  ->O-->O-->O-->O-->O->
     \ / \ / \ / \ /
      O   O   O   O

Reset the "visited" flag only after the graph was traversed.

xmlFAComputesDeterminism still has massive performance problems when
handling fuzzed input. By design, it has quadratic time complexity in
the number of reachable states. Some issues might also stem from
redundant epsilon transitions. With this fix, fuzzing regexes with a
maximum length of 100 becomes feasible at least.

Found with libFuzzer.
2020-07-31 11:55:13 +02:00
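The idea of keeping the "visited" flag set for the whole traversal can be illustrated with a small sketch. It is not the code from xmlregexp.c: the `State` type, `countReachable`, `resetVisited` and `buildDiamondChain` are hypothetical, and the graph is the chain of diamonds drawn in the commit message. If the flag were cleared when leaving a state, each diamond would double the number of paths explored, so 40 diamonds would mean about 2^40 visits.

```c
#include <assert.h>
#include <stddef.h>

#define DIAMONDS 40
#define NUM_STATES (2 * DIAMONDS + 1)

/* Hypothetical state type; not libxml2's regexp state. */
typedef struct State {
    struct State *next[2]; /* outgoing transitions, NULL if unused */
    int visited;           /* stays set until the traversal is over */
} State;

/* Count the states reachable from s, entering each one at most once.
 * The flag is NOT cleared on return, so a state that is reached again
 * through a different path is skipped instead of being re-explored. */
static size_t countReachable(State *s) {
    size_t n = 1;
    int i;

    if (s == NULL || s->visited)
        return 0;
    s->visited = 1;
    for (i = 0; i < 2; i++)
        n += countReachable(s->next[i]);
    return n;
}

/* Clear all flags in a separate pass after the traversal. */
static void resetVisited(State *states, size_t count) {
    size_t i;

    for (i = 0; i < count; i++)
        states[i].visited = 0;
}

/* Build the diamond chain: even indices are the upper row, odd
 * indices the lower row. states must hold NUM_STATES entries. */
static void buildDiamondChain(State *states) {
    size_t i;

    assert(states != NULL);
    for (i = 0; i < NUM_STATES; i++) {
        states[i].next[0] = states[i].next[1] = NULL;
        states[i].visited = 0;
    }
    for (i = 0; i < DIAMONDS; i++) {
        State *top = &states[2 * i];
        State *bottom = &states[2 * i + 1];
        State *nextTop = &states[2 * i + 2];

        top->next[0] = nextTop; /* straight edge along the upper row */
        top->next[1] = bottom;  /* detour through the lower row */
        bottom->next[0] = nextTop;
    }
}
```

Calling `countReachable` on the first state enters each of the 81 states once; a second call before `resetVisited` finds nothing new because the flags are still set.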
Nick Wellnhofer
1a360c1c2e More *NodeDumpOutput fixes
When leaving nodes, restrict more operations to XML_ELEMENT_NODEs.
2020-07-29 00:39:15 +02:00
Nick Wellnhofer
7b2e517261 Fix *NodeDumpOutput functions
Only output end tag for elements. Should fix serialization of document
fragments.
2020-07-28 21:52:55 +02:00
Nick Wellnhofer
dc6f009280 Make xmlNodeDumpOutputInternal non-recursive
Fixes stack overflow with deeply nested documents.
2020-07-28 21:00:09 +02:00
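A non-recursive tree dump of the kind this commit describes can be sketched with parent, child and sibling links. The `Node` type and `dumpTree` below are hypothetical and much simpler than the real serializer; each node writes its one-character name on entry and a '/' on exit, so nesting depth costs no stack space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type with the usual tree links. */
typedef struct Node {
    struct Node *parent;
    struct Node *children; /* first child */
    struct Node *next;     /* next sibling */
    char name;
} Node;

/* Serialize the subtree rooted at root into out without recursion.
 * Output that does not fit into outSize is silently dropped.
 * Returns the number of characters written, excluding the NUL. */
static size_t dumpTree(const Node *root, char *out, size_t outSize) {
    const Node *cur = root;
    size_t len = 0;

    assert(root != NULL && out != NULL && outSize > 0);

    while (cur != NULL) {
        /* Entering a node: write its start marker. */
        if (len + 1 < outSize)
            out[len++] = cur->name;
        if (cur->children != NULL) {
            cur = cur->children;
            continue;
        }
        /* Leaving: write end markers while climbing, until a node
         * with a pending sibling is found or the root is closed. */
        for (;;) {
            if (len + 1 < outSize)
                out[len++] = '/';
            if (cur == root) {
                cur = NULL; /* never follow siblings of the root */
                break;
            }
            if (cur->next != NULL) {
                cur = cur->next;
                break;
            }
            cur = cur->parent;
        }
    }
    out[len] = '\0';
    return len;
}
```

For a tree a(b(c), d) this writes "abc//d//". The loop stops at the node it started from, so siblings of the root are not serialized.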
Nick Wellnhofer
5330153da4 Make xhtmlNodeDumpOutput non-recursive
Fixes stack overflow with deeply nested documents.
2020-07-28 21:00:09 +02:00
Nick Wellnhofer
b79ab6e6d9 Make htmlNodeDumpFormatOutput non-recursive
Fixes stack overflow with deeply nested HTML documents.

Found by OSS-Fuzz.
2020-07-28 03:44:30 +02:00
Nick Wellnhofer
21ca8829a7 Don't try to handle namespaces when building HTML documents
Don't try to resolve namespace in xmlSAX2StartElement when parsing
HTML documents. This useless operation could slow down the parser
considerably.

Found by OSS-Fuzz.
2020-07-25 17:57:29 +02:00
Nick Wellnhofer
93ce33c2b8 Fix several quadratic runtime issues in HTML push parser
Fix a few remaining cases where the HTML push parser would scan more
content during lookahead than was parsed later.

Make sure that htmlParseDocTypeDecl consumes all content up to the
final '>' in case of errors. The old comment said "We shouldn't try to
resynchronize", but ignoring invalid content is also what the HTML5
spec mandates.

Likewise, make htmlParseEndTag skip to the final '>' in invalid end
tags even if not in recovery mode. This is probably the most visible
change in practice and leads to different output for some tests but is
also more in line with HTML5.

Make sure that htmlParsePI and htmlParseComment don't abort if invalid
characters are encountered but log an error and ignore the character.

Change some other end-of-buffer checks to test for a zero byte instead
of relying on IS_CHAR.

Fix usage of IS_CHAR macro in htmlParseScript.
2020-07-23 20:47:35 +02:00
Nick Wellnhofer
10d0947249 Fix .gitattributes
The files in 'test' and 'result' have mixed line endings, so disable
end-of-line conversion.
2020-07-23 20:46:42 +02:00
Nick Wellnhofer
173a0830dc Fix quadratic runtime when push parsing HTML start tags
Make sure that htmlParseStartTag doesn't terminate on characters for
which IS_CHAR_CH is false like control chars.

In htmlParseTryOrFinish, only switch to START_TAG if the next character
starts a valid name. Otherwise, htmlParseStartTag might return without
consuming all characters up to the final '>'.

Found by OSS-Fuzz.
2020-07-22 23:33:04 +02:00
David Kilzer
0e5c4fec15 Reset XML parser input before reporting errors
Apply changes to htmlParseChunk() in 13ba5b61 and 3f18e748 to
xmlParseChunk().
2020-07-19 14:10:33 +02:00
Nick Wellnhofer
6995eed077 Fix quadratic runtime when push parsing HTML entity refs
The HTML push parser would look ahead for characters in "; >/" to
terminate an entity reference but actual parsing could stop earlier,
potentially resulting in quadratic runtime.

Parse char data and references alternately in htmlParseTryOrFinish
and only look ahead once for a terminating '<' character.

Found by OSS-Fuzz.
2020-07-19 14:05:57 +02:00
Nick Wellnhofer
8e219b154e Fix HTML push parser lookahead
The parsing rules when looking for terminating chars or sequences in
the push parser differed from the actual parsing code. This could
cause the lookahead to overshoot and data to be rescanned,
potentially leading to quadratic runtime.

Comments must never be handled during lookahead. Attribute values must
only be skipped for start tags and doctype declarations, not for end
tags, comments, PIs and script content.
2020-07-15 16:44:36 +02:00
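Why rescanned data leads to quadratic runtime in a push parser can be shown with a small sketch. This is an illustration of the general problem, not the HTML parser's code: `PushBuf`, its `checkIndex` and `scanned` fields, and `lookahead` are hypothetical. The lookahead remembers how far it has already scanned, so each input byte is examined once even when data arrives one byte at a time. Starting every lookahead at offset 0 instead would examine 1 + 2 + ... + n bytes for n one-byte chunks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical push-parser buffer. */
typedef struct {
    const char *data;  /* all unconsumed input received so far */
    size_t len;        /* number of valid bytes in data */
    size_t checkIndex; /* bytes already covered by earlier lookaheads */
    size_t scanned;    /* total bytes examined, kept for illustration */
} PushBuf;

/* Search for the terminator, resuming where the last call stopped.
 * Returns the terminator's offset, or -1 if more data is needed. */
static long lookahead(PushBuf *b, char term) {
    size_t i;

    assert(b != NULL && b->checkIndex <= b->len);

    for (i = b->checkIndex; i < b->len; i++) {
        b->scanned++;
        if (b->data[i] == term) {
            b->checkIndex = 0; /* caller consumes the construct next */
            return (long)i;
        }
    }
    b->checkIndex = b->len; /* nothing found; do not rescan this part */
    return -1;
}
```

The saved offset only helps if the lookahead and the actual parsing code agree on where a construct ends, which is the mismatch the commit above addresses.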
Nick Wellnhofer
e050062ca9 Make htmlCurrentChar always translate U+0000
The general assumption is that htmlCurrentChar only returns 0 if the
end of the input buffer is reached. The UTF-8 path already logged an
error if a zero byte U+0000 was found and returned a space character
instead. Make the ASCII code path do the same.

htmlParseTryOrFinish skips zero bytes at the beginning of a buffer, so
even if 0 was returned from htmlCurrentChar, the push parser would make
progress. But rescanning the input could cause performance problems.

The pull parser would abort parsing and now handles zero bytes in ASCII
mode the same way as the push parser or as in UTF-8 mode.

It would be better to return the replacement character U+FFFD instead,
but some of the client code assumes that the UTF-8 length of input and
output matches.
2020-07-15 16:10:13 +02:00
Nick Wellnhofer
dfd4e33048 Rework control flow in htmlCurrentChar
Don't call xmlCurrentChar after switching encodings. Rearrange code
blocks and fall through to normal UTF-8 handling.
2020-07-15 16:10:13 +02:00
Nick Wellnhofer
922bebccdd Make 'xmllint --html --push -' read from stdin 2020-07-15 14:20:42 +02:00
Nick Wellnhofer
1493130ef2 Fix UTF-8 decoder in HTML parser
Like the XML parser, reject sequences starting with a continuation
byte as well as overlong sequences.

Also fixes an infinite loop in connection with previous commit 50078922
since htmlCurrentChar would return 0 even if not at the end of the
buffer.

Found by OSS-Fuzz.
2020-07-15 12:54:25 +02:00
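The two checks named in this commit, rejecting a leading continuation byte and rejecting overlong sequences, can be sketched in a standalone decoder. `decodeUtf8` is a hypothetical function, not the code of htmlCurrentChar. The rejection of truncated input, surrogates and values above U+10FFFF is an extra assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence from buf[0..len).
 * On success, store the code point in *cp and return the number of
 * bytes consumed (1 to 4). Return 0 for truncated, malformed or
 * overlong input. */
static size_t decodeUtf8(const unsigned char *buf, size_t len,
                         unsigned long *cp) {
    /* Smallest code point that needs a sequence of the given length. */
    static const unsigned long minValue[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    unsigned long val;
    size_t need, i;

    assert(buf != NULL && cp != NULL);

    if (len == 0)
        return 0;

    if (buf[0] < 0x80) {
        need = 1; val = buf[0];
    } else if (buf[0] < 0xC0) {
        return 0; /* continuation byte cannot start a sequence */
    } else if (buf[0] < 0xE0) {
        need = 2; val = buf[0] & 0x1F;
    } else if (buf[0] < 0xF0) {
        need = 3; val = buf[0] & 0x0F;
    } else if (buf[0] < 0xF8) {
        need = 4; val = buf[0] & 0x07;
    } else {
        return 0; /* 0xF8..0xFF never appear in UTF-8 */
    }

    if (need > len)
        return 0; /* truncated: never read past the buffer */

    for (i = 1; i < need; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            return 0; /* expected a continuation byte */
        val = (val << 6) | (buf[i] & 0x3F);
    }

    if (val < minValue[need])
        return 0; /* overlong encoding */
    if (val > 0x10FFFF || (val >= 0xD800 && val <= 0xDFFF))
        return 0; /* outside the Unicode scalar value range */

    *cp = val;
    return need;
}
```

A return value of 0 signals an error rather than end of input, so a caller can tell the two apart by checking the remaining length first.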