1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-26 20:25:14 +03:00
Commit Graph

116 Commits

Author SHA1 Message Date
Nick Wellnhofer
da996c8d0f uri: Report malloc failures
Fix many places where malloc failures weren't reported, for example
after calling xmlStrdup.

Introduce new public API functions that return a separate error code if
a memory allocation fails:

- xmlParseURISafe
- xmlBuildURISafe
- xmlBuildRelativeURISafe

Update the fuzzer to check whether malloc failures are reported.
2023-12-11 22:05:47 +01:00
Nick Wellnhofer
ec7f65069a tests: Fix tests --with-valid --without-xinclude
Fix a copy/paste error from commit 4eba9f9c.

Fixes #632.
2023-11-27 18:03:01 +01:00
Nick Wellnhofer
4f132bcdb3 fuzz: Raise rss_limit_mb 2023-10-15 13:04:54 +02:00
Nick Wellnhofer
c13a019134 fuzz: Test xmlTextReaderRead after EOF or failure 2023-10-15 13:04:54 +02:00
Nick Wellnhofer
e019d97fd0 fuzz: Test XML_PARSE_XINCLUDE | XML_PARSE_VALID 2023-10-15 13:04:54 +02:00
Nick Wellnhofer
fa48187304 fuzz: Disable XML_PARSE_SAX1 option in xml fuzzer
There a no plans to fix quadratic behavior in the legacy SAX1 interface.
2023-09-30 14:45:53 +02:00
Nick Wellnhofer
b7d56ef7f1 malloc-fail: Report malloc failure in xmlRegEpxFromParse
Also check whether malloc failures are reported when fuzzing.
2023-09-22 19:53:11 +02:00
Nick Wellnhofer
f98fa86318 regexp: Fix status codes and handle invalid UTF-8
Fixes #561.
2023-09-22 19:01:11 +02:00
Nick Wellnhofer
f9d717af97 fuzz: Allow to fuzz without push, reader or output modules 2023-09-21 13:05:49 +02:00
Nick Wellnhofer
da274bfa55 build: Fix build when certain modules are disabled 2023-09-21 02:26:43 +02:00
Nick Wellnhofer
834b8123ef parser: Stream data when reading from memory
Don't create a copy of the whole input buffer. Read the data chunk by
chunk to save memory.

Historically, it was probably envisioned to read data from memory
without additional copying. This doesn't work reliably with the current
design of the XML parser which requires a terminating null byte at the
end of input buffers. This lead to xmlReadMemory interfaces, which
expect pointer and size arguments, being changed to make a
zero-terminated copy of the input buffer. Interfaces based on
xmlReadDoc, which actually expect a zero-terminated string and
would make zero-copy operation work, were then simplified to rely on
xmlReadMemoryi, resulting in an unnecessary copy.

To avoid copying (possibly gigabytes) of memory temporarily, we now
stream in-memory input just like content read from files in a
chunk-by-chunk fashion (using a somewhat outdated INPUT_CHUNK size of
250 bytes). As a side effect, we also avoid another copy of the whole
input when handling non-UTF-8 data which was made possible by some
earlier commits.

Interfaces expecting zero-terminated strings now make use of strnlen
which unfortunately isn't part of the standard C library and only
mandated since POSIX 2008.
2023-08-08 15:21:28 +02:00
Nick Wellnhofer
5f4ec41bae fuzz: Add valid.options 2023-03-12 19:47:07 +01:00
Nick Wellnhofer
f6fddb78a5 fuzz: Also test init function of URI fuzzer 2023-03-12 16:20:31 +01:00
Nick Wellnhofer
4eba9f9cfc fuzz: Separate fuzzer for DTD validation 2023-03-12 16:19:33 +01:00
Nick Wellnhofer
42322eba82 fuzz: Inject random malloc failures
Fixes #344.
2023-03-08 14:14:22 +01:00
Nick Wellnhofer
7cd2676277 fuzz: Add maxAlloc item to static seed corpus 2023-03-08 14:07:15 +01:00
Nick Wellnhofer
541b1e2850 fuzz: Support variable integer sizes in fuzz data
Also switch to big-endian.
2023-03-08 13:59:00 +01:00
Nick Wellnhofer
f560065f4d fuzz: Fix duplicate detection in fuzzEntityRecorder
Store a non-NULL value in the hash.
2023-02-28 21:23:11 +01:00
Nick Wellnhofer
791a1e80b9 fuzz: Set filename in xmlFuzzEntityLoader 2023-02-28 21:23:11 +01:00
Nick Wellnhofer
cbd9c6c5af fuzz: Allow xmlFuzzReadString(NULL) 2023-02-28 21:23:11 +01:00
Nick Wellnhofer
aa6b7ed1ed fuzz: Fix Makefile dependencies 2023-02-28 21:23:11 +01:00
Nick Wellnhofer
d1272c2ed6 fuzz: Add xinclude to .gitignore 2023-02-13 11:16:57 +01:00
Nick Wellnhofer
ba910d344f fuzz: Add test/recurse to seed corpus 2022-12-26 18:12:26 +01:00
Nick Wellnhofer
09dac45ab9 fuzz: Add separate XInclude fuzzer
XIncludes involve XPath processing which can still lead to timeouts when
fuzzing. This will probably take a while to fix. The rest of the XML
parsing code should hopefully run without timeouts now. OSS-Fuzz only
shows a single timeout test case, so separate the XInclude from the core
XML fuzzer.
2022-12-26 18:12:26 +01:00
Nick Wellnhofer
c885bebb5d fuzz: Remove size limit, disable XInclude
Now that entity expansion issues should be fixed, we should get more
interesting timeout errors from OSS-Fuzz. Disable XInclude for now,
since it often timeouts in XPath computations. The XInclude tests should
be moved to a separate fuzz target.
2022-12-23 23:12:52 +01:00
Nick Wellnhofer
9aba613b14 fuzz: Add new XInclude test directory to corpus 2022-10-31 17:09:54 +01:00
Nick Wellnhofer
128c0261c6 warnings: Fix -Wstrict-prototypes warning 2022-10-25 19:34:38 +02:00
Nick Wellnhofer
513d65fee8 Use AM_CFLAGS and AM_LDFLAGS consistently 2022-09-02 18:33:36 +02:00
Nick Wellnhofer
d0ab5c4fe6 Fix compiler warnings in fuzzing code 2022-09-02 18:33:36 +02:00
Nick Wellnhofer
4612ce3031 Implement xpath1() XPointer scheme
See https://www.w3.org/2005/04/xpointer-schemes/
2022-04-21 04:26:52 +02:00
Nick Wellnhofer
3f74e42bae Simplify 'make check' targets 2022-04-04 05:41:51 +02:00
Nick Wellnhofer
95c7f315ab Move SVG tests to runtest.c
Also update the test results for the first time since 2000.
2022-04-04 04:18:07 +02:00
Nick Wellnhofer
7016b0e099 Don't overlink executables
With very few exceptions, utilities and test programs don't require any
external libraries.

- xmllint and xmlcatalog need libreadline
- runtest and testThreads need pthreads
2022-04-03 14:08:43 +02:00
David Seifert
5c71ada83a
Detect libm using libtool's macros 2022-03-30 16:51:17 +02:00
Nick Wellnhofer
6117700e2c Remove special configuration for certain maintainers 2022-02-20 21:49:05 +01:00
Nick Wellnhofer
d19bab68f4 Fix fuzz/.gitignore after fixing VPATH build 2022-02-19 19:26:42 +01:00
Nick Wellnhofer
8626648790 Fix fuzzer test with VPATH build
Also fixes make distcheck.
2022-02-14 18:06:38 +01:00
Nick Wellnhofer
be889b6581 Make xmlFuzzReadString return a zero size in error case
Avoids use of uninitialized memory.
2022-02-12 15:54:54 +01:00
Daniel Veillard
b48e77cf4f Release of libxml2-2.9.12
Brown paper bag release, some recently added sources were missing from
the 2.9.11 tarball:
- configure.ac: bump version
- fuzz/Makefile.am: add fuzz.h and seed/regexp to EXTRA_DIST
2021-05-13 20:56:16 +02:00
Nick Wellnhofer
8446d4593e Reduce some fuzzer timeouts
OSS-Fuzz has been fuzzing the HTML parser with inputs up to 1 MB for
several hundred hours without hitting the 20s timeout. It seems that
most timeouts resulting from accidentally quadratic behavior in the
HTML parser have been fixed. Start to gradually reduce the timeout to
find new performance issues.
2021-03-01 20:56:40 +01:00
Nick Wellnhofer
85c817a200 Improve fuzzer stability
- Add more calls to xmlInitializeCatalog.
- Call xmlResetLastError after fuzzing each input.
2021-02-22 22:29:28 +01:00
Nick Wellnhofer
f9ccb3b818 Check for feature flags in fuzzer tests 2021-02-22 22:29:28 +01:00
Nick Wellnhofer
7a90bdfae6 Another attempt at improving fuzzer stability
xmlInitializeCatalog is not called from xmlInitParser.
2021-02-22 17:58:06 +01:00
Nick Wellnhofer
0fb3ae5840 Revert "Improve HTML fuzzer stability"
This reverts commit de1b51eddc.
2021-02-22 17:31:05 +01:00
Nick Wellnhofer
0987001c1b Add charset names to fuzzing dictionaries 2021-02-22 13:21:38 +01:00
Nick Wellnhofer
de1b51eddc Improve HTML fuzzer stability
Call htmlInitAutoClose during fuzzer initialization to fix stability
issue. Leave a note concerning problems with this function.
2021-02-22 13:21:38 +01:00
Nick Wellnhofer
ec808a4415 Speed up HTML fuzzer
htmlDocDumpMemory uses the "HTML" encoding if no other encoding was
specified in the source HTML. This encoding can be extremely slow
because of an inefficiency in htmlEntityValueLookup. Stop encoding
the output for now.
2021-02-07 14:39:55 +01:00
Nick Wellnhofer
e2b975c317 Handle malloc failures in fuzzing code
Avoid misdiagnosis in OOM situations.
2020-12-18 14:10:13 +01:00
Nick Wellnhofer
9086988ffa Enforce maximum length of fuzz input
Remove the libfuzzer max_len option which doesn't apply to other
fuzzing engines. Enforce the maximum length directly in the fuzz
targets. For the xml target, lower the maximum when expanding entities
to avoid timeout and OOM errors.
2020-12-16 16:12:07 +01:00
Nick Wellnhofer
8a85263f13 Add fuzzing dictionaries to EXTRA_DIST
Also add static seed corpus for the URI fuzzer.
2020-10-25 20:08:16 +01:00
Nick Wellnhofer
6f1470a5d6 Hardcode maximum XPath recursion depth
Always limit nested functions calls to 5000. This avoids call stack
overflows with deeply nested expressions.

The expression parser produces about 10 nested function calls when
parsing a subexpression in parentheses, so the effective nesting limit
is about 500 which should be more than enough.

Use a lower limit when fuzzing to account for increased memory usage
when using sanitizers.
2020-08-26 00:22:25 +02:00
Nick Wellnhofer
8c3ef083ca Pass URL of main entity in XML fuzzer 2020-08-24 23:17:34 +02:00
Nick Wellnhofer
0d5f3710fb Consolidate seed corpus generation
Implement file handling in C to speed up corpus generation.
2020-08-24 21:14:55 +02:00
Nick Wellnhofer
0d9da0290c Test fuzz targets with dummy driver
Run fuzz targets with files in seed corpus during test.
2020-08-24 03:57:03 +02:00
Nick Wellnhofer
804c52978f Stop using maxParserDepth in xpath.c
Only use a single maxDepth value.
2020-08-17 03:39:51 +02:00
Nick Wellnhofer
0ff527482d Fix autotools warnings 2020-08-17 02:54:28 +02:00
Nick Wellnhofer
10a0794878 Fix XPath fuzzer 2020-08-08 17:46:11 +02:00
Nick Wellnhofer
6c128fd58a Fuzz XInclude engine 2020-08-08 14:32:44 +02:00
Nick Wellnhofer
ad26a60f95 Add XPath and XPointer fuzzer 2020-08-06 14:12:32 +02:00
Nick Wellnhofer
905820a44c Update fuzzing code
- Shorten timeouts
- Align options from Makefile and options files
- Add section headers to Makefile
- Skip invalid UTF-8 in regexp fuzzer
- Update regexp.dict
- Generate HTML seed corpus in correct format
2020-07-31 11:55:13 +02:00
Nick Wellnhofer
93ce33c2b8 Fix several quadratic runtime issues in HTML push parser
Fix a few remaining cases where the HTML push parser would scan more
content during lookahead than being parsed later.

Make sure that htmlParseDocTypeDecl consumes all content up to the
final '>' in case of errors. The old comment said "We shouldn't try to
resynchronize", but ignoring invalid content is also what the HTML5
spec mandates.

Likewise, make htmlParseEndTag skip to the final '>' in invalid end
tags even if not in recovery mode. This is probably the most visible
change in practice and leads to different output for some tests but is
also more in line with HTML5.

Make sure that htmlParsePI and htmlParseComment don't abort if invalid
characters are encountered but log an error and ignore the character.

Change some other end-of-buffer checks to test for a zero byte instead
of relying on IS_CHAR.

Fix usage of IS_CHAR macro in htmlParseScript.
2020-07-23 20:47:35 +02:00
Nick Wellnhofer
eac1c7e2e5 Fuzz target for XML Schemas
This only tests the schema parser for now.
2020-06-23 16:20:27 +02:00
Nick Wellnhofer
ffd31dbefd Move entity recorder to fuzz.c 2020-06-21 12:15:46 +02:00
Nick Wellnhofer
536f421d37 Fuzz target for HTML parser 2020-06-15 15:23:38 +02:00
Nick Wellnhofer
e98150d444 Add options file for xml fuzzer
This will be picked up OSS-Fuzz, limiting the maximum input size to
80 KB and hopefully avoiding timeouts. Some of the timeouts seem to be
related to our suboptimal handling of excessive entity expansion.
The new fuzzers support external entities and make this problem even
more prominent.
2020-06-09 13:53:06 +02:00
Nick Wellnhofer
00ed736eec Add a couple of libFuzzer targets
- XML fuzzer
  Currently tests the pull parser, push parser and reader, as well as
  serialization. Supports splitting fuzz data into multiple documents
  for things like external DTDs or entities. The seed corpus is built
  from parts of the test suite.

- Regexp fuzzer
  Seed corpus was statically generated from test suite.

- URI fuzzer
  Tests parsing and most other functions from uri.c.
2020-06-05 13:53:11 +02:00