mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-27 04:55:04 +03:00
5084 Commits

Author SHA1 Message Date
Nick Wellnhofer
de1b51eddc Improve HTML fuzzer stability
Call htmlInitAutoClose during fuzzer initialization to fix stability
issue. Leave a note concerning problems with this function.
2021-02-22 13:21:38 +01:00
Markus Rickert
09320f0551 Add CI for MSVC x86 2021-02-21 14:26:40 +01:00
Nick Wellnhofer
dcb80b92da Fix slow parsing of HTML with encoding errors
Under certain circumstances, the HTML parser would try to guess and
switch input encodings multiple times, leading to slow processing of
documents with encoding errors. The repeated scanning of the input
buffer when guessing encodings could even lead to quadratic behavior.

The code in htmlCurrentChar probably assumed that if there's an encoding
handler, it is guaranteed to produce valid UTF-8. This holds true in
general, but if the detected encoding was "UTF-8", the UTF8ToUTF8
encoding handler simply invoked memcpy without checking for invalid
UTF-8. This still must be fixed, preferably by not using this handler
at all.

Also leave a note that switching encodings twice seems impossible to
implement correctly. Add a check when handling UTF-8 encoding errors
in htmlCurrentChar to avoid this situation, even if encoders produce
invalid UTF-8.

Found by OSS-Fuzz.
2021-02-20 21:28:56 +01:00
hhb
02bee4c414 Add a flag to not output anything when xmllint succeeded 2021-02-20 16:49:45 +01:00
Simon Josefsson
4defa2c24a Fix warnings in libxml.m4 with autoconf 2.70+.
Closes #219.
2021-02-20 16:48:20 +01:00
Nick Wellnhofer
cbe1212db6 Fix null deref introduced with previous commit
Found by OSS-Fuzz.
2021-02-09 17:07:21 +01:00
Nick Wellnhofer
01411e7c5e Check for invalid redeclarations of predefined entities
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.

Note that some test cases declared

    <!ENTITY lt "<">

But the XML spec clearly states that this is illegal:

> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.

Also fixes #217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.

As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
2021-02-08 21:51:26 +01:00
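The rule quoted from section 4.6 can be illustrated with a small standalone check. This is only a sketch, not the code from the commit: the function names are hypothetical, and it assumes the caller passes the entity's replacement text (the literal value after one round of character reference expansion). For lt and amp only a character reference is accepted; for gt, apos and quot the bare character is accepted as well.

```c
#include <assert.h>
#include <string.h>

/* Character a predefined entity stands for, or 0 if `name` is not predefined. */
static int predefined_char(const char *name) {
    if (strcmp(name, "lt") == 0)   return '<';
    if (strcmp(name, "gt") == 0)   return '>';
    if (strcmp(name, "amp") == 0)  return '&';
    if (strcmp(name, "apos") == 0) return '\'';
    if (strcmp(name, "quot") == 0) return '"';
    return 0;
}

/* Returns 1 if `text` is exactly one reference (&#NN; or &#xHH;) to `c`. */
static int is_char_ref_to(const char *text, int c) {
    const char *p;
    long value = 0;
    int base = 10, ndigits = 0;

    if (text[0] != '&' || text[1] != '#') return 0;
    p = text + 2;
    if (*p == 'x') { base = 16; p++; }
    for (; *p != ';'; p++) {
        int d;
        if (*p >= '0' && *p <= '9') d = *p - '0';
        else if (base == 16 && *p >= 'a' && *p <= 'f') d = *p - 'a' + 10;
        else if (base == 16 && *p >= 'A' && *p <= 'F') d = *p - 'A' + 10;
        else return 0;                 /* also rejects an unterminated reference */
        value = value * base + d;
        if (value > 0x10FFFF) return 0; /* outside the Unicode range */
        ndigits++;
    }
    return ndigits > 0 && p[1] == '\0' && value == c;
}

/* Hypothetical helper: 1 if declaring `name` with this replacement text is
 * allowed, 0 if it is an invalid redeclaration of a predefined entity. */
int redeclaration_is_valid(const char *name, const char *replacement) {
    int c;
    assert(name != NULL && replacement != NULL);
    c = predefined_char(name);
    if (c == 0) return 1;                      /* not a predefined entity */
    if (is_char_ref_to(replacement, c)) return 1;
    /* gt, apos and quot may also be declared as the bare character. */
    if (c != '<' && c != '&' && replacement[0] == c && replacement[1] == '\0')
        return 1;
    return 0;
}
```

With this sketch, the declaration `<!ENTITY lt "<">` from the test cases is rejected, while `<!ENTITY lt "&#38;#60;">` (replacement text `&#60;`) is accepted.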
SVGAnimate
07920b4381 Add the copy of type from original xmlDoc in xmlCopyDoc()
A bug related to PHP DOMDocument:

    https://bugs.php.net/bug.php?id=80665

When copying or cloning an HTML document, the xmlDoc->type changes from
XML_HTML_DOCUMENT_NODE to XML_DOCUMENT_NODE.
2021-02-08 17:17:09 +01:00
Markus Rickert
2065d34090 Add CI for CMake on MSVC 2021-02-08 17:15:44 +01:00
Mike Dalessio
afad37216b parser.c: shrink the input buffer when appropriate
Fixes GNOME/libxml2#200

Also see discussions at:
- GNOME/libxml2#192
- https://gitlab.gnome.org/nwellnhof/libxml2/-/commit/99bda1e
- https://github.com/sparklemotion/nokogiri/issues/2132
2021-02-08 17:14:35 +01:00
Nick Wellnhofer
ec808a4415 Speed up HTML fuzzer
htmlDocDumpMemory uses the "HTML" encoding if no other encoding was
specified in the source HTML. This encoding can be extremely slow
because of an inefficiency in htmlEntityValueLookup. Stop encoding
the output for now.
2021-02-07 14:39:55 +01:00
Nick Wellnhofer
e6495e4789 Remove unused encoding parameter of HTML output functions
The encoding string is unused. Encodings are set by way of the output
buffer.
2021-02-07 14:39:55 +01:00
Nick Wellnhofer
954696e7cf Fix infinite loop in HTML parser introduced with recent commits
Check for XML_PARSER_EOF to avoid an infinite loop introduced with
recent changes to the HTML push parser.

Found by OSS-Fuzz.
2021-02-07 14:38:55 +01:00
Nick Wellnhofer
acb3566739 Fix quadratic runtime when parsing CDATA sections
Use optimized concatenation for CDATA sections in addition to normal
text. This also affects HTML script content.

Found by OSS-Fuzz.
2021-02-03 13:57:26 +01:00
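Why concatenation strategy matters here: if every chunk of text is appended by reallocating to the exact new size and copying, n chunks cost O(n^2) in total. A common remedy is to grow the capacity geometrically. The sketch below shows that general technique with a hypothetical TextBuf type; it is not the parser's actual code, which may track lengths differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable text buffer; zero-initialize before first use. */
typedef struct {
    char  *data;
    size_t len;  /* bytes used, excluding the terminating NUL */
    size_t cap;  /* bytes allocated */
} TextBuf;

/* Appends n bytes of chunk. Returns 0 on success, -1 on failure.
 * Capacity doubles when exhausted, so a long series of appends
 * performs a linear amount of copying overall. */
int textbuf_append(TextBuf *buf, const char *chunk, size_t n) {
    size_t need;

    assert(buf != NULL && (chunk != NULL || n == 0));
    if (n > SIZE_MAX - buf->len - 1) return -1;   /* size overflow */
    need = buf->len + n + 1;
    if (need > buf->cap) {
        size_t cap = buf->cap ? buf->cap : 16;
        char *p;
        while (cap < need) {
            if (cap > SIZE_MAX / 2) { cap = need; break; }
            cap *= 2;
        }
        p = realloc(buf->data, cap);
        if (p == NULL) return -1;                 /* old buffer stays valid */
        buf->data = p;
        buf->cap = cap;
    }
    if (n > 0) memcpy(buf->data + buf->len, chunk, n);
    buf->len += n;
    buf->data[buf->len] = '\0';
    return 0;
}
```

The caller owns `data` and releases it with `free()` when done.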
Markus Rickert
f93ca3e140 Update minimum required CMake version 2021-01-15 18:31:20 +01:00
Markus Rickert
0048728916 Add variables for configured options to CMake config files 2021-01-05 22:03:47 +01:00
Markus Rickert
95519737af Check if variables exist when defining targets 2021-01-05 22:03:47 +01:00
Markus Rickert
c26e45259c Check if target exists when reading target properties 2021-01-05 22:03:47 +01:00
Markus Rickert
ec11987592 Add xmlcatalog target and definition to config files 2021-01-05 22:03:47 +01:00
Markus Rickert
2377a312b9 Remove include directories for link-only dependencies 2021-01-05 22:03:47 +01:00
Markus Rickert
26835480dc Fix ICU build in CMake 2021-01-05 22:03:47 +01:00
Markus Rickert
296ab61e1c Configure pkgconfig, xml2-config, and xml2Conf.sh file 2021-01-05 22:03:47 +01:00
Nick Wellnhofer
79301d3d5e Fix timeout when handling recursive entities
Abort parsing early to avoid an almost infinite loop in certain error
cases involving recursive entities.

Found with libFuzzer.
2020-12-18 14:13:46 +01:00
Nick Wellnhofer
45da175c14 Fix memory leak in xmlParseElementMixedContentDecl
Free parsed content if malloc fails to avoid a memory leak.

Found with libFuzzer.
2020-12-18 14:11:58 +01:00
Nick Wellnhofer
1d73f07d67 Fix null deref in xmlStringGetNodeList
Check for malloc failure to avoid null deref.

Found with libFuzzer.
2020-12-18 14:10:59 +01:00
Nick Wellnhofer
e2b975c317 Handle malloc failures in fuzzing code
Avoid misdiagnosis in OOM situations.
2020-12-18 14:10:13 +01:00
Mike Dalessio
a67b63d183 use new htmlParseLookupCommentEnd to find comment ends
Note that the caret in error messages generated during comment parsing
may have moved by one byte.

See guidance provided on incorrectly-closed comments here:

https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
2020-12-16 16:12:07 +01:00
Mike Dalessio
29f5d20e84 htmlParseComment: treat --!> as if it closed the comment
See guidance provided on incorrectly-closed comments here:

https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment
2020-12-16 16:12:07 +01:00
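A minimal sketch of the lookup described in the two commits above, assuming a NUL-terminated input and a hypothetical function name (the real parser works on its own input buffer and is not reproduced here). It accepts both the regular terminator `-->` and the incorrectly-closed form `--!>`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the offset just past the comment
 * terminator in s, or -1 if the comment is not closed. */
long find_comment_end(const char *s) {
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        /* Each index is read only after the previous byte was seen to be
         * non-NUL, so the scan never runs past the terminating NUL. */
        if (s[i] == '-' && s[i + 1] == '-') {
            if (s[i + 2] == '>')
                return (long)(i + 3);
            if (s[i + 2] == '!' && s[i + 3] == '>')
                return (long)(i + 4);
        }
    }
    return -1;
}
```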
Mike Dalessio
e28d9347bc add test coverage for incorrectly-closed comments
this establishes the baseline behavior so that subsequent commits
which modify this behavior are clear about what's being changed.
2020-12-16 16:12:07 +01:00
Nick Wellnhofer
9086988ffa Enforce maximum length of fuzz input
Remove the libfuzzer max_len option which doesn't apply to other
fuzzing engines. Enforce the maximum length directly in the fuzz
targets. For the xml target, lower the maximum when expanding entities
to avoid timeout and OOM errors.
2020-12-16 16:12:07 +01:00
Nick Wellnhofer
1fe385304f Remove temporary members from struct _xmlXPathContext
These values are hardcoded now and the struct members, while public,
were recently introduced and never part of an official release.
2020-12-16 15:27:13 +01:00
Nick Wellnhofer
8ca3a59b2e Fix integer overflow in xmlSchemaGetParticleTotalRangeMin
The function is only used once and its return value is only checked for
zero. Disable the function like its Max counterpart and add an
implementation for the special case.

Found by OSS-Fuzz.
2020-12-15 20:14:28 +01:00
Xiaoming Ni
649d02eaa4 encoding: fix memleak in xmlRegisterCharEncodingHandler()
The return type of xmlRegisterCharEncodingHandler() is void, so the caller
cannot determine whether the call succeeded. When
nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, the "handler" is not added
to the array "handlers". As a result, the memory of "handler" can no longer
be managed or released, and it leaks.

So add "xmlFree(handler)" to fix the memory leak on the failure branch of
xmlRegisterCharEncodingHandler().

Reported-by: wuqing <wuqing30@huawei.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
2020-12-07 14:38:14 +01:00
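The ownership problem described above can be sketched with a toy registry. All names here are hypothetical and the table size is arbitrary. Unlike the real function, this version returns a status code to make the failure branch visible, and `live_handlers` exists only so that a leak can be observed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HANDLERS 4            /* arbitrary size for the sketch */

typedef struct { char *name; } Handler;

static Handler *handlers[MAX_HANDLERS];
static int nb_handlers = 0;
static int live_handlers = 0;     /* allocation counter for the demo */

Handler *handler_new(const char *name) {
    Handler *h;
    size_t n;

    assert(name != NULL);
    h = malloc(sizeof *h);
    if (h == NULL) return NULL;
    n = strlen(name) + 1;
    h->name = malloc(n);
    if (h->name == NULL) { free(h); return NULL; }
    memcpy(h->name, name, n);
    live_handlers++;
    return h;
}

static void handler_free(Handler *h) {
    free(h->name);
    free(h);
    live_handlers--;
}

/* Takes ownership of h in every case. When the table is full, h is freed
 * here, because the caller has no other way to learn that it was not
 * stored. Returns 0 on success, -1 on failure. */
int handler_register(Handler *h) {
    if (h == NULL) return -1;
    if (nb_handlers >= MAX_HANDLERS) {
        handler_free(h);          /* failure branch: release, do not leak */
        return -1;
    }
    handlers[nb_handlers++] = h;
    return 0;
}
```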
Xiaoming Ni
cb7a572b3e xmlschemastypes.c: xmlSchemaGetFacetValueAsULong add, check "facet->val"
xmlSchemaGetFacetValueAsULong() is an external API, so the validity of
its input parameters must be strictly verified. Before accessing
"facet->val->value", we need to check whether "facet->val" is a null
pointer.

Signed-off-by: wuqing <wuqing30@huawei.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
2020-12-07 14:37:55 +01:00
Markus Rickert
84b76d99f1 Update CMake config files 2020-12-07 14:37:23 +01:00
Markus Rickert
d0ccb3a6b6 Add xmlcatalog and xmllint to CMake export 2020-12-07 14:37:18 +01:00
Nick Wellnhofer
acdc2ff360 Simplify xmlexports.h
All the compiler switches essentially set the same macros. The only
exception was MSVC which omitted the "extern" keyword for exported
variables. This in turn broke clang-cl.

This commit rewrites and simplifies the whole header.

Closes #163.
2020-12-06 17:31:38 +01:00
Nick Wellnhofer
a218ff0ec0 Fix null pointer deref in xmlXPtrRangeInsideFunction
Found by OSS-Fuzz.
2020-12-06 17:26:36 +01:00
Nick Wellnhofer
94c2e415a9 Fix quadratic runtime in HTML push parser with null bytes
Null bytes in the input stream do not necessarily signal an EOF
condition. Check the stream pointers for EOF to avoid quadratic
rescanning of input data.

Note that the CUR_CHAR macro used in functions like htmlParseCharData
calls htmlCurrentChar which translates null bytes.

Found by OSS-Fuzz.
2020-12-06 16:44:11 +01:00
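The distinction between a null byte and a real end of input can be shown with a pointer-based check. This is a sketch with a hypothetical Input type, not the parser's own structures: end of input is decided by comparing `cur` with `end`, so an embedded null byte is consumed like any other character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input window: [cur, end) holds the unread bytes. */
typedef struct {
    const unsigned char *cur;
    const unsigned char *end;
} Input;

/* EOF is a property of the pointers, not of the byte value at cur. */
static int input_at_eof(const Input *in) {
    return in->cur >= in->end;
}

/* Consumes character data up to the next '<' or the real end of input.
 * Returns the number of bytes consumed, embedded null bytes included. */
size_t skip_char_data(Input *in) {
    size_t n = 0;

    assert(in != NULL && in->cur != NULL && in->cur <= in->end);
    while (!input_at_eof(in) && *in->cur != '<') {
        in->cur++;
        n++;
    }
    return n;
}
```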
Markus Rickert
1c4f9a6db5 Require dependencies based on enabled CMake options 2020-11-30 12:43:48 +01:00
Michael Matz
faea2fa9b8 Avoid quadratic checking of identity-constraints
key/unique/keyref schema attributes currently use quadratic loops
to check their various constraints (that keys are unique and that
keyrefs refer to existing keys).  That becomes extremely slow if
there are many elements with keys.  This happens in the wild with
e.g. the OVAL XML descriptions of security patches.  You need the
openscap schemata, and then an example xml file:

% zypper in openscap-utils
% wget ftp://ftp.suse.com/pub/projects/security/oval/opensuse.leap.15.1.xml
% time xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null
opensuse.leap.15.1.xml validates

real    16m59,857s
user    16m55,787s
sys     0m1,060s

This patch makes libxml use a hash table to avoid the quadratic
behaviour.  The existing hash table only accepts strings as keys, so
we're mostly reusing the canonical representation of key values to derive
such strings (with the caveat given in a comment).  The alternative
would be to rework the hash table code to accept either numbers or free
functions as hash workers, but the code is fast enough as is.

With the patch we have this then:

% time LD_LIBRARY_PATH=./libxml2/.libs/ ./libxml2/.libs/xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null
opensuse.leap.15.1.xml validates

real    0m3,531s
user    0m3,427s
sys     0m0,103s

So, a ~300x speedup.  This patch survives 'make check' and 'make tests'.
2020-11-30 11:22:54 +01:00
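The idea behind the speedup can be sketched independently of libxml2: instead of comparing every key against every other key, insert the canonical string form of each key into a hash table and report a duplicate when an equal string is already present. The code below is a self-contained illustration with hypothetical names and a simple chained table; it does not use libxml2's own hash table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct Entry {
    const char   *key;
    struct Entry *next;
} Entry;

/* djb2-style string hash; any reasonable string hash would do. */
static unsigned long hash_str(const char *s) {
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Returns 1 if two keys are equal, 0 if all are unique, -1 on allocation
 * failure. Expected cost is linear in n, versus n*(n-1)/2 string
 * comparisons for the nested-loop approach. */
int has_duplicate_key(const char *const *keys, size_t n) {
    size_t nbuckets, i;
    Entry **buckets;
    Entry *pool;
    int result = 0;

    assert(keys != NULL || n == 0);
    if (n > SIZE_MAX / 2) return -1;
    nbuckets = n ? n * 2 : 1;
    buckets = calloc(nbuckets, sizeof *buckets);
    pool = calloc(n ? n : 1, sizeof *pool);
    if (buckets == NULL || pool == NULL) {
        free(buckets);
        free(pool);
        return -1;
    }
    for (i = 0; i < n && result == 0; i++) {
        size_t b = hash_str(keys[i]) % nbuckets;
        Entry *e;
        for (e = buckets[b]; e != NULL; e = e->next) {
            if (strcmp(e->key, keys[i]) == 0) { result = 1; break; }
        }
        if (result == 0) {
            /* Not seen before: link entry i into its bucket. */
            pool[i].key = keys[i];
            pool[i].next = buckets[b];
            buckets[b] = &pool[i];
        }
    }
    free(buckets);
    free(pool);
    return result;
}
```

The same table can serve keyref checks: once all keys are stored, each keyref becomes a single lookup rather than a scan over all keys.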
Markus Rickert
8272db5318 Use NAMELINK_COMPONENT in CMake install 2020-11-30 11:22:54 +01:00
Markus Rickert
5c7bdbc906 Add CMake files to EXTRA_DIST 2020-11-30 11:22:53 +01:00
Markus Rickert
7a62870a3c Add missing compile definition for static builds to CMake 2020-11-30 11:08:14 +01:00
Markus Rickert
e028d29379 Add CI for CMake on Linux and MinGW 2020-11-30 11:07:46 +01:00
Frederik Seiffert
b516ed189e Fix building with ICU 68.
ICU 68 no longer defines the TRUE macro.

Closes #204.
2020-11-19 18:10:32 +01:00
Victor Stinner
ac5e99911a Convert python/libxml.c to PY_SSIZE_T_CLEAN
Define PY_SSIZE_T_CLEAN macro in python/libxml.c and cast the string
length (int len) explicitly to Py_ssize_t when passing a string to a
function call using PyObject_CallMethod() with the "s#" format.
2020-11-19 18:09:22 +01:00
Victor Stinner
f42a0524c6 Build the Python extension with PY_SSIZE_T_CLEAN
The Python extension module now uses Py_ssize_t rather than int for
string lengths. This change makes the extension compatible with
Python 3.10.

Fixes #203.
2020-11-19 18:09:22 +01:00
Nick Wellnhofer
0ace6c4d7e Add CI test for Python 3 2020-11-19 18:09:22 +01:00
Elliott Hughes
7c06d99e1f Fix xmlURIEscape memory leaks.
Found by running the fuzz/uri.c fuzzer under asan (internal Android bug
171610679).

Always free `ret` when exiting on failure. I've moved the definition of
NULLCHK down past where ret is always initialized to make it clear that
this is safe.

This patch also fixes the indentation of two of the NULLCHK call sites
to make it more obvious that NULLCHK isn't `if`-like.
2020-11-09 18:17:01 +01:00