libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-27 04:55:04 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	de1b51eddc	Improve HTML fuzzer stability Call htmlInitAutoClose during fuzzer initialization to fix stability issue. Leave a note concerning problems with this function.	2021-02-22 13:21:38 +01:00
Markus Rickert	09320f0551	Add CI for MSVC x86	2021-02-21 14:26:40 +01:00
Nick Wellnhofer	dcb80b92da	Fix slow parsing of HTML with encoding errors Under certain circumstances, the HTML parser would try to guess and switch input encodings multiple times, leading to slow processing of documents with encoding errors. The repeated scanning of the input buffer when guessing encodings could even lead to quadratic behavior. The code htmlCurrentChar probably assumed that if there's an encoding handler, it is guaranteed to produce valid UTF-8. This holds true in general, but if the detected encoding was "UTF-8", the UTF8ToUTF8 encoding handler simply invoked memcpy without checking for invalid UTF-8. This still must be fixed, preferably by not using this handler at all. Also leave a note that switching encodings twice seems impossible to implement correctly. Add a check when handling UTF-8 encoding errors in htmlCurrentChar to avoid this situation, even if encoders produce invalid UTF-8. Found by OSS-Fuzz.	2021-02-20 21:28:56 +01:00
hhb	02bee4c414	Add a flag to not output anything when xmllint succeeded	2021-02-20 16:49:45 +01:00
Simon Josefsson	4defa2c24a	Fix warnings in libxml.m4 with autoconf 2.70+. Closes #219.	2021-02-20 16:48:20 +01:00
Nick Wellnhofer	cbe1212db6	Fix null deref introduced with previous commit Found by OSS-Fuzz.	2021-02-09 17:07:21 +01:00
Nick Wellnhofer	01411e7c5e	Check for invalid redeclarations of predefined entities Implement section "4.6 Predefined Entities" of the XML 1.0 spec and check whether redeclarations of predefined entities match the original definitions. Note that some test cases declared <!ENTITY lt "<"> But the XML spec clearly states that this is illegal: > If the entities lt or amp are declared, they MUST be declared as > internal entities whose replacement text is a character reference to > the respective character (less-than sign or ampersand) being escaped; > the double escaping is REQUIRED for these entities so that references > to them produce a well-formed result. Also fixes #217 but the connection is only tangential. The integer overflow discovered by fuzzing was more related to the fact that various parts of the parser disagreed on whether to prefer predefined entities over their redeclarations. The whole situation is a mess and even depends on legacy parser options. But now that redeclarations are validated, it shouldn't make a difference. As noted in the added comment, this is also one of the cases where overly defensive checks can hide interesting logic bugs from fuzzers.	2021-02-08 21:51:26 +01:00
SVGAnimate	07920b4381	Add the copy of type from original xmlDoc in xmlCopyDoc() A bug related to PHP DOMDocument: https://bugs.php.net/bug.php?id=80665 When copying or cloning an HTML document, xmlDoc->type goes from XML_HTML_DOCUMENT_NODE to XML_DOCUMENT_NODE.	2021-02-08 17:17:09 +01:00
Markus Rickert	2065d34090	Add CI for CMake on MSVC	2021-02-08 17:15:44 +01:00
Mike Dalessio	afad37216b	parser.c: shrink the input buffer when appropriate Fixes GNOME/libxml2#200 Also see discussions at: - GNOME/libxml2#192 - https://gitlab.gnome.org/nwellnhof/libxml2/-/commit/99bda1e - https://github.com/sparklemotion/nokogiri/issues/2132	2021-02-08 17:14:35 +01:00
Nick Wellnhofer	ec808a4415	Speed up HTML fuzzer htmlDocDumpMemory uses the "HTML" encoding if no other encoding was specified in the source HTML. This encoding can be extremely slow because of an inefficiency in htmlEntityValueLookup. Stop encoding the output for now.	2021-02-07 14:39:55 +01:00
Nick Wellnhofer	e6495e4789	Remove unused encoding parameter of HTML output functions The encoding string is unused. Encodings are set by way of the output buffer.	2021-02-07 14:39:55 +01:00
Nick Wellnhofer	954696e7cf	Fix infinite loop in HTML parser introduced with recent commits Check for XML_PARSER_EOF to avoid an infinite loop introduced with recent changes to the HTML push parser. Found by OSS-Fuzz.	2021-02-07 14:38:55 +01:00
Nick Wellnhofer	acb3566739	Fix quadratic runtime when parsing CDATA sections Use optimized concatenation for CDATA sections in addition to normal text. This also affects HTML script content. Found by OSS-Fuzz.	2021-02-03 13:57:26 +01:00
Markus Rickert	f93ca3e140	Update minimum required CMake version	2021-01-15 18:31:20 +01:00
Markus Rickert	0048728916	Add variables for configured options to CMake config files	2021-01-05 22:03:47 +01:00
Markus Rickert	95519737af	Check if variables exist when defining targets	2021-01-05 22:03:47 +01:00
Markus Rickert	c26e45259c	Check if target exists when reading target properties	2021-01-05 22:03:47 +01:00
Markus Rickert	ec11987592	Add xmlcatalog target and definition to config files	2021-01-05 22:03:47 +01:00
Markus Rickert	2377a312b9	Remove include directories for link-only dependencies	2021-01-05 22:03:47 +01:00
Markus Rickert	26835480dc	Fix ICU build in CMake	2021-01-05 22:03:47 +01:00
Markus Rickert	296ab61e1c	Configure pkgconfig, xml2-config, and xml2Conf.sh file	2021-01-05 22:03:47 +01:00
Nick Wellnhofer	79301d3d5e	Fix timeout when handling recursive entities Abort parsing early to avoid an almost infinite loop in certain error cases involving recursive entities. Found with libFuzzer.	2020-12-18 14:13:46 +01:00
Nick Wellnhofer	45da175c14	Fix memory leak in xmlParseElementMixedContentDecl Free parsed content if malloc fails to avoid a memory leak. Found with libFuzzer.	2020-12-18 14:11:58 +01:00
Nick Wellnhofer	1d73f07d67	Fix null deref in xmlStringGetNodeList Check for malloc failure to avoid null deref. Found with libFuzzer.	2020-12-18 14:10:59 +01:00
Nick Wellnhofer	e2b975c317	Handle malloc failures in fuzzing code Avoid misdiagnosis in OOM situations.	2020-12-18 14:10:13 +01:00
Mike Dalessio	a67b63d183	use new htmlParseLookupCommentEnd to find comment ends Note that the caret in error messages generated during comment parsing may have moved by one byte. See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment	2020-12-16 16:12:07 +01:00
Mike Dalessio	29f5d20e84	htmlParseComment: treat `--!>` as if it closed the comment See guidance provided on incorrectly-closed comments here: https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment	2020-12-16 16:12:07 +01:00
Mike Dalessio	e28d9347bc	add test coverage for incorrectly-closed comments this establishes the baseline behavior so that subsequent commits which modify this behavior are clear about what's being changed.	2020-12-16 16:12:07 +01:00
Nick Wellnhofer	9086988ffa	Enforce maximum length of fuzz input Remove the libfuzzer max_len option which doesn't apply to other fuzzing engines. Enforce the maximum length directly in the fuzz targets. For the xml target, lower the maximum when expanding entities to avoid timeout and OOM errors.	2020-12-16 16:12:07 +01:00
Nick Wellnhofer	1fe385304f	Remove temporary members from struct _xmlXPathContext These values are hardcoded now and the struct members, while public, were recently introduced and never part of an official release.	2020-12-16 15:27:13 +01:00
Nick Wellnhofer	8ca3a59b2e	Fix integer overflow in xmlSchemaGetParticleTotalRangeMin The function is only used once and its return value is only checked for zero. Disable the function like its Max counterpart and add an implementation for the special case. Found by OSS-Fuzz.	2020-12-15 20:14:28 +01:00
Xiaoming Ni	649d02eaa4	encoding: fix memleak in xmlRegisterCharEncodingHandler() The return type of xmlRegisterCharEncodingHandler() is void, so the caller cannot determine whether it succeeded. When nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, "handler" is not added to the "handlers" array. As a result, the memory of "handler" can no longer be managed or released, and it leaks. Add "xmlFree(handler)" on the failure branch of xmlRegisterCharEncodingHandler() to fix the leak. Reported-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>	2020-12-07 14:38:14 +01:00
Xiaoming Ni	cb7a572b3e	xmlschemastypes.c: xmlSchemaGetFacetValueAsULong add, check "facet->val" xmlSchemaGetFacetValueAsULong() is an external API, so the validity of its input parameters must be strictly verified. Before accessing "facet->val->value", we need to check whether "facet->val" is a null pointer. Signed-off-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>	2020-12-07 14:37:55 +01:00
Markus Rickert	84b76d99f1	Update CMake config files	2020-12-07 14:37:23 +01:00
Markus Rickert	d0ccb3a6b6	Add xmlcatalog and xmllint to CMake export	2020-12-07 14:37:18 +01:00
Nick Wellnhofer	acdc2ff360	Simplify xmlexports.h All the compiler switches essentially set the same macros. The only exception was MSVC which omitted the "extern" keyword for exported variables. This in turn broke clang-cl. This commit rewrites and simplifies the whole header. Closes #163.	2020-12-06 17:31:38 +01:00
Nick Wellnhofer	a218ff0ec0	Fix null pointer deref in xmlXPtrRangeInsideFunction Found by OSS-Fuzz.	2020-12-06 17:26:36 +01:00
Nick Wellnhofer	94c2e415a9	Fix quadratic runtime in HTML push parser with null bytes Null bytes in the input stream do not necessarily signal an EOF condition. Check the stream pointers for EOF to avoid quadratic rescanning of input data. Note that the CUR_CHAR macro used in functions like htmlParseCharData calls htmlCurrentChar which translates null bytes. Found by OSS-Fuzz.	2020-12-06 16:44:11 +01:00
Markus Rickert	1c4f9a6db5	Require dependencies based on enabled CMake options	2020-11-30 12:43:48 +01:00
Michael Matz	faea2fa9b8	Avoid quadratic checking of identity-constraints key/unique/keyref schema attributes currently use quadratic loops to check their various constraints (that keys are unique and that keyrefs refer to existing keys). That becomes extremely slow if there are many elements with keys. This happens in the wild with e.g. the OVAL XML descriptions of security patches. You need the openscap schemata, and then an example xml file: % zypper in openscap-utils % wget ftp://ftp.suse.com/pub/projects/security/oval/opensuse.leap.15.1.xml % time xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates real 16m59,857s user 16m55,787s sys 0m1,060s This patch makes libxml use a hash table to avoid the quadratic behaviour. The existing hash table only accepts strings as keys, so we're mostly reusing the canonical representation of key values to derive such strings (with the caveat given in a comment). The alternative would be to rework the hash table code to accept either numbers or free functions as hash workers, but the code is fast enough as is. With the patch we have this then: % time LD_LIBRARY_PATH=./libxml2/.libs/ ./libxml2/.libs/xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null opensuse.leap.15.1.xml validates real 0m3,531s user 0m3,427s sys 0m0,103s So, a ~300x speedup. This patch survives 'make check' and 'make tests'.	2020-11-30 11:22:54 +01:00
Markus Rickert	8272db5318	Use NAMELINK_COMPONENT in CMake install	2020-11-30 11:22:54 +01:00
Markus Rickert	5c7bdbc906	Add CMake files to EXTRA_DIST	2020-11-30 11:22:53 +01:00
Markus Rickert	7a62870a3c	Add missing compile definition for static builds to CMake	2020-11-30 11:08:14 +01:00
Markus Rickert	e028d29379	Add CI for CMake on Linux and MinGW	2020-11-30 11:07:46 +01:00
Frederik Seiffert	b516ed189e	Fix building with ICU 68. ICU 68 no longer defines the TRUE macro. Closes #204.	2020-11-19 18:10:32 +01:00
Victor Stinner	ac5e99911a	Convert python/libxml.c to PY_SSIZE_T_CLEAN Define PY_SSIZE_T_CLEAN macro in python/libxml.c and cast the string length (int len) explicitly to Py_ssize_t when passing a string to a function call using PyObject_CallMethod() with the "s#" format.	2020-11-19 18:09:22 +01:00
Victor Stinner	f42a0524c6	Build the Python extension with PY_SSIZE_T_CLEAN The Python extension module now uses Py_ssize_t rather than int for string lengths. This change makes the extension compatible with Python 3.10. Fixes #203.	2020-11-19 18:09:22 +01:00
Nick Wellnhofer	0ace6c4d7e	Add CI test for Python 3	2020-11-19 18:09:22 +01:00
Elliott Hughes	7c06d99e1f	Fix xmlURIEscape memory leaks. Found by running the fuzz/uri.c fuzzer under asan (internal Android bug 171610679). Always free `ret` when exiting on failure. I've moved the definition of NULLCHK down past where ret is always initialized to make it clear that this is safe. This patch also fixes the indentation of two of the NULLCHK call sites to make it more obvious that NULLCHK isn't `if`-like.	2020-11-09 18:17:01 +01:00
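
Commit faea2fa9b8 in the log above replaces a quadratic uniqueness check on identity-constraint keys with a hash-table lookup. The following is a minimal Python sketch of that general idea only, not the libxml2 implementation. The function names and the representation of a key as a tuple of canonical strings are assumptions made for illustration; libxml2 itself uses its own string-keyed hash table in C.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: each key is a tuple holding the canonical
# string form of its fields (an assumption for this sketch).
Key = Tuple[str, ...]


def first_duplicate_quadratic(keys: Sequence[Key]) -> Optional[Key]:
    """Compare every key against all earlier keys: O(n^2) comparisons."""
    for i, key in enumerate(keys):
        for j in range(i):
            if keys[j] == key:
                return key
    return None


def first_duplicate_hashed(keys: Sequence[Key]) -> Optional[Key]:
    """Remember seen keys in a hash set: one lookup per key on average."""
    seen = set()
    for key in keys:
        if key in seen:
            return key
        seen.add(key)
    return None
```

Both functions return the first key that repeats an earlier one, or None when all keys are unique. Only the hashed variant stays fast when a document contains many keyed elements.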


5084 Commits