IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Under certain circumstances, the HTML parser would try to guess and
switch input encodings multiple times, leading to slow processing of
documents with encoding errors. The repeated scanning of the input
buffer when guessing encodings could even lead to quadratic behavior.
The code htmlCurrentChar probably assumed that if there's an encoding
handler, it is guaranteed to produce valid UTF-8. This holds true in
general, but if the detected encoding was "UTF-8", the UTF8ToUTF8
encoding handler simply invoked memcpy without checking for invalid
UTF-8. This still must be fixed, preferably by not using this handler
at all.
Also leave a note that switching encodings twice seems impossible to
implement correctly. Add a check when handling UTF-8 encoding errors
in htmlCurrentChar to avoid this situation, even if encoders produce
invalid UTF-8.
Found by OSS-Fuzz.
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.
Note that some test cases declared
<!ENTITY lt "<">
But the XML spec clearly states that this is illegal:
> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.
Also fixes#217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.
As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
A bug related to php DOMDocument:
https://bugs.php.net/bug.php?id=80665
When copy/clone an html document, the xmlDoc->type goes from
XML_HTML_DOCUMENT_NODE to XML_DOCUMENT_NODE.
htmlDocDumpMemory uses the "HTML" encoding if no other encoding was
specified in the source HTML. This encoding can be extremely slow
because of an inefficiency in htmlEntityValueLookup. Stop encoding
the output for now.
Remove the libfuzzer max_len option which doesn't apply to other
fuzzing engines. Enforce the maximum length directly in the fuzz
targets. For the xml target, lower the maximum when expanding entities
to avoid timeout and OOM errors.
The function is only used once and its return value is only checked for
zero. Disable the function like its Max counterpart and add an
implementation for the special case.
Found by OSS-Fuzz.
The return type of xmlRegisterCharEncodingHandler() is void. The invoker
cannot determine whether xmlRegisterCharEncodingHandler() is executed
successfully. when nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, the
"handler" is not added to the array "handlers". As a result, the memory
of "handler" cannot be managed and released: memory leakage.
so add "xmlfree(handler)" to fix memory leakage on the failure branch of
xmlRegisterCharEncodingHandler().
Reported-by: wuqing <wuqing30@huawei.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
The xmlSchemaGetFacetValueAsUlong() API is an external API.
The validity of external input parameters must be strictly verified.
Before accessing "facet->val->value", we need check whether "facet->val" is
a null pointer.
Signed-off-by: wuqing <wuqing30@huawei.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
All the compiler switches essentially set the same macros. The only
exception was MSVC which omitted the "extern" keyword for exported
variables. This in turn broke clang-cl.
This commit rewrites and simplifies the whole header.
Closes#163.
Null bytes in the input stream do not necessarily signal an EOF
condition. Check the stream pointers for EOF to avoid quadratic
rescanning of input data.
Note that the CUR_CHAR macro used in functions like htmlParseCharData
calls htmlCurrentChar which translates null bytes.
Found by OSS-Fuzz.
key/unique/keyref schema attributes currently use qudratic loops
to check their various constraints (that keys are unique and that
keyrefs refer to existing keys). That becomes extremely slow if
there are many elements with keys. This happens in the wild with
e.g. the OVAL XML descriptions of security patches. You need the
openscap schemata, and then an example xml file:
% zypper in openscap-utils
% wget ftp://ftp.suse.com/pub/projects/security/oval/opensuse.leap.15.1.xml
% time xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null
opensuse.leap.15.1.xml validates
real 16m59,857s
user 16m55,787s
sys 0m1,060s
This patch makes libxml use a hash table to avoid the quadratic
behaviour. The existing hash table only accepts strings as keys, so
we're mostly reusing the canonical representation of key values to derive
such strings (with the caveat given in a comment). The alternative
would be to rework the hash table code to accept either numbers or free
functions as hash workers, but the code is fast enough as is.
With the patch we have this then:
% time LD_LIBRARY_PATH=./libxml2/.libs/ ./libxml2/.libs/xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null
opensuse.leap.15.1.xml validates
real 0m3,531s
user 0m3,427s
sys 0m0,103s
So, a ~300x speedup. This patch survives 'make check' and 'make tests'.
Define PY_SSIZE_T_CLEAN macro in python/libxml.c and cast the string
length (int len) explicitly to Py_ssize_t when passing a string to a
function call using PyObject_CallMethod() with the "s#" format.
The Python extension module now uses Py_ssize_t rather than int for
string lengths. This change makes the extension compatible with
Python 3.10.
Fixes#203.
Found by running the fuzz/uri.c fuzzer under asan (internal Android bug
171610679).
Always free `ret` when exiting on failure. I've moved the definition of
NULLCHK down past where ret is always initialized to make it clear that
this is safe.
This patch also fixes the indentation of two of the NULLCHK call sites
to make it more obvious that NULLCHK isn't `if`-like.