IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
When building the internal representation of a regexp, it is possible
that a lot of empty transitions are created. Therefore there is a step
to reduce them in the function xmlFAEliminateSimpleEpsilonTransitions.
There is an error there for this case:
* State 1 has a transition with an atom (in this case "a") to state 2.
* State 2 is final and has an epsilon transition to state 1.
After reduction it looked like:
* State 1 has a transition with an atom (in this case "a") to itself
and is final.
In other words, the empty string is accepted when it shouldn't be.
The attached patch skips the reduction step for final states.
An alternative would be to insert or increment counters when reducing a
final state, but this seemed error prone and unnecessary, since there
aren't that many final states.
Fixes#282
The old, non-recursive HTML serialization code would always terminate
the output with a newline. The new implementation omitted the newline
if the document node had no children. Readd the newline when
serializing empty documents.
Fixes#266.
Fix accounting of recursion depth when parsing XPath expressions.
This silly bug introduced in commit 804c5297 could lead to spurious
errors when parsing larger expressions or XSLT documents.
Should fix#264.
Make xmlNodeDumpOutput and htmlNodeDumpFormatOutput work with corrupted
parent pointers. This used to work with the old recursive code but the
non-recursive rewrite required parent pointers to be set correctly.
Unfortunately, lxml relies on the old behavior and passes subtrees with
a corrupted structure. Fall back to a recursive function call if an
invalid parent pointer is detected.
Fixes#255.
Brown paper bag release, some recently added sources were missing from
the 2.9.11 tarball:
- configure.ac: bump version
- fuzz/Makefile.am: add fuzz.h and seed/regexp to EXTRA_DIST
Prompted by CVE-2021-3541, but this includes an awful lot of serious bug
fixes by Nick and others.
- configure.ac: bumped to new release
- doc/* updated and regenerated
This is relapted to parameter entities expansion and following
the line of the billion laugh attack. Somehow in that path the
counting of parameters was missed and the normal algorithm based
on entities "density" was useless.
Always call nameNsPush instead of namePush. The latter is unused now
and should probably be removed from the public API. I can't see how
it could be used reasonably from client code and the unprefixed name
has always polluted the global namespace.
Fixes a null pointer dereference introduced with de5b624f when parsing
in SAX1 mode.
Found by OSS-Fuzz.
Make the parser context's "pushTab" point to an array of structs
instead of void pointers. This avoids casting unrelated types to void
pointers, improving readability and portability, and allows for more
efficient packing. Ultimately, the struct could be extended to include
the contents of "nameTab" and "spaceTab", further simplifying the code.
Historically, "pushTab" was only used by the push parser (hence the
name), so the change to the public headers should be safe.
Also remove an unused parameter from xmlParseEndTag2.
Readd the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was
removed in commit 62150ed2.
This commit also introduced a regression for direct users of
xmlParseContent. Unclosed tags weren't checked.
Commit 62150ed2 introduced a small regression in the error messages for
mismatched tags. This typically only affected messages after the first
mismatch, but with custom SAX handlers all line numbers would be off.
This also fixes line numbers in the SAX push parser which were never
handled correctly.
Fix regression introduced with b25acce8. Some users like libxslt may
call the HTML output functions on documents with uppercase tag names,
so we must keep case-insensitive string comparison.
Fixes#248.
Check return value of recursive calls to
xmlParseElementChildrenContentDeclPriv and return immediately in case
of errors. Otherwise, struct xmlElementContent could contain unexpected
null pointers, leading to a null deref when post-validating documents
which aren't well-formed and parsed in recovery mode.
Fixes#243.
The --dropdtd option can leave dangling pointers in entity reference
nodes. Make sure to skip these nodes when processing XIncludes.
This also avoids scanning entity declarations and even modifying
them inadvertently during XInclude processing.
Move from a block list to an allow list approach to avoid descending
into other node types that can't contain elements.
Fixes#237.
Code is currently assuming UTF-8 without validating. Truncated UTF-8
input can cause out-of-bounds array access.
Adds further checks to partial fix in 50f06b3e.
Fixes#178
Call htmlCtxtUseOptions to make sure that names aren't stored in
dictionaries.
Note that this issue only affects xmllint using the HTML push parser.
Fixes#230.
Currently, it catches mingw-w64 in there as well, but mingw-w64 follows
linux-like naming with no weird postfixes
Signed-off-by: Christopher Degawa <ccom@randomderp.com>
The DBL_MAX approach could lead to errors caused by excess precision.
Switch back to the division-by-zero approach with a work-around for
MSVC and use the extern globals instead of macro expressions.
Fix another case where only recursion depth was limited, but entities
would still be expanded over and over again.
The test case discovered by fuzzing only affected parsing in recovery
mode with XML_PARSE_RECOVER.
Found by OSS-Fuzz.
Switch to binary search. This is the first time bsearch is used in the
libxml2 code base. But it's a standard library function since C89 and
should be portable.
I can't see a reason to check attribute content for UTF-8 validity.
Other parts of the API like xmlNewText have always assumed valid UTF-8
as extra checks only slow down processing.
Besides, setting doc->encoding to "ISO-8859-1" seems pointless, and not
freeing the old encoding would cause a memory leak.
Note that this was last changed in 2008 with commit 6f8611fd which
removed unnecessary encoding/decoding steps. Setting attributes should
be even faster now.
Found by OSS-Fuzz.
OSS-Fuzz has been fuzzing the HTML parser with inputs up to 1 MB for
several hundred hours without hitting the 20s timeout. It seems that
most timeouts resulting from accidentally quadratic behavior in the
HTML parser have been fixed. Start to gradually reduce the timeout to
find new performance issues.
Add a special case for the predefined XML namespace when looking up DTD
attribute defaults in xmlGetPropNodeInternal to avoid calling
xmlGetNsList.
This fixes quadratic behavior in
- xmlNodeGetBase
- xmlNodeGetLang
- xmlNodeGetSpacePreserve
Found by OSS-Fuzz.