mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-27 04:55:04 +03:00

5084 Commits

Author SHA1 Message Date
Nick Wellnhofer
2732b23466 Fix regression parsing public ID literals in HTML
Fix regression introduced when reworking htmlParsePubidLiteral in
commit 93ce33c2.

Fixes #318.
2022-01-10 13:37:59 +01:00
Nick Wellnhofer
dea91c97de Fix buffering in xmlOutputBufferWrite
Fix a regression introduced with commit a697ed1e which caused
xmlOutputBufferWrite to flush internal buffers too late.

Fixes #296.
2021-07-27 16:12:54 +02:00
Arne Becker
ec6e3efb06 Patch to forbid epsilon-reduction of final states
When building the internal representation of a regexp, it is possible
that a lot of empty transitions are created. Therefore there is a step
to reduce them in the function xmlFAEliminateSimpleEpsilonTransitions.

There is an error there for this case:

* State 1 has a transition with an atom (in this case "a") to state 2.
* State 2 is final and has an epsilon transition to state 1.

After reduction it looked like:
* State 1 has a transition with an atom (in this case "a") to itself
  and is final.

In other words, the empty string is accepted when it shouldn't be.

The attached patch skips the reduction step for final states.
An alternative would be to insert or increment counters when reducing a
final state, but this seemed error prone and unnecessary, since there
aren't that many final states.

Fixes #282
2021-07-06 21:59:25 +02:00
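
The two-state case described in ec6e3efb06 can be reproduced with a toy automaton. The following is a minimal sketch, not the code from libxml2's regexp module: the `Nfa` type, `eliminate_simple_epsilon`, and the `skip_final` flag are hypothetical names introduced only for illustration, and the reduction rule (redirect incoming transitions and propagate the final flag) is an assumption based on the behavior the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STATES 8
#define MAX_TRANS  16
#define EPSILON    0   /* atom value marking an empty transition */

typedef struct { int from, to; char atom; } Trans;

typedef struct {
    int   nstates, ntrans, start;
    bool  is_final[MAX_STATES];
    Trans trans[MAX_TRANS];
} Nfa;

/* Builds the automaton from the commit message:
 * state 1 --a--> state 2, state 2 is final, state 2 --eps--> state 1. */
Nfa make_example(void) {
    Nfa n = {0};
    n.nstates = 3;              /* index 0 is unused so indices match the text */
    n.start = 1;
    n.is_final[2] = true;
    n.trans[n.ntrans++] = (Trans){1, 2, 'a'};
    n.trans[n.ntrans++] = (Trans){2, 1, EPSILON};
    return n;
}

/* Extends a bitmask of states with everything reachable via epsilon. */
static unsigned closure(const Nfa *n, unsigned set) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n->ntrans; i++) {
            const Trans *t = &n->trans[i];
            if (t->atom == EPSILON && (set & (1u << t->from)) &&
                !(set & (1u << t->to))) {
                set |= 1u << t->to;
                changed = true;
            }
        }
    }
    return set;
}

/* Standard subset simulation of the automaton on a NUL-terminated input. */
bool nfa_accepts(const Nfa *n, const char *input) {
    unsigned cur = closure(n, 1u << n->start);
    for (; *input != '\0'; input++) {
        unsigned next = 0;
        for (int i = 0; i < n->ntrans; i++) {
            const Trans *t = &n->trans[i];
            if (t->atom == *input && (cur & (1u << t->from)))
                next |= 1u << t->to;
        }
        cur = closure(n, next);
    }
    for (int s = 0; s < n->nstates; s++)
        if ((cur & (1u << s)) && n->is_final[s])
            return true;
    return false;
}

/* Removes states whose only outgoing transition is a single epsilon.
 * With skip_final == false the final flag leaks onto the target state;
 * with skip_final == true final states are left alone (the patched idea). */
void eliminate_simple_epsilon(Nfa *n, bool skip_final) {
    assert(n->nstates <= MAX_STATES && n->ntrans <= MAX_TRANS);
    for (int s = 0; s < n->nstates; s++) {
        int count = 0, eps = -1;
        for (int i = 0; i < n->ntrans; i++) {
            if (n->trans[i].from != s) continue;
            count++;
            if (n->trans[i].atom == EPSILON) eps = i;
        }
        if (count != 1 || eps < 0 || s == n->start) continue;
        if (skip_final && n->is_final[s]) continue;
        int target = n->trans[eps].to;
        if (target == s) continue;
        for (int i = 0; i < n->ntrans; i++)      /* redirect incoming edges */
            if (n->trans[i].to == s) n->trans[i].to = target;
        if (n->is_final[s]) n->is_final[target] = true;
        n->is_final[s] = false;
        n->trans[eps] = n->trans[--n->ntrans];   /* drop the epsilon edge */
    }
}
```

Calling `eliminate_simple_epsilon(&n, false)` on the example leaves state 1 final with an "a" loop to itself, so `nfa_accepts(&n, "")` becomes true; with `skip_final` set to true the empty string is still rejected while "a" and "aa" are accepted.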
Timothy Lyanguzov
22f1521122 Use version in configure.ac for CMake
The CMake script now reads the version from configure.ac to keep the two versions in sync.
2021-06-07 21:43:08 +02:00
Nick Wellnhofer
92d9ab4c28 Fix whitespace when serializing empty HTML documents
The old, non-recursive HTML serialization code would always terminate
the output with a newline. The new implementation omitted the newline
if the document node had no children. Re-add the newline when
serializing empty documents.

Fixes #266.
2021-06-07 15:09:53 +02:00
Nick Wellnhofer
3e1aad4fe5 Fix XPath recursion limit
Fix accounting of recursion depth when parsing XPath expressions.

This silly bug introduced in commit 804c5297 could lead to spurious
errors when parsing larger expressions or XSLT documents.

Should fix #264.
2021-06-02 17:39:41 +02:00
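
The commit message for 3e1aad4fe5 does not say what exactly was miscounted, so the following is only a sketch of the general failure mode, assuming the depth counter was raised on entry but not lowered by the same amount on return. The tiny parenthesis grammar, `MAX_DEPTH`, and the `balanced_accounting` switch are hypothetical and unrelated to the real XPath parser.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 4   /* illustrative limit, not libxml2's value */

typedef struct {
    const char *cur;            /* current read position */
    int  depth;                 /* current nesting depth */
    bool balanced_accounting;   /* false reproduces the assumed bug */
} Parser;

/* expr := ( '(' expr ')' )*  */
static bool parse_expr(Parser *p) {
    while (*p->cur == '(') {
        if (p->depth >= MAX_DEPTH)
            return false;                 /* recursion limit reached */
        p->depth++;
        p->cur++;
        if (!parse_expr(p))
            return false;
        if (*p->cur != ')')
            return false;
        p->cur++;
        if (p->balanced_accounting)
            p->depth--;                   /* undo the increment on exit */
    }
    return true;
}

/* Returns true if s is fully parsed without exceeding MAX_DEPTH. */
bool parse(const char *s, bool balanced) {
    Parser p = { s, 0, balanced };
    bool ok = parse_expr(&p) && *p.cur == '\0';
    if (ok && balanced)
        assert(p.depth == 0);   /* balanced accounting always returns to 0 */
    return ok;
}
```

With balanced accounting, a long but flat input such as "()()()()()()" parses fine and only real nesting beyond the limit is rejected. Without the decrement, the same flat input fails after four groups, which is the kind of spurious error on larger expressions that the commit message mentions.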
Nick Wellnhofer
13ad8736d2 Fix regression in xmlNodeDumpOutputInternal
Commit 85b1792e could cause additional whitespace if xmlNodeDump was
called with a non-zero starting level.
2021-05-25 11:16:13 +02:00
Markus Rickert
a46e85f669 Update CMake project version 2021-05-23 11:58:41 +02:00
Markus Rickert
a1cac3bbe4 Add CMake alias targets for embedded projects 2021-05-23 11:57:58 +02:00
David King
2c0f2f0341 Fix some validation errors in the FAQ
Move paragraphs inside li elements.
2021-05-23 11:57:12 +02:00
David King
b92b16f659 Remove unused variable in xmlCharEncOutFunc
Fixes a compiler warning:

encoding.c: In function 'xmlCharEncOutFunc__internal_alias':
encoding.c:2632:9: warning: unused variable 'output' [-Wunused-variable]
 2632 |     int output = 0;

https://gitlab.gnome.org/GNOME/libxml2/-/issues/254
2021-05-23 11:55:32 +02:00
Markus Rickert
7d4060d252 Add missing file xmlwin32version.h.in to EXTRA_DIST 2021-05-21 12:23:02 +02:00
Markus Rickert
4fc473d7e8 Add instructions on how to use CMake to compile libxml 2021-05-21 12:22:13 +02:00
Nick Wellnhofer
85b1792e37 Work around lxml API abuse
Make xmlNodeDumpOutput and htmlNodeDumpFormatOutput work with corrupted
parent pointers. This used to work with the old recursive code but the
non-recursive rewrite required parent pointers to be set correctly.

Unfortunately, lxml relies on the old behavior and passes subtrees with
a corrupted structure. Fall back to a recursive function call if an
invalid parent pointer is detected.

Fixes #255.
2021-05-21 12:19:25 +02:00
Mike Dalessio
a7b9f3ebdf fix: avoid segfault at exit when using custom memory functions
This extends the fix introduced by 956534e to Windows processes
dynamically loading libxml2.

Closes #256.
2021-05-20 13:38:54 -04:00
Daniel Veillard
b48e77cf4f Release of libxml2-2.9.12
Brown paper bag release, some recently added sources were missing from
the 2.9.11 tarball:
- configure.ac: bump version
- fuzz/Makefile.am: add fuzz.h and seed/regexp to EXTRA_DIST
2021-05-13 20:56:16 +02:00
Daniel Veillard
e1bcffea18 Release of libxml2-2.9.11
Prompted by CVE-2021-3541, but this includes an awful lot of serious bug
fixes by Nick and others.
- configure.ac: bumped to new release
- doc/* updated and regenerated
2021-05-13 15:35:21 +02:00
Daniel Veillard
8598060bac Patch for security issue CVE-2021-3541
This is related to parameter entity expansion, along the lines of
the billion laughs attack. Somehow the counting of parameters was
missed in that path, so the normal algorithm based on entities
"density" was useless.
2021-05-13 14:55:12 +02:00
Nick Wellnhofer
bfd2f4300f Fix null deref in legacy SAX1 parser
Always call nameNsPush instead of namePush. The latter is unused now
and should probably be removed from the public API. I can't see how
it could be used reasonably from client code and the unprefixed name
has always polluted the global namespace.

Fixes a null pointer dereference introduced with de5b624f when parsing
in SAX1 mode.

Found by OSS-Fuzz.
2021-05-09 19:03:16 +02:00
Nick Wellnhofer
ce00c36e65 Store per-element parser state in a struct
Make the parser context's "pushTab" point to an array of structs
instead of void pointers. This avoids casting unrelated types to void
pointers, improving readability and portability, and allows for more
efficient packing. Ultimately, the struct could be extended to include
the contents of "nameTab" and "spaceTab", further simplifying the code.

Historically, "pushTab" was only used by the push parser (hence the
name), so the change to the public headers should be safe.

Also remove an unused parameter from xmlParseEndTag2.
2021-05-08 22:16:49 +02:00
Nick Wellnhofer
de5b624f10 Fix handling of unexpected EOF in xmlParseContent
Re-add the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was
removed in commit 62150ed2.

This commit also introduced a regression for direct users of
xmlParseContent. Unclosed tags weren't checked.
2021-05-08 20:47:36 +02:00
Nick Wellnhofer
3e80560d4b Fix line numbers in error messages for mismatched tags
Commit 62150ed2 introduced a small regression in the error messages for
mismatched tags. This typically only affected messages after the first
mismatch, but with custom SAX handlers all line numbers would be off.

This also fixes line numbers in the SAX push parser which were never
handled correctly.
2021-05-07 11:48:11 +02:00
Nick Wellnhofer
7279d23636 Fix htmlTagLookup
Fix regression introduced with b25acce8. Some users like libxslt may
call the HTML output functions on documents with uppercase tag names,
so we must keep case-insensitive string comparison.

Fixes #248.
2021-05-06 10:54:29 +02:00
PaulHiggs
33468d7e70 update for xsd:language type check
Fixes #242.
2021-05-03 18:10:51 +02:00
Nick Wellnhofer
babe75030c Propagate error in xmlParseElementChildrenContentDeclPriv
Check return value of recursive calls to
xmlParseElementChildrenContentDeclPriv and return immediately in case
of errors. Otherwise, struct xmlElementContent could contain unexpected
null pointers, leading to a null deref when post-validating documents
which aren't well-formed and parsed in recovery mode.

Fixes #243.
2021-05-01 17:24:49 +02:00
Nick Wellnhofer
5465a8e57f Update INSTALL.libxml2
Fixes #238.
2021-04-25 21:26:15 +02:00
Nick Wellnhofer
1098c30a04 Fix use-after-free with xmllint --xinclude --dropdtd
The --dropdtd option can leave dangling pointers in entity reference
nodes. Make sure to skip these nodes when processing XIncludes.

This also avoids scanning entity declarations and even modifying
them inadvertently during XInclude processing.

Move from a block list to an allow list approach to avoid descending
into other node types that can't contain elements.

Fixes #237.
2021-04-22 19:44:26 +02:00
Nick Wellnhofer
72b3c067ce Fix dangling pointer with xmllint --dropdtd
Reset doc->intSubset when dropping the DTD.
2021-04-22 19:24:50 +02:00
Joel Hockey
bf22713507 Validate UTF8 in xmlEncodeEntities
The code currently assumes UTF-8 without validating it. Truncated UTF-8
input can cause out-of-bounds array access.

Adds further checks to partial fix in 50f06b3e.

Fixes #178
2021-04-22 11:57:32 +02:00
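
To illustrate why truncated UTF-8 is dangerous for code that trusts the lead byte, here is a minimal sketch of a validating scanner for NUL-terminated input. It is not the check added to xmlEncodeEntities; `utf8_seq_len` and `utf8_count_chars` are hypothetical helpers, and the sketch only verifies lead and continuation bytes (it does not reject overlong forms or surrogates).

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length (1..4) of the UTF-8 sequence starting at p, or -1 if
 * the lead byte is invalid or a continuation byte is missing. The caller
 * must not pass a pointer to the NUL terminator. Each continuation byte is
 * checked before the next one is read, so a truncated sequence stops at
 * the terminator instead of reading past it. */
int utf8_seq_len(const unsigned char *p) {
    int len;
    if (p[0] < 0x80)
        return 1;
    else if ((p[0] & 0xE0) == 0xC0)
        len = 2;
    else if ((p[0] & 0xF0) == 0xE0)
        len = 3;
    else if ((p[0] & 0xF8) == 0xF0)
        len = 4;
    else
        return -1;                      /* stray continuation or bad lead */
    for (int i = 1; i < len; i++)
        if ((p[i] & 0xC0) != 0x80)      /* also catches the NUL terminator */
            return -1;
    return len;
}

/* Counts code points in s, or returns -1 on invalid or truncated input. */
long utf8_count_chars(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    long count = 0;
    assert(s != NULL);
    while (*p != 0) {
        int len = utf8_seq_len(p);
        if (len < 0)
            return -1;
        p += len;
        count++;
    }
    return count;
}
```

A scanner that instead advanced by the length implied by the lead byte alone would step over the terminator on input such as "\xE2\x82", which is the out-of-bounds pattern the commit message warns about.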
Nick Wellnhofer
1358d157d0 Fix use-after-free with xmllint --html --push
Call htmlCtxtUseOptions to make sure that names aren't stored in
dictionaries.

Note that this issue only affects xmllint using the HTML push parser.

Fixes #230.
2021-04-21 13:49:44 +02:00
Nick Wellnhofer
fb08d9fe83 Fix include order in c14n.h
- Include xmlversion.h before testing feature flags.
- Include libxml headers before extern "C".

Fixes #226.
2021-03-20 22:05:33 +01:00
Christopher Degawa
d3a02679b8 CMake: Only add postfixes if MSVC
Currently, it catches mingw-w64 in there as well, but mingw-w64 follows
Linux-like naming with no weird postfixes.

Signed-off-by: Christopher Degawa <ccom@randomderp.com>
2021-03-16 13:04:53 -05:00
Nick Wellnhofer
868e49cffd Allow FP division by zero in xmlXPathInit 2021-03-16 10:36:04 +01:00
Nick Wellnhofer
d25460da14 Fix XPath NaN/Inf for older GCC versions
The DBL_MAX approach could lead to errors caused by excess precision.
Switch back to the division-by-zero approach with a work-around for
MSVC and use the extern globals instead of macro expressions.
2021-03-13 19:14:27 +01:00
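
The division-by-zero approach mentioned in d25460da14 can be sketched as follows. This is an illustration under the assumption of IEEE 754 doubles, not the code in xpath.c: the globals `xp_nan`, `xp_pinf`, `xp_ninf` and the function `xp_init` are hypothetical stand-ins. Dividing by a variable rather than a literal is used here because some compilers reject a constant division by zero at compile time; whether this matches the actual MSVC work-around in libxml2 is an assumption.

```c
#include <assert.h>
#include <float.h>

/* Special values held in globals and computed once at start-up,
 * instead of being macro expressions evaluated at each use. */
double xp_nan;
double xp_pinf;
double xp_ninf;

void xp_init(void) {
    /* A variable divisor keeps the division out of constant expressions. */
    double zero = 0.0;

    xp_pinf = 1.0 / zero;     /* +Inf under IEEE 754 */
    xp_ninf = -xp_pinf;       /* -Inf */
    xp_nan  = zero / zero;    /* NaN */

    /* Sanity check of the IEEE 754 assumption: NaN never equals itself. */
    assert(xp_nan != xp_nan);
}

/* Portable NaN test that does not rely on <math.h>. */
int xp_is_nan(double v) {
    return v != v;
}
```

After `xp_init()`, `xp_pinf` compares greater than `DBL_MAX` and `xp_ninf` less than `-DBL_MAX`, so neither value depends on arithmetic near `DBL_MAX`, where the commit message reports errors caused by excess precision.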
Nick Wellnhofer
e20c9c148c Fix xmlGetNodePath with invalid node types
Make xmlGetNodePath return NULL instead of invalid XPath when hitting
unsupported node types like DTD content.

Reported here:
https://mail.gnome.org/archives/xml/2021-January/msg00012.html

Original report:
https://bugs.php.net/bug.php?id=80680
2021-03-13 18:46:00 +01:00
Nick Wellnhofer
c3fd8c4295 Fix exponential behavior with recursive entities
Fix another case where only recursion depth was limited, but entities
would still be expanded over and over again.

The test case discovered by fuzzing only affected parsing in recovery
mode with XML_PARSE_RECOVER.

Found by OSS-Fuzz.
2021-03-13 17:37:09 +01:00
Nick Wellnhofer
683de7efe4 Fix duplicate xmlStrEqual calls in htmlParseEndTag 2021-03-04 19:22:35 +01:00
Nick Wellnhofer
8095365b77 Speed up htmlCheckAutoClose
Switch to binary search.
2021-03-04 19:22:35 +01:00
Nick Wellnhofer
b25acce858 Speed up htmlTagLookup
Switch to binary search. This is the first time bsearch is used in the
libxml2 code base. But it's a standard library function since C89 and
should be portable.
2021-03-04 17:44:45 +01:00
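
A lookup of this kind can be sketched with the standard `bsearch` function over a sorted table. The table contents, the `TagDesc` type, and `tag_lookup` are hypothetical and much smaller than the real HTML element table; the case-insensitive comparator is an assumption motivated by the later fix 7279d23636, which notes that case-insensitive comparison must be kept.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    int is_inline;
} TagDesc;

/* Must stay sorted by lowercase name, otherwise bsearch gives wrong results. */
static const TagDesc tags[] = {
    { "a",     1 },
    { "body",  0 },
    { "div",   0 },
    { "p",     0 },
    { "span",  1 },
    { "table", 0 },
};

/* ASCII case-insensitive string comparison with strcmp-like result. */
static int cmp_nocase(const char *a, const char *b) {
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca - cb;
        if (ca == 0)
            return 0;
    }
}

/* bsearch comparator: the key is a C string, the element a TagDesc. */
static int cmp_tag(const void *key, const void *elem) {
    return cmp_nocase((const char *)key, ((const TagDesc *)elem)->name);
}

/* Returns the matching descriptor, or NULL if the tag is unknown. */
const TagDesc *tag_lookup(const char *name) {
    assert(name != NULL);
    return bsearch(name, tags, sizeof tags / sizeof tags[0],
                   sizeof tags[0], cmp_tag);
}
```

For example, `tag_lookup("DIV")` and `tag_lookup("div")` return the same entry, while `tag_lookup("blink")` returns NULL. The lookup cost grows logarithmically with the table size instead of linearly.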
Nick Wellnhofer
ad101bb5b5 Clarify xmlNewDocProp documentation 2021-03-02 13:43:31 +01:00
Nick Wellnhofer
a6e6498fb1 Stop checking attributes for UTF-8 validity
I can't see a reason to check attribute content for UTF-8 validity.
Other parts of the API like xmlNewText have always assumed valid UTF-8
as extra checks only slow down processing.

Besides, setting doc->encoding to "ISO-8859-1" seems pointless, and not
freeing the old encoding would cause a memory leak.

Note that this was last changed in 2008 with commit 6f8611fd which
removed unnecessary encoding/decoding steps. Setting attributes should
be even faster now.

Found by OSS-Fuzz.
2021-03-02 13:35:04 +01:00
Nick Wellnhofer
8446d4593e Reduce some fuzzer timeouts
OSS-Fuzz has been fuzzing the HTML parser with inputs up to 1 MB for
several hundred hours without hitting the 20s timeout. It seems that
most timeouts resulting from accidentally quadratic behavior in the
HTML parser have been fixed. Start to gradually reduce the timeout to
find new performance issues.
2021-03-01 20:56:40 +01:00
Nick Wellnhofer
688b41a0fb Fix quadratic behavior when looking up xml:* attributes
Add a special case for the predefined XML namespace when looking up DTD
attribute defaults in xmlGetPropNodeInternal to avoid calling
xmlGetNsList.

This fixes quadratic behavior in

- xmlNodeGetBase
- xmlNodeGetLang
- xmlNodeGetSpacePreserve

Found by OSS-Fuzz.
2021-03-01 14:36:38 +01:00
Nick Wellnhofer
ce2fbaa89d Only run a few CI tests unless scheduled
Only run the following tests by default:

- gcc
- clang:asan
- cmake:mingw:w64-x86_64:shared
- cmake:msvc:v141:x64:shared
2021-02-22 22:29:28 +01:00
Nick Wellnhofer
85c817a200 Improve fuzzer stability
- Add more calls to xmlInitializeCatalog.
- Call xmlResetLastError after fuzzing each input.
2021-02-22 22:29:28 +01:00
Nick Wellnhofer
f9ccb3b818 Check for feature flags in fuzzer tests 2021-02-22 22:29:28 +01:00
Markus Rickert
88c657d643 Use CMake PROJECT_VERSION 2021-02-22 21:11:00 +01:00
Nick Wellnhofer
7a90bdfae6 Another attempt at improving fuzzer stability
xmlInitializeCatalog is not called from xmlInitParser.
2021-02-22 17:58:06 +01:00
Nick Wellnhofer
0fb3ae5840 Revert "Improve HTML fuzzer stability"
This reverts commit de1b51eddc.
2021-02-22 17:31:05 +01:00
Nick Wellnhofer
0987001c1b Add charset names to fuzzing dictionaries 2021-02-22 13:21:38 +01:00