1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-26 20:25:14 +03:00
Commit Graph

128 Commits

Author SHA1 Message Date
Nick Wellnhofer
250faf3c83 parser: Fix regression in xmlParserNodeInfo accounting
Commit 62150ed2 broke begin_pos and begin_line when extra node info was
recorded.

Fixes #523.
2023-04-20 15:38:00 +02:00
Nick Wellnhofer
d7d0bc6581 SAX2: Ignore namespaces in HTML documents
In commit 21ca8829, we started to ignore namespaces in HTML element
names but we still called xmlSplitQName, effectively stripping the
namespace prefix. This would cause elements like <o:p> being parsed
as <p>. Now we leave the name untouched.

Fixes #508.
2023-03-31 17:08:43 +02:00
Nick Wellnhofer
cb4334b7ab malloc-fail: Fix memory leak in xmlSAX2StartElementNs
Found with libFuzzer, see #344.
2023-02-17 17:16:51 +01:00
Nick Wellnhofer
0c5f40b788 malloc-fail: Fix null deref in xmlSAX2AttributeInternal
Found with libFuzzer, see #344.
2023-01-24 11:32:15 +01:00
Nick Wellnhofer
b3b53dcce4 malloc-fail: Fix null deref in xmlSAX2Text
Found with libFuzzer, see #344.
2023-01-24 11:32:15 +01:00
Nick Wellnhofer
463bbeeca1 entities: Rework entity amplification checks
This commit implements robust detection of entity amplification attacks,
better known as the "billion laughs" attack.

We now limit the size of the document after substitution of entities to
10 times the size before expansion. This guarantees linear behavior by
definition. There already was a similar check before, but the accounting
of "sizeentities" (size of external entities) and "sizeentcopy" (size of
all copies created by entity references) wasn't accurate.

We also need saturation arithmetic since we're historically limited to
"unsigned long" which is 32-bit on many platforms.

A maximum of 10 MB of substitutions is always allowed. This should make
use cases like DITA work which have caused problems in the past.

The old checks based on the number of entities were removed. This is
accounted for by adding a fixed cost to each entity reference.

Entity amplification checks are now enabled even if XML_PARSE_HUGE is
set. This option is mainly used to allow larger text nodes. Most users
were unaware that it also disabled entity expansion checks.

Some of the limits might be adjusted later. If this change turns out to
affect legitimate use cases, we can add a separate parser option to
disable the checks.

Fixes #294.
Fixes #345.
2022-12-21 20:19:10 +01:00
Nick Wellnhofer
cecd364dd2 parser: Don't call *DefaultSAXHandlerInit from xmlInitParser
Change the default handler definitions to match the result after calling
the initialization functions.

This makes sure that no thread-local variables are accessed when calling
xmlInitParser.
2022-11-25 15:02:04 +01:00
Nick Wellnhofer
68a6518c45 parser: Rewrite push parser boundary checks
Remove inaccurate xmlParseCheckTransition check.

Remove non-incremental xmlParseGetLasts check.

Add functions that check for several boundary constructs more
accurately, keeping track of progress in ctxt->checkIndex.

Fixes #439.
2022-11-20 21:27:08 +01:00
Nick Wellnhofer
7ceaee9430 malloc-fail: Fix memory leak in xmlSAX2ExternalSubset
Found with libFuzzer, see #344.
2022-11-02 16:05:05 +01:00
Nick Wellnhofer
81621b1fe4 Fix compiler warnings in SAX2.c 2022-09-02 18:44:59 +02:00
Nick Wellnhofer
ad338ca737 Remove explicit integer casts
Remove explicit integer casts as final operation

- in assignments
- when passing arguments
- when returning values

Remove casts

- to the same type
- from certain range-bound values

The main motivation is that these explicit casts don't change the result
of operations and only render UBSan's implicit-conversion checks
useless. Removing these casts allows UBSan to detect cases where
truncation or sign-changes occur unexpectedly.

Document some explicit casts as truncating and add a few missing ones.
2022-09-01 02:33:57 +02:00
Nick Wellnhofer
aeb69fd357 Fix overflow check in SAX2.c 2022-09-01 02:33:57 +02:00
Nick Wellnhofer
0f568c0b73 Consolidate private header files
Private functions were previously declared

- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.

Consolidate all private header files in include/private.
2022-08-26 02:11:56 +02:00
Nick Wellnhofer
0e49f8826a Mark most SAX1 functions as deprecated
No compiler warnings generated yet.
2022-08-24 14:07:57 +02:00
Nick Wellnhofer
4b184240be Remove htmlDefaultSAXHandler from non-SAX1 build
This matches long-standing behavior of the XML counterpart.
2022-08-22 14:24:25 +02:00
Nick Wellnhofer
3e7b4f37aa Avoid calling xmlSetTreeDoc
Create text nodes with xmlNewDocText or set the document directly to
avoid xmlSetTreeDoc being called when the node is inserted.
2022-06-20 01:49:39 +02:00
Nick Wellnhofer
40483d0ce2 Deprecate module init and cleanup functions
These functions shouldn't be part of the public API. Most init
functions are only thread-safe when called from xmlInitParser. Global
variables should only be cleaned up by calling xmlCleanupParser.
2022-03-06 15:59:43 +01:00
Nick Wellnhofer
4a8c71eb7c Remove DOCBparser
This code has been broken and deprecated since version 2.6.0, released
in 2003. Because of a bug in commit 961b535c, DOCBparser.c was never
compiled since 2012. I couldn't find a Debian package using any of its
symbols, so it seems safe to remove this module.
2022-03-04 22:56:21 +01:00
Nick Wellnhofer
c41bc10da3 Fix unused variable warnings with disabled features 2022-02-22 19:57:12 +01:00
Nick Wellnhofer
346c3a930c Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
e03590c9ad Don't add IDs containing unexpanded entity references
When parsing without entity substitution, IDs or IDREFs containing
unexpanded entity reference like "abc&x;def" could be created. We could
try to expand these entities like in validation mode, but it seems
safer to honor the request not to expand entities. We silently ignore
such IDs for now.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
d7cb33cf44 Rework validation context flags
Use a bitmask instead of magic values to

- keep track whether the validation context is part of a parser context
- keep track whether xmlValidateDtdFinal was called

This allows to add addtional flags later.

Note that this deliberately changes the name of a public struct member,
assuming that this was always private data never to be used by client
code.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
a647e43025 Fix casting of line numbers in SAX2.c
The line member is an unsigned short. Avoids integer conversion warnings
with UBSan.

Also use USHRT_MAX instead of hard-coded constant.
2022-01-25 03:20:28 +01:00
David King
92bce68c0d Fix memory leak in xmlSAX2AttributeDecl
Found by Coverity.

https://bugzilla.redhat.com/show_bug.cgi?id=1938806
2022-01-16 14:11:28 +01:00
Nick Wellnhofer
acb3566739 Fix quadratic runtime when parsing CDATA sections
Use optimized concatenation for CDATA sections in addition to normal
text. This also affects HTML script content.

Found by OSS-Fuzz.
2021-02-03 13:57:26 +01:00
Nick Wellnhofer
21ca8829a7 Don't try to handle namespaces when building HTML documents
Don't try to resolve namespace in xmlSAX2StartElement when parsing
HTML documents. This useless operation could slow down the parser
considerably.

Found by OSS-Fuzz.
2020-07-25 17:57:29 +02:00
Nick Wellnhofer
20c60886e4 Fix typos
Resolves #133.
2020-03-08 17:41:53 +01:00
Nick Wellnhofer
eddfbc38fa Don't load external entity from xmlSAX2GetEntity
Despite the comment, I can't see a reason why external entities must be
loaded in the SAX handler. For external entities, the handler is
typically first invoked via xmlParseReference which will later load the
entity on its own if it wasn't loaded yet.

The old code also lead to duplicated SAX events which makes it
basically impossible to reuse xmlSAX2GetEntity for a custom SAX parser.
See the change to the expected test output.

Note that xmlSAX2GetEntity was loading the entity via
xmlParseCtxtExternalEntity while xmlParseReference uses
xmlParseExternalEntityPrivate. In the previous commit, the two
functions were merged, trying to compensate for some slight differences
between the two mostly identical implementations.

But the more urgent reason for this change is that xmlParseReference
has the facility to abort early when recursive entities are detected,
avoiding what could practically amount to an infinite loop.

If you want to backport this change, note that the previous three
commits are required as well:

f9ea1a24 Fix copying of entities in xmlParseReference
5c7e0a9a Copy some XMLReader option flags to parser context
1a3e584a Merge code paths loading external entities

Found by OSS-Fuzz.
2020-02-11 17:35:42 +01:00
Jared Yanovich
2a350ee9b4 Large batch of typo fixes
Closes #109.
2019-09-30 18:04:38 +02:00
Nick Wellnhofer
6b49db2cb2 Fix memory leak in xmlSAX2StartElement
Introduced by a recent commit. Only happens if max depth is exceeded
in SAX1 mode.

Found by OSS-Fuzz.
2019-01-07 18:07:00 +01:00
Nick Wellnhofer
1567b55b72 Set doc on element obtained from freeElems
In commit 8c9daf79, a call to xmlFreeNode was added in
xmlSAX2StartElementNs. If a node was obtained from the freeElems list,
make sure to set the doc, otherwise xmlFreeNode wouldn't realize that
the node name might be in the dictionary, causing an invalid free.

Note that the issue fixed in commit 8c9daf79 requires commit 0ed6addb
and this one to work properly.

Found by OSS-Fuzz.
2018-11-22 16:28:46 +01:00
Nick Wellnhofer
0ed6addb8f Unlink node before freeing it in xmlSAX2StartElement
The node may have been added to the document already, so it must be
unlinked first. Thanks to David Kilzer for spotting this.
2018-09-22 15:41:01 +02:00
Nick Wellnhofer
8c9daf790a Check return value of nodePush in xmlSAX2StartElement
If the maximum depth is exceeded, nodePush halts the parser which
results in freeing the input buffer since the previous commit. This
invalidates the attribute pointers, so the error condition must be
checked.

Found by OSS-Fuzz.
2018-09-12 13:52:47 +02:00
Nick Wellnhofer
d422b954be Fix pointer/int cast warnings on 64-bit Windows
On 64-bit Windows, `long` is 32 bits wide and can't hold a pointer.
Switch to ptrdiff_t instead which should be the same size as a pointer
on every somewhat sane platform without requiring C99 types like
intptr_t.

Fixes bug 788312.

Thanks to J. Peter Mugaas for the report and initial patch.
2017-10-09 13:47:49 +02:00
Nick Wellnhofer
83fb4119a9 Fix memory leaks in SAX1 parser
Found by OSS-Fuzz. I could only reproduce this with the (obsolete)
SAX1 parser.

One leak is caused by duplicate namespaced attribute names and can be
reproduced in memory mode (testcase 4556417027538944):

    $ cat file
    <d xmlns:a="ns" a:x="v" xmlns:b="ns" b:x="v"/>
    $ xmllint --sax1 --memory file

The other is caused by ATTLISTs with a normalized default for "xmlns"
if they're processed after the entity recursion limit was hit
(testcase 5580750034305024).

    $ cat file
    <!DOCTYPE d [
	<!ENTITY a '<d>&a;'>
	<!ATTLIST d xmlns NMTOKEN 't'>
    ]>
    <d>&a;
    $ xmllint --sax1 --valid file

Also see https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=2461
2017-09-06 01:12:34 +02:00
Nick Wellnhofer
8bbe4508ef Spelling and grammar fixes
Fixes bug 743172, bug 743489, bug 769632, bug 782400 and a few other
misspellings.
2017-06-17 16:34:23 +02:00
David Tardon
074180119f Do not leak the new CData node if adding fails
For https://bugzilla.gnome.org/show_bug.cgi?id=780918
2017-04-07 18:24:52 +02:00
David Kilzer
4472c3a5a5 Fix some format string warnings with possible format string vulnerability
For https://bugzilla.gnome.org/show_bug.cgi?id=761029

Decorate every method in libxml2 with the appropriate
LIBXML_ATTR_FORMAT(fmt,args) macro and add some cleanups
following the reports.
2016-05-23 15:01:07 +08:00
Daniel Veillard
a6ea72ad19 Fix processing in SAX2 in case of an allocation failure
Related to https://bugzilla.gnome.org/show_bug.cgi?id=731360
2014-07-14 20:29:34 +08:00
Gaurav
3e0eec4319 Adding some missing NULL checks
in SAX2 DOM building code and in the HTML parser
2014-06-13 14:45:20 +08:00
Nicolas Le Cam
52010c639a Compile out use of xmlValidateNCName() when not available.
Fix compilation with minimum and valid.
2014-02-10 10:36:20 +08:00
Nicolas Le Cam
77b5b46409 Legacy needs xmlSAX2StartElement() and xmlSAX2EndElement().
Fix compilation with minimum and legacy.
2014-02-10 10:32:45 +08:00
Gaurav
a885f13a67 Fix a possible NULL dereference
https://bugzilla.gnome.org/show_bug.cgi?id=705400
In case of allocation error the pointer was dereferenced before the
test for a failure
2013-08-03 22:16:02 +08:00
Daniel Veillard
ab0e35044c Activate detection of encoding in external subset
https://bugzilla.gnome.org/show_bug.cgi?id=694228

the ctxt->encoding was percolated down when parsing the external
subset leading to failures
2013-03-27 13:21:38 +08:00
Daniel Veillard
cff2546f13 Cache presence of '<' in entities content
slightly modify how ent->checked is used, and use the lowest bit to
keep the information
2013-03-11 15:59:22 +08:00
Daniel Veillard
a3f1e3e571 Avoid extra processing on entities
If an entity has already been checked for correctness no
need to check it on every reference
2013-03-11 15:59:21 +08:00
Daniel Veillard
6c91aa384f Fix a regression in 2.9.0 breaking validation while streaming
https://bugzilla.gnome.org/show_bug.cgi?id=684774
with help from Kjell Ahlstedt <kjell.ahlstedt@bredband.net>
2012-10-25 15:33:59 +08:00
Daniel Veillard
7651606f31 Various cleanups to avoid compiler warnings 2012-09-11 14:02:08 +08:00
Daniel Veillard
f8e3db0445 Big space and tab cleanup
Remove all space before tabs and space and tabs at end of lines.
2012-09-11 13:26:36 +08:00
Daniel Veillard
968a03a2e5 Add support for big line numbers in error reporting
Fix the lack of line number as reported by Johan Corveleyn <jcorvel@gmail.com>

* parser.c include/libxml/parser.h: add an XML_PARSE_BIG_LINES parser
  option not switch on by default, it's an opt-in
* SAX2.c: if XML_PARSE_BIG_LINES is set store the long line numbers
  in the psvi field of text nodes
* tree.c: expand xmlGetLineNo to extract those informations, also
  make sure we can't fail on recursive behaviour
* error.c: in __xmlRaiseError, if a node is provided, call
  xmlGetLineNo() if we can't get a valid line number.
* xmllint.c: switch on XML_PARSE_BIG_LINES in xmllint
2012-08-13 12:41:33 +08:00