1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-04-24 18:50:07 +03:00

847 Commits

Author SHA1 Message Date
Nick Wellnhofer
3ffcc03b16 parser: Deprecate more internal functions 2023-04-26 20:23:23 +02:00
Nick Wellnhofer
250faf3c83 parser: Fix regression in xmlParserNodeInfo accounting
Commit 62150ed2 broke begin_pos and begin_line when extra node info was
recorded.

Fixes #523.
2023-04-20 15:38:00 +02:00
Nick Wellnhofer
9282b08431 parser: Fix regression in memory pull parser with encoding
Revert another change from commit 98840d40.

Decode the whole buffer when reading from memory and switching to the
initial encoding. Add some comments about potential improvements.
2023-04-19 22:32:19 +02:00
David Kilzer
86105c0493 Fix use-after-free in xmlParseContentInternal()
* parser.c:
(xmlParseCharData):
- Check if the parser has stopped before advancing
  `ctxt->input->cur`.  This only occurs if a custom SAX error
  handler calls xmlStopParser() on fatal errors.

Fixes #518.
2023-04-16 12:01:05 -07:00
Nick Wellnhofer
b4d46cee80 parser: Remove first line handling in xmlParseChunk
After reworking EBCDIC detection, this isn't necessary.
2023-04-12 15:10:01 +02:00
Nick Wellnhofer
98840d40da parser: Rework EBCDIC code page detection
To detect EBCDIC code pages, we used to switch the encoding twice and
had to be very careful not to decode data after the XML declaration
before the second switch. This relied on a hard-coded expected size of
the XML declaration and was complicated and unreliable.

Now we convert the first 200 bytes to EBCDIC-US and parse the encoding
declaration manually.
2023-03-21 21:35:15 +01:00
Nick Wellnhofer
3eb9f5ca4e parser: Limit name length in xmlParseEncName 2023-03-21 13:19:31 +01:00
Nick Wellnhofer
04d1bedd8c parser: Rework shrinking of input buffers
Don't try to grow the input buffer in xmlParserShrink. This makes sure
that no memory allocations are made and the function always succeeds.

Remove unnecessary invocations of SHRINK. Invoke SHRINK at the end of
DTD parsing loops.

Shrink before growing.
2023-03-21 13:19:18 +01:00
Nick Wellnhofer
067986fa67 parser: Fix regressions from previous commits
- Fix memory leak in xmlParseNmtoken.
- Fix buffer overread after htmlParseCharDataInternal.
2023-03-18 16:51:40 +01:00
Nick Wellnhofer
3e85d7b7ab parser: Rely on CUR_CHAR/NEXT to grow the input buffer
The input buffer is now grown reliably when calling CUR_CHAR
(xmlCurrentChar) or NEXT (xmlNextChar). This allows to remove many
other invocations of GROW.
2023-03-17 14:02:23 +01:00
Nick Wellnhofer
c81d0d04bf malloc-fail: Add more error checks when parsing names
xmlParseName and similar functions must return NULL if an error occurs.

Found by OSS-Fuzz, see #344.
2023-03-17 12:39:35 +01:00
Nick Wellnhofer
b167c73144 parser: Fix short-lived regression causing infinite loops
Fix 3eb6bf03. We really have to halt the parser, so the input buffer
gets reset.
2023-03-14 15:16:04 +01:00
Nick Wellnhofer
2099441f32 parser: Stop calling xmlParserInputShrink
Introduce xmlParserShrink which takes a parser context to simplify error
handling.
2023-03-13 17:51:13 +01:00
Nick Wellnhofer
cabde70f8b parser: Simplify calculation of available buffer space 2023-03-12 19:07:23 +01:00
Nick Wellnhofer
b75976e029 parser: Use size_t when subtracting input buffer pointers
Avoid integer overflows.
2023-03-12 19:06:19 +01:00
Nick Wellnhofer
9a6ca81612 parser: Check for integer overflow when updating checkIndex
Unfortunately, checkIndex is a long, not a size_t. Check for integer
overflow before updating the value.
2023-03-12 19:03:11 +01:00
Nick Wellnhofer
bd63d730b8 html: Impose some length limits
Impose length limits on names, attribute values, PIs and comments,
similar to the XML parser.
2023-03-12 17:40:55 +01:00
Nick Wellnhofer
3eb6bf0386 parser: Stop calling xmlParserInputGrow
Introduce xmlParserGrow which takes a parser context to simplify error
handling.
2023-03-12 17:05:51 +01:00
Nick Wellnhofer
207ebdfd2a malloc-fail: Fix out-of-bounds read in xmlGROW
Short-lived regression from 56cc2211.
2023-03-12 14:43:01 +01:00
Nick Wellnhofer
56cc2211bc parser: Merge xmlParserInputGrow into xmlGROW
Simplifies the code and makes error handling easier.
2023-03-09 22:27:58 +01:00
Nick Wellnhofer
14604a446e malloc-fail: Fix out-of-bounds read in xmlCurrentChar
Found by OSS-Fuzz.
2023-03-09 22:10:44 +01:00
Nick Wellnhofer
3f69fc805c parser: Tighten expansion limits
- Lower the amount of expansion which is always allowed from
  10MB to 1MB.
- Lower the maximum amplification factor from 10 to 5.
- Lower the "fixed cost" from 50 to 20.
2023-03-08 13:58:49 +01:00
Nick Wellnhofer
5d55315e32 parser: Fix OOB read when formatting error message
Don't try to print characters beyond the end of the buffer.

Found by OSS-Fuzz.
2023-02-18 17:29:07 +01:00
Nick Wellnhofer
f8852184a1 malloc-fail: Fix memory leak in xmlParseEntityDecl
Found with libFuzzer, see #344.
2023-02-17 17:16:50 +01:00
Nick Wellnhofer
e6d22f925a malloc-fail: Fix reallocation in inputPush
Store xmlRealloc result in temporary variable to avoid null deref in
error handler.

Found with libFuzzer, see #344.
2023-01-24 11:47:33 +01:00
Nick Wellnhofer
6fd8904108 malloc-fail: Fix use-after-free in xmlParseStartTag2
Fix error handling in xmlCtxtGrowAttrs.

Found with libFuzzer, see #344.
2023-01-24 11:47:33 +01:00
Nick Wellnhofer
d1b8785693 malloc-fail: Fix infinite loop in xmlParseTextDecl
Memory errors can set `instate` to `XML_PARSER_EOF` which results in
`NEXT` making no progress.

Found with libFuzzer, see #344.
2023-01-24 11:32:15 +01:00
Nick Wellnhofer
bd9de3a31f malloc-fail: Fix null deref in xmlAddDefAttrs
Found with libFuzzer, see #344.
2023-01-24 11:32:15 +01:00
Nick Wellnhofer
33d4a0fe40 parser: Fix progress check in xmlParseExternalSubset
Avoid infinite loop. Short-lived regression from f61b8a62.

Found with libFuzzer.
2023-01-24 11:32:15 +01:00
Nick Wellnhofer
74aa61e0bd parser: Halt parser on DTD errors
If we try to continue parsing after an error in the internal or external
subset, entity expansion accounting gets more complicated. Simply halt
the parser.

Found with libFuzzer.
2023-01-24 11:32:15 +01:00
Nick Wellnhofer
d320a683d1 parser: Fix entity check in attributes
Don't set the "checked" flag when checking entities in default attribute
values. These entities could reference other entities which weren't
defined yet, so the check isn't reliable.

This fixes a short-lived regression which could lead to a call stack
overflow later in xmlStringGetNodeList.
2023-01-17 13:59:24 +01:00
Nick Wellnhofer
59b3366178 error: Limit number of parser errors
Reporting errors is expensive and some abusive test cases can generate
an error for each invalid input byte. This causes the parser to spend
most of the time with error handling. Limit the number of errors and
warnings to 100.
2022-12-27 14:41:19 +01:00
Nick Wellnhofer
66e9fd66e8 parser: Fix infinite loop with push parser in recovery mode
Short-lived regression from commit b1f9c193. Found by OSS-Fuzz.
2022-12-25 21:30:32 +01:00
Nick Wellnhofer
49b54d7e2b parser: Fix null deref in xmlStringDecodeEntitiesInt
Short-lived regression.
2022-12-25 15:06:51 +01:00
Nick Wellnhofer
1865668b61 parser: Fix accounting of consumed input bytes
Only add consumed bytes if

- we're not parsing an entity
- we're parsing external parameter entities for the first time.

Always ignore internal parameter entities.
2022-12-23 23:11:11 +01:00
Nick Wellnhofer
bc18f4a67c parser: Lower entity nesting limit with XML_PARSE_HUGE
The old limit of 1024 could lead to excessively deep call stacks. This
could probably be set much lower without causing issues.
2022-12-23 22:11:18 +01:00
Nick Wellnhofer
dd62e541ec parser: Don't increase depth twice when parsing internal entities
Fix xmlParseBalancedChunkMemoryInternal.
2022-12-23 22:11:18 +01:00
Nick Wellnhofer
a41b09c739 parser: Improve detection of entity loops
Set a flag to detect entity loops at once instead of processing until
the depth limit is exceeded.
2022-12-23 22:11:18 +01:00
Nick Wellnhofer
d972393f30 parser: Only report a single entity error
Don't report errors multiple times for nested entity references.
2022-12-23 22:10:39 +01:00
Nick Wellnhofer
077df27eb1 parser: Fix integer overflow of input ID
Applies a patch from Chromium. Also stop incrementing input ID of
subcontexts. This isn't necessary.

Fixes #465.
2022-12-22 15:22:01 +01:00
David Kilzer
0bd4e4e032 xmlParseStartTag2() contains typo when checking for default definitions for an attribute in a namespace
* parser.c:
(xmlParseStartTag2):
- Fix index into defaults->values.  It is only correct the first
  time through the loop when i == 0.

Fixes #467.
2022-12-21 19:35:33 -08:00
Nick Wellnhofer
b47ebf047e parser: Deprecate xmlString*DecodeEntities
These are internal functions.
2022-12-21 21:06:03 +01:00
Nick Wellnhofer
ec6633afae parser: Remove useless ent->etype test in xmlParseReference
If ent->etype is invalid, ret can't equal XML_ERR_OK.
2022-12-21 20:35:31 +01:00
Nick Wellnhofer
7ee7f0360a parser: Remove useless ent->children tests in xmlParseReference
The if-block before always returns if ent->children == NULL.
2022-12-21 20:35:31 +01:00
Nick Wellnhofer
ce76ebfd13 entities: Stop counting entities
This was only used in the old version of xmlParserEntityCheck.
2022-12-21 20:19:10 +01:00
Nick Wellnhofer
a3c8b1805e entities: Add entity flag for loop check 2022-12-21 20:19:10 +01:00
Nick Wellnhofer
463bbeeca1 entities: Rework entity amplification checks
This commit implements robust detection of entity amplification attacks,
better known as the "billion laughs" attack.

We now limit the size of the document after substitution of entities to
10 times the size before expansion. This guarantees linear behavior by
definition. There already was a similar check before, but the accounting
of "sizeentities" (size of external entities) and "sizeentcopy" (size of
all copies created by entity references) wasn't accurate.

We also need saturation arithmetic since we're historically limited to
"unsigned long" which is 32-bit on many platforms.

A maximum of 10 MB of substitutions is always allowed. This should make
use cases like DITA work which have caused problems in the past.

The old checks based on the number of entities were removed. This is
accounted for by adding a fixed cost to each entity reference.

Entity amplification checks are now enabled even if XML_PARSE_HUGE is
set. This option is mainly used to allow larger text nodes. Most users
were unaware that it also disabled entity expansion checks.

Some of the limits might be adjusted later. If this change turns out to
affect legitimate use cases, we can add a separate parser option to
disable the checks.

Fixes #294.
Fixes #345.
2022-12-21 20:19:10 +01:00
Nick Wellnhofer
7e3f469be9 entities: Use flags to store '<' check results
Instead of abusing the LSB of the "checked" member, store the result
of testing for occurrence of '<' character in "flags".

Also use the flags in xmlParseStringEntityRef instead of rescanning
every time.
2022-12-19 15:59:49 +01:00
Nick Wellnhofer
481d79d44c entities: Add XML_ENT_PARSED flag
To check whether an entity was already parsed, the code previously
tested whether "checked" was non-zero or "children" was non-null. The
"children" check could be unreliable because an empty entity also
results in an empty (NULL) node list. Use a separate flag to make this
check more reliable.
2022-12-19 15:26:46 +01:00
Alex Richardson
4b959ee168 Remove hacky heuristic from b2dc5675e94aa6b5557ba63f7d66b0f08dd17e4d
Checking whether the context is close to the parent context by hardcoding
250 is not portable (I noticed tests were failing on Morello since the value
is 288 there due to pointers being 128 bits). Instead we should ensure
that the XML_VCTXT_USE_PCTXT flag is not set in cases where the user data
is not actually a parser context (or ideally add a separate field but that
would be an ABI break.

From what I can see in the source, the XML_VCTXT_USE_PCTXT is only set if
the userData field points to a valid context, and if this is not the case
the flag should be cleared when changing userData rather than relying on
the offset between the two. Looking at the history, I think
d7cb33cf44aa688f24215c9cd398c1a26f0d25ff fixed most of the need for this
workaround, but it looks like there are a few more locations that need
updating; This commit changes two more places to set/clear/copy the
XML_VCTXT_USE_PCTXT flag, so this heuristic should not be needed anymore.
I've also drop two = NULL assignment in xmllint since this is not needed
after a call to memset().

There was also an uninitialized vctxt.flags (and other fields) in
`xmlShellValidate()`, which I've fixed by adding a memset() call.
2022-12-01 15:31:25 +00:00