mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-26 20:25:14 +03:00
Commit Graph

6343 Commits

Author SHA1 Message Date
Nick Wellnhofer
3efbe916a1 parser: Mark 'token' member as unused in xmlParserCtxt 2024-01-05 20:39:40 +01:00
Nick Wellnhofer
b82fd81d06 parser: Rework xmlCtxtParseDocument
Make xmlCtxtParseDocument take a parser input which can be popped after
parsing.
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
c2b3294f60 fuzz: Abort on invalid UTF-8
The parser should never generate invalid UTF-8 these days even in
recovery mode.
2024-01-04 21:20:51 +01:00
Michele Bianchi
df098e3bf6 Set LIBXML2_FOUND if it has been properly configured 2024-01-04 19:22:57 +00:00
Nick Wellnhofer
d7d300ba04 parser: Remove remnants of runtime debugging feature
Apparently, this feature was removed long ago.

Fixes #651.
2024-01-04 17:50:11 +01:00
Nick Wellnhofer
8c5848bdd5 parser: Make xmlParseContent more useful
This is an internal function which isn't really usable without some
hacks. See WebKit/Chromium trying to recreate the effects of
xmlDetectSAX2 manually, for example.

Make xmlParseContent perform late initialization and check whether the
content was fully parsed.

Also rename xmlDetectSAX2 and document why it's needed.
2024-01-04 17:45:03 +01:00
Nick Wellnhofer
65c65b6524 tests: Move away from global error handlers 2024-01-04 15:41:43 +01:00
Nick Wellnhofer
07c05546fa error: Make xmlFormatError public
This is a useful function to get a verbose error report.

This allows removing duplicated code from runtest.c. Also reactivate the
check for schema parser failures.
2024-01-04 15:41:43 +01:00
Nick Wellnhofer
d0eb5a7e54 parser: Remove xmlErrEncodingInt
Convert the last user to xmlFatalErr.
2024-01-04 15:28:57 +01:00
Nick Wellnhofer
f30b9b2331 fuzz: Add assertion in xmlCopyCharMultibyte
This is an internal function that should never receive out-of-range
codepoints.
2024-01-04 15:28:57 +01:00
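
The idea of asserting on out-of-range code points inside a UTF-8 encoder can be sketched roughly as follows. This is not the code of xmlCopyCharMultibyte; `toy_encode_utf8` is a hypothetical name, and rejecting surrogates in addition to values above U+10FFFF is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not libxml2 code): encode one code point as UTF-8.
 * 'out' must have room for 4 bytes. Returns the number of bytes written.
 */
static size_t toy_encode_utf8(unsigned char *out, unsigned long cp) {
    /* An out-of-range value is a caller bug, so fail loudly in debug and
     * fuzzing builds instead of silently emitting garbage. */
    assert(out != NULL);
    assert(cp <= 0x10FFFFUL);
    assert(cp < 0xD800UL || cp > 0xDFFFUL); /* assumption: no surrogates */

    if (cp < 0x80UL) {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}
```

With assertions compiled in, a fuzzer that manages to pass a bad code point aborts at the call site, which is presumably the point of the change.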
Nick Wellnhofer
a7356dfecc parser: Clear invalid entity content
This was removed in earlier commits, but we really want to make sure
that entity content is syntactically valid.
2024-01-04 15:28:57 +01:00
Nick Wellnhofer
30d839776a fuzz: Disable catalogs
The catalogs API doesn't report OOM errors. It's basically impossible
to use it safely in its current form.
2024-01-04 15:18:14 +01:00
Nick Wellnhofer
ca5965d594 save: Report more malloc failures 2024-01-02 23:43:06 +01:00
Nick Wellnhofer
2c9cd0b68b fuzz: Abort on internal errors 2024-01-02 19:48:23 +01:00
Nick Wellnhofer
661ef93694 valid: Fix some error codes 2024-01-02 19:48:23 +01:00
Nick Wellnhofer
0821efc8ee encoding: Check whether encoding handlers support input/output
The "HTML" encoding handler doesn't support input which could lead to a
wrong error report.
2024-01-02 19:48:23 +01:00
Nick Wellnhofer
85f99023ae parser: Fix buffer size checks
Don't test size of remaining data. This causes false positives with
memory buffers.

Also impose XML_MAX_HUGE_LENGTH limit when parsing with XML_PARSE_HUGE.
2024-01-02 19:48:23 +01:00
Nick Wellnhofer
e8fb3d639f parser: Convert some "internal errors" to meaningful codes 2024-01-02 19:48:23 +01:00
Nick Wellnhofer
9912c36904 SAX2: Enforce size limit in xmlSAX2Text with XML_PARSE_HUGE 2024-01-02 19:48:23 +01:00
Nick Wellnhofer
5cb4b05c57 parser: Lower maximum entity nesting depth
Limit entity nesting depth to 20, or to 40 with XML_PARSE_HUGE.

Change error code to XML_ERR_RESOURCE_LIMIT.
2024-01-02 19:48:23 +01:00
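
A minimal sketch of such a depth check, using the limits of 20 and 40 from the commit message. All names (`TOY_PARSE_HUGE`, `TOY_ERR_RESOURCE_LIMIT`, `toy_enter_entity`) are hypothetical stand-ins, and whether the real check uses `>` or `>=` is not stated in the message, so the boundary here is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real option bit and error code. */
enum { TOY_PARSE_HUGE = 1 << 0 };
enum { TOY_OK = 0, TOY_ERR_RESOURCE_LIMIT = 1 };

/* Limits taken from the commit message: 20, or 40 with the "huge" option. */
static int toy_max_entity_depth(int options) {
    return (options & TOY_PARSE_HUGE) ? 40 : 20;
}

/*
 * Called before descending into an entity. Increments *depth on success
 * and reports a resource-limit error once the limit has been reached.
 */
static int toy_enter_entity(int *depth, int options) {
    assert(depth != NULL && *depth >= 0);
    if (*depth >= toy_max_entity_depth(options))
        return TOY_ERR_RESOURCE_LIMIT;
    (*depth)++;
    return TOY_OK;
}
```

Reporting a dedicated resource-limit code, rather than a generic error, lets callers distinguish documents that are malformed from documents that are merely too deeply nested.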
Nick Wellnhofer
a2cc7f5f04 parser: Set depth limit to 2048 with XML_PARSE_HUGE
Deeply nested documents can cause performance problems, so the nesting
depth should always be limited to a reasonable value.

Also remove the global xmlParserMaxDepth setting which isn't thread-safe
and seems unused.
2024-01-02 19:42:06 +01:00
Nick Wellnhofer
875bb08489 parser: Implement xmlCtxtSetOptions
Surprisingly, some options can only be enabled with xmlCtxtUseOptions
and it's impossible to unset them. Add a new API function
xmlCtxtSetOptions which sets or clears all options.

Finally document all parser options.

Make sure to synchronize option bits and struct members.
2024-01-02 19:42:06 +01:00
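
The difference between an "enable only" call and a "set or clear everything" call can be illustrated with a toy model. This is a simplified sketch, not libxml2 code: `ToyCtxt`, the `TOY_OPT_*` bits and both functions are hypothetical, and modelling the older behaviour as a plain bitwise OR is an assumption. The `toy_sync` step mirrors the commit's remark about keeping option bits and struct members synchronized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option bits and context, for illustration only. */
enum { TOY_OPT_RECOVER = 1 << 0, TOY_OPT_NOENT = 1 << 1 };

typedef struct {
    int options;          /* option bitmask */
    int recovery;         /* struct member mirroring TOY_OPT_RECOVER */
    int replace_entities; /* struct member mirroring TOY_OPT_NOENT */
} ToyCtxt;

/* Keep the mirrored struct members in sync with the bitmask. */
static void toy_sync(ToyCtxt *ctxt) {
    ctxt->recovery = (ctxt->options & TOY_OPT_RECOVER) != 0;
    ctxt->replace_entities = (ctxt->options & TOY_OPT_NOENT) != 0;
}

/* "Use" semantics: can only switch options on, never off. */
static void toy_use_options(ToyCtxt *ctxt, int options) {
    assert(ctxt != NULL);
    ctxt->options |= options;
    toy_sync(ctxt);
}

/* "Set" semantics: the argument replaces the whole option state. */
static void toy_set_options(ToyCtxt *ctxt, int options) {
    assert(ctxt != NULL);
    ctxt->options = options;
    toy_sync(ctxt);
}
```

In this model, calling `toy_use_options` twice accumulates both bits, while a later `toy_set_options(&ctxt, TOY_OPT_NOENT)` clears the recovery flag again.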
Nick Wellnhofer
33ec407a73 parser: Always prefer option members over bitmask
If an option has an extra member in xmlParserCtxt, it takes precedence
over the value from the options bitmask. Fix a few places where this was
ignored.
2024-01-02 17:58:53 +01:00
Nick Wellnhofer
22fd571f3c parser: Don't modify SAX2 handler if XML_PARSE_SAX1 is set
It's a bad idea to modify members of the SAX handler struct for option
state management. Ideally, ctxt->options should be the preferred source
of truth.
2024-01-02 16:42:23 +01:00
Nick Wellnhofer
37c6618be5 parser: Rework parsing of attribute and entity values
Don't use a separate function to handle "complex" attributes. Validate
UTF-8 byte sequences without decoding. This should improve performance
considerably when parsing multi-byte UTF-8 sequences.

Use a string buffer to avoid unnecessary allocations and copying when
expanding entities.

Normalize attribute values in a single pass while expanding entities.

Be more lenient in recovery mode.

If no entity substitution was requested, validate entities without
expanding. Fixes #596.

Also fixes #655.
2024-01-02 15:42:03 +01:00
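
Validating UTF-8 "without decoding" means checking byte patterns directly instead of assembling each code point first. A minimal sketch of that technique follows; it is not the parser's actual code, the function names are hypothetical, and it checks only UTF-8 well-formedness, not the additional character restrictions that XML imposes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the length (1 to 4) of the well-formed UTF-8
 * sequence starting at s, or 0 if it is invalid or truncated. The code
 * point value is never computed; only byte ranges are compared.
 */
static size_t toy_utf8_seq_len(const unsigned char *s, size_t avail) {
    assert(s != NULL);
    if (avail == 0)
        return 0;
    unsigned char c = s[0];
    if (c < 0x80)
        return 1;                       /* ASCII */
    if (c < 0xC2)
        return 0;                       /* stray continuation or overlong lead */
    if (c < 0xE0) {
        if (avail < 2 || (s[1] & 0xC0) != 0x80)
            return 0;
        return 2;
    }
    if (c < 0xF0) {
        if (avail < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        if (c == 0xE0 && s[1] < 0xA0)
            return 0;                   /* overlong 3-byte form */
        if (c == 0xED && s[1] >= 0xA0)
            return 0;                   /* UTF-16 surrogate range */
        return 3;
    }
    if (c < 0xF5) {
        if (avail < 4 || (s[1] & 0xC0) != 0x80 ||
            (s[2] & 0xC0) != 0x80 || (s[3] & 0xC0) != 0x80)
            return 0;
        if (c == 0xF0 && s[1] < 0x90)
            return 0;                   /* overlong 4-byte form */
        if (c == 0xF4 && s[1] >= 0x90)
            return 0;                   /* above U+10FFFF */
        return 4;
    }
    return 0;                           /* 0xF5..0xFF never appear in UTF-8 */
}

/* Return 1 if the whole buffer is well-formed UTF-8, 0 otherwise. */
static int toy_utf8_is_valid(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        size_t n = toy_utf8_seq_len(s + i, len - i);
        if (n == 0)
            return 0;
        i += n;
    }
    return 1;
}
```

Because no shifts or ORs are needed to rebuild the code point, validated bytes can be copied straight to the output buffer, which is where a speed-up for multi-byte text would plausibly come from.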
Nick Wellnhofer
4dcc2d743e save: Output U+FFFD replacement characters
This degrades more gracefully and helps to diagnose errors.

We stop raising errors for now, since there's no way to report malloc
failures during error handling yet.
2024-01-02 15:39:11 +01:00
Nick Wellnhofer
2b79f106ff parser: Simplify entity size accounting 2024-01-02 14:17:27 +01:00
Nick Wellnhofer
08d9b2588f parser: Support namespace scope in NsData struct
The previous approach of recreating the NsData struct was flawed.
2024-01-02 14:17:27 +01:00
Nick Wellnhofer
5de48d1263 parser: Simplify error handling when parsing entities 2024-01-02 14:17:27 +01:00
Nick Wellnhofer
f0dc52d09c parser: Move cleanup of element stacks to xmlParseContent 2024-01-02 14:17:27 +01:00
Nick Wellnhofer
a1ed589b4b parser: Avoid unwanted expansion of parameter entities
Remove PE handling from xmlSkipBlankChars and add a separate version
that handles PEs. Only call xmlSkipBlankCharsPE when parsing DTD
constructs. This should make sure that PEs don't get expanded
accidentally, for example in text declarations.
2024-01-02 14:17:27 +01:00
Nick Wellnhofer
16b0dbc1b3 parser: Fix XML_ERR_UNSUPPORTED_ENCODING errors
Commit 45157261 added the check in the wrong place.

Also allow unsupported encoding in xmlNewInputInternal.

Fixes #654.
2024-01-02 14:17:27 +01:00
Nick Wellnhofer
e45a4d7115 io: Always forward IO errors to global handler
The HTTP module raises errors without context. This won't be fixed,
so send them to the global error handler.
2023-12-29 01:22:13 +01:00
Nick Wellnhofer
a73483ed41 parser: Remove extraneous error message
This is not an "internal error" but some other error reported elsewhere.
2023-12-29 01:22:13 +01:00
Nick Wellnhofer
7e0bbbc143 parser: New input API
Provide a new set of functions to create xmlParserInputs. These can be
used for the document entity or from external entity loaders.

- Don't require xmlParserInputBuffer.
- All functions take a base URI.
- All functions take an encoding as string.
- xmlNewInputURL also takes a public ID.
- xmlNewInputMemory takes a size_t.
- Optimization hints for memory buffers.

Improve documentation.

Only call xmlInitParser before allocating a new parser context.

Call xmlCtxtUseOptions as early as possible.
2023-12-29 01:22:13 +01:00
Nick Wellnhofer
451572615c parser: Downgrade XML_ERR_UNSUPPORTED_ENCODING to warning
If the actual encoding is UTF-8 or ASCII, we don't want to fail.
2023-12-29 01:22:13 +01:00
Nick Wellnhofer
24b7144f2c parser: More refactoring of entity parsing
Remove xmlCreateEntityParserCtxtInternal.

Rework xmlNewEntityInputStream.
2023-12-29 01:22:13 +01:00
Nick Wellnhofer
d3ceea0b5b parser: Fix encoding handling in xmlParserInputBufferCreateIO
Don't pass encoding to xmlParserInputBufferCreateIO but use
xmlSwitchEncoding to make sure that the encoding sticks.
2023-12-29 01:22:13 +01:00
Nick Wellnhofer
d025cfbb4b parser: Always copy content from entity to target.
Make sure that references from IDs are updated.

Note that if there are IDs with the same value in a document, the last
one will now be returned. IDs should be unique, but maybe this should be
addressed.
2023-12-29 01:22:11 +01:00
Nick Wellnhofer
6337ff793b parser: Simplify control flow in xmlParseReference 2023-12-29 01:21:45 +01:00
Nick Wellnhofer
579186f2e0 parser: Remove xmlSetEntityReferenceFunc feature
This has been deprecated for a long time.
2023-12-29 01:20:51 +01:00
Nick Wellnhofer
b848338c5a parser: More refactoring of entity loading
This sets input->entity also for general entities.
2023-12-29 01:20:08 +01:00
Nick Wellnhofer
4ecc85d2cb parser: Push general entity input streams on the stack
This allows the error handler to give more context.
2023-12-29 01:20:08 +01:00
Nick Wellnhofer
a5dcf0f422 parser: Mark more parser context members as unused 2023-12-29 01:20:08 +01:00
Nick Wellnhofer
6a9a88a17f parser: Move progressive flag into input struct 2023-12-29 01:20:08 +01:00
Nick Wellnhofer
4f14fe9cf7 parser: Remove remaining ctxt->instate checks
Now ctxt->instate is only used for push parser states.
2023-12-29 01:20:08 +01:00
Nick Wellnhofer
d944a41515 parser: Fix in-parameter-entity and in-external-dtd checks
Use ctxt->input->entity instead of ctxt->inputNr to determine whether
we are inside a parameter entity.

Stop using ctxt->external to check whether we're in an external DTD.
This is signaled by ctxt->inSubset == 2.
2023-12-29 01:19:56 +01:00
Nick Wellnhofer
477a7ed82c html: Abort earlier on fatal errors 2023-12-28 19:43:48 +01:00
Nick Wellnhofer
5f319304c8 SAX2: Fix error code
Today I learned that the TSCII character encoding [1] can blow up the
size of text 12 times when converted to UTF-8:

    $ printf '\x82' |iconv -f TSCII -t UTF-8 |hexdump -C
    00000000  e0 ae b8 e0 af 8d e0 ae  b0 e0 af 80
    0000000c

[1] https://en.wikipedia.org/wiki/Tamil_Script_Code_for_Information_Interchange
2023-12-28 19:43:48 +01:00
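
The 12 bytes in the hexdump above are four Tamil code points (U+0BB8, U+0BCD, U+0BB0, U+0BC0), each taking three bytes in UTF-8. A small sketch that confirms the count by skipping continuation bytes; the helper name and the array name are hypothetical, and the input is assumed to be valid UTF-8 already.

```c
#include <assert.h>
#include <stddef.h>

/* The iconv output from the commit message: TSCII byte 0x82 as UTF-8. */
static const unsigned char toy_tscii_82_utf8[12] = {
    0xE0, 0xAE, 0xB8, 0xE0, 0xAF, 0x8D,
    0xE0, 0xAE, 0xB0, 0xE0, 0xAF, 0x80
};

/*
 * Hypothetical helper: count code points in valid UTF-8 by counting every
 * byte that is not a continuation byte (10xxxxxx).
 */
static size_t toy_utf8_count_codepoints(const unsigned char *s, size_t len) {
    assert(s != NULL || len == 0);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Calling `toy_utf8_count_codepoints(toy_tscii_82_utf8, sizeof toy_tscii_82_utf8)` yields 4, so a single input byte expands to four code points and twelve output bytes. Size limits that assume a small fixed expansion factor during conversion can therefore be too tight for such encodings.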
Nick Wellnhofer
ab63197149 uri: Keep fragment intact when resolving filesystem paths 2023-12-28 17:07:03 +01:00