mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-04-09 14:50:07 +03:00

990 Commits

Author SHA1 Message Date
Nick Wellnhofer
5cce7af791 parser: Fix loading of parameter entities in external DTDs
Regressed with commit 12f0bb94.

Fixes #816.
2024-11-12 16:35:58 +01:00
Nick Wellnhofer
4334cbb4e3 parser: Fix downstream code that swaps DTDs
Downstream code like the nginx xslt module can change the document's DTD
pointers in a SAX callback. If an entity from a separate DTD is parsed
lazily, its content must not reference the current document.

Regressed with commit d025cfbb.

Fixes #815.
2024-11-12 16:35:38 +01:00
Nick Wellnhofer
929297749c parser: Fix detection of duplicate attributes
We really need a second scan if more than one namespace clash was
detected.
2024-11-12 16:32:33 +01:00
Nick Wellnhofer
3a648d11a3 parser: Make xmlParseChunk return an error if parser was stopped
This regressed after enhancing the disableSAX member in 2.13.

Should fix #777.
2024-07-25 17:27:43 +02:00
Nick Wellnhofer
de28e6ed3a [CVE-2024-40896] Fix XXE protection in downstream code
Some users set an entity's children manually in the getEntity SAX
callback to restrict entity expansion. This stopped working after
renaming the "checked" member of xmlEntity, making at least one
downstream project and its dependants susceptible to XXE attacks.

See #761.
2024-07-24 14:34:13 +02:00
Nick Wellnhofer
e30cb632e7 parser: Fix error return of xmlParseBalancedChunkMemory
Only return an error code if the chunk is not well-formed to match the
2.12 behavior. Return 0 on non-fatal errors like invalid namespaces.

Fixes #765.
2024-07-08 13:32:58 +02:00
Nick Wellnhofer
046f61c698 parser: Reenable ctxt->directory
Unused internally, but used in downstream code.

Should fix #753.
2024-07-02 22:29:25 +02:00
Nick Wellnhofer
def06f376e parser: Selectively reenable reading from "-"
Make filename "-" mean stdin for legacy SAX1 functions and xmlReadFile.
This should hopefully fix most command line utilities.

See #737.
2024-06-17 18:17:15 +02:00
Nick Wellnhofer
8322eef39d parser: Pass global object to sax->setDocumentLocator
Revert part of commit c011e760.

Fixes #732.
2024-06-14 16:55:44 +02:00
Nick Wellnhofer
5510e989cb doc: Don't mention xmlNewInputURL 2024-06-12 16:05:49 +02:00
Nick Wellnhofer
8318b5a634 parser: Fix NULL checks for output arguments 2024-06-09 15:08:43 +02:00
Nick Wellnhofer
0cde1b78d6 parser: Fix "Truncated multi-byte sequence" error
Don't raise the error if decoding failed.
2024-06-07 00:02:31 +02:00
Nick Wellnhofer
122b61309f parser: Fix performance regression when parsing namespaces
The namespace hash table didn't reuse deleted buckets, leading to
quadratic behavior.

Also ignore deleted buckets when resizing.

Fixes #726.
2024-06-06 15:52:09 +02:00
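
The failure mode described in this commit can be illustrated with a generic open-addressing table. The sketch below is not libxml2's namespace hash table: the names `table_insert`, `table_remove`, and the fixed `TABLE_SIZE` are assumptions made for illustration, and resizing is omitted. It shows only the idea of remembering the first deleted bucket on the probe path and reusing it, so that repeated insert/remove cycles do not pile up tombstones and lengthen every later probe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8u   /* power of two; fixed size for this sketch */

enum slot_state { SLOT_EMPTY = 0, SLOT_DELETED, SLOT_USED };

struct slot {
    enum slot_state state;
    const char *key;    /* borrowed pointer, not owned by the table */
};

struct table {
    struct slot slots[TABLE_SIZE];
    size_t used;        /* number of live entries */
    size_t deleted;     /* number of tombstones */
};

static void table_init(struct table *t) {
    memset(t, 0, sizeof(*t));   /* all slots start as SLOT_EMPTY */
}

static unsigned hash_str(const char *s) {
    unsigned h = 5381u;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h;
}

/* Returns the slot index holding key, or -1 if the key is absent. */
static int table_find(const struct table *t, const char *key) {
    unsigned i = hash_str(key) & (TABLE_SIZE - 1);
    for (unsigned n = 0; n < TABLE_SIZE; n++) {
        const struct slot *s = &t->slots[i];
        if (s->state == SLOT_EMPTY)
            return -1;
        if (s->state == SLOT_USED && strcmp(s->key, key) == 0)
            return (int)i;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return -1;
}

/* Inserts key and returns its slot index, or -1 if the table is full.
 * The first tombstone seen on the probe path is reused. */
static int table_insert(struct table *t, const char *key) {
    assert(t != NULL && key != NULL);
    unsigned i = hash_str(key) & (TABLE_SIZE - 1);
    int reuse = -1;
    for (unsigned n = 0; n < TABLE_SIZE; n++) {
        struct slot *s = &t->slots[i];
        if (s->state == SLOT_EMPTY) {
            if (reuse < 0)
                reuse = (int)i;
            break;
        }
        if (s->state == SLOT_DELETED) {
            if (reuse < 0)
                reuse = (int)i;
        } else if (strcmp(s->key, key) == 0) {
            return (int)i;      /* already present */
        }
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    if (reuse < 0)
        return -1;
    if (t->slots[reuse].state == SLOT_DELETED)
        t->deleted--;
    t->slots[reuse].state = SLOT_USED;
    t->slots[reuse].key = key;
    t->used++;
    return reuse;
}

/* Marks key's slot as deleted; returns 1 if the key was present. */
static int table_remove(struct table *t, const char *key) {
    int i = table_find(t, key);
    if (i < 0)
        return 0;
    t->slots[i].state = SLOT_DELETED;
    t->slots[i].key = NULL;
    t->used--;
    t->deleted++;
    return 1;
}
```

With this policy, inserting and removing the same key many times keeps landing in the same bucket. A table that only ever fills empty buckets would instead consume a fresh bucket per cycle until probes degrade or a resize is forced.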
Nick Wellnhofer
a7e26707be parser: Don't overwrite OOM errors in xmlSBuf 2024-06-03 14:04:44 +02:00
Nick Wellnhofer
e75e878e02 doc: Update and fix documentation 2024-05-20 14:23:39 +02:00
Nick Wellnhofer
4fefba4cf6 parser: Rework handling of undeclared entities
Throw an error if entity substitution was requested.

Now we only downgrade to a warning if

- XML_PARSE_DTDLOAD wasn't specified, and
- entities aren't substituted or XML_PARSE_NO_XXE was specified.

Should fix #724.
2024-05-15 17:58:48 +02:00
Nick Wellnhofer
4ff2dccf9f SAX2: Warn if URI resolution failed 2024-05-13 12:50:08 +02:00
Nick Wellnhofer
4fe116ebd3 parser: Don't report error on invalid URI
Only fragment identifiers are an error.

This removes the last user of xmlErrMsg*. Now every error reported by
the parser should result in one of ctxt->wellFormed, ctxt->nsWellFormed
or ctxt->valid being set to zero.
2024-05-13 12:50:08 +02:00
Nick Wellnhofer
a4c2b7233f io: Don't set close callback in xmlParserInputBufferCreateFd 2024-05-05 17:27:12 +02:00
Nick Wellnhofer
fdc5ff3657 parser: Always throw entity errors if external DTD is loaded
When parsing with XML_PARSE_DTDLOAD, missing entities are always an
error.

Also consolidate behavior when validating. See b717abdd.
2024-05-03 11:52:54 +02:00
Nick Wellnhofer
39e5b35bd0 parser: Don't create undeclared entity refs in substitution mode
We never want to create entity reference nodes if entity substitution
is enabled. This also applies to undeclared entities.
2024-05-03 11:46:01 +02:00
Nick Wellnhofer
1cdfece12b memory: Remove memory debugging
This is useless compared to sanitizers or valgrind and has a
considerable performance impact if enabled accidentally.
2024-04-28 20:42:55 +02:00
Nick Wellnhofer
45fe9924f0 parser: Don't create reference in xmlLookupGeneralEntity
This should only be done in xmlParseReference.

The handling of undeclared entities is still somewhat inconsistent. In
element content we create references even if entity substitution is
enabled. In attribute values undeclared entities are always ignored.
2024-04-23 18:36:15 +02:00
Nick Wellnhofer
b717abdd09 parser: Consolidate error handling for undeclared entities
Always use XML_WAR_UNDECLARED_ENTITY with warning error level in
documents with external subset or parameter entities. Use
XML_ERR_UNDECLARED_ENTITY otherwise.
2024-04-23 18:36:15 +02:00
Nick Wellnhofer
f506ec6654 parser: Always decode entities in namespace URIs
Also decode entities in namespace URIs if entity substitution wasn't
requested. This should fix some corner cases when comparing namespace
URIs. The Namespaces in XML 1.0 spec says:

> In a namespace declaration, the URI reference is the normalized value
> of the attribute, so replacement of XML character and entity
> references has already been done before any comparison.

Make the serialization code escape special characters in namespace URIs
like in attribute values. This fixes serialization if entities were
substituted when parsing.

Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
2024-04-15 12:34:26 +02:00
Nick Wellnhofer
2840e33c5e tree: Allocate XML namespace statically 2024-03-15 19:47:07 +01:00
Nick Wellnhofer
186562a182 parser: Fix detection of duplicate attributes in XML namespace
Fixes a regression from commit e0dd330b, resulting in duplicate
attributes in the predefined XML namespace not being detected or
extraneous default attributes being passed.

Fixes #704.
2024-03-12 20:02:52 +01:00
Nick Wellnhofer
4d774612f3 parser: Fix column number in attribute values
Short-lived regression from 37c6618b.
2024-02-13 12:00:02 +01:00
Nick Wellnhofer
95f2a17440 parser: Fix crash in xmlParseInNodeContext with HTML documents
Ignore namespaces if we have an HTML document with namespaces added
manually.

Fixes #672.
2024-01-30 13:35:41 +01:00
Nick Wellnhofer
6dc2fdb2bd parser: Account for full size of non-well-formed entities
Account for the full size of the entity if parsing stops because of
errors. In our cost model, we have to assume that the entity loader
processes the whole entity regardless of its content.
2024-01-10 15:58:23 +01:00
Nick Wellnhofer
29beef653c parser: Pop inputs if parsing DTD failed
This should provide some statistics in ctxt->sizeentcopy even in the
error or recovery case.
2024-01-10 15:58:23 +01:00
Nick Wellnhofer
02a2038de4 parser: Handle NOCDATA properly when expanding entities
Short-lived regression from e1153832.
2024-01-10 14:17:49 +01:00
Nick Wellnhofer
e1153832b0 parser: Fix quadratic behavior when copying entities
Process the first and last text node with the SAX handler to make the
text merging optimization kick in.

Fixes #657.
2024-01-07 15:42:39 +01:00
Nick Wellnhofer
f237e5b934 parser: Avoid duplicate namespace errors
Don't report an extra attribute uniqueness error if a namespace is
undeclared. This matches old behavior.
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
02cc5c3609 parser: Add XML_PARSE_NO_XXE parser option 2024-01-05 20:39:40 +01:00
Nick Wellnhofer
12f0bb9478 parser: Synchronize more options 2024-01-05 20:39:40 +01:00
Nick Wellnhofer
3efbe916a1 parser: Mark 'token' member as unused in xmlParserCtxt 2024-01-05 20:39:40 +01:00
Nick Wellnhofer
b82fd81d06 parser: Rework xmlCtxtParseDocument
Make xmlCtxtParseDocument take a parser input which can be popped after
parsing.
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
d7d300ba04 parser: Remove remnants of runtime debugging feature
Apparently, this feature was removed long ago.

Fixes #651.
2024-01-04 17:50:11 +01:00
Nick Wellnhofer
8c5848bdd5 parser: Make xmlParseContent more useful
This is an internal function which isn't really usable without some
hacks. See WebKit/Chromium trying to recreate the effects of
xmlDetectSAX2 manually, for example.

Make xmlParseContent perform late initialization and check whether the
content was fully parsed.

Also rename xmlDetectSAX2 and document why it's needed.
2024-01-04 17:45:03 +01:00
Nick Wellnhofer
a7356dfecc parser: Clear invalid entity content
This was removed in earlier commits, but we really want to make sure
that entity content is syntactically valid.
2024-01-04 15:28:57 +01:00
Nick Wellnhofer
30d839776a fuzz: Disable catalogs
The catalogs API doesn't report OOM errors. It's basically impossible
to use it safely in its current form.
2024-01-04 15:18:14 +01:00
Nick Wellnhofer
85f99023ae parser: Fix buffer size checks
Don't test size of remaining data. This causes false positives with
memory buffers.

Also impose XML_MAX_HUGE_LENGTH limit when parsing with XML_PARSE_HUGE.
2024-01-02 19:48:23 +01:00
Nick Wellnhofer
e8fb3d639f parser: Convert some "internal errors" to meaningful codes 2024-01-02 19:48:23 +01:00
Nick Wellnhofer
5cb4b05c57 parser: Lower maximum entity nesting depth
Limit entity nesting depth to 20 or 40 with XML_PARSE_HUGE.

Change error code to XML_ERR_RESOURCE_LIMIT.
2024-01-02 19:48:23 +01:00
Nick Wellnhofer
a2cc7f5f04 parser: Set depth limit to 2048 with XML_PARSE_HUGE
Deeply nested documents can cause performance problems, so the nesting
depth should always be limited to a reasonable value.

Also remove the global xmlParserMaxDepth setting which isn't thread-safe
and seems unused.
2024-01-02 19:42:06 +01:00
Nick Wellnhofer
875bb08489 parser: Implement xmlCtxtSetOptions
Surprisingly, some options can only be enabled with xmlCtxtUseOptions
and it's impossible to unset them. Add a new API function
xmlCtxtSetOptions which sets or clears all options.

Finally document all parser options.

Make sure to synchronize option bits and struct members.
2024-01-02 19:42:06 +01:00
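
The difference between an additive "use options" call and a "set options" call that replaces the whole mask can be modelled in a few lines. This sketch does not use libxml2's types or option values: `struct ctx`, the `OPT_*` bits, and the `ctx_*` functions are hypothetical, and it simplifies by treating every option as additive in the "use" variant, whereas the commit message says only some options behaved that way. It also shows the bitmask and the mirrored struct members being kept in sync.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option bits; not libxml2's XML_PARSE_* values. */
enum {
    OPT_RECOVER = 1 << 0,
    OPT_NOENT   = 1 << 1,
    OPT_DTDLOAD = 1 << 2
};

struct ctx {
    int options;           /* option bitmask */
    int recovery;          /* member mirroring OPT_RECOVER */
    int replace_entities;  /* member mirroring OPT_NOENT */
    int load_subset;       /* member mirroring OPT_DTDLOAD */
};

/* Derive the mirrored members from the bitmask so the two never disagree. */
static void ctx_sync_members(struct ctx *c) {
    c->recovery         = (c->options & OPT_RECOVER) != 0;
    c->replace_entities = (c->options & OPT_NOENT) != 0;
    c->load_subset      = (c->options & OPT_DTDLOAD) != 0;
}

/* Additive: can turn bits on but never off. */
static void ctx_use_options(struct ctx *c, int options) {
    assert(c != NULL);
    c->options |= options;
    ctx_sync_members(c);
}

/* Replaces the whole mask: bits absent from `options` are cleared. */
static void ctx_set_options(struct ctx *c, int options) {
    assert(c != NULL);
    c->options = options;
    ctx_sync_members(c);
}
```

In this model, calling `ctx_use_options` twice accumulates bits, while a single `ctx_set_options` call leaves exactly the requested bits set and resets the mirrored members accordingly.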
Nick Wellnhofer
33ec407a73 parser: Always prefer option members over bitmask
If an option has an extra member in xmlParserCtxt, it takes precedence
over the value from the options bitmask. Fix a few places where this was
ignored.
2024-01-02 17:58:53 +01:00
Nick Wellnhofer
22fd571f3c parser: Don't modify SAX2 handler if XML_PARSE_SAX1 is set
It's a bad idea to modify members of the SAX handler struct for option
state management. Ideally, ctxt->options should be the preferred source
of truth.
2024-01-02 16:42:23 +01:00
Nick Wellnhofer
37c6618be5 parser: Rework parsing of attribute and entity values
Don't use a separate function to handle "complex" attributes. Validate
UTF-8 byte sequences without decoding. This should improve performance
considerably when parsing multi-byte UTF-8 sequences.

Use a string buffer to avoid unnecessary allocations and copying when
expanding entities.

Normalize attribute values in a single pass while expanding entities.

Be more lenient in recovery mode.

If no entity substitution was requested, validate entities without
expanding. Fixes #596.

Also fixes #655.
2024-01-02 15:42:03 +01:00
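
Validating UTF-8 "without decoding" means checking byte ranges directly instead of assembling each code point first. The sketch below is a generic validator written for illustration, not the code from this commit; `utf8_seq_len` and `utf8_is_valid` are hypothetical names. It checks UTF-8 well-formedness only (no overlong forms, no surrogates, nothing above U+10FFFF) and does not apply XML's additional restrictions on allowed characters.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length (1 to 4) of the well-formed UTF-8 sequence starting
 * at s, or 0 if the sequence is invalid or truncated. Only byte ranges
 * are compared; no code point is computed. */
static size_t utf8_seq_len(const unsigned char *s, size_t avail) {
    assert(s != NULL);
    if (avail == 0)
        return 0;
    unsigned char c = s[0];
    if (c < 0x80)
        return 1;
    if (c < 0xC2)
        return 0;   /* stray continuation byte or overlong 2-byte lead */
    if (c < 0xE0) {
        if (avail < 2 || (s[1] & 0xC0) != 0x80)
            return 0;
        return 2;
    }
    if (c < 0xF0) {
        if (avail < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        if (c == 0xE0 && s[1] < 0xA0)
            return 0;   /* overlong 3-byte form */
        if (c == 0xED && s[1] >= 0xA0)
            return 0;   /* UTF-16 surrogate range */
        return 3;
    }
    if (c < 0xF5) {
        if (avail < 4 || (s[1] & 0xC0) != 0x80 ||
            (s[2] & 0xC0) != 0x80 || (s[3] & 0xC0) != 0x80)
            return 0;
        if (c == 0xF0 && s[1] < 0x90)
            return 0;   /* overlong 4-byte form */
        if (c == 0xF4 && s[1] >= 0x90)
            return 0;   /* above U+10FFFF */
        return 4;
    }
    return 0;           /* 0xF5..0xFF never appear in UTF-8 */
}

/* Returns 1 if the whole buffer is well-formed UTF-8, 0 otherwise. */
static int utf8_is_valid(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        size_t n = utf8_seq_len(s + i, len - i);
        if (n == 0)
            return 0;
        i += n;
    }
    return 1;
}
```

Because the common case is a handful of comparisons per sequence, a scanner built this way can skip over multi-byte text and copy it in bulk, which is one plausible source of the speedup the commit message mentions.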