mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-26 12:25:09 +03:00
631 Commits

Author SHA1 Message Date
Nick Wellnhofer
9678163f54 html: Don't check for valid XML characters 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
4eeac30944 html: Start to fix EOF and U+0000 handling 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
17da54c522 html: Normalize newlines 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
3adb396d87 html: Parse bogus comments instead of ignoring them
Also treat XML processing instructions as bogus comments.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
e1834745e0 html: Add character data tests 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
f9ed30e972 html: HTML5 character data states 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
5951179239 html: Parse named character references according to HTML5 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
a80f8b64a9 html: Allow attributes in end tags
Attributes are syntactically allowed in HTML5 end tags but otherwise
ignored.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
dcb2abb2fe html: Parse tag and attribute names according to HTML5
HTML5 allows basically all characters in tag and attribute names.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
bd9eed4694 parser: Make unsupported encodings an error in declarations
This was changed in 45157261, but in encoding declarations, unsupported
encodings should raise a fatal error.

Fixes #794.
2024-09-02 19:29:39 +02:00
Nick Wellnhofer
8ae06d5223 SAX2: Don't merge CDATA sections
The Document Object Model (DOM) Level 3 Core Specification says:

> Adjacent CDATASection nodes are not merged by use of the normalize
> method of the Node interface.

Fixes #412.
2024-08-29 01:31:19 +02:00
Nick Wellnhofer
322e733b84 xinclude: Fix fallback for text includes
Fixes #772.
2024-07-18 19:32:23 +02:00
Nick Wellnhofer
842a044831 valid: Restore ID lookup
Revert a change from d025cfbb and don't overwrite ID table entries, so
that the first attribute will be returned if there are duplicate IDs.

This requires two other changes:

- Attributes in entity content are never added to the ID table. This
  seems reasonable.

- Remove the optimization to skip ID lookup when copying and the target
  document has an empty ID table. This also seems more correct since the
  document could have ID declarations nevertheless or we could be
  copying xml:ids into the document for the first time.

Fixes #757.
2024-07-03 11:46:06 +02:00
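
The "first attribute wins" behavior described in 842a044831 can be illustrated with a minimal sketch. It assumes a tiny fixed-size array in place of libxml2's real hash table; `IdTable`, `idTableAdd` and `idTableLookup` are hypothetical names used for illustration only, not libxml2 API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified ID table (NOT libxml2's implementation):
 * a small fixed-size array instead of a hash table. */
#define ID_TABLE_MAX 16

typedef struct {
    const char *id;    /* ID value */
    const void *attr;  /* attribute that declared the ID (non-NULL) */
} IdEntry;

typedef struct {
    IdEntry entries[ID_TABLE_MAX];
    size_t count;
} IdTable;

/* Return the attribute registered for `id`, or NULL if there is none. */
static const void *
idTableLookup(const IdTable *table, const char *id) {
    size_t i;

    for (i = 0; i < table->count; i++) {
        if (strcmp(table->entries[i].id, id) == 0)
            return table->entries[i].attr;
    }
    return NULL;
}

/* Add an ID without overwriting an existing entry.
 * Returns 1 if added, 0 if the ID already existed (the first entry is
 * kept), and -1 if the table is full. */
static int
idTableAdd(IdTable *table, const char *id, const void *attr) {
    if (idTableLookup(table, id) != NULL)
        return 0;
    if (table->count >= ID_TABLE_MAX)
        return -1;
    table->entries[table->count].id = id;
    table->entries[table->count].attr = attr;
    table->count++;
    return 1;
}
```

With this policy, a second `idTableAdd` call for the same ID leaves the table unchanged, so a later lookup still returns the first attribute.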
Nick Wellnhofer
30be984a0f encoding: Rework ISO-8859-X conversion
Optimize code. Pass tables as context parameter. Check for
XML_ENC_ERR_SPACE.
2024-07-01 18:05:40 +02:00
Nick Wellnhofer
7c11da2d98 tests: Clarify licence of test/intsubset2.xml 2024-06-27 12:49:06 +02:00
Nick Wellnhofer
b8903b9e0d runtest: Remove result handling from schemasOneTest
We only care about errors.
2024-06-22 21:59:03 +02:00
Nick Wellnhofer
e68ccfa988 tests: Port Schematron tests to C 2024-06-22 21:59:03 +02:00
Nick Wellnhofer
1dd5e76a69 xinclude: Don't remove root element
Don't replace include element at root with empty nodeset.
2024-06-18 20:12:03 +02:00
Nick Wellnhofer
52ce0d70f9 tests: Add XInclude test for issue #733 2024-06-17 17:35:12 +02:00
Nick Wellnhofer
2608baaf92 parser: Make failure to load main document a warning
Revert the change that made failures to load the main document an error.

This fixes the --path option of xmllint and xsltproc.

Should fix #733.
2024-06-14 20:06:07 +02:00
Nick Wellnhofer
669bd34993 xpointer: Remove support for XPointer locations
The latest spec for what is essentially an XPath extension seems to be
this working draft from 2002:

    https://www.w3.org/TR/xptr-xpointer/

The xpointer() scheme is listed as "being reviewed" in the XPointer
registry since at least 2006. libxml2 seems to be the only modern
software that tries to implement this spec, but the code has many bugs
and quality issues.

If you configure --with-legacy, old symbols are retained for ABI
compatibility.
2024-06-12 18:20:01 +02:00
Nick Wellnhofer
4fefba4cf6 parser: Rework handling of undeclared entities
Throw an error if entity substitution was requested.

Now we only downgrade to a warning if

- XML_PARSE_DTDLOAD wasn't specified, and
- entities aren't substituted or XML_PARSE_NO_XXE was specified.

Should fix #724.
2024-05-15 17:58:48 +02:00
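
The downgrade rule listed in 4fefba4cf6 can be read as a small predicate. The sketch below assumes only the two conditions stated in that message and ignores everything else the parser may take into account; the `OPT_*` constants are hypothetical stand-ins for XML_PARSE_NOENT, XML_PARSE_DTDLOAD and XML_PARSE_NO_XXE, and `undeclaredEntityIsWarning` is not a libxml2 function.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the parser options named in the commit
 * message. The values are illustrative, not libxml2's. */
enum {
    OPT_NOENT   = 1 << 0,  /* entity substitution requested */
    OPT_DTDLOAD = 1 << 1,  /* load the external DTD subset */
    OPT_NO_XXE  = 1 << 2   /* disable loading of external entities */
};

/* Return true if an undeclared entity is only a warning, false if it
 * is an error, following the two conditions from the commit message:
 *   - DTDLOAD wasn't specified, and
 *   - entities aren't substituted or NO_XXE was specified. */
static bool
undeclaredEntityIsWarning(int options) {
    if (options & OPT_DTDLOAD)
        return false;
    return !(options & OPT_NOENT) || (options & OPT_NO_XXE);
}
```

Under these assumptions, requesting entity substitution alone yields an error, while adding the NO_XXE stand-in downgrades it to a warning again.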
Nick Wellnhofer
fdc5ff3657 parser: Always throw entity errors if external DTD is loaded
When parsing with XML_PARSE_DTDLOAD, missing entities are always an
error.

Also consolidate behavior when validating. See b717abdd.
2024-05-03 11:52:54 +02:00
Nick Wellnhofer
39e5b35bd0 parser: Don't create undeclared entity refs in substitution mode
We never want to create entity reference nodes if entity substitution
is enabled. This also applies to undeclared entities.
2024-05-03 11:46:01 +02:00
Nick Wellnhofer
45fe9924f0 parser: Don't create reference in xmlLookupGeneralEntity
This should only be done in xmlParseReference.

The handling of undeclared entities is still somewhat inconsistent. In
element content we create references even if entity substitution is
enabled. In attribute values undeclared entities are always ignored.
2024-04-23 18:36:15 +02:00
Nick Wellnhofer
b717abdd09 parser: Consolidate error handling for undeclared entities
Always use XML_WAR_UNDECLARED_ENTITY with warning error level in
documents with external subset or parameter entities. Use
XML_ERR_UNDECLARED_ENTITY otherwise.
2024-04-23 18:36:15 +02:00
Nick Wellnhofer
f506ec6654 parser: Always decode entities in namespace URIs
Also decode entities in namespace URIs if entity substitution wasn't
requested. This should fix some corner cases when comparing namespace
URIs. The Namespaces in XML 1.0 spec says:

> In a namespace declaration, the URI reference is the normalized value
> of the attribute, so replacement of XML character and entity
> references has already been done before any comparison.

Make the serialization code escape special characters in namespace URIs
like in attribute values. This fixes serialization if entities were
substituted when parsing.

Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
2024-04-15 12:34:26 +02:00
Seiya Nakata
5bb84b47b8 relaxng: Fix tree corruption in xmlRelaxNGParseNameClass
Don't create cycles in tree structure. This will lead to an infinite
loop or call stack overflow later.

Closes: https://gitlab.gnome.org/GNOME/libxml2/-/issues/711
2024-04-05 13:45:06 +02:00
Nick Wellnhofer
f43197fca7 tree: Don't coalesce text nodes in xmlAdd{Prev,Next}Sibling
Commit 9e1c72da from 2001 introduced a bug where xmlAddPrevSibling and
xmlAddNextSibling would only try to merge text nodes with one of its
new siblings. Commit 4ccd3eb8 fixed this bug but unfortunately, lxml
and possibly other downstream code depend on text nodes not being
merged.

To avoid breaking downstream code while still having somewhat
consistent API behavior, it's probably best to make these functions
never coalesce text nodes.
2024-03-29 14:21:11 +01:00
Nick Wellnhofer
4ccd3eb80f tree: Refactor node insertion
Also fixes a text coalescing bug.
2024-03-15 19:54:26 +01:00
Nick Wellnhofer
186562a182 parser: Fix detection of duplicate attributes in XML namespace
Fixes a regression from commit e0dd330b, resulting in duplicate
attributes in the predefined XML namespace not being detected or
extraneous default attributes being passed.

Fixes #704.
2024-03-12 20:02:52 +01:00
Nick Wellnhofer
63986c45b9 parser: Report fatal error if document entity couldn't be loaded
Only lower error level when loading entities.

Fixes #667.
2024-01-22 21:07:41 +01:00
Nick Wellnhofer
29beef653c parser: Pop inputs if parsing DTD failed
This should provide some statistics in ctxt->sizeentcopy even in the
error or recovery case.
2024-01-10 15:58:23 +01:00
Nick Wellnhofer
f237e5b934 parser: Avoid duplicate namespace errors
Don't report an extra attribute uniqueness error if a namespace is
undeclared. This matches old behavior.
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
07c05546fa error: Make xmlFormatError public
This is a useful function to get a verbose error report.

This allows removing duplicated code from runtest.c. Also reactivate the
check for schema parser failures.
2024-01-04 15:41:43 +01:00
Nick Wellnhofer
d0eb5a7e54 parser: Remove xmlErrEncodingInt
Convert the last user to xmlFatalErr.
2024-01-04 15:28:57 +01:00
Nick Wellnhofer
e8fb3d639f parser: Convert some "internal errors" to meaningful codes 2024-01-02 19:48:23 +01:00
Nick Wellnhofer
37c6618be5 parser: Rework parsing of attribute and entity values
Don't use a separate function to handle "complex" attributes. Validate
UTF-8 byte sequences without decoding. This should improve performance
considerably when parsing multi-byte UTF-8 sequences.

Use a string buffer to avoid unnecessary allocations and copying when
expanding entities.

Normalize attribute values in a single pass while expanding entities.

Be more lenient in recovery mode.

If no entity substitution was requested, validate entities without
expanding. Fixes #596.

Also fixes #655.
2024-01-02 15:42:03 +01:00
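
As an illustration of "validate UTF-8 byte sequences without decoding" from 37c6618be5, here is a minimal sketch that checks a sequence by inspecting byte ranges only, never assembling a code point. It is not the code from the commit: `utf8SeqLen` is a hypothetical helper, and it checks UTF-8 well-formedness only, not whether the character is allowed in XML.

```c
#include <stddef.h>

/* Return the length (1 to 4) of the well-formed UTF-8 sequence at the
 * start of buf, or -1 if the sequence is invalid or truncated.
 * Only byte ranges are compared; no code point is computed. */
static int
utf8SeqLen(const unsigned char *buf, size_t avail) {
    unsigned char c;

    if (avail == 0)
        return -1;
    c = buf[0];

    if (c < 0x80)
        return 1;               /* ASCII */
    if (c < 0xC2)
        return -1;              /* stray continuation or overlong lead */

    if (c < 0xE0) {             /* two-byte sequence */
        if (avail < 2 || (buf[1] & 0xC0) != 0x80)
            return -1;
        return 2;
    }

    if (c < 0xF0) {             /* three-byte sequence */
        if (avail < 3 ||
            (buf[1] & 0xC0) != 0x80 || (buf[2] & 0xC0) != 0x80)
            return -1;
        if (c == 0xE0 && buf[1] < 0xA0)
            return -1;          /* overlong encoding */
        if (c == 0xED && buf[1] >= 0xA0)
            return -1;          /* UTF-16 surrogate range */
        return 3;
    }

    if (c < 0xF5) {             /* four-byte sequence */
        if (avail < 4 ||
            (buf[1] & 0xC0) != 0x80 || (buf[2] & 0xC0) != 0x80 ||
            (buf[3] & 0xC0) != 0x80)
            return -1;
        if (c == 0xF0 && buf[1] < 0x90)
            return -1;          /* overlong encoding */
        if (c == 0xF4 && buf[1] >= 0x90)
            return -1;          /* above U+10FFFF */
        return 4;
    }

    return -1;                  /* 0xF5..0xFF never appear in UTF-8 */
}
```

A caller can advance through a buffer by the returned length and copy the bytes unchanged, which is where skipping the decode step would save work.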
Nick Wellnhofer
f0dc52d09c parser: Move cleanup of element stacks to xmlParseContent 2024-01-02 14:17:27 +01:00
Nick Wellnhofer
d025cfbb4b parser: Always copy content from entity to target.
Make sure that references from IDs are updated.

Note that if there are IDs with the same value in a document, the last
one will now be returned. IDs should be unique, but maybe this should be
addressed.
2023-12-29 01:22:11 +01:00
Nick Wellnhofer
4ecc85d2cb parser: Push general entity input streams on the stack
This allows the error handler to give more context.
2023-12-29 01:20:08 +01:00
Nick Wellnhofer
d944a41515 parser: Fix in-parameter-entity and in-external-dtd checks
Use ctxt->input->entity instead of ctxt->inputNr to determine whether
we are inside a parameter entity.

Stop using ctxt->external to check whether we're in an external DTD.
This is signaled by ctxt->inSubset == 2.
2023-12-29 01:19:56 +01:00
Nick Wellnhofer
b8313b589f xpath: Rewrite substring-before and substring-after
Don't use buffers. Check malloc failures.
2023-12-28 16:47:45 +01:00
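
The approach named in b8313b589f (no intermediate buffers, explicit malloc checks) might look roughly like the sketch below. It assumes plain NUL-terminated C strings and follows XPath 1.0 semantics for substring-before and substring-after; `substringBefore` and `substringAfter` are hypothetical helpers, not the functions in libxml2's xpath.c.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of the part of str before the first
 * occurrence of sep, or an empty string if sep does not occur.
 * Returns NULL if malloc fails. The caller frees the result. */
static char *
substringBefore(const char *str, const char *sep) {
    const char *pos = strstr(str, sep);
    size_t len = (pos != NULL) ? (size_t) (pos - str) : 0;
    char *ret = malloc(len + 1);

    if (ret == NULL)
        return NULL;            /* report allocation failure */
    memcpy(ret, str, len);
    ret[len] = '\0';
    return ret;
}

/* Return a newly allocated copy of the part of str after the first
 * occurrence of sep, or an empty string if sep does not occur.
 * Returns NULL if malloc fails. The caller frees the result. */
static char *
substringAfter(const char *str, const char *sep) {
    const char *pos = strstr(str, sep);
    const char *start = (pos != NULL) ? pos + strlen(sep) : "";
    size_t len = strlen(start);
    char *ret = malloc(len + 1);

    if (ret == NULL)
        return NULL;            /* report allocation failure */
    memcpy(ret, start, len + 1);    /* includes the terminating NUL */
    return ret;
}
```

Each function computes the result length first and allocates exactly once, so the only failure path is the single malloc check.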
Nick Wellnhofer
f3fa34dcad parser: Fix general entity parsing
Clear namespace database.

Ignore non-fatal errors.
2023-12-28 16:47:41 +01:00
Nick Wellnhofer
ecfbcc8a52 parser: Rework general entity parsing
Don't create a new parser context but reuse the existing one.

This exposes bug #601 in a more obvious way.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
6e3a2ac660 xinclude: Rework xml:base fixup
The xml:base fixup was broken in more complex cases.

Also avoid parsing and building the included URI multiple times.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
f0df3e6d00 tests: Try to fix RelaxNG test cases
These were added recently in ea695ac0 and 8074b881 but were a total mess
of symbolic links and apparently mixed up files.

Symbolic links don't work on Windows.

Try to salvage one of the tests.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
8d0aaf4b95 parser: Remove xmlErrEncoding
Use xmlFatalErr or xmlCtxtErrIO.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
7e511f35f1 io: Pass error codes from xmlFileOpenReal to xmlNewInputFromFile
This makes it possible to report to the parser context why opening a
file failed and to improve error messages. Now we can also remove the
stat call before opening a file.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
83c6aeef49 relaxng: Improve error handling
Pass RelaxNG structured error handler to XML parser.

Handle malloc failure from xmlRaiseError.

Remove argument from memory error handler.

Use xmlRaiseMemoryError.

Don't use xmlGenericError.

Remove TODO macro.
2023-12-21 15:01:42 +01:00