mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-26 12:25:09 +03:00
Commit Graph

7012 Commits

Author SHA1 Message Date
Nick Wellnhofer
9f04cce695 html: Remove unused or useless return codes
htmlParseStartTag should always succeed (except for malloc failures).
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
e179f3ec0e html: Stop reporting syntax errors
It doesn't make much sense to keep the old syntax error handling which
doesn't conform to HTML5.

Handling HTML5 parser errors is rather involved and not essential for
parsers.
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
c6af101728 html: Test tokenizer against html5lib test suite 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
27752f75ca html: Fix EOF handling in start tags 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
b19d353970 html: Fix EOF handling in comments 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
17e56ac54a html: Fix parsing of end tags 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
24a09033c9 html: Fix bogus end tags 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
bca6485476 html: Allow U+000C FORM FEED as whitespace 2024-10-06 18:13:05 +02:00
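For illustration, the HTML standard's definition of ASCII whitespace covers five code points: TAB, LF, FF, CR and SPACE. A minimal sketch of such a check follows; `is_html_whitespace` is a hypothetical name and not a libxml2 function.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of libxml2.
 * Returns true for the five code points the HTML standard
 * treats as ASCII whitespace.
 */
bool
is_html_whitespace(int c) {
    return c == 0x09 ||   /* CHARACTER TABULATION */
           c == 0x0A ||   /* LINE FEED */
           c == 0x0C ||   /* FORM FEED */
           c == 0x0D ||   /* CARRIAGE RETURN */
           c == 0x20;     /* SPACE */
}
```

Note that U+000B LINE TABULATION is not in the set, so `is_html_whitespace(0x0B)` is false.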
Nick Wellnhofer
6edf1a645e html: Fix DOCTYPE parsing 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
9678163f54 html: Don't check for valid XML characters 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
a6955c13c7 html: Parse numeric character references according to HTML5 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
4eeac30944 html: Start to fix EOF and U+0000 handling 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
e062a4a9b3 html: Add HTML5 parser option
This option passes tokenizer output directly to the SAX callbacks,
making it possible to test the tokenizer against the html5lib test
suite.

This will produce unbalanced calls to the startElement and endElement
callbacks, but it's the only way to support a SAX-like interface for
HTML5. It can be used for filtering or rewriting HTML5, for example.

An HTML5 tree builder could then be implemented on top of the SAX
callbacks.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
17da54c522 html: Normalize newlines 2024-10-06 18:13:05 +02:00
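As a rough sketch of what newline normalization means in HTML5 input preprocessing: each CR LF pair becomes a single LF, and any remaining lone CR becomes LF. The function below is hypothetical (the commit's actual code is not shown in this log) and assumes the whole input is available in one buffer.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxml2.
 * Normalizes newlines in place and returns the new length:
 * CR LF -> LF, lone CR -> LF.
 */
size_t
normalize_newlines(char *buf, size_t len) {
    size_t in = 0, out = 0;

    while (in < len) {
        char c = buf[in++];

        if (c == '\r') {
            /* Swallow the LF of a CR LF pair. */
            if (in < len && buf[in] == '\n')
                in++;
            c = '\n';
        }
        buf[out++] = c;
    }

    return out;
}
```

A chunked (push) parser would additionally have to remember a CR that ends one chunk, since the matching LF may arrive at the start of the next one. This sketch does not handle that case.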
Nick Wellnhofer
341dc78f24 html: Deduplicate code in htmlCurrentChar 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
3adb396d87 html: Parse bogus comments instead of ignoring them
Also treat XML processing instructions as bogus comments.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
8444017578 html: Add missing calls to htmlCheckParagraph() 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
86d6b9b051 html: Deduplicate some code 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
0d324bde36 html: Simplify node info accounting 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
ccb61f599e html: Remove duplicate calls to htmlAutoClose 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
e1834745e0 html: Add character data tests 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
f9ed30e972 html: HTML5 character data states 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
5951179239 html: Parse named character references according to HTML5 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
d5cd0f07f8 html: Prefer SKIP(1) over NEXT in HTML parser
Use SKIP(1) where it's safe to avoid a function call.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
dc2d498318 html: Rework htmlLookupSequence
Rename to htmlLookupString and use strstr for increased performance.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
637215a4de html: Always terminate doctype declarations on '>'
Align with HTML5 spec. This allows removing the old quote handling in
htmlLookupSequence.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
72e29f9a3d html: Fix quadratic behavior in push parser
Fix quadratic behavior related to unquoted attribute values. We really
have to replicate parts of the HTML5 state machine to find the end of
tags reliably.

Fixes #533.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
a80f8b64a9 html: Allow attributes in end tags
Attributes are syntactically allowed in HTML5 end tags but otherwise
ignored.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
f2272c231b html: Handle unexpected-solidus-in-tag according to HTML5 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
939b53ee12 html: Stop skipping tag content
Tag and attribute names should always be parsed successfully now.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
dcb2abb2fe html: Parse tag and attribute names according to HTML5
HTML5 allows basically all characters in tag and attribute names.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
d67833a3c5 xmllint: Use proper type to store seconds since epoch
Should avoid year 2038 problem.

Fixes #801.
2024-09-26 19:34:34 +02:00
correctmost
81d38ed069 meson: Fix duplicate listing of libxml2.devhelp2
The duplication caused a warning when uninstalling.
2024-09-25 07:52:10 -04:00
Nick Wellnhofer
b1c5aa6544 xpath: Deprecate xmlXPathNAN and xmlXPath*INF
Users should simply use the C99 macros.
2024-09-19 12:50:59 +02:00
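For illustration, the C99 facilities referred to here are the `NAN` and `INFINITY` macros and the `isnan`/`isinf` classification macros from `<math.h>`. A minimal sketch, using a hypothetical helper `classify_number` that is not part of libxml2:

```c
#include <math.h>

/*
 * Hypothetical helper, not part of libxml2.
 * Classifies a double using only C99 <math.h> macros:
 *   0  -> NaN
 *   1  -> positive infinity
 *  -1  -> negative infinity
 *   2  -> any finite value
 */
int
classify_number(double d) {
    if (isnan(d))
        return 0;
    if (isinf(d))
        return d > 0 ? 1 : -1;
    return 2;
}
```

Calling code can then write `NAN`, `INFINITY` and `-INFINITY` directly where it would otherwise have referenced the deprecated library variables.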
Nick Wellnhofer
55ddccb645 io: Make sure not to pass partial UTF-8 to write callback
We cannot split UTF-8 at arbitrary boundaries.
2024-09-14 00:05:13 +02:00
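One way to avoid handing a truncated multi-byte sequence to a callback is to trim the chunk back to the last complete character and carry the remainder over to the next write. The sketch below shows only the boundary calculation; `utf8_complete_prefix` is a hypothetical name, the commit's real implementation is not shown in this log, and the input is assumed to be valid UTF-8 apart from a possible truncation at the end.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxml2.
 * Returns the largest n <= len such that buf[0..n) does not end
 * in the middle of a multi-byte UTF-8 sequence.
 */
size_t
utf8_complete_prefix(const unsigned char *buf, size_t len) {
    size_t start = len;
    size_t need;
    unsigned char lead;

    if (len == 0)
        return 0;

    /* Walk back to the lead byte of the final sequence,
     * looking at no more than the last four bytes. */
    while (start > 0 && len - start < 4) {
        start--;
        if ((buf[start] & 0xC0) != 0x80)
            break;              /* not a continuation byte */
    }

    lead = buf[start];
    if (lead < 0x80)
        need = 1;               /* ASCII */
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return len;             /* malformed; left to the caller */

    /* Keep the final sequence only if all of its bytes are present. */
    return (len - start >= need) ? len : start;
}
```

The bytes from the returned offset up to `len` would be kept in a small pending buffer and prepended to the next chunk.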
Nick Wellnhofer
c46b89e243 xpath: Deprecate xmlXPathEvalExpr
Also check the argument instead of crashing if there's no context.
2024-09-13 21:06:36 +02:00
Nick Wellnhofer
03f1bdd260 xpath: Make recursion check work with xmlXPathCompile
The check for maximum recursion depth required a parser context with an
xmlXPathContext which xmlXPathCompile didn't provide.

All other code should already set up or require an xmlXPathContext.
2024-09-13 20:59:47 +02:00
Nick Wellnhofer
dae160c64b encoding: Fix table entry for "UTF16" 2024-09-13 12:08:20 +02:00
Nick Wellnhofer
5e7874015e save: Make xmlEscapeTab signed
Fixes issues on platforms where char is unsigned.

Fixes #797.
2024-09-10 17:50:08 +02:00
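As background, whether plain `char` is signed is implementation-defined in C, and it is unsigned on common ARM and PowerPC ABIs. The log does not show the contents of xmlEscapeTab, so the sketch below only illustrates one plausible way such a bug can arise: a lookup table with negative sentinel entries. The table and function names are hypothetical.

```c
/*
 * Hypothetical table, NOT libxml2's actual xmlEscapeTab.
 * Assumption: -1 marks entries that need no escaping, and
 * non-negative values are offsets into some replacement data.
 *
 * Declared as plain `char`, the -1 entries would read back as 255
 * on platforms where char is unsigned, so the `>= 0` test below
 * would always succeed. `signed char` makes the behavior portable.
 */
static const signed char escape_tab[4] = { -1, 0, 5, -1 };

int
needs_escape(unsigned idx) {
    return idx < 4 && escape_tab[idx] >= 0;
}
```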
Nick Wellnhofer
6e503eb742 encoding: Handle more ICU error codes
U_ILLEGAL_ESCAPE_SEQUENCE and U_UNSUPPORTED_ESCAPE_SEQUENCE can occur
with ISO-2022.
2024-09-10 03:34:46 +02:00
Nick Wellnhofer
55d36c5990 encoding: Fix error code in xmlUconvConvert
Broke in 46ec621e.
2024-09-10 03:11:18 +02:00
Nick Wellnhofer
de10d4cd5f include: Check whether _MSC_VER is defined
Should fix #795.
2024-09-04 16:32:22 +02:00
Nick Wellnhofer
bd9eed4694 parser: Make unsupported encodings an error in declarations
This was changed in 45157261, but in encoding declarations, unsupported
encodings should raise a fatal error.

Fixes #794.
2024-09-02 19:29:39 +02:00
Nick Wellnhofer
40abebbc73 python: Fix SAX driver with character streams
This apparently broke with Python 3.5 which introduced character
streams.

Fixes #790.
2024-08-29 01:31:26 +02:00
Nick Wellnhofer
8ae06d5223 SAX2: Don't merge CDATA sections
The Document Object Model (DOM) Level 3 Core Specification says:

> Adjacent CDATASection nodes are not merged by use of the normalize
> method of the Node interface.

Fixes #412.
2024-08-29 01:31:19 +02:00
Nick Wellnhofer
dde62ae5d5 parser: Align push parsing of CDATA sections with pull parser
Remove special handling of CDATA sections in push parser. This makes
sure that only a single callback is generated for large sections.

Fixes #22 and needed for #412.
2024-08-29 01:28:49 +02:00
Nick Wellnhofer
4d10e53af1 parser: Make sure to set and increment input id
Revert part of commits 410931e3 and b9d2f3c9.
2024-08-28 22:47:20 +02:00
Nick Wellnhofer
6d365ca02c doc: XML_PARSE_NO_XXE is available since 2.13.0 2024-08-28 22:09:30 +02:00
Nick Wellnhofer
8ad618d2d6 doc: Document all xmllint options
Remove --pushsmall.

Fixes #785.
2024-08-28 22:03:30 +02:00
triallax
67ff748c3e io: don't set the executable bit when creating files
Issue seems to have been introduced in 0bef93bf24.
2024-08-26 23:53:29 +01:00