mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-26 03:55:04 +03:00
Commit Graph

7043 Commits

Author SHA1 Message Date
Nick Wellnhofer
b52a3044aa parser: Use counted_by attribute if supported
We only have a single struct with a flexible array member.
2024-10-24 18:18:47 +02:00
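
As an illustration of what this commit subject refers to, here is a minimal sketch of guarding the `counted_by` attribute behind a feature check. The macro `EXAMPLE_COUNTED_BY`, the struct `exampleIntVec` and the helper `exampleIntVecNew` are hypothetical names made up for this sketch; they are not the struct or macro used in libxml2.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Expand to the attribute only if the compiler reports support for it. */
#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define EXAMPLE_COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef EXAMPLE_COUNTED_BY
#  define EXAMPLE_COUNTED_BY(member) /* unsupported: expands to nothing */
#endif

/* Hypothetical struct with a flexible array member (not from libxml2). */
typedef struct {
    size_t len;
    int items[] EXAMPLE_COUNTED_BY(len);
} exampleIntVec;

/* Allocate a zero-filled vector with room for len items. */
static exampleIntVec *
exampleIntVecNew(size_t len) {
    exampleIntVec *vec;
    size_t i;

    /* Reject sizes that would overflow the allocation size. */
    if (len > (SIZE_MAX - sizeof(*vec)) / sizeof(vec->items[0]))
        return NULL;
    vec = malloc(sizeof(*vec) + len * sizeof(vec->items[0]));
    if (vec == NULL)
        return NULL;
    vec->len = len; /* set the count before touching items */
    for (i = 0; i < len; i++)
        vec->items[i] = 0;
    return vec;
}
```

With a supporting compiler, the annotation lets bounds-checking instrumentation relate `items` to `len`; elsewhere the macro disappears and the code compiles unchanged.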
Nick Wellnhofer
944e5fe8df nanohttp: Fix another stdout file descriptor 2024-10-23 16:46:03 +02:00
Nick Wellnhofer
607ada90b8 nanohttp: Fix stdout file descriptor
Fixes #813.
2024-10-23 14:19:01 +02:00
Nick Wellnhofer
b7c0f9d2dd string: Fix va_copy fallback
Fix va_copy fallback reworked in 5cffba83.

Should fix #812.
2024-10-19 14:53:25 +02:00
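
For readers unfamiliar with the term, a `va_copy` fallback usually looks something like the sketch below. This is a common pattern written under the assumption that the platform offers either `va_copy`, `__va_copy`, or a plainly copyable `va_list`; it is not the code from this commit, and the helper `exampleFormatTwice` is hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Only define a fallback if the platform does not provide va_copy. */
#ifndef va_copy
#  ifdef __va_copy
#    define va_copy(dest, src) __va_copy(dest, src)
#  else
#    define va_copy(dest, src) memcpy(&(dest), &(src), sizeof(va_list))
#  endif
#endif

/*
 * Hypothetical helper: format the same arguments into two buffers.
 * A va_list may only be traversed once, so the second call needs a copy.
 */
static int
exampleFormatTwice(char *a, char *b, size_t size, const char *fmt, ...) {
    va_list ap, copy;
    int ret;

    va_start(ap, fmt);
    va_copy(copy, ap);
    ret = vsnprintf(a, size, fmt, ap);
    if (ret >= 0)
        ret = vsnprintf(b, size, fmt, copy);
    va_end(copy);
    va_end(ap);
    return ret;
}
```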
Nick Wellnhofer
a870088f94 xpath: Hide internal sort functions 2024-10-19 14:53:25 +02:00
Yegor Yefremov
513949293d python/tests: fix typos
Typos were found with codespell.
2024-10-15 11:11:38 +02:00
Nick Wellnhofer
f9a6469a47 Update NEWS 2024-10-14 16:15:11 +02:00
Satadru Pramanik
c7b2786676 Avoid Python 'licence' distribution option is deprecated; use 'license' error 2024-10-12 11:55:50 +00:00
Nick Wellnhofer
bf3619c328 fuzz: Don't unlink DTD when replacing nodes
OP_XML_REPLACE_NODE needs the same check as OP_XML_UNLINK_NODE.
2024-10-10 12:14:47 +02:00
Nick Wellnhofer
a4c16a140c xmllint: Improve --memory and --testIO options
Support --memory and --testIO in SAX mode.

Keep memory-mapped file across repetitions.

Options `--sax --memory --noout --repeat` can now be used to benchmark
the core parser without building a DOM tree or repeatedly reading files
from disk.
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
3ac214f01e xmllint: Support --html --sax 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
225ed70737 html: Accelerate htmlParseCharData 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
74dfc49b5f parser: Clarify logic in xmlParseStartTag2 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
207999793f html: Handle numeric character references directly 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
0bc4608c50 html: Use hash table to check for duplicate attributes 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
24a6149fc4 html: Make sure that character data mode is reset 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
c32397d51f html: Improve character class macros 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
e840655414 html: Rewrite parsing of most data 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
f77ec16db0 html: Optimize htmlParseCharData 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
440bd64c69 html: Optimize htmlParseHTMLName 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
c34d0ae9cc html: Deprecate htmlIsBooleanAttr 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
6040785ac4 html: Deprecate AutoClose API 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
188cad68a4 html: Remove obsolete content model 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
0144f662d7 html: Remove obsolete code 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
0ce7bfe559 html: Try to avoid passing XML options to HTML parser 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
76cc63942a test: Fix XML_PARSE_HTML constant 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
575be6c1f1 html: Fix line numbers with CRs 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
be874d7831 html: Ignore unexpected DOCTYPE declarations 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
462bf0b7a5 html: Rework options
Introduce htmlCtxtSetOptions, see similar changes made to XML parser.

Add HTML_PARSE_HUGE alias. Support HTML_PARSE_BIG_LINES.
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
16de1346eb parser: Make new options actually work 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
42c3823df0 html: Update comment 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
9f04cce695 html: Remove unused or useless return codes
htmlParseStartTag should always succeed (except for malloc failures).
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
e179f3ec0e html: Stop reporting syntax errors
It doesn't make much sense to keep the old syntax error handling which
doesn't conform to HTML5.

Handling HTML5 parser errors is rather involved and not essential for
parsers.
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
c6af101728 html: Test tokenizer against html5lib test suite 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
27752f75ca html: Fix EOF handling in start tags 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
b19d353970 html: Fix EOF handling in comments 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
17e56ac54a html: Fix parsing of end tags 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
24a09033c9 html: Fix bogus end tags 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
bca6485476 html: Allow U+000C FORM FEED as whitespace 2024-10-06 18:13:05 +02:00
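
For context, the HTML standard counts U+000C FORM FEED as ASCII whitespace, whereas XML whitespace is limited to space, tab, CR and LF. A minimal sketch of the two character classes follows; the function names are hypothetical and are not the macros used in libxml2.

```c
/* ASCII whitespace per the HTML standard: TAB, LF, FF, CR, SPACE. */
static int
exampleIsHtmlWhitespace(int c) {
    return c == 0x09 || c == 0x0A || c == 0x0C || c == 0x0D || c == 0x20;
}

/* Whitespace per XML 1.0: SPACE, TAB, CR, LF (no form feed). */
static int
exampleIsXmlWhitespace(int c) {
    return c == 0x20 || c == 0x09 || c == 0x0D || c == 0x0A;
}
```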
Nick Wellnhofer
6edf1a645e html: Fix DOCTYPE parsing 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
9678163f54 html: Don't check for valid XML characters 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
a6955c13c7 html: Parse numeric character references according to HTML5 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
4eeac30944 html: Start to fix EOF and U+0000 handling 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
e062a4a9b3 html: Add HTML5 parser option
This option passes tokenizer output directly to the SAX callbacks,
making it possible to test the tokenizer against the html5lib test
suite.

This will produce unbalanced calls to the startElement and endElement
callbacks, but it's the only way to support a SAX like interface for
HTML5. It can be used for filtering or rewriting HTML5, for example.

A HTML5 tree builder could then be implemented on top of the SAX
callbacks.
2024-10-06 18:13:05 +02:00
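
To illustrate what a consumer of unbalanced startElement and endElement calls has to deal with, here is a rough sketch of a handler that keeps its own stack of open elements. It is a toy, not libxml2 API and far simpler than a real HTML5 tree builder; the type `exampleBalancer` and both functions are hypothetical.

```c
#include <string.h>

#define EXAMPLE_MAX_DEPTH 64

/* Hypothetical consumer state: names of the currently open elements.
 * The stack stores the pointers it is given, so the name strings must
 * outlive the balancer. */
typedef struct {
    const char *open[EXAMPLE_MAX_DEPTH];
    int depth;
    int strayEndTags;
} exampleBalancer;

/* Called for every start tag token; pushes the element name. */
static void
exampleStartElement(exampleBalancer *b, const char *name) {
    if (b->depth < EXAMPLE_MAX_DEPTH)
        b->open[b->depth++] = name;
}

/* Called for every end tag token, which may not match the innermost
 * open element or any open element at all. */
static void
exampleEndElement(exampleBalancer *b, const char *name) {
    int i;

    /* Find the innermost open element with this name. */
    for (i = b->depth - 1; i >= 0; i--) {
        if (strcmp(b->open[i], name) == 0) {
            b->depth = i; /* also closes everything opened after it */
            return;
        }
    }
    b->strayEndTags++; /* no matching start tag: count and ignore */
}
```

For an event sequence such as start `p`, start `b`, end `i`, end `p`, the end of `i` is counted as stray and the end of `p` implicitly closes `b` as well.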
Nick Wellnhofer
17da54c522 html: Normalize newlines 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
341dc78f24 html: Deduplicate code in htmlCurrentChar 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
3adb396d87 html: Parse bogus comments instead of ignoring them
Also treat XML processing instructions as bogus comments.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
8444017578 html: Add missing calls to htmlCheckParagraph() 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
86d6b9b051 html: Deduplicate some code 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
0d324bde36 html: Simplify node info accounting 2024-10-06 18:13:05 +02:00