mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-02-21 17:57:22 +03:00

parser: Document that XML_PARSE_NOBLANKS is broken

Long text content can generate multiple "characters" callbacks which can
lead to NOBLANKS removing whitespace in non-whitespace text nodes. So
the NOBLANKS option doesn't even work reliably with the pull parser.
This would be extremely hard to fix.

Unfortunately, `xmllint --format` relies on this option which is another
reason why this feature never really worked.

Author: Nick Wellnhofer
Date: 2025-01-31 14:55:29 +01:00
Parent: 40e423d6c2
Commit: 7a8722f557
2 changed files with 12 additions and 5 deletions


@@ -283,6 +283,10 @@
 environment variable controls the indentation. The default value is two
 spaces "  ").
 </para>
+<para>
+Especially in the absence of a DTD, this feature has never worked reliably
+and is fundamentally broken.
+</para>
 </listitem>
 </varlistentry>


@@ -4914,6 +4914,11 @@ get_more_space:
 (ctxt->disableSAX == 0) &&
 (ctxt->sax->ignorableWhitespace !=
 ctxt->sax->characters)) {
+/*
+ * Calling areBlanks with only parts of a text node
+ * is fundamentally broken, making the NOBLANKS option
+ * essentially unusable.
+ */
 if (areBlanks(ctxt, tmp, nbchar, 1)) {
 if (ctxt->sax->ignorableWhitespace != NULL)
 ctxt->sax->ignorableWhitespace(ctxt->userData,
@@ -13715,11 +13720,9 @@ xmlCtxtSetOptionsInternal(xmlParserCtxtPtr ctxt, int options, int keepMask)
  *
  * XML_PARSE_NOBLANKS
  *
- * Remove some text nodes containing only whitespace from the
- * result document. Which nodes are removed depends on DTD
- * element declarations or a conservative heuristic. The
- * reindenting feature of the serialization code relies on this
- * option to be set when parsing. Use of this option is
+ * Remove some whitespace from the result document. Where to
+ * remove whitespace depends on DTD element declarations or a
+ * broken heuristic with unfixable bugs. Use of this option is
  * DISCOURAGED.
  *
  * Not supported by the push parser.