1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-01-29 21:46:59 +03:00

26 Commits

Author SHA1 Message Date
Nick Wellnhofer
76d6b0d768 html: Don't escape ASCII chars in href attributes
In several cases, href attributes can contain ASCII characters which are
illegal in URIs. Escaping them often does more harm than good.

Fixes #321.
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
93ce33c2b8 Fix several quadratic runtime issues in HTML push parser
Fix a few remaining cases where the HTML push parser would scan more
content during lookahead than being parsed later.

Make sure that htmlParseDocTypeDecl consumes all content up to the
final '>' in case of errors. The old comment said "We shouldn't try to
resynchronize", but ignoring invalid content is also what the HTML5
spec mandates.

Likewise, make htmlParseEndTag skip to the final '>' in invalid end
tags even if not in recovery mode. This is probably the most visible
change in practice and leads to different output for some tests but is
also more in line with HTML5.

Make sure that htmlParsePI and htmlParseComment don't abort if invalid
characters are encountered but log an error and ignore the character.

Change some other end-of-buffer checks to test for a zero byte instead
of relying on IS_CHAR.

Fix usage of IS_CHAR macro in htmlParseScript.
2020-07-23 20:47:35 +02:00
Daniel Veillard
f933c89813 Keep non-significant blanks node in HTML parser
For https://bugzilla.gnome.org/show_bug.cgi?id=681822

Regardless if the option HTML_PARSE_NOBLANKS is set or not, blank nodes
are removed from a HTML document, for example:

<html>
  <head>
    <title>This is a test.</title>
  </head>
  <body>
    <p>This is a test.</p>
  </body>
</html>

is read as:

<html><head><title>This is a test.</title></head><body>
    <p>This is a test.</p>
  </body></html>

This changes the default behaviour but the old behaviour is available
as expected when using the parser flag HTML_PARSE_NOBLANKS

Based on original patch from Igor Ignatyuk <igor_ignatiouk@hotmail.com>

* HTMLparser.c: change various places in the parser where ignorable_space
  SAX callback was called without checking for the parser flag preference
* xmllint.c: make sure we use the new flag even for HTML parsing
* result/HTML/*: this modifies the output of a number of tests
2012-09-07 19:32:12 +08:00
Daniel Veillard
36d73403ff Applied the last patch from Gary Coady for #304637 changing the behaviour
* HTMLparser.c: Applied the last patch from Gary Coady for #304637
  changing the behaviour when text nodes are found in body
* result/HTML/*: this changes the output of some tests
Daniel
2005-09-01 09:52:30 +00:00
Daniel Veillard
42fd412637 change --html to make sure we use the HTML serialization rule by default
* xmllint.c: change --html to make sure we use the HTML serialization
  rule by default when HTML parser is used, add --xmlout to allow to
  force the XML serializer on HTML.
* HTMLtree.c: ugly tweak to fix the output on <p> element and
  solve #125093
* result/HTML/*: this changes the output of some tests
Daniel
2003-11-04 08:47:48 +00:00
Daniel Veillard
4b1577f14a removing the SAXresults tree, keeping result in the same tree, added
* Makefile.am results/*.sax SAXResult/*: removing the SAXresults
  tree, keeping result in the same tree, added SAXtests to the
  default "make tests"
Daniel
2003-09-03 13:10:37 +00:00
Daniel Veillard
8265a18a6a do not generate &quot; for " outside of attributes this changes the output
* entities.c: do not generate &quot; for " outside of attributes
* result//*: this changes the output of some tests
Daniel
2003-06-13 10:05:56 +00:00
Daniel Veillard
ef0b450163 fixed some problems related to #75813 about handling of Result Value Trees
* xpath.c: fixed some problems related to #75813 about handling
  of Result Value Trees
Daniel
2003-03-24 13:57:34 +00:00
Daniel Veillard
8c9872ca2e trying to fix 87235 about discarded white spaces in the HTML parser. this
* HTMLparser.c: trying to fix 87235 about discarded white
  spaces in the HTML parser.
* result/HTML/*: this changes the output of a number of HTML
  regression tests
Daniel
2002-07-05 18:17:10 +00:00
Daniel Veillard
6231e84559 fixed & serialization bug introduced in 2.4.20 this changes a few things
* HTMLtree.c: fixed & serialization bug introduced in 2.4.20
* result/HTML/*: this changes a few things in the results
Daniel
2002-04-18 11:54:04 +00:00
Daniel Veillard
eb475a37df fixing bug #78662 i.e. add proper escaping of URI when saving HTML files.
* HTMLtree.c uri.c: fixing bug #78662 i.e. add proper
  escaping of URI when saving HTML files.
* result/HTML/*: this impacted some tests
Daniel
2002-04-14 22:00:22 +00:00
Daniel Veillard
02bb170a8b - HTMLparser.[ch] HTMLtree.c: stored the inline/block property
of element and use it to avoid outputting formatting spaces at
  the wrong place. Implemented the format parameter for HTML save.
- result/HTML/doc2.htm result/HTML/doc3.htm result/HTML/fp40.htm
  result/HTML/script.html result/HTML/test2.html result/HTML/test3.html
  result/HTML/wired.html: of course this impact the result of a
  number of HTML tests
Daniel
2001-06-13 21:11:59 +00:00
Daniel Veillard
0a2a163d2e - HTMLparser.c: Patch from Jonas Borgstrm
(htmlGetEndPriority): New function, returns
the priority of a certain element.
(htmlAutoCloseOnClose): Only close inline elements if they
all have lower or equal priority.
- result/HTML: this of course changed a number of tests results.
Daniel
2001-05-11 14:18:03 +00:00
Daniel Veillard
56098d4f35 - HTMLparser.c : HTML parsing still sucks ... trying to deal
with madness
- result/HTML/ : this modified the result of the regression tests
  a lot.
Daniel
2001-04-24 12:51:09 +00:00
Daniel Veillard
a3bfca59bf parsing real HTML is a nightmare.
- HTMLparser.c result/HTML/*: revamped the way the HTML
  parser handles end of tags or end of input
Daniel
2001-04-12 15:42:58 +00:00
Daniel Veillard
760f4426f7 Couple of fixes, getting ready for 2.3.1:
- configure.in: applied patch from Daniel van Balen for OpenBSD
  and bumped version to 2.3.1
- HTMLtree.c result/HTML/doc3.htm result/HTML/wired.html: the
  attempt to find autoclosing was simply broken, removed it,
  updated the examples, this is better
Daniel
2001-02-15 14:59:48 +00:00
Daniel Veillard
f41fbbf6a9 testing and bug fixing related to XSLT:
- xpath.c result/XPath/tests/chaptersprefol: bugfixes on order and
  on predicate
- HTMLparser.[ch] HTMLtree.c result/HTML/doc3.htm.err
  result/HTML/doc3.htm.sax result/HTML/wired.html: sometimes one
  really want to have tags closed on output even if we accept
  unclosed ones on input
Daniel
2001-02-13 17:05:35 +00:00
Daniel Veillard
126f27992d Bunch of fixes, finishing moving datastructures to the hash stuff:
- hash.[ch] debugXML.c: expanded/enhanced the API, added
  multikey tuples, made hash structure opaque
- valid.[ch]: moved elements, attributes, notations decalarations
  as well as ID and refs to hash tables.
- entities.c: hash cleanup
- xmlmemory.c: fixed a dump problem in debug mode
- include/Makefile.am: problem passing in DESTDIR= values patch
  from Marc Christensen <marc@calderasystems.com>
- nanohttp.c: removed debugging remains
- HTMLparser.c: the bogus tag should be ignored (Wayne)
- HTMLparser.c parser.c: fixing a number of problems with the
  macros in the *parser.c files (Wayne).
- HTMLparser.c: close the previous option when opening a new one
  (Marc Sanfacon).
- result/HTML/*: updated the HTML results accordingly
Daniel
2000-10-24 17:10:12 +00:00
Daniel Veillard
970112a914 Stupid bug fix on the HTML parser:
- HTMLparser.c: Doohhh, attribute name parsing was still case
  sensitive ! Fixed this ...
- result/HTML/* : updated the tests results accordingly
Daniel
2000-10-03 09:33:21 +00:00
Daniel Veillard
32bc74ef98 - doc/encoding.html doc/xml.html: added I18N doc
- encoding.[ch] HTMLtree.[ch] parser.c HTMLparser.c: I18N encoding
  improvements, both parser and filters, added ASCII & HTML,
  fixed the ISO-Latin-1 one
- xmllint.c testHTML.c: added/made visible --encode
- debugXML.c : cleanup
- most .c files: applied patches due to warning on Windows and
  when using Sun Pro cc compiler
- xpath.c : cleanup memleaks
- nanoftp.c : added a TESTING preprocessor flag for standalong
  compile so that people can report bugs more easilly
- nanohttp.c : ditched socklen_t which was a portability mess
  and replaced it with unsigned int.
- tree.[ch]: added xmlHasProp()
- TODO: updated
- test/ : added more test for entities, NS, encoding, HTML, wap
- configure.in: preparing for 2.2.0 release
Daniel
2000-07-14 14:49:25 +00:00
Daniel Veillard
be803967db - Large resync between W3C and Gnome tree
- configure.in: 2.1.0 prerelease
- example/Makefile.am example/gjobread.c tree.h: work on
  libxml1 libxml2 convergence.
- nanoftp, nanohttp.c: fixed stalled connections probs
- HTMLtree.c SAX.c : support for attribute without values in
  HTML for andersca
- valid.c: Fixed most validation + namespace problems
- HTMLparser.c: start document callback for andersca
- debugXML.c xpath.c: lots of XPath fixups from Picdar Technology
- parser.h, SAX.c: serious speed improvement for large
  CDATA blocks
- encoding.[ch] xmlIO.[ch]: Improved seriously saving to
  different encoding
- config.h.in parser.c xmllint.c: added xmlCheckVersion()
  and the LIBXML_TEST_VERSION macro
Daniel
2000-06-28 23:40:59 +00:00
Daniel Veillard
71b656e067 - added xmlRemoveID() and xmlRemoveRef()
- added check and handling when possibly removing an ID
- fixed some entities problems
- added xmlParseTryOrFinish()
- changed the way struct aredeclared to allow gtk-doc to expose those
- closed #4960
- fixes to libs detection from Albert Chin-A-Young
- preparing 1.8.3 release
Daniel
2000-01-05 14:46:17 +00:00
Daniel Veillard
5cb5ab8d94 - release 1.8.2 - HTML handling improvement - new tree handling functions
- release 1.8.2
- HTML handling improvement
- new tree handling functions
- default namespace on attribute bug fixed
- libxml use for C++ fixed (for good this time !)
Daniel
1999-12-21 15:35:29 +00:00
Daniel Veillard
4a53eca27c - Updated HTML test outputs
- Fixed taht f....g problem with C++ and includes,
Daniel
1999-12-12 13:03:50 +00:00
Daniel Veillard
af78a0e1b9 Large commit of changes done while travelling to XML'99
- cleanups on memory use and parsers
- start of Link interfaces HTML and XLink
- rebuild the doc
- released as 1.8.0
Daniel
1999-12-12 13:03:50 +00:00
Daniel Veillard
3500838f65 BUG FIXED #2784 HTML parsing/output improvements Rebuilt, updated the docs
BUG FIXED #2784
HTML parsing/output improvements
Rebuilt, updated the docs
Improvement of regression scripts, make testall should look clean
Released as 1.7.4
1999-10-25 13:15:52 +00:00