IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
In commit 21ca8829, we started to ignore namespaces in HTML element
names but we still called xmlSplitQName, effectively stripping the
namespace prefix. This would cause elements like <o:p> being parsed
as <p>. Now we leave the name untouched.
Fixes#508.
Commit 4fd69f3e fixed handling of '<' characters not followed by an
ASCII letter. But a '<!' sequence followed by invalid characters should
be treated as bogus comment and skipped.
Fixes#380.
Fix a few remaining cases where the HTML push parser would scan more
content during lookahead than being parsed later.
Make sure that htmlParseDocTypeDecl consumes all content up to the
final '>' in case of errors. The old comment said "We shouldn't try to
resynchronize", but ignoring invalid content is also what the HTML5
spec mandates.
Likewise, make htmlParseEndTag skip to the final '>' in invalid end
tags even if not in recovery mode. This is probably the most visible
change in practice and leads to different output for some tests but is
also more in line with HTML5.
Make sure that htmlParsePI and htmlParseComment don't abort if invalid
characters are encountered but log an error and ignore the character.
Change some other end-of-buffer checks to test for a zero byte instead
of relying on IS_CHAR.
Fix usage of IS_CHAR macro in htmlParseScript.
Commit eeb99329 removed an important optimization avoiding quadratic
runtime when repeatedly scanning the input buffer for terminating
characters in the HTML push parser. The related bug is
https://bugzilla.gnome.org/show_bug.cgi?id=444994
Make sure that ctxt->checkIndex is always written and store additional
parser state in ctxt->inSubset which is unused in the HTML parser.
Found by OSS-Fuzz.
test/HTML/758518-entity.html exposed a bug in pushParseTest() in
runtest.c which assumed that an input file was at least 4 bytes long.
That test case is only 3 bytes, so we now take the minimum of 4 bytes
or the length of the test input. We also now use 'chunkSize' in place
of the hard-coded value '1024' later in the function.
For https://bugzilla.gnome.org/show_bug.cgi?id=758606
* parserInternals.c:
(xmlNextChar): Add an test to catch other issues on ctxt->input
corruption proactively.
For non-UTF-8 charsets, xmlNextChar() failed to check for the end
of the input buffer and would continuing reading. Fix this by
pulling out the check for the end of the input buffer into common
code, and return if we reach the end of the input buffer
prematurely.
* result/HTML/758606.html: Added.
* result/HTML/758606.html.err: Added.
* result/HTML/758606.html.sax: Added.
* result/HTML/758606_2.html: Added.
* result/HTML/758606_2.html.err: Added.
* result/HTML/758606_2.html.sax: Added.
* test/HTML/758606.html: Added test case.
* test/HTML/758606_2.html: Added test case.
From https://bugzilla.gnome.org/show_bug.cgi?id=758518
Happens when a file has a name getting parsed, but no valid encoding
set, so libxml has to guess what the encoding is. This patch detects
when the buffer location changes, and if it does, restarts the parsing
of the name.
This slightly change a couple of regression tests output
Reviewed by David Kilzer.
* HTMLparser.c:
(htmlParseName): Add bounds check.
(htmlParseNameComplex): Ditto.
* result/HTML/758605.html: Added.
* result/HTML/758605.html.err: Added.
* result/HTML/758605.html.sax: Added.
* runtest.c:
(pushParseTest): The input for the new test case was so small
(4 bytes) that htmlParseChunk() was never called after
htmlCreatePushParserCtxt(), thereby creating a false positive
test failure. Fixed by using a do-while loop so we always call
htmlParseChunk() at least once.
* test/HTML/758605.html: Added.
For https://bugzilla.gnome.org/show_bug.cgi?id=681822
Regardless if the option HTML_PARSE_NOBLANKS is set or not, blank nodes
are removed from a HTML document, for example:
<html>
<head>
<title>This is a test.</title>
</head>
<body>
<p>This is a test.</p>
</body>
</html>
is read as:
<html><head><title>This is a test.</title></head><body>
<p>This is a test.</p>
</body></html>
This changes the default behaviour but the old behaviour is available
as expected when using the parser flag HTML_PARSE_NOBLANKS
Based on original patch from Igor Ignatyuk <igor_ignatiouk@hotmail.com>
* HTMLparser.c: change various places in the parser where ignorable_space
SAX callback was called without checking for the parser flag preference
* xmllint.c: make sure we use the new flag even for HTML parsing
* result/HTML/*: this modifies the output of a number of tests
For https://bugzilla.gnome.org/show_bug.cgi?id=615785
When the <noscript> is found, <head> is closed and a <body> element is created.
The real <body id="xxx"> gets skipped over, so I can't see any of the
body's attributes.
Just don't close <head> when encountering a <noscript>
Add a regression test too
For https://bugzilla.gnome.org/show_bug.cgi?id=655218http://www.w3.org/TR/2011/WD-html5-20110525/semantics.html#the-meta-element
"""
The charset attribute specifies the character encoding used by the document.
This is a character encoding declaration. If the attribute is present in an XML
document, its value must be an ASCII case-insensitive match for the string
"UTF-8" (and the document is therefore forced to use UTF-8 as its
encoding).
"""
However, while <meta http-equiv="Content-Type" content="text/html;
charset=utf8"> works, <meta charset="utf8"> does not.
While libxml2 HTML parser is not tuned for HTML5, this is a simple
addition
Also added a testcase
* HTMLparser.c: don't default value of HTML boolean attributes in the
parser
* SAX2.c: move this to SAX2 tree building backend
* result/HTML/doc2.htm.sax result/HTML/doc3.htm.sax
result/HTML/wired.html.sax: this changes a few HTML SAX regression
tests
* HTMLparser.c: fix an HTML parsing error on large data sections
reported by Mike Day
* test/HTML/utf8bug.html result/HTML/utf8bug.html.err
result/HTML/utf8bug.html.sax result/HTML/utf8bug.html: add the
reproducer to the test suite
daniel
svn path=/trunk/; revision=3797
* HTMLparser.c: change the way script/style are parsed to
not try to detect comments, reported by Mike Day
* result/HTML/doc3.*: affects the result of that test
Daniel
svn path=/trunk/; revision=3598
* HTMLparser.c: fixing HTML minimized attribute values to be generated
internally if not present, fixes bug #332124
* result/HTML/doc2.htm.sax result/HTML/doc3.htm.sax
result/HTML/wired.html.sax: this affects the SAX event strem for
a few test cases
Daniel
* HTMLparser.c: fixing HTML entities in attributes parsing bug #362552
* result/HTML/entities2.html* test/HTML/entities2.html: added to
the regression suite
Daniel
* HTMLparser.c: script HTML parser error fix, corrects bug #319715
* result/HTML/53867* test/HTML/53867.html: added test from Michael Day
to the regression suite
Daniel
* HTMLparser.c: Applied the last patch from Gary Coady for #304637
changing the behaviour when text nodes are found in body
* result/HTML/*: this changes the output of some tests
Daniel
* HTMLtree.c: fixed bug #310333 with a patch close to the provided
patch for HTML UTF-8 serialization
* result/HTML/script2.html: this changed the output of that test
Daniel
* HTMLparser.c: applied UTF-8 script parsing bug #310229 fix from
Jiri Netolicky
* result/HTML/script2.html* test/HTML/script2.html: added the test
case from the regression suite
Daniel
* HTMLparser.c: applied patch from James Bursa fixing an html parsing
bug in push mode
* result/HTML/repeat.html* test/HTML/repeat.html: added the test to the
regression suite
Daniel
* xmlIO.c: fix to the fix for #141864 from Paul Elseth
* HTMLparser.c result/HTML/doc3.htm: apply fix from David Gatwood for
#141195 about text between comments.
Daniel
* xmllint.c: change --html to make sure we use the HTML serialization
rule by default when HTML parser is used, add --xmlout to allow to
force the XML serializer on HTML.
* HTMLtree.c: ugly tweak to fix the output on <p> element and
solve #125093
* result/HTML/*: this changes the output of some tests
Daniel
* HTMLparser.c: Fix#124907 by simply backporting the same
fix as for the XML parser
* result/HTML/doc3.htm.err: change to ID detecting modified one
test result.
Daniel
* HTMLparser.c: fixed to not send NULL to %s printing
* python/tests/error.py result/HTML/doc3.htm.err
result/HTML/test3.html.err result/HTML/wired.html.err
result/valid/t8.xml.err result/valid/t8a.xml.err: cleaning
up some of the regression tests error
Daniel
* HTMLparser.c Makefile.am legacy.c parser.c parserInternals.c
include/libxml/xmlerror.h: more code cleanup, especially around
error messages, the HTML parser has now been upgraded to the new
handling.
* result/HTML/*: a few changes in the resulting error messages
Daniel
* Makefile.am results/*.sax SAXResult/*: removing the SAXresults
tree, keeping result in the same tree, added SAXtests to the
default "make tests"
Daniel
* tree.c: fixing HTML attribute serialization bug #118763
applying a modified version of the patch from Bacek
* result/HTML/doc3.htm*: this modifies the output from one test
Daniel
* HTMLparser.c parser.c parserInternals.c: patch from
johan@evenhuis.nl for #107937 fixing some line counting
problems, and some other cleanups.
* result/HTML/: this result in some line number changes
Daniel
* HTMLparser.c: final touch at closing #87235 </p> end tags
need to be generated.
* result/HTML/cf_128.html result/HTML/test2.html result/HTML/test3.html:
this change slightly the output of a few tests
* doc/*: regenerated
Daniel
* HTMLparser.c: Mikhail Sogrine pointed out a bug in HTML
parsing, applied his patch
* result/HTML/attrents.html result/HTML/attrents.html.err
result/HTML/attrents.html.sax test/HTML/attrents.html:
added the test and result case provided by Mikhail Sogrine
Daniel
* HTMLparser.c: trying to fix 87235 about discarded white
spaces in the HTML parser.
* result/HTML/*: this changes the output of a number of HTML
regression tests
Daniel