mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-03-13 20:58:16 +03:00

4171 Commits

Author SHA1 Message Date
Conrad Irwin
7d553f834e Use buffers when constructing string node lists.
Hi Veillard and all,

Firstly, thanks for libxml: it's awesome!

I noticed recently that libxml was taking a surprisingly long time to perform some
operations (many minutes instead of milliseconds), and so I did some digging. It turns out
that the problem was caused by the realloc()ing done in xmlNodeAddContentLen() which can
be called many (many) times when assigning some content into a node.

For background, I'm dealing with XML that contains emails, these can have large
attachments (~6MB) which are base-64 encoded, line-wrapped at 78 chars, and each line ends
with &#13;. This means that xmlNodeAddContentLen() is being called about 200,000 times,
and so there are 200,000 reallocs of a 6MB string, which takes a while... (I put a synthetic
example of this at https://gist.github.com/2656940)

The attached patch works around that problem by using the existing buffer API to merge the
strings together before even creating the text node, this keeps the number of realloc()s
at a manageable level.

I'd love feedback on the patch, and am happy to fix problems with it, or explore other
solutions if you think that this is barking up the wrong tree :).

Thanks,

Conrad

P.S. Should I create a bug for this too?

------8<------

Before this change xmlStringGetNodeList would perform a realloc() of the
entire new content for every XML entity in the assigned text in order to
merge together adjacent text nodes. This had the effect of making
xmlSetNodeContent O(n^2), which led to unexpectedly bad performance on
inputs that contained a large number of XML entities.

After this change the memory management is done by the buffer API,
avoiding the need to continually re-measure and realloc() the string.

For my test data (6MB of 80 character lines, each ending with &#13;)
this takes the time to xmlSetNodeContent from about 500 seconds to
around 50ms. I have not profiled smaller cases, though I tried to
minimize the performance impact of my change by avoiding unnecessary
string copying.

Signed-off-by: Conrad Irwin <conrad.irwin@gmail.com>
2012-05-14 13:51:30 +08:00
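
The speedup described in the commit above comes from letting a buffer grow its allocation in large steps instead of calling realloc() once per appended piece. A minimal sketch of that idea in plain C follows. The strbuf type and the strbuf_add and join_lines helpers are hypothetical names made up for illustration; they are not libxml2 API, and the actual patch relies on libxml2's existing buffer API rather than on this code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, for illustration only (not libxml2's). */
typedef struct {
    char *data;      /* NUL-terminated contents, or NULL when empty */
    size_t len;      /* bytes used, excluding the trailing NUL */
    size_t cap;      /* bytes allocated */
    size_t reallocs; /* number of successful realloc() calls so far */
} strbuf;

/* Append n bytes of s, doubling the capacity whenever it runs out.
 * Returns 0 on success, -1 if the allocation fails. */
int strbuf_add(strbuf *b, const char *s, size_t n)
{
    assert(b != NULL && s != NULL);
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < b->len + n + 1)
            cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
        b->reallocs++;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Concatenate `count` copies of `line`, store the resulting length in
 * *out_len (if non-NULL) and return how many realloc() calls were needed. */
size_t join_lines(const char *line, size_t count, size_t *out_len)
{
    strbuf b = {0};
    size_t n = strlen(line);

    for (size_t i = 0; i < count; i++) {
        if (strbuf_add(&b, line, n) != 0)
            break; /* out of memory: stop early */
    }
    if (out_len != NULL)
        *out_len = b.len;

    size_t used = b.reallocs;
    free(b.data);
    return used;
}
```

With doubling, the number of realloc() calls grows with the logarithm of the final size rather than with the number of appended pieces, so the total amount of copying stays linear in the size of the content.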
Denis Pauk
a0cd075d94 HTML parser error with <noscript> in the <head>
For https://bugzilla.gnome.org/show_bug.cgi?id=615785
When the <noscript> is found, <head> is closed and a <body> element is created.
The real <body id="xxx"> gets skipped over, so I can't see any of the
body's attributes.
Just don't close <head> when encountering a <noscript>
Add a regression test too
2012-05-11 19:31:12 +08:00
Remi Gacogne
4609e6c980 XSD: optional element in complex type extension
For https://bugzilla.gnome.org/show_bug.cgi?id=609796
Libxml2 fails to validate an instance document against a schema if an element's
type is a complex extension of some base type with an optional child
element and that child element is not specified in the instance document.  For
example, suppose I have some complex type BaseType that is defined to have one
child element in a sequence group that has minOccurs set to 0
2012-05-11 15:31:05 +08:00
Daniel Veillard
39d027cdb7 Fix html serialization error and htmlSetMetaEncoding()
For https://bugzilla.gnome.org/show_bug.cgi?id=630682
The python tests were reporting errors, some of it was due to
a small change in case encoding, but the main one was about
htmlSetMetaEncoding(doc, NULL) being broken by not removing
the associated meta tag anymore
2012-05-11 12:38:23 +08:00
Daniel Veillard
2c437da7f0 Fix a wrong return value in previous patch 2012-05-11 12:08:15 +08:00
Daniel Veillard
ed35d3d7c3 Fix an uninitialized variable use
When compiled without SAX1 support
2012-05-11 10:52:27 +08:00
Brandon Slack
0c7109c81f Fix a compilation problem with --minimum
For https://bugzilla.gnome.org/show_bug.cgi?id=636750
Moved a #endif /* LIBXML_OUTPUT_ENABLED */ a few lines down
to avoid referencing an undefined variable
2012-05-11 10:50:59 +08:00
Daniel Veillard
399aaba14b Remove redundant and unguarded include of resolv.h
For https://bugzilla.gnome.org/show_bug.cgi?id=617053
This broke the build on Interix-6.0
2012-05-11 10:09:32 +08:00
Christian Dywan
040dcb5995 Remove git error message during configure
For https://bugzilla.gnome.org/show_bug.cgi?id=635531
If git is not installed but .git was found configure would emit an
error message
2012-05-10 22:55:07 +08:00
Patrick R. Gansterer
023206fc08 xmllint: Build fix for endTimer if !defined(HAVE_GETTIMEOFDAY)
For https://bugzilla.gnome.org/show_bug.cgi?id=638649
The code was broken!
2012-05-10 22:17:51 +08:00
John Hein
a4fe9b26d3 Remove a bashism in configure.in
Not portable, broke on old FreeBSD
2012-05-10 22:12:46 +08:00
Shaun McCance
4cf7325e1f xinclude with parse="text" does not use the entity loader
For https://bugzilla.gnome.org/show_bug.cgi?id=552479

The code for xinclude parse="text" was not using the registered
entity loader, defeating attempts to control loading of files.
2012-05-10 20:59:33 +08:00
Denis Pauk
fdf990c2ef Allow to parse 1 byte HTML files
For https://bugzilla.gnome.org/show_bug.cgi?id=605740

Files 1 byte long were not accepted by the HTML push parser
2012-05-10 20:40:49 +08:00
Patrick R. Gansterer
204f1f144c undef ERROR if already defined 2012-05-10 20:24:00 +08:00
Martin Schröder
b91111b475 Patch that fixes the skipping of the HTML_PARSE_NOIMPLIED flag
For https://bugzilla.gnome.org/show_bug.cgi?id=642916

I just noticed that the HTML_PARSE_NOIMPLIED flag that you can pass to the
HTML-Parser methods doesn't do anything. Its intended purpose is to stop the
HTML-parser from forcibly adding a pair of html/body tags if the stream does
not contain any.

This is highly useful when you don't need this level of strictness.
Unfortunately, specifying it doesn't work, because the option is not
copied into the parsing context.
2012-05-10 18:52:37 +08:00
Lin Yi-Li
24464be639 Avoid memory leak if xmlParserInputBufferCreateIO fails
For https://bugzilla.gnome.org/show_bug.cgi?id=643949

In case of error on an IO creation input the given context
is terminated with the given close function, except if the
error happened in xmlParserInputBufferCreateIO. This can
lead to a resource leak which is fixed by this patch.
2012-05-10 16:14:55 +08:00
Denis Pauk
868d92da89 Add HTML parser support for HTML5 meta charset encoding declaration
For https://bugzilla.gnome.org/show_bug.cgi?id=655218

http://www.w3.org/TR/2011/WD-html5-20110525/semantics.html#the-meta-element

"""
The charset attribute specifies the character encoding used by the document.
This is a character encoding declaration. If the attribute is present in an XML
document, its value must be an ASCII case-insensitive match for the string
"UTF-8" (and the document is therefore forced to use UTF-8 as its
encoding).
"""

However, while <meta http-equiv="Content-Type" content="text/html;
charset=utf8"> works, <meta charset="utf8"> does not.

While libxml2 HTML parser is not tuned for HTML5, this is a simple
addition

Also added a testcase
2012-05-10 15:34:57 +08:00
Michael Cronenworth
1eabc31401 Fix library problems with mingw-w64
For https://bugzilla.gnome.org/show_bug.cgi?id=663588
Fix a windows only issue when compiling the library with
MingW (64 bits) using Fedora cross-compiler chain.
Change the dllexport for data
2012-05-10 11:30:07 +08:00
Rob Richards
aa0be5f269 fix windows build.
ifdef addition from bug 666491 makes no sense
2012-05-09 12:42:51 -04:00
Sam Thursfield
115581ae2d prefer native threads on win32
For https://bugzilla.gnome.org/show_bug.cgi?id=665526

When building on Win32, configure the support to use native Windows
threads, since they are supported, unless pthreads are found
and explicitly asked for
2012-05-09 18:46:56 +08:00
Thomas Lemm
066c697772 Allow to compile with Visual Studio 2010
For https://bugzilla.gnome.org/show_bug.cgi?id=666491

This patch adds project files to compile and debug libxml2 using Visual
Studio 2010. Only a few minor changes have been made to the actual source
code.

This patch also requires the iconv package to be compiled with Visual
Studio 2010; that change has been submitted to the iconv project (see:
https://savannah.gnu.org/bugs/?35088)
2012-05-09 18:27:04 +08:00
Timothy Elliott
689408bd86 Prevent an infinite loop when dumping a node with encoding problems
When a node is dumped with a new encoding, we may encounter characters
that are not supported in the new encoding. libxml2 handles this by
replacing the character with character references, but in some encodings
this can result in an infinite loop when the character references
themselves contain unsupported characters.

This fixes the infinite loop by undoing a character reference substitution
when it cannot be inserted, and returning an encoder error.

This bug was noticed when looking into an infinite loop bug report for
the Ruby Nokogiri project. The original bug report, "nokogiri process
hangs on call to inner_html" is here:
https://github.com/tenderlove/nokogiri/issues/400
2012-05-08 22:03:22 +08:00
Bryan Henderson
8658d27d4f wrong message for double hyphen in comment XML error
The error message when you have a double hyphen in a comment is "comment
not terminated" and should be "double hyphen in comment".
2012-05-08 16:39:05 +08:00
Tim Elliott
71a243d5b4 xmlParseNodeInContext problems with an empty document
When you call xmlParseNodeInContext on a fragment node with an
empty document, the parser associates the first new node twice --
once with the document, and once with the fragment node.

This fixes the issue by only associating the new node with the
fragment node.
2012-05-08 13:19:40 +08:00
Pavel Andrejs
8ad4da5f56 HTML element position is not detected properly
The data in node_seq in xmlParserCtxt was not updated properly
when parsing HTML. This patch fixes the accounting for both
pull and push mode of HTML parsing.
2012-05-08 11:01:12 +08:00
Daniel Veillard
48f0f3f29f Fix "make tst" to grab lzma lib too 2012-05-08 10:59:41 +08:00
Andoni Morales
fda5717c4d Fix mingw's snprintf configure check
For mingw, snprintf is defined as _snprintf and therefore the check
should be for _snprintf. This applies to _vsnprintf too.
2012-05-08 10:46:09 +08:00
Ryan
0cd29a3a25 Add "whereis" command to xmllint shell
When playing with xpath in the xmllint shell, it's really handy to be
able to ask where the returned nodes live in the tree, in the same
way "pwd" asks where the current node lives.

The feature is actually quite easy to implement by combining the
functionality of the existing dir/ls and pwd commands (see proposed patch).

Example usage:

/ > whereis //last_name
/clinical_study/overall_official/last_name
/clinical_study/location/contact/last_name
/clinical_study/location/investigator/last_name
2012-05-07 19:53:19 +08:00
Marcus Meissner
996449273f fixed a 64bit big endian issue
For https://bugzilla.gnome.org/show_bug.cgi?id=671176
This patch fixes a 64bit endian issue, making libxml2 work (again) on ppc64:
unsigned int and size_t are differently sized on 64bit.
2012-05-07 18:41:42 +08:00
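
The commit above does not show the affected code, so the following is only a sketch of the general class of bug, assuming it involved reading a size_t through an unsigned int sized view. All function names are hypothetical. On a 64bit little-endian host the low-order bytes come first and the mistake goes unnoticed; on a 64bit big-endian host such as ppc64 the same read returns the high-order half of the value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    unsigned int probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Mimics the bug: the first sizeof(unsigned int) bytes of a size_t are
 * reinterpreted as an unsigned int, as happens when a pointer to one
 * type is passed where a pointer to the other is expected. */
unsigned int read_size_as_uint(size_t value)
{
    unsigned int narrow = 0;

    assert(sizeof(unsigned int) <= sizeof(size_t));
    memcpy(&narrow, &value, sizeof narrow);
    return narrow;
}

/* Portable alternative: convert the value itself, not its storage. */
unsigned int convert_size_to_uint(size_t value)
{
    return (unsigned int)value;
}
```

For a small value such as 5, convert_size_to_uint() returns 5 everywhere, while read_size_as_uint() returns 5 only where the low-order bytes are stored first or where both types have the same size.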
Ryan
40db1eeb36 Improve xmllint shell
For https://bugzilla.gnome.org/show_bug.cgi?id=310222

adds namespace support to ls, du and the element named in
the command shell prompt. It also fixes du to actually dump
the requested path, if the user gives one, rather than always
dumping the whole file.
2012-05-07 17:04:04 +08:00
Ville Skyttä
267b945a63 xmlcatalog: Add uri and delegateURI to possible add types in man page. 2012-05-07 15:34:37 +08:00
Daniel Veillard
9c56dd04ec Update README.tests
document make check, make valgrind and fix a typo pointed out by
Daniel Neel <dneelyep@gmail.com>
Fixes: https://bugzilla.gnome.org/show_bug.cgi?id=617019

Daniel
2012-05-07 15:23:25 +08:00
Jüri Aedla
d8e1faeaa9 Fix an off by one pointer access
getting out of the range of memory allocated for xpointer decoding
2012-05-07 15:06:56 +08:00
Daniel Veillard
fc74a6f5c2 URI handling code is not OOM resilient
as pointed out by Dan Berrange, add a small comment in the header
2012-05-07 15:02:25 +08:00
Daniel Veillard
288bb6274f Fix an error in comment
nsWarn handler is not about parser fatal errors
2012-05-07 15:01:29 +08:00
Javier Jardón
eacf6bc627 Remove vestigial de-ANSI-fication support.
configure.ac (AM_C_PROTOTYPES): Remove call to this macro.
The support for automatic de-ANSI-fication has been deprecated in
automake 1.11.2, and will be removed altogether in automake 1.12.0
2012-04-02 18:18:39 +01:00
Javier Jardón
05fd0285bf autogen.sh: Fix typo 2012-04-02 17:39:26 +01:00
Daniel Veillard
72789ef21f Do not use unsigned but unsigned int
as this breaks the API generator
2012-04-02 17:52:20 +08:00
Daniel Veillard
4aa68abb1c Try to fix a problem with entities in SAX mode
this is a problem which hit the raptor code and that small
patch should be a reliable workaround
2012-04-02 17:50:54 +08:00
Daniel Veillard
d95b689fd9 Fix portability failure if netdb.h lacks NO_ADDRESS 2012-04-02 17:48:53 +08:00
Daniel Veillard
ac17e5939c Remove two references to u_short 2012-04-02 15:45:13 +08:00
Daniel Veillard
bdc64d6d5f Fix a crash with xmllint --path on empty results
If the returned node set is empty, it is possible for the nodetab
to be null
2012-03-27 14:41:37 +08:00
Noam Postavsky
1579499025 add function xmlTextReaderRelaxNGValidateCtxt()
Since there is xmlTextReaderSchemaValidateCtxt() it seems like there
should be an equivalent RelaxNG function. The attached patch adds it.
The code is essentially the same as Schema implementation, but I'm
uncertain as to how to add things to the documentation and test suite:
there seems to be a lot of auto-generation going on.
2012-03-22 10:32:11 +08:00
Rob Richards
2d84ea149b Fix windows build from lzma addition 2012-03-21 10:37:06 -04:00
Daniel Mustieles
fabbca8c16 Fixed bug #617016 2012-03-19 21:42:00 +01:00
Daniel Mustieles
bde9c353fb Fixed bug #667946 2012-03-19 21:39:58 +01:00
Daniel Neel
38812b6fca Fixed two typos in the README document
Changes should be self-explanatory by viewing the diff
2012-03-16 15:12:25 -04:00
Nico Weber
cedf84d35a Fix -Wempty-body warning from clang
clang recently grew a warning on `for (...);`. This patch
fixes both instances of this pattern in libxml. The changes
don't modify the code semantics.
2012-03-05 16:36:59 +08:00
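
The commit above does not say how the two loops were rewritten. One common way to keep the behavior of a `for (...);` loop while making the empty body explicit is sketched below; skip_blanks is a hypothetical helper written for this illustration and is not taken from libxml.

```c
#include <assert.h>
#include <stddef.h>

/* Return a pointer to the first character of s that is neither a space
 * nor a tab. All the work happens in the loop header. */
const char *skip_blanks(const char *s)
{
    assert(s != NULL);
    /* Writing "for (...);" here is the pattern the warning is about;
     * a braced body states that the emptiness is intentional. */
    for (; *s == ' ' || *s == '\t'; s++) {
        /* intentionally empty */
    }
    return s;
}
```

Both spellings of the loop behave the same, which matches the commit's remark that the code semantics are unchanged.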
Ryan Sleevi
5cf1deb080 Fix a logic error in Schemas Component Constraints 2012-02-29 10:56:32 +08:00
Nico Weber
aae48e64df Fix a wrong enum type use in Schemas Types 2012-02-29 09:44:35 +08:00