Hi Veillard and all,
Firstly, thanks for libxml: it's awesome!
I noticed recently that libxml was taking a surprisingly long time to perform some
operations (many minutes instead of milliseconds), and so I did some digging. It turns out
that the problem was caused by the realloc()ing done in xmlNodeAddContentLen() which can
be called many (many) times when assigning some content into a node.
For background, I'm dealing with XML that contains emails, these can have large
attachments (~6MB) which are base-64 encoded, line-wrapped at 78 chars, and each line ends
with &#13;. This means that xmlNodeAddContentLen() is being called about 200,000 times,
and so there are 200,000 reallocs of a 6MB string, which takes a while... (I put a synthetic
example of this at https://gist.github.com/2656940)
The attached patch works around that problem by using the existing buffer API to merge the
strings together before even creating the text node; this keeps the number of realloc()s
at a manageable level.
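To sketch the idea (this is not the patch itself; set_content_merged is a made-up helper
name for illustration), the buffer API lets the pieces be appended with amortized growth
and the node's content be set just once at the end:

    #include <string.h>
    #include <libxml/tree.h>

    /* Illustrative helper (not in libxml2): append all chunks into an
     * xmlBuffer, then set the node content once, instead of calling
     * xmlNodeAddContentLen() once per chunk. */
    static void
    set_content_merged(xmlNodePtr node, const char **chunks, size_t n)
    {
        xmlBufferPtr buf = xmlBufferCreate();
        size_t i;

        for (i = 0; i < n; i++)      /* amortized O(n) appends */
            xmlBufferAdd(buf, (const xmlChar *) chunks[i],
                         (int) strlen(chunks[i]));

        xmlNodeSetContent(node, xmlBufferContent(buf));  /* one final copy */
        xmlBufferFree(buf);
    }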
I'd love feedback on the patch, and am happy to fix problems with it, or explore other
solutions if you think that this is barking up the wrong tree :).
Thanks,
Conrad
P.S. Should I create a bug for this too?
------8<------
Before this change xmlStringGetNodeList would perform a realloc() of the
entire new content for every XML entity in the assigned text in order to
merge together adjacent text nodes. This had the effect of making
xmlNodeSetContent O(n^2), which led to unexpectedly bad performance on
inputs that contained a large number of XML entities.
After this change the memory management is done by the buffer API,
avoiding the need to continually re-measure and realloc() the string.
For my test data (6MB of 80 character lines, each ending with &#13;)
this brings the time for xmlNodeSetContent from about 500 seconds down to
around 50ms. I have not profiled smaller cases, though I tried to
minimize the performance impact of my change by avoiding unnecessary
string copying.
Signed-off-by: Conrad Irwin <conrad.irwin@gmail.com>
For https://bugzilla.gnome.org/show_bug.cgi?id=615785
When the <noscript> is found, <head> is closed and a <body> element is created.
The real <body id="xxx"> gets skipped over, so I can't see any of the
body's attributes.
Just don't close <head> when encountering a <noscript>
Add a regression test too
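A minimal reproduction sketch of the reported behaviour (the markup and attribute
values are just placeholders):

    #include <stdio.h>
    #include <libxml/HTMLparser.h>
    #include <libxml/tree.h>

    int main(void) {
        /* a <noscript> inside <head>, followed by the real <body id="xxx"> */
        htmlDocPtr doc = htmlReadDoc((const xmlChar *)
            "<html><head><noscript><p>no js</p></noscript></head>"
            "<body id=\"xxx\"><p>hi</p></body></html>",
            NULL, NULL, HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING);
        xmlNodePtr cur;

        for (cur = xmlDocGetRootElement(doc)->children; cur != NULL; cur = cur->next) {
            if (cur->type == XML_ELEMENT_NODE &&
                xmlStrcmp(cur->name, (const xmlChar *) "body") == 0) {
                xmlChar *id = xmlGetProp(cur, (const xmlChar *) "id");
                /* with the bug, the synthesized <body> carries no id attribute */
                printf("body id = %s\n", id ? (const char *) id : "(none)");
                xmlFree(id);
            }
        }
        xmlFreeDoc(doc);
        return 0;
    }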
For https://bugzilla.gnome.org/show_bug.cgi?id=609796
Libxml2 fails to validate an instance document against a schema when an element's
type is a complex extension of some base type with an optional child
element and that child element is not specified in the instance document. For
example, suppose I have some complex type BaseType that is defined to have one
child element in a sequence group that has minOccurs set to 0.
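A self-contained sketch of that scenario (the type and element names are invented
for illustration): BaseType has a single optional child in a sequence, ExtType
extends it with another element, and the instance omits the optional child.

    #include <string.h>
    #include <libxml/parser.h>
    #include <libxml/xmlschemas.h>

    static const char *xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        " <xs:complexType name='BaseType'><xs:sequence>"
        "  <xs:element name='opt' type='xs:string' minOccurs='0'/>"
        " </xs:sequence></xs:complexType>"
        " <xs:complexType name='ExtType'><xs:complexContent>"
        "  <xs:extension base='BaseType'><xs:sequence>"
        "   <xs:element name='extra' type='xs:string'/>"
        "  </xs:sequence></xs:extension>"
        " </xs:complexContent></xs:complexType>"
        " <xs:element name='root' type='ExtType'/>"
        "</xs:schema>";

    static const char *xml = "<root><extra>x</extra></root>"; /* 'opt' omitted */

    int main(void) {
        xmlSchemaParserCtxtPtr pc = xmlSchemaNewMemParserCtxt(xsd, (int) strlen(xsd));
        xmlSchemaPtr schema = xmlSchemaParse(pc);
        xmlSchemaFreeParserCtxt(pc);

        xmlDocPtr doc = xmlReadMemory(xml, (int) strlen(xml), "inst.xml", NULL, 0);
        xmlSchemaValidCtxtPtr vc = xmlSchemaNewValidCtxt(schema);
        int ret = xmlSchemaValidateDoc(vc, doc); /* expected 0; the bug reported an error */

        xmlSchemaFreeValidCtxt(vc);
        xmlSchemaFree(schema);
        xmlFreeDoc(doc);
        return ret;
    }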
For https://bugzilla.gnome.org/show_bug.cgi?id=630682
The python tests were reporting errors; some were due to
a small change in encoding case, but the main one was that
htmlSetMetaEncoding(doc, NULL) was broken and no longer removed
the associated meta tag.
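For reference, a small sketch of the calls the tests exercise; after the fix,
passing NULL should again remove the meta declaration that an earlier call inserted:

    #include <stdio.h>
    #include <libxml/HTMLparser.h>
    #include <libxml/HTMLtree.h>

    int main(void) {
        htmlDocPtr doc = htmlReadDoc((const xmlChar *)
            "<html><head><title>t</title></head><body></body></html>",
            NULL, NULL, 0);

        htmlSetMetaEncoding(doc, (const xmlChar *) "UTF-8"); /* inserts/updates the meta tag */
        htmlSetMetaEncoding(doc, NULL);                      /* should remove it again */

        htmlDocDump(stdout, doc);  /* no Content-Type meta element expected in the output */
        xmlFreeDoc(doc);
        return 0;
    }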
For https://bugzilla.gnome.org/show_bug.cgi?id=642916
I just noticed that the HTML_PARSE_NOIMPLIED flag that you can pass to the
HTML-Parser methods doesn't do anything. Its intended purpose is to stop the
HTML-parser from forcibly adding a pair of html/body tags if the stream does
not contain any.
This is highly useful when you don't need this level of strictness.
Unfortunately, specifying it doesn't work, because the option is not
copied into the parsing context.
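A small usage sketch of the option once it is honored (the fragment string is arbitrary):

    #include <libxml/HTMLparser.h>
    #include <libxml/tree.h>

    int main(void) {
        /* with the option actually copied into the parsing context, no implied
         * <html>/<body> pair should be wrapped around the fragment */
        htmlDocPtr doc = htmlReadDoc((const xmlChar *) "<p>just a fragment</p>",
                                     NULL, NULL,
                                     HTML_PARSE_NOIMPLIED | HTML_PARSE_NOERROR);
        xmlFreeDoc(doc);
        return 0;
    }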
For https://bugzilla.gnome.org/show_bug.cgi?id=643949
When an error occurs while creating an IO input, the given context
is closed with the given close function, except when the error
happens in xmlParserInputBufferCreateIO. This can
lead to a resource leak, which this patch fixes.
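A sketch of the affected call path (the callback names are made up): the close
callback handed to xmlReadIO is what is supposed to clean up the context, even
when setting up the input buffer fails.

    #include <stdio.h>
    #include <libxml/parser.h>

    static int my_read(void *ctx, char *buf, int len) {
        return (int) fread(buf, 1, (size_t) len, (FILE *) ctx);
    }

    static int my_close(void *ctx) {
        /* expected to run even when input creation fails, so fp is not leaked */
        return fclose((FILE *) ctx);
    }

    xmlDocPtr parse_stream(FILE *fp) {
        return xmlReadIO(my_read, my_close, fp, "stream.xml", NULL, 0);
    }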
For https://bugzilla.gnome.org/show_bug.cgi?id=655218
http://www.w3.org/TR/2011/WD-html5-20110525/semantics.html#the-meta-element
"""
The charset attribute specifies the character encoding used by the document.
This is a character encoding declaration. If the attribute is present in an XML
document, its value must be an ASCII case-insensitive match for the string
"UTF-8" (and the document is therefore forced to use UTF-8 as its
encoding).
"""
However, while <meta http-equiv="Content-Type" content="text/html;
charset=utf8"> works, <meta charset="utf8"> does not.
While the libxml2 HTML parser is not tuned for HTML5, this is a simple
addition.
Also added a test case.
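For illustration, the two declarations parsed through the HTML reader; after the
change the second form should also be picked up as an encoding declaration:

    #include <libxml/HTMLparser.h>
    #include <libxml/tree.h>

    int main(void) {
        htmlDocPtr d1 = htmlReadDoc((const xmlChar *)
            "<head><meta http-equiv=\"Content-Type\" "
            "content=\"text/html; charset=utf8\"></head>", NULL, NULL, 0);
        htmlDocPtr d2 = htmlReadDoc((const xmlChar *)
            "<head><meta charset=\"utf8\"></head>", NULL, NULL, 0);

        xmlFreeDoc(d1);
        xmlFreeDoc(d2);
        return 0;
    }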
For https://bugzilla.gnome.org/show_bug.cgi?id=665526
When building on Win32, configure the support to use native Windows
threads, since support for them is available, unless pthreads are found
and explicitly requested.
For https://bugzilla.gnome.org/show_bug.cgi?id=666491
This patch adds project files to compile and debug libxml2 using Visual
Studio 2010. Only a few minor changes have been made to the actual source
code.
This patch also requires the iconv package to be compiled with Visual
Studio 2010; that change has been submitted to the iconv project (see:
https://savannah.gnu.org/bugs/?35088).
When a node is dumped with a new encoding, we may encounter characters
that are not supported in the new encoding. libxml2 handles this by
replacing such characters with character references, but in some encodings
this can result in an infinite loop when the character references
themselves contain unsupported characters.
This fixes the infinite loop by undoing a character reference substitution
when it cannot be inserted, and returning an encoder error.
This bug was noticed when looking into an infinite loop bug report for
the Ruby Nokogiri project. The original bug report, "nokogiri process
hangs on call to inner_html" is here:
https://github.com/tenderlove/nokogiri/issues/400
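The code path involved can be exercised through the xmlsave API; a hedged sketch
(file name and target encoding chosen arbitrarily): characters the target encoding
cannot represent are written as numeric character references, and the hang occurred
when those references themselves could not be encoded.

    #include <string.h>
    #include <libxml/parser.h>
    #include <libxml/xmlsave.h>

    int main(void) {
        /* document containing a character that ISO-8859-1 cannot represent */
        const char *xml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?><r>\xE4\xB8\xAD</r>";
        xmlDocPtr doc = xmlReadMemory(xml, (int) strlen(xml), "in.xml", NULL, 0);

        /* re-serializing with a different encoding triggers the substitution path */
        xmlSaveCtxtPtr ctx = xmlSaveToFilename("out.xml", "ISO-8859-1", 0);
        xmlSaveDoc(ctx, doc);   /* U+4E2D comes out as a character reference */
        xmlSaveClose(ctx);

        xmlFreeDoc(doc);
        return 0;
    }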
When you call xmlParseInNodeContext on a fragment node with an
empty document, the parser associates the first new node twice --
once with the document, and once with the fragment node.
This fixes the issue by only associating the new node with the
fragment node.
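A sketch of the triggering situation (the content string is arbitrary): a fragment
node belonging to an otherwise empty document, parsed into with xmlParseInNodeContext.

    #include <string.h>
    #include <libxml/parser.h>
    #include <libxml/tree.h>

    int main(void) {
        xmlDocPtr doc = xmlNewDoc((const xmlChar *) "1.0");  /* empty document */
        xmlNodePtr frag = xmlNewDocFragment(doc);
        xmlNodePtr list = NULL;
        const char *chunk = "<p>hello</p>";

        xmlParseInNodeContext(frag, chunk, (int) strlen(chunk),
                              XML_PARSE_NOERROR, &list);
        /* before the fix the first parsed node ended up associated both with
         * the document and with the fragment node; now only with the fragment */

        xmlFreeDoc(doc);
        return 0;
    }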
The data in node_seq in xmlParserCtxt was not updated properly
when parsing HTML. This patch fixes the accounting for both
pull and push modes of HTML parsing.
When playing with xpath in the xmllint shell, it's really handy to be
able to ask where the returned nodes live in the tree, in the same
way "pwd" asks where the current node lives.
The feature is actually quite easy to implement by combining the
functionality of the existing dir/ls and pwd commands (see proposed patch).
Example usage:
/ > whereis //last_name
/clinical_study/overall_official/last_name
/clinical_study/location/contact/last_name
/clinical_study/location/investigator/last_name
For https://bugzilla.gnome.org/show_bug.cgi?id=310222
Adds namespace support to ls, du and the element named in
the command shell prompt. It also fixes du to actually dump
the requested path, if the user gives one, rather than always
dumping the whole file.
configure.ac (AM_C_PROTOTYPES): Remove call to this macro.
The support for automatic de-ANSI-fication has been deprecated in
automake 1.11.2, and will be removed altogether in automake 1.12.0
Since there is xmlTextReaderSchemaValidateCtxt() it seems like there
should be an equivalent RelaxNG function. The attached patch adds it.
The code is essentially the same as the Schema implementation, but I'm
uncertain as to how to add things to the documentation and test suite:
there seems to be a lot of auto-generation going on.
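A usage sketch, assuming the new function mirrors how
xmlTextReaderSchemaValidateCtxt() is used (file names are placeholders):

    #include <libxml/xmlreader.h>
    #include <libxml/relaxng.h>

    int validate_with_reader(const char *rng_path, const char *xml_path) {
        xmlRelaxNGParserCtxtPtr pc = xmlRelaxNGNewParserCtxt(rng_path);
        xmlRelaxNGPtr rng = xmlRelaxNGParse(pc);
        xmlRelaxNGFreeParserCtxt(pc);
        xmlRelaxNGValidCtxtPtr vc = xmlRelaxNGNewValidCtxt(rng);

        xmlTextReaderPtr reader = xmlReaderForFile(xml_path, NULL, 0);
        xmlTextReaderRelaxNGValidateCtxt(reader, vc, 0);  /* the API this patch adds */
        while (xmlTextReaderRead(reader) == 1)
            ;                                             /* stream through the document */
        int valid = xmlTextReaderIsValid(reader);

        xmlFreeTextReader(reader);
        xmlRelaxNGFreeValidCtxt(vc);
        xmlRelaxNGFree(rng);
        return valid;
    }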
clang recently grew a warning on `for (...);`. This patch
fixes both instances of this pattern in libxml. The changes
don't modify the code's semantics.
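For reference, the pattern in question and an equivalent spelling that keeps clang
quiet (the function names here are invented for illustration):

    #include <libxml/tree.h>

    /* pattern clang now warns about: the whole loop body is an easily-missed ';' */
    static xmlNodePtr last_sibling_old(xmlNodePtr cur) {
        for (; cur != NULL && cur->next != NULL; cur = cur->next);
        return cur;
    }

    /* equivalent form with the empty body made explicit on its own line */
    static xmlNodePtr last_sibling_new(xmlNodePtr cur) {
        for (; cur != NULL && cur->next != NULL; cur = cur->next)
            ;
        return cur;
    }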