mirror of
https://gitlab.gnome.org/GNOME/libxml2.git
synced 2025-03-11 12:58:16 +03:00
Hi Veillard and all, Firstly, thanks for libxml: it's awesome! I noticed recently that libxml was taking a surprisingly long time to perform some operations (many minutes instead of milliseconds), and so I did some digging. It turns out that the problem was caused by the realloc()ing done in xmlNodeAddContentLen() which can be called many (many) times when assigning some content into a node. For background, I'm dealing with XML that contains emails, these can have large attachments (~6MB) which are base-64 encoded, line-wrapped at 78 chars, and each line ends with . This means that xmlNodeAddContentLen() is being called about 200,000 times, and so there are 200,000 reallocs of a 6MB string, which takes a while... (I put a synthetic example of this at https://gist.github.com/2656940) The attached patch works around that problem by using the existing buffer API to merge the strings together before even creating the text node, this keeps the number of realloc()s at a managable level. I'd love feedback on the patch, and am happy to fix problems with it, or explore other solutions if you think that this is barking up the wrong tree :). Thanks, Conrad P.S. Should I create a bug for this too? ------8<------ Before this change xmlStringGetNodeList would perform a realloc() of the entire new content for every XML entity in the assigned text in order to merge together adjacent text nodes. This had the effect of making xmlSetNodeContent O(n^2), which led to unexpectedly bad performance on inputs that contained a large number of XML entities. After this change the memory management is done by the buffer API, avoiding the need to continually re-measure and realloc() the string. For my test data (6MB of 80 character lines, each ending with ) this takes the time to xmlSetNodeContent from about 500 seconds to around 50ms. I have not profiled smaller cases, though I tried to minimize the performance impact of my change by avoiding unnecessary string copying. Signed-off-by: Conrad Irwin <conrad.irwin@gmail.com>
XML toolkit from the GNOME project Full documentation is available on-line at http://xmlsoft.org/ This code is released under the MIT Licence see the Copyright file. To build on an Unixised setup: ./configure ; make ; make install To build on Windows: see instructions on win32/Readme.txt To assert build quality: on an Unixised setup: run make tests otherwise: There is 3 standalone tools runtest.c runsuite.c testapi.c, which should compile as part of the build or as any application would. Launch them from this directory to get results, runtest checks the proper functionning of libxml2 main APIs while testapi does a full coverage check. Report failures to the list. To report bugs, follow the instructions at: http://xmlsoft.org/bugs.html A mailing-list xml@gnome.org is available, to subscribe: http://mail.gnome.org/mailman/listinfo/xml The list archive is at: http://mail.gnome.org/archives/xml/ All technical answers asked privately will be automatically answered on the list and archived for public access unless privacy is explicitly required and justified. Daniel Veillard $Id$
Description
Languages
C
88.2%
HTML
5.8%
Python
3.6%
M4
0.5%
CMake
0.5%
Other
1.3%