mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-26 20:25:14 +03:00
845 Commits

Author SHA1 Message Date
Daniel Veillard
8fc913fcc9 Fix compilation on older Visual Studio
For https://bugzilla.gnome.org/show_bug.cgi?id=666491

Reported by Matt Budd <matt.budd@gmail.com>: the added support
for VS 2010 broke the older versions 2005 and 2008 because it assumed
some of the defines were present in all versions. Fix that
by checking the version of VS.
2012-06-06 11:29:29 +08:00
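
A minimal sketch of the kind of compiler-version guard commit 8fc913fcc9 describes. The `_MSC_VER` values 1400, 1500 and 1600 correspond to Visual Studio 2005, 2008 and 2010. The macro `DEMO_HAVE_VS2010_DEFINES` and the function name are hypothetical and stand in for whichever defines the real patch guards.

```c
#include <assert.h>

/* _MSC_VER is 1400 for VS 2005, 1500 for VS 2008 and 1600 for VS 2010. */
#if defined(_MSC_VER) && (_MSC_VER >= 1600)
#define DEMO_HAVE_VS2010_DEFINES 1   /* hypothetical name */
#else
#define DEMO_HAVE_VS2010_DEFINES 0
#endif

/* Returns 1 only when the compiler is known to ship the newer defines,
 * so older compilers take the fallback path instead of assuming them. */
int demo_has_vs2010_defines(void)
{
    return DEMO_HAVE_VS2010_DEFINES;
}
```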
Daniel Veillard
379ebc1d77 Cleanup on randomization
tsan reported that rand() is not thread safe, so create
a thread safe wrapper, use rand_r() if available.
Consolidate the function, initialization and cleanup in
dict.c and make sure it is initialized in xmlInitParser()
2012-05-18 15:41:31 +08:00
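
As an illustration of commit 379ebc1d77, a thread-safe wrapper around `rand_r()` could look roughly like the sketch below. It assumes a POSIX system with pthreads. The names `demo_rand_init` and `demo_rand` are hypothetical and are not the functions added to dict.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names; the real code lives in dict.c. */
static pthread_mutex_t demo_rand_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned int demo_rand_seed = 1;

/* Seed once, for example from a library initialization routine. */
void demo_rand_init(unsigned int seed)
{
    pthread_mutex_lock(&demo_rand_mutex);
    demo_rand_seed = seed;
    pthread_mutex_unlock(&demo_rand_mutex);
}

/* rand_r() keeps its state in the caller-supplied seed, so guarding
 * that seed with a mutex makes the wrapper safe to call from threads. */
int demo_rand(void)
{
    int value;

    pthread_mutex_lock(&demo_rand_mutex);
    value = rand_r(&demo_rand_seed);
    pthread_mutex_unlock(&demo_rand_mutex);
    assert(value >= 0);
    return value;
}
```

Keeping the seed, its initialization and the lock in one file mirrors the consolidation the message mentions.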
Daniel Veillard
0d51cfebc9 Fix a race in xmlNewInputStream
For https://bugzilla.gnome.org/show_bug.cgi?id=643148
Reported by Bill Clarke <llib@computer.org>, it used a global variable
as a counter for the input id and this was not thread safe. To avoid the
race without adding unneeded locking in the parser path, move the id to
the parser context instead.
2012-05-15 11:18:40 +08:00
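
The idea behind commit 0d51cfebc9 can be sketched as follows: the counter moves from a global into the per-parser context, so no lock is needed. The struct and field names are hypothetical, heavily reduced stand-ins for the real parser context and input stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the parser context. */
typedef struct {
    int input_id;   /* next input id, private to this context */
} demo_parser_ctxt;

/* Hypothetical, reduced stand-in for an input stream. */
typedef struct {
    int id;
} demo_input;

/* Each context hands out its own ids, so two threads parsing with
 * separate contexts never touch shared state and need no locking. */
demo_input demo_new_input(demo_parser_ctxt *ctxt)
{
    demo_input in;

    assert(ctxt != NULL);
    in.id = ctxt->input_id++;
    return in;
}
```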
Conrad Irwin
7d0d2a50ac Use a hybrid allocation scheme in xmlNodeSetContent
On Fri, May 11, 2012 at 9:10 AM, Daniel Veillard <veillard@redhat.com> wrote:
>  Hi Conrad,
>
> that's interesting ! I was initially afraid of a sudden explosion of
> memory allocations for building a tree since by default buffers tend to
> "waste" memory by using doubling allocations, but that's not the case.
>  xmllint --noout doc/libxml2-api.xml
> when compiled with memory debug produce
>
> paphio:~/XML -> cat .memdump
>      MEMORY ALLOCATED : 0, MAX was 12756699
>
> and without your patch 12755657, i.e. the increase is minimal.

Heh, I thought that too. Actually you're looking at the result with XML_ALLOC_EXACT! This
is because EXACT adds 10bytes "spare" on each alloc, and that interestingly wastes about the
same amount of space as XML_ALLOC_DOUBLEIT on this example (see below).

So it turns out that the default realloc() on my system actually handles this case really
well — and I guess that all the time in xmlRealloc() was actually in xmlStrlen, not the
underlying realloc() after all (sorry for misleading you). If you replace the realloc()
with a bad one (like valgrind's), then the performance degrades severely.

This patch implements a HYBRID allocator which has the behaviour you describe (it's
like EXACT to start with, though without the spare 10 bytes; and switches to DOUBLEIT
after 4kb) — that gets the memory back down to 12755657, with no noticeable impact on the
performance of the synthetic pathological example under valgrind.

In summary:

     max_memory on ./xmllint --noout doc/libxml2-api.xml,
     valgrind time on https://gist.github.com/2656940

         |  max_memory  | valgrind time
before   |  12755657    | 29:18.2
EXACT    |  12756699    |  2:58.6 <-- this is the state after the first patch.
DOUBLEIT |  12756727    |  0:02.7
HYBRID   |  12755754    |  0:02.7 <-- this is the state with both patches.

>
> There is also the cost of creating the buffers all the time.
> I need to read the code and check but I may be interested in a hybrid
> approach where we switch to buffer only when the text node starts to
> become too big (4k would remove nearly all usual types of "document"
> usage, i.e. not blocks of data)

I tried to avoid too much buffer creation by introducing the xmlBufferDetach function,
which allows re-using one buffer to construct many strings. It's maybe a bit of a "hack"
in API terms though I thought the gains would be worth it.

Conrad

------8<------

To keep memory usage tight in normal conditions it's desirable to only
allocate as much space as is needed. Unfortunately this can lead to
problems when constructing a long string out of small chunks, because
every chunk you add will need to resize the buffer.

To fix this XML_ALLOC_HYBRID will switch (when the buffer is 4kb big)
from using exact allocations to doubling buffer size every time it is
full. This limits the number of buffer resizes to O(log n) (down from
O(n)), and thus greatly increases the performance of constructing very
large strings in this manner.
2012-05-14 14:18:58 +08:00
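
The growth policy described in commit 7d0d2a50ac can be sketched as a pure size computation: exact allocations below the 4kb threshold, doubling above it. The function names are hypothetical, and the sketch ignores integer overflow, which real code would have to check.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HYBRID_THRESHOLD 4096  /* the 4kb switch point from the message */

/* Returns the capacity to allocate when `needed` bytes must fit in a
 * buffer that currently has room for `capacity` bytes. */
size_t demo_hybrid_capacity(size_t capacity, size_t needed)
{
    if (needed <= capacity)
        return capacity;              /* still fits, no resize */
    if (capacity < DEMO_HYBRID_THRESHOLD)
        return needed;                /* small buffer: exact allocation */
    while (capacity < needed)
        capacity *= 2;                /* large buffer: double until it fits */
    return capacity;
}

/* Counts how many resizes occur when appending `chunks` chunks of
 * `chunk_len` bytes one at a time. */
size_t demo_count_resizes(size_t chunks, size_t chunk_len)
{
    size_t capacity = 0, used = 0, resizes = 0, i;

    for (i = 0; i < chunks; i++) {
        size_t next = demo_hybrid_capacity(capacity, used + chunk_len);

        if (next != capacity) {
            capacity = next;
            resizes++;
        }
        used += chunk_len;
    }
    assert(used <= capacity);
    return resizes;
}
```

With small chunks, every append below the threshold costs one resize, after which the count grows only logarithmically with the total size.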
Conrad Irwin
7d553f834e Use buffers when constructing string node lists.
Hi Veillard and all,

Firstly, thanks for libxml: it's awesome!

I noticed recently that libxml was taking a surprisingly long time to perform some
operations (many minutes instead of milliseconds), and so I did some digging. It turns out
that the problem was caused by the realloc()ing done in xmlNodeAddContentLen() which can
be called many (many) times when assigning some content into a node.

For background, I'm dealing with XML that contains emails, these can have large
attachments (~6MB) which are base-64 encoded, line-wrapped at 78 chars, and each line ends
with &#13;. This means that xmlNodeAddContentLen() is being called about 200,000 times,
and so there are 200,000 reallocs of a 6MB string, which takes a while... (I put a synthetic
example of this at https://gist.github.com/2656940)

The attached patch works around that problem by using the existing buffer API to merge the
strings together before even creating the text node. This keeps the number of realloc()s
at a manageable level.

I'd love feedback on the patch, and am happy to fix problems with it, or explore other
solutions if you think that this is barking up the wrong tree :).

Thanks,

Conrad

P.S. Should I create a bug for this too?

------8<------

Before this change xmlStringGetNodeList would perform a realloc() of the
entire new content for every XML entity in the assigned text in order to
merge together adjacent text nodes. This had the effect of making
xmlSetNodeContent O(n^2), which led to unexpectedly bad performance on
inputs that contained a large number of XML entities.

After this change the memory management is done by the buffer API,
avoiding the need to continually re-measure and realloc() the string.

For my test data (6MB of 80 character lines, each ending with &#13;)
this takes the time to xmlSetNodeContent from about 500 seconds to
around 50ms. I have not profiled smaller cases, though I tried to
minimize the performance impact of my change by avoiding unnecessary
string copying.

Signed-off-by: Conrad Irwin <conrad.irwin@gmail.com>
2012-05-14 13:51:30 +08:00
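
To illustrate why commit 7d553f834e helps, here is a small growable buffer that tracks its used length, so each append copies only the new chunk instead of re-measuring and reallocating the whole string. The `demo_buf` type and its functions are hypothetical and are not the libxml2 buffer API. The detach helper only mirrors the idea of handing the finished string to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; it tracks the used length so appends
 * never need to re-measure the existing content with strlen(). */
typedef struct {
    char  *data;
    size_t used;
    size_t capacity;
} demo_buf;

/* Appends len bytes; returns 0 on success, -1 on allocation failure. */
int demo_buf_add(demo_buf *buf, const char *str, size_t len)
{
    assert(buf != NULL && str != NULL);
    if (buf->used + len + 1 > buf->capacity) {
        size_t cap = buf->capacity ? buf->capacity : 64;
        char *tmp;

        while (cap < buf->used + len + 1)
            cap *= 2;
        tmp = realloc(buf->data, cap);
        if (tmp == NULL)
            return -1;
        buf->data = tmp;
        buf->capacity = cap;
    }
    memcpy(buf->data + buf->used, str, len);
    buf->used += len;
    buf->data[buf->used] = '\0';
    return 0;
}

/* Hands the accumulated string to the caller (who must free() it) and
 * resets the buffer so it can be reused. Returns NULL if nothing was added. */
char *demo_buf_detach(demo_buf *buf)
{
    char *out = buf->data;

    buf->data = NULL;
    buf->used = 0;
    buf->capacity = 0;
    return out;
}
```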
Michael Cronenworth
1eabc31401 Fix library problems with mingw-w64
For https://bugzilla.gnome.org/show_bug.cgi?id=663588
Fix a windows only issue when compiling the library with
MingW (64 bits) using Fedora cross-compiler chain.
Change the dllexport for data
2012-05-10 11:30:07 +08:00
Thomas Lemm
066c697772 Allow to compile with Visual Studio 2010
For https://bugzilla.gnome.org/show_bug.cgi?id=666491

This patch adds project files to compile and debug libxml2 using Visual
Studio 2010. Only a few minor changes have been made to the actual source
code.

This patch also requires the iconv package to be compiled with Visual
Studio 2010; a patch for that has been submitted to the iconv project (see:
https://savannah.gnu.org/bugs/?35088)
2012-05-09 18:27:04 +08:00
Noam Postavsky
1579499025 add function xmlTextReaderRelaxNGValidateCtxt()
Since there is xmlTextReaderSchemaValidateCtxt() it seems like there
should be an equivalent RelaxNG function. The attached patch adds it.
The code is essentially the same as Schema implementation, but I'm
uncertain as to how to add things to the documentation and test suite:
there seems to be a lot of auto-generation going on.
2012-03-22 10:32:11 +08:00
Anders F Bjorklund
eae5261779 add lzma compression support 2012-01-27 22:19:52 +08:00
Daniel Veillard
f5048b3e71 Hardening of XPath evaluation
Add a mechanism of frame for XPath evaluation when entering a function
or a scoped evaluation, also fix a potential problem in predicate
evaluation.
2011-08-19 11:07:51 +08:00
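
Commit f5048b3e71 only says that a frame mechanism was added. One plausible reading, sketched below under that assumption, is a value stack with a frame marker that stops a function from popping values owned by its caller. All names are hypothetical, and values are plain ints for brevity.

```c
#include <assert.h>

#define DEMO_STACK_MAX 16

/* Hypothetical reduced evaluation context: a value stack plus the index
 * of the first slot owned by the current function call. */
typedef struct {
    int values[DEMO_STACK_MAX];
    int count;   /* number of values on the stack */
    int frame;   /* values below this index belong to the caller */
} demo_xpath_ctxt;

/* Pushes a value; returns 0 on success, -1 if the stack is full. */
int demo_push(demo_xpath_ctxt *ctxt, int value)
{
    if (ctxt->count >= DEMO_STACK_MAX)
        return -1;
    ctxt->values[ctxt->count++] = value;
    return 0;
}

/* Pops into *out; fails instead of reaching below the current frame. */
int demo_pop(demo_xpath_ctxt *ctxt, int *out)
{
    if (ctxt->count <= ctxt->frame)
        return -1;
    *out = ctxt->values[--ctxt->count];
    return 0;
}

/* Enters a call whose nargs arguments are already on the stack;
 * returns the previous frame so it can be restored afterwards. */
int demo_enter_frame(demo_xpath_ctxt *ctxt, int nargs)
{
    int previous = ctxt->frame;

    assert(nargs >= 0 && nargs <= ctxt->count);
    ctxt->frame = ctxt->count - nargs;
    return previous;
}

/* Restores the caller's frame when the call returns. */
void demo_leave_frame(demo_xpath_ctxt *ctxt, int previous)
{
    ctxt->frame = previous;
}
```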
Daniel Veillard
c62efc847c Add options to ignore the internal encoding
For both XML and HTML, the document can provide an encoding
either in XMLDecl in XML, or as a meta element in HTML head.
This adds options to ignore those encodings if the encoding
is known in advance, for example if the content had been converted
before being passed to the parser.

* parser.c include/libxml/parser.h: add XML_PARSE_IGNORE_ENC option
  for XML parsing
* include/libxml/HTMLparser.h HTMLparser.c: adds the
  HTML_PARSE_IGNORE_ENC for HTML parsing
* HTMLtree.c: fix the handling of saving when an unknown encoding is
  defined in meta document header
* xmllint.c: add a --noenc option to activate the new parser options
2011-05-26 11:47:37 +08:00
Daniel Veillard
4c2e7c651f Release of libxml2-2.7.8 2010-11-04 18:35:57 +01:00
Giuseppe Iuculano
48f7dcb724 480323 add code to plug in ICU converters by default
This is not configured in by default. After some serious massaging,
incorporate that patch from Chromium/Chrome.
2010-11-04 17:42:42 +01:00
Ozkan Sezer
f99d222316 614087 Fix Socket API usage to allow Windows64 compilation
In Windows 64 a socket is no longer represented by an int,
which breaks the nanoftp API and nanoftp/nanohttp. The patch
changes this and fixes the API for Win64.
Regenerated the XML and documentation as a result too.
2010-11-04 12:08:08 +01:00
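
The portability problem in commit f99d222316 is usually handled with a socket typedef, roughly as sketched below. The `demo_socket` and `DEMO_INVALID_SOCKET` names are hypothetical. On Windows the sketch relies on the `SOCKET` type and `INVALID_SOCKET` constant from winsock2.h.

```c
#include <assert.h>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET demo_socket;              /* pointer-sized handle on Win64 */
#define DEMO_INVALID_SOCKET INVALID_SOCKET
#else
typedef int demo_socket;                 /* plain file descriptor */
#define DEMO_INVALID_SOCKET (-1)
#endif

/* Compare against the platform's invalid value instead of testing for
 * a negative int, which does not work for an unsigned handle type. */
int demo_socket_is_valid(demo_socket s)
{
    return s != DEMO_INVALID_SOCKET;
}
```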
Adrian Bunk
64b0d60c28 Switch from the obsolete mkinstalldirs to AC_PROG_MKDIR_P
This was obsoleted in 2005 so we should be safe.
But keep AC_PREREQ to 2.59 as it's still widely deployed.
2010-11-04 09:43:31 +01:00
Adam Spragg
d2e62311cd Add xmlSaveOption XML_SAVE_WSNONSIG
non destructive indentation option using spaces within markup
constructs and hence not modifying content
* include/libxml/xmlsave.h: new option
* xmlsave.c: some refactoring and new code for the new option
* xmllint.c: adds --pretty option where option 2 uses the new formatting
2010-11-03 15:33:40 +01:00
Daniel Veillard
f1121c48af Add an HTML parser option to avoid a default doctype
- include/libxml/HTMLparser.h: defines the new HTML parser option
  HTML_PARSE_NODEFDTD
- HTMLparser.c: if option is set don't add a default DTD
- xmllint.c: add the corresponding --nodefdtd option in xmllint
2010-07-26 14:02:42 +02:00
Eugene Pimenov
615904f582 Switch the HTML parser to be non-recursive
* HTMLparser.c: new htmlParseElementInternal non recursive, with
  htmlParseContentInternal and new function to handle node info
  and element end.
* include/libxml/parser.h: add new stack for element info in parser
  context
* parserInternals.c: free element info stack
2010-03-15 15:16:02 +01:00
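
The general technique behind commit 615904f582 is to replace one recursive call per nesting level with an explicit stack of open elements. The sketch below shows it on a toy bracket language rather than HTML. It is not the code of htmlParseElementInternal, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_DEPTH 256

/* Checks that brackets in `s` are balanced using an explicit stack of
 * open elements instead of one recursive call per nesting level.
 * Returns the maximum nesting depth, or -1 on mismatch or overflow. */
int demo_max_depth(const char *s)
{
    char open[DEMO_MAX_DEPTH];   /* stack of currently open elements */
    int depth = 0, max = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        char c = s[i];

        if (c == '(' || c == '[') {
            if (depth >= DEMO_MAX_DEPTH)
                return -1;       /* bounded: deep input cannot exhaust the C stack */
            open[depth++] = c;
            if (depth > max)
                max = depth;
        } else if (c == ')' || c == ']') {
            char want = (c == ')') ? '(' : '[';

            if (depth == 0 || open[depth - 1] != want)
                return -1;       /* closing element does not match the top */
            depth--;
        }
    }
    return depth == 0 ? max : -1;
}
```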
Roumen Petrov
120a269976 Fix build with mingw
- include/libxml/xmlexports.h: restore export decoration otherwise
  xsltproc and xmlsec crash
- libxml.h: define LIBXML_STATIC for static build
- configure.in: enable modules support for mingw* builds
- Makefile.am: flags for testdso if modules support enabled
2010-03-10 10:07:49 +01:00
Daniel Veillard
e20fb5a72c Fix xmlParseInNodeContext for HTML content
xmlParseInNodeContext notices that the enclosing document is
an HTML document, so invoke the HTML parser for that fragment, and
the HTML parser finding a "<p>hello world!</p>" document automatically
augment it with defaulted <html> and <body>. This defaulting should
be turned off in the HTML parser for this to work, but there is no
such HTML parser option. There is an htmlOmittedDefaultValue global
variable that you could use, but really we should not rely on global
variable for processing options anymore, best is to add an
HTML_PARSE_NOIMPLIED.
* include/libxml/HTMLparser.h: add the HTML_PARSE_NOIMPLIED parser flag
* HTMLparser.c: do not add implied elements if HTML_PARSE_NOIMPLIED is set
* parser.c: add HTML_PARSE_NOIMPLIED to options for xmlParseInNodeContext
  on HTML documents
2010-01-29 20:47:08 +01:00
Daniel Veillard
57f71aed7d 594250 rename ATTRIBUTE_ALLOC_SIZE to avoid clashes
* include/libxml/xmlmemory.h include/libxml/xmlversion.h.in: rename it
  to LIBXML_ATTR_ALLOC_SIZE to avoid conflicts in public headers
2009-09-09 18:57:26 +02:00
Paul Smith
65d359e3a5 Fix the globals.h to use XMLPUBFUN
* include/libxml/globals.h: in addition to the extern extern
  Paul Smith noted that XMLPUBFUN should be used instead of
  LIBXML_DLL_IMPORT
2009-09-07 15:24:24 +02:00
Daniel Veillard
82cf412da8 Problem with extern extern in header
* include/libxml/globals.h: LIBXML_DLL_IMPORT should not be
  followed by extern
* include/libxml/xmlmemory.h: fix the same problem but in a comment
2009-09-07 15:20:24 +02:00
Stefan Behnel
b9590e9cd2 440226 Add xmlXIncludeProcessTreeFlagsData API
* xinclude.c include/libxml/xinclude.h: new function similar to
  xmlXIncludeProcessFlagsData but operating on a subtree
2009-08-24 19:45:54 +02:00
Wang Lam
1de382eb06 Fix SetGenericErrorFunc and SetStructured clash
* include/libxml/globals.h globals.c global.data: define a new global
  variable (per thread) for structured error reporting, to not conflict
  with generic one
* error.c: when defined use the structured error report over any generic
  one
2009-08-24 17:34:25 +02:00
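
The dispatch rule from commit 1de382eb06 (separate slots for the two handlers, structured preferred when defined) might be sketched like this. The types and functions are hypothetical simplifications. The real state is a per-thread global, which the sketch replaces with an explicit struct.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_generic_fn)(void *ctx, const char *msg);
typedef void (*demo_structured_fn)(void *ctx, int code, const char *msg);

/* Hypothetical error state: the two handlers live in separate slots,
 * so installing one no longer overwrites the other. */
typedef struct {
    demo_generic_fn    generic;
    demo_structured_fn structured;
    void              *ctx;
} demo_error_state;

/* Reports an error, preferring the structured handler when defined.
 * Returns 2 if the structured handler ran, 1 for generic, 0 for none. */
int demo_report(const demo_error_state *st, int code, const char *msg)
{
    assert(st != NULL);
    if (st->structured != NULL) {
        st->structured(st->ctx, code, msg);
        return 2;
    }
    if (st->generic != NULL) {
        st->generic(st->ctx, msg);
        return 1;
    }
    return 0;
}

/* Example handlers that record calls in an int passed through ctx. */
void demo_count_generic(void *ctx, const char *msg)
{
    (void)msg;
    *(int *)ctx += 1;
}

void demo_count_structured(void *ctx, int code, const char *msg)
{
    (void)code;
    (void)msg;
    *(int *)ctx += 10;
}
```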
Daniel Veillard
029a04d265 541335 HTML avoid creating 2 head or 2 body element
* HTMLparser.c: check when we see a head or a body tag and avoid
  autogenerating them
* include/libxml/parser.h: the values for ctxt->html change depending
  on the head or body tags being seen
2009-08-24 12:50:23 +02:00
Daniel Veillard
f39eafaa90 Make xmlRecoverDoc const (Martin Trappel)
* include/libxml/parser.h parser.c: just make the parameter a const
2009-08-20 19:15:08 +02:00
Daniel Veillard
fcf2457d20 Both args of xmlStrcasestr are const
* include/libxml/xmlstring.h xmlstring.c: fix the constness of the
  second arg of xmlStrcasestr()
2009-08-12 23:02:08 +02:00
Daniel Veillard
a194ccb8d1 Try to avoid __imp__xmlFree link trouble on msys
* include/libxml/xmlexports.h: when compiling with mingw/MSYS or linking
  to a precompiled library, _imp__xmlFree going missing at runtime is a
  common problem. Igor and various people faced it and this seems the
  minimal fix for it; it should resolve 590302 and 561340
2009-08-10 10:08:41 +02:00
Aleksey Sanin
175beba061 Fix a couple of ABI issues with C14N 1.1
* include/libxml/c14n.h c14n.c: fix API to not include enum xmlC14NMode
  in the arguments, and do a bit more check on input
2009-07-09 22:54:00 +02:00
Aleksey Sanin
838682478c Aleksey Sanin support for c14n 1.1
* c14n.c include/libxml/c14n.h: adds support for C14N 1.1,
  new flags at the API level
* runtest.c Makefile.am testC14N.c xmllint.c: add support in CLI
  tools and test binaries
* result/c14n/1-1-without-comments/* test/c14n/1-1-without-comments/*:
  add a new batch of tests
2009-07-09 10:26:22 +02:00
Daniel Veillard
f076f348c4 change ATTRIBUTE_PRINTF into LIBXML_ATTR_FORMAT to avoid macro name
* include/libxml/parser.h include/libxml/xmlwriter.h
  include/libxml/relaxng.h include/libxml/xmlversion.h.in
  include/libxml/xmlwin32version.h.in include/libxml/valid.h
  include/libxml/xmlschemas.h include/libxml/xmlerror.h: change
  ATTRIBUTE_PRINTF into LIBXML_ATTR_FORMAT to avoid macro name
  collisions with other packages and headers as reported by
  Belgabor and Mike Hommey
daniel

svn path=/trunk/; revision=3827
2009-04-15 09:20:25 +00:00
Daniel Veillard
48b3eb22c2 fixes for Borland/CodeGear/Embarcadero compilers by Eric Zurcher Daniel
* include/wsockcompat.h win32/Makefile.bcb xpath.c: fixes for
  Borland/CodeGear/Embarcadero compilers by Eric Zurcher
Daniel

svn path=/trunk/; revision=3822
2009-03-25 09:51:19 +00:00
Daniel Veillard
97ff9b367a preparing 2.7.3 release fix a typo in a name Daniel
* configure.in doc/xml.html doc/*: preparing 2.7.3 release
* include/libxml/parserInternals.h SAX2.c: fix a typo in a name
Daniel

svn path=/trunk/; revision=3814
2009-01-18 21:43:30 +00:00
Daniel Veillard
f63085de5e port patch from Marcus Meissner to add gcc checking for printf like
* include/libxml/parser.h include/libxml/xmlwriter.h
  include/libxml/relaxng.h include/libxml/xmlversion.h.in
  include/libxml/xmlwin32version.h.in include/libxml/valid.h
  include/libxml/xmlschemas.h include/libxml/xmlerror.h:
  port patch from Marcus Meissner to add gcc checking for
  printf like functions parameters, should fix #65068
* doc/apibuild.py doc/*: modified the script accordingly
  and regenerated
* xpath.c xmlmemory.c threads.c: fix a few warnings
Daniel

svn path=/trunk/; revision=3813
2009-01-18 20:53:59 +00:00
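
The gcc checking mentioned in commit f63085de5e relies on the `format` function attribute. A minimal sketch follows, using a hypothetical `DEMO_ATTR_FORMAT` macro and a hypothetical `demo_format` function rather than the macro and prototypes from the libxml2 headers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Expands to the GCC printf-format attribute where available. */
#if defined(__GNUC__)
#define DEMO_ATTR_FORMAT(fmt, args) \
    __attribute__((__format__(__printf__, fmt, args)))
#else
#define DEMO_ATTR_FORMAT(fmt, args)
#endif

/* Argument 3 is the format string; variadic arguments start at 4. */
int demo_format(char *out, size_t size, const char *fmt, ...)
    DEMO_ATTR_FORMAT(3, 4);

int demo_format(char *out, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(out != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(out, size, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, a call such as `demo_format(buf, sizeof(buf), "%s", 42)` draws a compile-time warning from gcc when format warnings are enabled.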
Daniel Veillard
d032a5bc21 windows header should get the same define Daniel
* include/libxml/xmlwin32version.h.in: windows header should
  get the same define
Daniel

svn path=/trunk/; revision=3812
2009-01-18 19:41:26 +00:00
Daniel Veillard
d4d4705780 apply patch from Marcus Meissner to add gcc attribute alloc_size should
* include/libxml/xmlversion.h.in include/libxml/xmlmemory.h:
  apply patch from Marcus Meissner to add gcc attribute alloc_size
  should fix #552505
* doc/apibuild.py doc/* testapi.c: regenerate the API
* include/libxml/parserInternals.h: fix a comment problem raised
  by apibuild.py
daniel

svn path=/trunk/; revision=3811
2009-01-18 17:26:02 +00:00
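
For commit d4d4705780, the gcc `alloc_size` attribute tells the compiler which parameter holds the size of the returned block. The sketch below assumes GCC 4.3 or later for the attribute and uses a hypothetical `DEMO_ATTR_ALLOC_SIZE` macro and `demo_malloc` wrapper, not the names from the libxml2 headers.

```c
#include <assert.h>
#include <stdlib.h>

/* alloc_size is available from GCC 4.3; expand to nothing elsewhere. */
#if defined(__GNUC__) && \
    ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
#define DEMO_ATTR_ALLOC_SIZE(x) __attribute__((alloc_size(x)))
#else
#define DEMO_ATTR_ALLOC_SIZE(x)
#endif

/* Parameter 1 is the number of bytes in the returned allocation. */
void *demo_malloc(size_t size) DEMO_ATTR_ALLOC_SIZE(1);

void *demo_malloc(size_t size)
{
    void *ptr = malloc(size);

    assert(size == 0 || ptr != NULL);  /* sketch only: abort on failure */
    return ptr;
}
```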
Daniel Veillard
1fb2e0dfc6 add a new define XML_MAX_TEXT_LENGHT limiting the maximum size of a single
* include/libxml/parserInternals.h SAX2.c: add a new define
  XML_MAX_TEXT_LENGHT limiting the maximum size of a single text
  node, the default is 10MB and can be removed with the HUGE
  parsing option
Daniel

svn path=/trunk/; revision=3808
2009-01-18 14:08:36 +00:00
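
The limit from commit 1fb2e0dfc6 amounts to a size check that an option can lift. The sketch below uses hypothetical names and a hypothetical flag value. Only the 10MB default and the existence of a HUGE parsing option come from the message.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_TEXT_LENGTH 10000000   /* 10MB default, per the message */
#define DEMO_PARSE_HUGE      (1 << 0)   /* hypothetical flag value */

/* Returns 1 if a text node may grow to new_len bytes, 0 otherwise. */
int demo_text_length_ok(size_t new_len, int options)
{
    if (options & DEMO_PARSE_HUGE)
        return 1;                       /* limit lifted explicitly */
    return new_len <= DEMO_MAX_TEXT_LENGTH;
}
```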
Rob Richards
b9ed017d31 add XML_PARSE_OLDSAX parser option to enable pre 2.7 SAX behavior.
* include/libxml/parser.h parser.c: add XML_PARSE_OLDSAX parser 
  option to enable pre 2.7 SAX behavior.

svn path=/trunk/; revision=3807
2009-01-05 17:28:50 +00:00
Daniel Veillard
be2bd6ac6f adds element traversal support avoid a warning regenerated daniel
* include/libxml/tree.h tree.c python/generator.py: adds
  element traversal support
* valid.c: avoid a warning
* doc/*: regenerated
daniel

svn path=/trunk/; revision=3804
2008-11-27 15:26:28 +00:00
Daniel Veillard
856d92818b new options to serialize as XML/HTML/XHTML and restore old entry point
* include/libxml/xmlsave.h xmlsave.c: new options to serialize
  as XML/HTML/XHTML and restore old entry point behaviours
Daniel

svn path=/trunk/; revision=3794
2008-09-25 14:31:40 +00:00
Daniel Veillard
e83e93e715 make a new kind of buffer where shrinking and adding in head can avoid
* include/libxml/tree.h tree.c: make a new kind of buffer where
  shrinking and adding in head can avoid reallocation or full
  buffer memmoves
* encoding.c xmlIO.c: use the new kind of buffers for output
  buffers
Daniel

svn path=/trunk/; revision=3787
2008-08-30 12:52:26 +00:00
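
The buffer kind described in commit e83e93e715 can be pictured as an allocation with a movable content pointer: consuming from the head advances the pointer, and adding at the head reuses the gap. The `demo_io_buf` type below is a hypothetical sketch, not the libxml2 structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer with a movable start of content. */
typedef struct {
    char  *base;      /* start of the allocation */
    char  *content;   /* start of the live data, base <= content */
    size_t used;      /* live bytes starting at content */
    size_t size;      /* total bytes available starting at base */
} demo_io_buf;

/* Drops len bytes from the front by advancing a pointer: no memmove. */
size_t demo_shrink(demo_io_buf *buf, size_t len)
{
    assert(buf->content >= buf->base);
    if (len > buf->used)
        len = buf->used;
    buf->content += len;
    buf->used -= len;
    return len;
}

/* Prepends len bytes, reusing space freed by earlier shrinks.
 * Returns 0 on success, -1 if the gap is too small; a real buffer
 * would fall back to moving or reallocating in that case. */
int demo_add_head(demo_io_buf *buf, const char *str, size_t len)
{
    size_t gap = (size_t)(buf->content - buf->base);

    if (len > gap)
        return -1;
    buf->content -= len;
    memcpy(buf->content, str, len);
    buf->used += len;
    return 0;
}
```

This suits output buffers, where data is typically consumed from the front as it is written out.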
Daniel Veillard
0161e638c6 completely different fix for the recursion detection based on entity
* parser.c include/libxml/parser.h: completely different fix for
  the recursion detection based on entity density, big cleanups
  in the entity parsing code too
* result/*.sax*: the parser should not ask for user defined versions
  of the predefined entities
* testrecurse.c: automatic test for entity recursion checks
* Makefile.am: added testrecurse
* test/recurse/lol* test/recurse/good*: a first set of tests for
  the recursion
Daniel

svn path=/trunk/; revision=3783
2008-08-28 15:36:32 +00:00
Daniel Veillard
49d4405a6d a bit of cleanup and added checks based on the regression tests of the
* include/libxml/xmlerror.h parser.c: a bit of cleanup and
  added checks based on the regression tests of the xmlconf suite
Daniel

svn path=/trunk/; revision=3782
2008-08-27 19:57:06 +00:00
Daniel Veillard
a8f09ce8d3 cleanup entity pushing error handling based on a patch from Ashwin daniel
* include/libxml/parserInternals.h parser.c: cleanup entity
  pushing error handling based on a patch from Ashwin
daniel

svn path=/trunk/; revision=3779
2008-08-27 13:02:01 +00:00
Daniel Veillard
8915c150b5 strengthen some of the internal parser limits, add an XML_PARSE_HUGE
* include/libxml/parser.h parser.c xmllint.c: strengthen some
  of the internal parser limits, add an XML_PARSE_HUGE option
  to bypass them all. More internal parser limits will still need
  to be added.
Daniel

svn path=/trunk/; revision=3777
2008-08-26 13:05:34 +00:00
Daniel Veillard
54bd29b79b patch based on Wieant Nielander contribution to add the option of not
* include/libxml/parser.h xinclude.c xmllint.c: patch based on
  Wieant Nielander contribution to add the option of not doing
  URI base fixup in XInclude
Daniel

svn path=/trunk/; revision=3775
2008-08-26 07:26:55 +00:00
Daniel Veillard
aa6de47ebf applied patch from Aswin to fix tree skipping fixed a comment and added a
* xmlreader.c: applied patch from Aswin to fix tree skipping
* include/libxml/entities.h entities.c: fixed a comment and
  added a new xmlNewEntity() entry point
* runtest.c: be less verbose
* tree.c: space and tabs cleanups
daniel

svn path=/trunk/; revision=3774
2008-08-25 14:53:31 +00:00
Daniel Veillard
f4f4e4853a rework the patch to avoid some ABI issue with people allocating entities
* include/libxml/entities.h entities.c SAX2.c parser.c: rework
  the patch to avoid some ABI issue with people allocating
  entities structure directly
Daniel

svn path=/trunk/; revision=3773
2008-08-25 08:57:48 +00:00
Daniel Veillard
4bf899bf1b fix for CVE-2008-3281 Daniel
* include/libxml/parser.h include/libxml/entities.h entities.c
  parserInternals.c parser.c: fix for CVE-2008-3281
Daniel

svn path=/trunk/; revision=3772
2008-08-20 17:04:30 +00:00