mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-03-13 20:58:16 +03:00

4171 Commits

Author SHA1 Message Date
Daniel Veillard
5130481646 Impose a reasonable limit on PI size
Unless the XML_PARSE_HUGE option is given to the parser,
the value is XML_MAX_TEXT_LENGTH, i.e. the same as for a
text node within content.
Also clean up some unsigned int used for memory sizes.
2012-07-23 14:24:28 +08:00
Daniel Veillard
0de1f3114a first version of testlimits new test
Used to check behaviour on various parsing limits
2012-07-23 14:24:28 +08:00
Daniel Veillard
6568645164 Avoid quadratic behaviour in some push parsing cases
Avoid rescanning a very long input over and over; just check
the incoming chunks.
2012-07-23 14:24:28 +08:00
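The idea behind commit 6568645164 can be illustrated with a small sketch. This is not the libxml2 code: `sketch_scan_state`, `sketch_find_terminator` and the `checked` field are hypothetical names, and the sketch assumes the buffer only ever grows between calls. It remembers how much of the input has already been ruled out, so each new chunk costs time proportional to the chunk and not to the whole input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scanner state: how much of the input was already searched. */
typedef struct {
    size_t checked;  /* offsets below this cannot start a match */
} sketch_scan_state;

/*
 * Look for `term` in buf[0..len). On a miss, remember how far the search
 * got so that the next call resumes there instead of starting at 0.
 * Returns the offset of the match, or (size_t)-1 if there is none yet.
 */
static size_t sketch_find_terminator(sketch_scan_state *st,
                                     const char *buf, size_t len,
                                     const char *term)
{
    size_t tlen = strlen(term);
    size_t i;

    assert(st->checked <= len);  /* assumption: the buffer only grows */
    if (tlen == 0 || len < tlen)
        return (size_t)-1;
    for (i = st->checked; i + tlen <= len; i++) {
        if (memcmp(buf + i, term, tlen) == 0)
            return i;
    }
    /* Keep the last tlen-1 bytes unchecked: a terminator may be split
     * across two chunks. */
    st->checked = len - tlen + 1;
    return (size_t)-1;
}
```

Calling it first on "abc" and then on "abc?>" with the terminator "?>" only examines the new tail on the second call.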
Daniel Veillard
58f73aca1a Impose a reasonable limit on comment size
Unless the XML_PARSE_HUGE option is given to the parser,
the value is XML_MAX_TEXT_LENGTH, i.e. the same as for a
text node within content.
Also clean up some unsigned int used for memory sizes.
2012-07-23 14:24:28 +08:00
Daniel Veillard
e17db9946c Impose a reasonable limit on attribute size
Unless the XML_PARSE_HUGE option is given to the parser,
the value is XML_MAX_TEXT_LENGTH, i.e. the same as for a
text node within content.
2012-07-23 14:24:27 +08:00
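The three size-limit commits above (PI, comment, attribute) share one rule: reject an oversized construct unless the caller asked for huge documents. A minimal sketch of that rule follows. The names `sketch_parser`, `SKETCH_OPT_HUGE` and `SKETCH_MAX_TEXT_LENGTH`, and the value chosen for the cap, are assumptions made for the illustration; they stand in for the parser context, XML_PARSE_HUGE and XML_MAX_TEXT_LENGTH and are not the library's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the libxml2 definitions. */
#define SKETCH_MAX_TEXT_LENGTH 10000000u  /* assumed cap for this sketch */
#define SKETCH_OPT_HUGE        (1u << 0)  /* assumed "lift the limits" flag */

typedef struct {
    unsigned int options;  /* bit set of SKETCH_OPT_* flags */
} sketch_parser;

/*
 * Returns 1 if a PI, comment or attribute value of `len` bytes is
 * acceptable for this parser, 0 if it must be rejected as too large.
 */
static int sketch_length_allowed(const sketch_parser *p, size_t len)
{
    assert(p != NULL);
    if (p->options & SKETCH_OPT_HUGE)
        return 1;  /* the caller opted out of the limit */
    return len <= SKETCH_MAX_TEXT_LENGTH;
}
```

A parser loop would call such a check as the construct grows, so that an oversized input is refused before memory is exhausted.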
Daniel Veillard
b60e612e87 Small cleanup of unused variables in test 2012-07-23 14:24:27 +08:00
Daniel Veillard
9ee02f80a4 Harden the buffer code and make it more compatible
Mimic the old xmlBuffer structure in xmlBuf to avoid catastrophic
failures in case of old code directly reading ctxt->input->buf->buffer

Check on all buffer entry points if an error previously occurred on
the buffer, and fail the operation if this is the case. After an error
the buffer becomes immutable and unreadable.
2012-07-23 14:24:27 +08:00
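The "check on every entry point" approach from commit 9ee02f80a4 can be sketched as a sticky error flag. This is only an illustration of the pattern: `sketch_buf`, `sketch_buf_add` and `sketch_buf_content` are hypothetical names and do not reproduce the xmlBuf layout or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer with a sticky error flag. */
typedef struct {
    char  *data;
    size_t use;    /* bytes in use */
    size_t size;   /* bytes allocated */
    int    error;  /* non-zero once any operation has failed */
} sketch_buf;

/* Append len bytes; refuses to touch a buffer that already failed. */
static int sketch_buf_add(sketch_buf *b, const char *s, size_t len)
{
    assert(s != NULL);
    if (b == NULL || b->error)
        return -1;                        /* immutable after an error */
    if (len == 0)
        return 0;
    if (len > b->size - b->use) {
        char *tmp;
        if (len > (size_t)-1 - b->use) {  /* size_t overflow */
            b->error = 1;
            return -1;
        }
        tmp = realloc(b->data, b->use + len);
        if (tmp == NULL) {                /* out of memory */
            b->error = 1;
            return -1;
        }
        b->data = tmp;
        b->size = b->use + len;
    }
    memcpy(b->data + b->use, s, len);
    b->use += len;
    return 0;
}

/* Read access; a failed buffer is unreadable. */
static const char *sketch_buf_content(const sketch_buf *b)
{
    if (b == NULL || b->error)
        return NULL;
    return b->data;
}
```

Once the flag is set, every later call fails fast, so a caller that ignored the first error cannot read or extend a half-updated buffer.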
Daniel Veillard
00ac0d3b96 More cleanups for input/buffers code
When calling xmlParserInputBufferPush, the buffer may be reallocated
and at the input level the pointers for base, cur and end need to
be reevaluated.
* buf.c buf.h: add two new functions, one to get the base from the
  input of the buffer, and another one to reset the pointers based
  on the cur and base indexes
* HTMLparser.c parser.c: cleanup to use the new helper functions
  as well as making sure size_t is used for the indexes computations
2012-07-23 14:24:27 +08:00
Daniel Veillard
61551a1eb7 Cleanup function xmlBufResetInput() to set input from Buffer
This was scattered in a number of modules; an xmlParserInputPtr
usually has its base, cur and end pointers set from an
xmlBuf used as input.
* buf.c buf.h: add a new function implementing this setup
* parser.c HTMLparser.c catalog.c parserInternals.c xmlreader.c
  use the new function instead of digging into the buffer in
  all those modules
2012-07-23 14:24:27 +08:00
Daniel Veillard
145477d8ab Switch the test program for characters to new input buffers
it was manipulating the buffer content and structures directly
this cleans it up
2012-07-23 14:24:27 +08:00
Daniel Veillard
7b9b07198f Convert the HTML tree module to the new buffers
The new input buffers induced a couple of changes, the others
are related to the switch to xmlBuf in saving routines.
2012-07-23 14:24:27 +08:00
Daniel Veillard
a78d803639 Convert the HTML parser to new input buffers
Changes similar to the ones done in the XML parser for the
routines which are not shared.
2012-07-23 14:24:27 +08:00
Daniel Veillard
dbf5411b21 Convert the writer to new output buffer and save APIs
Only a handful of places had to be converted for xmlBuf and
the new saving entry point.
2012-07-23 14:24:27 +08:00
Daniel Veillard
8aebce3ec6 Convert XMLReader to the new input buffers
A few direct accesses were replaced, and also one internal
xmlBuffer structure is converted to use xmlBuf instead
2012-07-23 14:24:27 +08:00
Daniel Veillard
50cdab5552 New saving functions using xmlBuf and conversion
* save.h: new header providing new functions currently internal
          and xmlBuf counterparts of old xmlBuffer based ones
* xmlsave.c: convert functions to use xmlBuf as much as possible
2012-07-23 14:24:27 +08:00
Daniel Veillard
dddeede060 Provide new xmlBuf based saving functions
* include/libxml/tree.h: adds xmlBufGetNodeContent and xmlBufNodeDump
  as xmlBuf based equivalents of xmlNodeGetContent and xmlNodeDump
* tree.c: implements one new routine and converts xmlNodeBufGetContent
  to use the xmlBuf equivalent. It should behave better as a result
  in case of data larger than 2GB.
2012-07-23 14:24:27 +08:00
Daniel Veillard
345ee8b620 Convert XInclude to the new input buffers
A few xmlBuffer...() calls changed to their xmlBuf...() counterparts
2012-07-23 14:24:27 +08:00
Daniel Veillard
2a1d2422a4 Convert catalog code to the new input buffers
Only one place where the buffer fields were accessed directly
2012-07-23 14:24:27 +08:00
Daniel Veillard
53aa293dd3 Convert C14N to the new Input buffer
one case of direct access cleaned up
2012-07-23 14:24:27 +08:00
Daniel Veillard
a6a6e70c47 Convert xmlIO.c to the new input and output buffers
Relatively mechanical changes, this also led to a couple of fixes
upon review of the I/O code on buffer usage.
2012-07-23 14:24:26 +08:00
Daniel Veillard
768eb3b82d Convert XML parser to the new input buffers
The main changes are where the internals of the buffer structures
were addressed directly; we now use routines coming from buf.h.
The routine xmlParserInputRead(), which wasn't used anywhere, is
deprecated too.
2012-07-23 14:24:26 +08:00
Daniel Veillard
65c7d3b2e6 Incompatible change to the Input and Output buffers
Since the whole set of structures was public, the only way
to switch to size_t-clean buffers is to introduce an incompatible
API change. Modifying the xmlParserInputBuffer and xmlOutputBuffer
structures is the best place to make this change as those
structures are deep in the parser code feeding data, and no public
API suggests building those manually.
2012-07-23 14:24:26 +08:00
Daniel Veillard
18d0db2503 Adding new encoding function to deal with the new structures
* encoding.c: adds xmlCharEncFirstLineInput, xmlCharEncInput and
  xmlCharEncOutput
* enc.h: the functions are not made public but added to this new header
2012-07-23 14:24:26 +08:00
Daniel Veillard
ade10f2c57 Convert XPath to xmlBuf
Easy as no buffer was exported in the APIs
2012-07-23 14:24:26 +08:00
Daniel Veillard
bca22f40c3 Adding a new buf module for buffers
This also adds converter functions between xmlBuf and xmlBuffer
* buf.c buf.h: the old xmlBuffer routines but modified for size_t
  and using xmlBuf instead of xmlBuffer
* Makefile.am: add the 2 new files
* include/libxml/xmlerror.h: add an entry for the new module
* include/libxml/tree.h: expose the xmlBufPtr type but not the
  structure which stays private
2012-07-23 14:24:26 +08:00
Daniel Veillard
4629ee02ac Do not fetch external parsed entities
Unless explicitly asked for when validating or replacing entities
with their value. Problem pointed out by Tom Lane <tgl@redhat.com>

* parser.c: do not load external parsed entities unless needed
* test/errors/extparsedent.xml result/errors/extparsedent.xml*:
  add a regression test to avoid change of the behaviour in the future
2012-07-23 14:15:40 +08:00
Aron Xu
baaf03f80f Fix an error in previous commit 2012-07-20 15:41:34 +08:00
Daniel Veillard
4f9fdc709c Fix entities local buffers size problems 2012-07-18 17:54:05 +08:00
Daniel Veillard
459eeb9dc7 Fix parser local buffers size problems 2012-07-18 17:54:04 +08:00
Daniel Veillard
740cb1a450 Memory error within SAX2 reuse common framework
There is no reason for that class of errors to not use
the same handling allowing structured error processing.
2012-07-18 17:48:32 +08:00
Daniel Veillard
c508fa3f0b Fix a failure to report xmlreader parsing failures
Related to https://bugzilla.gnome.org/show_bug.cgi?id=654567
the problem is that the provided patch failed to raise an error
on xmlTextReaderRead() return when an actual parsing error occurred
2012-07-18 17:48:06 +08:00
Daniel Veillard
549f06a8bd Expand .gitignore with more files 2012-07-11 15:21:12 +08:00
Daniel Veillard
8fc913fcc9 Fix compilation on older Visual Studio
For https://bugzilla.gnome.org/show_bug.cgi?id=666491

Reported by Matt Budd <matt.budd@gmail.com>, the added support
for VS 2010 broke the older 2005 and 2008 versions because it assumed
some of the defines were present in all versions. Fix that
by checking the version of VS.
2012-06-06 11:29:29 +08:00
Daniel Veillard
2e1eaca637 Fix xmllint --xpath node initialization
By default it's more sensible to initialize it to the document itself
than the root element
2012-05-25 16:44:20 +08:00
Daniel Veillard
c943f708f1 Release of libxml2-2.8.0
- Makefile.am: don't package .git
- configure.in : update to new release
- doc/xml.html: added the new release
- doc/* testapi.c: regenerated
v2.8.0
2012-05-23 17:10:59 +08:00
Daniel Veillard
22030ef888 Restore code for Windows compilation
Try to keep as close to rc1 but still allow the change from Roumen for
mingw
2012-05-23 15:52:45 +08:00
Daniel Veillard
ee8f1d4cda Cleanups before 2.8.0-rc2
new symbols, a missing comment and a fix on symbol release
v2.8.0-rc2
2012-05-21 11:16:12 +08:00
Roumen Petrov
978ff224b2 use mingw C99 compatible functions {v}snprintf instead of those from MSVC runtime 2012-05-21 10:20:09 +08:00
Daniel Veillard
f27c6683e6 New symbols added for the next release 2012-05-21 10:20:09 +08:00
Daniel Veillard
59df1e4f92 Avoid an extra operation
In the catalog code, tsan also complained of testing
the variable without locking and that was done a few lines below
2012-05-21 10:19:21 +08:00
Daniel Veillard
d495e6a845 Part for rand_r checking missing
Forgot to push that change in previous commit
2012-05-20 20:48:34 +08:00
Daniel Veillard
379ebc1d77 Cleanup on randomization
tsan reported that rand() is not thread safe, so create
a thread safe wrapper, use rand_r() if available.
Consolidate the function, initialization and cleanup in
dict.c and make sure it is initialized in xmlInitParser()
2012-05-18 15:41:31 +08:00
Andy Lutomirski
9d9685ad88 xmlTextReader bails too quickly on error
For https://bugzilla.gnome.org/show_bug.cgi?id=654567
I use xmlTextReader to parse files that might be incomplete.  These files are
the beginning of a well-formed file, but the end is missing so the file as a
whole is not well-formed.

The problem is that xmlTextReader starts returning errors when it encounters
the early EOF, even though I haven't finished reading all of the valid data in
the file.  It would be helpful if xmlTextReader kept working until the very
end.
v2.8.0-rc1
2012-05-15 20:10:25 +08:00
Pacho Ramos
1ea6b14125 Fix undefined reference in python module
For https://bugzilla.gnome.org/show_bug.cgi?id=622023
when compiled with LDFLAGS="${LDFLAGS} -Wl,-z,-defs -Wl,--no-undefined"
the python module would fail due to the undefined reference. This adds an
explicit reference to python lib.
2012-05-15 19:36:02 +08:00
Daniel Veillard
0d51cfebc9 Fix a race in xmlNewInputStream
For https://bugzilla.gnome.org/show_bug.cgi?id=643148
Reported by Bill Clarke <llib@computer.org>, it used a global variable
as a counter for the input id and this was not thread safe. To avoid the
race without adding unneeded locking in the parser path, move the id to
the parser context instead.
2012-05-15 11:18:40 +08:00
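The fix in commit 0d51cfebc9 replaces a shared global counter with one stored per parser context. The sketch below shows the shape of that change under the assumption that a given context is only used by one thread at a time; `sketch_ctxt`, `sketch_input` and `sketch_new_input` are hypothetical names, not the libxml2 structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parser context: the counter lives here, not in a global. */
typedef struct {
    int input_id;  /* next id to hand out, private to this context */
} sketch_ctxt;

typedef struct {
    int id;        /* id assigned when the input was created */
} sketch_input;

/*
 * Assign the next id from the context. No lock is needed as long as a
 * context is not shared between threads (an assumption of this sketch).
 */
static void sketch_new_input(sketch_ctxt *ctxt, sketch_input *in)
{
    assert(ctxt != NULL && in != NULL);
    in->id = ctxt->input_id++;
}
```

Two contexts parsing in parallel now number their inputs independently, so there is no shared state left to race on.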
Noam
9313ae8517 Fix weird streaming RelaxNG errors
For https://bugzilla.gnome.org/show_bug.cgi?id=512454
The bug was to use compiled deterministic automata when
the content model was found to be non-deterministic, leading
to random parsing errors.
2012-05-15 11:03:46 +08:00
Daniel Veillard
94431ecba6 Fix various bugs in new code raised by the API checking
* testapi.c: regenerated and covering new APIs
* tree.c: xmlBufferDetach can't work on immutable buffers
* xzlib.c: fix a deallocation error
2012-05-15 10:45:05 +08:00
Daniel Veillard
79ee284abb Fix various problems with "make dist"
* tree.c: missing documentation for xmlBufferDetach
* doc/symbols.xml: add two new symbols xmlTextReaderRelaxNGValidateCtxt
                   and xmlBufferDetach
* doc/apibuild.py: ignore internal header xzlib.h
2012-05-15 10:25:31 +08:00
Daniel Veillard
9f3cdef08a Fix a memory leak in the xzlib code
The freeing function wasn't called due to a bogus #ifdef surrounding
value. Also switch the code to use the normal libxml2 allocation and
freeing routines.
2012-05-15 09:38:13 +08:00
Conrad Irwin
7d0d2a50ac Use a hybrid allocation scheme in xmlNodeSetContent
On Fri, May 11, 2012 at 9:10 AM, Daniel Veillard <veillard@redhat.com> wrote:
>  Hi Conrad,
>
> that's interesting ! I was initially afraid of a sudden explosion of
> memory allocations for building a tree since by default buffers tend to
> "waste" memory by using doubling allocations, but that's not the case.
>  xmllint --noout doc/libxml2-api.xml
> when compiled with memory debug produce
>
> paphio:~/XML -> cat .memdump
>      MEMORY ALLOCATED : 0, MAX was 12756699
>
> and without your patch 12755657, i.e. the increase is minimal.

Heh, I thought that too. Actually you're looking at the result with XML_ALLOC_EXACT! This
is because EXACT adds 10 bytes "spare" on each alloc, and that interestingly wastes about the
same amount of space as XML_ALLOC_DOUBLEIT on this example (see below).

So it turns out that the default realloc() on my system actually handles this case really
well — and I guess that all the time in xmlRealloc() was actually in xmlStrlen, not the
underlying realloc() after all (sorry for misleading you). If you replace the realloc()
with a bad one (like valgrind's), then the performance degrades severely.

This patch implements a HYBRID allocator which has the behaviour you describe (it's
like EXACT to start with, though without the spare 10 bytes; and switches to DOUBLEIT
after 4kb) — that gets the memory back down to 12755657, with no noticeable impact on the
performance of the synthetic pathological example under valgrind.

In summary:

     max_memory on ./xmllint --noout doc/libxml2-api.xml,
     valgrind time on https://gist.github.com/2656940

         |  max_memory  | valgrind time
before   |  12755657    | 29:18.2
EXACT    |  12756699    |  2:58.6 <-- this is the state after the first patch.
DOUBLEIT |  12756727    |  0:02.7
HYBRID   |  12755754    |  0:02.7 <-- this is the state with both patches.

>
> There is also the cost of creating the buffers all the time.
> I need to read the code and check but I may be interested in an hybrid
> approach where we switch to buffer only when the text node starts to
> become too big (4k would remove nearly all usual types of "document"
> usage, i.e. not blocks of data)

I tried to avoid too much buffer creation by introducing the xmlBufferDetach function,
which allows re-using one buffer to construct many strings. It's maybe a bit of a "hack"
in API terms though I thought the gains would be worth it.

Conrad

------8<------

To keep memory usage tight in normal conditions it's desirable to only
allocate as much space as is needed. Unfortunately this can lead to
problems when constructing a long string out of small chunks, because
every chunk you add will need to resize the buffer.

To fix this XML_ALLOC_HYBRID will switch (when the buffer is 4kb big)
from using exact allocations to doubling buffer size every time it is
full. This limits the number of buffer resizes to O(log n) (down from
O(n)), and thus greatly increases the performance of constructing very
large strings in this manner.
2012-05-14 14:18:58 +08:00
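The growth policy described in commit 7d0d2a50ac can be sketched as follows. This is an illustration of the policy only, not the xmlBuffer implementation: the `sketch_*` names are hypothetical, and the 4096-byte threshold is taken from the "4kb" figure in the message. The second function counts how many resizes each scheme needs when a string is built from fixed-size chunks.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HYBRID_THRESHOLD 4096u  /* the 4kb switch point */

typedef enum { SKETCH_EXACT, SKETCH_DOUBLEIT, SKETCH_HYBRID } sketch_scheme;

/* New capacity for a buffer of capacity `size` that must hold `needed`. */
static size_t sketch_grow(sketch_scheme scheme, size_t size, size_t needed)
{
    size_t newsize;

    if (needed <= size)
        return size;               /* still fits, no resize */
    if (scheme == SKETCH_EXACT ||
        (scheme == SKETCH_HYBRID && size < SKETCH_HYBRID_THRESHOLD))
        return needed;             /* allocate exactly what is needed */
    newsize = size ? size : 1;
    while (newsize < needed) {
        if (newsize > (size_t)-1 / 2)
            return needed;         /* doubling would overflow size_t */
        newsize *= 2;
    }
    return newsize;
}

/* Count resizes when appending `chunk` bytes at a time up to `total`. */
static unsigned long sketch_count_resizes(sketch_scheme scheme,
                                          size_t total, size_t chunk)
{
    size_t size = 0, use = 0;
    unsigned long resizes = 0;

    assert(chunk > 0);             /* a zero chunk would never terminate */
    while (use < total) {
        size_t newsize = sketch_grow(scheme, size, use + chunk);
        if (newsize != size) {
            size = newsize;
            resizes++;
        }
        use += chunk;
    }
    return resizes;
}
```

Building 1 MiB from 16-byte chunks resizes on every append with the exact scheme, while the hybrid scheme resizes 256 times below the threshold and then only once per doubling.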