libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-03-14 22:50:08 +03:00

Author	SHA1	Message	Date
Daniel Veillard	5130481646	Impose a reasonable limit on PI size Unless the XML_PARSE_HUGE option is given to the parser, the value is XML_MAX_TEXT_LENGTH, i.e. the same as for a text node within content. Also clean up some unsigned int used for memory size.	2012-07-23 14:24:28 +08:00
Daniel Veillard	0de1f3114a	first version of testlimits new test Used to check behaviour on various parsing limits	2012-07-23 14:24:28 +08:00
Daniel Veillard	6568645164	Avoid quadratic behaviour in some push parsing cases avoid rescanning over and over a very long input, just check the incoming chunks	2012-07-23 14:24:28 +08:00
Daniel Veillard	58f73aca1a	Impose a reasonable limit on comment size Unless the XML_PARSE_HUGE option is given to the parser, the value is XML_MAX_TEXT_LENGTH, i.e. the same as for a text node within content. Also clean up some unsigned int used for memory size.	2012-07-23 14:24:28 +08:00
Daniel Veillard	e17db9946c	Impose a reasonable limit on attribute size Unless the XML_PARSE_HUGE option is given to the parser, the value is XML_MAX_TEXT_LENGTH, i.e. the same as for a text node within content.	2012-07-23 14:24:27 +08:00
Daniel Veillard	b60e612e87	Small cleanup of unused variables in test	2012-07-23 14:24:27 +08:00
Daniel Veillard	9ee02f80a4	Harden the buffer code and make it more compatible Mimic the old xmlBuffer structure in xmlBuf to avoid catastrophic failures in case of old code directly reading ctxt->input->buf->buffer. Check on all buffer entry points if an error previously occurred on the buffer, and fail the operation if this is the case, the buffer becomes immutable and unreadable.	2012-07-23 14:24:27 +08:00
Daniel Veillard	00ac0d3b96	More cleanups for input/buffers code When calling xmlParserInputBufferPush, the buffer may be reallocated and at the input level the pointers for base, cur and end need to be reevaluated. * buf.c buf.h: add two new functions, one to get the base from the input of the buffer, and another one to reset the pointers based on the cur and base index * HTMLparser.c parser.c: cleanup to use the new helper functions as well as making sure size_t is used for the indexes computations	2012-07-23 14:24:27 +08:00
Daniel Veillard	61551a1eb7	Cleanup function xmlBufResetInput() to set input from Buffer This was scattered in a number of modules, xmlParserInputPtr have usually their base, cur and end pointer set from an xmlBuf used as input. * buf.c buf.h: add a new function implementing this setup * parser.c HTMLparser.c catalog.c parserInternals.c xmlreader.c use the new function instead of digging into the buffer in all those modules	2012-07-23 14:24:27 +08:00
Daniel Veillard	145477d8ab	Switch the test program for characters to new input buffers it was manipulating the buffer content and structures directly this cleans it up	2012-07-23 14:24:27 +08:00
Daniel Veillard	7b9b07198f	Convert the HTML tree module to the new buffers The new input buffers induced a couple of changes, the others are related to the switch to xmlBuf in saving routines.	2012-07-23 14:24:27 +08:00
Daniel Veillard	a78d803639	Convert of the HTML parser to new input buffers Changes similar to the ones done in the XML parser for the routines which are not shared.	2012-07-23 14:24:27 +08:00
Daniel Veillard	dbf5411b21	Convert the writer to new output buffer and save APIs Only a handful of places had to be converted for xmlBuf and the new saving entry point.	2012-07-23 14:24:27 +08:00
Daniel Veillard	8aebce3ec6	Convert XMLReader to the new input buffers A few direct access were replaced, and also one internal xmlBuffer structure is converted to use xmlBuf instead	2012-07-23 14:24:27 +08:00
Daniel Veillard	50cdab5552	New saving functions using xmlBuf and conversion * save.h: new header providing new functions currently internal and xmlBuf counterparts of old xmlBuffer based ones * xmlsave.c: convert functions to use xmlBuf as much as possible	2012-07-23 14:24:27 +08:00
Daniel Veillard	dddeede060	Provide new xmlBuf based saving functions * include/libxml/tree.h: adds xmlBufGetNodeContent and xmlBufNodeDump as xmlBuf based equivalents of xmlNodeGetContent and xmlNodeDump * tree.c: implements one new routine and converts xmlNodeBufGetContent to use the xmlBuf equivalent. It should behave better as a result in case of data larger than 2GB.	2012-07-23 14:24:27 +08:00
Daniel Veillard	345ee8b620	Convert XInclude to the new input buffers A few xmlBuffer...() calls changed to their xmlBuf...() counterparts	2012-07-23 14:24:27 +08:00
Daniel Veillard	2a1d2422a4	Convert catalog code to the new input buffers Only one place where the buffers fields were accessed directly	2012-07-23 14:24:27 +08:00
Daniel Veillard	53aa293dd3	Convert C14N to the new Input buffer one case of direct access cleaned up	2012-07-23 14:24:27 +08:00
Daniel Veillard	a6a6e70c47	Convert xmlIO.c to the new input and output buffers Relatively mechanical changes, this also led to a couple of fixes upon review of the I/O code on buffer usage.	2012-07-23 14:24:26 +08:00
Daniel Veillard	768eb3b82d	Convert XML parser to the new input buffers The main changes are when the internal of the buffers structure were addressed directly, we now use routines coming from buf.h The routine xmlParserInputRead() which wasn't used anywhere is deprecated too.	2012-07-23 14:24:26 +08:00
Daniel Veillard	65c7d3b2e6	Incompatible change to the Input and Output buffers Since the whole set of structures was public, the only way to switch to size_t clean buffer is to introduce an incompatible API change. Modifying the xmlParserInputBuffer and xmlOutputBuffer structures is the best place to make this change as those structures are deep into the parser feeding data, and no public API suggest to build those manually.	2012-07-23 14:24:26 +08:00
Daniel Veillard	18d0db2503	Adding new encoding function to deal with the new structures * encoding.c: adds xmlCharEncFirstLineInput, xmlCharEncInput and xmlCharEncOutput * enc.h: the functions are not made public but added to this new header	2012-07-23 14:24:26 +08:00
Daniel Veillard	ade10f2c57	Convert XPath to xmlBuf Easy as no buffer was exported in the APIs	2012-07-23 14:24:26 +08:00
Daniel Veillard	bca22f40c3	Adding a new buf module for buffers This also add converter functions between xmlBuf and xmlBuffer * buf.c buf.h: the old xmlBuffer routines but modified for size_t and using xmlBuf instead of xmlBuffer * Makefile.am: add the 2 new files * include/libxml/xmlerror.h: add an entry for the new module * include/libxml/tree.h: expose the xmlBufPtr type but not the structure which stay private	2012-07-23 14:24:26 +08:00
Daniel Veillard	4629ee02ac	Do not fetch external parsed entities Unless explicitly asked for when validating or replacing entities with their value. Problem pointed out by Tom Lane <tgl@redhat.com> * parser.c: do not load external parsed entities unless needed * test/errors/extparsedent.xml result/errors/extparsedent.xml*: add a regression test to avoid change of the behaviour in the future	2012-07-23 14:15:40 +08:00
Aron Xu	baaf03f80f	Fix an error in previous commit	2012-07-20 15:41:34 +08:00
Daniel Veillard	4f9fdc709c	Fix entities local buffers size problems	2012-07-18 17:54:05 +08:00
Daniel Veillard	459eeb9dc7	Fix parser local buffers size problems	2012-07-18 17:54:04 +08:00
Daniel Veillard	740cb1a450	Memory error within SAX2 reuse common framework There is no reason for that class of errors to not use the same handling allowing structured error processing.	2012-07-18 17:48:32 +08:00
Daniel Veillard	c508fa3f0b	Fix a failure to report xmlreader parsing failures Related to https://bugzilla.gnome.org/show_bug.cgi?id=654567 the problem is that the provided patch failed to raise an error on xmlTextReaderRead() return when an actual parsing error occurred	2012-07-18 17:48:06 +08:00
Daniel Veillard	549f06a8bd	Expand .gitignore with more files	2012-07-11 15:21:12 +08:00
Daniel Veillard	8fc913fcc9	Fix compilation on older Visual Studio For https://bugzilla.gnome.org/show_bug.cgi?id=666491 Reported by Matt Budd <matt.budd@gmail.com>, the added support for VS 2010 broke older version 2005 and 2008 because it assumed some of the defines were present in all versions, fix that to check the version of VS	2012-06-06 11:29:29 +08:00
Daniel Veillard	2e1eaca637	Fix xmllint --xpath node initialization By default it's more sensible to initialize it to the document itself than the root element	2012-05-25 16:44:20 +08:00
Daniel Veillard	c943f708f1	Release of libxml2-2.8.0 - Makefile.am: don't package .git - configure.in : update to new release - doc/xml.html: added the new release - doc/* testapi.c: regenerated v2.8.0	2012-05-23 17:10:59 +08:00
Daniel Veillard	22030ef888	Restore code for Windows compilation Try to keep as close to rc1 but still allow the change from Roumen for mingw	2012-05-23 15:52:45 +08:00
Daniel Veillard	ee8f1d4cda	Cleanups before 2.8.0-rc2 new symbols, a missing comment and a fix on symbol release v2.8.0-rc2	2012-05-21 11:16:12 +08:00
Roumen Petrov	978ff224b2	use mingw C99 compatible functions {v}snprintf instead those from MSVC runtime	2012-05-21 10:20:09 +08:00
Daniel Veillard	f27c6683e6	New symbols added for the next release	2012-05-21 10:20:09 +08:00
Daniel Veillard	59df1e4f92	Avoid an extra operation In the catalog code, tsan also complained of testing the variable without locking and that was done a few lines below	2012-05-21 10:19:21 +08:00
Daniel Veillard	d495e6a845	Part for rand_r checking missing Forgot to push that change in previous commit	2012-05-20 20:48:34 +08:00
Daniel Veillard	379ebc1d77	Cleanup on randomization tsan reported that rand() is not thread safe, so create a thread safe wrapper, use rand_r() if available. Consolidate the function, initialization and cleanup in dict.c and make sure it is initialized in xmlInitParser()	2012-05-18 15:41:31 +08:00
Andy Lutomirski	9d9685ad88	xmlTextReader bails too quickly on error For https://bugzilla.gnome.org/show_bug.cgi?id=654567 I use xmlTextReader to parse files that might be incomplete. These files are the beginning of a well-formed file, but the end is missing so the file as a whole is not well-formed. The problem is that xmlTextReader starts returning errors when it encounters the early EOF, even though I haven't finished reading all of the valid data in the file. It would be helpful if xmlTextReader kept working until the very end. v2.8.0-rc1	2012-05-15 20:10:25 +08:00
Pacho Ramos	1ea6b14125	Fix undefined reference in python module For https://bugzilla.gnome.org/show_bug.cgi?id=622023 when compiled with LDFLAGS="${LDFLAGS} -Wl,-z,-defs -Wl,--no-undefined" the python module would fail due to the undefined. This adds an explicit reference to python lib.	2012-05-15 19:36:02 +08:00
Daniel Veillard	0d51cfebc9	Fix a race in xmlNewInputStream For https://bugzilla.gnome.org/show_bug.cgi?id=643148 Reported by Bill Clarke <llib@computer.org>, it used a global variable as a counter for the input id and this was not thread safe. To avoid the race without adding unneeded locking in the parser path, move the id to the parser context instead.	2012-05-15 11:18:40 +08:00
Noam	9313ae8517	Fix weird streaming RelaxNG errors For https://bugzilla.gnome.org/show_bug.cgi?id=512454 The bug was to use compiled deterministic automata when the content model was found to be non-deterministic, leading to random parsing errors.	2012-05-15 11:03:46 +08:00
Daniel Veillard	94431ecba6	Fix various bugs in new code raised by the API checking * testapi.c: regenerated and covering new APIs * tree.c: xmlBufferDetach can't work on immutable buffers * xzlib.c: fix a deallocation error	2012-05-15 10:45:05 +08:00
Daniel Veillard	79ee284abb	Fix various problems with "make dist" * tree.c: missing documentation for xmlBufferDetach * doc/symbols.xml: add two new symbols xmlTextReaderRelaxNGValidateCtxt and xmlBufferDetach * doc/apibuild.py: ignore internal header xzlib.h	2012-05-15 10:25:31 +08:00
Daniel Veillard	9f3cdef08a	Fix a memory leak in the xzlib code The freeing function wasn't called due to a bogus #ifdef surrounding value. Also switch the code to use the normal libxml2 allocation and freeing routines.	2012-05-15 09:38:13 +08:00
Conrad Irwin	7d0d2a50ac	Use a hybrid allocation scheme in xmlNodeSetContent On Fri, May 11, 2012 at 9:10 AM, Daniel Veillard <veillard@redhat.com> wrote: > Hi Conrad, > > that's interesting ! I was initially afraid of a sudden explosion of > memory allocations for building a tree since by default buffers tend to > "waste" memory by using doubling allocations, but that's not the case. > xmllint --noout doc/libxml2-api.xml > when compiled with memory debug produce > > paphio:~/XML -> cat .memdump > MEMORY ALLOCATED : 0, MAX was 12756699 > > and without your patch 12755657, i.e. the increase is minimal. Heh, I thought that too. Actually you're looking at the result with XML_ALLOC_EXACT! This is because EXACT adds 10bytes "spare" on each alloc, and that interestingly wastes about the same amount of space as XML_ALLOC_DOUBLEIT on this example (see below). So it turns out that the default realloc() on my system actually handles this case really well — and I guess that all the time in xmlRealloc() was actually in xmlStrlen, not the underlying realloc() after all (sorry for misleading you). If you replace the realloc() with a bad one (like valgrind's), then the performance degrades severely. This patch implements a HYBRID allocator which has the behaviour you describe (it's like EXACT to start with, though without the spare 10 bytes; and switches to DOUBLEIT after 4kb) — that gets the memory back down to 12755657, with no noticeable impact on the performance of the synthetic pathological example under valgrind. In summary: max_memory on ./xmllint --noout doc/libxml2-api.xml, valgrind time on https://gist.github.com/2656940 max_memory valgrind time before \| 12755657 \| 29:18.2 EXACT \| 12756699 \| 2:58.6 <-- this is the state after the first patch. DOUBLEIT \| 12756727 \| 0:02.7 HYBRID \| 12755754 \| 0:02.7 <-- this is the state with both patches. > > There is also the cost of creating the buffers all the time. 
> I need to read the code and check but I may be interested in an hybrid > approach where we switch to buffer only when the text node starts to > become too big (4k would remove nearly all usual types of "document" > usage, i.e. not blocks of data) I tried to avoid too much buffer creation by introducing the xmlBufferDetach function, which allows re-using one buffer to construct many strings. It's maybe a bit of a "hack" in API terms though I thought the gains would be worth it. Conrad ------8<------ To keep memory usage tight in normal conditions it's desirable to only allocate as much space as is needed. Unfortunately this can lead to problems when constructing a long string out of small chunks, because every chunk you add will need to resize the buffer. To fix this XML_ALLOC_HYBRID will switch (when the buffer is 4kb big) from using exact allocations to doubling buffer size every time it is full. This limits the number of buffer resizes to O(log n) (down from O(n)), and thus greatly increases the performance of constructing very large strings in this manner.	2012-05-14 14:18:58 +08:00
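
The hybrid growth policy described in commit 7d0d2a50ac can be illustrated with a small standalone sketch. This is not the libxml2 implementation: the type `sketch_buf`, the function `sketch_buf_append`, and the constant `HYBRID_THRESHOLD` are hypothetical names invented here, and the 4096-byte switch point is only assumed from the "4kb" mentioned in the message. The sketch allocates exactly what is needed while the buffer is small, then doubles the allocation once it has reached the threshold.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed switch point, taken from the "4kb" in the commit message. */
#define HYBRID_THRESHOLD 4096

/* Hypothetical growable string buffer, for illustration only. */
typedef struct {
    char  *data;     /* NUL-terminated content, or NULL when empty */
    size_t used;     /* bytes stored, excluding the terminating NUL */
    size_t size;     /* bytes currently allocated */
    size_t resizes;  /* number of reallocations performed so far */
} sketch_buf;

/* Append len bytes of chunk. Returns 0 on success, -1 on allocation failure. */
static int sketch_buf_append(sketch_buf *b, const char *chunk, size_t len)
{
    assert(b != NULL && chunk != NULL);

    size_t needed = b->used + len + 1;  /* +1 for the terminating NUL */
    if (needed > b->size) {
        size_t new_size;
        if (b->size < HYBRID_THRESHOLD) {
            /* Small buffer: exact allocation keeps memory usage tight. */
            new_size = needed;
        } else {
            /* Large buffer: double until the new content fits. */
            new_size = b->size;
            while (new_size < needed) {
                if (new_size > (size_t)-1 / 2)
                    return -1;          /* doubling would overflow size_t */
                new_size *= 2;
            }
        }
        char *p = realloc(b->data, new_size);
        if (p == NULL)
            return -1;
        b->data = p;
        b->size = new_size;
        b->resizes++;
    }
    memcpy(b->data + b->used, chunk, len);
    b->used += len;
    b->data[b->used] = '\0';
    return 0;
}
```

Starting from a zero-initialized `sketch_buf` and appending many small chunks, the `resizes` counter grows by one per append only until the allocation reaches the threshold; after that each reallocation doubles the capacity, so the number of further resizes grows logarithmically with the final length. The caller releases the storage with `free(b.data)`.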

4171 Commits