IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Consolidate code paths evaluating XPath predicates and filters.
Don't push context node on stack when evaluating predicates. I have no
idea why this was done. It seems completely useless and trying to pop
the context node from a corrupted stack has already caused security
issues.
Filter nodesets in-place and don't create node sets with NULL gaps which
allows to simplify merging a great deal. Simply move matched nodes
backward and create a compact node set.
Merge xmlXPathCompOpEvalPositionalPredicate into
xmlXPathCompOpEvalPredicate.
Useful to avoid call stack overflows when fuzzing. Note that parsing a
parenthesized expression currently consumes more than 10 stack frames,
so this limit should be set rather low.
Optionally limit the maximum numbers of XPath operations when evaluating
an expression. Useful to avoid timeouts when fuzzing. The following
operations count towards the limit:
- XPath operations
- Location step iterations
- Union operations
Enabled by setting opLimit to a non-zero value. Note that it's the user's
responsibility to reset opCount. This allows to enforce the operation
limit across multiple reuses of an XPath context.
Check that there's exactly one return value on the stack after calling
XPath functions. Otherwise, functions that corrupt the stack without
signaling an error could lead to memory errors.
Found with libFuzzer and UBSan.
Rewrite conversion of double to int in xmlXPathSubstringFunction, adding
range checks to avoid undefined behavior. Make sure to add start and
length as floating-point numbers before converting to int. Fix a bug
when rounding negative start indices.
Remove unneeded calls to xmlXPathIs{Inf,NaN} and rely on IEEE math
instead. Avoid computing the string length. xmlUTF8Strsub works as
expected if the length of the requested substring exceeds the input.
Found with libFuzzer and UBSan.
If a nodeset to be filtered is empty, it can be returned without popping
it from the stack.
Make sure to restore the context node in all error paths and never set
it to NULL.
Save and restore the context node in RANGETO operations.
On some pre-C99 compilers, the NAN and INFINITY macros don't expand to
constant expressions.
Some MSVC versions complain about floating point division by zero in
constants.
Thanks to Fabrice Manfroi for the report.
Use C99 macros NAN, INFINITY, isnan, isinf. If they're not available:
- Assume that (0.0 / 0.0) generates a NaN and !(x == x) tests for NaN.
- Use C89's HUGE_VAL for INFINITY.
Remove manual handling of NaN, infinity and negative zero in functions
xmlXPathValueFlipSign and xmlXPathDivValues.
Remove xmlXPathGetSign. All the tests for negative zero can be replaced
with a test for negative or positive zero.
Simplify xmlXPathRoundFunction.
Remove Trio dependency.
This should work on IEEE 754 compliant implementations even if the C99
macros aren't available, but will likely break some ancient platforms.
If problems arise, my plan is to port the relevant trionan.c solution
to xpath.c. Note that non-compliant implementations are impossible
to fully support, anyway, since XPath requires IEEE 754.
Make sure that all parameters and return values of hash callback
functions exactly match the callback function type. This is required
to pass clang's Control Flow Integrity checks and to allow compilation
to asm.js with Emscripten.
Fixes bug 784861.
On 64-bit Windows, `long` is 32 bits wide and can't hold a pointer.
Switch to ptrdiff_t instead which should be the same size as a pointer
on every somewhat sane platform without requiring C99 types like
intptr_t.
Fixes bug 788312.
Thanks to J. Peter Mugaas for the report and initial patch.
Fix two bugs in xmlXPathNodeValHash which could lead to errors when
comparing nodesets to strings:
- Only use contents of text nodes to compute the hash for element nodes.
Comments, PIs, and other node types don't affect the string-value and
must be ignored.
- Reset `string` to NULL for node types other than text.
Reported by Aleksei on the mailing list:
https://mail.gnome.org/archives/xml/2017-September/msg00016.html
Move the calls to xmlXPathSetFrame and xmlXPathPopFrame around in
xmlXPathCompOpEvalPositionalPredicate to make sure that the context
object on the stack is actually protected. Otherwise, memory corruption
can occur when calling sloppily coded XPath extension functions.
Fixes bug 783160.
Commit c851970 removed a redundant error message if XPath evaluation
failed. This uncovered a case where an undefined XPath variable error
wasn't reported correctly.
Thanks to Petr Pisar for the report.
Fixes bug 787941.
First set of patches for zOS
- entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c:
ask conversion of code to ISO Latin 1 to avoid having the compiler assume
EBCDIC codepoint for characters.
- xmlmodule.c: make sure we have support for modules
- xmlIO.c: zOS path names are special avoid dsome of the expectstions from
Unix/Windows
Don't count leading zeros towards the fraction size limit. This allows
to parse numbers like
0.0000000000000000000000000000000000000000000000000000000001
which is the only standard-conformant way to represent such numbers, as
scientific notation isn't allowed in XPath 1.0. (It is allowed in XPath
2.0 and in libxml2 as an extension, though.)
Overall accuracy is still bad, see bug 783238.
Use the C library's floor and ceil functions. The old code was overly
complicated for no apparent reason and could result in undefined
behavior when handling NaNs (found with afl-fuzz and UBSan).
Fix wrong comment in xmlXPathRoundFunction. The implementation was
already following the spec and rounding half up.
When traversing the "preceding" axis from an attribute node, we must
first go up to the attribute's containing element. Otherwise, text
children of other attributes could be returned. This made it possible
to hit a code path in xmlXPathNextAncestor which contained another bug:
The attribute node was initialized with the context node instead of the
current node. Normally, this code path is only hit via
xmlXPathNextAncestorOrSelf in which case the current and context node
are the same.
The combination of the two bugs could result in an infinite loop, found
with libFuzzer.
Traversing the "following" and the "preceding" axis from namespace nodes
should be handled similarly. This wasn't supported at all previously.
Move the check for trailing characters from xmlXPathEval to
xmlXPathEvalExpr. Otherwise, a valid portion of a syntactically invalid
expression would be evaluated before returning an error.
Move cleanup of XPath stack to xmlXPathFreeParserContext. This avoids
memory leaks if valuePop fails in some error cases. Found with
libFuzzer and ASan.
Rework handling of the final XPath result object in
xmlXPathCompiledEvalInternal and xmlXPathEval to avoid useless error
messages.
The old code would invoke the broken xmlXPtrRangeToFunction. range-to
isn't really a function but a special kind of location step. Remove
this function and always handle range-to in the XPath code.
The old xmlXPtrRangeToFunction could also be abused to trigger a
use-after-free error with the potential for remote code execution.
Found with afl-fuzz.
Fixes CVE-2016-5131.
Make sure that xmlXPathNodeSetAddNs is called for namespace nodes when
matched with a namespace::node() step. This correctly sets the parent
of namespace nodes. Note that xmlXPathNodeSetAddNs must only be called
if working on the namespace axis. Otherwise, the context node is not
the parent of the namespace node and the standard XP_TEST_HIT macro
must be invoked. This explains the errors in the C14N tests that the
old TODO comment mentioned.
The NCName parser would allow any NameChar as start character. For
example, the following XPath expressions would compile:
self::-abc
self::0abc
self::.abc
If the context node is an attribute, the attribute itself is on the
descendant-or-self axis. The principal node type of this axis is element,
so the only node test that can return the attribute is "node()". In other
words, "@attr/descendant-or-self::node()" is equivalent to "@attr".
This matches the behavior of Saxon-CE.
The XPath engine tries to guarantee that every XPath function can pop
'nargs' non-NULL values off the stack. libxslt, for example, relies on
this assumption. But the check isn't thorough enough if there are errors
during the evaluation of arguments. This can lead to segfaults:
https://mail.gnome.org/archives/xslt/2013-December/msg00005.html
This commit makes the handling of function arguments more robust.
* Bail out early when evaluation of XPath function arguments fails.
* Make sure that there are 'nargs' arguments in the current call frame.
My attempt to optimize XPath expressions containing '//' caused a
regression reported in bug #695699. This commit disables the
optimization for expressions of the form '//foo[predicate]'.
This patch adds xmlXPathSetContextNode and xmlXPathNodeEval,
which make it easier to evaluation XPath expressions with a
context node other than the document root without poking about
inside the internals of the context.
This patch is compile-tested only, and is my first libxml2
contribution, so please go easy.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
looping 1000 time on an error stating that a nodeset has
grown out of control is useless, make sure we percolate
error up to the various loops and break when errors occurs
I use libxml xpath engine on quite large (and mostly "flat") xml files.
It seems that Shellsort, that is used in xmlXPathNodeSetSort is a
performance bottleneck for my case. I have read some posts about sorting
in libxml in the libxml archive, but I agree that qsort was not the way
to go. I experimented with Timsort instead and my results were good for
me. For about 10000 nodes, my test was about 5x faster with Timsort,
for 1000 nodes about 10% faster, for small data files, the difference
was not measurable.
* timsort.h: the algorithm, kept in a separate header
* xpath.c: plug in the new algorithm in xmlXPathNodeSetSort
* Makefile.am: add the header to the EXTRA_DIST
* doc/apibuild.py: avoid indexing the new header
When investigating the libxslt performance problem reported in bug
#657665, I found that '//' in XPath expressions can be very slow when
working on large subtrees.
One of the reasons is the seemingly quadratic time complexity of the
duplicate checks when merging result nodes. The other is a missed
optimization for expressions of the form
'descendant-or-self::node()/axis::test'. Since '//' is expanded to
'/descendant-or-self::node()/', this type of expression is quite common.
Depending on the axis of the expression following the
'descendant-or-self' step, the following replacements can be made:
from descendant-or-self::node()/child::test
to descendant::test
from descendant-or-self::node()/descendant::test
to descendant::test
from descendant-or-self::node()/self::test
to descendant-or-self::test
from descendant-or-self::node()/descendant-or-self::test
to descendant-or-self::test
'test' can be any kind of node test.
With these replacements the possibly huge result of
'descendant-or-self::node()' doesn't have to be stored temporarily, but
can be processsed in one pass. If the resulting nodeset is small, the
duplicate checks aren't a problem.
I found that there already is a function called
xmlXPathRewriteDOSExpression which performs this optimization for a very
limited set of cases. It employs a complicated iteration scheme for
rewritten expressions. AFAICS, this can be avoided by simply changing
the axis of the expression like described above.
With the attached patch against libxml2 and the files from bug #657665 I
got the following results.
Before:
$ time xsltproc/xsltproc --noout service-names-port-numbers.xsl
service-names-port-numbers.xml
real 2m56.213s
user 2m56.123s
sys 0m0.080s
After:
$ time xsltproc/xsltproc --noout service-names-port-numbers.xsl
service-names-port-numbers.xml
real 0m3.836s
user 0m3.764s
sys 0m0.060s
I also ran the libxml2 and libxslt test suites with the patch and
couldn't detect any breakage.
Nick
>From e0f5a8261760e4f257b90410be27657e984237c8 Mon Sep 17 00:00:00 2001
From: Nick Wellnhofer <wellnhofer@aevum.de>
Date: Sun, 19 Aug 2012 18:20:22 +0200
Subject: [PATCH] Optimizations for descendant-or-self::node()
Currently, the function xmlXPathRewriteDOSExpression optimizes expressions
of type '//child'. Instead of adding a 'rewriteType' and doing a compound
traversal, the same can be achieved simply by setting the axis of the node
test from 'child' to 'descendant'.
There are also many other cases that can be optimized similarly. This
commit augments xmlXPathRewriteDOSExpression to essentially rewrite the
following subexpressions:
- descendant-or-self::node()/child:: to descendant::
- descendant-or-self::node()/descendant:: to descendant::
- descendant-or-self::node()/self:: to descendant-or-self::
- descendant-or-self::node()/descendant-or-self:: to descendant-or-self::
Since the '//' shortcut in XPath is translated to
'/descendant-or-self::node()/', this greatly speeds up expressions using
'//' on large subtrees.
This adds some internal limitationson XPath expression complexity,
and limits at runtime like depth of the stack and maximum size
for nodeset.
* xpath.c: implement the above as well as the maximum Name lenght
The count was incremented before the allocation
and not fixed in case of failure
* xpath.c: corrects a few instances where the available count of some
structure is updated before we know the allocation actually
succeeds
This was introduced in the prevous fix, while preceding-sibling and
following sibling axis are empty for attributes and namespaces,
preceding and following axis should still work based on the parent
element. However the parent element is not available for a namespace
node, so we keep the axis empty in that case.
* SAX2.c dict.c error.c hash.c nanohttp.c parser.c python/libxml.c
relaxng.c runtest.c tree.c valid.c xinclude.c xmlregexp.c xmlsave.c
xmlschemas.c xpath.c xpointer.c: mostly removing unneded affectations,
but this led to a few real bugs and some part not yet understood
(relaxng/interleave)
* HTMLparser.c c14n.c debugXML.c entities.c nanohttp.c parser.c
testC14N.c uri.c xmlcatalog.c xmllint.c xmlregexp.c xpath.c:
fix unused variables, or unneeded increments as well as a couple
of space issues
* runtest.c: check for NULL before calling unlink()
* include/wsockcompat.h win32/Makefile.bcb xpath.c: fixes for
Borland/CodeGear/Embarcadero compilers by Eric Zurcher
Daniel
svn path=/trunk/; revision=3822
* include/libxml/parser.h include/libxml/xmlwriter.h
include/libxml/relaxng.h include/libxml/xmlversion.h.in
include/libxml/xmlwin32version.h.in include/libxml/valid.h
include/libxml/xmlschemas.h include/libxml/xmlerror.h:
port patch from Marcus Meissner to add gcc checking for
printf like functions parameters, should fix#65068
* doc/apibuild.py doc/*: modified the script accordingly
and regenerated
* xpath.c xmlmemory.c threads.c: fix a few warnings
Daniel
svn path=/trunk/; revision=3813
* schematron.c xpath.c: applied a couple of patches from Martin
avoiding some leaks, fixinq QName checks in XPath, XPath debugging
and schematron code cleanups.
* python/tests/Makefile.am python/tests/xpathleak.py: add the
specific regression tests, just tweak it to avoid output by default
Daniel
svn path=/trunk/; revision=3791
* nanohttp.c: fixed problem on gzip streams (bug #438045)
* xpath.c: fixed minor spot of redundant code - no logic change.
svn path=/trunk/; revision=3616
* xpath.c: enhanced the coding for xmlXPathCastNumberToString
in order to produce the required number of significant digits
(bug #437179)
svn path=/trunk/; revision=3615
* xpath.c: fixed xmlXPathCmpNodes for incorrect result on certain
cases when comparing identical nodes (bug 415567) with patch
from Oleg Paraschenko
svn path=/trunk/; revision=3587
* xpath.c: Changed xmlXPathCollectAndTest() to use
xmlXPathNodeSetAddNs() when adding a ns-node in case of
NODE_TEST_TYPE (the ns-node was previously added plainly
to the list). Since for NODE_TEST_ALL and NODE_TEST_NAME
this specialized ns-addition function was already used,
I assume it was missed to be used with NODE_TEST_TYPE.
* xpath.c: Enhanced xmlXPathCompOpEvalToBoolean() to be also
usable outside predicate evaluation; the intention is to
use it via xmlXPathCompiledEvalToBoolean() for XSLT tests,
like in <xsl:if test="/foo">.
* xpath.c: Added xmlXPathCompiledEvalToBoolean() to the API and
adjusted/added xmlXPathRunEval(), xmlXPathRunStreamEval(),
xmlXPathCompOpEvalToBoolean(), xmlXPathNodeCollectAndTest()
to be aware of a boolean result request. The new function
is now used to evaluate predicates.
* xpath.c: Fixed an bug in xmlXPathCompExprAdd(): the newly
introduced field @rewriteType on xmlXPathStepOp was not
initialized to zero here; this could lead to the activation
of the axis rewrite code in xmlXPathNodeCollectAndTest() when
@rewriteType is randomly set to the value 1. A test
(hardcoding the intial value to 1) revealed that the
resulting incorrect behaviour is similar to the behaviour
as described by Arnold Hendriks on the mailing list; so I
hope that will fix the issue.
* xpath.c: Fixed an error in xmlXPathEvalExpr(), which
was introduced with the addition of the d-o-s rewrite
and made xpath.c unable to compile if XPATH_STREAMING
was not defined (reported by Kupriyanov Anatolij -
#345752). Fixed the check for d-o-s rewrite
to work on the correct XPath string, which is ctxt->base
and not comp->expr in this case.
* xpath.c: Fixed self-invented a segfault in xmlXPathCtxtCompile(),
when the expression was not valid and @comp was NULL and I
tried to do the d-o-s rewrite.
* xpath.c: Enabled the compound traversal again; I added a
check to use this only if the have an expression starting
with the document node; so in the case of "//foo", we
already know at compilation-time, that there will be only
1 initial context node. Added the rewrite also to
xmlXPathEvalExpr().
* xpath.c include/libxml/xpath.h runsuite.c:
Changed the name of the recently added public function
xmlXPathContextSetObjectCache() to
xmlXPathContextSetCache(); so a more generic one, in
case we decide to cache more things than only XPath
objects.
* xpath.c: Optimized xmlXPathNodeCollectAndTest() and
xmlXPathNodeCollectAndTestNth() to evaluate a compound
traversal of 2 axes when we have a "//foo" expression.
This is done with a rewrite of the XPath AST in
xmlXPathRewriteDOSExpression(); I added an additional field
to xmlXPathStepOp for this (but the field's name should be
changed). The mechanism: the embracing descendant-or-self
axis traversal (also optimized to return only nodes which
can hold elements), will produce context nodes for the
inner traversal of the child axis. This way we avoid a full
node-collecting traversal of the descendant-or-self axis.
Some tests indicate that this can reduce execution time of
"//foo" to 50%. Together with the XPath object cache this
all significantly speeds up libxslt.
* xpath.c: Enhanced xmlXPathNodeCollectAndTest() to avoid
recreation (if possible) of the node-set which is used to
collect the nodes in the current axis for the currect context
node. Especially for "//foo" this will decrease dramatically
the number of created node-sets, since for each node in the
result node-set of the evaluation of descendant-or-self::node()
a new temporary node-set was created. Added node iterator
xmlXPathNextChildElement() as a tiny optimization for
child::foo.
* xpath.c include/libxml/xpath.h: Added an XPath object cache.
It sits on an xmlXPathContext and need to be explicitely
activated (or deactivated again) with
xmlXPathContextSetObjectCache(). The cache consists of 5
lists for node-set, string, number, boolean and misc XPath
objects. Internally the xpath.c module will use object-
deposition and -acquisition functions which will try to reuse
as many XPath objects as possible, and fallback to normal
free/create behaviour if no cache is available or if the cache
is full.
* runsuite.c: Adjusted to deactivate the cache for XML Schema
tests if a cache-creation is turned on by default for the whole
library, e.g. for testing purposes of the cache. It is
deactivated here in order to avoid confusion of the memory leak
detection in runsuite.c.
* xpath.c: Removed a memcpy if xmlXPathNodeSetMerge(); it
seems we really need to walk the whole list, since those
nastly namespace nodes need to be added with
xmlXPathNodeSetDupNs(); thus a pure memcpy is not possible.
A flag on the node-set indicating if namespace nodes are in
the set would help here; this is the 3rd flag which would
be usefull with node-sets. The current flags I have in mind:
1) Is a node-set already sorted?
This would allow for rebust and optimizable sorting
behaviour.
2) Of what type are the nodes in the set (or of mixed type)?
This would allow for faster merging of node-sets.
3) Are namespace nodes in the set?
This would allow to skipp all the namespace node specific
special handling. Faster node-set merging if the first
set is empty; just memcpy the set.
* xpath.c: Optimization of count(): eliminated sorting
(see bug #165547). Optimization of XPATH_OP_FILTER if the
predicate is a [1] (disable with XP_OPTIMIZED_FILTER_FIRST if
it produces trouble). Tiny opt in xmlXPathNodeSetMerge().
* xpath.c: Substituted all remaining calls to xmlXPathCmpNodes()
for xmlXPathCmpNodesExt(). Tiny further enhancement of
xmlXPathCmpNodesExt(). Added additional checks in various code
parts to avoid calling sorting or merging functions if the
node-set(s) don't need them; i.e., if they are empty or contain
just one node.
* xpath.c: Optimized the comparison for non-element nodes
in xmlXPathCmpNodesExt(); the comparison is used for sorting
of node-sets. This enhancement is related to bug #165547.
There are other places where the old comparison function
xmlXPathCmpNodes() is still called, but I currently don't
know exactly what those calls are for; thus if they can be
substituted (if it makes sense) for the new function.
* xpath.c: Applied patch from Rob Richards, fixing a potential
memory leak in xmlXPathTryStreamCompile(), when a list of
namespaces was assigned to the XPath compilation context;
here a new namespace list was created and passed to
xmlPatterncompile(); but this list was not freed afterwards.
Additionally we avoid now in xmlXPathTryStreamCompile() to
compile the expression, if it has a colon - indicating
prefixed name tests - and no namespace list was given. The
streaming XPath mechanism needs a namespace list at
compilation time (unlike normal XPath, where we can bind
namespace names to prefixes at execution time).
* pattern.c: Enhanced to use a string dict for local-names,
ns-prefixes and and namespace-names.
Fixed xmlStreamPushInternal() not to use string-pointer
comparison if a dict is available; this won't work, since
one does not know it the given strings originate from the
same dict - and they normally don't do, since e.g.
namespaces are hold on xmlNs->href. I think this would be
worth an investigation: if we can add a @doc field to xmlNs
and put the @href in to a additionan namespace dict hold
in xmlDoc. Daniel will surely not like this idea :-) But
evaluation of tons of elements/attributes in namespaces
with xmlStrEqual() isn't the way we should go forever.
* runtest.c schematron.c testAutomata.c tree.c valid.c xinclude.c
xmlcatalog.c xmlreader.c xmlregexp.c xpath.c: end of first
pass on coverity reports.
Daniel
* pattern.c xpath.c include/libxml/pattern.h:
Fixed bug #322928, reported by Erich Schubert: The bug was
in pattern.c, which is used for a tiny subset of xpath
expression which can be evaluated in an optimized way.
The doc-node was never considered when evaluating "//"
expressions. Additionally, we fixed resolution
to nodes of any type in pattern.c; i.e. a "//." didn't work
yet, as it did select only element-nodes. Due to this
issue the pushing of nodes in xpath.c needed to be adjusted
as well.
* pattern.c xpath.c include/libxml/pattern.h: fixing yet another
pattern induced XPath bug #314282
* relaxng.c: reverted back last change it was seriously broken
Daniel
* xpath.c: removed a potentially uninitialized variable error
* python/generator.py: fixed a deprecation warning
* python/tests/tstLastError.py: silent the damn test when Okay !
Daniel
* HTMLparser.c SAX2.c encoding.c globals.c parser.c relaxng.c
runsuite.c runtest.c schematron.c testHTML.c testReader.c
testRegexp.c testSAX.c testThreads.c valid.c xinclude.c xmlIO.c
xmllint.c xmlmodule.c xmlschemas.c xpath.c xpointer.c: a lot of
small cleanups based on Linus' sparse check output.
Daniel
* pattern.c include/libxml/pattern.h: changed xmlPatterncompile
signature to pass an int and not an enum since it can generate
ABI compat troubles.
* include/libxml/schematron.h schematron.c: adding the new
schematron code, work in progress lots to be left and needing
testing
* include/libxml/xmlversion.h.in include/libxml/xmlwin32version.h.in
Makefile.am configure.in: integration of schematron into the
build
* xpath.c include/libxml/xpath.h: adding flags to control compilation
options right now just XML_XPATH_CHECKNS.
Daniel
* xpath.c: Changed the behaviour of xmlXPathEqualNodeSetFloat to
return TRUE if a nodeset with a numeric value of NaN is compared
for inequality with any numeric value (bug 309914).
* xstc/Makefile.am README README.tests Makefile.tests Makefile.am:
preparing to make testsuite releases along with code source releases
* gentest.py testapi.c: fixed a couple of problem introduced by
the new Schemas support for Readers
* xpath.c: fixed the XPath attribute:: bug #309580, #309864 in a crude
but simple way.
* xmlschemas.c include/libxml/tree.h: fixed a couple of problems
raised by the doc builder.
* doc/*: made rebuild
Daniel