1. Setting the entity loader does not increment the refcount on the Python
object passed in. This works only as long as the object is kept alive
elsewhere. For example, the following code results in a segmentation fault in
the Python interpreter when attempting to process any document:
[[[
import libxml2

def register_entity_loader():
    # entity_loader is local, so it is destroyed when this function returns
    def entity_loader(URL, ID, ctxt):
        ...
    libxml2.setEntityLoader(entity_loader)

register_entity_loader()
]]]
2. setEntityLoader() does not verify that the passed object is callable. If it
is not, the current implementation attempts to call it anyway and, failing
that, silently moves on to the default entity loader. The attached patch makes
setEntityLoader raise a ValueError exception if a non-callable object is
passed.
3. In debug mode, pythonExternalEntityLoader() outputs the result object to
stderr, while the messages before and after the object (description + newline)
go to stdout. The attached patch makes them all go to stdout.
It is possible to make xmlIO handle any protocol by means of
xmlRegisterInputCallback(). However, that function is currently available
only in the C API. So the natural solution seems to be implementing Python
bindings for xmlRegisterInputCallback.
* python/generator.py: skip xmlPopInputCallbacks
* python/libxml.c python/libxml.py python/libxml_wrap.h: implement the
wrappers
* python/tests/input_callback.py python/tests/Makefile.am: also add a test case
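To illustrate the idea of pluggable input handlers, here is a minimal pure-Python model. It does not call libxml2: `register_input_callback`, `open_uri` and the `mem://` scheme are hypothetical names chosen for this sketch. It assumes a callback takes a URI and returns a file-like object, or None when it does not handle that URI.

```python
import io

# Hypothetical registry modelling a stack of input callbacks (not libxml2 code).
_input_callbacks = []


def register_input_callback(callback):
    """Register a callable that maps a URI to a file-like object or None."""
    if not callable(callback):
        raise ValueError("input callback must be callable")
    _input_callbacks.append(callback)


def open_uri(uri):
    """Try the most recently registered callback first, as a stack would."""
    for callback in reversed(_input_callbacks):
        stream = callback(uri)
        if stream is not None:
            return stream
    raise IOError("no input callback handles %r" % uri)


# Example handler for a made-up "mem://" scheme backed by a dictionary.
DOCUMENTS = {"mem://root.xml": b"<root/>"}


def memory_callback(uri):
    data = DOCUMENTS.get(uri)
    return io.BytesIO(data) if data is not None else None


register_input_callback(memory_callback)
```

With this in place, `open_uri("mem://root.xml").read()` returns the stored bytes, while an unknown URI falls through every callback and raises an error.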
I noticed another issue with Python bindings of libxml: the access methods do
not cast the pointers to specific classes such as xmlDtd, xmlEntityDecl, etc.
For example, with the following document:
<?xml version="1.0"?>
<!DOCTYPE root [<!ELEMENT root EMPTY>]>
<root/>
the following script:
import libxml2
doc = libxml2.readFile("c.xml", None, libxml2.XML_PARSE_DTDLOAD)
print repr(doc.children)
prints:
<xmlNode (root) object at 0xb74963ec>
With properly cast nodes, it outputs the following:
<xmlDtd (root) object at 0xb746352c>
The latter object (xmlDtd) enables one to use DTD-specific methods such as
debugDumpDTD(), copyDTD(), and so on.
If entity expansion in the XML parser is requested,
it is possible to craft a relatively small input document that leads
to excessive on-the-fly content generation.
This patch accounts for those replacements and stops parsing
after a given threshold. It can be bypassed as usual with the
HUGE parser option.
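The accounting idea can be sketched in a few lines of Python. This is a toy expander, not the parser.c logic: the `limit` value, the `huge` parameter and `ExpansionLimitError` are assumptions made for illustration, and the real threshold is not stated here.

```python
import re

ENTITY_REF = re.compile(r"&(\w+);")


class ExpansionLimitError(Exception):
    """Raised when replacement text exceeds the configured budget."""


def expand(text, entities, limit=10_000, huge=False):
    """Expand &name; references, counting every generated character."""
    generated = 0

    def _expand(fragment):
        def substitute(match):
            nonlocal generated
            replacement = entities[match.group(1)]
            generated += len(replacement)
            # The check is skipped entirely when the caller opts in to huge input.
            if not huge and generated > limit:
                raise ExpansionLimitError(
                    "entity expansion exceeded %d characters" % limit
                )
            return _expand(replacement)

        return ENTITY_REF.sub(substitute, fragment)

    return _expand(text)


# Nested entities: each level repeats the previous one ten times.
ENTITIES = {"a0": "lol"}
for level in range(1, 5):
    ENTITIES["a%d" % level] = ("&a%d;" % (level - 1)) * 10
```

Here `expand("&a2;", ENTITIES)` stays within the budget, while `expand("&a4;", ENTITIES)` raises unless `huge=True` is passed.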
https://bugzilla.gnome.org/show_bug.cgi?id=692915
The new set of converting functions tried to limit the encoding
conversion of the raw buffer to the consumption one, in order to work in
a more progressive fashion. Unfortunately this was bad for
performance and led to errors on progressive parsing when
a very large chunk was close to the end of the document. Fix
the new internal function and switch back to the old way of
converting. Fix another bug in the process.
https://bugzilla.gnome.org/show_bug.cgi?id=690202
Buffer overflow errors originating from xmlBufGetInputBase in 2.9.0
The pointers from the context input were not properly reset after
that call which can do reallocations.
Otherwise, direct calls to xmlFree() etc. from the application will
use a different set of allocation functions to what was used to allocate
the memory internally.
Building 2.9.0 on MSVC7.1 was failing
This is because HAVE_CONFIG_H is not #defined
The patch addresses the above, adds testrecurse.exe and the
standard "make check" suite of tests to the MSVC makefile, and also
fixes the following (MSVC7.1) warnings:
buf.c(674) : warning C4028: formal parameter 1 different from
declaration
libxml2\timsort.h(71) : warning C4028: formal parameter 1 different from
declaration
cannot compile libxml2-2.9.0 using studio 12.1 compiler on solaris 10
I.M.O. structure initializer (as PTHREAD_ONCE_INIT) cannot be used in
a structure assignment anyway
For https://bugzilla.gnome.org/show_bug.cgi?id=683933
rand_seed should be a static variable in dict.c
We ran into a problem with another library that exports rand_seed as a
function. Combined with 2.7.8 this was not a problem but later versions
have this problem.
For https://bugzilla.gnome.org/show_bug.cgi?id=681822
Regardless of whether the option HTML_PARSE_NOBLANKS is set or not, blank
nodes are removed from an HTML document, for example:
<html>
<head>
<title>This is a test.</title>
</head>
<body>
<p>This is a test.</p>
</body>
</html>
is read as:
<html><head><title>This is a test.</title></head><body>
<p>This is a test.</p>
</body></html>
This changes the default behaviour but the old behaviour is available
as expected when using the parser flag HTML_PARSE_NOBLANKS
Based on original patch from Igor Ignatyuk <igor_ignatiouk@hotmail.com>
* HTMLparser.c: change various places in the parser where ignorable_space
SAX callback was called without checking for the parser flag preference
* xmllint.c: make sure we use the new flag even for HTML parsing
* result/HTML/*: this modifies the output of a number of tests
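The intended behaviour after the change can be modelled as a simple filter. This is a sketch only: `deliver_text` and the `NOBLANKS` constant are illustrative stand-ins, not the HTMLparser.c code, and only the constant's role as a bit flag matters here.

```python
# Illustrative flag; it stands in for HTML_PARSE_NOBLANKS in this sketch.
NOBLANKS = 1 << 8


def deliver_text(chunks, options=0):
    """Return the text chunks a parser would pass on to the tree builder."""
    delivered = []
    for chunk in chunks:
        is_blank = chunk.strip() == ""
        # Blank text is dropped only when the caller asked for it.
        if is_blank and options & NOBLANKS:
            continue
        delivered.append(chunk)
    return delivered
```

Without the flag every chunk is kept, including whitespace-only ones; with `options=NOBLANKS` only the non-blank text remains.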
configure.am:
* Explicitly disallow --enable-rebuild-docs when builddir != srcdir, per
what you said about needing to build docs with an in-source build
doc/Makefile.am:
* Ensure that xmlversion.h is in the source tree before running
apibuild.py, to avoid generating an incomplete libxml2-api.xml
* Update the .PHONY target (forgot to do this earlier)
doc/devhelp/Makefile.am:
* Wrap the doc-generating rule in an "if REBUILD_DOCS" conditional so it
doesn't cause trouble for regular users
* Added a handy-dandy "rebuild" target
doc/examples/index.py:
* NOTE: You need to run this script to regenerate the files it creates,
and then commit the newly-updated files! The generated files currently
in git master (e.g. doc/examples/Makefile.am) are out of date even
before this patch!
* index.html really needs to be in EXTRA_DIST
* Wrap the doc-generating rules in an "if REBUILD_DOCS" conditional,
because they shouldn't be active otherwise
So we've got this patch to libxml2 2.7.6 in the LibreOffice code base,
inherited from OOo. It fixes a definite problem, which is that Windows
has a rather low maximum path length restriction, and there is a special
trick on NT whereby path names can be prefixed with "\\?\", in which
case the maximum length is 32k, which ought to be sufficient even for
bloated office suites :)
I'll attach the patch to the xmlCanonicPath function. Note that I
didn't write this and am by no means an expert on either Microsoftean
platforms or libxml, so maybe it's not the best way to do it.
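The prefix trick itself is plain string handling, sketched below in Python. This is not the xmlCanonicPath patch: `to_long_path` is a hypothetical helper, and whether the real patch adds the prefix unconditionally or only past a length limit is not stated here. The sketch assumes the latter and leaves UNC and relative paths untouched.

```python
import ntpath

MAX_PATH = 260  # classic Windows limit, counting the terminating NUL
LONG_PREFIX = "\\\\?\\"


def to_long_path(path):
    """Add the long-path prefix to an over-long absolute drive-letter path."""
    if path.startswith(LONG_PREFIX):
        return path
    drive, tail = ntpath.splitdrive(path)
    is_drive_absolute = (
        len(drive) == 2 and drive[1] == ":" and tail[:1] in ("\\", "/")
    )
    if not is_drive_absolute or len(path) < MAX_PATH:
        return path
    # Prefixed paths are not normalised, so forward slashes are converted here.
    return LONG_PREFIX + path.replace("/", "\\")
```

Short paths come back unchanged, and calling the helper on an already prefixed path is a no-op.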
Looping 1000 times on an error stating that a nodeset has
grown out of control is useless; make sure we percolate the
error up to the various loops and break when errors occur.
Handle special cases of &{...} constructs as hinted in the spec
http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1
and special values as comment <!-- ... --> used for server side includes
This is limited to attribute values in HTML content.
While xmlCleanupParser() should not be used unless complete control
is ensured over the program, making sure libxml2 is not in use anywhere,
it should still be usable and allow a sequence of
xmlInitParser();
xmlCleanupParser();
calls if needed. The problem is that the thread key wasn't reallocated
on subsequent xmlInitParser() calls, leading to corruption of pthread
keys used by the program.
* threads.c: make sure xmlCleanupParser() resets the pthread_once()
global variable driving thread key allocation.
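The failure mode can be shown with a toy model of a once-guarded allocation. This is not the threads.c code: `ThreadKeyState` and its fields are hypothetical, with `once_done` standing in for the pthread_once() control variable and `key` for the thread-specific key.

```python
class ThreadKeyState:
    """Toy model of a once-guarded key allocation (not the threads.c code)."""

    def __init__(self, reset_once_on_cleanup):
        self.reset_once_on_cleanup = reset_once_on_cleanup
        self.once_done = False  # stands in for the pthread_once() control
        self.key = None         # stands in for the thread-specific key
        self._next_key = 0

    def init_parser(self):
        # The key is allocated only the first time the once-guard is passed.
        if not self.once_done:
            self._next_key += 1
            self.key = self._next_key
            self.once_done = True

    def cleanup_parser(self):
        self.key = None  # the key is deleted
        if self.reset_once_on_cleanup:
            self.once_done = False


def key_after_reinit(reset_once_on_cleanup):
    """Run init, cleanup, init and report the key that is left."""
    state = ThreadKeyState(reset_once_on_cleanup)
    state.init_parser()
    state.cleanup_parser()
    state.init_parser()
    return state.key
```

Without the reset, the second `init_parser()` skips allocation and leaves no valid key; with the reset, a fresh key is allocated.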
Related to https://bugs.launchpad.net/lxml/+bug/502959
Basically the core of the issue is that if an entity references another
entity, then in case we are replacing entities content, we should always
do so by copying the referenced content as long as the reference is
done within the entity. Otherwise, if for some reason there is a later
parsing error that entity content may be freed.
Complex scenario exposed by command:
thinkpad:~/XML/diveintopython-5.4/xml -> valgrind --db-attach=yes
../../xmllint --loaddtd --noout --noent diveintopython.xml
Document references &a;
   a references &b;
      we reference b content directly by linking it into the a content
   a has an error further down
      we free a, freeing the chunk from b
Document references &b; after &a;
   we try to copy b content, but it was freed already => segfault
* parser.c: never reference entity content directly without copying if
we aren't in the document main entity
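The scenario above can be reproduced with a small Python model in which "freeing" is an explicit flag. This is illustrative only, not the parser.c fix: `Node`, `free_tree` and `build_entity_a` are hypothetical names.

```python
import copy


class Node:
    """Minimal tree node with an explicit 'freed' marker (illustrative only)."""

    def __init__(self, text, children=None):
        self.text = text
        self.children = children or []
        self.freed = False


def free_tree(node):
    """Mark a node and everything linked below it as released."""
    node.freed = True
    for child in node.children:
        free_tree(child)


def build_entity_a(b_content, copy_reference):
    """Build entity a, whose content references entity b."""
    child = copy.deepcopy(b_content) if copy_reference else b_content
    return Node("a", [child])


def b_survives_error_in_a(copy_reference):
    b_content = Node("b content")
    a_content = build_entity_a(b_content, copy_reference)
    free_tree(a_content)        # parsing error in a: its content is released
    return not b_content.freed  # can the document still use &b; afterwards?
```

When b is linked directly, freeing a also releases b's content; when it is copied, b stays usable for the later `&b;` reference.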