mirror of
https://gitlab.gnome.org/GNOME/libxml2.git
synced 2024-12-24 21:33:51 +03:00
some cleanups extended the document to cover RelaxNG and tree operations
* relaxng.c: some cleanups * doc/xmlreader.html: extended the document to cover RelaxNG and tree operations * python/tests/Makefile.am python/tests/reader[46].py: added some xmlReader example/regression tests * result/relaxng/tutor*.err: updated the output of a number of tests Daniel
This commit is contained in:
parent
621636042b
commit
ac297930c2
@@ -1,3 +1,12 @@
Thu Apr 17 14:51:57 CEST 2003 Daniel Veillard <daniel@veillard.com>

	* relaxng.c: some cleanups
	* doc/xmlreader.html: extended the document to cover RelaxNG and
	  tree operations
	* python/tests/Makefile.am python/tests/reader[46].py: added some
	  xmlReader example/regression tests
	* result/relaxng/tutor*.err: updated the output of a number of tests

Thu Apr 17 11:35:37 CEST 2003 Daniel Veillard <daniel@veillard.com>

	* relaxng.c: valgrind pointed out an uninitialized variable error.
@@ -13,6 +13,8 @@ H3 {font-family: Verdana,Arial,Helvetica}
A:link, A:visited, A:active { text-decoration: underline }-->




</style>
<title>Libxml2 XmlTextReader Interface tutorial</title>
</head>
@@ -42,6 +44,9 @@ examples using both C and the Python bindings:</p>
attributes</a></li>
<li><a href="#Validating">Validating a document</a></li>
<li><a href="#Entities">Entities substitution</a></li>
<li><a href="#L1142">Relax-NG Validation</a></li>
<li><a href="#Mixing">Mixing the reader and tree or XPath
operations</a></li>
</ul>

<p></p>
@@ -147,8 +152,7 @@ def streamFile(filename):
        ret = reader.Read()

    if ret != 0:
        print "%s : failed to parse" % (filename)
</pre>
        print "%s : failed to parse" % (filename)</pre>

<p>The only things worth adding are that the <a
href="http://dotgnu.org/pnetlib-doc/System/Xml/XmlTextReader.html">xmlTextReader
@@ -390,9 +394,79 @@ the validation feature is just:</p>

<h2><a name="Entities">Entities substitution</a></h2>

<p>@@TODO@@</p>
<p>By default the xmlReader will report entities as such and not replace them
with their content. This default behaviour can however be overridden using:</p>

<p> </p>
<p><code>reader.SetParserProp(libxml2.PARSER_SUBST_ENTITIES,1)</code></p>

<h2><a name="L1142">Relax-NG Validation</a></h2>

<p style="font-size: 10pt">Introduced in version 2.5.7</p>

<p>Libxml2 can now validate the document being read by the xmlReader against
Relax-NG schemas. While the Relax NG validator cannot always work in a
streaming mode, only the parts of a schema that cannot be reduced to regular
expressions require their subtree to be expanded for validation. In practice
this means that, as long as the schema for the top-level element content can
be expressed as a regular expression, only a chunk of the document needs to
be expanded at a time while validating.</p>
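To make the regular-expression remark concrete, the following Python 3 sketch (standard library only, not libxml2) checks documents against the schema from reader6.py. The content model of `foo`, `(label, opt?, item)+`, is written as a regular expression over the sequence of child element names, and each `item` is then checked against the `byte` range of -128 to 127. It only illustrates the idea under simplifying assumptions: it is not how relaxng.c compiles schemas, it loads the whole document with ElementTree instead of streaming, it ignores the `text`/`empty` constraints on `label` and `opt`, and `validate_foo`/`is_xsd_byte` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# Content model of <foo> from the reader6.py schema, written as a regular
# expression over child element names: (label, opt?, item)+
FOO_CONTENT = re.compile(r"(?:label (?:opt )?item )+")


def is_xsd_byte(text):
    """Approximate check for the XML Schema 'byte' type: an integer in [-128, 127]."""
    text = text.strip()
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return False
    return -128 <= int(text) <= 127


def validate_foo(xml_text):
    """Return True if the document matches the simplified <foo> schema."""
    root = ET.fromstring(xml_text)
    if root.tag != "foo":
        return False
    # Encode the children as a string such as "label item " and match it
    # against the content model in one step.
    names = "".join(child.tag + " " for child in root)
    if FOO_CONTENT.fullmatch(names) is None:
        return False
    # Datatype check: every <item> must hold a valid byte value.
    return all(is_xsd_byte(item.text or "") for item in root.findall("item"))


if __name__ == "__main__":
    good = "<foo>\n<label>some text</label>\n<item>100</item>\n</foo>"
    print(validate_foo(good), validate_foo(good.replace("100", "1000")))
```

Because the element structure is settled by one regular-expression match, no tree beyond the direct children is needed for that part of the check. This is the property the paragraph above relies on.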

<p>The steps to do so are:</p>
<ul>
<li>create a reader working on a document as usual</li>
<li>before any call to Read(), associate it with a Relax NG schema, either a
preparsed schema or the URL of the schema to use</li>
<li>errors will be reported the usual way, and the validity status can be
obtained using the IsValid() interface of the reader, as for DTDs.</li>
</ul>

<p>Example, assuming the reader has already been created and that the schema
string contains the Relax-NG schema:</p>

<p><code>rngp = libxml2.relaxNGNewMemParserCtxt(schema, len(schema))<br>
rngs = rngp.relaxNGParse()<br>
reader.RelaxNGSetSchema(rngs)<br>
ret = reader.Read()<br>
while ret == 1:<br>
&nbsp;&nbsp;&nbsp;&nbsp;ret = reader.Read()<br>
if ret != 0:<br>
&nbsp;&nbsp;&nbsp;&nbsp;print "Error parsing the document"<br>
if reader.IsValid() != 1:<br>
&nbsp;&nbsp;&nbsp;&nbsp;print "Document failed to validate"</code><br>
See <code>reader6.py</code> in the sources or documentation for a complete
example.</p>
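The read loop in the example can be wrapped in a small helper that separates the three possible outcomes. If Read() stops on anything other than 0, the document could not be parsed; libxml2 uses -1 for this. After a clean end, IsValid() decides between valid and invalid. The Python 3 sketch below relies only on the Read()/IsValid() return conventions described in this section. `check_document` and the `FakeReader` stub are hypothetical names, and the stub exists only so the snippet runs without the libxml2 bindings installed.

```python
class FakeReader:
    """Hypothetical stand-in for a libxml2 text reader, for testing only.

    Read() returns the queued statuses in order (1 = node read, 0 = end of
    document, -1 = error); IsValid() returns a fixed value.
    """

    def __init__(self, statuses, valid):
        self._statuses = list(statuses)
        self._valid = valid

    def Read(self):
        return self._statuses.pop(0)

    def IsValid(self):
        return self._valid


def check_document(reader):
    """Drain the reader and classify the result as in the tutorial loop."""
    ret = reader.Read()
    while ret == 1:
        ret = reader.Read()
    if ret != 0:
        # Read() stopped on an error rather than at the end of the document.
        return "parse error"
    # Validity is only meaningful once the whole document has been read.
    return "valid" if reader.IsValid() == 1 else "invalid"


if __name__ == "__main__":
    print(check_document(FakeReader([1, 1, 0], valid=1)))
```

With a real reader, you would pass the object returned by `input.newTextReader(...)` after calling `RelaxNGSetSchema()` on it.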

<h2><a name="Mixing">Mixing the reader and tree or XPath operations</a></h2>

<p style="font-size: 10pt">Introduced in version 2.5.7</p>

<p>While the reader is a streaming interface, its underlying implementation
is based on the DOM builder of libxml2. As a result, it is relatively simple
to mix operations based on both models, under some constraints. To do so, the
reader has an Expand() operation that builds the subtree under the current
node. It returns a pointer to a standard node which can be manipulated in the
usual ways. The node has all its ancestors and its full subtree available, so
usual operations like XPath queries can be used on that reduced view of the
document. Here is an example extracted from reader5.py in the sources, which
extracts and prints the bibliography entry for the "Dragon" compiler book
from the XML 1.0 recommendation:</p>
<pre>f = open('../../test/valid/REC-xml-19980210.xml')
input = libxml2.inputBuffer(f)
reader = input.newTextReader("REC")
res = ""
while reader.Read():
    while reader.Name() == 'bibl':
        node = reader.Expand()            # expand the subtree
        if node.xpathEval("@id = 'Aho'"): # use XPath on it
            res = res + node.serialize()
        if reader.Next() != 1:            # skip the subtree
            break</pre>

<p>Note however that the node instance returned by the Expand() call is only
valid until the next Read() operation. The Expand() operation does not
affect the Read() ones. However, once processed, the full subtree is usually
no longer useful, and the Next() operation allows skipping it completely and
proceeding to the successor, or returns 0 if the end of the document is
reached.</p>
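Python's standard library has a close analogue of this pattern in `xml.dom.pulldom`. Its `expandNode()` builds the DOM subtree of the current element from an event stream, and iteration then resumes after that element's end tag, much like Expand() followed by Next(). The Python 3 sketch below mirrors the reader5.py idea with pulldom rather than libxml2. The miniature `DOC` is made up and not taken from the XML 1.0 recommendation, `extract_bibl` is a hypothetical name, and a plain attribute test stands in for the XPath query because minidom nodes have no XPath support.

```python
from xml.dom import pulldom

# Hypothetical miniature bibliography, not the real XML 1.0 text.
DOC = """<spec><back><blist>
<bibl id="Aho">Aho, Sethi and Ullman. Compilers.</bibl>
<bibl id="Knuth">Knuth. The Art of Computer Programming.</bibl>
</blist></back></spec>"""


def extract_bibl(xml_text, wanted_id):
    """Stream the document, building a DOM subtree only for <bibl> elements."""
    results = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "bibl":
            # Like Expand(): materialize this element's full subtree.
            # expandNode() also consumes the element's remaining events, so
            # the loop resumes after </bibl>, similar to Next().
            events.expandNode(node)
            if node.getAttribute("id") == wanted_id:
                results.append(node.toxml())
    return results


if __name__ == "__main__":
    print(extract_bibl(DOC, "Aho"))
```

`extract_bibl(DOC, "Aho")` returns a one-element list holding the serialized `<bibl id="Aho">` element. As with the reader, only the subtree of each `bibl` is ever built in memory.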

<p><a href="mailto:veillard@redhat.com">Daniel Veillard</a></p>
@@ -23,6 +23,9 @@ PYTESTS= \
reader.py \
reader2.py \
reader3.py \
reader4.py \
reader5.py \
reader6.py \
ctxterror.py\
readererr.py\
relaxng.py
45
python/tests/reader4.py
Executable file
@@ -0,0 +1,45 @@
#!/usr/bin/python -u
#
# this tests the basic APIs of the XmlTextReader interface
#
import libxml2
import StringIO
import sys

# Memory debug specific
libxml2.debugMemory(1)

def tst_reader(s):
    f = StringIO.StringIO(s)
    input = libxml2.inputBuffer(f)
    reader = input.newTextReader("tst")
    res = ""
    while reader.Read():
        res=res + "%s (%s) [%s] %d\n" % (reader.NodeType(),reader.Name(),
                                  reader.Value(), reader.IsEmptyElement())
        if reader.NodeType() == 1: # Element
            while reader.MoveToNextAttribute():
                res = res + "-- %s (%s) [%s]\n" % (reader.NodeType(),
                                         reader.Name(),reader.Value())
    return res

expect="""1 (test) [None] 0
1 (b) [None] 1
1 (c) [None] 1
15 (test) [None] 0
"""

res = tst_reader("""<test><b/><c/></test>""")

if res != expect:
    print "Did not get the expected error message:"
    print res
    sys.exit(1)

# Memory debug specific
libxml2.cleanupParser()
if libxml2.debugMemory(1) == 0:
    print "OK"
else:
    print "Memory leak %d bytes" % (libxml2.debugMemory(1))
    libxml2.dumpMemory()
118
python/tests/reader6.py
Executable file
@@ -0,0 +1,118 @@
#!/usr/bin/python -u
#
# this tests the Relax-NG validation with the XmlTextReader interface
#
import sys
import StringIO
import libxml2

schema="""<element name="foo" xmlns="http://relaxng.org/ns/structure/1.0"
datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<oneOrMore>
<element name="label">
<text/>
</element>
<optional>
<element name="opt">
<empty/>
</element>
</optional>
<element name="item">
<data type="byte"/>
</element>
</oneOrMore>
</element>
"""
# Memory debug specific
libxml2.debugMemory(1)

#
# Parse the Relax NG Schemas
#
rngp = libxml2.relaxNGNewMemParserCtxt(schema, len(schema))
rngs = rngp.relaxNGParse()
del rngp

#
# Parse and validate the correct document
#
docstr="""<foo>
<label>some text</label>
<item>100</item>
</foo>"""

f = StringIO.StringIO(docstr)
input = libxml2.inputBuffer(f)
reader = input.newTextReader("correct")
reader.RelaxNGSetSchema(rngs)
ret = reader.Read()
while ret == 1:
    ret = reader.Read()

if ret != 0:
    print "Error parsing the document"
    sys.exit(1)

if reader.IsValid() != 1:
    print "Document failed to validate"
    sys.exit(1)

#
# Parse and validate the incorrect document
#
docstr="""<foo>
<label>some text</label>
<item>1000</item>
</foo>"""

err=""
expect="""RNG validity error: file error line 3 element text
Type byte doesn't allow value '1000'
RNG validity error: file error line 3 element text
Error validating datatype byte
RNG validity error: file error line 3 element text
Element item failed to validate content
"""

def callback(ctx, str):
    global err
    err = err + "%s" % (str)
libxml2.registerErrorHandler(callback, "")

f = StringIO.StringIO(docstr)
input = libxml2.inputBuffer(f)
reader = input.newTextReader("error")
reader.RelaxNGSetSchema(rngs)
ret = reader.Read()
while ret == 1:
    ret = reader.Read()

if ret != 0:
    print "Error parsing the document"
    sys.exit(1)

if reader.IsValid() != 0:
    print "Document failed to detect the validation error"
    sys.exit(1)

if err != expect:
    print "Did not get the expected error message:"
    print err
    sys.exit(1)

#
# cleanup
#
del f
del input
del reader
del rngs
libxml2.relaxNGCleanupTypes()

# Memory debug specific
libxml2.cleanupParser()
if libxml2.debugMemory(1) == 0:
    print "OK"
else:
    print "Memory leak %d bytes" % (libxml2.debugMemory(1))
    libxml2.dumpMemory()
43
relaxng.c
@@ -8,11 +8,9 @@

/**
* TODO:
* - error reporting
* - handle namespace declarations as attributes.
* - add support for DTD compatibility spec
* http://www.oasis-open.org/committees/relax-ng/compatibility-20011203.html
* - report better mem allocations at runtime and abort immediately.
* - report better mem allocations pbms at runtime and abort immediately.
*/

#define IN_LIBXML
@@ -836,7 +834,6 @@ xmlRelaxNGFreeDefine(xmlRelaxNGDefinePtr define)
* @size: the default size for the container
*
* Allocate a new RelaxNG validation state container
* TODO: keep a pool in the ctxt
*
* Returns the newly allocated structure or NULL in case or error
*/
@@ -1989,7 +1986,7 @@ xmlRelaxNGGetErrorString(xmlRelaxNGValidErr err, const xmlChar *arg1,
case XML_RELAXNG_ERR_EXTRADATA:
return(xmlCharStrdup("Extra data in the document"));
default:
TODO
return(xmlCharStrdup("Unknown error !"));
}
if (msg[0] == 0) {
snprintf(msg, 1000, "Unknown error code %d", err);
@@ -2279,12 +2276,6 @@ xmlRelaxNGSchemaTypeCheck(void *data ATTRIBUTE_UNUSED,
xmlSchemaTypePtr typ;
int ret;

/*
* TODO: the type should be cached ab provided back, interface subject
* to changes.
* TODO: handle facets, may require an additional interface and keep
* the value returned from the validation.
*/
if ((type == NULL) || (value == NULL))
return(-1);
typ = xmlSchemaGetPredefinedType(type,
@@ -2956,9 +2947,9 @@ xmlRelaxNGCompile(xmlRelaxNGParserCtxtPtr ctxt, xmlRelaxNGDefinePtr def) {
case XML_RELAXNG_LIST:
case XML_RELAXNG_PARAM:
case XML_RELAXNG_VALUE:
TODO /* This should not happen and generate an internal error */
printf("trying to compile %s\n", xmlRelaxNGDefName(def));

/* This should not happen and generate an internal error */
fprintf(stderr, "RNG internal error trying to compile %s\n",
xmlRelaxNGDefName(def));
break;
}
return(ret);
@@ -3302,7 +3293,6 @@ xmlRelaxNGParseValue(xmlRelaxNGParserCtxtPtr ctxt, xmlNodePtr node) {
}
}
}
/* TODO check ahead of time that the value is okay per the type */
return(def);
}
@@ -4878,10 +4868,9 @@ xmlRelaxNGParseAttribute(xmlRelaxNGParserCtxtPtr ctxt, xmlNodePtr node) {
ctxt->nbErrors++;
break;
case XML_RELAXNG_NOOP:
TODO
if (ctxt->error != NULL)
ctxt->error(ctxt->userData,
"Internal error, noop found\n");
"RNG Internal error, noop found in attribute\n");
ctxt->nbErrors++;
break;
}
@@ -5199,16 +5188,27 @@ xmlRelaxNGParseElement(xmlRelaxNGParserCtxtPtr ctxt, xmlNodePtr node) {
ret->attrs = cur;
break;
case XML_RELAXNG_START:
if (ctxt->error != NULL)
ctxt->error(ctxt->userData,
"RNG Internal error, start found in element\n");
ctxt->nbErrors++;
break;
case XML_RELAXNG_PARAM:
if (ctxt->error != NULL)
ctxt->error(ctxt->userData,
"RNG Internal error, param found in element\n");
ctxt->nbErrors++;
break;
case XML_RELAXNG_EXCEPT:
TODO
if (ctxt->error != NULL)
ctxt->error(ctxt->userData,
"RNG Internal error, except found in element\n");
ctxt->nbErrors++;
break;
case XML_RELAXNG_NOOP:
TODO
if (ctxt->error != NULL)
ctxt->error(ctxt->userData,
"Internal error, noop found\n");
"RNG Internal error, noop found in element\n");
ctxt->nbErrors++;
break;
}
@@ -5438,9 +5438,6 @@ xmlRelaxNGCheckReference(xmlRelaxNGDefinePtr ref,
name);
ctxt->nbErrors++;
}
/*
* TODO: make a closure and verify there is no loop !
*/
}

/**
@@ -1,2 +1,2 @@
RNG validity error: file ./test/relaxng/tutor10_7_3.xml line 2 element card
Element addressBook has extra content: card
Element card failed to validate attributes
@@ -1,2 +1,2 @@
RNG validity error: file ./test/relaxng/tutor10_8_3.xml line 2 element card
Element addressBook has extra content: card
Element card failed to validate attributes
@@ -1,4 +1,2 @@
RNG validity error: file ./test/relaxng/tutor3_2_1.xml line 1 element email
Expecting element name, got email
RNG validity error: file ./test/relaxng/tutor3_2_1.xml line 1 element email
Element card failed to validate content
Did not expect element email there
@@ -1,2 +1,4 @@
RNG validity error: file ./test/relaxng/tutor3_5_2.xml line 2 element card
Element addressBook has extra content: card
RNG validity error: file ./test/relaxng/tutor3_5_2.xml line 2 element email
Expecting element name, got email
RNG validity error: file ./test/relaxng/tutor3_5_2.xml line 2 element email
Element card failed to validate content
@@ -1,2 +1,4 @@
RNG validity error: file ./test/relaxng/tutor9_5_2.xml line 2 element card
Element addressBook has extra content: card
Invalid sequence in interleave
RNG validity error: file ./test/relaxng/tutor9_5_2.xml line 2 element card
Element card failed to validate attributes
@@ -1,2 +1,2 @@
RNG validity error: file ./test/relaxng/tutor9_5_3.xml line 2 element card
Element addressBook has extra content: card
Invalid attribute error for element card
@@ -1,2 +1,2 @@
RNG validity error: file ./test/relaxng/tutor9_6_2.xml line 2 element card
Element addressBook has extra content: card
Element card failed to validate attributes
@@ -1,2 +1,2 @@
RNG validity error: file ./test/relaxng/tutor9_6_3.xml line 2 element card
Element addressBook has extra content: card
Invalid attribute error for element card