<!-- libxml2/doc/catalog.html, last changed by Daniel Veillard
     (commit e7ead2d237, 2001-08-22 23:44:09 +00:00):
     * doc/catalog.html doc/xml.html: added documentation about
       Catalog support, misses an API description
     * doc/html/*: reextracted the API pages -->


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Libxml Catalog support</title>
<meta name="GENERATOR" content="amaya V5.0">
<meta http-equiv="Content-Type" content="text/html">
</head>
<body bgcolor="#ffffff">
<h1 align="center">Libxml Catalog support</h1>
<p>Location: <a
href="http://xmlsoft.org/catalog.html">http://xmlsoft.org/catalog.html</a></p>
<p>Libxml home page: <a href="http://xmlsoft.org/">http://xmlsoft.org/</a></p>
<p>Mailing-list archive: <a
href="http://mail.gnome.org/archives/xml/">http://mail.gnome.org/archives/xml/</a></p>
<p>Version: $Revision:$</p>
<p>Table of Contents:</p>
<ol>
<li><a href="#General">General overview</a></li>
<li><a href="#definition">The definitions</a></li>
<li><a href="#Simple">Using catalogs</a></li>
<li><a href="#Some">Some examples</a></li>
<li><a href="#reference">How to tune catalog usage</a></li>
<li><a href="#validate">How to debug catalog processing</a></li>
<li><a href="#Declaring">How to create and maintain catalogs</a></li>
<li><a href="#implemento">The implementor corner, a quick review of the
API</a></li>
<li><a href="#Other">Other resources</a></li>
</ol>
<h2><a name="General">General overview</a></h2>
<p>What is a catalog? Basically, it's a lookup mechanism used when an entity
(a file or a remote resource) references another entity. The catalog lookup
is inserted between the moment the reference is recognized by the software
(XML parser, stylesheet processor, or even a renderer including referenced
images) and the moment loading that resource actually starts. </p>
<p>It is used for three things:</p>
<ul>
<li>mapping from "logical" names, the public identifiers, to a more
concrete name usable for download (a URI). For example, it can associate
the logical name
<p>"-//OASIS//DTD DocBook XML V4.1.2//EN" </p>
<p>of the DocBook 4.1.2 XML DTD with the actual URL where it can be
downloaded</p>
<p>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd </p>
</li>
<li>remapping from a given URL to another one, like an HTTP indirection
saying that
<p>"http://www.oasis-open.org/committes/tr.xsl"</p>
<p>should really be fetched from</p>
<p>"http://www.oasis-open.org/committes/entity/stylesheets/base/tr.xsl"
</p>
</li>
<li>providing a local cache mechanism for loading the entities associated
with public identifiers or remote resources. This is a really important
feature for any significant deployment of XML or SGML, since it avoids the
hazards and delays associated with fetching remote resources.</li>
</ul>
<h2><a name="definition">The definitions</a></h2>
<p>As of 2.4.3, libxml implements two kinds of catalogs:</p>
<ul>
<li>the older SGML catalogs: the official spec is SGML Open Technical
Resolution TR9401:1997, but it is better understood by reading <a
href="http://www.jclark.com/sp/catalog.htm">the SP Catalog page</a> from
James Clark. This format is relatively old and is not libxml's preferred
mode of operation.</li>
<li><a href="http://www.oasis-open.org/committees/entity/spec.html">XML
Catalogs</a> are far more flexible and more recent, use an XML syntax, and
should scale much better. This is libxml's default option.</li>
</ul>
<h2><a name="Simple">Using catalogs</a></h2>
<p>In a normal environment, libxml will by default check for the presence of
a catalog in <code>/etc/xml/catalog</code> and, assuming it has been
correctly populated, the processing is completely transparent to the
document user. As a concrete example, suppose you are authoring a DocBook
document that starts with the following DOCTYPE declaration:</p>
<pre>&lt;?xml version='1.0'?&gt;
&lt;!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"&gt;
</pre>
<p>When validating the document with libxml, the catalog will be
automatically consulted to look up the public identifier "-//Norman Walsh//DTD
DocBk XML V3.1.4//EN" and the system identifier
"http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd"; if these entities have
been installed on your system and the catalogs actually point to them, libxml
will fetch them from the local disk.</p>
<p style="font-size: 10pt"><strong>Note</strong>: don't actually use this
DOCTYPE; it refers to a very old version, but it is fine as an example.</p>
<p>Libxml will check the catalog each time it is requested to load an
entity; this includes DTDs, external parsed entities, stylesheets, etc. If
your system is correctly configured, all authoring and processing should use
only local files, while your document stays portable because it uses the
canonical public and system IDs referencing the remote documents.</p>
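<p>The lookup described above can be pictured with the following minimal C
sketch. It is not libxml's implementation: the <code>CatalogEntry</code>
type and the <code>lookup()</code> and <code>resolve()</code> helpers are
hypothetical, and the resolution order is simplified to "system identifier
first, then public identifier, otherwise keep the original system
identifier". Any local file path used with it is an assumed installation
location.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory catalog entry: maps an identifier to a URI. */
typedef struct {
    const char *id;   /* public or system identifier, matched exactly */
    const char *uri;  /* replacement URI, e.g. a local file */
} CatalogEntry;

/* Return the URI of the first entry whose id equals key, or NULL. */
static const char *lookup(const CatalogEntry *entries, size_t n, const char *key)
{
    assert(entries != NULL || n == 0);
    if (key == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        if (strcmp(entries[i].id, key) == 0)
            return entries[i].uri;
    return NULL;
}

/*
 * Simplified resolution: try the system identifier, then the public
 * identifier; if neither is mapped, fall back to the original system
 * identifier so the resource is fetched from its canonical location.
 */
static const char *resolve(const CatalogEntry *system_entries, size_t n_system,
                           const CatalogEntry *public_entries, size_t n_public,
                           const char *public_id, const char *system_id)
{
    const char *uri = lookup(system_entries, n_system, system_id);
    if (uri == NULL)
        uri = lookup(public_entries, n_public, public_id);
    return uri != NULL ? uri : system_id;
}
```

<p>When no entry matches, <code>resolve()</code> returns the system
identifier unchanged, which corresponds to fetching the remote document; this
is why a document using the canonical identifiers stays portable.</p>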
<h2><a name="Some">Some examples:</a></h2>
<p>Here are a couple of fragments from XML Catalogs used in libxml's early
regression tests in <code>test/catalogs</code>:</p>
<pre>&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
"http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
&lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
...</pre>
<p>This is the beginning of a catalog for DocBook 4.1.2. XML Catalogs are
written in XML, with a specific namespace for catalog elements:
"urn:oasis:names:tc:entity:xmlns:xml:catalog". The first entry in this
catalog is a <code>public</code> mapping, which associates a Public
Identifier with a URI. </p>
<pre>...
&lt;rewriteSystem systemIdStartString="http://www.oasis-open.org/docbook/"
rewritePrefix="file:///usr/share/xml/docbook/"/&gt;
...</pre>
<p>A <code>rewriteSystem</code> is a very powerful instruction: it says that
any system identifier starting with a given prefix should be looked up at
another URI, constructed by replacing the prefix with a new one. In effect,
this acts like a cache system for a whole area of the Web. In practice, it
is extremely useful with a file prefix if you have installed a copy of those
resources on your local system. </p>
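<p>The prefix replacement performed by <code>rewriteSystem</code> can be
sketched in C as below. The <code>RewriteRule</code> type and the
<code>rewrite_system()</code> function are hypothetical illustrations, not
part of libxml's API; the sketch assumes, as the XML Catalogs specification
states, that when several entries match, the one with the longest
<code>systemIdStartString</code> wins.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of a <rewriteSystem> entry. */
typedef struct {
    const char *start;   /* systemIdStartString */
    const char *prefix;  /* rewritePrefix */
} RewriteRule;

/*
 * Rewrite system_id with the rule whose start string is the longest
 * matching prefix. Writes the result into out and returns 1, or returns 0
 * if no rule matches or out is too small for the rewritten URI.
 */
static int rewrite_system(const RewriteRule *rules, size_t n,
                          const char *system_id, char *out, size_t out_size)
{
    const RewriteRule *best = NULL;
    size_t best_len = 0;

    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(rules[i].start);
        if (len > best_len && strncmp(system_id, rules[i].start, len) == 0) {
            best = &rules[i];
            best_len = len;
        }
    }
    if (best == NULL)
        return 0;

    /* Swap the matched prefix for the new one and keep the remainder. */
    int written = snprintf(out, out_size, "%s%s", best->prefix,
                           system_id + best_len);
    return written >= 0 && (size_t)written < out_size;
}
```

<p>With the entry from the fragment above,
<code>http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd</code> would be
rewritten to
<code>file:///usr/share/xml/docbook/xml/4.1.2/docbookx.dtd</code>.</p>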
<pre>...
&lt;delegatePublic publicIdStartString="-//OASIS//DTD XML Catalog //"
catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook XML"
catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegatePublic publicIdStartString="-//OASIS//DTD DocBook XML"
catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegateSystem systemIdStartString="http://www.oasis-open.org/docbook/"
catalog="file:///usr/share/xml/docbook.xml"/&gt;
&lt;delegateURI uriStartString="http://www.oasis-open.org/docbook/"
catalog="file:///usr/share/xml/docbook.xml"/&gt;
...</pre>
<p>Delegation is the core feature that allows building a tree of catalogs,
which is easier to maintain than a single catalog. Based on Public
Identifier, System Identifier or URI prefixes, it instructs the catalog
software to look up entries in another resource, which makes hierarchies of
catalogs possible. The set of entries presented should be sufficient to
redirect the resolution of all DocBook references to the specific catalog in
<code>/usr/share/xml/docbook.xml</code>; this one in turn could delegate all
references for DocBook 4.1.2 to a specific catalog installed at the same time
as the DocBook resources on the local machine.</p>
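<p>The selection step of delegation can be sketched as follows. This is a
hypothetical illustration (the <code>DelegateRule</code> type and
<code>select_delegates()</code> are not libxml functions): it collects every
<code>delegatePublic</code> entry whose prefix matches the public identifier
and orders the delegated catalogs by decreasing prefix length, the order in
which the XML Catalogs specification says they are consulted. Loading and
querying the delegated catalogs themselves is left out, and a per-version
catalog such as <code>docbook-4.1.2.xml</code> is an assumed name.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of a <delegatePublic> entry. */
typedef struct {
    const char *start;    /* publicIdStartString */
    const char *catalog;  /* catalog to consult instead */
} DelegateRule;

/*
 * Store in out every rule whose start string prefixes public_id, ordered
 * by decreasing prefix length (most specific first; ties keep document
 * order). out must have room for n pointers. Returns the number stored.
 */
static size_t select_delegates(const DelegateRule *rules, size_t n,
                               const char *public_id, const DelegateRule **out)
{
    size_t count = 0;

    assert(rules != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(rules[i].start);
        if (strncmp(public_id, rules[i].start, len) != 0)
            continue;
        /* Insertion sort: shift shorter matches right to make room. */
        size_t j = count++;
        while (j > 0 && strlen(out[j - 1]->start) < len) {
            out[j] = out[j - 1];
            j--;
        }
        out[j] = &rules[i];
    }
    return count;
}
```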
<h2><a name="reference">How to tune catalog usage:</a></h2>
<p>The user can change the default catalog behaviour by redirecting queries
to their own set of catalogs. This can be done by setting the
<code>XML_CATALOG_FILES</code> environment variable to a list of catalogs; an
empty value should deactivate loading the default
<code>/etc/xml/catalog</code> catalog.</p>
<p>@@More options are likely to be provided in the future@@</p>
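<p>The following C sketch models how a program might interpret
<code>XML_CATALOG_FILES</code>; <code>catalog_files()</code> is a
hypothetical helper, not libxml code. It assumes the list is
whitespace-separated (the convention documented in the xmllint manual page;
check your libxml version), that an unset variable means the default
<code>/etc/xml/catalog</code>, and that a set but empty value means no
catalog is loaded.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEFAULT_CATALOG "/etc/xml/catalog"

/*
 * env is the value of getenv("XML_CATALOG_FILES"), possibly NULL.
 * Unset: use the default catalog. Set but empty: no catalog at all.
 * Otherwise the value is copied into buf (silently truncated if too long)
 * and split on whitespace; pointers to the tokens are stored in files.
 * Returns the number of catalog files, at most max.
 */
static size_t catalog_files(const char *env, char *buf, size_t buf_size,
                            const char **files, size_t max)
{
    size_t count = 0;

    if (env == NULL) {
        if (max > 0)
            files[count++] = DEFAULT_CATALOG;
        return count;
    }
    assert(buf_size > 0);
    strncpy(buf, env, buf_size - 1);
    buf[buf_size - 1] = '\0';

    /* strtok replaces separators in buf with NULs; "" yields no tokens. */
    for (char *tok = strtok(buf, " \t\n"); tok != NULL && count < max;
         tok = strtok(NULL, " \t\n"))
        files[count++] = tok;
    return count;
}
```

<p>A caller would pass <code>getenv("XML_CATALOG_FILES")</code> as the first
argument; taking the value as a parameter keeps the helper easy to test.</p>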
<h2><a name="validate">How to debug catalog processing:</a></h2>
<p>Setting the <code>XML_DEBUG_CATALOG</code> environment variable will
make libxml output debugging information for each catalog operation, for
example:</p>
<pre>orchis:~/XML -&gt; xmllint --memory --noout test/ent2
warning: failed to load external entity "title.xml"
orchis:~/XML -&gt; export XML_DEBUG_CATALOG=
orchis:~/XML -&gt; xmllint --memory --noout test/ent2
Failed to parse catalog /etc/xml/catalog
Failed to parse catalog /etc/xml/catalog
warning: failed to load external entity "title.xml"
Catalogs cleanup
orchis:~/XML -&gt; </pre>
<p>The test/ent2 file references an entity; running the parser from memory
makes the base URI unavailable, so the "title.xml" entity cannot be loaded.
Setting the debug environment variable reveals that an attempt is made to
load <code>/etc/xml/catalog</code>, but since it's not present, the
resolution fails. </p>
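<p>As the session above shows, <code>XML_DEBUG_CATALOG</code> takes effect
even when set to an empty value, so it is the variable's presence that
matters. A minimal C sketch of that convention follows; the
<code>catalog_trace()</code> helper is hypothetical and only illustrates
presence-gated tracing, not libxml's internal logging.</p>

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical tracing helper. debug_env is the value of
 * getenv("XML_DEBUG_CATALOG"): any non-NULL value, including "",
 * enables output. Returns the number of characters written, 0 if disabled.
 */
static int catalog_trace(const char *debug_env, FILE *out, const char *fmt, ...)
{
    va_list ap;
    int written;

    assert(out != NULL && fmt != NULL);
    if (debug_env == NULL)
        return 0;
    va_start(ap, fmt);
    written = vfprintf(out, fmt, ap);
    va_end(ap);
    return written;
}
```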
<p>But the most advanced way to debug XML catalog processing is to use the
<strong>xmlcatalog</strong> command shipped with libxml2: it lets you load
catalogs and make resolution queries to see what is going on. It is also
used for the regression tests:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
orchis:~/XML -&gt; </pre>
<p>For debugging what is going on, adding one -v flag increases the verbosity
level to show the processing done (adding a second flag also shows which
elements are recognized during parsing):</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog -v test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
Parsing catalog test/catalogs/docbook.xml's content
Found public match -//OASIS//DTD DocBook XML V4.1.2//EN
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
Catalogs cleanup
orchis:~/XML -&gt; </pre>
<p>A shell interface is also available to debug and process multiple queries
(and for regression tests):</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog -shell test/catalogs/docbook.xml "-//OASIS//DTD DocBook XML V4.1.2//EN"
&gt; help
Commands available:
public PublicID: make a PUBLIC identifier lookup
system SystemID: make a SYSTEM identifier lookup
resolve PublicID SystemID: do a full resolver lookup
add 'type' 'orig' 'replace' : add an entry
del 'values' : remove values
dump: print the current catalog state
debug: increase the verbosity level
quiet: decrease the verbosity level
exit: quit the shell
&gt; public "-//OASIS//DTD DocBook XML V4.1.2//EN"
http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd
&gt; quit
orchis:~/XML -&gt; </pre>
<p>This should be sufficient for most debugging purposes; it was actually
used heavily to debug the XML Catalog implementation itself.</p>
<h2><a name="Declaring">How to create and maintain</a> catalogs:</h2>
<p>Basically, XML Catalogs are XML files, so you can either use XML tools to
manage them or use <strong>xmlcatalog</strong>. The first step is to create
a catalog; the <code>--create</code> option provides this facility:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --create tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
"http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
orchis:~/XML -&gt; </pre>
<p>By default, xmlcatalog does not overwrite the original catalog and saves
the result on the standard output; this can be overridden using the
<code>--noout</code> option. The <code>--add</code> command adds entries to
the catalog:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --noout --create --add "public" "-//OASIS//DTD DocBook XML V4.1.2//EN" http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd tst.xml
orchis:~/XML -&gt; cat tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
&lt;public publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
uri="http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"/&gt;
&lt;/catalog&gt;
orchis:~/XML -&gt; </pre>
<p>The <code>--add</code> option always takes three parameters, even though
some XML Catalog constructs (like nextCatalog) have only a single argument;
in that case, just pass an empty string as the third parameter and it will be
ignored.</p>
<p>Similarly, the <code>--del</code> option removes matching entries from the
catalog:</p>
<pre>orchis:~/XML -&gt; ./xmlcatalog --del "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" tst.xml
&lt;?xml version="1.0"?&gt;
&lt;!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"&gt;
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"/&gt;
orchis:~/XML -&gt; </pre>
<p>The catalog is now empty. Note that the matching of <code>--del</code> is
exact, and it would have worked in a similar fashion with the Public ID
string.</p>
<p>This is rudimentary, but it should be sufficient to manage a catalog tree
of resources that is not too complex.</p>
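<p>The <code>--add</code> and <code>--del</code> semantics described above
can be modelled with a small in-memory catalog. The following C sketch is
hypothetical (the <code>Catalog</code> and <code>Entry</code> types and the
<code>catalog_add()</code> and <code>catalog_del()</code> functions are not
libxml's API): every addition takes three values, the third being ignored
for <code>nextCatalog</code>, and a deletion removes every entry whose
identifier or replacement matches the given string exactly.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 32

/* Hypothetical in-memory catalog: each entry mirrors one --add call. */
typedef struct {
    const char *type;     /* "public", "system", "nextCatalog", ... */
    const char *orig;     /* identifier or prefix being mapped */
    const char *replace;  /* replacement URI, NULL for one-argument types */
} Entry;

typedef struct {
    Entry entries[MAX_ENTRIES];
    size_t count;
} Catalog;

/* Always takes three values; the third is ignored for nextCatalog. */
static int catalog_add(Catalog *cat, const char *type, const char *orig,
                       const char *replace)
{
    if (cat->count == MAX_ENTRIES)
        return -1;                       /* catalog full */
    Entry *e = &cat->entries[cat->count++];
    e->type = type;
    e->orig = orig;
    e->replace = strcmp(type, "nextCatalog") == 0 ? NULL : replace;
    return 0;
}

/* Remove every entry whose orig or replace value matches exactly. */
static size_t catalog_del(Catalog *cat, const char *value)
{
    size_t kept = 0, removed = 0;

    assert(cat != NULL && value != NULL);
    for (size_t i = 0; i < cat->count; i++) {
        const Entry *e = &cat->entries[i];
        int match = strcmp(e->orig, value) == 0 ||
                    (e->replace != NULL && strcmp(e->replace, value) == 0);
        if (match)
            removed++;
        else
            cat->entries[kept++] = *e;   /* compact the surviving entries */
    }
    cat->count = kept;
    return removed;
}
```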
<h2><a name="implemento">The implementor corner, a quick review of the
API:</a></h2>
<p>@@TODO@@</p>
<h2><a name="Other">Other resources</a></h2>
<p>The XML Catalog specification is relatively recent, so there isn't much
literature to point at:</p>
<ul>
<li>You can find a good rant from Norm Walsh about <a
href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the
need for catalogs</a>; it provides a lot of context information, even if
I don't agree with everything presented.</li>
<li>An <a href="http://home.ccil.org/~cowan/XML/XCatalog.html">old XML
catalog proposal</a> from John Cowan</li>
<li>The <a href="http://www.rddl.org/">Resource Directory Description
Language</a> (RDDL), another catalog system, more oriented toward
providing metadata for XML namespaces.</li>
<li>The page of the OASIS Technical <a
href="http://www.oasis-open.org/committees/entity/">Committee on Entity
Resolution</a>, which maintains XML Catalogs; you will find pointers to
specification updates, some background, and pointers to other tools
providing XML Catalog support.</li>
</ul>
<p>If you have suggestions for corrections or additions, simply contact
me:</p>
<p><a href="mailto:daniel@veillard.com">Daniel Veillard</a></p>
<p>$Id:$</p>
</body>
</html>