diff --git a/ChangeLog b/ChangeLog
index d1b6b7ae..3e10d5f6 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+Sat Dec 28 15:55:32 CET 2002 Daniel Veillard
+
+ * doc/guidelines.html: added a document on guildeline for
+ publishing and deploying XML
+
 Fri Dec 27 20:35:15 CET 2002 Daniel Veillard
 
 * valid.c xmlreader.c: final touch running DTD validation
diff --git a/doc/guidelines.html b/doc/guidelines.html
new file mode 100644
index 00000000..08cdf267
--- /dev/null
+++ b/doc/guidelines.html
@@ -0,0 +1,364 @@

XML resources publication guidelines

The goal of this document is to provide a set of guidelines and tips to help with the publication and deployment of XML resources for the GNOME project. However, it is not tied to GNOME and might be helpful more generally. I welcome feedback on this document.

The intended audience is software developers who have started using XML for some of the resources of their project, as a storage format, for data exchange, checking or transformations. An increasing number of new XML formats have been defined, but not all steps have been taken, possibly for lack of documentation, to truly gain all the benefits of using XML. These guidelines hope to improve matters and provide a better overview of overall XML processing and the associated steps needed to deploy it successfully:

Table of contents:

  1. Design guidelines
  2. Canonical URL
  3. Catalog setup
  4. Package integration

Design guidelines

This part focuses on the format of the XML itself. These guidelines may arrive a bit too late, since the structure of the document may already be cast in existing and deployed code. Still, here are a few rules which might be helpful when designing a new XML vocabulary or revising an existing format:

Reuse existing formats:

This may sound a bit simplistic, but before designing your own format, try to look up existing XML vocabularies for similar data. Ideally this allows you to reuse them, in which case a lot of the existing tools like DTDs, schemas and stylesheets may already be available. If you are looking at a documentation format, DocBook should handle your needs. If reuse is not possible because some semantic or use case aspects are too different, the survey will still help you avoid design errors like targeting the vocabulary at the wrong abstraction level. In this format design phase, try to be synthetic, be sure to express the real content of your data, and use the XML structure to express the semantics and context of that data.

DTD rules:

Building a DTD (Document Type Definition) or a Schema describing the structure allowed in instances is the core of the design process of the vocabulary. Here are a few tips:

Versioning:

As part of the design, make sure the structure you define will be usable for future extensions that you may not be considering for the current version. There are 2 parts to this:

Other design parts:

While defining your vocabulary, try to think in terms of other uses of your data, for example how XSLT stylesheets could be used to make an HTML view of your data, or to convert it into a different format. Checking XML Schemas and looking at defining an XML Schema with more complete validation and datatyping of your data structures are important; this helps avoid some mistakes in the design phase.

Namespace:

If you expect your XML vocabulary to be used or recognized outside of your application (for example binding specific processing from a graphical shell like Nautilus to instances of your data) then you should really define an XML namespace for your vocabulary. A namespace name is a URL (an absolute URI, more precisely). It is generally recommended to anchor it as an HTTP resource on a server associated with the software project; see the next section about this. In practice this means that XML parsers will not handle your element names as-is but as a pair made of the namespace name and the element name. This allows processing to recognize and disambiguate them. Uniqueness of the namespace name can for the most part be guaranteed by the use of the DNS registry. Namespaces can also be used to carry versioning information, like:

"http://www.gnome.org/project/projectname/1.0/"

and an easy way to use them is to make them the default namespace on the root element of the XML instance, like:

<structure xmlns="http://www.gnome.org/project/projectname/1.0/">
  <data>
  ...
  </data>
</structure>

In that document, structure and all descendant elements like data are in the given namespace.
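The effect of the default namespace can be observed from any namespace-aware parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which is only one possible tool and not something this document prescribes; the element content and the split_name helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace name taken from the example above; the "hello" content is made up.
NS = "http://www.gnome.org/project/projectname/1.0/"
DOC = f'<structure xmlns="{NS}"><data>hello</data></structure>'


def split_name(tag):
    """Split ElementTree's '{namespace}local' tag into a (namespace, local) pair."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    # No namespace applies to this element.
    return None, tag


root = ET.fromstring(DOC)
# Walk the root and all descendants in document order.
names = [split_name(element.tag) for element in root.iter()]
for namespace, local in names:
    print(local, "is in namespace", namespace)
```

Both structure and data are reported with the same namespace name, even though only the root element carries the xmlns attribute.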


Canonical URL

As seen in the previous namespace section, while XML processing is not tied to the Web, there is a natural synergy between the two. XML was designed to be available on the Web, and keeping the infrastructure that way helps in deploying XML resources. The core of this issue is the notion of the "Canonical URL" of an XML resource. The resource can be an XML document, a DTD, a stylesheet, a schema, or even non-XML data associated with an XML resource; the canonical URL is the URL where the "master" copy of that resource is expected to be present on the Web. Usually when processing XML a copy of the resource will be present on the local disk, maybe in /usr/share/xml or /usr/share/sgml, maybe in /opt or even in C:\projectname\ (horror!). The key point is that the way to name that resource should be independent of the actual place where it resides on disk, if it is available there at all, and that the processing will still work if there is no local copy (provided that the machine where the processing takes place is connected to the Internet).

What this really means is that one should never use the local name of a resource to reference it, but always use the canonical URL. For example, in a DocBook instance the following should not be used:

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
 "/usr/share/xml/docbook/4.2/docbookx.dtd">

But always reference the canonical URL for the DTD:

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
 "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
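A project could check this rule automatically before publishing its documents. The sketch below is only an illustration under stated assumptions: the regular expression handles just the double-quoted PUBLIC form shown above, and the function name system_id_is_canonical is hypothetical. It treats a system identifier as canonical when it is an absolute http or https URL.

```python
import re
from urllib.parse import urlparse

# Matches the PUBLIC form of a DOCTYPE declaration with double-quoted
# identifiers: root element name, public identifier, system identifier.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"\s*(?:\[|>)'
)


def system_id_is_canonical(document_text):
    """Return True if the DOCTYPE system identifier is an absolute http(s) URL."""
    match = DOCTYPE_RE.search(document_text)
    if match is None:
        raise ValueError("no PUBLIC DOCTYPE declaration found")
    parsed = urlparse(match.group(2))
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


LOCAL_DOC = (
    '<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"\n'
    ' "/usr/share/xml/docbook/4.2/docbookx.dtd">'
)
CANONICAL_DOC = (
    '<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"\n'
    ' "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">'
)

print("local path accepted:", system_id_is_canonical(LOCAL_DOC))
print("canonical URL accepted:", system_id_is_canonical(CANONICAL_DOC))
```

A real check would rather ask the XML parser for the system identifier than rely on a regular expression; the point here is only the distinction between a local path and a canonical URL.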
Similarly, the document instance may reference the XSLT stylesheets needed to process it to generate HTML, and the canonical URL should be used:

<?xml-stylesheet
  href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"
  type="text/xsl"?>

Defining the canonical URL for the resources needed should obey a few simple rules similar to those used to design namespace names:

Catalog setup

How catalogs work:

Catalogs are the technical mechanism which allows XML processing tools to use a local copy of a resource if it is available, even if the instance document references the canonical URL. XML Catalogs are anchored in the root catalog (usually /etc/xml/catalog or defined by the user). They are a tree of XML documents defining the mappings between the canonical naming space and the locally installed ones; this can be seen as a static cache structure.

When the XML processor is asked to process a resource it will automatically test for a locally available version in the catalog, starting from the root catalog, and possibly fetching sub-catalog resources, until it determines whether the catalog has that resource or not. If not, the default processing of fetching the resource from the Web is done, allowing in most cases to recover from a catalog miss. The key point is that the document instances are totally independent of the availability of a catalog and of the actual place where the local resources they reference may be installed. This greatly improves the management of the documents in the long run, making them independent of the platform or toolchain used to process them.

Usual catalog setup:

Usually catalogs for a project are set up as a 2-level hierarchical cache, the root catalog containing only "delegates" indicating a separate subcatalog dedicated to the project. The goal is to keep the root catalog clean and simplify the maintenance of the catalog by using separate catalogs per project. For example, when creating a catalog for the XHTML1 DTDs, only 3 items are added to the root catalog:

  <delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/>
  <delegateSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/>
  <delegateURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/>

They are all "delegates", meaning that if the catalog system is asked to resolve a reference corresponding to them, it has to look up a subcatalog. Here the subcatalog was installed as /usr/share/sgml/xhtml1/xmlcatalog in the local tree. That decision is left to the sysadmin or the packager for that system and may obey different rules, but the actual place on the filesystem (or on a resource cache on the local network) will not influence the processing as long as it is available. The first rule indicates that if the reference uses a PUBLIC identifier beginning with the

"-//W3C//DTD XHTML 1.0"

substring, then the catalog lookup should be limited to the specific given subcatalog. Similarly, the second and third entries indicate the delegation rules for SYSTEM, DOCTYPE or normal URI references when the URL starts with the "http://www.w3.org/TR/xhtml1/DTD" substring, which indicates the location on the W3C server where the XHTML1 resources are stored; it is the beginning of all Canonical URLs for XHTML1 resources. Those 3 rules are sufficient in practice to capture all references to XHTML1 resources and direct the processing tools to the right subcatalog.
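The prefix matching performed for delegates can be sketched in a few lines. This is an illustration of the matching logic only, not the code used by libxml2: the find_delegates function and the PREFIX_ATTR table are hypothetical names, the three entries are wrapped in a catalog element so they can be parsed standalone, and the normalization of public identifiers done by real catalog processors is left out. Matching entries are ordered longest prefix first, as the OASIS XML Catalogs specification requires.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# The three delegate entries from the example, wrapped in a catalog element.
ROOT_CATALOG = f"""<catalog xmlns="{CATALOG_NS}">
  <delegatePublic publicIdStartString="-//W3C//DTD XHTML 1.0"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/>
  <delegateSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/>
  <delegateURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
                  catalog="file:///usr/share/sgml/xhtml1/xmlcatalog"/>
</catalog>"""

# Attribute holding the prefix for each kind of delegate entry.
PREFIX_ATTR = {
    "delegatePublic": "publicIdStartString",
    "delegateSystem": "systemIdStartString",
    "delegateURI": "uriStartString",
}


def find_delegates(catalog_xml, kind, identifier):
    """Return the subcatalogs whose prefix matches identifier, longest prefix first."""
    root = ET.fromstring(catalog_xml)
    matches = []
    for entry in root.findall(f"{{{CATALOG_NS}}}{kind}"):
        prefix = entry.get(PREFIX_ATTR[kind])
        if identifier.startswith(prefix):
            matches.append((len(prefix), entry.get("catalog")))
    # Longest matching prefix is consulted first.
    matches.sort(key=lambda match: match[0], reverse=True)
    return [catalog for _, catalog in matches]


print(find_delegates(ROOT_CATALOG, "delegatePublic",
                     "-//W3C//DTD XHTML 1.0 Strict//EN"))
print(find_delegates(ROOT_CATALOG, "delegateSystem",
                     "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"))
```

The first call finds the XHTML1 subcatalog; the second finds nothing, because the DocBook URL does not start with any registered prefix, so the processor would move on to its other rules or to the Web.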


A subcatalog example:


Here is the complete subcatalog used for XHTML1:

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
          "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/>
  <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="xhtml1-20020801/DTD/xhtml1-transitional.dtd"/>
  <public publicId="-//W3C//DTD XHTML 1.0 Frameset//EN"
          uri="xhtml1-20020801/DTD/xhtml1-frameset.dtd"/>
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/>
  <rewriteURI uriStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/>
</catalog>

There are a few things to notice:

Those 5 rules are sufficient to cover all references to the resources held at the Canonical URL for the XHTML1 DTDs.
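To make the effect of these rules concrete, here is a small sketch of how a public entry and a rewriteSystem entry map a reference to a local file. It is a simplification and not the libxml2 implementation: the resolve function is a hypothetical name, only two of the five entries are reproduced, the prefer attribute is ignored, and the returned paths are left relative to the location of the catalog file instead of being made absolute.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Two of the five entries of the XHTML1 subcatalog shown above.
SUBCATALOG = f"""<catalog xmlns="{CATALOG_NS}">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="xhtml1-20020801/DTD/xhtml1-strict.dtd"/>
  <rewriteSystem systemIdStartString="http://www.w3.org/TR/xhtml1/DTD"
          rewritePrefix="xhtml1-20020801/DTD"/>
</catalog>"""


def qualified(name):
    """Build the '{namespace}name' form ElementTree uses for catalog elements."""
    return f"{{{CATALOG_NS}}}{name}"


def resolve(catalog_xml, public_id=None, system_id=None):
    """Return the catalog-relative local path, or None on a catalog miss."""
    root = ET.fromstring(catalog_xml)
    # System identifiers are tried first: replace the matching prefix.
    if system_id is not None:
        for entry in root.findall(qualified("rewriteSystem")):
            prefix = entry.get("systemIdStartString")
            if system_id.startswith(prefix):
                return entry.get("rewritePrefix") + system_id[len(prefix):]
    # Then public identifiers, which must match exactly.
    if public_id is not None:
        for entry in root.findall(qualified("public")):
            if entry.get("publicId") == public_id:
                return entry.get("uri")
    return None


print(resolve(SUBCATALOG,
              system_id="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))
print(resolve(SUBCATALOG, public_id="-//W3C//DTD XHTML 1.0 Strict//EN"))
print(resolve(SUBCATALOG, system_id="http://example.org/other.dtd"))
```

A None result corresponds to a catalog miss, in which case the processor falls back to fetching the resource from its canonical URL.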


Package integration

Creating and removing catalogs should be handled as part of the process of (un)installing the local copy of the resources. The catalog files, being XML resources, should be processed with XML-based tools to avoid problems with the generated files. The xmlcatalog command that comes with libxml2 can create catalogs and add or remove rules at that time. Here is a complete example from the RPM post-install script for the XHTML1 DTDs:

%post
CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
#
# Register it in the super catalog with the appropriate delegates
#
ROOTCATALOG=/etc/xml/catalog

if [ ! -r $ROOTCATALOG ]
then
    /usr/bin/xmlcatalog --noout --create $ROOTCATALOG
fi

if [ -w $ROOTCATALOG ]
then
        /usr/bin/xmlcatalog --noout --add "delegatePublic" \
                "-//W3C//DTD XHTML 1.0" \
                "file://$CATALOG" $ROOTCATALOG
        /usr/bin/xmlcatalog --noout --add "delegateSystem" \
                "http://www.w3.org/TR/xhtml1/DTD" \
                "file://$CATALOG" $ROOTCATALOG
        /usr/bin/xmlcatalog --noout --add "delegateURI" \
                "http://www.w3.org/TR/xhtml1/DTD" \
                "file://$CATALOG" $ROOTCATALOG
fi

The XHTML1 subcatalog is not created on-the-fly in that case; it is installed as part of the files of the package. So the only work needed is to make sure the root catalog exists and register the delegate rules.

Similarly, the post-uninstall script just removes the rules from the catalog:

%postun
#
# On removal, unregister the xmlcatalog from the supercatalog
#
if [ "$1" = 0 ]; then
    CATALOG=/usr/share/sgml/xhtml1/xmlcatalog
    ROOTCATALOG=/etc/xml/catalog

    if [ -w $ROOTCATALOG ]
    then
            /usr/bin/xmlcatalog --noout --del \
                    "-//W3C//DTD XHTML 1.0" $ROOTCATALOG
            /usr/bin/xmlcatalog --noout --del \
                    "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
            /usr/bin/xmlcatalog --noout --del \
                    "http://www.w3.org/TR/xhtml1/DTD" $ROOTCATALOG
    fi
fi

Note the test against $1: RPM sets it to 0 in %postun only when the package is being erased for good, so the test is needed to avoid removing the delegate rules when the package is merely upgraded.

Following the set of guidelines and tips provided in this document should help deploy XML resources in the GNOME framework without much pain and ensure a smooth evolution of the resources and instances.

Daniel Veillard

$Id$