<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta name="copyright" content="Copyright (C) 2007 Henry S. Thompson"/><meta http-equiv="Content-type" content="text/html; charset=utf-8"/><style type="text/css">
       pre.code {font-family: monospace}
       pre {margin-left: 0em}
       ul.naked li { list-style-type: none }
       ol ol {list-style-type: lower-alpha}
       .math {font-family: 'Arial Unicode MS', 'Lucida Sans Unicode', serif}
       .sub {font-size: 80%; vertical-align: sub}
       .termref {text-decoration: none; color: #606000}
       div.toc h2 {font-size: 120%; margin-top: 0em; margin-bottom: 0em}
       div.toc h4 {font-size: 100%; margin-top: 0em; margin-bottom: 0em;
                   margin-left: 1em}
       div.toc h1 {font-size: 140%; margin-bottom: 0em}
       div.toc ul {margin-top: 1ex}
       .byline {font-size: 120%}
    </style><title>Review of XHTML Modularization</title></head><body style="font-family: times; background: rgb(254,250,246)">
 <div style="text-align: center">
  <h1>Review of XHTML Modularization</h1>
  <div class="byline">Henry S. Thompson</div>
  <div class="byline">22 October 2007</div>
 </div>
 <div class="toc"><h1>Table of Contents</h1><ul class="naked"><li><h2>1.  <a href="#background">Background</a></h2></li><li><h2>2.  <a href="#asPublished">The published approach</a></h2></li><li><h2>3.  <a href="#substGroup">Changing over to using substitution groups</a></h2></li><li><h2>4.  <a href="#gnbn">The bad news and the good news</a></h2></li><li><h2>5.  <a href="#morebn">The other bad news</a></h2></li><li><h2>6.  <a href="#trial">File-per-element-type trial</a></h2></li><li><h2>7.  <a href="#results">Outcome of trial</a></h2></li></ul></div>
  <div>
   <h2>1.  <a name="background">Background</a></h2>
   <p>As part of our work on <a href="http://www.w3.org/2001/tag/group/track/issues/41">TAG issue XMLVersioning-41</a>, I took <a href="http://www.w3.org/2001/tag/group/track/actions/15">an action to review</a> the mechanisms used by <a href="http://www.w3.org/TR/2006/WD-xhtml-modularization-20060705/">XHTML Modularization</a>.  In particular, we were interested in exploring the potential for using substitution groups as a modularization/extensibility mechanism.</p>
   <p>In what follows I concentrate on elements and content models; attributes
present challenges which are at least in part distinct.</p>
  </div>
  <div>
   <h2>2.  <a name="asPublished">The published approach</a></h2>
   <p>The published set of schema documents is intended to be combined with a
driver that imports/includes a selected subset.  For example, <a href="http://www.w3.org/TR/2006/WD-xhtml-modularization-20060705/SCHEMA/xhtml11.xsd">this driver document</a> will assemble a schema corresponding closely to <a href="http://www.w3.org/TR/2002/REC-xhtml1-20020801/">XHTML10</a> Strict (the difference is the addition of the Ruby Basic module, providing the <code>rb</code>, <code>rp</code>, <code>rt</code> and <code>ruby</code> elements).  There is also <a href="http://www.w3.org/TR/2006/WD-xhtml-modularization-20060705/SCHEMA/xhtml-basic10.xsd">a driver</a> which approximates <a href="http://www.w3.org/TR/2000/REC-xhtml-basic-20001219/">XHTML Basic 1.0</a>. In total, the 48 modules define 81 element types (86 are <i>used</i>, but five of these are missing definitions!).</p>
   <p>The general paradigm is that the published modules define content models
chiefly by reference to named groups.  To make a change, users are expected to
<b>redefine</b> groups to add or remove elements, <i>ad lib</i>.</p>
   <p>For example, here's how the published definition specifies the content model for the
<code>body</code> element:</p>
   <blockquote><div><pre class="code">    &lt;xs:element
        name="body"
        type="xhtml.body.type"/&gt;

    &lt;xs:complexType
        name="xhtml.body.type"&gt;
        &lt;xs:group ref="xhtml.body.content"/&gt;
        &lt;xs:attributeGroup ref="xhtml.body.attlist"/&gt;
    &lt;/xs:complexType&gt;

    &lt;xs:group
        name="xhtml.body.content"&gt;
        &lt;xs:sequence&gt;
            &lt;xs:group ref="xhtml.Block.mix"
                maxOccurs="unbounded"/&gt;
        &lt;/xs:sequence&gt;
    &lt;/xs:group&gt;

    &lt;xs:group
        name="xhtml.Block.mix"&gt;
        &lt;xs:choice&gt;
            &lt;xs:group ref="xhtml.Heading.class"/&gt;
            &lt;xs:group ref="xhtml.List.class"/&gt;
            &lt;xs:group ref="xhtml.Block.class"/&gt;
            &lt;xs:group ref="xhtml.Misc.class"/&gt;
        &lt;/xs:choice&gt;
    &lt;/xs:group&gt;

    &lt;xs:group
        name="xhtml.List.class"&gt;
        &lt;xs:choice&gt;
            &lt;xs:element name="ul" type="xhtml.ul.type"/&gt;
            &lt;xs:element name="ol" type="xhtml.ol.type"/&gt;
            &lt;xs:element name="dl" type="xhtml.dl.type"/&gt;
        &lt;/xs:choice&gt;
    &lt;/xs:group&gt;
</pre></div></blockquote>
   <p>In order to add your own element to this, you would have to build your
<i>own</i> driver document, which included:</p>
   <blockquote><div><pre class="code">&lt;xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3.org/1999/xhtml"        <b>[1]</b>
           xmlns:mine="http://www.example.com/mine"
           xmlns="http://www.w3.org/1999/xhtml"&gt;
    &lt;xs:import namespace="http://www.example.com/mine"/&gt;      <b>[2]</b>

    &lt;xs:redefine schemaLocation="http://www.w3.org/MarkUp/SCHEMA/xhtml11-model-1.xsd"&gt;
      &lt;xs:group name="xhtml.Misc.class"&gt;
        &lt;xs:choice&gt;
          &lt;xs:group ref="xhtml.Misc.class"/&gt;                  <b>[3]</b>
          &lt;xs:element ref="mine:newfangled"/&gt;
        &lt;/xs:choice&gt;
      &lt;/xs:group&gt;
    &lt;/xs:redefine&gt;</pre></div></blockquote>
   <p>Notes:</p>
   <ul class="naked">
    <li><b>1</b>  We have to claim to be defining the schema for the XHTML
namespace, because this is the driver document which we will hand to validators
for entire instance documents.</li>
    <li><b>2</b>  I'm assuming that I've defined my own stuff in my own
namespace, including an element whose name is <code>newfangled</code>.  This
import allows me to reference it.</li>
    <li><b>3</b>  Redefining a group in terms of itself is a way of
<i>adding</i> things to the content model it defines.</li>
   </ul>
   <p>The net result of all this is that a document such as the following</p>
   <blockquote><div><pre class="code">&lt;html xmlns="http://www.w3.org/1999/xhtml" 
  xmlns:my="http://www.example.com/mine"&gt;
    . . .
    &lt;body&gt;
     &lt;div&gt;. . .&lt;/div&gt;
     &lt;my:newfangled&gt;. . .&lt;/my:newfangled&gt;
    &lt;/body&gt;
&lt;/html&gt;</pre></div></blockquote>
   <p>would be schema-valid per the schema corresponding to my driver schema
document as shown above.</p>
  </div>
  <div>
   <h2>3.  <a name="substGroup">Changing over to using substitution groups</a></h2>   
   <p>In the published modules, almost all the groups are choice groups
(disjunctions): the only five substantive sequences are for
<code>frameset</code>, <code>head</code>, <code>html</code>, <code>ruby</code>
and <code>table</code>.  Since substitution groups provide a low-overhead,
non-intrusive way of adding to a disjunction, this looks encouraging.  Let's
see what our example would look like converted to substitution groups.  First
we simplify things in the published modules, using abstract elements wherever
the original uses choice groups:</p>
   <blockquote><div><pre class="code">&lt;xs:complexType name="xhtml.body.type"&gt;
&lt;xs:sequence&gt;
 &lt;xs:element ref="Body.mix" minOccurs="0" maxOccurs="unbounded"/&gt;
&lt;/xs:sequence&gt;
&lt;/xs:complexType&gt;

&lt;xs:element name="Body.mix" abstract="true"/&gt;

&lt;xs:element name="Heading.class" abstract="true" substitutionGroup="Body.mix"/&gt;
&lt;xs:element name="List.class" abstract="true" substitutionGroup="Body.mix"/&gt;
&lt;xs:element name="Block.class" abstract="true" substitutionGroup="Body.mix"/&gt;
&lt;xs:element name="Misc.class" abstract="true" substitutionGroup="Body.mix"/&gt;

&lt;xs:element name="ul" type="xhtml.ul.type" substitutionGroup="List.class"/&gt;
&lt;xs:element name="ol" type="xhtml.ol.type" substitutionGroup="List.class"/&gt;
&lt;xs:element name="dl" type="xhtml.dl.type" substitutionGroup="List.class"/&gt;</pre></div></blockquote>
   <p>Now we can do everything we need to do to add our own element in our own
schema document:</p>
   <blockquote><div><pre class="code">&lt;xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/mine"
           xmlns:xhtml="http://www.w3.org/1999/xhtml"&gt;
  &lt;xs:import namespace="http://www.w3.org/1999/xhtml"/&gt;

  &lt;xs:element name="newfangled" substitutionGroup="xhtml:Misc.class"&gt;
   . . . 
  &lt;/xs:element&gt;
&lt;/xs:schema&gt;</pre></div></blockquote>
   <p>That's it.  Looks like a win to me.</p>
  </div>
  <div>
   <h2>4.  <a name="gnbn">The bad news and the good news</a></h2>
   <p>It doesn't work.  Yet.  XML Schema 1.0 allows an element to be in only
one substitution group.  But some elements in the published XHTML11 modules are
directly in several groups.  For example, <code>b</code> is in both the
<code>InlinePre.mix</code> group and the <code>InlPres.class</code> group.  So
the blanket replacement of groups with abstract elements as substitution group
heads cannot be expressed in XML Schema 1.0, as the
<code>b</code> element declaration would have to name <i>two</i> elements
as its substitution group head.  The good news is that XML Schema 1.1 allows
multiple substitution group heads, and XHTML11 Modularization is still in Last
Call, so they <i>could</i> shift to XML Schema 1.1.</p>
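   <p>As a minimal sketch of what that would allow, assuming the two groups are
carried over as abstract elements in the same way as in the previous section,
and assuming <code>b</code> has a type named <code>xhtml.InlPres.type</code>,
the declaration might look like this under XML Schema 1.1 (it is not valid
XML Schema 1.0):</p>
   <blockquote><div><pre class="code">&lt;!-- Abstract stand-ins for the two groups (names assumed, following section 3) --&gt;
&lt;xs:element name="InlinePre.mix" abstract="true"/&gt;
&lt;xs:element name="InlPres.class" abstract="true"/&gt;

&lt;!-- XML Schema 1.1 only: substitutionGroup is a space-separated list of heads --&gt;
&lt;xs:element name="b" type="xhtml.InlPres.type"
            substitutionGroup="InlinePre.mix InlPres.class"/&gt;</pre></div></blockquote>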
  </div>
  <div>
   <h2>5.  <a name="morebn">The other bad news</a></h2>
   <p>Substitution groups are great for devolved, bottom-up extensibility.  The
design pattern suggested by the above example is elegant and easy to use, for a
language intended to be open to user extension across a broad front.  But XHTML
modularization has at least <i>two</i> goals:</p>
   <ol>
    <li>Support user extensions;</li>
    <li>Support subsetting.</li>
   </ol>
   <p>Substitution groups do nothing for the second goal.  For example, in the
XHTML Basic driver, only <code>title</code>, <code>base</code>, <code>meta</code>, <code>link</code> and
<code>object</code> are allowed inside <code>head</code>, whereas in full
XHTML11, <code>script</code> and <code>style</code> are allowed as well.  This
is accomplished by having different definitions for the
<code>HeadOpts.mix</code> group in the respective driver documents.  There is
no straightforward bottom-up equivalent to this top-down approach to customization.</p>
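   <p>For illustration, the two drivers differ roughly along the following
lines.  This is a sketch, not a copy of the published files: the type names are
assumed to follow the <code>xhtml.ul.type</code> naming pattern seen earlier,
and <code>title</code> and <code>base</code> are assumed to be handled
elsewhere in the <code>head</code> content model.</p>
   <blockquote><div><pre class="code">    &lt;!-- XHTML Basic driver (sketch) --&gt;
    &lt;xs:group name="xhtml.HeadOpts.mix"&gt;
        &lt;xs:choice&gt;
            &lt;xs:element name="meta" type="xhtml.meta.type"/&gt;
            &lt;xs:element name="link" type="xhtml.link.type"/&gt;
            &lt;xs:element name="object" type="xhtml.object.type"/&gt;
        &lt;/xs:choice&gt;
    &lt;/xs:group&gt;

    ---------

    &lt;!-- Full XHTML11 driver (sketch): the same group, with two more members --&gt;
    &lt;xs:group name="xhtml.HeadOpts.mix"&gt;
        &lt;xs:choice&gt;
            &lt;xs:element name="script" type="xhtml.script.type"/&gt;
            &lt;xs:element name="style" type="xhtml.style.type"/&gt;
            &lt;xs:element name="meta" type="xhtml.meta.type"/&gt;
            &lt;xs:element name="link" type="xhtml.link.type"/&gt;
            &lt;xs:element name="object" type="xhtml.object.type"/&gt;
        &lt;/xs:choice&gt;
    &lt;/xs:group&gt;</pre></div></blockquote>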
  </div>
  <div>
   <h2>6.  <a name="trial">File-per-element-type trial</a></h2>
   <p>The previous section notwithstanding, maybe it's worth trying a complete
redesign along the following lines:</p>
   <ol>
    <li>One directory per module</li>
    <li>Module files contain only type definitions, abstract element declarations and attribute group definitions</li>
    <li>Type definitions use abstract elements wherever appropriate, as discussed above</li>
    <li>Each concrete element type has its own file, which simply declares
itself as in the subst group of its abstract equivalent</li>
    <li>Alternative drivers at top and module levels for subsetting</li>
   </ol>
   <p>Mixed or simple starred content models always just star an abstract element,
with the type mixed or not as appropriate.  These abstract elements will in turn be cited
as substitution group heads by the appropriate <code>.mix</code> (or other) group equivalents.</p>
   <p>So for example from</p>
   <blockquote><div><pre class="code">    &lt;xs:group name="xhtml.li.content"&gt;
        &lt;xs:sequence&gt;
            &lt;xs:group ref="xhtml.Flow.mix" minOccurs="0" maxOccurs="unbounded"/&gt;
        &lt;/xs:sequence&gt;
    &lt;/xs:group&gt;

    &lt;xs:complexType name="xhtml.li.type" mixed="true"&gt;
        &lt;xs:group ref="xhtml.li.content"/&gt;
        &lt;xs:attributeGroup ref="xhtml.li.attlist"/&gt;
    &lt;/xs:complexType&gt;
</pre></div></blockquote>
   <p>we would want, in two separate documents:</p>
   <blockquote><div><pre class="code">    &lt;xs:complexType name="xhtml.li.type" mixed="true"&gt;
     &lt;xs:sequence&gt;
      &lt;xs:element ref="xhtml.li.content" minOccurs="0" maxOccurs="unbounded"/&gt;
     &lt;/xs:sequence&gt;
     &lt;xs:attributeGroup ref="xhtml.li.attlist"/&gt;
    &lt;/xs:complexType&gt;
    
    &lt;xs:element name="xhtml.li.content" abstract="true"/&gt;

    ---------
    
    &lt;xs:element name="xhtml.Flow.mix" abstract="true" substitutionGroup="... xhtml.li.content ..."/&gt;</pre></div></blockquote>
   <p>There are two ways to handle the element types themselves:</p>
   <ol>
    <li>Use abstract names, e.g. <code>li.abs</code>, which in turn have the
right type definition, and declare the 'real' elements separately with the appropriate substitution group head;</li>
    <li>Use the 'real' names directly, non-abstract.</li>
   </ol>
   <p>The advantage of (1) is that you could do e.g. Japanese XHTML with <i>or
without</i> allowing the original English, and it's consistent with how
other elements are handled.  The advantage of (2) is that
it's simpler.</p>
   <p>For example, from:</p>
   <blockquote><div><pre class="code">    &lt;xs:group name="xhtml.ol.content"&gt;
        &lt;xs:sequence&gt;
            &lt;xs:element name="li" type="xhtml.li.type" maxOccurs="unbounded"/&gt;
        &lt;/xs:sequence&gt;
    &lt;/xs:group&gt;

    &lt;xs:complexType name="xhtml.ol.type"&gt;
        &lt;xs:group ref="xhtml.ol.content"/&gt;
        &lt;xs:attributeGroup ref="xhtml.ol.attlist"/&gt;
    &lt;/xs:complexType&gt;</pre></div></blockquote>
   <p>we would get, again in two files:</p>
   <blockquote><div><pre class="code">    &lt;xs:complexType name="xhtml.ol.type"&gt;
     &lt;xs:sequence&gt;
      &lt;xs:element ref="li.abs" maxOccurs="unbounded"/&gt;
     &lt;/xs:sequence&gt;
     &lt;xs:attributeGroup ref="xhtml.ol.attlist"/&gt;
    &lt;/xs:complexType&gt;
    
    &lt;xs:element name="li.abs" abstract="true" type="xhtml.li.type"/&gt;

    ---------
    
    &lt;xs:element name="li" substitutionGroup="li.abs"/&gt;</pre></div></blockquote>
  </div>
  <div>
   <h2>7.  <a name="results">Outcome of trial</a></h2>
   <p>I built a <a href="http://www.w3.org/2001/tag/2007/09/xmod/">set of
files and directories</a>, using the strategy outlined above, and it works.
(I wrote two stylesheets, one
<a href="http://www.w3.org/2001/tag/2007/09/xmod/moduleTypes.xsl">per
module</a> and one
<a href="http://www.w3.org/2001/tag/2007/09/xmod/newModel.xsl">per
profile</a>, which did <i>almost</i> all the work).</p>
   <p>The good news is that it not only works, it's actually very clean and
powerful in some ways.  It was trivial and straightforward, for instance, to
produce an all-Japanese version of the Core profile, something which would have
been neither trivial nor straightforward using the published approach.  Also, having done that, it was even
<i>more</i> trivial to produce a bilingual version of the Core profile,
which would not be at all true for the published approach.</p>
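   <p>To give a flavour of why, under option (1) of the previous section a
translated element only needs its own file naming the same abstract head.  The
Japanese element name below is hypothetical, chosen purely for illustration,
and is not taken from the trial files:</p>
   <blockquote><div><pre class="code">    &lt;!-- the original English name, in its own file --&gt;
    &lt;xs:element name="li" substitutionGroup="li.abs"/&gt;

    ---------

    &lt;!-- hypothetical Japanese counterpart, in a separate file --&gt;
    &lt;xs:element name="項目" substitutionGroup="li.abs"/&gt;</pre></div></blockquote>
   <p>An all-Japanese driver would then include only the second file, and a
bilingual driver would include both.</p>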
   <p>The price for this is more files, but actually fewer bytes:  The Core profile needs 13
schema documents totalling 52K bytes in the original formulation, 45 schema
documents but only 44K bytes in the new formulation.  The Basic profile needs
21 schema documents and 82K bytes in the original formulation, 70 schema
documents but only 81K bytes in the new formulation.</p>
   <p>One unexpected, but particularly nice, aspect of the new approach is that
in at least some cases it removes the necessity for defining special restricted
content models for profiles.  In the original formulation of the basic profile, special restricted
module definitions are required for <a href="http://www.w3.org/TR/2006/WD-xhtml-modularization-20060705/SCHEMA/xhtml-basic-table-1.xsd">tables</a> and <a href="http://www.w3.org/TR/2006/WD-xhtml-modularization-20060705/SCHEMA/xhtml-basic-form-1.xsd">forms</a>.  In the new substitution-group approach, the full module definitions can be used unchanged, because their content models are expressed in terms of abstract elements (e.g. <code>colgroup.abs</code> and <code>button.abs</code>).  Because the 'basic' profile doesn't include the element schema files for the elements not included in the profile, for some of those abstract elements, there are no concrete elements identifying them as their substitution-group head.  So for example the full table content model <code>( caption.abs?, (col.abs*|colgroup.abs*), ((thead.abs?,tfoot.abs?,tbody.abs*)|tr.abs+) )</code> becomes in practice <code>( caption?, tr+ )</code>, because the basic profile driver includes only <code>caption.xsd</code> and <code>tr.xsd</code>.  It seems likely that this feature of the new approach will make defining profiles which are strict subsets of the whole language much simpler.</p>
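   <p>A driver fragment for such a profile might be sketched as follows.  The
<code>schemaLocation</code> paths are hypothetical (only the file names
<code>caption.xsd</code> and <code>tr.xsd</code> come from the description
above), and the element files for <code>table</code> itself and for the cell
elements are omitted:</p>
   <blockquote><div><pre class="code">&lt;xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3.org/1999/xhtml"
           xmlns="http://www.w3.org/1999/xhtml"&gt;
    &lt;!-- full table module, unchanged: types, abstract elements, attribute groups --&gt;
    &lt;xs:include schemaLocation="table/module.xsd"/&gt;

    &lt;!-- only these element files name a table-level abstract element as head --&gt;
    &lt;xs:include schemaLocation="table/caption.xsd"/&gt;
    &lt;xs:include schemaLocation="table/tr.xsd"/&gt;
&lt;/xs:schema&gt;</pre></div></blockquote>
   <p>Since no included declaration names <code>col.abs</code>,
<code>colgroup.abs</code>, <code>thead.abs</code>, <code>tfoot.abs</code> or
<code>tbody.abs</code> as its substitution-group head, no element in an
instance document can match those particles.</p>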
   </div>
 
</body></html>