<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content=
    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
    <title>
      XML document interpretation
    </title>
    <link rel="Stylesheet" href="di.css" type="text/css" />
    <meta http-equiv="Content-Type" content=
    "text/html; charset=us-ascii" />
  </head>
  <body>
    <address>
      Tim Berners-Lee<br />
      Created: 2002/02/14, last change: $Date: 2007/09/03 23:41:55
      $<br />
      Status: personal view only, following from TAG and ww-tag and
      other mailing list discussions. Editing status: draft.
    </address>
    <p>
      This issue is sometimes termed the XML Processing Model
      problem. There was in fact an <a href=
      "/XML/2001/07/XMLPM.html">XML Processing Model Workshop</a>.
      In light of the lack of a consensus result from the workshop,
      and specifically prompted by a question about the
      relationship of XEncryption to other specs, occurring as XEnc
      made its way to Candidate Recommendation status in W3C, this
      document was eventually started as an <a href=
      "http://www.w3.org/2002/02/25-tagmem-irc#T17-03-57">action
      item</a> from a TAG meeting, to open discussion on a new
      issue mixedNamespaceMeaning-13. That issue was then split
      into several other issues, one of which, <a href=
      "http://www.w3.org/2001/tag/issues.html?type=1#xmlFunctions-34">xmlFunctions-34</a>,
      is the main import of this document. In June 2005, this
      document was revised as the XML Processing Model working group
      charter was being discussed.
    </p>
    <p>
      <a href="./">Up to Design Issues</a>
    </p>
    <hr />
    <h1>
      The Interpretation of XML documents
    </h1>
    <h3>
      Abstract:
    </h3>
    <p>
      It might seem that the specifications of different XML
      namespaces can make inconsistent claims such that the
      semantics of a mixed-namespace document are inconsistent.
      The solution sometimes proposed is a "processing model
      language" such that there is no default meaning of an XML
      document without such an external processing definition. This
      article argues that there is only one basic generic
      processing model (or rather, algorithm for allocating
      semantics) for XML documents which preserves needed
      properties of a multi-namespace system. These properties
      include the need to be able to define the semantics of an XML
      element without clashes of different specifications. In this
      model, the interpretation of an XML document is defined
      starting at the document root by the specifications of the
      element types involved. A common class of foreign element
      name, called here <em>XML function</em>, has to be recognized
      in default processing by any supporting application, and
      returns more XML when it is elaborated.
    </p>
    <h2>
      <a name="problem" id="problem">The problem</a>
    </h2>
    <p>
      If one party sends another an XML document, how does one say
      what it means? Or, if you don't like the <em>meaning</em>
      word, what specs are invoked in what way when an XML document
      is published or transmitted? This question is sometimes posed
      as: What is the processing model for XML?
    </p>
    <p>
      The interpretation of a plain XHTML document is fairly well
      understood. The document itself is a human language document,
      and so the conventions - sloppy and wonderful - of human
      language define how it is understood and interpreted. And the
      interpretation of tags such as H1 is described in a
      well-thumbed standard and many books, and is implemented more
      or less consistently in many devices.
    </p>
    <p>
      But what happens when we start to mix namespaces? When SVG is
      embedded within XHTML, or RDF or XSLT for that matter, what
      are the rules which ensure that the receiver will understand
      the intent well: that the client software does the right
      thing, and the person understands the right thing? The same
      issues obviously apply when the information has
      machine-readable semantics.
    </p>
    <p>
      As Paul Prescod <a href=
      "http://lists.w3.org/Archives/Public/www-tag/2002Feb/0123.html">
      points out</a>, there are plenty of places one might think of
      looking for information about how to process a document:
    </p>
    <ol>
      <li>DOCTYPE statement
      </li>
      <li>top-level namespace
      </li>
      <li>schema reference declaration
      </li>
      <li>other root-level declared namespaces
      </li>
      <li>any attribute on the root element
      </li>
      <li>anything in the document
      </li>
    </ol>
    <p>
      In fact the general problem is that without any overall
      architecture, one can write specs which battle each other.
      "The X attribute changes the meaning of the Y attribute",
      "The Z attribute restores the meaning of the X attribute
      irrespective of any Y attribute" and so on. In such a world,
      one would never know whether one had correctly interpreted
      anything, as there might be somewhere something deemed to
      change the meaning of what we have. Clearly this way lies
      chaos. A coherent architecture defines which specs to look at
      to determine the interpretation of a document. We don't have
      this yet (2002) for XML.
    </p>
    <p>
      However, in practice if a person were to look at a document
      with a mixture of XHTML and SVG, they would probably find its
      meaning unambiguous.
    </p>
    <p>
      In the same message, Paul opines, <em>Top-down
      self-descriptiveness is one of the major advantages of XML
      and I think that doing otherwise should be deprecated</em>. I
      completely agree with this conclusion. He concludes correctly
      that the root namespace (the namespace of the document
      element) [or a DOCTYPE, which I will not discuss further] is
      the only thing one must be able to dispatch on.
    </p>
    <h3>
      <a name="pipeline" id="pipeline"></a>The Pipeline Processing
      model
    </h3>
    <p>
      However, he secondarily concludes that, because it is
      important to define what processing is to be done first, one
      should use wrapper elements, so that if there are any XSLT
      elements within a document, a wrapper causes XSLT processing
      to be done, and so on. The discussion about documents with
      more than one namespace has often made an implicit assumption
      that the XML is to be processed in a pipeline, in which each
      stage implements one XML technology, such as include
      processing, style sheet processing, decryption, and so on.
      The point of this article is that while this works in simple
      cases, in the general case the pipeline model is basically
      broken. Once you have things arbitrarily nested inside each
      other, there is no single pipeline which will handle the
      general case. And nesting things inside each other in
      arbitrary ways is core to the power of XML.
    </p>
    <h3>
      <a name="Specific" id="Specific">Specific cases: XML
      functions</a>
    </h3>
    <p>
      The pipeline model makes it very messy to address a situation
      which is increasingly common: an XML document which contains
      a large number of embedded elements from namespaces such as
    </p>
    <ul>
      <li>XSLT, in "<a href=
      "http://www.w3.org/TR/1999/REC-xslt-19991116#result-element-stylesheet">Literal
      Result Element as Stylesheet</a>" mode.
      </li>
      <li>XInclude
      </li>
      <li>XMLEncryption
      </li>
      <li>XQuery (?)
      </li>
      <li>Internationalization tags such as "do not translate this
      phrase when translating this document"
      </li>
    </ul>
    <p>
      These namespaces share common properties:
    </p>
    <ul>
      <li>They are the sort of thing you want to use with any sort
      of document, without it having to be foreseen in the schema
      for the original document
      </li>
      <li>The content of these elements is not the final form, but
      will be replaced with other content
      </li>
      <li>The resulting content may recursively have invocations of
      the same or different things from the list
      </li>
      <li>The effect of processing the element in this namespace is
      constrained such that it can only elaborate the contents of
      that branch of the tree. The element is replaced with the
      result of its processing, but none of its ancestors or
      siblings may be affected.
      </li>
      <li>There are certain very special cases in which you want to
      be able to mention one without it being expanded.
      </li>
    </ul>
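    <p>
      An element with these properties behaves like a function from
      its own branch of the tree to replacement XML. The following
      is only a minimal sketch of that idea in Python, using the
      standard <code>xml.etree.ElementTree</code> module. The
      <code>repeat</code> element, its namespace and the helper
      names are invented for the illustration and come from no
      specification.
    </p>
    <pre>
import copy
import xml.etree.ElementTree as ET

# Hypothetical namespace and element name, used only for this illustration.
REPEAT = "{http://example.org/fn/repeat}repeat"


def repeat_function(element):
    """Toy XML function: return `times` copies of the element's children.

    It reads only its own branch and returns new XML; it is given no
    access to its parent or siblings.
    """
    times = int(element.get("times", "1"))
    return [copy.deepcopy(child) for _ in range(times) for child in element]


def replace_with_result(parent, index, function):
    """Replace parent[index] with the XML that `function` returns for it."""
    result = function(parent[index])
    del parent[index]
    for offset, node in enumerate(result):
        parent.insert(index + offset, node)


body = ET.Element("body")
ET.SubElement(body, "h1").text = "Title"
call = ET.SubElement(body, REPEAT, {"times": "2"})
ET.SubElement(call, "p").text = "hello"
ET.SubElement(body, "hr")

replace_with_result(body, 1, repeat_function)
print([child.tag for child in body])  # ['h1', 'p', 'p', 'hr']
</pre>
    <p>
      The siblings <code>h1</code> and <code>hr</code> are untouched:
      only the branch occupied by the function element changes.
    </p>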
    <p>
      To treat these as a group, I will call these elements
      <strong>XML functions</strong>. The term is not picked
      randomly. Let's look at some examples, each of which has its
      peculiarities.
    </p>
    <h4>
      <a name="XSLT" id="XSLT"></a>XSLT Literal Result Element as
      Stylesheet (LRES)
    </h4>
    <p>
      Let me clarify this way of looking at XSLT. The XSLT spec
      defines an XSLT namespace and how you make an XSLT document
      (stylesheet) out of it. Normally, the style sheet has
      <span style="font-family: monospace;">xsl:stylesheet</span>
      as its document element. However, there is a special "Literal
      result element as Stylesheet" (LRES) form of XSLT, in which a
      template document in a target namespace (such as XHTML) has
      XSLT embedded in it only at specific places. Here is an
      example from the spec.
    </p>
    <pre>
&lt;html xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns="http://www.w3.org/TR/xhtml1/strict"&gt;
  &lt;head&gt;
    &lt;title&gt;Expense Report Summary&lt;/title&gt;
  &lt;/head&gt;
  &lt;body&gt;
    &lt;p&gt;Total Amount: &lt;xsl:value-of select="expense-report/total"/&gt;&lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;
</pre>
    <p>
      The XSLT spec formally defines the LRES form as an
      abbreviation for the full form. In doing so it loses the
      valuable fact that in the LRES form, XSLT elements behave as
      XML functions. They actually adhere to the constraints above.
      This is very valuable. The XSLT spec says that the
      interpretation is that a full XSLT document is generated and
      processed to return the "real" document. However, this does
      not scale in design terms. As the XSLT specification itself
      notes,
    </p>
    <p style="margin-left: 40px;">
      "In some situations, the only way that a system can recognize
      that an XML document needs to be processed by an XSLT
      processor as an XSLT stylesheet is by examining the XML
      document itself. Using the simplified syntax makes this
      harder.<br />
      <br />
      NOTE: For example, another XML language (AXL) might also use
      an axl:version on the document element to indicate that an
      XML document was an AXL document that required processing by
      an AXL processor; if a document had both an axl:version
      attribute and an xsl:version attribute, it would be unclear
      whether the document should be processed by an XSLT processor
      or an AXL processor.<br />
      Therefore, the simplified syntax should not be used for XSLT
      stylesheets that may be used in such a situation"
    </p>
    <p>
      It does not work when other namespaces use the same trick. It
      also prevents applications from using optimizations which
      result from the constraints above. So, while the spec
      formally defines a template document in that way, one can
      make, it seems, a completely equivalent definition in terms
      of XML functions.
    </p>
    <p>
      Imagine a document in which at various different parts of the
      tree different forms occur, and in which these XML functions
      are in fact nested: you resolve an XInclude and it returns
      something with XSLT bits in it.
    </p>
    <p>
      It is essential primarily to define what such a document
      should actually be when (for example) presented to a user. It
      is an extra plus to have some visibility from outside the
      document, such as from the MIME header, as to what
      functionality will be necessary to fully process the
      document, but we can get to that later.
    </p>
    <h4>
      <a name="XInclude" id="XInclude"></a>XInclude
    </h4>
    <p>
      This is probably a simple function. The include element is
      replaced by the referenced document or part of document. This
      is straightforward and obviously nests.
    </p>
    <p>
      It is also obvious that, when XIncludes are nested, it makes
      no difference whether you consider the inner ones to be
      expanded before or after the outer ones. (The base URI of a
      reference always has to be taken as that of the original
      source document, no matter where the reference ends up being
      expanded.)
    </p>
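    <p>
      The order-independence can be checked on a toy case. The
      sketch below is not an XInclude implementation: it assumes
      three made-up in-memory documents keyed by file name, handles
      only whole-document inclusion by <code>href</code>, and
      ignores base URIs, <code>xpointer</code>, <code>parse</code>
      and fallback. It expands the same nested include once
      outer-first and once inner-first and compares the results.
    </p>
    <pre>
import copy
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def build_documents():
    """Three tiny in-memory documents; the file names are made up."""
    a = ET.Element("a")
    ET.SubElement(a, XI_INCLUDE, {"href": "b.xml"})
    b = ET.Element("b")
    ET.SubElement(b, XI_INCLUDE, {"href": "c.xml"})
    c = ET.Element("c")
    c.text = "leaf"
    return {"a.xml": a, "b.xml": b, "c.xml": c}


def expand_outer_first(element, documents):
    """Splice in the raw referenced document, then expand what arrived."""
    for index, child in enumerate(list(element)):
        while child.tag == XI_INCLUDE:
            child = copy.deepcopy(documents[child.get("href")])
            element[index] = child
        expand_outer_first(child, documents)
    return element


def expand_inner_first(name, documents):
    """Fully expand a referenced document before splicing it into its referrer."""
    root = copy.deepcopy(documents[name])

    def visit(element):
        for index, child in enumerate(list(element)):
            if child.tag == XI_INCLUDE:
                element[index] = expand_inner_first(child.get("href"), documents)
            else:
                visit(child)

    visit(root)
    return root


documents = build_documents()
outer = ET.tostring(expand_outer_first(copy.deepcopy(documents["a.xml"]), documents))
inner = ET.tostring(expand_inner_first("a.xml", documents))
print(outer == inner)  # True
</pre>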
    <h2>
      <a name="Processing" id="Processing">Top-down Processing
      model</a>
    </h2>
    <p>
      I think that the battle over the order of processing of XML
      functions is often an ill-formed question. XML is a tree. It
      is appropriate for the interpretation of the tree to be
      defined from the top down. This does not determine the order
      in which the leaves of the tree have to be done.
    </p>
    <p>
      Here are some ways in which processors could handle an XHTML
      document containing XML functions:
    </p>
    <ul>
      <li>Noting that XHTML is a plain vanilla language, but that
      this document contains other things, first pipeline it
      through an XSLT processor, then an XInclude processor (the
      order being arbitrary), then an XML decryption processor,
      and again in a cycle, until there are no functions left.
      </li>
      <li>Invoke an XML support class which then parses the
      document recursively. This more powerful XML parser has the
      ability to dispatch to the support class for an XML function
      whenever it finds one.
      </li>
      <li>Invoke an XHTML support class which then parses the
      document as it needs to in order to display it. This more
      powerful XML parser has the ability to dispatch to the
      support class for an XML function whenever it finds one.
      However, the XHTML parser uses the constraint that in certain
      cases the front of an XHTML document can be displayed before
      the last has been parsed, and it actually delays evaluation
      of functions until the user's use of scroll keys makes it
      necessary. It turns out that certain things never need to be
      evaluated at all, saving time and bandwidth.
      </li>
    </ul>
    <p>
      This is NOT supposed to be a definitive list of ways of
      parsing XML documents with functions - it is only supposed to
      illustrate the fact that many approaches are possible which
      can be shown to be mathematically equivalent in their effect.
      (This is why I tend to talk about the meaning, or
      interpretation, of a document, rather than the processing
      model)
    </p>
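    <p>
      As an illustration of that equivalence, here is a rough
      sketch comparing the first two approaches on one small tree.
      Everything in it is an assumption made for the example: the
      two toy functions <code>upper</code> and <code>twice</code>,
      their namespaces, and the helper names. Both functions are
      pure, which is what makes the two strategies agree here. The
      lazy third approach is not shown.
    </p>
    <pre>
import copy
import xml.etree.ElementTree as ET

# Two toy XML functions in made-up namespaces.
UPPER = "{http://example.org/fn/upper}upper"  # upper-cases its text
TWICE = "{http://example.org/fn/twice}twice"  # emits its first child twice


def upper(element):
    span = ET.Element("span")
    span.text = (element.text or "").upper()
    return [span]


def twice(element):
    return [copy.deepcopy(element[0]), copy.deepcopy(element[0])]


FUNCTIONS = {UPPER: upper, TWICE: twice}


def splice(parent, index, nodes):
    """Replace parent[index] with `nodes`."""
    del parent[index]
    for offset, node in enumerate(nodes):
        parent.insert(index + offset, node)


def elaborate_top_down(element):
    """Walk from the root, dispatching on tag wherever a function appears."""
    index = 0
    while index != len(element):
        child = element[index]
        if child.tag in FUNCTIONS:
            # Do not advance: the result may itself contain functions.
            splice(element, index, FUNCTIONS[child.tag](child))
        else:
            elaborate_top_down(child)
            index += 1
    return element


def one_pass(root, tag):
    """A single-technology pipeline stage: expand each `tag` element once."""
    changed = False
    for parent in list(root.iter()):
        for index in reversed(range(len(parent))):
            if parent[index].tag == tag:
                splice(parent, index, FUNCTIONS[tag](parent[index]))
                changed = True
    return changed


def elaborate_by_pipeline(root):
    """Cycle through the stages until no function is left."""
    while any([one_pass(root, tag) for tag in FUNCTIONS]):
        pass
    return root


def sample():
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    outer = ET.SubElement(body, TWICE)
    ET.SubElement(outer, UPPER).text = "hi"
    return html


top_down = ET.tostring(elaborate_top_down(sample()))
pipelined = ET.tostring(elaborate_by_pipeline(sample()))
print(top_down == pipelined)  # True
</pre>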
    <h3 id="need">
      <a name="quote" id="quote"></a>The need to quote
    </h3>
    <p>
      That said, it may be necessary to define a reference
      processing model, just because one has to have a way of
      describing what the document means. In this case note that
      the first model above is not appropriate. It uses the fact
      that XHTML contains no tricks - it is "plain vanilla" in that
      everything in the document is part of the document in the
      same way, modulo styling. (I simplify.) This does not apply
      to other sorts of document. Take an XML package for example:
      the contents of the package are quoted, and it is not
      appropriate just to expand them. Only the cover note, the
      defining document, contains the import of the package as a
      whole, and the interpretation of the other packaged things is
      only known in as much as the cover note defines it. It is
      essential that languages such as XML packaging can be defined
      in XML. It is essential that one can, if you like, quote a
      bit of XML literally, and make up a new tag which says
      something quite new about the contents. Therefore, while the
      first model works with XHTML, and as Tim Bray says
      (TAG 2002/02/14) there are many applications which do
      "generic XML processing" such as trawling documents for links
      and use of language, there will be certain namespaces such as
      HTML and SVG for which that makes sense and others such as
      XML packaging and XML Encryption for which it won't. <em>(In
      the semantic web case, the same applies, and was the cause in
      2002 of much discussion in RDF groups because RDF does not
      have quotes, and of the informal use of
      rdf:parseType="log:quote".)</em>
    </p>
    <p>
      If you need another example, think about the XSLT insert
      which generates an XInclude element: it may contain what
      seems to be, and even is, an XInclude element, but this
      should not be expanded as contents of the XSLT element.
    </p>
    <p>
      The reference processing model must be, then, that parsing of
      an XML document conceptually involves elaboration of any
      functions, and that processors must be able to dispatch based
      on namespace at any point in the tree.
    </p>
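    <p>
      A small sketch may make the role of quoting concrete. It
      assumes one toy function (<code>stamp</code>) and one quoting
      element (<code>quoted</code>), both in invented namespaces.
      The walker dispatches on the element name at each point in
      the tree and stops descending when it meets a quoting
      element, which a blind "expand everything" stage could not
      do.
    </p>
    <pre>
import xml.etree.ElementTree as ET

# Made-up names for illustration only.
STAMP = "{http://example.org/fn/stamp}stamp"  # an XML function
QUOTE = "{http://example.org/pkg}quoted"      # quotes its content


def stamp(element):
    """Toy function: replaced by a single em element."""
    em = ET.Element("em")
    em.text = "stamped"
    return [em]


FUNCTIONS = {STAMP: stamp}
QUOTING = {QUOTE}


def elaborate(element):
    """Top-down interpretation: the element type decides what happens below it."""
    if element.tag in QUOTING:
        return element  # content is quoted: leave it untouched
    index = 0
    while index != len(element):
        child = element[index]
        if child.tag in FUNCTIONS:
            result = FUNCTIONS[child.tag](child)
            del element[index]
            for offset, node in enumerate(result):
                element.insert(index + offset, node)
        else:
            elaborate(child)
            index += 1
    return element


doc = ET.Element("doc")
ET.SubElement(doc, STAMP)
quoted = ET.SubElement(doc, QUOTE)
ET.SubElement(quoted, STAMP)

elaborate(doc)
print(doc[0].tag)              # em: the unquoted stamp was elaborated
print(quoted[0].tag == STAMP)  # True: the quoted one was left alone
</pre>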
    <p>
      The result of such processing is the document which should
      correspond to the XML schema, if there is one. There is
      normally no
      call for schema validation of the document which still
      contains XML functions. Systems which claim to be conformant
      to the spec of a given XML function mean that they can, in
      any XML document, elaborate the function according to the
      specification. As Jacek Kopecky says (2002/02/21), <em>[...]
      by saying on the sender: "We expect the XHTML processor to be
      able to handle XInclude and therefore this thing is an XHTML
      document all right"</em>. We can't of course expect old XML
      processors to handle XInclude, but we can expect anything
      which claims conformance with XInclude to do so.
    </p>
    <h3>
      <a name="Software" id="Software"></a>Software designs for
      top-down processing
    </h3>
    <p>
      In object-oriented software terms, one imagines handing an
      XML tree to an instance of an object class which supports the
      element type of the document element. This then returns
      something as defined by the spec. (An HTML document
      conceptually returns a hypertext page, an SVG document a
      diagram, an RDF document a conceptual graph (small c small
      g)). The object may itself call out to a parser to get the
      infoset for its contents, and it may or may not call out to
      the XML function evaluator but whether it does or not is
      defined by its own specification. But XML functions just
      return XML which replaces them. And any XML applications
      which claim conformance to the XML function's spec should be
      able to accept this.
    </p>
    <p>
      Similarly, in an event-oriented architecture, an event
      stream which is being fed to an HTML handler would, when a
      foreign namespace such as XSLT is found, be vectored to an
      XSLT handler. The software design has to allow the XSLT
      handler to hand back a new set of events, a serialization of
      the resultant tree, to the HTML handler.
    </p>
    <p>
      The software design in either case also has to allow enough
      context to be shared between the applications so that they
      can perform their function: embedded SVG needs a display
      context such as part of a drawing space which corresponds to
      the space in the rendering of the HTML document, and so on.
    </p>
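    <p>
      The event-oriented variant might look roughly like this. The
      event tuples, the <code>shout</code> element and its handler
      are all assumptions made for the sketch and do not follow
      SAX or any other real API. Events belonging to a registered
      function are diverted to its handler, and the events it
      hands back are passed through the same dispatch before the
      host handler sees them.
    </p>
    <pre>
# Made-up namespace and element for illustration.
SHOUT = "{http://example.org/fn/shout}shout"


def shout_handler(events):
    """Toy function handler: receives its own events, hands back new ones."""
    text = "".join(value for kind, value in events if kind == "text")
    return [("start", "b"), ("text", text.upper()), ("end", "b")]


HANDLERS = {SHOUT: shout_handler}


def vector(events):
    """Yield the event stream that the host (say, HTML) handler should see."""
    events = iter(events)
    for kind, value in events:
        if kind == "start" and value in HANDLERS:
            # Capture everything up to the matching end event.
            captured, depth = [(kind, value)], 1
            while depth:
                inner = next(events)
                captured.append(inner)
                if inner[0] == "start":
                    depth += 1
                elif inner[0] == "end":
                    depth -= 1
            # The result may itself contain functions, so dispatch again.
            yield from vector(HANDLERS[value](captured))
        else:
            yield (kind, value)


stream = [
    ("start", "p"), ("text", "say "),
    ("start", SHOUT), ("text", "hello"), ("end", SHOUT),
    ("end", "p"),
]
seen_by_host = list(vector(stream))
print(seen_by_host)
</pre>
    <p>
      The host handler never sees the <code>shout</code> events,
      only the <code>b</code> element which replaces them.
    </p>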
    <h3>
      <a name="siblings" id="siblings"></a>Unresolved issue:
      references to siblings
    </h3>
    <p>
      This note does not address many of the issues around the XML
      processing model.
    </p>
    <p>
      There is a possible ambiguity when a function refers to the
      current document. In other words, though it is not allowed to
      change things outside itself, it may read outside itself.
      This (if allowed) would clearly raise the question of whether
      it references the document before or after its own or other
      functions' elaboration.
    </p>
    <p>
      A related question is whether an XPointer fragment identifier
      should refer to the document before or after elaboration of
      functions. My inclination is to say after, because then you
      know that an XPointer into an SVG object will resolve to a
      bit of SVG. But there may be arguments the other way.
    </p>
    <p>
      XML Digital Signature (I am told) specifically requires that
      the signature is done on the raw source of the document
      before XInclude. Without going into the relative merits of
      signature before and after XInclude and other functions, it
      is clear that there are cases when either would be useful.
    </p>
    <p>
      The ambiguity of these references, like the problems in XSLT
      of generating XSLT stylesheets with XSLT stylesheets, stems
      from the lack of quoting syntax in XML.
    </p>
    <h3>
      <a name="MIME" id="MIME">MIME content-type labeling</a>
    </h3>
    <p>
      <em>@@This section is not complete. It has been covered more
      thoroughly by TAG discussions already. @@ link</em>
    </p>
    <p>
      An XML document is in most cases self-describing. That is,
      you don't need to know anything more than that it is XML to
      interpret it. In email and HTTP applications, it is useful
      for the RFC822-style message to define how the body should be
      interpreted using the <code>content-type</code> header. All
      that is necessary, then, is that the content-type should
      indicate XML (<code>text/xml</code> or
      <code>application/xml</code> or anything with
      <code>+xml</code>), and then top-down generic processing is
      valid. (The algorithm for determining the character encoding
      is not addressed here @@ link)
    </p>
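    <p>
      The test a receiver would apply can be sketched in a few
      lines. This assumes Python's standard
      <code>email.message</code> module for parsing the header
      value. The function name <code>indicates_xml</code> is
      invented here, and the check covers only the three cases
      named above.
    </p>
    <pre>
from email.message import Message


def indicates_xml(content_type):
    """Return True if a Content-Type value labels the body as XML."""
    message = Message()
    message["Content-Type"] = content_type
    # get_content_type() lower-cases the value and drops any parameters.
    media_type = message.get_content_type()
    return media_type in ("text/xml", "application/xml") or media_type.endswith("+xml")


print(indicates_xml("application/xhtml+xml; charset=utf-8"))  # True
print(indicates_xml("text/html"))                             # False
</pre>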
    <p>
      While this is sufficient, it is however useful to be able to
      provide more visibility as to what it contains [Roy Fielding,
      Dissertation, Ch4 @@link]. The document element gives, in
      many cases, fundamental information about the resulting type
      of the whole document, irrespective of functions elaborated
      or plugins plugged in. For example, whatever the content, an
      <code>xhtml:html</code> document is a hypertext page. This
      means that some systems will represent it in a window and
      allow certain functionality. The operating system, if it
      knows this, can use icons to tell the user, before they open
      an email or follow a link, what sort of a thing it contains
      or leads to. Similarly, an SVG document will return a
      diagram, and an RDF document a body of knowledge -- a set of
      relational data. So more than any other namespace used in the
      document, the document element namespace is crucial.
    </p>
    <p>
      This is why the best practice is to publish documents with
      standard and therefore well-known document element types as a
      special MIME type. This allows an XHTML page to be visible as
      such from the HTTP headers alone. This allows smarter
      processing by intermediaries, decisions about proxy caching,
      translation, and so on. It allows the content negotiation of
      HTTP to operate, allowing a user for example to express a
      preference for audio rather than video. This also allows
      systems which want to do so to optimize the dispatching of a
      handler for the document from the MIME type alone. A "+xml"
      suffix as defined by RFC____@@ should be used whenever the
      document is also a self-describing top-down XML document for
      which the top-down processing model applies. (The fact that a
      document is a well-formed XML 1.0 document alone does
      <em>not</em> constitute grounds for adding the "+xml".)
    </p>
    <p>
      Simon St Laurent has suggested [@@ his Internet-draft,
      possibly timed out] that all namespaces used in the document
      be listed as parameters to the MIME type. This makes sense on
      the surface. It may not be practical or worth the effort. It
      is a lot of bits, and in any case exactly what will be
      required cannot be determined until the document has been
      interpreted top-down. However, it or something equivalent is
      necessary if one is to specify the software support which is
      necessary.
    </p>
    <ul>
      <li>The top element can in fact be such that the other
      elements are to be interpreted in arbitrarily weird ways
      </li>
      <li>For many document element types, there is a guarantee of
      the sort of object which is being represented.
      </li>
    </ul>
    <p>
      So the best form of visibility would be to state (and
      possibly negotiate) the set of XML features which must be
      supported to properly process the document.
    </p>
    <h3>
      <a name="implied" id="implied"></a>Related notion: implied
      namespace
    </h3>
    <p>
      When a namespace-specific content-type has been specified, is
      it also necessary to specify the document namespace, or could
      that be assumed? That would mean that a plain XHTML file
      would not need an explicit namespace. It is tempting to say
      that the default namespace should default to that associated
      with the content type, but in fact the logical thing is for
      the document namespace.
    </p>
    <p>
      @@Decision tree diagram - add
    </p>
    <h3>
      <a name="L1578" id="L1578">User-defined processing of
      documents</a>
    </h3>
    <p>
      This document defines the basic interpretation of an XML
      document. There have been many suggestions of ways in which a
      complex and different order of processing could be specified,
      many of these mentioned at the workshop, and including Sun's
      XML pipeline submission. My current view is that a document
      in such a language should itself be regarded as the top-level
      document, which then draws in the rest of the document by
      reference as it is elaborated.
    </p>
    <h3>
      <a name="L1610" id="L1610">Server-side processing of
      documents</a>
    </h3>
    <p>
      In the HTTP protocol, or email for that matter, the important
      interface which is standardized is the one between the
      publisher (or sender) and receiver. We concern ourselves with
      what a receiver can do by way of interpretation of an XML
      document published or sent. Any processing which has happened
      on the server or sender side in order to process that
      document is not part of the protocol. While XML functions may
      indeed be elaborated to form a document for transmission from
      another one, that is something for control within the server
      and so is not a primary concern for standardization.
    </p>
    <p>
      When a document is in a pure functional form, it is actually
      a matter of optimization whether the functions are elaborated
      by the server or the client.
    </p>
    <h2>
      <a name="schema" id="schema"></a>The requirements on Schema
      languages
    </h2>
    <p>
      This tree-oriented architecture for XML puts requirements on
      schema languages. With DTDs, and with current XML schema,
      there is no natural way to describe how namespaces fit
      together. There have been many rather unnatural attempts to
      create a modular system, such as the HTML modularization
      @@link. The way this has been done has basically been to make
      one great big schema for the combined language, in such a way
      that the new schema constrains the way the elements from
      different namespaces can fit together.
    </p>
    <p>
      The problem is to avoid making this an n<sup>2</sup> problem.
      Will the working group which integrates n specs (such as 4
      for XHTML, SVG, XForms, MathML) take n<sup>2</sup> years to
      make the schema? It would be far preferable if one could just
      write a schema for each new facility.
    </p>
    <p>
      Conversely, here is what a schema language would have to
      allow us to say:
    </p>
    <ul>
      <li>
        <span style="font-family: monospace;">&lt;its:info
        translate="no"&gt;<span style=
        "font-style: italic;">x</span>&lt;/its:info&gt;</span> can
        occur anywhere <span style=
        "font-family: monospace; font-style: italic;">x</span> can
        occur (for systems which support ITS)
      </li>
      <li>&lt;its:info translate="no"&gt;x&lt;/its:info&gt; can
      occur anywhere x can occur, so long as x is human-presentable
      content.
      </li>
      <li>&lt;xenc:encrypted&gt; ...&lt;/&gt; is a function: it can
      occur anywhere, so long as XEnc is supported. Its processing
      will return XML mixed content which will replace this
      element.
      </li>
      <li>&lt;svg:drawing/&gt; can occur anywhere &lt;xhtml:img
      /&gt; can occur.
      </li>
    </ul>
    <p>
      This way of specifying n independent schemas, or rather
      schemas which have back-references to earlier schemas in some
      cases, allows a product to simply quote the set of XML
      technologies which it supports. This has to be negotiated
      between the sender and receiver of XML. It is not the same in
      the general case as the set of namespaces used in the
      document, because function elaboration may change that. All
      the same, the namespaces may be a useful way of indirectly
      referring to the features.
    </p>
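    <p>
      To see why the namespaces present in the source are only an
      indirect guide to the features required, consider this small
      sketch. The three namespaces and the <code>chart</code>
      function are invented for the example. It lists the
      namespaces in a tree before and after one function is
      elaborated.
    </p>
    <pre>
import xml.etree.ElementTree as ET

# Made-up namespaces for illustration.
HOST_NS = "http://example.org/host"
FN_NS = "http://example.org/fn/chart"
DRAW_NS = "http://example.org/drawing"


def namespaces_used(root):
    """Collect the namespace URI of every element in the tree."""
    found = set()
    for element in root.iter():
        if element.tag.startswith("{"):
            found.add(element.tag[1:].split("}", 1)[0])
    return found


def elaborate_chart(parent, index):
    """Toy function: the chart element is replaced by a drawing element."""
    parent[index] = ET.Element("{" + DRAW_NS + "}drawing")


root = ET.Element("{" + HOST_NS + "}page")
ET.SubElement(root, "{" + FN_NS + "}chart")

before = namespaces_used(root)
elaborate_chart(root, 0)
after = namespaces_used(root)

print(sorted(before))
print(sorted(after))
</pre>
    <p>
      The drawing namespace appears only after elaboration, so a
      receiver which inspected the source namespaces alone would
      not have known that it needed drawing support.
    </p>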
    <p>
      Because the mode of operation in which the content is
      evaluated with function processing is very common, it would
      be useful in a schema for example to indicate this mode, or,
      more practically, to indicate the exceptions. There are very
      few elements which don't elaborate their contents at the
      moment in the markup world, and so they should be the
      exception. (Many computing languages of course reserve
      special punctuation for this quoting but adding punctuation
      at this stage isn't the XML style!)
    </p>
    <h2>
      <a name="Conclusion" id="Conclusion"></a>Conclusion
    </h2>
    <p>
      The top-down processing model for XML as an architectural
      principle resolves many of the questions which remain
      unanswerable with pipelined processing. In fact,
      consideration of the examples shows that pipeline processing
      could actually be dangerous, producing errors and possibly
      security issues, in the case of generally nested XML
      technologies of the types discussed.
    </p>
    <h2>
      <a name="References" id="References">References</a>
    </h2>
    <ul>
      <li>Discussion on www-tag@w3.org list
        <ul>
          <li>
            <a href=
            "http://lists.w3.org/Archives/Public/www-tag/2002Feb/0123.html">
            19 Feb 2002, Paul Prescod Namespace Dispatching</a>
          </li>
        </ul>
      </li>
      <li>
        <a href=
        "http://www.imc.org/ietf-xml-mime/mail-archive/threads.html">
        The archive of the XML-MIME list</a>, relevant to MIME
        dispatching of XML documents
      </li>
      <li>
        <a href="http://www.w3.org/XML/2001/07/XMLPM.html">XML
        processing model workshop</a>
      </li>
      <li>TAG Issue <a href=
      "http://www.w3.org/2001/tag/issues.html#xmlFunctions-34">XMLFunctions-34</a>
      </li>
      <li>W3C Specifications: <a href=
      "http://www.w3.org/TR/REC-xml/">XML spec</a>; <a href=
      "http://www.w3.org/TR/REC-xml-names/">Namespaces in XML</a>
      </li>
      <li>
        <a href=
        "http://www.w3.org/TR/2002/NOTE-xml-pipeline-20020228/">XML
        pipeline definition language</a>, Sun Microsystems
      </li>
      <li>
        <a href=
        "http://lists.w3.org/Archives/Member/xml-pm-ws/2001Jul/thread.html">
        XML Processing Model discussion list</a> (W3C members
        archive)
      </li>
    </ul>
    <hr />
    <p>
      <a href="Overview.html">Up to Design Issues</a>
    </p>
    <p>
      <a href="../People/Berners-Lee">Tim BL</a>
    </p>
  </body>
</html>