<!doctype html public "-//W3C//DTD HTML 3.2//EN">
<HTML><HEAD>
<TITLE>W3C WD: HTML Dialects: Internet Media Types and SGML Document Types</TITLE>
</HEAD><BODY><P>
<H2 align=right><A HREF="../"><IMG BORDER="0" ALT="W3C" SRC="../Icons/WWW/w3c_home.gif" ALIGN="left"></A> WD-doctypes-960302
</H2><H1 class=doctitle align=center>HTML Dialects: Internet Media and SGML
Document
Types
</H1><H3 align=center>W3C Working Draft 06-Mar-96
</H3><DL><DT>This version:
<DD>http://www.w3.org/pub/WWW/TR/WD-doctypes-960302
<BR>
$Id: WD-doctypes.html,v 1.12 1996/12/09 03:28:20 jigsaw Exp $
<DT>Latest version:
<DD>http://www.w3.org/pub/WWW/TR/WD-doctypes
<DT>Authors:
<DD>Daniel W. Connolly &lt;connolly@w3.org&gt;
</DL><P><HR>
<H2>Status of this document
</H2><P>This is [not yet] a W3C Working Draft for review by W3C members and
other
interested
parties. It is a draft document and may be updated, replaced or obsoleted
by
other documents at any time. It is inappropriate to use W3C Working Drafts
as reference material or to cite them as other than "work in progress".
A list of current W3C working drafts can be found at:
<A
href="http://www.w3.org/pub/WWW/TR">http://www.w3.org/pub/WWW/TR</A>
<P><B>Note:</B> since working drafts are subject to frequent change, you
are advised to reference
the above address, rather than the addresses of working drafts themselves.
<H2>Abstract
</H2><P>The HTML 2.0 specification, RFC1866, defines an SGML application
and an Internet media type. The specification notes that extensions are
planned, but only the <TT>text/html; level=2</TT> Internet media type
and the <TT>"-//IETF//DTD HTML 2.0//EN"</TT> document type are defined. This
document suggests the use of URIs as system identifiers for document type
definitions, allowing decentralized evolution of the language. The use of
marked sections as a transition technique and the continued
use of the level mechanism for standardized points in the evolution path
are discussed.
<P><HR>
<H3>Contents
</H3><UL><LI><A HREF="#intro">Introduction</A>
<LI><A HREF="#problem">Problem Statement</A>
<LI><A href="#refs" rev=toc>References</A>
</UL><P><HR>
<H2><A name=intro>Introduction</A>
</H2><P>The goal of any HTML specification should be to promote
confidence in the fidelity of communications using HTML. This means:
<OL><LI>making it clear to authors what idioms are available
<LI>making it clear to implementors how to interpret the
<LI>keeping HTML simple enough that it can be implemented
<LI>making HTML expressive enough that it can represent
a useful majority of the contemporary communications idioms in
this community
<LI>making some allowance for expressing idioms not captured
by the specification
<LI>addressing relevant interoperability issues with other
applications and technologies
</OL><P>HTML 2.0 specifies a set of idioms widely used and supported as of
June of 1994. But HTML and the web are still in a stage of rapid
innovation and evolution, and will be for the foreseeable future. The
HTML 2.0 specification fails to accommodate this evolution--it fails to
meet goal #5, and goal #4 cannot be met by any frozen document, as
"contemporary communications idioms" evolve over time.
<P>Examples of this evolution include the introduction of forms and
tables. In each case, information providers suddenly had two kinds of
clients: those with support for the new feature, and those
without. They were faced with the following choices:
<DL><DT>Stick to the lowest common denominator
<DD>This sacrifices rich information delivery for ubiquitous access.
<DT>Exploit the new feature
<DD>Some clients will fail to support the new feature, and instead
see "noise." Some information providers employ a "You must have a
forms-capable browser to access this page" disclaimer.
<DT>Make the choice explicit
<DD>This is the "click here if your browser supports forms"
phenomenon. The information provider maintains two representations:
feature-rich and feature-poor. The consumer's reading experience is
disrupted to make an irrelevant technical decision that they may not
be equipped to make.
</DL><P>Optimally, the system should obviate the
need for information providers and consumers to deal with this issue
explicitly. Interoperability between new and old components should be
automatic.
<P>This document proposes a mechanism that obviates the need for
consumers to explicitly deal with the issue. The mechanism does not
alleviate the information provider's burden, but it does increase
reliability even in the case that information providers are unwilling
to invest the effort necessary to support old clients.
<H2><A NAME=problem>Problem Statement</A>
</H2><P>Consider the following documents:
<H3>Level 0: Simple HTML
</H3><PRE>
&lt;title&gt;Example: Simple HTML&lt;/title&gt;
&lt;p&gt;A paragraph with a &lt;a href="#dest"&gt;link&lt;/a&gt;.
&lt;ul&gt;
&lt;li&gt;a list
&lt;li&gt;of &lt;a href="dest"&gt;items&lt;/a&gt;
&lt;/ul&gt;
</PRE><H3>Level 1: Phrase Markup, Nested Lists, and Images
</H3><PRE>
&lt;title&gt;Example: Phrase Markup, Nested Lists, and Images&lt;/title&gt;
&lt;p&gt;A paragraph with &lt;em&gt;emphasis&lt;/em&gt; and an &lt;img ALT="image"
SRC="foo.png"&gt;.
&lt;ol&gt;
&lt;li&gt;Section 1
&lt;li&gt;Section 2
 &lt;ol&gt;
 &lt;li&gt;Section 2.1
 &lt;li&gt;Section 2.2
 &lt;/ol&gt;
&lt;li&gt;Section 3
&lt;/ol&gt;
</PRE><H3>Level 2: Forms
</H3><PRE>
&lt;title&gt;Example: Forms&lt;/title&gt;
&lt;h1&gt;Forms&lt;/h1&gt;
&lt;form action="/cgi-bin/test" method=POST&gt;
&lt;p&gt;&lt;input name=x&gt;
&lt;p&gt;&lt;input name=y&gt;
&lt;p&gt;&lt;input name=z&gt;
&lt;/form&gt;
</PRE><H3>Level 3: Tables, Objects, and Figures
</H3><PRE>
&lt;title&gt;Example: Tables, Objects, and Figures&lt;/title&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Col 1 &lt;th&gt;Col 2 &lt;th&gt;Col 3
&lt;tr&gt;&lt;td&gt;A     &lt;td&gt;B     &lt;td&gt;C
&lt;tr&gt;&lt;td&gt;1     &lt;td&gt;2     &lt;td&gt;3
&lt;/table&gt;
&lt;fig&gt;
&lt;caption&gt;Figure 1: A Movie&lt;/caption&gt;
&lt;object data="movie.mpg"&gt;
[Movie elided]
&lt;/object&gt;
&lt;/fig&gt;
</PRE><P>There is a convention among HTML user agents to ignore unrecognized
markup.  Given the above documents, HTML user agents will behave
reliably for documents containing only markup they support. In the
face of unrecognized markup, the reliability varies:
<TABLE><CAPTION>HTML Document vs. User Agent Features
</CAPTION><TR><TH>Document:
</TH><TH>Level 0
</TH><TH>Level 1
</TH><TH>Level 2
</TH><TH>Level 3
</TH></TR><TR><TH>Level 0<BR>User Agent
</TH><TD>100% fidelity
</TD><TD>phrase markup and images lost
</TD><TD>forms shown as noise
</TD><TD>tables and figure captions shown as noise
</TD></TR><TR><TH>Level 1<BR>User Agent
</TH><TD>100% fidelity
</TD><TD>100% fidelity
</TD><TD>forms shown as noise
</TD><TD>tables and figure captions shown as noise
</TD></TR><TR><TH>Level 2<BR>User Agent
</TH><TD>100% fidelity
</TD><TD>100% fidelity
</TD><TD>100% fidelity
</TD><TD>tables and figure captions shown as noise
</TD></TR><TR><TH>Level 3<BR>User Agent
</TH><TD>100% fidelity
</TD><TD>100% fidelity
</TD><TD>100% fidelity
</TD><TD>100% fidelity
</TD></TR></TABLE><H2>A Robust Definition of the <TT>text/html</TT> Internet
Media Type
</H2><P>Actually, none of the above documents conforms to the specification
for the <TT>text/html</TT> media type given in <A HREF="#html2">[RFC1866]</A>
-- they are missing a document type declaration, e.g.:
<PRE>&lt;!doctype html public "-//IETF//DTD HTML 2.0//EN"&gt;
</PRE><P>The HTML 2.0 specification advises implementors to infer the above declaration if none is given. This is poor advice, since in practice the chance that
such a document conforms to the HTML 2.0 DTD is very small
<A
HREF="#Adams95">[Adams95]</A>. <FONT SIZE=-1>(cite Tim Bray at opentext, regarding
%age of valid HTML docs?)</FONT>
<P>Rather than binding <TT>text/html</TT> to any particular DTD, we define it to be an SGML document type that includes HTML level 1, as defined by <A HREF="#html2">[RFC1866]</A>. (An SGML document type t1 includes t2 if every document conforming to t2 also conforms to t1.)
<P>We define a <TT>text/html</TT> body to be an SGML document entity whose DTD is externally referenced; i.e. the body begins with one of
<PRE>&lt;!doctype html public "..." system "..."&gt;
&lt;!doctype html public "..."&gt;
&lt;!doctype html system "..."&gt;
&lt;!doctype html&gt;
</PRE><P>And we remove the default from the level parameter:
<P>
<DL><DT>Media Type name
<DD>text
<DT>Media subtype name
<DD>html
<DT>Required parameters
<DD>none
<DT>Optional parameters
<DD>level, charset
<DT>Encoding considerations
<DD>any encoding is allowed
<DT>Security considerations
<DD>Anchors, embedded images, and all other elements which contain URIs as
parameters may cause the URI to be dereferenced in response to user input.
In this case, the security considerations of [URL@@] apply.
<P>The widely deployed methods for submitting forms requests -- HTTP and
SMTP -- provide little assurance of confidentiality. Information providers
who request sensitive information via forms -- especially by way of the
`PASSWORD' type input field -- should be aware and make their users aware
of the lack of confidentiality.
</DL><P>The optional parameters are defined as follows:
<DL><DT>Level
<DD>The level parameter specifies the feature set used in the document. The
level is an integer number, implying that any features of the same or a lower
level may be present in the document. Level 1 is all features defined in
<A HREF="#html2">[RFC1866]</A> except those that require the FORM element.
Level 2 includes form processing. There is no default. In the absence
of a level parameter, the &lt;!doctype ...&gt; in the body determines the
level.
<DT>Charset
<DD>The charset parameter (as defined in section 7.1.1 of RFC 1521 [MIME])
may be given to specify the character encoding scheme used to represent the
HTML document as a sequence of octets. The default value is outside the scope
of this specification; but for example, the default is `US-ASCII' in the
context of MIME mail, and `ISO-8859-1' in the context of HTTP [HTTP].
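<P>As an illustrative sketch (not a normative list), header fields using
these parameters might look like the following. The particular combinations
are chosen only as examples; in the first, no level is given, so the
&lt;!doctype ...&gt; in the body determines the level.
<PRE>Content-Type: text/html
Content-Type: text/html; level=2
Content-Type: text/html; level=1; charset=ISO-8859-1
</PRE>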
</DL><H2>Decentralized Definition of the HTML Document Type
</H2><P>The expectation is that in addition to the standard DTDs, the HTML
processing capabilities of a user agent are described by some DTD, and that
this DTD has a formal public identifier, a Uniform Resource Identifier (URI
or URL), or both.
<P>Most documents will be prepared for standard HTML user agents, and their
document type will be declared ala:
<PRE>&lt;!doctype html public "-//IETF//DTD HTML 2.0//EN"&gt;
</PRE><P>A document prepared for a user agent with support for some other HTML dialect would have its document type declared using one of the following:
<PRE>&lt;!doctype html public "-//VendorCo Inc.//DTD HTML v1.4//EN"
	system "http://www.vendor.com/html-public-text/v1.4.dtd"&gt;
&lt;!doctype html system "http://www.vendor.com/html-public-text/v1.4.dtd"&gt;
</PRE><P>All user agents would have built-in support for the standard DTDs, plus a few popular DTDs du jour. Some user agents would be able to accommodate new DTDs at runtime by fetching them from the network. User agents without this capability, on encountering an unknown DTD identifier, could warn that the document might not be processed as intended by the information provider.
<H2>Marked Sections for Robust Handling of Unknown Markup
</H2><P>The "ignore unrecognized markup"
convention is unacceptably unreliable in cases such as forms and tables.
<P>The improved convention is that marked sections are processed as
per [ISO8879] (see @@marked sections primer). Additionally, parameter
entity references of the form <TT>%if-xxx</TT> are presumed to resolve to
<TT>IGNORE</TT>, and those of the form <TT>%no-xxx</TT> are presumed
to resolve to <TT>INCLUDE</TT>, unless the DTD in effect has a declaration
for those names.
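<P>Put another way, and only as a sketch of the convention: for a name such
as <TT>table</TT> that the DTD in effect does not declare, a processor
behaves as though the DTD contained the declarations below. The name
<TT>table</TT> is just an example; the same presumption would apply to any
<TT>xxx</TT>.
<PRE>&lt;!entity % if-table "IGNORE"&gt;
&lt;!entity % no-table "INCLUDE"&gt;
</PRE>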
<P>Using this convention, consider the following enhanced document:
<H3>Level 3/1: Conditional Table
</H3><PRE>
&lt;!doctype html system "http://www.w3.org/html-pubtext/960212/html.dtd"&gt;
&lt;title&gt;Example: Conditional Table&lt;/title&gt;
&lt;![ %if-table [
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Col 1 &lt;th&gt;Col 2 &lt;th&gt;Col 3
&lt;tr&gt;&lt;td&gt;A     &lt;td&gt;B     &lt;td&gt;C
&lt;tr&gt;&lt;td&gt;1     &lt;td&gt;2     &lt;td&gt;3
&lt;/table&gt;
]]&gt;
&lt;![ %no-table [
&lt;pre&gt;
Col 1     Col 2   Col 3
A         B       C
1         2       3
&lt;/pre&gt;
]]&gt;
</PRE><P>Assuming support for marked sections, an HTML 2.0 user agent will process the table marked up using &lt;pre&gt;, whereas a user agent that supports the 960212 DTD will process the &lt;table&gt; markup. A user agent that does not support the 960212 DTD, but does support tables, is likely to process the &lt;table&gt; markup reliably, since its DTD is likely to have declarations ala:
<PRE>&lt;!entity % if-table "INCLUDE"&gt;
&lt;!entity % no-table "IGNORE"&gt;
</PRE><P>and declarations for &lt;table&gt;, &lt;tr&gt;, &lt;td&gt;, etc. that match the 960212 DTD.
<P>This convention would have dealt gracefully with FORM and TABLE.
It has the potential to deal gracefully with SCRIPT, MATH, APPLET, etc.
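<P>For instance, a form might be guarded in the same way. This is only a
sketch: the entity names <TT>if-form</TT> and <TT>no-form</TT> are
hypothetical, chosen to follow the <TT>%if-xxx</TT>/<TT>%no-xxx</TT> pattern,
and it assumes that the DTD of a forms-capable user agent declares them as
<TT>INCLUDE</TT> and <TT>IGNORE</TT> respectively.
<PRE>
&lt;title&gt;Example: Conditional Form&lt;/title&gt;
&lt;![ %if-form [
&lt;form action="/cgi-bin/test" method=POST&gt;
&lt;p&gt;&lt;input name=x&gt;
&lt;/form&gt;
]]&gt;
&lt;![ %no-form [
&lt;p&gt;This page contains a form that your browser cannot display.
]]&gt;
</PRE>
<P>A user agent whose DTD declares neither entity would presume
<TT>IGNORE</TT> for <TT>%if-form</TT> and <TT>INCLUDE</TT> for
<TT>%no-form</TT>, and so would show only the apology.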
<P>While the marked section markup may seem unwieldy, it is necessary
<EM>only</EM> when both of the following conditions hold:
<OL><LI>a feature hasn't been fully deployed, i.e. there is still a significant
installed base that doesn't support it and
<LI>the information provider needs "forwards compatibility"
-- i.e. they're willing to put more stuff in the document to be sure that
old browsers behave nicely.
</OL><P>Here are some cases to mull over, in roughly historical order:
<P>
<TABLE BORDER CELLPADDING="2"><TR><TH>DOCTYPE
</TH><TH>Features<BR>Used in Doc
</TH><TH>Features in	<BR>Marked Section?
</TH><TH>Browser	Capabilities
</TH><TH>Result
</TH></TR><TR><TD>1.0
</TD><TD>1.0
</TD><TD>no
</TD><TD>1.0
</TD><TD>100% reliable *1
</TD></TR><TR><TD>1.x
</TD><TD>1.0+phrase markup
</TD><TD>no
</TD><TD>1.0
</TD><TD>some signal loss *2
</TD></TR><TR><TD>2.0
</TD><TD>2.0lev1 (no forms)
</TD><TD>no
</TD><TD>2.0lev1
</TD><TD>100% reliable *1
</TD></TR><TR><TD>2.0
</TD><TD>2.0 incl forms
</TD><TD>no
</TD><TD>2.0lev1
</TD><TD>some form noise *3
</TD></TR><TR><TD>2.0
</TD><TD>2.0 incl forms
</TD><TD>no
</TD><TD>2.0
</TD><TD>100% reliable *1
</TD></TR><TR><TD>3.x(tables)
</TD><TD>2.0+tables
</TD><TD>no	(tables)
</TD><TD>2.0
</TD><TD>some table noise *3
</TD></TR><TR><TD>3.x(tables)
</TD><TD>2.0+tables
</TD><TD>no	(tables)
</TD><TD>3.x (tables)
</TD><TD>100% reliable *1
</TD></TR><TR><TD>3.x(tables)
</TD><TD>2.0+tables
</TD><TD>yes, incl apology
</TD><TD>2.0+marked	sections
</TD><TD>100% reliable *4 (apology shown)
</TD></TR><TR><TD>3.x(tables)
</TD><TD>2.0+tables
</TD><TD>yes, incl apology
</TD><TD>2.0
</TD><TD>some table noise *5, apology
</TD></TR><TR><TD>3.x(tables)
</TD><TD>2.0+tables
</TD><TD>yes, incl apology
</TD><TD>2.0+tables
</TD><TD>98% reliable *6, apology (unneeded)
</TD></TR><TR><TD>3.x(tables)
</TD><TD>2.0+tables
</TD><TD>yes, incl apology
</TD><TD>3.x(tables) + marked sections
</TD><TD>100% reliable *1 (table shown)
</TD></TR></TABLE><DL><DT>*1
<DD>Standard features
<DT>*2
<DD>Unrecognized markup ignored without much disruption
<DT>*3
<DD>Unrecognized markup causes disruption
<DT>*4
<DD>Apology for lack of support shown
<DT>*5
<DD>Apology shown along with goofed up table
<DT>*6
<DD>Apology shown along with correctly processed table
</DL><P>In the table above, substitute any of script, style, math, embed,
etc. for forms/tables with the same result.
<P>The HTML 2.0 "ignore unknown tags" convention absorbs changes along the lines of
phrase markup and new IMG attributes ala *2. But for novel features like
forms and tables, we see *3. Note that without marked sections, each non-trivial
feature introduced causes a transitional period involving lots of interactions
ala *3, with most things settling down ala *1, but an indefinite burden of
*3 style interactions due to outdated software.
<P>Until marked sections are supported, providers who use marked sections
are rewarded ala *5, but penalized ala *6. (They are apparently already willing to
live with this, as evidenced by the "if your browser doesn't support forms,
..." apologies we see, even on forms-capable browsers.)
<P>With marked sections, non-trivial new features can be introduced with
interactions ala *4, with graceful transition back to style *1.
<H2>Format Negotiation Using Links and Resource Information
</H2><P>@@information provider maintains several variants; one corresponds
to the capabilities of most of his/her readership, and that's the one that's
shipped by default. It has links to the other variants, so that remedial
clients can downgrade at runtime.
<H2>Format Negotiation Using HTTP
</H2><P>@@see: tables deployment document
<P>The combination of relying on internal labelling (with external labelling
in the content type as an optimization) and marked sections is a viable
medium-to-long term solution.
<P>The internal labelling/marked section strategy is the equivalent of the
color TV solution: send the color signal to everybody, and the folks that
can't show the color just throw it away.
<P>The external labelling/format negotiation strategy is like having the
broadcasters send black-and-white signal to folks that request it, and color
to the rest. In some cases (like inline graphics formats), this is the right
thing to do. But it appears that in the vast majority of cases involving
new HTML features, it's just not worth the trouble.
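<P>As a rough sketch of external labelling, a client might state the level
it can handle in its request, and the server might label what it returns.
The resource path and the level values here are hypothetical.
<PRE>GET /catalog.html HTTP/1.0
Accept: text/html; level=2

HTTP/1.0 200 OK
Content-Type: text/html; level=2
</PRE>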
<P>@@discuss negotiation based on user-agent, caching, etc.
<P><HR>
<H2>Appendix: Marked Sections Primer
</H2><P>See: <A HREF="http://www.ebt.com/usrbooks/teip3/2404">"Marked Sections"
in <CITE>TEI Gentle Intro to SGML</CITE></A>
<H2><A name=refs>References</A>
</H2><DL><DT><A NAME=Adams95>Adams, Nov 95</A>
<DD><PRE>Date: Thu, 9 Nov 95 13:03:39 EST
Message-Id:&lt;9511091801.AA04679@trubetzkoy.stonehand.com&gt;
From: Glenn Adams&lt;glenn@stonehand.com&gt;
To: Multiple recipients of list&lt;html-wg@oclc.org&gt;
</PRE><DT>T. Berners-Lee &amp; D. Connolly,
November 1995.
<DD>"Hypertext Markup Language - 2.0" &nbsp;<B><A name=html2>RFC 1866</A></B>
<A
href="ftp://ds.internic.net/rfc/rfc1866.txt">ftp://ds.internic.net/rfc/rfc1866.txt</A>
<DT>Altheim, Murray, Jan 1996
<DD><A HREF="http://ogopogo.nttc.edu/spec/html/modular-dtd.html"><CITE>A
Modular DTD Approach for HTML Specification</CITE></A> National Technology
Transfer Center, work in progress
<DT>Connolly, Jan 1996
<DD><A HREF="public-text/">W3C HTML Public Text Repository</A> work in progress
<DT>Connolly
<DD><A HREF="table-deployment"><CITE>Toward Graceful Deployment of
Tables</CITE></A>
<DT>Connolly, XXX
<DD><PRE>To: mwm@contessa.phone.net
cc: Multiple recipients of list &lt;html-wg@oclc.org&gt;
Subject: Reliable Interoperability [was: LiveScript and HTML ]
In-reply-to: Your message of "Mon, 16 Oct 1995 23:00:26 EDT."
             &lt;19951016.75EF780.11F50@contessa.phone.net&gt; 
Date: Tue, 17 Oct 1995 00:32:12 -0400
From: "Daniel W. Connolly" &lt;connolly@beach.w3.org&gt;
</PRE><DT>Clark, James
<DD>nsgmls -- a new SGML parser
<DT>Behlendorf, Jan 1996
<DD><PRE>Date: Sun, 7 Jan 1996 23:45:23 -0800 (PST)
From: Brian Behlendorf &lt;brian@organic.com&gt;
To: www-talk@w3.org
Subject: HTML variants and content negotiation
Message-Id: <A HREF="http://www.eit.com/msgid/Pine.SGI.3.91.960107232733.10147O-100000@fully.organic.com">&lt;Pine.SGI.3.91.960107232733.10147O-100000@fully.organic.com&gt;</A>
</PRE></DL><P><HR>
<A HREF="../"><IMG BORDER="0" ALIGN=Left SRC="../Icons/WWW/w3c_home.gif" ALT="W3C"
WIDTH="72" HEIGHT="48"></A>
The World Wide Web Consortium:
<A HREF="http://www.w3.org/">http://www.w3.org/</A>
</BODY></HTML>