<!doctype html public "-//W3C//DTD HTML 3.2//EN">
<HTML><HEAD>
<TITLE>W3C WD: HTML Dialects: Internet Media Types and SGML Document Types</TITLE>
</HEAD><BODY><P>
<H2 align=right><A HREF="../"><IMG BORDER="0" ALT="W3C" SRC="../Icons/WWW/w3c_home.gif" ALIGN="left"></A> WD-doctypes-960302
</H2><H1 class=doctitle align=center>HTML Dialects: Internet Media Types and SGML Document Types
</H1><H3 align=center>W3C Working Draft 06-Mar-96
</H3><DL><DT>This version:
<DD>http://www.w3.org/pub/WWW/TR/WD-doctypes-960302
<BR>
$Id: WD-doctypes.html,v 1.12 1996/12/09 03:28:20 jigsaw Exp $
<DT>Latest version:
<DD>http://www.w3.org/pub/WWW/TR/WD-doctypes
<DT>Authors:
<DD>Daniel W. Connolly <connolly@w3.org>
</DL><P><HR>
<H2>Status of this document
</H2><P>This is [not yet] a W3C Working Draft for review by W3C members and
other interested parties. It is a draft document and may be updated, replaced,
or obsoleted by other documents at any time. It is inappropriate to use W3C
Working Drafts as reference material or to cite them as other than "work in
progress".
A list of current W3C working drafts can be found at:
<A
href="http://www.w3.org/pub/WWW/TR">http://www.w3.org/pub/WWW/TR</A>
<P><B>Note:</B> since working drafts are subject to frequent change, you
are advised to reference
the above address, rather than the addresses of working drafts themselves.
<H2>Abstract
</H2><P>The HTML 2.0 specification, RFC 1866, defines an SGML application
and an Internet media type. The specification notes that extensions are
planned, but only the <TT>text/html; level=2</TT> Internet media type
and the <TT>"-//IETF//DTD HTML 2.0//EN"</TT> document type are defined. This
document suggests the use of URIs as system identifiers for document type
definitions, allowing decentralized evolution of the language. The use of
marked sections as a transition technique and the continued
use of the level mechanism for standardized points in the evolution path
are discussed.
<P><HR>
<H3>Contents
</H3><UL><LI><A HREF="#intro">Introduction</A>
<LI><A HREF="#problem">Problem Statement</A>
<LI><A href="#refs" rev=toc>References</A>
</UL><P><HR>
<H2><A name=intro>Introduction</A>
</H2><P>The goal of any HTML specification should be to promote
confidence in the fidelity of communications using HTML. This means:
<OL><LI>making it clear to authors what idioms are available
<LI>making it clear to implementors how to interpret the idioms
<LI>keeping HTML simple enough that it can be implemented
<LI>making HTML expressive enough that it can represent
a useful majority of the contemporary communications idioms in
this community
<LI>making some allowance for expressing idioms not captured
by the specification
<LI>addressing relevant interoperability issues with other
applications and technologies
</OL><P>HTML 2.0 specifies a set of idioms widely used and supported as of
June of 1994. But HTML and the web are still in a stage of rapid
innovation and evolution, and will be for the foreseeable future. The
HTML 2.0 specification fails to accommodate this evolution: it fails to
meet goal #5, and goal #4 cannot be met by any frozen document, as
"contemporary communications idioms" evolve over time.
<P>Examples of this evolution include the introduction of forms and
tables. In each case, information providers suddenly had two kinds of
clients: those with support for the new feature, and those
without. They were faced with the following choices:
<DL><DT>Stick to the lowest common denominator
<DD>This sacrifices rich information delivery for ubiquitous access.
<DT>Exploit the new feature
<DD>Some clients will fail to support the new feature, and instead
see "noise." Some information providers employ a "You must have a
forms-capable browser to access this page" disclaimer.
<DT>Make the choice explicit
<DD>This is the "click here if your browser supports forms"
phenomenon. The information provider maintains two representations:
feature-rich and feature-poor. The consumer's reading experience is
interrupted by an irrelevant technical decision that they may not
be equipped to make.
</DL><P>Optimally, the system should obviate the
need for information providers and consumers to deal with this issue
explicitly. Interoperability between new and old components should be
automatic.
<P>This document proposes a mechanism that obviates the need for
consumers to explicitly deal with the issue. The mechanism does not
alleviate the information provider's burden, but it does increase
reliability even in the case that information providers are unwilling
to invest the effort necessary to support old clients.
<H2><A NAME=problem>Problem Statement</A>
</H2><P>Consider the following documents:
<H3>Level 0: Simple HTML
</H3><PRE>
<title>Example: Simple HTML</title>
<p>A paragraph with a <a href="#dest">link</a>.
<ul>
<li>a list
<li>of <a href="dest">items</a>
</ul>
</PRE><H3>Level 1: Phrase Markup, Nested Lists, and Images
</H3><PRE>
<title>Example: Phrase Markup, Nested Lists, and Images</title>
<p>A paragraph with <em>emphasis</em> and an <img ALT="image"
SRC="foo.png">.
<ol>
<li>Section 1
<li>Section 2
<ol>
<li>Section 2.1
<li>Section 2.2
</ol>
<li>Section 3
</ol>
</PRE><H3>Level 2: Forms
</H3><PRE>
<title>Example: Forms</title>
<h1>Forms</h1>
<form action="/cgi-bin/test" method=POST>
<p><input name=x>
<p><input name=y>
<p><input name=z>
</form>
</PRE><H3>Level 3: Tables, Objects, and Figures
</H3><PRE>
<title>Example: Tables, Objects, and Figures</title>
<table>
<tr><th>Col 1 <th>Col 2 <th>Col 3
<tr><td>A <td>B <td>C
<tr><td>1 <td>2 <td>3
</table>
<fig>
<caption>Figure 1: A Movie</caption>
<object data="movie.mpg">
[Movie elided]
</object>
</fig>
</PRE><P>There is a convention among HTML user agents to ignore unrecognized
markup. Given the above documents, HTML user agents will behave
reliably for documents containing only markup they support. In the
face of unrecognized markup, the reliability varies:
<TABLE><CAPTION>HTML Document vs. User Agent Features
</CAPTION><TR><TH>Document:
</TH><TH>Level 0
</TH><TH>Level 1
</TH><TH>Level 2
</TH><TH>Level 3
</TH></TR><TR><TH>Level 0<BR>User Agent
</TH><TD>100% fidelity
</TD><TD>phrase markup and images lost
</TD><TD>forms shown as noise
</TD><TD>tables and figure captions shown as noise
</TD></TR><TR><TH>Level 1<BR>User Agent
</TH><TD>100% fidelity
</TD><TD>100% fidelity
</TD><TD>forms shown as noise
</TD><TD>tables and figure captions shown as noise
</TD></TR><TR><TH>Level 2<BR>User Agent
</TH><TD>100% fidelity
</TD><TD>100% fidelity
</TD><TD>100% fidelity
</TD><TD>tables and figure captions shown as noise
</TD></TR><TR><TH>Level 3<BR>User Agent
</TH><TD>100% fidelity
</TD><TD>100% fidelity
</TD><TD>100% fidelity
</TD><TD>100% fidelity
</TD></TR></TABLE><H2>A Robust Definition of the <TT>text/html</TT> Internet
Media Type
</H2><P>Actually, none of the above documents conforms to the specification
for the <TT>text/html</TT> media type given in <A HREF="#html2">[RFC1866]</A>,
because each lacks a document type declaration such as:
<PRE><!doctype html public "-//IETF//DTD HTML 2.0//EN">
</PRE><P>The HTML 2.0 specification advises implementors to infer the above declaration if none is given. This is poor advice, since in practice the chance that
such a document conforms to the HTML 2.0 DTD is very small
<A
HREF="#Adams95">[Adams95]</A>. <FONT SIZE=-1>(cite Tim Bray at opentext, regarding
%age of valid HTML docs?)</FONT>
<P>Rather than binding <TT>text/html</TT> to any particular DTD, we define it to be an SGML document type that includes HTML level 1, as defined by <A HREF="#html2">[RFC1866]</A>. (An SGML document type t1 includes t2 if every document conforming to t2 also conforms to t1.)
<P>We define a <TT>text/html</TT> body to be an SGML document entity whose DTD is externally referenced; i.e., the body begins with one of:
<PRE><!doctype html public "..." "...">
<!doctype html public "...">
<!doctype html system "...">
<!doctype html>
</PRE><P>And we remove the default from the level parameter:
<P>
<DL><DT>Media Type name
<DD>text
<DT>Media subtype name
<DD>html
<DT>Required parameters
<DD>none
<DT>Optional parameters
<DD>level, charset
<DT>Encoding considerations
<DD>any encoding is allowed
<DT>Security considerations
<DD>Anchors, embedded images, and all other elements which contain URIs as
parameters may cause the URI to be dereferenced in response to user input.
In this case, the security considerations of [URL@@] apply.
<P>The widely deployed methods for submitting forms requests -- HTTP and
SMTP -- provide little assurance of confidentiality. Information providers
who request sensitive information via forms -- especially by way of the
`PASSWORD' type input field -- should be aware and make their users aware
of the lack of confidentiality.
</DL><P>The optional parameters are defined as follows:
<DL><DT>Level
<DD>The level parameter specifies the feature set used in the document. The
level is an integer; any feature of the same or a lower level may be present
in the document. Level 1 comprises all features defined in
<A HREF="#html2">[RFC1866]</A> except those that require the FORM element.
Level 2 includes form processing. There is no default. In the absence
of a level parameter, the <!doctype ...> in the body determines the
level.
<DT>Charset
<DD>The charset parameter (as defined in section 7.1.1 of RFC 1521 [MIME])
may be given to specify the character encoding scheme used to represent the
HTML document as a sequence of octets. The default value is outside the scope
of this specification; for example, the default is `US-ASCII' in the
context of MIME mail, and `ISO-8859-1' in the context of HTTP [HTTP].
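<P>The rule that the <TT>level</TT> parameter takes precedence, that the document type declaration decides otherwise, and that there is no default can be sketched in Python as follows. This is a minimal illustration, not part of the proposal: the <TT>DOCTYPE_LEVELS</TT> table and all function names are hypothetical, and the header parsing is simplified (no quoted semicolons, no comments). Charset defaulting is deliberately left out, since its default depends on context.

```python
# Sketch: resolving the effective level of a text/html entity.
# Assumption: DOCTYPE_LEVELS and every function name here are hypothetical.
from typing import Dict, Optional, Tuple

# Illustrative mapping from formal public identifiers to levels.
DOCTYPE_LEVELS: Dict[str, int] = {
    "-//IETF//DTD HTML 2.0 Level 1//EN": 1,
    "-//IETF//DTD HTML 2.0//EN": 2,
}


def parse_content_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split e.g. 'text/html; level=2; charset=ISO-8859-1' into type and params.

    Simplified: parameter values containing ';' are not supported.
    """
    media_type, *raw_params = value.split(";")
    params: Dict[str, str] = {}
    for item in raw_params:
        key, sep, val = item.partition("=")
        if sep:
            # Parameter names are case-insensitive; values may be quoted.
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), params


def effective_level(content_type: str, doctype_public_id: Optional[str]) -> Optional[int]:
    """The level parameter wins; otherwise the doctype decides; no default."""
    _, params = parse_content_type(content_type)
    if "level" in params:
        return int(params["level"])
    if doctype_public_id is not None:
        return DOCTYPE_LEVELS.get(doctype_public_id)
    return None  # unknown: deliberately not assumed to be 2


def can_process(document_level: int, agent_level: int) -> bool:
    """A level-n document may use any feature of level n or lower."""
    return agent_level >= document_level
```

<P>For example, <TT>effective_level("text/html", "-//IETF//DTD HTML 2.0//EN")</TT> yields 2, while <TT>effective_level("text/html", None)</TT> yields <TT>None</TT> instead of silently assuming a level.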
</DL><H2>Decentralized Definition of the HTML Document Type
</H2><P>The expectation is that in addition to the standard DTDs, the HTML
processing capabilities of a user agent are described by some DTD, and that
this DTD has a formal public identifier, a Uniform Resource Identifier (URI
or URL), or both.
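<P>A recipient has to recover these identifiers from the document type declaration at the start of a <TT>text/html</TT> body. The Python sketch below handles the four declaration forms listed earlier for <TT>text/html</TT> bodies. It is a simplification under stated assumptions (double-quoted literals only, no comments or internal subset inside the declaration), and the names <TT>parse_doctype</TT> and <TT>DoctypeIds</TT> are hypothetical.

```python
# Sketch: extracting the public and system identifiers from the document
# type declaration that begins a text/html body.
# Assumptions: double-quoted literals only; no comments or internal subset.
import re
from typing import NamedTuple, Optional


class DoctypeIds(NamedTuple):
    public_id: Optional[str]  # formal public identifier, if declared
    system_id: Optional[str]  # system identifier (e.g. a URI), if declared


_DOCTYPE = re.compile(
    r'<!doctype\s+html'
    r'(?:\s+public\s+"(?P<pub>[^"]*)"'
    # ISO 8879 puts the system literal directly after the public one;
    # an explicit SYSTEM keyword at that point is tolerated leniently.
    r'(?:\s+(?:system\s+)?"(?P<sys1>[^"]*)")?'
    r'|\s+system\s+"(?P<sys2>[^"]*)")?'
    r'\s*>',
    re.IGNORECASE,
)


def parse_doctype(body: str) -> Optional[DoctypeIds]:
    """Return the declared identifiers, or None if the body has no doctype."""
    m = _DOCTYPE.match(body.lstrip())
    if m is None:
        return None
    system_id = m.group("sys1") if m.group("sys1") is not None else m.group("sys2")
    return DoctypeIds(m.group("pub"), system_id)
```

<P>For the vendor declaration that follows, the sketch would return both identifiers; a bare doctype with no identifiers yields a pair of <TT>None</TT> values, and a body with no declaration at all yields <TT>None</TT>.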
<P>Most documents will be prepared for standard HTML user agents, and their
document type will be declared as follows:
<PRE><!doctype html public "-//IETF//DTD HTML 2.0//EN">
</PRE><P>A document prepared for a user agent with support for some other HTML dialect would have its document type declared using one of the following:
<PRE><!doctype html public "-//VendorCo Inc.//DTD HTML v1.4//EN"
"http://www.vendor.com/html-public-text/v1.4.dtd">
<!doctype html system "http://www.vendor.com/html-public-text/v1.4.dtd">
</PRE><P>All user agents would have built-in support for the standard DTDs, plus a few popular DTDs du jour. Some user agents would be able to accommodate new DTDs at runtime by fetching them from the network. User agents without this capability, on encountering an unknown DTD identifier, could warn that the document might not be processed as intended by the information provider.
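<P>The user agent behaviour just described amounts to a three-way decision. A minimal Python sketch follows, assuming a hypothetical registry of built-in identifiers and assuming that only <TT>http</TT> and <TT>ftp</TT> system identifiers can be fetched; the name <TT>dtd_action</TT> is likewise hypothetical.

```python
# Sketch: how a user agent might handle a declared DTD.
# Assumptions: the registries and the "fetchable" rule (an http or ftp URI)
# are hypothetical; the draft only says some agents could fetch DTDs.
from typing import Optional
from urllib.parse import urlparse

# DTDs this hypothetical agent has built in, by public or system identifier.
BUILTIN_PUBLIC_IDS = {"-//IETF//DTD HTML 2.0//EN"}
BUILTIN_SYSTEM_IDS: set = set()


def dtd_action(public_id: Optional[str], system_id: Optional[str],
               can_fetch: bool) -> str:
    """Return "builtin", "fetch", or "warn"."""
    if public_id in BUILTIN_PUBLIC_IDS or system_id in BUILTIN_SYSTEM_IDS:
        return "builtin"
    if can_fetch and system_id and urlparse(system_id).scheme in ("http", "ftp"):
        return "fetch"
    # Unknown DTD and no way to obtain it (including the assumed case of no
    # identifiers at all): warn that the document might not be processed
    # as the information provider intended.
    return "warn"
```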
<H2>Marked Sections for Robust Handling of Unknown Markup
</H2><P>The "ignore unrecognized markup"
convention is unacceptably unreliable in cases such as forms and tables.
<P>The improved convention is that marked sections are processed as
per [ISO8879] (see @@marked sections primer). Additionally, parameter
entity references of the form <TT>%if-xxx</TT> are presumed to resolve to
<TT>IGNORE</TT>, and those of the form <TT>%no-xxx</TT> are presumed
to resolve to <TT>INCLUDE</TT>, unless the DTD in effect has a declaration
for those names.
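<P>The default resolution rule can be stated as a small function. In this illustrative Python sketch, <TT>dtd_entities</TT> is an assumed mapping from the parameter entity names declared by the DTD in effect to their replacement text; treating an undeclared name outside the <TT>if-</TT>/<TT>no-</TT> convention as an error is an assumption of the sketch, and <TT>marked_section_status</TT> is a hypothetical name.

```python
# Sketch: the proposed default resolution for %if-xxx / %no-xxx references.
# Assumption: dtd_entities holds the parameter entity declarations of the
# DTD in effect, mapping names to replacement text.
from typing import Mapping


def marked_section_status(name: str, dtd_entities: Mapping[str, str]) -> str:
    """Resolve %name to a marked section status keyword."""
    if name in dtd_entities:           # the DTD in effect declares it
        return dtd_entities[name]
    if name.startswith("if-"):         # undeclared if-xxx: feature unsupported
        return "IGNORE"
    if name.startswith("no-"):         # undeclared no-xxx: keep the fallback
        return "INCLUDE"
    raise KeyError(f"undeclared parameter entity %{name}")
```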
<P>Using this convention, consider the following enhanced document:
<H3>Level 3/1: Conditional Table
</H3><PRE>
<!doctype html system "http://www.w3.org/html-pubtext/960212/html.dtd">
<title>Example: Conditional Table</title>
<![ %if-table [
<table>
<tr><th>Col 1 <th>Col 2 <th>Col 3
<tr><td>A <td>B <td>C
<tr><td>1 <td>2 <td>3
</table>
]]>
<![ %no-table [
<pre>
Col 1 Col 2 Col 3
A B C
1 2 3
</pre>
]]>
</PRE><P>Assuming support for marked sections, an HTML 2.0 user agent will process the table marked up using <pre>, whereas a user agent that supports the 960212 DTD will process the <table> markup. A user agent that does not support the 960212 DTD, but does support tables, is likely to process the <table> markup reliably, since its DTD is likely to have declarations such as:
<PRE><!entity % if-table "INCLUDE">
<!entity % no-table "IGNORE">
</PRE><P>and declarations for <table>, <tr>, <td>, etc. that match the 960212 DTD.
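<P>To see both outcomes side by side, here is a simplified marked section expander in Python, applied to a cut-down version of the conditional table example. It is a sketch, not an SGML parser: it assumes no nesting, a single parameter entity reference as the status keyword, and only the <TT>INCLUDE</TT> and <TT>IGNORE</TT> keywords; <TT>expand_marked_sections</TT> is a hypothetical name.

```python
# Sketch: expanding marked sections with the %if-xxx / %no-xxx defaults.
# Assumptions: no nesting, one parameter entity as the status keyword,
# and only INCLUDE / IGNORE handled.
import re
from typing import Mapping

_SECTION = re.compile(r"<!\[\s*%(?P<name>[\w.-]+);?\s*\[(?P<body>.*?)\]\]>", re.DOTALL)


def _status(name: str, dtd_entities: Mapping[str, str]) -> str:
    # Declarations in the DTD in effect take precedence over the defaults.
    if name in dtd_entities:
        return dtd_entities[name]
    if name.startswith("if-"):
        return "IGNORE"
    if name.startswith("no-"):
        return "INCLUDE"
    raise KeyError(f"undeclared parameter entity %{name}")


def expand_marked_sections(text: str, dtd_entities: Mapping[str, str]) -> str:
    """Keep the bodies of INCLUDE sections; drop IGNORE sections entirely."""
    def replace(m: re.Match) -> str:
        keep = _status(m.group("name"), dtd_entities) == "INCLUDE"
        return m.group("body") if keep else ""
    return _SECTION.sub(replace, text)


DOC = """<![ %if-table [
<table><tr><th>Col 1 <th>Col 2</table>
]]>
<![ %no-table [
<pre>Col 1  Col 2</pre>
]]>"""

# An HTML 2.0 agent's DTD declares neither entity, so the fallback survives.
html2_view = expand_marked_sections(DOC, {})
# A table-capable agent's DTD declares both entities.
tables_view = expand_marked_sections(DOC, {"if-table": "INCLUDE", "no-table": "IGNORE"})
```

<P>Here <TT>html2_view</TT> retains only the <TT>pre</TT> fallback and <TT>tables_view</TT> retains only the table, matching the behaviour described above.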
<P>This convention would have dealt gracefully with FORM and TABLE.
It has the potential to deal gracefully with SCRIPT, MATH, APPLET, etc.
<P>While the marked section markup may seem unwieldy, it is necessary
<EM>only</EM> when both of the following conditions hold:
<OL><LI>a feature hasn't been fully deployed, i.e., there is still a significant
installed base that doesn't support it; and
<LI>the information provider needs "forwards compatibility,"
i.e., they're willing to put more markup in the document to be sure that
old browsers behave nicely.
</OL><P>Here are some cases to mull over, in roughly historical order:
<P>
<TABLE BORDER CELLPADDING="2"><TR><TH>DOCTYPE
</TH><TH>Features<BR>Used in Doc
</TH><TH>Features in <BR>Marked Section?
</TH><TH>Browser Capabilities
</TH><TH>Result
</TH></TR><TR><TD>1.0
</TD><TD>1.0
</TD><TD>no
</TD><TD>1.0
</TD><TD>100% reliable *1
</TD></TR><TR><TD>1.x
</TD><TD>1.0+phrase markup
</TD><TD>no
</TD><TD>1.0
</TD><TD>some signal loss *2
</TD></TR><TR><TD>2.0
</TD><TD>2.0lev1 (no forms)
</TD><TD>no
</TD><TD>2.0lev1
</TD><TD>100% reliable *1
</TD></TR><TR><TD>2.0
</TD><TD>2.0 incl forms
</TD><TD>no
</TD><TD>2.0lev1
</TD><TD>some form noise *3
</TD></TR><TR><TD>2.0
</TD><TD>2.0 incl forms
</TD><TD>no
</TD><TD>2.0
</TD><TD>100% reliable *1
</TD></TR><TR><TD>3.x(tables)
</TD><TD>2.0+tables
</TD><TD>no (tables)
</TD><TD>2.0
</TD><TD>some table noise *3
</TD></TR><TR><TD>3.x(tables)
</TD><TD>2.0+tables
</TD><TD>no (tables)
</TD><TD>3.x (tables)
</TD><TD>100% reliable *1
</TD></TR><TR><TD>3.x(tables)
</TD><TD>2.0+tables
</TD><TD>yes, incl apology
</TD><TD>2.0+marked sections
</TD><TD>100% reliable *4 (apology shown)
</TD></TR><TR><TD>3.x(tables)
</TD><TD>2.0+tables
</TD><TD>yes, incl apology
</TD><TD>2.0
</TD><TD>some table noise *5, apology shown
</TD></TR><TR><TD>3.x(tables)
</TD><TD>2.0+tables
</TD><TD>yes, incl apology
</TD><TD>2.0+tables
</TD><TD>98% reliable *6, apology (unneeded)
</TD></TR><TR><TD>3.x(tables)
</TD><TD>2.0+tables
</TD><TD>yes, incl apology
</TD><TD>3.x(tables)+marked sections
</TD><TD>100% reliable *1 (table shown)
</TD></TR></TABLE><DL><DT>*1
<DD>Standard features
<DT>*2
<DD>Unrecognized markup ignored without much disruption
<DT>*3
<DD>Unrecognized markup causes disruption
<DT>*4
<DD>Apology for lack of support shown
<DT>*5
<DD>Apology shown along with goofed up table
<DT>*6
<DD>Apology shown along with correctly processed table
</DL><P>In the table above, substitute any of script, style, math, embed,
etc. for forms/tables with the same result.
<P>The HTML 2.0 "ignore unknown tags" convention absorbs changes such as
phrase markup and new IMG attributes, as in *2. But for genuinely new features like
forms and tables, we see *3. Note that without marked sections, each non-trivial
feature introduced causes a transitional period involving many interactions
like *3, with most things settling down to *1, but an indefinite burden of
*3-style interactions due to outdated software.
<P>Until marked sections are supported, providers who use marked sections
are rewarded as in *5, but penalized as in *6. (They are apparently already willing to
live with this, as evidenced by the "if your browser doesn't support forms,
..." apologies we see, even on forms-capable browsers.)
<P>With marked sections, non-trivial new features can be introduced with
interactions like *4, with a graceful transition back to *1.
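<P>The case table can be condensed into a decision function. The Python sketch below returns the footnote label (*1 to *6) for a given combination of document and browser; the <TT>Case</TT> fields and the <TT>outcome</TT> function are hypothetical names, and treating phrase markup as the only kind of "non-novel" change is an assumption drawn from the 1.x row of the table.

```python
# Sketch: the case table as a function returning its footnote (*1..*6).
# Assumption: "novel" separates features like forms and tables from changes
# like phrase markup that old agents can ignore without much disruption.
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    novel_feature: bool              # forms/tables (True) vs. phrase markup (False)
    in_marked_section: bool          # feature wrapped in marked sections, with apology
    agent_has_feature: bool          # user agent supports the feature itself
    agent_has_marked_sections: bool  # user agent processes marked sections


def outcome(case: Case) -> str:
    if not case.in_marked_section:
        if case.agent_has_feature:
            return "*1"                                  # standard features
        return "*3" if case.novel_feature else "*2"      # disruption vs. mild loss
    if case.agent_has_marked_sections:
        return "*1" if case.agent_has_feature else "*4"  # feature, or apology only
    # Marked sections not understood: both branches are rendered.
    return "*6" if case.agent_has_feature else "*5"
```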
<H2>Format Negotiation Using Links and Resource Information
</H2><P>@@information provider maintains several variants; one corresponds
to the capabilities of most of his/her readership, and that's the one that's
shipped by default. It has links to the other variants, so that remedial
clients can downgrade at runtime.
<H2>Format Negotiation Using HTTP
</H2><P>@@see: tables deployment document
<P>The combination of relying on internal labelling (with external labelling
in the content type as an optimization) and marked sections is a viable
medium-to-long term solution.
<P>The internal labelling/marked section strategy is the equivalent of the
color TV solution: send the color signal to everybody, and the folks that
can't show the color just throw it away.
<P>The external labelling/format negotiation strategy is like having the
broadcasters send black-and-white signal to folks that request it, and color
to the rest. In some cases (like inline graphics formats), this is the right
thing to do. But it appears that in the vast majority of cases involving
new HTML features, it's just not worth the trouble.
<P>@@discuss negotiation based on user-agent, caching, etc.
<P><HR>
<H2>Appendix: Marked Sections Primer
</H2><P>See: <A HREF="http://www.ebt.com/usrbooks/teip3/2404">"Marked Sections"
in <CITE>TEI Gentle Intro to SGML</CITE></A>
<H2><A name=refs>References</A>
</H2><DL><DT><A NAME=Adams95>Adams, Nov 95</A>
<DD><PRE>Date: Thu, 9 Nov 95 13:03:39 EST
Message-Id: <9511091801.AA04679@trubetzkoy.stonehand.com>
From: Glenn Adams <glenn@stonehand.com>
To: Multiple recipients of list <html-wg@oclc.org>
</PRE><DT>T. Berners-Lee & D. Connolly,
November 1995.
<DD>"Hypertext Markup Language - 2.0" <B><A name=html2>RFC 1866</A></B>
<A
href="ftp://ds.internic.net/rfc/rfc1866.txt">ftp://ds.internic.net/rfc/rfc1866.txt</A>
<DT>Altheim, Murray, Jan 1996
<DD><A HREF="http://ogopogo.nttc.edu/spec/html/modular-dtd.html"><CITE>A
Modular DTD Approach for HTML Specification</CITE></A> National Technology
Transfer Center, work in progress
<DT>Connolly, Jan 1996
<DD><A HREF="public-text/">W3C HTML Public Text Repository</A> work in progress
<DT>Connolly
<DD><A HREF="table-deployment"><CITE>Toward Graceful Deployment of
Tables</CITE></A>
<DT>Connolly, XXX
<DD><PRE>To: mwm@contessa.phone.net
cc: Multiple recipients of list <html-wg@oclc.org>
Subject: Reliable Interoperability [was: LiveScript and HTML ]
In-reply-to: Your message of "Mon, 16 Oct 1995 23:00:26 EDT."
<19951016.75EF780.11F50@contessa.phone.net>
Date: Tue, 17 Oct 1995 00:32:12 -0400
From: "Daniel W. Connolly" <connolly@beach.w3.org>
</PRE><DT>Clark, James
<DD>nsgmls -- a new SGML parser
<DT>Behlendorf, Jan 1996
<DD><PRE>Date: Sun, 7 Jan 1996 23:45:23 -0800 (PST)
From: Brian Behlendorf <brian@organic.com>
To: www-talk@w3.org
Subject: HTML variants and content negotiation
Message-Id: <A HREF="http://www.eit.com/msgid/Pine.SGI.3.91.960107232733.10147O-100000@fully.organic.com"><Pine.SGI.3.91.960107232733.10147O-100000@fully.organic.com></A>
</PRE></DL><P><HR>
<A HREF="../"><IMG BORDER="0" ALIGN=Left SRC="../Icons/WWW/w3c_home.gif" ALT="W3C"
WIDTH="72" HEIGHT="48"></A>
The World Wide Web Consortium:
<A HREF="http://www.w3.org/">http://www.w3.org/</A>
</BODY></HTML>