<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/sgml/HTML4-loose.dtd">
<HTML>
<HEAD>
<TITLE>XML in HTML Meeting Report</TITLE>
</HEAD>
<BODY>
<DIV class="header">
<H2 align="right">
<A href="http://www.w3.org/">
<IMG border="none" align="left" alt="W3C" src="http://www.w3.org/Icons/WWW/w3c_home"></A>NOTE-xh-19980511
</H2>
<H1 align="center">
XML in HTML Meeting Report
</H1>
<H3 align="center">
W3C Note 11 May 1998
</H3>
<DL>
<DT>
This Version:
<DD>
<A HREF="http://www.w3.org/TR/1998/NOTE-xh-19980511">http://www.w3.org/TR/1998/NOTE-xh-19980511</A><BR>
$Date: 2007/01/26 10:12:49 $
<DT>
Latest Version:
<DD>
<A HREF="http://www.w3.org/TR/NOTE-xh">http://www.w3.org/TR/NOTE-xh</A>
<DT>
Editors:
<DD>
<A href="http://www.w3.org/People/Connolly/">Dan Connolly</A>
<A href="mailto:connolly@w3.org"><TT>&lt;connolly@w3.org&gt;</TT></A> W3C<BR>
Lauren Wood
<A href="mailto:lwood@sq.com"><TT>&lt;lauren@softquad.com&gt;</TT></A> SoftQuad
</DL>
<H2>
Status of This Document
</H2>
<P>
This document summarizes the discussion and conclusions of a meeting held
to coordinate across several W3C Working Groups. While the decisions of this
forum are not binding on any of the working groups, they represent substantial
experience and analysis and should guide future work.
<P>
Please direct comments to
<A HREF="../../MarkUp/Forums#www-html">www-html</A>, a public discussion
forum.
<P>
This document is a NOTE made available by the W3 Consortium for discussion
only. This indicates no endorsement of its content, nor that the Consortium
has, is, or will be allocating any resources to the issues addressed by the
NOTE.
<P>
<HR>
<H2>
Contents
</H2>
<OL>
<LI>
<A HREF="#About">About the Meeting</A>
<OL>
<LI>
<A HREF="#Background">Background</A>
<LI>
<A HREF="#who">Participants</A>
</OL>
<LI>
<A HREF="#Summary">Summary of Discussion</A>
<OL>
<LI>
<A HREF="#RDF">RDF Requirements</A>
<LI>
<A HREF="#MathML">MathML Requirements</A>
<LI>
<A HREF="#Types">Types of HTML</A>
<LI>
<A HREF="#General">General Requirements</A>
<LI>
<A HREF="#sol">Possible Solutions</A>
</OL>
<LI>
<A HREF="#Conclusions">Conclusions and Future Work</A>
<OL>
<LI>
<A HREF="#link">Including XML by Reference</A>
<LI>
<A HREF="#attrs">Using Attributes to Hide New Idioms</A>
<LI>
<A HREF="#xml-block">Using <TT>&lt;XML&gt;</TT>, an HTML Enhancement</A>
<LI>
<A HREF="#script-hack">Using Script to Hide Content in Older Browsers</A>
<LI>
<A HREF="#sprinkles">Using Namespaces, Stylesheets, and the DOM</A>
</OL>
<LI>
<A HREF="#References">References</A>
</OL>
</DIV>
<P>
<HR>
<H2>
<A NAME="About">About the Meeting</A>
</H2>
<P>
A number of issues regarding the use of XML<A HREF="#XML">[XML]</A> in HTML
documents were brought to the attention of the W3C Hypertext Coordination
Group. In particular, MathML<A HREF="#mathml">[MathML]</A> and
RDF<A href="#rdf-syntax">[RDF]</A> are written in XML and intended to be
used in HTML documents.
<P>
In response, the coordination group held a meeting 11-12 Feb 1998 in San
Jose, CA. We would like to thank the host, Sun Microsystems.
<H3>
<A NAME="Background">Background</A>
</H3>
<P>
As discussed in <A HREF="#Dialects">[Dialects]</A>, evolution of the HTML
specification proceeds by introduction of new idioms which interact with
deployed software in one of the following ways:
<DL>
<DT>
The idiom is ignored altogether.
<DD>
for example, <TT>&lt;img src="..."&gt;</TT> was ignored by the deployed software
base when it was introduced. New empty elements and new attributes generally
behave this way.
<DT>
The enhanced functionality of the new idiom is ignored, but the content is
otherwise handled sensibly.
<DD>
for example, <TT>&lt;em&gt;abc&lt;/em&gt;</TT> displays without emphasis
on some very old user agents. New "inline" elements often behave this way.
<DT>
The idiom is disruptive in deployed software.
<DD>
for example, forms and tables display as a jumble of noise in software deployed
before they were introduced. New block elements are particularly difficult
to deploy gracefully.
</DL>
<P>
For the past few years, the HTML Working Group has vetted new proposals on
behalf of the web community, considering the value of each versus the cost
of deployment. But with the introduction of XML into the web, markup design
is decentralized. Each community or even each user can use whatever elements
and attributes they choose and give them whatever meaning and significance
they choose. As MathML and RDF show, at least some of this XML markup is
intended for use inside HTML documents.
<P>
This meeting explored mechanisms to use XML markup in HTML documents: existing
mechanisms and possible enhancements. In particular:
<UL>
<LI>
how do we "hide" new idioms from deployed software?
<LI>
how do we introduce new idioms with distinctive display characteristics,
such as MathML?
</UL>
<H3>
<A NAME="who">Participants</A>
</H3>
<P>
Participants from all W3C working groups, especially RDF, MathML, CSS&amp;FP,
XML, and DOM, were invited. A wide variety of experience and requirements
was represented by the meeting participants:
<UL>
<LI>
Vidur Apparao, Netscape
<LI>
Jon Bosak, Sun (host)
<LI>
Dean Burson, Lotus
<LI>
Dan Connolly, W3C (co-chair)
<LI>
Ramanathan Guha, Netscape
<LI>
Bruce Hunt, Adobe
<LI>
Jacob Levy, Sun
<LI>
Eve Maler, ArborText
<LI>
Murray Maloney, CN Group
<LI>
John McCarthy, Berkeley Labs
<LI>
Robert Miner, University of Minnesota
<LI>
Scott Isaacs, Microsoft
<LI>
Jean Paoli, Microsoft
<LI>
T.V. Raman, Adobe
<LI>
Nisheeth Ranjan, Netscape
<LI>
David Singer, IBM
<LI>
Bob Sutor, IBM
<LI>
Ralph Swick, W3C
<LI>
Paul Topping, Design Science
<LI>
Chris Wilson, Microsoft
<LI>
Lauren Wood, SoftQuad (co-chair)
</UL>
<H3>
Miscellaneous
</H3>
<P>
The participants request that W3C make the W3C site searchable.
<H2>
<A NAME="Summary">Summary of Discussion</A>
</H2>
<H3>
<A NAME="RDF">RDF Requirements</A>
</H3>
<P>
The
<A HREF="http://www.w3.org/RDF/Group/1998/02/WD-rdf-syntax-19980216/#usage">Appendix
B</A> of <A href="#rdf-syntax">[RDF]</A> says:
<BLOCKQUOTE>
The recommended technique for embedding RDF statements in an HTML document
is simply to insert the RDF in-line. This will make the resulting document
non-conformant to HTML specifications up to and including HTML 4.0 but the
RDF Working Group hopes that the HTML specification will evolve to support
this.
</BLOCKQUOTE>
<P>
The discussion around the RDF requirements showed that possible solutions
for RDF included putting all the information into attributes; putting it
in an external file; and putting it at the end of the document. In general
the participants thought that putting information into attributes was safer
than putting it in an external file because of worries about security and
forcing tools to be able to cope with multiple files. Since many tools already
have to cope with multiple files, other participants thought this was not
a drawback where security was not an issue. Some participants thought that
putting the information in an external file would sometimes be a necessity,
so tools would have to learn to cope.
<H3>
<A NAME="MathML">MathML Requirements</A>
</H3>
<P>
MathML has many requirements. One of these is a system that can cope with
several small chunks of XML in one document, since a document may have many
small equations. It has extreme formatting requirements, only some of which
are shared by other objects. There was some discussion of MathML needs in
terms of the DOM and formatting properties. It must be possible to pass the
MathML as a chunk to an external renderer, and to format the XML in a
reasonable way. MathML does not include HTML elements within it; that option
was discussed within the MathML WG but rejected. The requirement that the
content of MathML should not show up in down-level browsers was not as strong
for MathML as for RDF, although some of the participants thought hiding it
would be best.
<H3>
<A NAME="Types">Types of HTML</A>
</H3>
<P>
There was definite agreement on introducing an XML block, where the contents
of the block are well-formed XML, without any HTML semantics. There was much
discussion about whether a reasonable method could be found to include
significant non-standard non-empty elements, and whether there was a
possibility of defining some sort of "good" HTML that people would use.
Reasons for not allowing HTML semantics in the XML block, even on elements
with the same element types as exist in HTML, included:
<OL>
<LI>
Browsers would need to expose their rendering model to other processors too
soon.
<LI>
Error-handling mechanisms differ between HTML and XML.
<LI>
All XML processors would need to process HTML, and users might expect that
processing to match current HTML browsers.
</OL>
<P>
There was also some support for doing an XML version of HTML, where all the
XML rules would apply.
<P>
The discussion about whether it was possible to require that the contents
of any non-standard elements be well-formed XML mostly came to the conclusion
that it wasn't; or that it would be extremely expensive for those users simply
wanting to add, e.g., a CHAPTER element to their pages. There was support
for the notion that there is a difference between adding XML to pages (where
the contents of the XML would be well-formed XML) and adding unknown elements
in a standard way to HTML (where the contents of the unknown element would
not follow XML well-formedness rules). Whether the HTML in an unknown HTML element
needed to be "good" HTML wasn't fully clarified at the meeting.
<P>
Another problem is that old browsers render XML processing instructions (PIs)
as visible text.
<H3>
<A NAME="General">General Requirements</A>
</H3>
<P>
During the discussion the following requirements were generally agreed upon.
<UL>
<LI>
A method in HTML to declare that a tag begins a block of XML
<LI>
A method in HTML to declare that an unknown tag is significant (versus the
default "ignored" case), and whether the tag is empty or not.
</UL>
<P>
The participants agreed on terminology: XML blocks, significant non-standard HTML elements
(sometimes also called sprinkles), and crud (or real-world HTML). But how
do we distinguish between XML blocks and significant elements? An XML block
contains XML -- not HTML. A significant element contains HTML -- not XML
(unless it's empty, of course; we have to be able to distinguish between
empty and non-empty).
<H3>
<A NAME="sol">Possible Solutions</A>
</H3>
<P>
The question of how to "sprinkle" non-standard elements in an HTML document
while retaining HTML semantics of all elements with HTML element and attribute
types devoured most of the meeting. We did not come to a final conclusion
on this subject. One proposed solution was to use new elements called CONTAINER
and LEAF, with the CLASS attribute used to show the type. The drawback is
that users can't define non-standard attributes. There was also much discussion
as to whether users would accept this sort of solution, or whether they would
want to invent their own element types. It was felt that this solution would
allow users to keep on using "real" HTML (a.k.a. crud) inside the wrapper
elements.
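<P>
As a rough sketch of the CONTAINER and LEAF proposal, a document might have
looked like the following. The class names <TT>chapter</TT> and
<TT>pagebreak</TT> are hypothetical, and treating LEAF as an empty element is
an assumption of this sketch; the meeting did not settle any syntax.
<PRE>
&lt;CONTAINER CLASS="chapter"&gt;
  &lt;H2&gt;Getting Started&lt;/H2&gt;
  &lt;P&gt;Ordinary HTML, tidy or not, stays usable inside the wrapper.
  &lt;LEAF CLASS="pagebreak"&gt;
  &lt;P&gt;The chapter continues after the break.
&lt;/CONTAINER&gt;
</PRE>
<P>
Here the type is carried entirely by CLASS; any further information would
need a non-standard attribute, which this approach does not provide for.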
<P>
Another proposal was to allow users to define their own wrapper elements.
If all elements within the block have end tags, even if they are EMPTY elements,
then this could be the way to extensible HTML (not XML). There were several
points against this, including the large number of non-standard EMPTY elements
that already exist. Many participants thought that defining browser behaviour
for this would be almost impossible, and that migrating HTML users to XML
with the HTML tagset was a better solution.
<P>
How to clean up HTML came up again and again in the discussions. The participants
agreed that it is impossible in the general case to create valid HTML from
an arbitrary page on the Web without human intervention. Users will not want
to risk breaking documents which function. Current HTML has three components:
the element type names, default rendering, and semantics (e.g. forms).
<P>
There was a strong contingent that said users should wait for XML tools to
become generally available and use those, rather than trying to add XML to
HTML.
<P>
The MathML group would like a mechanism to tell browsers a plain-text string
to render, if the equation can't be rendered. This sort of mechanism would
potentially be useful for other XML content with high rendering requirements
as well.
<P>
The biggest reason to come up with a standard method for adding XML (or unknown
HTML) to HTML is to allow people to use styles and the DOM with these elements.
Currently they can't. Browsers do not apply CSS styles to unknown elements,
and unknown container elements are not exposed as containers in the MSIE
object model. (The DOM WG decided not to tackle the problem, and only talks
about valid HTML 4.0 documents, and XML as a separate entity.)
<P>
A potential solution was to write HTML as XML, i.e. with MIME-type text/xml.
Then all the XML rules would apply. One problem with this is that some browsers
sniff the document irrespective of MIME-type and display the content if it
looks like HTML according to some heuristic<A HREF="#InetSDK">[InetSDK]</A>,
<A HREF="http://www.microsoft.com/msdn/sdk/inetsdk/help/itt/monikers/appendix_a.htm">Appendix
A</A>. This may include, for example, having a TITLE element anywhere within
the first 200 bytes of the document. Thus document providers may have to
add a comment long enough to defeat the heuristics.
<H2>
<A NAME="Conclusions">Conclusions and Future Work</A>
</H2>
<H3>
<A NAME="link">Including XML by Reference</A>
</H3>
<P>
The first option for using XML in HTML documents is to include it by reference,
using <TT>&lt;LINK&gt;</TT>, <TT>&lt;A&gt;</TT>,
<TT>&lt;OBJECT&gt;</TT> or perhaps even <TT>&lt;IMG&gt;</TT>. This markup
conforms to existing W3C Recommendations. This gives predictable behaviour
across the whole spectrum of HTML user agents, at the cost of managing and
accessing the compound document.
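<P>
A minimal sketch of inclusion by reference follows. The file names
<TT>meta.rdf</TT> and <TT>roots.xml</TT>, the link type <TT>meta</TT>, and the
use of <TT>text/xml</TT> as the media type are illustrative assumptions, not
choices made at the meeting.
<PRE>
&lt;HEAD&gt;
&lt;TITLE&gt;Quadratic Equations&lt;/TITLE&gt;
&lt;LINK rel="meta" type="text/xml" href="meta.rdf"&gt;
&lt;/HEAD&gt;
&lt;BODY&gt;
&lt;P&gt;The roots are
&lt;OBJECT data="roots.xml" type="text/xml"&gt;
x = (-b +/- sqrt(b^2 - 4ac)) / 2a
&lt;/OBJECT&gt;
&lt;P&gt;See also the &lt;A href="meta.rdf"&gt;description of this page&lt;/A&gt;.
&lt;/BODY&gt;
</PRE>
<P>
A user agent that cannot render the referenced object falls back to the
content of the OBJECT element, here a plain-text form of the equation.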
<H3>
<A NAME="attrs">Using Attributes to Hide New Idioms</A>
</H3>
<P>
Another option with predictable behaviour is to use tags and attributes only,
and avoid character data which will be displayed by deployed software. Strictly
speaking, documents enhanced this way do not conform to the HTML 2, 3.2,
or 4.0 specification, but each of those specifications included a note to
implementors to ignore unknown attributes.
<P>
The XML namespace facility<A HREF="#XML-Names">[XML-Names]</A> should be
used to manage the risk of name collisions for new attributes and elements.
Note that unfortunately, much of the deployed base of user agents will display
XML namespace declarations as text.
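<P>
For illustration, a document enhanced this way might carry its extra
information as follows. The prefix <TT>meta</TT> and the attribute names are
hypothetical, and the namespace declaration itself is left out because its
syntax is defined by <A HREF="#XML-Names">[XML-Names]</A>.
<PRE>
&lt;P meta:subject="travel"&gt;
Photo by the author:
&lt;IMG src="harbour.gif" alt="Harbour at dawn" meta:creator="A. Author"&gt;
</PRE>
<P>
Deployed user agents are expected to ignore the unknown attributes and to
display the paragraph and image as usual.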
<H3>
<A NAME="xml-block">Using <TT>&lt;XML&gt;</TT>, an HTML Enhancement</A>
</H3>
<P>
The linking and attributes mechanisms do not satisfy all of the requirements
presented at the meeting. It was agreed that an enhancement to HTML to accommodate
XML blocks is necessary.
<P>
The definition of an XML block is a chunk of well-formed XML that is inside
an HTML document. Any elements within the chunk that happen to have the same
element types as HTML elements are <EM>not</EM> considered to be HTML elements.
The error-handling as defined in the XML specification applies, i.e. the
processor <EM>must</EM> halt on well-formedness errors.
<P>
There were two proposals for this. (Other proposals that were discussed
turned out to be variations of these.)
<OL>
<LI>
using namespaces, which means the presence of a colon in an element type
implies that the contents are well-formed XML
<LI>
using a specific element type (the discussion centered around XML and XML-BLOCK
and eventually we settled for XML)
</OL>
<P>
Using a specific element type has the advantage that the meaning is clear,
and that attributes can be added to the element for such things as MIME-type
and a link to an external file containing the XML content.
<P>
For the XML block case, the group decided, by a vote of 10 for and 1 abstention
(none against), to use an element called XML. This must be added to a future
version of HTML. The attributes are TYPE for the MIME-type and SRC for the
URL of the content if it is in an external file. The contents of the XML
element are XML. There is an xml PI at the beginning of the XML block that
contains all other information that the XML block needs.
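<P>
The following sketch shows how the XML element might be used, first with
inline content and then with the SRC attribute. The element types inside the
block, the file name <TT>chapter2.xml</TT> and the TYPE value
<TT>text/xml</TT> are assumptions made for illustration; the details were
left to later drafting.
<PRE>
&lt;XML TYPE="text/xml"&gt;
&lt;?xml version="1.0"?&gt;
&lt;chapter&gt;
  &lt;title&gt;Getting Started&lt;/title&gt;
  &lt;para&gt;This content follows XML rules, not HTML rules.&lt;/para&gt;
&lt;/chapter&gt;
&lt;/XML&gt;

&lt;XML TYPE="text/xml" SRC="chapter2.xml"&gt;&lt;/XML&gt;
</PRE>
<P>
Note that the <TT>title</TT> element inside the block shares its name with an
HTML element type but is not treated as the HTML TITLE element.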
<H3>
<A NAME="script-hack">Using Script to Hide Content in Older Browsers</A>
</H3>
<P>
Interoperability with the 3.0 generation of browsers is required for successful
deployment of RDF, among other applications. This means that the XML block
is not a complete solution either.
<P>
There are a number of ways in which content can be made to not show up in
browsers that don't understand the element.
<OL>
<LI>
the XML could be in a separate file, linked to from the HTML document in
some way.
<LI>
the XML could be in the HEAD of the HTML document
<LI>
the DTD for the XML fragment could be written in such a way that all content
appears as attribute values
<LI>
the XML content could be put at the end of the document, which doesn't really
hide it, but this method does get the content out of the way of the main
document content.
</OL>
<P>
Of these, putting the content in the HEAD is the most problematic because
of the difficulties for HTML browsers of defining where the HEAD ends.
<P>
Any of these methods would be considered to not break HTML or XML, and the
participants decided that these should be written up (with the exception
of putting content in the HEAD) as the recommended methods for coping with
XML where the content should not show up in older browsers.
<P>
There are, of course, times when none of these methods are suitable for some
reason. The group therefore decided to also figure out which of the many
disliked methods was the least undesirable. The choices were:
<UL>
<LI>
putting the XML content inside a comment
<LI>
putting the XML content inside a SCRIPT element with the value of the LANGUAGE
attribute being "XML"
<LI>
putting the XML content inside an APPLET element
</UL>
<P>
The proposal to put the XML content inside an OBJECT element was quickly
rejected, as it would not work in Netscape Navigator 3.0.
<P>
The problem with APPLET is that if the user has applet loading turned off,
the content will show. The problem with SCRIPT is that it breaks the currently
defined content model of SCRIPT. There were also worries about whether future
XML users will use the SCRIPT element themselves, which would not be possible
if it were a reserved element. This concern wasn't shared by the entire group.
The problem with using comments is that comments are meant to not contain
parsed data, and users couldn't put another comment inside the XML content.
<P>
The vote (1 per company) was 1 for comments, 1 for APPLET, and 8 for SCRIPT.
<P>
Details of the XML block and SCRIPT mechanisms are the subject of a Working
Draft in progress.
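<P>
A sketch of the SCRIPT mechanism, under the assumption that the XML content
is simply placed between the SCRIPT tags (the element types and values inside
are hypothetical):
<PRE>
&lt;SCRIPT LANGUAGE="XML"&gt;
&lt;?xml version="1.0"?&gt;
&lt;description&gt;
  &lt;creator&gt;A. Author&lt;/creator&gt;
  &lt;subject&gt;travel&lt;/subject&gt;
&lt;/description&gt;
&lt;/SCRIPT&gt;
</PRE>
<P>
Browsers that support SCRIPT but do not recognise the language are expected
to skip the content rather than display it.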
<H3>
<A NAME="sprinkles">Using Namespaces, Stylesheets, and the DOM</A>
</H3>
<P>
The discussion of using XML markup in HTML documents such that it would be
"significant" to stylesheet and DOM implementations did not reach a clear
consensus.
<P>
We observed that XML can be modelled using the HTML 4.0 DIV, SPAN, and CLASS
markup, which are significant to stylesheet and DOM implementations. Some
experience with this style suggested the community would not embrace it,
but the discussion was not conclusive.
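<P>
For example, a structure that XML would express with hypothetical
<TT>chapter</TT>, <TT>title</TT> and <TT>term</TT> element types could be
modelled as below; the class names and the style rules are illustrative only.
<PRE>
&lt;STYLE type="text/css"&gt;
DIV.chapter { margin-left: 2em }
SPAN.term   { font-style: italic }
&lt;/STYLE&gt;

&lt;DIV class="chapter"&gt;
&lt;H2 class="title"&gt;Getting Started&lt;/H2&gt;
&lt;P&gt;A &lt;SPAN class="term"&gt;sprinkle&lt;/SPAN&gt; is a significant
non-standard element.
&lt;/DIV&gt;
</PRE>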
<P>
A proposal for a "sprinkles" mechanism is the subject of a Working Draft
in progress.
<H2>
<A NAME="References">References</A>
</H2>
<DL class="bib">
<DT>
<A NAME="rdf-syntax">[RDF]</A>
<DD>
<A HREF="http://www.w3.org/RDF/Group/1998/02/WD-rdf-syntax-19980216"><CITE>Resource
Description Framework (RDF) Model and Syntax</CITE></A><BR>
W3C Working Draft 16 Feb 1998<BR>
Ora Lassila, Ralph R. Swick, eds.
<DT>
<A NAME="XML">[XML]</A>
<DD>
<A HREF="http://www.w3.org/TR/1998/REC-xml-19980210"><CITE>Extensible Markup
Language (XML) 1.0</CITE></A><BR>
W3C Recommendation 10-February-1998<BR>
Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, eds.
<DT>
<A NAME="HTML4">[HTML4]</A>
<DD>
<CITE><A HREF="http://www.w3.org/TR/REC-html40-971218">HTML 4.0
Specification</A></CITE><BR>
W3C Recommendation 18-Dec-1997<BR>
Dave Raggett, Arnaud Le Hors, Ian Jacobs, eds.
<DT>
<A NAME="mathml">[MathML]</A>
<DD>
<A HREF="http://www.w3.org/TR/1998/REC-MathML-19980407"><CITE>Mathematical
Markup Language (MathML) 1.0 Specification</CITE></A><BR>
W3C Recommendation 07-April-1998<BR>
Patrick Ion, Robert Miner
<DT>
<A NAME="Dialects">[Dialects]</A>
<DD>
<CITE><A HREF="http://www.w3.org/pub/WWW/TR/WD-doctypes-960302">HTML Dialects:
Internet Media and SGML Document Types</A></CITE><BR>
W3C Working Draft 06-Mar-96<BR>
Daniel W. Connolly
<DT>
<A NAME="InetSDK">[InetSDK]</A>
<DD>
<A HREF="http://www.microsoft.com/msdn/sdk/inetsdk/help/"><CITE>Internet
Client SDK</CITE></A>, December 19, 1997, Microsoft Corporation
<DT>
<A NAME="XML-Names">[XML-Names]</A>
<DD>
<CITE><A HREF="http://www.w3.org/TR/1998/WD-xml-names-19980327">Namespaces
in XML</A></CITE>, W3C Working Draft 27-March-1998<BR>
Tim Bray, Dave Hollander, Andrew Layman
</DL>
</BODY></HTML>