<!doctype html public "-//W3C//DTD HTML 1997-05-18//EN"
"html.dtd">
<HTML>
<HEAD>
<TITLE>XML Hacking is Fun!</TITLE>
</HEAD>
<BODY>
<P>
<A href="../../"><IMG src="../../Icons/WWW/w3c_home" ALT="W3C"></A> |
<A HREF="../../Architecture/">Architecture</A> |
<A HREF="../../MarkUp/SGML/">XML</A>
<H1>
XML Hacking is Fun!
</H1>
<ADDRESS>
<A HREF="../../People/Connolly/">Dan Connolly</A><BR>
Created: Mon May 12 16:06:27 CDT 1997<BR>
$Id: hacking.html,v 1.5 1998/04/29 03:20:20 connolly Exp $
</ADDRESS>
<P>
For me, XML puts the fun back into web hacking. I wrote three XML parsers
last weekend. Great stress relief!
<P>
See also: <A href="../notes.html">some more notes on XML implementation
experience</A>, mostly by Bert Bos.
<HR>
<DL>
<DT>
<A href="xml.py">xml.py</A>
<DD>
<A href="http://www.python.org">python</A> module for XML.</></>
<DT>
<A href="xml-check.pl">xml-check.pl</A>
<DD>
quick and dirty XML well-formedness checker in perl. Got bored with this
and moved on to python after a bit.</></>
</DL>
<H2>
Converting XML to Lout
</H2>
<DL>
<DT>
<A href="loutwr.py">loutwr</A>
<DD>
lexical details of writing lout format
<DT>
<A href="xml2lout.py">xml2lout</A>
<DD>
rules/stack-based conversion to lout
<DT>
<A href="html2lout.py">html2lout</A>
<DD>
add some rules for HTML
<DT>
<A href="report2lout.py">report2lout</A>
<DD>
add some rules for a latex/lout-like
<A href="../../MarkUp/9705/report.dtd">report DTD</A> on top of html
</DL>
<H2>
XML Typing notes
</H2>
<P>
XML document types should evolve gracefully. Technically, format negotiation
is a solution to deployment of revised data formats, but it did not meet
the market constraints (i.e. it wasn't cost-effective for the involved parties)
in the case of HTML forms, tables and foreign payload (scripts and stylesheets).
<P>
I'm investigating ways to express the MIME multipart alternative concept
at the element level in XML. This allows new features in XML documents to
be deployed like color over the b/w TV signal. It allows the new and the
old semantics to be expressed in the same file, which cuts down the cost
of managing the data (copy, rename, verify, datestamp, inodes, ...) and caching
it.
<P>
My intuition says that we can borrow the inheritance and subtyping ideas
from <A href="../../OOP/">OOP</A> to model a form of type negotiation for
XML.
<P>
<DL>
<DT>
Akpotsui, Extase K. A; Quint, Vincent; Roisin, Cécile.
<A HREF="ftp://ftp.inrialpes.fr/pub/opera/publications/MCM97.ps.gz"><CITE>Type
Modelling for Document Transformation in Structured Editing
Systems</CITE></A>. Mathematical and Computer Modelling 25/4 (February 1997)
1-19 (with 26 references). Authors' affiliation: INRIA/Project Opéra.
<DD>
Abstract:
<BLOCKQUOTE>
This paper addresses the problem of type transformation in structured editing
systems and proposes a type description model convenient for type comparison
and document conversion. Two kinds of transformations are considered: dynamic
transformations allow a structured editor to change the structure of a part
of a document when the part is copied or moved, and static transformations
allow specific tools to restructure documents when their generic structure
is modified. We present in this paper the current state of our research on
formal analysis for these transformations.
</BLOCKQUOTE>
</DL>
<P>
Cut/paste issues. Shows that DTD's are not just regexps: & ? are novel.
<P>
Also shows that separating element names from element types is essential
for some kinds of modelling. I suspect DTD's should be extended to allow
this (well... replaced with something that expresses this.) For example,
allow XPTR style selectors rather than just namegroups in element declarations:
<PRE>
<!element (parent1 child) ANY>
<!element (parent2 child) (x|y|z)>
</PRE>
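<P>
A rough sketch of what such context-sensitive declarations mean for a checker:
the element type is looked up by a (parent, name) pair rather than by name
alone. The table layout and the function name are made up for illustration,
and the sketch only tests which children are permitted, not the full content
model.

```python
# Hypothetical declaration table: keys are (parent, name) selectors,
# values are the permitted child element names, or None for ANY.
DECLARATIONS = {
    ("parent1", "child"): None,              # <!element (parent1 child) ANY>
    ("parent2", "child"): {"x", "y", "z"},   # <!element (parent2 child) (x|y|z)>
}

def permits(parent, name, child):
    """Is `child` allowed inside element `name` when `name` occurs in `parent`?"""
    allowed = DECLARATIONS[(parent, name)]
    return allowed is None or child in allowed

print(permits("parent1", "child", "anything"))
print(permits("parent2", "child", "x"))
print(permits("parent2", "child", "q"))
```

The same element name gets two different types depending on where it occurs,
which is the separation of names from types discussed above.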
<P>
@@don't use class, just make up new elements and use containment!
<H2>
XML Modules
</H2>
<P>
About namespaces in DTDs... how about:
<PRE>
<![ module-name [
<!entity module-name "IGNORE">
... module contents ...
]]>
</PRE>
<P>
which is just like:
<PRE>
#ifndef _module_h
#define _module_h
... module contents ...
#endif /* _module_h */
</PRE>
<P>
I made a <A href="fix-sgml.el">patch to psgml mode</A> to allow me to use
this syntax.
<P>
You still have to have a partial order on your modules. And it's still just
one big namespace. So it's just like C -- which is good enough for lots of
things, but not for truly independent development.
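<P>
To make the two limitations concrete, here is a small sketch of guarded
inclusion. The data layout (a dict mapping module names to dependencies and
declarations) and the function name are assumptions made for the example, not
part of the psgml patch. Rejecting cycles is my way of showing the partial-order
requirement; a plain include guard would just skip the repeated module.

```python
def load_modules(root, modules):
    """Include `root` and its dependencies once each, into one shared namespace.

    `modules` maps a module name to (dependencies, declarations).
    """
    included = set()   # plays the role of the guard entities
    namespace = {}     # one flat namespace shared by every module
    order = []         # order in which module bodies were processed

    def include(name, active=()):
        if name in included:
            return                     # guard already set: contents ignored
        if name in active:
            raise ValueError(f"circular dependency through {name!r}")
        deps, decls = modules[name]
        for dep in deps:
            include(dep, active + (name,))
        included.add(name)
        order.append(name)
        for key, value in decls.items():
            # Assumption: the first declaration wins, as XML does for entities.
            namespace.setdefault(key, value)

    include(root)
    return order, namespace

modules = {
    "base": ([], {"text": "#PCDATA"}),
    "lists": (["base"], {"item": "text"}),
    "doc": (["base", "lists"], {"text": "OVERRIDE", "body": "item"}),
}
order, namespace = load_modules("doc", modules)
print(order)
print(namespace["text"])
```

Note that "doc" cannot redefine "text" for itself: whoever declares a name first
owns it everywhere.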
<H2>
Marked Sections, and Here Documents, and Archives
</H2>
<P>
Is an unescaped > allowed in XML content? (9711 spec says yes.)
<P>
HTML 2.0 spec discouraged it in order to avoid ]]> showing up in documents,
which is an error in SGML'86.
<P>
XML of 9711 has the same misfeature, but it's marked "for compatibility".
<P>
Marked sections can't contain ]]>
<P>
What's the purpose of a marked section, anyway? If it's just to be able to
put XML inside XML without lots of tedious escaping, then the above limitation
isn't a showstopper.
<P>
But it seems to me that the purpose is to be able to include foreign data
like SCRIPT and STYLE, in which case this limitation is really painful.
<P>
Based on shell/perl HERE documents and MIME multipart syntax, I suggest the
following:
<PRE>
<![myStringHere[ ... ]myStringHere]>
</PRE>
<P>
which allows ... to contain ANY sequence of characters. Any sequence of bytes,
actually! This solves the script/style problem, plus gives XML the potential
to replace tar, zip, etc. in the same way that HERE documents facilitate
shar archives. (But Just Say No to Turing-complete archive formats.)
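<P>
A minimal sketch of how the labelled syntax might be read and written. It
works on Python strings rather than raw bytes, and the function names and the
rule for choosing a safe label are assumptions of the sketch, not part of the
proposal.

```python
import re

# <![label[ payload ]label]> : the closing delimiter repeats the opening
# label, so the payload may contain anything except the exact sequence
# "]label]>".
SECTION = re.compile(r"<!\[([A-Za-z_][\w.-]*)\[(.*?)\]\1\]>", re.DOTALL)

def extract_sections(text):
    """Return (label, payload) pairs for each labelled section in text."""
    return [(m.group(1), m.group(2)) for m in SECTION.finditer(text)]

def wrap(payload, label="EOF"):
    """Wrap payload in a labelled section, changing the label until it is safe."""
    n = 0
    candidate = label
    while f"]{candidate}]>" in payload:
        n += 1
        candidate = f"{label}{n}"
    return f"<![{candidate}[{payload}]{candidate}]>"

script = "if (a[i]]> b) { s = '<![CDATA[ ]]>'; }"
wrapped = wrap(script, "js")
print(wrapped)
print(extract_sections(wrapped) == [("js", script)])
```

As with MIME boundaries, the writer picks a label that does not occur in the
payload, so no escaping of the payload is ever needed.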
<H2>
Empty End Tags
</H2>
<P>
I've implemented support for:
<PRE>
<foo> ... </>
</PRE>
<P>
The implementation cost is trivial. The deployment cost is the risk that
folks will expect legacy HTML elements to work this way:
<PRE>
<blockquote> ... </>
</PRE>
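<P>
To show why the implementation cost is small, here is a sketch (not the code
in xml.py) that expands each empty end tag using a stack of open element
names. It assumes tags without attributes and ignores comments, processing
instructions and empty-element tags; the function name is made up.

```python
import re

# Matches a start tag like <foo>, a named end tag like </foo>, or the empty
# end tag </>. Tags with attributes are not matched and pass through as text.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)?\s*>")

def expand_empty_end_tags(text):
    """Rewrite every </> as the named end tag of the innermost open element."""
    stack = []   # names of the currently open elements
    out = []     # output fragments
    pos = 0
    for m in TAG.finditer(text):
        out.append(text[pos:m.start()])
        pos = m.end()
        closing, name = m.group(1), m.group(2)
        if not closing:
            if name is None:
                raise ValueError("empty start tag <> is not supported")
            stack.append(name)
            out.append(m.group(0))
        else:
            if not stack:
                raise ValueError("end tag with no open element")
            open_name = stack.pop()
            if name is not None and name != open_name:
                raise ValueError(f"</{name}> does not match <{open_name}>")
            out.append(f"</{open_name}>")
    if stack:
        raise ValueError(f"unclosed element <{stack[-1]}>")
    out.append(text[pos:])
    return "".join(out)

print(expand_empty_end_tags("<foo> ... </>"))
```

The stack is only reliable when every element is closed explicitly, which is
why legacy HTML elements with omitted end tags are the deployment risk.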
<H2>
Attribute Value Syntax
</H2>
<P>
???
<H2>
Character Entities
</H2>
<P>
Bad idea. General entities are very powerful, and all we need is a way to
escape three characters (maybe two).
<P>
Other characters should be done with "replaced elements" with fallback inside,
e.g.:
<PRE>
<emdash>---</>
</PRE>
<P>
Going to Unicode is probably cost-effective in the long term, but the documents
don't degrade gracefully.
<H2>
Convenience Entities: macros and includes
</H2>
<P>
These are obviated by linking. The idiom:
<PRE>
<!doctype html public "-//IETF//DTD HTML//EN" [
<!entity product-name "Gee Whiz&tm;">
<!entity legal system "legal.html">
]>
... &product-name;
...
&legal;
</PRE>
<P>
can be done ala:
<PRE>
<!doctype html system "http://www.w3.org/9705/html.dtd">
<div style="display: none">
<span id=product-name>Gee Whiz&tm;</span>
</div>
... <a href="#product-name" xml-link=replace>Gee Whiz&tm;</>
<a href="legal.html" xml-link=replace>Copyright (c) 1997 by US</a>
</PRE>
<P>
The a's could be left empty. But for the benefit of downlevel clients, you
can (by machine) propagate the destination of the link (or a part of it)
to the source.
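<P>
A sketch of the "by machine" step, using Python's standard
xml.etree.ElementTree. It assumes well-formed input with quoted attributes,
handles only same-document links of the form #id, and copies only the target's
text; the function name is made up.

```python
import xml.etree.ElementTree as ET

def propagate_link_targets(root):
    """Copy the text of same-document link targets into empty replace-links."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    for a in root.iter("a"):
        href = a.get("href", "")
        if a.get("xml-link") != "replace" or not href.startswith("#"):
            continue                    # external targets are left alone
        target = by_id.get(href[1:])
        if target is not None and not (a.text or "").strip():
            a.text = target.text        # fallback text for downlevel clients
    return root

doc = ET.fromstring(
    '<div><span id="product-name">Gee Whiz</span>'
    '<p>Buy <a href="#product-name" xml-link="replace"/> today</p></div>'
)
propagate_link_targets(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Links that already carry fallback text are not overwritten, so the step can be
rerun safely.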
<H2>
Parameter Entities
</H2>
<DL>
<DT>
.cm
<DD>
content model. Fully parenthesized. Can be used anywhere a gi can be used.
<DT>
.orList
<DD>
union expression. orLists can be concatenated. @#hmmm.. namegroup?
<DT>
.valType
<DD>
attribute value type, e.g. CDATA with overloaded semantics
<DT>
.tagType
<DD>
list of attribute declarations, ala a list of methods, i.e. an object type
<DT>
.dtd
<DD>
link to another entity in DTD syntax
</DL>
<H2>
DT and DD
</H2>
<P>
I want DT/DD to be able to format ala:
<PRE>
term definition
definition def d
efiintion
</PRE>
<P>
so I changed the content models of dt and dd so that dd is contained within
dt.
<H2>
Testing Notes
</H2>
<P>
@@link to MIX.
<PRE>
ok3: uses internal declaration subset. Boo.
note that this is a perfect example of how
entities are redundant with respect to linking
ok3a: @@ WF client should check for data outside root element
torture:
whacked internal declaration out
removed references to other entities
#@@ is an unescaped > allowed in xml? what about ]]>?
is ]]> a reportable error? well-formedness error? validity error?
This doesn't match:
<p>PI with markup: <?Myparser &lt;p> or <p> --
which?></p>
</PRE>
</BODY></HTML>