<HTML>
<HEAD>
<META name="RCS-Id" content="$Id: 9810xn.html,v 1.19 1998/09/29 15:30:39 connolly Exp $">
<TITLE>The XML Revolution (draft for Nature's Web Site)</TITLE>
</HEAD>
<BODY>
<H1>
The XML Revolution
</H1>
<ADDRESS>
by <A HREF="#Dan">Dan Connolly</A><BR>
draft of $Date: 1998/09/29 15:30:39 $
</ADDRESS>
<P>
If you have ever peeked with the "view source" option on your Web browser,
then you're familiar with Hypertext Markup Language
(<A HREF="../../MarkUp/">HTML</A>).
<P>
HTML was an overwhelming success because it fulfilled a dream that word
processors, despite their myriad features, don't<A HREF="#WWW92">[WWW92]</A>:
<BLOCKQUOTE>
Pick up your pen, mouse or favorite pointing device and press it on a reference
in this document - perhaps to the author's name, or organization, or some
related work. Suppose you are directly presented with the background material
- other papers, the author's coordinates, the organization's address and
its entire telephone directory. Suppose each of these documents has the same
property of being linked to other original documents all over the world.
You would have at your fingertips all you need to know about electronic
publishing, high-energy physics or for that matter Asian culture. If you
are reading this article on paper, you can only dream, but read on.
</BLOCKQUOTE>
<P>
Now that dream is a reality, and human communication is augmented by the
Web; that is, as long as the communication consists of a title, headings,
paragraphs, lists, tables, and forms.
<P>
What about all the other communications idioms and document types that we
routinely use to get our work, business, and play done?
<UL>
<LI>
Restaurant menus
<LI>
Theatre programs
<LI>
Meeting minutes with agenda items and actions
<LI>
Cheques, invoices, and purchase orders
<LI>
Calendars and project schedules
</UL>
<P>
Extensible Markup Language (<A HREF="../../XML/">XML</A>) is the evolutionary
successor to HTML, in "less is more" fashion. If you're thinking that XML
is all the stuff from HTML plus a few more things, think again. It's the
same pointy brackets, tags, and attributes; but when it comes to tag names,
the slate is wiped clean. XML is like HTML with the training wheels off.
<P>
Of course, you can imitate menus, programs and schedules with HTML, or you
can put pictures or facsimiles of their traditional printed form on the Web.
That's great because it allows you to share them with people all over the
planet instantly. But it doesn't invite the computer to help you manage them.
<P>
The bane of my existence is doing things that I know the computer could do
for me.
<P>
If the Web page with your personal calendar says you'll be in New York next
Thursday, and the page with your workgroup calendar says you'll be in London
all week, shouldn't the computer be able to warn you about the conflict?
And shouldn't it go ahead and ask you if it's OK to cancel your flight to
London and purchase this other ticket to New York?
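<P>
A minimal sketch of how the two calendar entries might look if each were
marked up in XML. The element names, attribute names, and dates here are
invented for illustration and do not come from any published vocabulary:
<PRE>
&lt;!-- personal calendar (hypothetical markup) --&gt;
&lt;appointment date="1998-10-08"&gt;
  &lt;place&gt;New York&lt;/place&gt;
&lt;/appointment&gt;

&lt;!-- workgroup calendar (hypothetical markup) --&gt;
&lt;trip start="1998-10-05" end="1998-10-09"&gt;
  &lt;place&gt;London&lt;/place&gt;
&lt;/trip&gt;
</PRE>
<P>
With dates and places labelled this way, a program would not have to guess
which words on the page are which; it could compare the two entries and notice
that the Thursday falls inside the London week.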
<P>
As a medium for human communication, the Web has reached critical mass (I
won't go so far as to say it's mature--there's plenty of work to be done!),
but as a mechanism to exploit the power of computing in our everyday life,
the Web is in its infancy. The Web now allows us to communicate our problems
to one another faster than ever before, but does it really help us solve
them?
<P>
XML is so simple that it just might work: it just might revolutionize the
ability of people to conduct commerce, express themselves, and generally
get work done with computers and networks.
<P>
Web site designers are doing some amazing things, but they often re-invent
the wheel for any number of reasons. Order processing systems make a good
example: some web design shop, say <TT>mall.com</TT>, built one shopping-cart
system, but <TT>mousetraps.com</TT> can't use it, because
<UL>
<LI>
their infrastructure is Windows NT, and the <TT>mall.com</TT> system is based
on Unix, or
<LI>
the two shops use different languages, Perl vs. Java, or perhaps
<LI>
the <TT>mousetraps.com</TT> folks were just too busy to discover that
<TT>mall.com</TT> had solved the problem, or
<LI>
the <TT>mall.com</TT> system is aimed at a million transactions per day and
requires thousands of dollars worth of hardware and software, while the
<TT>mousetraps.com</TT> folks only expect a few orders a week and can only
afford a few hundred dollars, or
<LI>
<TT>mall.com</TT> doesn't care to share its technology with the community
either because
<UL>
<LI>
they don't want to lose a competitive advantage or
<LI>
they don't want to take on a support burden.
</UL>
</UL>
<P>
For all these reasons, it takes longer to develop effective web sites than
it should, and the community is looking for opportunities to share technologies
and resources.
<P>
At the lowest level, organizations like The World Wide Web Consortium
(<A HREF="http://www.w3.org/">W3C</A>), The Internet Engineering Task Force
(<A HREF="http://www.ietf.org/">IETF</A>) and The Object Management Group
(<A HREF="http://www.omg.org/">OMG</A>) are engaged in updating the transport
infrastructure, <A HREF="../../Protocols/">HTTP</A>, firstly to address some
of the design shortcomings that five years of experience has exposed, and secondly
to better integrate with modern software development. At the next level,
the software development community is pushing the Web down into the
infrastructure of languages and operating systems such as Perl, Java, and
Microsoft Windows. The goal of all this low-level work is that it "just works,"
like a light switch or a telephone.
<P>
But there's a twist: along with shipping your pages around, the computing
infrastructure should take every opportunity to read, understand, and act
on them. There's no reason to live with the status
quo<A HREF="#bosak97">[Bosak97]</A>:
<BLOCKQUOTE>
Hospitals have begun to offer the [home health care] agencies a solution
that goes something like this:
<OL>
<LI>
Log into the hospital's Web site.
<LI>
Become an authorized user.
<LI>
Access the patient's medical records using a Web browser.
<LI>
Print out the records from the browser.
<LI>
Manually key in the data from the printouts.
</OL>
<P>
The knowledgeable reader may smile at this "solution," but in fact this is
not a joke; this is an actual proposal from a large American hospital known
for its early adoption of advanced medical information systems.
</BLOCKQUOTE>
<P>
<EM>Manually key in the data</EM>? Can't the two systems be made to talk
to each other? Never mind the multibillion-dollar medical industry; how often
do you get a computer-generated bill, invoice, or airline ticket, and then
manually key the information into your computer to manage your schedule or
finances? Is this the best we can do? Not if the XML revolution succeeds.
<P>
Today, several major Web search services build big indexes. These are incredibly
useful, but they're also limited: they don't know the difference between
a book <EM>by</EM> Ben Franklin and a book <EM>about</EM> Ben Franklin, let
alone the difference between an African beetle and a Volkswagen Beetle.
<P>
The search services <EM>do</EM> know which part of your page is the title,
because the <TT>&lt;title&gt;</TT> tag in the HTML markup tells them. Why
not just add <TT>&lt;by&gt;</TT> and <TT>&lt;about&gt;</TT> and
<TT>&lt;genus&gt;</TT> and such tags to HTML? Because...
<UL>
<LI>
technically, it would produce a mess: HTML is hard enough to process now,
and if we make it harder, we reduce the chance that new tools will come along
and make the Web smarter.
<LI>
socially, it wouldn't work: the HTML specification is maintained by a small
group of experts who are trusted to Do The Right Thing on behalf of the
community; that small group doesn't have expertise in all subjects that may
be covered by Web pages, and if we added that expertise to the group, it
would be too large to function. It is much better to give everyone a tool
that they can easily adapt for their own particular needs.
</UL>
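<P>
XML lets each community define such tags for itself instead. As a rough sketch,
with tag names that are made up for illustration rather than taken from any
existing standard, a bookseller's catalog might distinguish the two kinds of
book like this:
<PRE>
&lt;book&gt;
  &lt;title&gt;Autobiography&lt;/title&gt;
  &lt;by&gt;Ben Franklin&lt;/by&gt;
&lt;/book&gt;

&lt;!-- placeholder title, for illustration only --&gt;
&lt;book&gt;
  &lt;title&gt;A Life of Franklin&lt;/title&gt;
  &lt;about&gt;Ben Franklin&lt;/about&gt;
&lt;/book&gt;
</PRE>
<P>
A search service that understood this vocabulary could answer a query for books
<EM>by</EM> Franklin without also returning books <EM>about</EM> him.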
<P>
HTML was a critical first step, but it is, by design, a one-size-fits-all
solution; it works well when applied to its original domain of simple structured
documents with links, but doesn't work so well in all the other domains where
people want the Web to apply.
<P>
XML, like the Internet and the Web, is designed to facilitate a marketplace
of competing companies, innovative individuals, and organizations of all
sizes in between. <A HREF="http://www.w3.org/">W3C</A> is a consortium of
270+ member organizations committed to the growth of this marketplace, ensuring
interoperability and smooth evolution.
<P>
This decentralized marketplace is already at work: to automate exchange of
bills, statements, and payments, the banking and software heavyweights are
working on Open Financial Exchange
(<A HREF="http://www.oasis-open.org/cover/gen-apps.html#ofe#xml-ofe">OFX</A>);
meanwhile, to automate exchange of information about chemicals, their properties,
uses and suppliers, one researcher in Nottingham, Peter Murray-Rust, rolled
up his sleeves, and Chemical Markup Language
(<A HREF="http://www.oasis-open.org/cover/gen-apps.html#cml">CML</A>) was
born.
<P>
XML is intended to span this wide spectrum of application, and it has become
a strategic technology in W3C, where members are sharing resources to complement
HTML with XML-based technologies:
<UL>
<LI>
<A HREF="../../Math/">MathML</A>, for describing mathematics as a basis for
machine-to-machine communication.
<LI>
<A HREF="../../AudioVideo/#SMIL">SMIL</A>, for expressing media synchronization.
<LI>
<A HREF="../../RDF/">RDF</A>, for resource description, such as library-style
cataloging.
<LI>
<A HREF="../../P3P/">P3P</A>, which uses XML and RDF so that users can be informed,
stay in control, and make decisions based on their individual privacy preferences.
</UL>
<P>
XML by itself is just a simple text format; but together with all the ways
it's being used to share structured information, it's a revolution that promises
to make the Web a whole lot smarter.
<P>
<HR>
<H2>
<A name="r234lk">References</A>
</H2>
<DL>
<DT>
<A NAME="WWW92">[WWW92]</A>
<DD>
<A HREF="http://www.w3.org/History/1992/ENRAP/Article_9202.ps"><CITE>World-Wide
Web: The Information Universe</CITE></A><BR>
Berners-Lee, T., et al., (1992), Electronic Networking: Research, Applications
and Policy, Vol 1 No 2, Meckler, Westport CT, Spring 1992
<DT>
<A NAME="bosak97">[Bosak97]</A>
<DD>
<A HREF="http://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.htm"><CITE>XML,
Java, and the future of the Web</CITE></A>
<DD>
Jon Bosak, Sun Microsystems
</DL>
<P>
<HR>
<P>
<EM><A HREF="./" NAME="Dan">Dan Connolly</A> is the leader of the
<A HREF="../../Architecture">W3C Architecture Domain</A>. He began contributing
to the World Wide Web project, and in particular, the HTML specification,
while developing hypertext production and delivery software in 1992.</EM>
<P>
<EM>He presented a draft of <A HREF="../../MarkUp/html-spec">HTML 2.0</A>
at the <A HREF="http://www.cern.ch/WWW94/">first Web Conference</A> in 1994
in Geneva, and served as editor until it became a Proposed Standard RFC in
November 1995.</EM>
<P>
<EM>He was the chair of the W3C Working Group that produced HTML 3.2 and
HTML 4.0, and collaborated with Jon Bosak to form the W3C
<A HREF="../../XML/">XML</A> Working Group and produce the W3C XML 1.0
Recommendation.</EM>
<P>
<EM>Dan received a B.S. in Computer Science from the
<A HREF="http://www.utexas.edu/">University of Texas at Austin</A> in 1990.
His research interest is investigating the value of formal descriptions of
chaotic systems like the Web, especially in the consensus-building
process.</EM>
<P>
</BODY></HTML>