<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<title>
The stack of specifications - Design Issues
</title>
<link rel="Stylesheet" href="di.css" type="text/css" />
<meta http-equiv="Content-Type" content="text/html" />
</head>
<body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">
<address>
Tim Berners-Lee<br />
Date: 2002/05, last change: $Date: 2003/01/06
19:40:09 $<br />
Status: personal view only. Editing status: rough.
</address>
<p>
<em>Abstract: This is a backgrounder explaining where the web
specifications fit into the internet technology as a whole.
It explains the philosophy of electronic communications
having well-defined meaning grounded in a stack of
interconnected specifications. This is all normally -- and
quite justifiably -- taken for granted by Web engineers. But
it needs to be emphasised when the Internet is abused,
for example by spammers who forge email headers, or companies
who cheat protocol timeouts in order to claim greater
performance, and in doing so, break the system. This article
debunks the idea that "it's OK to interpret things this way as
more and more people are doing it".</em>
</p>
<p>
<em>It was originally the subject of a keynote address at the
International World Wide Web Conference in Hawai'i, April
2002.</em>
</p>
<p>
<a href="./">Up to Design Issues</a>
</p>
<hr />
<h1>
The Stack of Specifications
</h1>
<p>
Bits mean something.
</p>
<p>
When you connect a cat-5 ethernet cable to your computer, you
effectively commit to taking part, with your computer, in a
very special system. It is a system in which the meaning of
messages is determined, in advance, by specifications. This
is a principle which is so basic to network computer systems
that it is rarely stated. But as the stack of specifications
gets higher and higher, and as electronic commerce, legally
enforceable agreements, and socially sensitive issues such as
privacy and fraud become matters of public concern, it is
worth reiterating for the record.
</p>
<p>
The Internet works because of interoperability between
different computers, despite different hardware, operating
systems, local language context, and software supplier. Users
of the web sign on to the use of these specifications, and of
the languages they define, when they use the Internet.
</p>
<p>
There is this little philosophy joining many specifications,
without which the Web falls apart.
</p>
<p>
Let's take an example.
</p>
<h3>
You have an ethernet cable
</h3>
<p>
You walk into a meeting room, and you are offered a thin
cat-5 cable with a 10BASE-T connector. This is an Ethernet
connector which only takes Ethernet packets. The only way to
use it to communicate is for your computer to send packets
which are formatted to the Ethernet specification. The
Ethernet specification is a large document (similar to
<strong>IEEE standard 802.3</strong>) put together by a bunch
of engineers, and once they were done, Ethernet existed as a
standard, and computers which know nothing about each other
could exchange packets over local area networks.
</p>
<p>
The Ethernet specification defines the format of an Ethernet
packet, which has a little header information, but mostly
carries information on behalf of you, the user. The spec also,
importantly, defines some rules of behaviour. For example,
the Ethernet doesn't work if more than one computer tries to
transmit at once. There is a rule that if you find that
happens, everyone involved backs off and comes back at a
random interval. Each computer is supposed to wait on average
the same amount of time before trying again. Of course, you
could cheat by pretending that your random number
happened to be really small every time, and on average your
computer would end up getting through more and blocking
everyone else out, just like people who always seem to be the
one talking in a meeting. But that would be cheating, and
contrary to the Ethernet specification. By connecting to an
Ethernet cable, there is an understanding that your computer
will stick to the rules.
</p>
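<p>
The back-off rule can be sketched in a few lines of Python.
This is a minimal sketch assuming the classic half-duplex IEEE
802.3 parameters: truncated binary exponential backoff, in which
after the n-th collision a station waits a random whole number of
slot times between 0 and 2 to the power min(n, 10), minus one, and
abandons the frame after 16 collisions. The function names are
illustrative and not part of any library; the "cheating" function
shows the misbehaviour described above.
</p>
<pre>
```python
import random

SLOT_TIME_US = 51.2   # one slot = 512 bit times, i.e. 51.2 microseconds at 10 Mb/s
BACKOFF_LIMIT = 10    # the random range stops growing after 10 collisions
ATTEMPT_LIMIT = 16    # after 16 collisions the frame is abandoned


def backoff_slots(collisions, rng=random):
    """Slot times an honest station waits after its n-th collision on one frame."""
    if not collisions >= 1:
        raise ValueError("backoff only applies after at least one collision")
    if collisions >= ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions: frame abandoned")
    k = min(collisions, BACKOFF_LIMIT)
    # Uniform choice over 0 .. 2**k - 1 (randint includes both end points).
    return rng.randint(0, 2 ** k - 1)


def cheating_backoff_slots(collisions, rng=random):
    """A station that breaks the rule: it always 'happens' to draw zero."""
    return 0


def average_wait_us(strategy, collisions, trials=10_000, seed=0):
    """Mean back-off delay in microseconds for a given strategy."""
    rng = random.Random(seed)
    total = sum(strategy(collisions, rng) for _ in range(trials))
    return total / trials * SLOT_TIME_US
```
</pre>
<p>
After a third collision an honest station waits about 3.5 slots
on average (roughly 180 microseconds at 10 Mb/s), while the
cheating station never waits at all, which is why it would win
most of the contests for the cable.
</p>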
<p>
An ethernet packet can be sent to anyone on the same wired or
wireless local area network. How does a computer know what to
do with a packet when it gets it? How does it know how to
interpret that packet? Well, there is a field in the packet
which tells it, in a coded way, what the use of the packet
is, and therefore how to interpret it.
</p>
<p>
Of course, there are lots of uses of the Ethernet, but a very
common use of an Ethernet packet is to use it to carry an
Internet packet. Ethernet packets can only cross the local
area network, while Internet packets are forwarded anywhere
in the world. So, there is a particular code - a particular
value for the field in the Ethernet packet - which tells any
receiving computer that the data is actually an Internet
Packet. This means that to understand anything more about what
the packet means, you have to read another spec: the
<strong>Internet Protocol (IP, RFC791).</strong>
</p>
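<p>
Here is a sketch of how a receiving computer might read that
field. It assumes a DIX (Ethernet II) frame whose preamble and
frame check sequence have already been removed, and uses a tiny
hand-copied excerpt of the EtherType registry; the function name
is illustrative. Values of 0x0600 (1536) and above are EtherTypes,
while values of 1500 or less give the length of an IEEE 802.3
frame, whose 802.2 LLC header then identifies the protocol.
</p>
<pre>
```python
import struct

# A small excerpt of the EtherType registry (the full list is maintained by the IEEE).
ETHERTYPES = {
    0x0800: "Internet Protocol, version 4 (RFC 791)",
    0x0806: "Address Resolution Protocol (RFC 826)",
    0x86DD: "Internet Protocol, version 6",
}


def parse_ethernet(frame):
    """Return (ethertype, meaning, payload) for a DIX / Ethernet II frame."""
    # Header: 6-byte destination address, 6-byte source address, 2-byte type,
    # all in network (big-endian) byte order. Raises struct.error if too short.
    _dst, _src, ethertype = struct.unpack_from("!6s6sH", frame)
    if ethertype >= 0x0600:
        meaning = ETHERTYPES.get(ethertype, "unregistered EtherType")
    else:
        meaning = "IEEE 802.3 length field: read the 802.2 LLC header instead"
    return ethertype, meaning, frame[14:]


# Broadcast destination, made-up source address, type 0x0800, then the IP packet.
frame = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"...IP packet..."
ethertype, meaning, payload = parse_ethernet(frame)
print(hex(ethertype), meaning)
```
</pre>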
<p>
@@@ The complete graph of interdependencies between
specifications.
</p>
<h3>
You send an Internet packet
</h3>
<p>
So suppose you send an Internet packet. You put the Ethernet
address of the local "router" computer into the Ethernet
address field, but within the "data" part of the Ethernet
packet is the IP packet, and inside that is an Internet
address field, which takes the IP address (a number like
18.96.237.175) which identifies the destination computer.
Although the Ethernet packet you send it in only gets as far
as a computer on the local net (the "router"), that computer
passes the IP contents on, from computer to computer across
interconnected networks, until it arrives on the right local
network for its actual destination.
</p>
<p>
So how does that computer know what to do with it? Well,
there is a field in the IP packet, the Protocol field, which
carries a coded value to tell the computer receiving it what
to do with it.
</p>
<pre>
From Internet Protocol (RFC791):
A summary of the contents of the internet header follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    <strong>Protocol</strong>   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Example Internet Datagram Header

                               Figure 4.
</pre>
<p>
And there are a lot of things you can do with an IP packet,
but a very common one is to use that IP packet to set up, or
to be a part of, a reliable stream of communication using the
<strong>Transmission Control Protocol (TCP) (RFC
793).</strong>
</p>
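<p>
The same move can be sketched for the IP header in the figure
above: read the Protocol field and look it up. This minimal Python
sketch assumes an IPv4 packet, uses a hand-copied excerpt of the
IANA protocol-numbers registry (1 for ICMP, 6 for TCP, 17 for UDP),
and does not verify the header checksum; the function and table
names are illustrative.
</p>
<pre>
```python
import ipaddress
import struct

# Excerpt of the IANA "protocol-numbers" registry.
IP_PROTOCOLS = {
    1: "Internet Control Message Protocol (RFC 792)",
    6: "Transmission Control Protocol (RFC 793)",
    17: "User Datagram Protocol (RFC 768)",
}

# The fixed 20-byte header of RFC 791, Figure 4, in network byte order.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4(packet):
    """Decode the fixed IPv4 header fields and find the spec for the payload."""
    (ver_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = IPV4_HEADER.unpack_from(packet)
    version = ver_ihl >> 4            # high 4 bits
    ihl = ver_ihl % 16                # low 4 bits: header length in 32-bit words
    if version != 4:
        raise ValueError("not an IPv4 packet")
    header_length = ihl * 4           # any options sit between byte 20 and here
    return {
        "ttl": ttl,
        "protocol": protocol,
        "next_spec": IP_PROTOCOLS.get(protocol, "see the IANA protocol-numbers registry"),
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
        "payload": packet[header_length:total_length],
    }


# A hand-built header: version 4, IHL 5 (no options), protocol 6, checksum left as 0.
header = IPV4_HEADER.pack(0x45, 0, 24, 0, 0, 64, 6, 0,
                          bytes([192, 0, 2, 1]), bytes([18, 96, 237, 175]))
info = parse_ipv4(header + b"TCP!")
print(info["destination"], "->", info["next_spec"])
```
</pre>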
<h3>
You send a TCP packet
</h3>
<p>
When you send, or your computer sends, a packet in the TCP
protocol, there is an understanding that that packet conforms
to the protocol. That means a couple of things. It means that
you agree that the packet's contents are to be interpreted
according to the TCP protocol specification. It also means
that you agree to abide by the rules of the specification,
which determine, rather like with the Ethernet protocol, how
long your computer will wait before re-sending a packet which
didn't seem to get there. If your computer re-sends too
early, then it hogs the Internet and slows down everyone
else. If your computer sends a packet to start a new
connection when it doesn't really want to, then the
destination computer will set aside memory to receive
all the data you are going to send, and wait. If you keep
doing it, then that computer can just run out of memory and
stop working. So you can cheat, and you can do real damage, by
breaking protocols.
</p>
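<p>
How long to wait before re-sending is itself written down. The
sketch below follows the retransmission-timer rules of a later
specification, RFC 6298, rather than the original estimator in
RFC 793: smooth the measured round-trip times, set the timeout to
the smoothed time plus four times its variation with a one-second
floor, and double the timeout each time it expires. The 60-second
ceiling is an assumed choice (RFC 6298 allows any upper bound of at
least 60 seconds), and the class name is illustrative.
</p>
<pre>
```python
class RetransmissionTimer:
    """Retransmission timeout (RTO) estimator in the style of RFC 6298."""

    ALPHA = 1 / 8        # gain for the smoothed round-trip time (SRTT)
    BETA = 1 / 4         # gain for the round-trip time variation (RTTVAR)
    K = 4
    MIN_RTO = 1.0        # seconds: smaller values are rounded up to 1 s
    MAX_RTO = 60.0       # seconds: an assumed ceiling

    def __init__(self, clock_granularity=0.001):
        self.granularity = clock_granularity
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0   # initial value before any measurement

    def on_rtt_sample(self, r):
        """Fold in one measured round-trip time r (seconds) and return the new RTO."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated with the *old* SRTT, then SRTT itself is updated.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        self.rto = min(max(rto, self.MIN_RTO), self.MAX_RTO)
        return self.rto

    def on_timeout(self):
        """Back off after a timeout: double the timer, up to the ceiling."""
        self.rto = min(self.rto * 2, self.MAX_RTO)
        return self.rto
```
</pre>
<p>
A stack that ignored the one-second floor, or did not double the
timer after a loss, would re-send sooner than its neighbours and
so take more than its share of the network.
</p>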
<h3>
Introducing IANA: The Port number registry
</h3>
<p>
So your computer must stick to the TCP specification. When it
does that, the TCP protocol assures that the two computers
have a reliable connection without any missing bits. What
they use it for is no concern of TCP, apart from the fact
that the TCP protocol specifies, within the TCP packet (which
is inside the IP packet (inside the Ethernet packet)), a
special field holding a coded value, the <strong>port
number</strong>. There is a convention, which is written into
the TCP specification, (@@check and quote wording) that the
meaning of the port number is determined by a table which is
changed from time to time, but kept by the <strong>Internet
Assigned Numbers Authority</strong> (IANA). Without going
into the politics of the changes and control around IANA, it
is just worth noting that this is, architecturally, a
"flexibility point", where the community can introduce a new
protocol to run on top of TCP/IP without having to write it
into a new version of the TCP/IP specification itself.
</p>
<p>
The port number registry is on the web (@@ link) but also, on
a unix computer, there is a list of the well-known ports in
the file /etc/services.
</p>
<p>
When you send a TCP/IP packet there is therefore an
understanding that if you send to one of the well-known
port numbers, then you are going to use it in the way defined
by the specification which the IANA registry lists for that
port. For example, port number 25 indicates that you are going
to use it to transfer some email, and that you undertake to
communicate according to the Simple Mail Transfer Protocol
specification.
</p>
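<p>
The well-known-ports table is simple enough to read by hand. The
sketch below parses text in the layout of a Unix
<code>/etc/services</code> file (service name, port/protocol,
optional aliases, optional comment). The two sample lines are an
illustrative excerpt rather than a copy of any particular system's
file, and the function name is hypothetical.
</p>
<pre>
```python
SAMPLE_SERVICES = """\
# name          port/protocol   aliases         # comment
smtp            25/tcp          mail
http            80/tcp          www             # WorldWideWeb HTTP
"""


def parse_services(text):
    """Return two maps: (port, protocol) -> name, and (name, protocol) -> port."""
    by_port, by_name = {}, {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments; skip blank lines
        if not line:
            continue
        name, port_proto, *aliases = line.split()
        port, proto = port_proto.split("/")
        by_port.setdefault((int(port), proto), name)
        for alias in [name, *aliases]:
            by_name.setdefault((alias, proto), int(port))
    return by_port, by_name


by_port, by_name = parse_services(SAMPLE_SERVICES)
print(by_port[(25, "tcp")], by_name[("www", "tcp")])
```
</pre>
<p>
On a Unix system, Python's standard
<code>socket.getservbyname("smtp", "tcp")</code> and
<code>socket.getservbyport(80, "tcp")</code> consult the system's
own services database instead of a hand-parsed copy.
</p>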
<h3>
You send an email message
</h3>
<p>
You get the picture. One specification, once you commit to
it, depending on the values of certain fields, invokes
further specifications. By committing originally to using an
Ethernet cable, you commit to your computer using, on your
behalf, the various other specifications. In the case in
which your computer sends email, it may for example open a
TCP/IP connection to port 25, and then use the Simple Mail
Transfer Protocol (SMTP, RFC821). This specification
indicates that the body of the SMTP communication is
formatted according to the email message specification,
RFC822. RFC 822 specifies the headers on email messages. It
specifies, for example, that a given "From" field indicates
the email address of the sender of the message.
</p>
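<p>
Reading the "From" field is a matter of following the RFC 822
grammar, which Python's standard <code>email</code> package
implements. A minimal sketch, assuming a message that has already
been received over SMTP; the sample message and its addresses are
made up (example.org and example.net are reserved for examples),
and the function name is illustrative.
</p>
<pre>
```python
from email import message_from_string
from email.utils import parseaddr

# A made-up message in RFC 822 form: header fields, a blank line, then the body.
RAW_MESSAGE = """\
From: alice@example.org (Alice Example)
To: bob@example.net
Subject: Lunch on Thursday?

Are you free?
"""


def sender_of(raw):
    """Return the (name, address) pair that the From field asserts."""
    message = message_from_string(raw)
    return parseaddr(message["From"])


name, address = sender_of(RAW_MESSAGE)
print(address)
```
</pre>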
<p>
It is possible, of course, to cheat with the SMTP protocol.
It is possible to lie about who is sending the message: to
send someone a message which appears to be from one of their
friends. This breaks the protocol. It breaks it, here, in a way
which is very clear to people: it sneaks past their personal
email filtering, and also any automated filtering, tricking
them into reading a message. This is a security violation. It
can use up a person's time, energy, bandwidth and disk space
for the commercial gain (indirectly through advertising and
sales) of the perpetrator.
</p>
<p>
The Internet specifications, to which any Internet user
implicitly agrees in using the Internet at all, define what
the fields in an email message mean. To put incorrect
information in these fields is to make a misrepresentation,
just as it would have been in any other medium. It should be
subject to the same penalties as lying or fraud in any other
medium.
</p>
<p>
When the Internet was young and used by research
institutions, its misuse would inconvenience other users and
lead to reprobation and the disdain of one's peers. Now that
the Internet is such a large force in society, it is
possible to make a lot of money and create a lot of damage by
protocol abuse. You can compare a lie in an internet message,
depending on how it is done, to forging a check, connecting
to the electricity supply on the other side of the meter, or to
poisoning the water supply. Society must therefore be careful
to be absolutely clear about the illegality of such misuse.
</p>
<h3>
<a name="publish2" id="publish2">You publish a Web page</a>
</h3>
<p>
When you publish a web page, just as when you send an email
message, the web page or the message generally carries a
meaning. Well, it can be a picture or a poem which is more
artistic than linguistic, but in a large number of cases the
meaning is a well-defined part of a communication between
parties. It may be a human-readable document, like the page
describing a pair of pants you are about to buy from a
store, or it may be machine-processable, like the Open
Financial Exchange (OFX) format bank statement your financial
software downloads from your bank.
</p>
<p>
Of course, you would find it hard work to make sense of the
OFX file if you just read it without the help of the
financial agent, and your financial agent wouldn't make much
sense of the catalog page. Something must allow us to
distinguish how web pages and emails should be interpreted,
just as a computer has to figure out how to make sense of an
Ethernet packet. And indeed, just the same sort of thing
happens.
</p>
<p>
When you publish a web page, you give it an HTTP URI. You pick
a URI from the space of URIs which are yours to define. Some
people have space on their own domain, some people have the
right to pick URIs in part of someone else's domain. But the
URI is one which you own or over which you have authority.
You are not allowed to pick one in someone else's space.
</p>
<p>
Whoever owns the domain has the authority to define which
computer serves information in it. They have the authority,
then, to have a computer -- a web server -- which is configured
to act on their behalf. It is then assumed that the computer
acts on their behalf. The server is the agent of the
publisher. What it does is tell any browser that asks what you
have said is a representation of the document for a given
URI.
</p>
<p>
When someone follows a link to your web page, their browser
opens a TCP/IP connection to TCP port 80 on the machine which
is registered as serving the domain (www.whatever.com, etc.) in
question. Their agent, their browser, asks your agent, the
server, to give it some representation of the web page for
that URI.
</p>
<p>
Why? Because the URI specification says that what you can
tell about a URI depends on the first bit, the scheme, in this
case <code>http:</code>. The scheme is looked up in the
<strong>IANA URI scheme registry</strong>, which tells you what
specification applies.
</p>
<p>
The IANA registry indicates that the <code>http:</code>
scheme calls out the <strong>HTTP 1.1
specification</strong>, RFC@@@.
</p>
<p>
HTTP 1.1 says that (unless otherwise specified) the client
contacts the server on TCP port 80. The IANA registry of port
numbers, just as it allocates port 25 to mail transfer,
allocates 80 to HTTP. The HTTP spec is therefore mutually
assumed by both parties. This spec describes what a request
means, and, when the request is successful, what the
response message sent back to the browser means.
</p>
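<p>
The chain from URI to connection can be sketched in a few lines of
Python: check the scheme, default to port 80 as HTTP 1.1
specifies, and form a request. This is a minimal sketch using only
the standard library (<code>urllib.parse.urlsplit</code> and
<code>socket.create_connection</code>); the function names are
illustrative, and any fragment identifier is left out because it is
never sent to the server.
</p>
<pre>
```python
import socket
from urllib.parse import urlsplit


def http_request_for(uri):
    """Return (host, port, request bytes) for a GET of an http: URI."""
    parts = urlsplit(uri)
    if parts.scheme != "http":
        raise ValueError("this sketch only follows the http: scheme")
    port = parts.port or 80                 # HTTP's registered default port
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host_field = parts.hostname if port == 80 else f"{parts.hostname}:{port}"
    request = (f"GET {path} HTTP/1.1\r\n"
               f"Host: {host_field}\r\n"
               "Connection: close\r\n"
               "\r\n")
    return parts.hostname, port, request.encode("ascii")


def fetch(uri):
    """Open the TCP connection and read the whole raw response (needs a network)."""
    host, port, request = http_request_for(uri)
    with socket.create_connection((host, port)) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```
</pre>
<p>
Calling <code>fetch("http://www.example.com/")</code> would return
the raw status line, headers and body; among those headers is the
Content-type field discussed next.
</p>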
<p>
According to HTTP 1.1, in that response, there is a field
(<strong>Content-type</strong>) which indicates how the body
of the response should be interpreted. For each valid value
of that field, there is an <strong>IANA content-type
registry</strong> value which explains which specification
applies to the body of the message. This is just the same
system as for email.
</p>
<p>
When the value of the field is <code>text/html</code>, it
indicates that the message is a hypertext document ("web
page") which is to be presented to the human being and
interpreted then by the human being in the usual human way.
If the field indicates it is an OFX file, then that means
that the OFX specification determines what it means, and you
need a program or something which understands what the fields
of the OFX documents mean. In neither case can you argue that
you didn't know. So long as the writers of the specification
do a good job (and goodness knows they work hard enough at
it) then there can be no argument as to what the actual
fields in your bank statement mean.
</p>
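<p>
In code, that lookup amounts to normalizing the header value and
consulting a table. A minimal sketch, assuming a small hand-written
table standing in for the IANA media-type registry; it uses the
standard library's <code>email.message.Message</code> to parse the
header, since media types are case-insensitive and may carry
parameters such as <code>charset</code>. The table entries and the
function name are illustrative.
</p>
<pre>
```python
from email.message import Message

# Illustrative stand-in for the registry: media type -> governing specification.
MEDIA_TYPES = {
    "text/html": "HTML: present the page to a person",
    "application/xml": "XML 1.0 + Namespaces: look at the root element's namespace",
    "text/plain": "plain text: show the characters as they are",
}


def interpret(content_type):
    """Return (media type, charset, governing spec) for a Content-Type value."""
    header = Message()
    header["Content-Type"] = content_type
    media_type = header.get_content_type()       # lower-cased, parameters stripped
    charset = header.get_content_charset()       # None when no charset is given
    spec = MEDIA_TYPES.get(media_type)
    if spec is None:
        raise LookupError(f"no specification known for {media_type}")
    return media_type, charset, spec


print(interpret("Text/HTML; charset=UTF-8"))
```
</pre>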
<h3>
<a name="publish1" id="publish1">You publish an XML
document</a>
</h3>
<p>
When you publish a document in XML, then there is another
layer involved. Many different languages -- or even mixtures
of languages -- can be sent structured as XML. The MIME type
of the document can just be "application/xml", which doesn't
tell the reader how to interpret it. For that, you have to
look at the outermost element of the XML document. Its
namespace declaration gives a URI indicating the namespace.
</p>
<p>
Note the difference between the use of a URI and a central
registry. Because the namespace is identified by a URI, the
web becomes the registry. Anyone can make a new XML
namespace. Also, one can use a URI, such as an HTTP URI, which
can be dereferenced. This allows the information which would
have been in the registry to be put into a web document. (The
W3C TAG is currently debating the issue of the best format to
use for this meta information, but HTML, RDDL and RDF have
been used in various combinations.) But broadly there are two
types of information. There may be a specification (or a
reference to one) to tell a human reader what the language is
and how to interpret it. There may also be data - a schema
which describes the grammar of the language, or even the
start of a logical definition of what the language means.
</p>
<p>
But whatever information may or may not be available
automatically, in an XML world, a system has to look into the
document, at the namespace of the outermost element, to know
how to interpret it. This generally means what application to
launch - not to mention what icon to use to represent the
document to a person.
</p>
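<p>
With Python's standard <code>xml.etree.ElementTree</code>, the
namespace of the outermost element is directly visible, because the
parser reports namespaced tags in the form
<code>{namespace-URI}local-name</code>. The sketch below dispatches
on that URI. The handler table is illustrative, and the sample
document is produced with <code>ET.tostring</code> and then parsed
back as if it had just been received.
</p>
<pre>
```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative table: namespace URI of the root element -> what to do with it.
NAMESPACE_HANDLERS = {
    RDF_NS: "RDF: interpret the document as a set of triples",
    XHTML_NS: "XHTML: present it as a web page",
}


def root_namespace(document):
    """Parse the document and return the namespace URI of its outermost element."""
    root = ET.fromstring(document)
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None        # no namespace: the XML alone does not say how to read it


def dispatch(document):
    """Choose a handler from the root element's namespace."""
    ns = root_namespace(document)
    return NAMESPACE_HANDLERS.get(ns, f"unknown namespace {ns}: dereference the URI to learn more")


# Serialize a minimal RDF/XML root element to stand in for a received document.
received = ET.tostring(ET.Element(f"{{{RDF_NS}}}RDF"))
print(root_namespace(received))
print(dispatch(received))
```
</pre>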
<p>
An example of a machine-readable document with important
semantics is an online P3P web site privacy policy. This is
an XML document which gives, for each category of personal
information, the sort of thing the web site promises to do or
not do with it. It can be scanned by a browser more easily
than a person can read a privacy policy. It is a useful
feature, as it saves everyone's time and increases public
confidence in responsible web sites. It clearly depends on
the meaning of the terms being well defined by the
specification.
</p>
<p>
<em>(Problem: this doesn't always happen: MathML and XHTML as
XML in practice.@@ links)</em>
</p>
<h3>
<a name="publish" id="publish">You publish an RDF
document</a>
</h3>
<p>
Now let's talk semantics. Harder semantics - for logical
systems. Some XML documents are RDF documents. RDF/XML is an
XML-based language for data. It is very simple: each document
is just a set of "triples". A triple gives the value of some
property of some object - or some relationship between some
object and some other object. The triples are independent, so
interpreting the document is just, as the RDF spec explains, a
question of interpreting each triple.
</p>
<p>
How do you figure out what a triple means? Well, the property
(or relationship) is identified by a URI. And whoever made up
the URI gets to say what the property means, that is, what
any triple using that property means.
</p>
<p>
So if I make a property http://www.w3.org/2002/05/example#color
and define that the color property is a name out of the
Pantone(tm) list of colors, and you send someone an order in
RDF for a hat which has
<em>http://www.w3.org/2002/05/example#color</em> of
<em>blue256</em>, then you are specifying blue256 on the
Pantone scale. No one can argue that you meant some other
scale of blue. Normally the argument is made much easier by
my actually writing a document
http://www.w3.org/2002/05/example in which I explain what
#color means. No one can argue, in their catalog, that "By
suit, we mean something which is black, whatever
<em>http://www.w3.org/2002/05/example#color</em> someone
might say it is". The meaning of the triple is determined by
the property, not by the subject or the object of the triple.
</p>
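<p>
The rule that the property decides the meaning can be sketched as
a lookup keyed on the property URI. In this minimal Python sketch,
triples are plain tuples, and the table of definitions stands in
for what the owner of each property URI has published; the order
URI and the wording of each definition are hypothetical.
</p>
<pre>
```python
COLOR = "http://www.w3.org/2002/05/example#color"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical definitions, as the owner of each property URI might publish them.
PROPERTY_DEFINITIONS = {
    COLOR: "{s} has the color named {o} in the Pantone list",
    RDF_TYPE: "{s} is a member of the class {o}",
}


def interpret_triples(triples):
    """Interpret a document triple by triple; the property alone picks the reading."""
    readings = []
    for subject, prop, obj in triples:
        template = PROPERTY_DEFINITIONS.get(prop)
        if template is None:
            readings.append(f"{prop} is not known here: dereference it for its definition")
        else:
            readings.append(template.format(s=subject, o=obj))
    return readings


hat_order = [("http://example.com/order/42#hat", COLOR, "blue256")]
for reading in interpret_triples(hat_order):
    print(reading)
```
</pre>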
<table border="2">
<caption>
A <a name="section" id="section">section</a> through the
stack
</caption>
<tbody>
<tr>
<th>
Specification
</th>
<th>
Field
</th>
<th>
Where to look up values
</th>
<th>
Example value
</th>
<th>
Example value calls out
</th>
</tr>
<tr>
<td>
Ethernet (cf. IEEE 802.3)
<p>
and either DIX(RFC894) or 802.2,3 <a href=
"http://www.ietf.org/rfc/rfc1042.txt">RFC1042</a>
</p>
</td>
<td>
Ethernet type (or protocol identification field for
LLC) 16-bit Ethertype
</td>
<td>
IEEE registry
<p>
Assignment by RAC process @@link
</p>
</td>
<td>
0x800
</td>
<td>
<a href="http://www.faqs.org/rfcs/rfc791.html">Internet
Protocol (RFC791)</a>
</td>
</tr>
<tr>
<td>
<a href="http://www.faqs.org/rfcs/rfc791.html">Internet
Protocol (RFC791)</a>
</td>
<td>
Protocol
</td>
<td>
IANA protocol-numbers
</td>
<td>
<a href=
"http://www.iana.org/assignments/protocol-numbers">6</a>
</td>
<td>
Transmission Control protocol (RFC793)
</td>
</tr>
<tr>
<td>
<a href=
"http://www.ietf.org/rfc/rfc0793.txt">Transmission
Control protocol (RFC793)</a>
</td>
<td>
port
</td>
<td>
IANA registry
<p>
port-numbers
</p>
</td>
<td>
<a href=
"http://www.iana.org/assignments/port-numbers">80</a>
</td>
<td>
HTTP 1.1
</td>
</tr>
<tr>
<td>
<a href="/Protocols/rfc2616/rfc2616.html">HTTP 1.1</a>
</td>
<td>
content-type
</td>
<td>
IANA registry
<p>
mime types
</p>
</td>
<td>
application/xml
</td>
<td>
XML1.0+NS
</td>
</tr>
<tr>
<td>
<a href="/TR/REC-xml">XML</a> 1.0+<a href=
"/TR/REC-xml-names">NS</a>
</td>
<td>
xmlns
</td>
<td>
The Web
</td>
<td>
...@@..rdf
</td>
<td>
RDF M&amp;S 1.0
</td>
</tr>
<tr>
<td>
<a href="/TR/REC-rdf-syntax">RDF MS 1.0</a>
</td>
<td>
property
</td>
<td>
The Web
</td>
<td>
rdf:type
</td>
<td>
RDF MS 1.0 section 4.1
</td>
</tr>
<tr>
<td>
<a href="/TR/REC-rdf-syntax/#type">RDF MS 1.0
definition of rdf:type</a>
</td>
<td>
object
</td>
<td>
The Web
</td>
<td>
cyc:Person
</td>
<td>
cyc ontology
</td>
</tr>
</tbody>
</table>
<p>
Looking at the table, which summarizes the steps we have been
through, you will see that each spec is connected to the next
by some field whose value points to the next spec through some
list or registry. For the more recent layers, the registry has
been replaced by the Web.
</p>
<h2 id="hooks">
The hooks - identifiers
</h2>
<p>
That's an interesting trend. If you like, we can see the
technology move through three stages of civilization, in
terms of the identifiers which are used for concepts.
</p>
<ol>
<li>Using numbers or strings
</li>
<li>Using URIs - identify the same thing in all contexts
</li>
<li>Using dereferenceable URIs
</li>
</ol>
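<p>
The three stages can be read off the example values in the table
above. This sketch lists each row's lookup value and classifies it.
Treating only <code>http:</code> and <code>https:</code> URIs as
dereferenceable is a simplifying assumption, and the namespace
value in the XML row is assumed to be the RDF namespace, with
<code>rdf:type</code> written out as its full URI.
</p>
<pre>
```python
from urllib.parse import urlsplit

# (specification, field, example value) for each row of the table above.
STACK = [
    ("Ethernet", "Ethertype", 0x0800),
    ("Internet Protocol (RFC 791)", "Protocol", 6),
    ("Transmission Control Protocol (RFC 793)", "port", 80),
    ("HTTP 1.1", "Content-type", "application/xml"),
    ("XML 1.0 + Namespaces", "xmlns", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
    ("RDF Model and Syntax 1.0", "property", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
]


def identifier_stage(value):
    """Classify a lookup value by the three stages of identifiers."""
    if isinstance(value, int):
        return 1, "number: look it up in a central registry"
    scheme = urlsplit(value).scheme
    if scheme in ("http", "https"):
        return 3, "dereferenceable URI: look it up on the Web"
    if scheme:
        return 2, "URI: globally unique, but not necessarily retrievable"
    return 1, "string: look it up in a central registry"


for spec, field, value in STACK:
    stage, how = identifier_stage(value)
    print(f"{spec:42} {field:13} stage {stage}: {how}")
```
</pre>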
<p>
The early protocols used numbers and strings, which require a
central registry. That worked, because the only common
concepts were those in the standard protocols, and those had
to be common across the net for interoperability. In these
areas there is still a strong argument for central control.
</p>
<p>
As we move on to later protocols, the protocols themselves
become more diverse. This is partly because they are at a
higher application level. The centralized model starts to
break down, as witness some of the social difficulties of
getting an IANA allocation for a MIME type for an embryonic
W3C specification. So new protocols allow new applications to be
defined using URIs, allowing anyone who has access to a bit
of domain space to allocate them.
</p>
<p>
The third stage of civilization is the one at which the
identifiers can be looked up on the web. This is quite useful
for engineers who encounter new languages. It doesn't really
justify its existence, though, until one has technology --
Semantic Web technology -- in which an automated agent can
pick up metadata about the languages on the fly, and use that
metadata to enhance its processing of data in that language.
</p>
<p>
(What if I don't have a web site? This is becoming less and
less of a problem. There are all kinds of existing ways of
allocating an identifier. But the persistence of such
information is, and always will be, like the cleanliness of
water and air, an important social issue.)
</p>
<h2>
<a name="When" id="When">When the chain does NOT connect</a>
</h2>
<p>
We have seen how any user of the Internet is bound to a
series of specifications which define the meanings of terms,
and hence allow his or her equipment and agents to
interoperate with others. This stack prevents one from
sending a nasty email to someone and then protesting that the
message didn't mean anything. So if the stack is so strict,
how <em>does</em> one send a nasty email message when one
<em>doesn't</em> mean it? There are plenty of times you want
to include an attachment to which you want to refer, but for
which you don't claim authorship or responsibility.
Understanding the exceptions is as important as understanding
the general rule. Many protocols have ways of breaking the
chain, of including information which is not part of the
meaning of the message.
</p>
<p>
In email it is an <strong>attachment</strong>. There is
always in email a cover note, the basic message, which
conveys the actual message. What you do with any attachment
normally depends on the main message. It might be "Hey,
Joe, what do you think of this paper?", or "Look at this
stupid program - but whatever you do don't run it!"
</p>
<p>
Currently (2002) XML doesn't have a common standard for what
has been called in that context "<strong>packages</strong>".
This is a pity. It is on the agenda of the XML Protocol Working
Group, as it is seen as essential for SOAP operations. One must be
able to include documents stapled to a SOAP request or
response which are not simply to be acted on.
</p>
<p>
At the Semantic Web level, those who have played with the
<a href="Notation3.html">Notation3</a> language will
recognize the curly brackets as the packaging, or
<strong>quoting</strong>. Whereas a document
</p>
<pre>
my:car srgb:color "000044".
</pre>
<p>
asserts that the car in question is blue, the document
</p>
<pre>
my:form67 :says {my:car srgb:color "000044"}.
</pre>
<p>
does not. It merely says something about the statement that
the car is blue.
</p>
<p>
So being able to refer to something without asserting it,
whether you call it attachment, packaging, or quoting, is an
important feature of a language. The fact that you can do
this removes the last excuse for anyone claiming not to have
meant whatever they did say in the main message!
</p>
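<p>
The difference between asserting a statement and quoting it can be
sketched with a tiny in-memory graph. In this Python sketch a graph
is a set of triples, and a quoted formula (the curly brackets of
Notation3) is a separate immutable value whose triples are not
counted as asserted. The class and function names are illustrative,
and the prefixed names are kept as plain strings rather than
expanded to URIs.
</p>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formula:
    """A quoted graph, written { ... } in Notation3: mentioned, not asserted."""
    triples: frozenset


def asserted(graph, triple):
    """True only if the triple is stated at the top level of the graph."""
    return triple in graph


def mentioned(graph, triple):
    """True if the triple occurs anywhere, including inside quoted formulas."""
    for statement in graph:
        if statement == triple:
            return True
        for term in statement:
            if isinstance(term, Formula) and mentioned(term.triples, triple):
                return True
    return False


car_is_blue = ("my:car", "srgb:color", "000044")
plain_document = {car_is_blue}
form_document = {("my:form67", ":says", Formula(frozenset({car_is_blue})))}

print(asserted(plain_document, car_is_blue), asserted(form_document, car_is_blue))
print(mentioned(form_document, car_is_blue))
```
</pre>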
<h2 id="Conclusion">
Conclusion
</h2>
<p>
Internet messages and Web documents are represented in
computer languages with well-defined specifications. Use of
the Internet and the Web implies an acceptance of the
specifications as authoritative.
</p>
<p>
The specifications are linked together by identifiers which
in earlier specs were numbers, but in later specs are URIs,
ideally URIs which can be looked up on the Web. The ability
to make these linked specifications requires the
specifications to be designed very independently. This is
simply the software engineering practice of information
hiding between layers.
</p>
<p>
The trend for the higher layers is toward more and more
machine-processable metadata about such languages, which can
be retrieved automatically and will aid in processing. Some
of these will relate the semantics of terms in one vocabulary
to terms in another, in a web-like way.
</p>
<p>
The fact that as we move into the applications we see more
and more diverse uses of the Web and the Net does not
diminish our reliance on sound standards in the supporting
infrastructure.
</p>
<hr />
<h2>
Related
</h2>
<ul>
<li>The Meaning of a document
</li>
<li>The meaning of an XML document
</li>
</ul>
<h2>
References
</h2>
<p>
The table above contains hypertext links to some
specifications used as examples.
</p>
<p>
See also:
</p>
<ul>
<li>The RDF concepts document. @@
</li>
</ul>
<hr />
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
<p>
<a href="../People/Berners-Lee">Tim BL</a>
</p>
</body>
</html>