<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<meta http-equiv="Content-Type" content="text/html" />
<title>
The Name Myth -- Axioms of Web architecture
</title>
<link href="di.css" rel="stylesheet" type="text/css" />
</head>
<body bgcolor="#DDFFDD" text="#000000">
<address>
Tim Berners-Lee
<p>
Date: December 19, 1996
</p>
<p>
Status: personal view. Editing status: Italic text is
rough. Requires a complete edit and possibly some massaging,
but the content is basically there.
</p>
</address>
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
<h3>
Axioms of Web Architecture: 2
</h3>
<hr />
<h1>
The Myth of Names and Addresses<br />
</h1>
<p>
The discussion above about the universality of URIs
(Universal Resource Identifiers) mentions briefly how URIs
are designed to encompass both things we think of as
addresses and those we think of as names. Much of the
discussion of this issue has been clouded by attempts to
distinguish names from addresses. The term "identifier" was
picked in an attempt to side-step this issue but
historically, that did not prevent a quagmire of circular
discussion which in some circles paralyzed any forward
progress. Therefore, in this section let me state the
philosophy which to my mind sets this problem in the right
light and should prevent further fruitless discussion.
</p>
<p>
There is a commonly held belief that names and addresses
are different and distinct. We learn the importance of the
difference between identifiers in a programming language and
addresses within a computer memory. We learn the difference
in properties between fully qualified domain names on the
internet and internet protocol addresses. This can easily
lead us into imagining that there are two types of object:
"names", which, once attached to an object, follow it for its
life wherever it should reside, and "addresses", which change
frequently, whenever an object moves or is copied or
replicated from one "location" to another.
</p>
<p>
However, the only true location is a point in
three-dimensional space, and within computer systems,
especially networked computer systems, there are many layers
of complex indirection between almost anything we would call
a name <i>or</i> an address and the physical location of the
memory cell which stores it. At one end of the spectrum, a
computer memory address is often really an address within a
virtual memory space allocated to a particular process; when
used, it is translated by the hardware into a physical memory
address or, for that matter, into a location in a swap file
on disk, if that piece of memory has been moved out of main
memory. Filenames are mapped through mount tables and
directory files into "inodes", which are mapped onto track
and sector locations.
</p>
<p>
Internet protocol (IP) addresses similarly are not bound
absolutely to a given computer: they can be re-allocated,
within the constraint that, because they are used for
routing, parts of the IP address carry routing information,
and so the computer corresponding to a given IP address
cannot be moved far in the routing structure. So we see that
the constraint on how you can re-use an address is a function
of what information is in the address. When most programs or
people mention IP addresses, they simply quote four decimal
numbers, each between 0 and 255, without worrying about the
internal structure. So the information within the IP address
which prevents it being re-used in a different area is, to
most people, not explicit: it is, if you like, hidden in
there as the reason why an IP address can't simply be reused
elsewhere. When we want to use something to refer to a
computer but still be able to move the computer, or at least
the thing corresponding to that identification, from one
part of the internet to another, we use a domain name. The
domain name system, being completely independent of the
routing system, allows us to allocate any IP address at all
to a computer with a given domain name. Therefore, if we
believe the naming myth, the domain name is a name and the IP
address is truly an address.
</p>
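<p>
As a minimal sketch of the point about hidden routing information, the following Python fragment uses the standard <code>ipaddress</code> module. The networks are documentation ranges and the host name is hypothetical; a plain dictionary stands in for the domain name system. It shows that the familiar four decimal numbers are one 32-bit number whose leading bits identify a network, so the address cannot follow a computer to another network, while the name can simply be rebound.
</p>

```python
import ipaddress

# Hypothetical networks taken from the RFC 5737 documentation ranges. The /24
# prefix is the part of each address that routers use to deliver packets.
campus_net = ipaddress.ip_network("192.0.2.0/24")
other_net = ipaddress.ip_network("198.51.100.0/24")

# A plain dictionary standing in for the domain name system: name -> address.
name_table = {"server.example.org": ipaddress.ip_address("192.0.2.17")}


def address_fits(addr, net):
    """An address is only usable for a host attached to the network whose
    prefix the address carries."""
    return addr in net


old_addr = name_table["server.example.org"]
as_integer = int(old_addr)  # the four decimal numbers are one 32-bit integer

# The computer moves to the other network: its old address does not fit there...
print(address_fits(old_addr, campus_net), address_fits(old_addr, other_net))  # prints: True False

# ...so it gets a new address, while the name is simply rebound to it.
name_table["server.example.org"] = ipaddress.ip_address("198.51.100.17")
```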
<h2>
<a name="Anecdotes" id="Anecdotes">Two anecdotes about names
and addresses</a>
</h2>
<p>
Two real-life anecdotes illustrate the dangers of making this
assumption. When there were only a few web servers and I kept
a registry of all those which I knew, I was contacted by a
group in Australia who were putting up a server with some
interesting botanical information. They sent me some details
of the server to be put into the list and they gave me the IP
address of the machine. My email reply explained that I
always prefer to refer to servers by their domain name rather
than their IP address and asked them for the domain name of
the server. They replied that the domain name they would use
would depend on the department within the university which
was responsible for maintaining the server but due to a
university re-organization, it was not at this point clear
which department that would be. However, they explained that
they could guarantee that the IP address of the server would
remain unchanged for a long time.
</p>
<p>
Several years later, the list of servers, by then abandoned as
a single list of all World Wide Web servers, was part of the
now-extensive web of information maintained on the server
known as info.cern.ch, the first World Wide Web server, set up
at the start of the World Wide Web project. At this time the
responsibility for the coordination of World Wide Web
protocols was shifting from CERN to MIT/LCS and the embryonic
World Wide Web Consortium. For a while, CERN continued to
maintain the server, but later the master sources for that
information were maintained in America. Soon after this, the
authorities at CERN requested that the name info.cern.ch
should no longer be used to refer to this information, as it
was no longer under CERN's control and they could no longer
assume responsibility for it. In fact, there was a policy
that names in the cern.ch domain should never be allowed to
refer to Internet addresses which were not physically on the
CERN site. Therefore all hypertext pointers into the
info.cern.ch space have had to be changed over the course of
time to point to the <code>w3.org</code> space.
</p>
<p>
These two examples show the "name" of an object having to be
changed even though the object retained its essential
identity. In each case the reason was information embedded in
the name: the domain name of the server contains information
about the authority which maintains the computer whose
address corresponds to the domain name. If the authority for
an object changes, whether it "moves" or not, then there may
be a need to change its name. It turns out that for almost
any naming or addressing system in which some information
(other than random numbers or dates of creation of the
objects) is built into the name, the name might have to be
changed when the facts corresponding to that information
change. Therefore the choice between naming or addressing
systems becomes simply a matter of what sort of information
you wish to include, implicitly or explicitly, within your
"name" or "address".
</p>
<h2>
<a name="Why" id="Why">Why Names Change</a><br />
</h2>
<p>
<small>See also:</small>
</p>
<ul>
<li>
<small>In the Style Guide for Online Hypertext, <a href=
"../Provider/Style/Overview.html"><i>Cool URLs don't
change</i></a></small>
</li>
</ul>
<p>
It is worth looking at some of the reasons why names in
practical use change, or need to be changed. Some World
Wide Web servers have unwisely simply mapped the URL space
onto a Unix filename space, and the results of this,
especially in the early days, were URLs which might look like
this:
</p>
<p>
<code>http://pegasus.cs.foo.edu/disk1/students/romeo/cool/latest/readthis.html</code>
</p>
<p>
Looking at the segments of this name, we can find a reason
for the name to need changing in almost every one of them.
</p>
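<p>
The segment-by-segment dissection that follows can be sketched mechanically. This Python fragment is a sketch only: it splits the example URL with the standard <code>urllib.parse.urlsplit</code> function, and the <code>EMBEDDED_INFO</code> table is a hypothetical paraphrase of the discussion in this section, recording what each segment ties the name to. Only "readthis" carries no information.
</p>

```python
from urllib.parse import urlsplit

URL = "http://pegasus.cs.foo.edu/disk1/students/romeo/cool/latest/readthis.html"

# Hypothetical paraphrase of the discussion in this section: what each
# segment ties the name to. None marks a segment that carries no information.
EMBEDDED_INFO = {
    "http": "the access protocol",
    "pegasus": "one machine and its current role",
    "cs": "the computer science department",
    "foo": "the university",
    "edu": "the kind of institution",
    "disk1": "the physical disk layout",
    "students": "the author's current status",
    "romeo": "the author",
    "cool": "the author's current taste",
    "latest": "a moment in time",
    "readthis": None,
    "html": "the format in which the document is served",
}


def segments(url):
    """Split a URL into scheme, host labels, directories, file stem and suffix.
    Assumes the last path segment has a suffix such as ".html"."""
    parts = urlsplit(url)
    host_labels = parts.hostname.split(".")
    *dirs, last = parts.path.strip("/").split("/")
    stem, _, suffix = last.rpartition(".")
    return [parts.scheme, *host_labels, *dirs, stem, suffix]


for seg in segments(URL):
    print(f"{seg:10} {EMBEDDED_INFO.get(seg) or '(no information)'}")

# Segments whose embedded information could force the URL to change.
at_risk = [s for s in segments(URL) if EMBEDDED_INFO.get(s)]
```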
<p>
The "http:" will only be changed if the document is later
served up using a different protocol and, in fact, that is
probably one of the least likely pieces to change.
</p>
<p>
"Pegasus", the name of the computer, probably has some
significance within the university: it may denote a computer
dedicated to particular tasks, such as supporting personal
student activities; it may be maintained by a particular
department; or it may even be the name of a project for
which the computer was originally put into use before it
became shared with general user space. So "pegasus" will be
changed whenever the function of supporting this particular
student's web pages has to be shared with other functions.
</p>
<p>
"Cs" indicates the computer science department, so the
document is bound to the computer science department. It may
not be something which the computer science department has a
lot of interest in, and the student may well transfer his or
her interests to other departments in the future.
</p>
<p>
The name of the university "foo.edu" will probably last for a
good while, though whether the university wants to continue
to be associated with the document for more than two or three
years is questionable.
</p>
<p>
The next section of the path, "disk1", is clearly a mistake.
Of course, "disk1" is just a name which can be
attached to any physical disk, but by grouping together all
the students on a certain disk in this arbitrary way, one
makes a binding between all the documents which they create
which will have to be broken whenever the computer is
reorganized. In fact, the relocation tables which most
servers support allow much translation of names to take place
and make this sort of path quite unnecessary.
</p>
<p>
The next element, "students", identifies Romeo as a student,
which may change even if he continues to study for the rest
of his life, and then the next path element, "romeo",
identifies the author of the document. As in the CERN case
above, the original author of a document may later not wish
to keep maintenance of, or responsibility for, ongoing
versions. For example, the document may be submitted to an
organization which publishes it and formally takes over
responsibility for its upkeep; it may achieve a status of
some kind, as a standard or an accepted thesis, which causes
its maintainers to change. The original author may even
deliberately pass on authorship of the document to someone
else. In any of these cases the name would have to change,
and all references to that name would break.
</p>
<p>
The student himself has not been very wise with his choice of
path name. For many people, what is "cool" changes with time
and for most people what is "latest" changes with time.
</p>
<p>
Perhaps the piece of the URL least likely to change is
"readthis", as it contains no information at all, just like
the proverbial "click here". Effectively, it is a random name
assigned to the document and, as such, is perhaps the safest
part of the path.
</p>
<p>
The last element of the path, the "html" suffix, is not
strictly necessary, as at least some servers will, given a
URL of "readthis", serve up the data from a file called
"readthis.html". Here the student is making it difficult for
himself later to change the format or formats in which the
file is available, without at least some confusion. Suppose,
for example, that he later decides that the information is
worth providing in audio format for blind readers. The CERN
server can easily be configured so that clients specifically
requesting audio formats in preference to HTML are served
the audio version, whereas more typical clients will get the
HTML. So here again is a part of the path which may later be
regretted.
</p>
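<p>
The format-free alternative can be illustrated with a simplified form of HTTP content negotiation. In this Python sketch the variant table and file names are hypothetical, and the parsing of the <code>Accept</code> header is deliberately minimal (it honours <code>q</code> values but ignores wildcards and other parameters); it is not the CERN server's actual configuration. The point is that the URL stays "readthis" while the server chooses a format for each request.
</p>

```python
# Hypothetical variants stored on the server for the format-free name "readthis".
VARIANTS = {"text/html": "readthis.html", "audio/mpeg": "readthis.mp3"}


def parse_accept(header):
    """Minimal Accept-header parser: returns (media type, q) pairs.
    Wildcards and parameters other than q are not interpreted."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        q = 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                q = float(field[2:])
        prefs.append((fields[0], q))
    return prefs


def choose_variant(accept, variants=VARIANTS, default="text/html"):
    """Pick the stored file for the client's most preferred available type,
    falling back to HTML when nothing in the header matches."""
    best, best_q = None, 0.0
    for media_type, q in parse_accept(accept):
        if media_type in variants and q > best_q:
            best, best_q = media_type, q
    return variants[best or default]


print(choose_variant("audio/mpeg, text/html;q=0.5"))  # readthis.mp3
print(choose_variant("text/html"))                    # readthis.html
```

<p>
The URL a reader bookmarks is the same in both cases, so adding or dropping a format later need not break any references.
</p>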
<p>
You can play this game with almost any name and address in
any system, and it is interesting to ask yourself in each
case: to what extent do I call this a "name" and to what
extent do I call it an "address"? So, in conclusion, we see
that any information explicitly or implicitly included in a
name is a threat to its longevity. We see that the
difference between a "name" and an "address" is not so
fundamental. That is why:
</p>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
When a new URI scheme is defined, the specification
defining it should describe the name-like and
address-like properties of URIs in the new scheme, so
that those using them know what to expect.
</td>
</tr>
</tbody>
</table>
<h2>
<a name="What" id="What">What's in a name?</a><br />
</h2>
<p>
Why, then, is information included at all? Generally, it
is included because, in order to discover anything about the
named object, one has to "dereference" the name. Typically
this uses some official or unofficial set of indexes,
distributed or otherwise, to look up the name. Many names are
hierarchical in the authority which allocates them. DNS names
are a good example; road names within towns are another. To
find out where the new "North Street" is located in a small
town, one goes to the town for the definitive answer. For
information as to where the server "pegasus.cs.foo.edu" is,
one must send a message, directly or indirectly, to a server
controlled by Foo University.
</p>
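<p>
A toy model of hierarchical dereferencing, in Python, makes the dependence on embedded information concrete. The nested dictionary is a hypothetical stand-in for a chain of delegated authorities, not a description of how DNS resolvers are implemented: each level knows only its direct children, so resolving "pegasus.cs.foo.edu" means consulting "edu", then "foo", then "cs".
</p>

```python
# Hypothetical delegation tree: each authority knows only its direct children.
ROOT = {
    "edu": {
        "foo": {
            "cs": {"pegasus": "192.0.2.17"},
        },
    },
}


def dereference(name, root=ROOT):
    """Resolve a dotted name by walking from the rightmost label (the top of the
    hierarchy) down to the leftmost one, asking one authority at a time."""
    node = root
    walked = []
    for label in reversed(name.split(".")):
        walked.insert(0, label)
        if not isinstance(node, dict) or label not in node:
            raise LookupError("no authority answers for " + ".".join(walked))
        node = node[label]
    return node


print(dereference("pegasus.cs.foo.edu"))  # 192.0.2.17
```

<p>
If the department is reorganized and the "cs" entry disappears, every name below it fails to resolve, even though the computer itself is unchanged.
</p>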
<p>
Is it possible to omit all such information from a name?
Certainly. Message identifiers in mail need only be unique.
So, whereas hierarchical names and time stamps may be used to
help make such identifiers unique, you cannot dereference
them at all. Perhaps we should call these "identifiers"
rather than "names". Within a certain context, it is
extremely useful to be able to refer to a mail message by its
identifier. We say that these identifiers support the notion
of equality: even though they cannot be dereferenced, you can
test whether two mail messages are in fact the same simply by
comparing their identifiers. You can also look up the message
with a given identifier within a finite set of mail messages.
You just can't do this on a global scale. So this, then, is
the essence of the naming problem:
</p>
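<p>
The two properties of message identifiers described above, equality testing and lookup within a finite set, can be shown in a short Python sketch. The host name and the in-memory <code>mailbox</code> are hypothetical; the unique part comes from a random UUID via the standard <code>uuid</code> module, which is one of several ways mail software can make such identifiers unique.
</p>

```python
import uuid

# Hypothetical host name. Mail Message-IDs conventionally have the form
# unique-part@domain, the domain helping to keep them globally unique.
HOST = "mail.example.org"


def new_message_id(host=HOST):
    """Return an identifier that is unique but carries nothing that can be
    dereferenced to the message itself."""
    return f"{uuid.uuid4().hex}@{host}"


# A finite, local collection: the only place where lookup by identifier works.
mailbox = {}


def store(body):
    mid = new_message_id()
    mailbox[mid] = body
    return mid


first = store("Meeting at noon")
second = store("Meeting at noon")

# Equality of identifiers tells us whether two messages are the same message...
same_message = first == second  # False: identical text, different messages
# ...and lookup works, but only within this mailbox.
body = mailbox[first]
```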
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
The naming problem: if you put information in a name,
it decreases its longevity; if you don't, you can't
dereference it to a resource.
</td>
</tr>
</tbody>
</table>
<h3>
<a name="social" id="social">Naming: A social and contractual
Issue</a>
</h3>
<p>
Many, many solutions to the naming problem have been
attempted and successfully deployed in different
circumstances. At one end of the scale, it would in fact be
possible, using a huge network of hash tables around the
world, to keep a hash index of all randomly generated unique
names. The problem with this idea is that there would have to
be one single funding model and one homogeneous quality of
service for all names. There would be no way to pay more for
a more persistent name.
</p>
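<p>
A drastically scaled-down version of such a "huge network of hash tables" can be sketched in Python. The node names are hypothetical, and the placement rule is plain modulo hashing over SHA-256 rather than any particular distributed hash table design. Because every name is placed and stored by the same rule, the sketch also exhibits the weakness noted above: there is nowhere to express that one name deserves more persistence than another.
</p>

```python
import hashlib
import secrets

# Hypothetical index servers sharing one global table of names.
NODES = ["node-a", "node-b", "node-c", "node-d"]
index = {node: {} for node in NODES}


def random_name():
    """A name with no embedded information: 128 random bits in hex."""
    return secrets.token_hex(16)


def responsible_node(name):
    """Simple modulo placement on a SHA-256 hash (not consistent hashing)."""
    digest = hashlib.sha256(name.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def register(name, location):
    index[responsible_node(name)][name] = location


def lookup(name):
    # Every name gets the same treatment: there is no way to ask for,
    # or pay for, a more persistent binding.
    return index[responsible_node(name)].get(name)


doc = random_name()
register(doc, "http://www.example.org/some/document")
```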
<p>
At the other end of the scale, hierarchical systems such as
the domain name system and the X.500 directory have been
implemented. Suppose one wants to use a name which can be
dereferenced, and therefore must put some information in it.
That information will lead us to some authority or some root
for dereferencing the name. How can we maintain the lifetime
of that name as something which can be dereferenced? The only
way is to have a contract with all the agencies involved in
supporting the systems which dereference that name,
committing them to continue operating at a certain quality
of service for a certain period of time.
</p>
<p>
Suppose the Foo Alumni Association ran a URL service in which
a special name such as
"http://alumni.foo.edu/1998/romeo/202-aab" would be available
to any graduate paying their dues, and maintained
indefinitely (perpetual care) on receipt of a suitable
endowment.
</p>
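<p>
Such an alumni service can be pictured as a persistent redirect table. In this Python sketch the <code>Binding</code> record, the rule that a binding lapses when dues stop unless it is endowed, and the target URLs are all hypothetical illustrations of the arrangement described above; the HTTP status codes 302 (Found) and 404 (Not Found) are standard. The persistent URL stays fixed while its target is updated whenever the document moves.
</p>

```python
from dataclasses import dataclass


@dataclass
class Binding:
    target: str            # where the document currently lives
    paid_through: int      # last year covered by dues
    endowed: bool = False  # "perpetual care": never lapses


# Hypothetical registry kept by the alumni service, keyed by URL path.
registry = {
    "/1998/romeo/202-aab": Binding(
        "http://pegasus.cs.foo.edu/disk1/students/romeo/cool/latest/readthis.html",
        paid_through=2001,
    ),
}


def resolve(path, year):
    """Return an (HTTP status, redirect target) pair for a request made in `year`."""
    binding = registry.get(path)
    if binding is None or (not binding.endowed and year > binding.paid_through):
        return 404, None
    return 302, binding.target


def move(path, new_target):
    """The document moves; the persistent URL does not."""
    registry[path].target = new_target
```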
<p>
Of course, as organizations dissolve and mutate, there is
nothing to stop one organization from taking over the support
of another's archives. For this purpose, it would be very
useful to have a syntax for putting a date into a domain
name. This would allow a system to find an archive server.
Imagine that, failing to find "info.cern.ch", one could
search back and find an entry "info.cern.ch.1994" which
pointed to www.w3.org as a current server holding archive
information for info.cern.ch as it was in 1994, with, of
course, pointers to newer versions of the documents.
</p>
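<p>
The proposed dated names can be sketched as a fallback search. Note that the "name.YEAR" form is the speculative syntax suggested above, not something the domain name system supports; in this Python sketch two small tables of live names and archive pointers are hypothetical stand-ins for it, and the search simply walks backwards one year at a time.
</p>

```python
# Hypothetical data: names that still resolve, and dated archive entries that
# point at the server now holding the archive.
LIVE = {"www.w3.org"}
ARCHIVE = {"info.cern.ch.1994": "www.w3.org"}


def find_server(name, as_of, earliest=1990):
    """Return a server to ask for `name`, trying the live name first and then
    dated archive entries from `as_of` back to `earliest`."""
    if name in LIVE:
        return name
    for year in range(as_of, earliest - 1, -1):
        holder = ARCHIVE.get(f"{name}.{year}")
        if holder is not None:
            return holder
    return None


print(find_server("info.cern.ch", 1997))  # www.w3.org
```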
<h3>
<a name="QoS" id="QoS">Quality of Service</a>
</h3>
<p>
Some "http:" URLs look more sensible than others, but from
looking at a URL it is not immediately evident whether great
pains are being taken to make the name very persistent. We
have just discussed a wide range of reasons why names can
change, and the social and contractual arrangements can be
quite involved, so it is clearly difficult to define a simple
quality of service for naming. However, defining some
well-known quality of service levels would be a very useful
task, one ideally suited to a group of technologists,
librarians or archivists.
</p>
<p>
In any event, for identifiers in the http space and
many others, it would be useful to be able to assert what the
quality of service is. This is information about a URI and a
resource. Like the <a href=
"Generic.html#Dimensions">information about generic URIs</a>,
it is about the sort of identity between the URI and the
resource.
</p>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
Metadata should be used to express the quality of
service for the binding between a URI and a resource.
</td>
</tr>
</tbody>
</table>
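<p>
As a sketch of what such metadata might look like, the Python record below attaches a persistence assertion to a URI-resource binding. The levels, field names and example values are hypothetical: the text only proposes that well-known quality of service levels be defined, so these merely illustrate the shape of the information a client might check.
</p>

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Persistence(IntEnum):
    """Hypothetical, ordered quality-of-service levels for a name binding."""
    NONE = 0           # may change at any time
    TEMPORARY = 1      # e.g. for the life of a project
    INSTITUTIONAL = 2  # as long as the organization exists
    PERPETUAL = 3      # backed by an endowment or archive


@dataclass(frozen=True)
class BindingQoS:
    uri: str
    level: Persistence
    guarantor: str                     # who is contractually responsible
    review_year: Optional[int] = None  # when the commitment is to be renewed


def meets(statement, required):
    """True if the asserted level is at least the level a client requires."""
    return statement.level >= required


student_page = BindingQoS(
    "http://pegasus.cs.foo.edu/disk1/students/romeo/cool/latest/readthis.html",
    Persistence.NONE,
    guarantor="romeo",
)
alumni_page = BindingQoS(
    "http://alumni.foo.edu/1998/romeo/202-aab",
    Persistence.PERPETUAL,
    guarantor="Foo Alumni Association",
)
```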
<hr />
<p>
<a href="Metadata.html">Next: Metadata architecture</a>
</p>
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
</body>
</html>