<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<title>
Universal Resource Identifiers -- Axioms of Web architecture
</title>
<link href="di.css" rel="stylesheet" type="text/css" />
<meta http-equiv="Content-Type" content=
"text/html; charset=us-ascii" />
</head>
<body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">
<address>
Tim Berners-Lee
<p>
Date: January 1998
</p>
<p>
Status: personal view. Editing status: Spellchecked.
</p>
</address>
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
<h3>
Axioms of Web Architecture: 0
</h3>
<ul>
<li>
<a href="Model.html#Model">The Web model</a>
</li>
<li>
<a href="Model.html#Resource">Resources</a>
</li>
<li>
<a href="Model.html#Fragement">Fragment IDs</a>
</li>
<li>
<a href="Model.html">Document sets and relative
addressing</a>
</li>
<li>...
</li>
</ul>
<hr />
<h1>
<a name="Model" id="Model">The Web Model</a>
</h1>
<p>
The web is a very general concept -- one universal space of
information. The concepts it requires such as identifiers and
information resources (documents) are as general and abstract
as possible. However, there have been some design decisions
made which define some interfaces, and effectively define
modules or agents which are independent. These agents are
independent in many ways:
</p>
<ul>
<li>There is knowledge they have individually but do not
share
</li>
<li>There is knowledge their designers had individually but
did not share
</li>
</ul>
<p>
This is basic modularity. The interfaces are defined by the
data formats and protocols, and the important features to
understand about the design are the ones I have ranted about in
the linked articles in this series. This modularity, the ability
for different parts of the system to evolve separately, shows up
when different specs are independent, such that you could change
one without having to change the other.
</p>
<h2>
<a name="Resource" id="Resource">The Information Resource</a>
</h2>
<p>
(Formerly, <a href="#Resource1">Resource</a>)
</p>
<p>
This is the current term for a certain unit of information in
the Web. In many cases on the current Web, thinking
"document" will do. It is something which conveys
information. The Web model is that information in the
information space is in the abstract chunked into addressable
things known as resources.
</p>
<p>
In the technical architecture, resources have identifiers,
Universal Resource Identifiers, and the properties of these
identifiers are elaborated later. In fact the concept of a
unit of information is central, not only in the technical
architecture, but in society's concepts of information, as a
document is not only the unit for reference, retrieval and
presentation (typically), but also the unit of ownership,
license to use, payment, confidentiality, endorsement, etc.
So though technically we can derive such things as compound
documents, generic documents, and resources which look like
anything but the typical notion of a "document", we have to
be able to support these social aspects of information at the
same time, so we can't mess with it too much.
</p>
<h2>
<a name="Fragement" id="Fragement">Fragment Id and "#"</a>
</h2>
<p>
In the hypertext architecture, when making a reference, such
as a hypertext link, we don't just refer to an information
resource. Well, we can, but we can also refer to a particular
part of or view of a resource. The string which, within the
document, defines the other end of the link has two parts. It
has the identifier of the document as a whole, and then
optionally it has a hash sign "#" and a string representing
the view of the object required. This suffix is called
a fragment identifier, even though it does not
necessarily represent a fragment of the document: it could
represent how the document should be viewed. The fragment
identifier only has relevance in the context of the web page
in question. This has an implication for how the software is
built. For example, an "access" module can be given just the
bit of the URI without the fragment identifier. It gets the
information, and creates a software object for the hypertext
page. That object is passed the fragment identifier.
</p>
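<p>
The split described above can be sketched in a few lines. This is
only an illustration, not part of any specification: the function
name <code>split_reference</code> and the example.org URI are made
up for the purpose, and the work is done by <code>urldefrag</code>
from Python's standard <code>urllib.parse</code> module.
</p>
<pre>
```python
from urllib.parse import urldefrag

def split_reference(uri):
    """Split a URI reference into the part used for access and the fragment ID."""
    # urldefrag returns a (url, fragment) pair; the "#" itself is dropped.
    access_uri, fragment_id = urldefrag(uri)
    return access_uri, fragment_id

# Hypothetical URI, used only for illustration.
access_uri, fragment_id = split_reference("http://example.org/Model.html#Resource")
print(access_uri)   # http://example.org/Model.html
print(fragment_id)  # Resource
```
</pre>
<p>
Only the first value would go to the access module; the second is
kept aside for the object that the access produces.
</p>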
<p>
<img src="ParseHash.png" width="100%" alt=
"The URI is split off at the hash into a fragment ID and the rest"
border="0" />
</p>
<p>
In fact, analyzing the system a little more, the access
function can be broken down further: the underlying access
creates the object by passing two things to some kind of
object creator ("factory"): a data stream and a MIME type.
</p>
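<p>
A minimal sketch of such a factory follows. Everything in it is
hypothetical: the class <code>PlainText</code>, the registry
<code>FACTORY</code>, the stand-in <code>fake_access</code>, and in
particular the assumption that a fragment ID for this type is a
1-based line number. It is meant only to show which module sees
which piece of information.
</p>
<pre>
```python
import io

class PlainText:
    """Hypothetical presentation object for text/plain."""

    def __init__(self, stream):
        self.lines = stream.read().decode("ascii").splitlines()

    def view(self, fragment_id):
        # Assumption for this sketch: the fragment ID is a 1-based line number.
        # An empty fragment ID means the whole document.
        if fragment_id == "":
            return "\n".join(self.lines)
        return self.lines[int(fragment_id) - 1]

# The "factory": a registry from MIME type to object constructor.
FACTORY = {"text/plain": PlainText}

def create_object(stream, mime_type):
    """Build a presentation object from a data stream and a MIME type."""
    return FACTORY[mime_type](stream)

def fake_access(access_uri):
    """Stand-in for retrieval; it is never given the fragment ID."""
    assert "#" not in access_uri
    return io.BytesIO(b"first line\nsecond line\n"), "text/plain"

stream, mime_type = fake_access("http://example.org/notes.txt")
page = create_object(stream, mime_type)
print(page.view("2"))  # second line
```
</pre>
<p>
Note that <code>PlainText</code> is never told the URI, and
<code>fake_access</code> is never told the fragment ID.
</p>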
<h3>
Generally
</h3>
<p>
Hypertext is a specific application, but this principle works
for other applications on the Web. In fact, when we discuss
<a href="Webize">webizing</a> an application, we take some
computer language, and we take what were document-global
things, say global variables in a programming language, and
make them truly global by appending the URI of the document
and "#".
</p>
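<p>
As a rough illustration of that step, the following sketch turns a
list of document-local names into global identifiers. The function
name <code>webize</code>, the example.org URI and the variable names
are all invented for the example.
</p>
<pre>
```python
def webize(document_uri, local_names):
    """Map each document-local name to a global identifier: URI + "#" + name."""
    return {name: document_uri + "#" + name for name in local_names}

# Hypothetical document URI and local names.
global_names = webize("http://example.org/config", ["timeout", "retries"])
print(global_names["timeout"])  # http://example.org/config#timeout
```
</pre>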
<p>
Clearly, in different applications the fragment identifier
will have a completely different function. The independence
here means that new applications (such as the Semantic Web)
can be built, just like the hypertext web, just by introducing
new types of document.
</p>
<h2>
Independence
</h2>
<p>
The model of how the web works is that there are two separate
functions. The part (blue in the picture) which
accesses the document deals with its identifier, but does not
know what view will be required. It creates some
software object which represents and presents the resource.
That object does not need to know how it was created
(necessarily), and so does not need to know the URI it was
identified by. However, it does know how to interpret the
Fragment ID.
</p>
<p>
So we have two axioms:
</p>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
The access machinery does not need to look at the
fragment ID.
</td>
<td></td>
</tr>
</tbody>
</table>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
The presentation object does not need to know the URI
of the resource.
</td>
</tr>
</tbody>
</table>
<p>
The equivalent axioms when we are talking about
specifications amount to:
</p>
<table border="1" cellspacing="5" cellpadding="5">
<tbody>
<tr>
<td>
The specifications for access protocols are independent
of the specifications for fragment identifiers.
</td>
</tr>
</tbody>
</table>
<h3>
Why?
</h3>
<p>
For one thing, consider the special case of a link within a
document. In this case, the link <b>only</b> specifies
a fragment identifier. The object can follow the link
itself. It doesn't have to consult the access code in
order to figure out where the link goes to.
Because the "#" syntax is universal to all access
methods, the object can process the link internally.
For a static HTML file, for example, this means that
you can write an HTML file with internal links without
worrying or knowing about exactly what URIs the file will
get. It means you don't have to alter the file if you
choose to serve it in some new name or address space. If
the "#" syntax were not a universal specification for the web,
this would break: you couldn't do it. As Jim Gettys points
out, as the era of digitally signed documents comes upon us,
changing a signed document will break the signature on it. So
allowing one to make a self-consistent document with internal
links in a way independent of the namespace is even more
essential.
</p>
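<p>
The internal-link case might look like this in code. The class
<code>Page</code>, its anchor table and the returned tuples are
assumptions made for the sketch; the point is only that the object
handles a link beginning with "#" without knowing its own URI.
</p>
<pre>
```python
class Page:
    """Hypothetical presentation object; it never learns its own URI."""

    def __init__(self, anchors):
        # anchors: mapping from fragment ID to a position in the page
        self.anchors = anchors

    def follow(self, link):
        if link.startswith("#"):
            # Internal link: resolved here, with no help from the access code.
            return ("scroll", self.anchors[link[1:]])
        # Anything else is handed back to the application unchanged.
        return ("fetch", link)

page = Page({"Model": 0, "Resource": 40})
print(page.follow("#Resource"))      # ('scroll', 40)
print(page.follow("Fragment.html"))  # ('fetch', 'Fragment.html')
```
</pre>
<p>
Because nothing in <code>Page</code> depends on where the file was
served from, the same file works unchanged under any name.
</p>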
<h3>
Why else?
</h3>
<p>
This independence is very important for the evolution of the
Web. It means that people can go off and design all
kinds of new systems for naming, addressing and accessing
documents, without having to worry about what sort of
documents will be moved. It means that people can go
off and make new media types (MIME types), each of which can
have different concepts for views and fragments, without
having to talk to the people developing the access
technology. This has already (1998) proved incredibly
enabling to the community, as HTTP has advanced in parallel
with many other ways of accessing data, and the number of
exciting media types has grown very rapidly, and will be the
key to many new revolutions built on top of the basic Web
idea.
</p>
<p>
If you look at the diagram you will notice how the fragment
IDs are generated by and understood by just the one module.
You see how, when designing a new MIME type, one is
quite free to be creative in making new and powerful forms of
fragment ID, knowing that no other specifications will refer
to them, and nothing else will break.
</p>
<h2>
Document sets and relative addressing
</h2>
<p>
Now let us look at what happens when we follow a link.
For example, say a hypertext page is clicked on.
The page has a representation of the end point of the
link. It hands it to the application. In fact,
often, there are links between pages whose URIs are very
similar and only differ in the right hand part. This
isn't true of all name spaces: for example, when making links
between news articles identified by their unique news ID
(news:foo), you have to specify the whole thing. However, if
you restrict publication of a set of documents to a
hierarchical name or address space, then you can arrange for
documents which are closely related and have many links to be in
the same part of the tree.
</p>
<p>
In this case, the links between these documents are "relative
URIs".
</p>
<p>
What happens then is that the relative URI, which only has
the locally different part of the URI in it, is handed back
to what in the diagram I have called the "application", to be
turned into an absolute URI by being combined with the
absolute URI of the resource, which the application has
remembered.
</p>
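<p>
A sketch of that step, under stated assumptions: the class
<code>Application</code>, the example.org URIs and the anchor name
"Persistence" are invented, while <code>urljoin</code> and
<code>urldefrag</code> are the standard Python functions for
resolving a relative reference and splitting off a fragment.
</p>
<pre>
```python
from urllib.parse import urljoin, urldefrag

class Application:
    """Hypothetical application: remembers the base URI on behalf of the page."""

    def __init__(self, absolute_uri):
        self.base, _ = urldefrag(absolute_uri)

    def resolve(self, link):
        # Combine the remembered base with the relative URI, then split off
        # the fragment ID so that it can be passed through untouched.
        absolute = urljoin(self.base, link)
        return urldefrag(absolute)

app = Application("http://example.org/DesignIssues/Model.html")
access_uri, fragment_id = app.resolve("Fragment.html#Persistence")
print(access_uri)   # http://example.org/DesignIssues/Fragment.html
print(fragment_id)  # Persistence
```
</pre>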
<p>
Note that the application is aware of the absolute URI, but
the resource object still does not have to be.
</p>
<p>
Note that the fragment id is still circulated around a loop
between the object (green) which understands it and the
application (yellow) which handles it transparently but does
not understand or change it.
</p>
<p>
Now there was a design decision here. The application could
have passed to the access module both the relative URI and
the absolute URI. Then, different namespaces would have been
able to have different algorithms for resolving a base URI
and a relative URI into a new absolute URI. But the decision
was made that the relative address format should be common
across all name spaces.
</p>
<p>
<img src="Parse2.png" width="100%" alt=
"The URI is split off at the hash into a fragment ID and the rest"
border="0" />
</p>
<h3>
Why?
</h3>
<p>
Just as we considered internal links above, now consider
relative links between a bunch of documents, like the
sections of a book, which are close in the tree. In
practice, such document sets are moved from place to place,
from file systems into HTTP space or FTP space, and because
the relative address rules are universal, the documents do
not have to be modified every time they are moved. (Yes, if
you move half the set to one place and half to another, you
have to fix links.) This is happening all the time.
People are creating and programs are generating
hypertext with relative links without knowing or caring what
absolute URI will be used to refer to the material.
</p>
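<p>
To make this concrete, here is a small sketch in which the same
relative link is resolved after a document set has been moved
between three spaces. The locations are hypothetical, and
<code>urljoin</code> from Python's standard library stands in for
the common relative-address rules.
</p>
<pre>
```python
from urllib.parse import urljoin

relative_link = "ch2.html"  # link stored inside ch1.html, never edited

# Hypothetical locations of the same document set in three spaces.
locations = [
    "file:///home/me/book/ch1.html",
    "http://example.org/book/ch1.html",
    "ftp://ftp.example.org/pub/book/ch1.html",
]

# The same rule is applied whatever the scheme of the base URI.
resolved = [urljoin(base, relative_link) for base in locations]
for uri in resolved:
    print(uri)
```
</pre>
<p>
In each case the link lands on the sibling chapter next to wherever
the first chapter happens to live.
</p>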
<h2>
The access scheme
</h2>
<p>
<img src="Parse3.png" width="100%" alt=
"The URI is split off at the hash into a fragment ID and the rest"
border="0" />
</p>
<p>
The so-called "access scheme" is the first part of the URI.
As we have seen above, you don't have to know anything about
it to parse relative URIs or to process the fragment
identifier of a URI. The knowledge of particular schemes is
limited to the "access" function (blue in the above diagram).
</p>
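<p>
One way to picture this confinement is a dispatch table keyed by
scheme, as in the sketch below. The handler functions and the table
<code>ACCESS_METHODS</code> are hypothetical and do no real network
access; <code>urlsplit</code> is the standard Python function for
pulling the scheme out of a URI.
</p>
<pre>
```python
from urllib.parse import urlsplit

def access_http(uri):
    return "would fetch %s over HTTP" % uri

def access_ftp(uri):
    return "would fetch %s over FTP" % uri

# Only this table knows about individual schemes.
ACCESS_METHODS = {"http": access_http, "ftp": access_ftp}

def access(uri):
    """Pick the access method from the scheme; nothing else inspects it."""
    scheme = urlsplit(uri).scheme
    if scheme not in ACCESS_METHODS:
        raise ValueError("no access method for scheme %r" % scheme)
    return ACCESS_METHODS[scheme](uri)

print(access("ftp://ftp.example.org/pub/readme.txt"))
```
</pre>
<p>
Supporting a new scheme means adding one entry to the table; the
code which resolves relative URIs or interprets fragment IDs is not
touched.
</p>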
<p>
The scheme is a very important flexibility point, and should
not be abused. Anyone dereferencing a URI must have a
knowledge of the scheme it uses.
</p>
<p>
The access scheme defines a huge part of URI space. The
scheme defines a subspace with particular properties
</p>
<p>
The access scheme is <i>by definition</i> the highest point
of flexibility. What does that mean? It means that if the
whole Web develops problems which we cannot solve within the
existing protocols, or if new spaces are designed which
really can't be accessed through or mapped into existing
spaces, then we can create a new space. We have faith that we
will be able to use this flexibility point in the future,
because it worked successfully for integrating the older
spaces such as Gopher and FTP spaces into the Web.
</p>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
If you have ported a concept between environments in
the past, then there is a better hope that you can in
the future.
</td>
</tr>
</tbody>
</table>
<h3>
The danger of too many access schemes
</h3>
<p>
However, we do not do this lightly. When we introduce a new
space, it may have very different properties and we expect
that the deployment of new software will be needed to allow
access to it. Some spaces may be gatewayable into HTTP space,
and this will often provide a transition path. This is why
early browsers allowed one to declare in a configuration file
what gateways to use for what new spaces.
</p>
<p>
If we use this extension point frivolously, ironically, it
will cease to work. Suppose very many schemes are introduced.
The access scheme space itself becomes a namespace with all
the problems which current namespaces such as DNS are trying
to solve, but which are very hard problems:
</p>
<ul>
<li>Clashes in the namespace would destroy interoperability;
</li>
<li>Ownership of the space becomes commercially valuable;
</li>
<li>Democratic and fair management becomes essential and
difficult;
</li>
</ul>
<p>
Worse, though, technology will be needed to automatically
dereference the schemes themselves and download code to
handle them. Something like DNS will be needed. The top level
namespace then becomes in fact DNS, or something like it.
This, however, begs the question. What happens if later DNS
needs to be replaced? There is no top-level extension switch
left. The world is stuck with whatever form of access-scheme
name service exists.
</p>
<p>
Therefore, I conclude that access schemes should not be open
to trivial extension, and that the access scheme should only
be extended by the introduction of new standards with full
open review by the entire community.
</p>
<h3>
Alternatives to new schemes
</h3>
<p>
Whereas some schemes (like "data:") are clearly neat and new
and orthogonal to HTTP, many schemes could in fact be
integrated into HTTP, using HTTP extension mechanisms.
</p>
<p>
In fact, if HTTP is to be taken as a general computing
protocol, then use of an <a href="Extensible.html">extensible
language system</a> for the HTTP request message would allow
a huge amount of extension, covering protocols with different
functionality (exporting different interfaces).
</p>
<h3>
Evolving scheme spaces
</h3>
<p>
When considering the evolution of a space, it is important to
remember that primarily the access scheme refers to a part of
the URI space, and secondarily it refers to a protocol.
Therefore, one can in fact change the protocols used to
access resources within a scheme's namespace, without
changing the space. For example, a new DNS protocol could be
introduced which over time would replace the current one,
without changing the DNS space. This would effectively
redefine the HTTP and FTP protocols, but would not harm the
namespaces. When touch-tone dialing was introduced, the
telephone numbering system remained the same. So an indexing
system could be introduced which, when deployed, would allow
http:// space objects to be found with greater reliability or
speed than the current protocols, while maintaining the HTTP
space as being the concatenation of a DNS name and an opaque
string.
</p>
<hr />
<h2>
Footnote
</h2>
<h4>
<a name="Resource1" id="Resource1">Resource</a>
</h4>
<p>
The word "document" in the original "Universal Document
Identifier" in the first web spec was changed to "Resource"
in the IETF discussions, because (a) the word "document"
didn't seem to cover all kinds of information resources such
as movies and sounds, and (b) actually URIs exist for
communication endpoints such as mailboxes (mailto:) and login
ports (telnet:). "Resource" was, though, later used by RDF as
a term for anything - the top class which is the superclass
of all classes. This stemmed from RDF's initial use as a
language for describing information resources on the Web,
although RDF was designed to be used to describe anything as
a general knowledge representation system. The term
"Information Resource" was adopted by the TAG for the Web
Architecture document. When people, including the author in
the article above, refer to an information resource, they
often
</p>
<h2>
Related material elsewhere in these notes
</h2>
<p>
<i>Content/Version negotiation and Fragment ID persistence:
warnings and awareness.</i> See <a href=
"Fragment.html">Fragment Identifiers</a>
</p>
<p>
<i> If you negotiate between MIME types which have
different fragment ID representations, you run a risk and
should warn the client.</i>
</p>
<p>
To be added:
</p>
<p>
<i>Level breaking with care: optimizing in HTTPNG etc</i>
</p>
<hr />
<p>
<a href="Overview.html">Up to Design Issues</a>, On to URIs
</p>
</body>
</html>