<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<title>
Universal Resource Identifiers -- Axioms of Web architecture
</title>
<link href="di.css" rel="stylesheet" type="text/css" />
<meta http-equiv="Content-Type" content=
"text/html; charset=us-ascii" />
</head>
<body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">
<address>
Tim Berners-Lee
<p>
Date: December 19, 1996
</p>
<p>
Status: personal view. Editing status: Italic text is
rough. Requires a complete edit and possibly massaging, but
content is basically there. Words such as "axiom" and
"theorem" are used with gay abandon and the reverse of
rigour here.
</p>
</address>
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
<h3>
Universal Resource Identifiers -- Axioms of Web Architecture
</h3>
<ul>
<li>
<a href="#uri">Universal Resource Identifiers</a>
<ul>
<li>
<a href="#Universality">Universality</a>
</li>
<li>
<a href="#unique">Global uniqueness</a>
</li>
<li>
<a href="#same">Sameness</a>
</li>
<li>
<a href="#identity">Identity</a>
</li>
<li>
<a href="#canonicalization">Canonicalization - when is
a URI the same URI?</a>
</li>
<li>
<a href="#abuse">Identity abuse</a>
</li>
<li>
<a href="#nonunique">Not a unique space, just
universal</a>
</li>
</ul>
</li>
<li>
<a href="#state">Identity, State and GET</a>
<ul>
<li>
<a href="#opaque">Opacity</a>
</li>
<li>
<a href="#Query">Query strings</a>
</li>
</ul>
</li>
<li>
<a href="#relative">Relative URIs</a>
<ul>
<li>
<a href="#matrix">Matrix spaces</a>
</li>
</ul>
</li>
<li>
<a href="Axioms.html#Properties">The properties of
different URI schemes</a>
</li>
</ul>
<hr />
<h1>
Universal Resource Identifiers
</h1>
<p>
The operation of the World Wide Web, and its interoperability
between platforms of differing hardware and software
manufacturers, depend on the specifications of protocols such
as HTTP, data formats such as HTML, and other syntaxes such
as the URL or, more generally, URI specifications. Behind
these specifications lie some important rules of behavior
which determine the foundation of the properties of the Web.
These are rules and principles upon which new designs of
programs and the behavior of people must rely. And it is that
reliance which makes the Web both an information space which
works now, and the foundation for future applications,
protocols, and extensions. The more essential of these I
refer to loosely as axioms, and the most basic of these have
to do with URIs.<br />
</p>
<p>
The aim of this article is to summarize in one place the
axioms of Web architecture: those invariant aspects of Web
design which are implied or stated in various specifications
or in some cases simply part of the folk law of how the Web
ought to be used. Especially for these latter cases, this
article is designed to tie together the Web community in a
common understanding of how we can progress, extend, and
evolve the Web protocols. <i>Terms such as "axiom" and
"theorem" are used with gay abandon rather than precision, as
this is not a mathematical treatise.</i><br />
</p>
<h2>
<a name="uri" id="uri">Universal Resource
Identifiers</a><br />
</h2>
<p>
The Web is a universal information space. It is a space in
the sense that things in it have an address. The "addresses",
"names", or as we call them here identifiers, are the subject
of this article. They are called <b>Universal Resource
Identifiers</b> (URIs).
</p>
<p>
An information object is "on the web" if it has a URI.
Objects which have URIs are sometimes known as "First
Class Objects" (FCOs). The Web works best when any
information object of value and identity is a first class
object. If something does not have a URI, you can't
refer to it, and the power of the Web is the less for that.
</p>
<p>
By <em>Universal</em> I mean that the web is declared to be
able to contain in principle every bit of information
accessible by networks. It was designed to be able to include
existing information systems such as FTP, and to be simply
extensible in the future to include any new
information system.
</p>
<p>
The URI schemes identify various different types of
information object, which play different roles in the
protocols. Some identify services, connection end points, and
so on, but a fundamental underlying architectural notion is
of information objects - otherwise known as generic
<strong>documents</strong>. These can be
<strong>represented</strong> by strings of bits. An
information object conveys something - it may be art, poetry,
sensor values or mathematical equations.
</p>
<p>
The Semantic Web allows an information object to give
information about anything - real objects, abstract concepts.
In this case, by combining the identifier of a document with
the identifier, within that document, of something it
describes, one forms an identifier for anything. This is done
with "#" and fragment identifiers, discussed later.
</p>
<h4>
<a name="Universality" id="Universality">Axiom 0:
Universality 1</a>
</h4>
<p class="axiom">
Any resource anywhere can be given a URI
</p>
<h4>
<a name="Universality2" id="Universality2">Axiom 0a:
Universality 2</a>
</h4>
<p class="axiom">
Any resource of significance should be given a URI.
</p>
<p>
(What sorts of things can be resources? A very wide variety.
The URI concept itself puts no limits on this. However, URIs
are divided into schemes, such as http: and telnet:, and the
specification of each scheme determines what sort of things
can be resources in that scheme. Schemes are discussed
later.)
</p>
<p>
This means that no information which has any significance and
persistence should be made available in a way that one cannot
refer to it with a URI.
</p>
<p>
In fact, we take care before extending the URIs to include
any old system, because URIs of any form must also be
understood anywhere in the world.
</p>
<p>
When you specify a URI, a Universal Resource Identifier
(often people use the more restricted term "URL", Uniform
Resource Locator), the first axiom is:
</p>
<h4>
<a name="unique" id="unique">Axiom 1: Global scope</a>
</h4>
<p class="axiom">
It doesn't matter to whom or where you specify that URI, it
will have the same meaning.
</p>
<p>
So, this means that there is no scope within which a URI must
be placed for it to hold. All you need say is that something
is "on the Web" and that is enough. Anyone can follow that
hypertext link, anyone can look up that URI. Now, the sorts
of URI that we find typically start "<code>http:</code>"
indicating that this URI points into a space of objects
accessed using the hypertext transfer protocol. But there are
many other sorts of URI, and a key to the Universality is
that this universal space of identifiers, whether you call
them names, addresses or locators, is universal through the
range of pre-existing protocols such as SMTP and NNTP,
through protocols designed for the Web specifically (HTTP),
through, in principle, to new protocols yet to be invented.
So, there is a theorem, if you like, of URIs that:
</p>
<p class="axiom">
Any new space of identifiers or address space can be
represented as a subset of URI space.
</p>
<p>
You can prove this easily because there is no limit to the
length of a URI and any new name system or address system can
be incorporated simply by encoding the value of names or
addresses into an acceptable, printable string and prefacing
that string with a standard prefix for that new scheme. So,
you could replace http:, for example with ISBN: or X500:
depending on the new scheme.<br />
</p>
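<p>
As a rough illustration of this argument, the following Python
sketch encodes a name from some other system into a printable
string and prefaces it with a scheme prefix. The function names
<code>to_uri</code> and <code>from_uri</code>, and the example
names, are assumptions made for illustration only; they are not
taken from any specification.
</p>
<pre>
```python
from urllib.parse import quote, unquote


def to_uri(scheme, name):
    """Map a name from another naming system into URI space."""
    # Percent-encode every character outside the unreserved set so that
    # the result is an acceptable, printable string, then add the prefix.
    return scheme + ":" + quote(name, safe="")


def from_uri(uri):
    """Recover the scheme prefix and the original name."""
    scheme, _, encoded = uri.partition(":")
    return scheme, unquote(encoded)


if __name__ == "__main__":
    print(to_uri("isbn", "0-06-251587-X"))
    print(to_uri("x500", "c=US, o=W3C"))
```
</pre>
<p>
Because the encoding is reversible, nothing of the original name
is lost: the new space sits inside URI space as a subset, and
there is no length limit to get in the way.
</p>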
<p>
There is a second axiom of URIs which is difficult to
characterize exactly but is accepted in some form by everyone
who uses the Web, and that is that:
</p>
<h4>
<a name="same" id="same">Axiom 2a: sameness</a>
</h4>
<p class="axiom">
a URI will repeatably refer to "the same" thing
</p>
<p>
The same identifier string is expected from one day to the
next to point to, in some sense, the same object. That is a
very important axiom, and it leaves open the "in some sense"
behind which is a very complicated discussion of the concept
of identity. When are two things "in some sense" the
same?<br />
</p>
<h4>
<a name="identity" href="Axioms.html#Universality" id=
"identity">Axiom 2b: identity</a>
</h4>
<p>
Axiom 2b of URIs clears up the vagueness of 2a and is that
</p>
<p class="axiom">
the significance of identity for a given URI is determined by
the person who owns the URI, who first determined what it
points to.
</p>
<p>
We do not discuss in detail here the definition of owner
because the mechanism by which a person or agent comes to get
or create or be allocated a new URI varies from scheme to
scheme. But in every scheme in practice there is a way of
making a new URI. In many schemes, the scheme itself implies
or requires some properties of identity. The scheme, if you
like, imposes constraints within which the owner of a URI is
free to define identity.
</p>
<p>
The implication here is that we will need protocols for
exchanging any guarantees of the properties of given URIs:
they are not simply laid down in the specification of the
web. This is in tune with a general philosophical principle
of design (after Bob Scheifler and others):
</p>
<blockquote>
The technology should define <i>mechanisms</i> wherever
possible without defining <i>policy</i>.
</blockquote>
<p>
because we recognize here that many properties of URIs are
social rather than technical in origin.
</p>
<p>
Therefore, you will find pointers in hypertext which point to
documents which never change but you will also find pointers
to documents which change with time. You will find pointers
to documents which are available in more than one format. You
will find pointers to documents which look different
depending on who is asking for them. There are ways to
describe in a machine or human readable way exactly what sort
of repeatability you would expect from a URI, but the
architecture of the Web is that that is something for the
owner of the URI to determine.
</p>
<h4>
<a name="abuse" id="abuse">Identity abuse</a>
</h4>
<p>
<i>All the same, a word of caution is appropriate about the
indiscriminate or deliberately misleading abuse of the
identity of the object referred to by a URI. A web server is
often in a position to know a lot of context about a request.
This can include, for example, the person who is asking and the
document they were reading last, from which they followed the
link. It is possible to use this information to
dramatically change the content of the document referred to.
This undermines the concept of identity and of
reference in general. To do that without making it
clear is misleading both to anyone who quotes the URI
of a page and to anyone who follows the link.</i>
</p>
<p>
<i>Unless it is clearly indicated on the page (or using a
future protocol), to return differing information for the
same URI must be considered a form of deception. It
also of course messes up caches. Note the HTTP 1.1 "Vary"
header allows this indication to be passed.</i>
</p>
<h4>
<a name="canonicalization" id="canonicalization">When is a
URI "the same URI"?</a>
</h4>
<p>
Two URIs are the same if (and only if) they are the same
character for character.
</p>
<p>
Two URIs which are different may in fact be equivalent, in
that they may refer to the same thing, and give the same
result in all operations. In some cases any agent looking at
two URIs can deduce, from knowledge of the various web
standards, that they must be equivalent, in that they must
refer to the same thing. For example, HTTP URIs contain
domain names, and the Domain Name System is case-insensitive.
Therefore, while it is normal practice to use lower case for
domain names, any agent which comes across two URIs which
differ only in the case of the domain name can conclude that
they must refer to the same thing. In another case, a client
agent may use out-of-band information about a web site to
know that its URI paths are case-invariant, or that URIs
ending in "/" and "/index.html" are equivalent. It is bad
engineering practice to make new protocols require such
processing.
</p>
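<p>
The distinction between "the same URI" and "equivalent URIs" can
be sketched in Python as follows. The function names are
hypothetical, and the only scheme knowledge assumed is the one
given above: that the domain name in an HTTP URI is
case-insensitive (treating <code>https</code> the same way is an
extra assumption). Anything the code cannot prove equivalent it
reports as not known to be equivalent.
</p>
<pre>
```python
from urllib.parse import urlsplit


def same_uri(a, b):
    """Two URIs are the same only if they match character for character."""
    return a == b


def _key(parts):
    # The hostname attribute is already lower-cased by urlsplit;
    # every other component is compared exactly as written.
    return (parts.username, parts.password, parts.hostname, parts.port,
            parts.path, parts.query, parts.fragment)


def provably_equivalent(a, b):
    """True when an agent can deduce equivalence from scheme knowledge."""
    if a == b:
        return True
    pa, pb = urlsplit(a), urlsplit(b)
    if pa.scheme != pb.scheme or pa.scheme not in ("http", "https"):
        return False  # no knowledge of this scheme, so no conclusion
    return _key(pa) == _key(pb)
```
</pre>
<p>
Note that a difference in the case of the path is deliberately
not smoothed over: without out-of-band information about the
site, a generic agent has no grounds for that conclusion.
</p>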
<p>
There is a long series of such algorithms. Which ones an
agent can apply depends on what information it has to hand,
and on what knowledge of which protocols has been
programmed into it. New schemes may be defined in the future,
for which different forms of canonicalization can be done.
There is, therefore, <strong>no definitive
canonicalization</strong> algorithm for URIs. Generic URI
handling code should handle URIs as case-sensitive character
strings. It is <strong>not</strong> recommended that, for
example, encryption and signature algorithms attempt to
canonicalize URIs before signature, because of the
arbitrariness of any attempt to define a canonicalization
algorithm.
</p>
<p>
The only canonicalization one could insist upon would be that
defined by the algorithms in the URI specifications. This
includes the generation of an absolute URI from a relative one and the
hex-encoding or decoding of all non-reserved characters.
</p>
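<p>
A minimal Python sketch of those two steps follows. It assumes
that the "non-reserved" characters are letters, digits and the
four marks <code>- . _ ~</code> (the unreserved set of the later
generic URI syntax), and the helper names
<code>decode_unreserved</code> and <code>canonical</code> are
invented for this example.
</p>
<pre>
```python
import re
import string
from urllib.parse import urljoin

# Assumption: the unreserved set of the generic URI syntax.
UNRESERVED = string.ascii_letters + string.digits + "-._~"


def decode_unreserved(uri):
    """Undo hex-encoding only where the encoded character is unreserved."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        # Reserved characters stay encoded, since decoding them
        # could change what the URI means.
        return char if char in UNRESERVED else match.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", replace, uri)


def canonical(base, reference):
    """Make the reference absolute, then decode unreserved characters."""
    return decode_unreserved(urljoin(base, reference))
```
</pre>
<p>
Nothing here consults the server or guesses at site conventions,
which is why this is the only kind of canonicalization one could
insist upon.
</p>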
<h3>
URIs and the Test of Independent Invention
</h3>
<p>
The concept of a web as a "space" is based on these axioms of
design. As a result, the web behaves to a certain extent as a
system with state, and an important part of the work of the
system is the distribution of visible state rather than the
execution of invisible remote operations.
</p>
<h4>
<a name="nonunique" id="nonunique">Axiom 3: non unique</a>
</h4>
<p class="axiom">
URI space does not have to be the only universal space
</p>
<p>
The assertion that the space of URIs is a universal space
sometimes encounters opposition from those who feel there
should not be one universal space. These people need not
oppose the concept, because it is not that of a single universal
space. Indeed, the fact that URIs form a universal space does
not prevent anyone else from forming their own universal
space, which of course by definition would be able to envelop
within it as a subset the universal URI space. Therefore the
web meets the "independent design" test, that if a similar
system had been concurrently and independently invented
elsewhere, in such a way that the arbitrary design decisions
were made differently, when they met later, the two systems
could be made to interoperate.
</p>
<p>
There may be in the world many universal spaces, and there
need not be any particular quarrel about one particular one
having a special status. (Of course, having very many may not
be very useful, and in the World Wide Web, the URI space
plays a special role by being the universal space chosen in
that design.)
</p>
<p>
For example, it would be possible to map all international
telephone numbers into URI space very easily, by inventing a
new URI scheme "phone:" after which came the phone number. It would
in fact also conversely be possible to map URIs into
international phone numbers by allocating a special phone
number not used by anyone else, perhaps a special country
code for URI space, and then converting all URIs into a
decimal representation. In that case, both URIs and phone
numbers would be universal spaces. Identifiers in one space
would consist only of digits, and in the other of
alphanumeric characters. One would be shorter than the other,
but there is no reason why, in principle, the two could not
co-exist, allowing you to dial any Web object from a
telephone as a telephone number, and point to any phone from
a hypertext document.<br />
</p>
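<p>
The two mappings can be sketched in Python. Everything specific
here is an assumption for illustration: the country code
<code>999</code> standing in for "a special country code for URI
space", the choice of three decimal digits per character, and
the function names.
</p>
<pre>
```python
URI_COUNTRY_CODE = "999"  # hypothetical code reserved for URI space


def phone_to_uri(number):
    """Map a telephone number into URI space with a scheme prefix."""
    digits = "".join(ch for ch in number if ch.isdigit())
    return "phone:" + digits


def uri_to_phone(uri):
    """Map a URI into phone-number space as a decimal representation."""
    body = "".join(f"{byte:03d}" for byte in uri.encode("ascii"))
    return "+" + URI_COUNTRY_CODE + body


def phone_back_to_uri(number):
    """Invert uri_to_phone, recovering the original URI."""
    prefix = "+" + URI_COUNTRY_CODE
    if not number.startswith(prefix):
        raise ValueError("number is not in the URI country code")
    body = number[len(prefix):]
    if len(body) % 3 != 0:
        raise ValueError("digit string is not a whole number of characters")
    codes = [int(body[i:i + 3]) for i in range(0, len(body), 3)]
    return bytes(codes).decode("ascii")
```
</pre>
<p>
The decimal form is three times as long as the URI it encodes,
which is the sense in which one identifier is shorter than the
other, but both mappings are reversible.
</p>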
<p>
So, on this last axiom rests not specifically the operation
of the web, but its acceptance as a non-domineering
technology, and therefore our trust in its future
evolvability.
</p>
<h3>
<a name="state" id="state">Identity, State and GET</a>
</h3>
<p>
The fact that the thing referenced by the URI is in some
sense repeatably the same suggests that in a large number of
cases the result of de-referencing the URI will be exactly
the same, especially during a short period of time. This
leads to the possibility of caching information. It leads to
the whole concept of the Web as an information space rather
than a computing program. It is a very fundamental concept.
Not only do the concepts of navigation around the space
remembering "places" in the Web and other humanly visible
aspects of the nature of the Web depend on it, but also many
technical architectural properties depend on it. For example,
the implication is that the GET operation in HTTP is an
operation which is expected to repeatably return the same
result. As a result of that, anyone may know that under
certain circumstances they may, instead of repeating an
HTTP operation, use the result of a previous operation. The
operation is "idempotent". This, in turn, allows software to
use previously fetched copies of documents and it requires
that the HTTP GET operation should have no <em>side
effects</em>. For example, GET should never be used to
initiate another operation which will change state. In
general (see the HTTP 1.1 spec) the notion of side-effects is
that of any significant communication between the parties. A
user can never be held accountable for anything as a result of
doing a GET. The server may for example log the number of
requests, but the client user cannot be held responsible for
that: it does not constitute communication between the two
parties.
</p>
<p>
It is wrong to represent the user doing a GET as committing
to something or putting themselves on a mailing list, or doing
any operation which affects the state of the Web or the state
of the user's relationship with the information provider or
the server. To ignore this rule can be to introduce a serious
security problem in a website.
<p>
So, from this principle, we have a principle of the HTTP
protocol that:
</p>
<h4>
<a name="get" id="get">Axiom</a>
</h4>
<p style="axiom" class="axiom">
In HTTP, GET must not have side effects.
</p>
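<p>
The practical consequence for software can be sketched as a
small cache in Python. The class name <code>GetCache</code> and
the <code>fetch(method, uri)</code> callable are hypothetical,
and the sketch ignores expiry and headers such as "Vary" that a
real cache would have to honour.
</p>
<pre>
```python
class GetCache:
    """Reuse earlier GET results; pass every other method through."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable taking (method, uri)
        self._store = {}      # uri mapped to a previously fetched result

    def request(self, method, uri):
        if method != "GET":
            # Other methods may change state, so they are never replayed
            # from the cache.
            return self._fetch(method, uri)
        if uri not in self._store:
            self._store[uri] = self._fetch(method, uri)
        # Safe only because GET is assumed to have no side effects.
        return self._store[uri]
```
</pre>
<p>
If a server attached a side effect to GET, the second request
above would silently skip it, which is one way of seeing why the
axiom is needed.
</p>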
<p>
The introduction of any other method apart from GET which has
no side effects is also incorrect, because the results of
such an operation effectively form a separate address space,
which violates the universality. A pragmatic symptom would be
that hypertext links would have to contain the method as well
as the URI in order to be able to address the new space, which
people would soon want to do.
</p>
<h4>
<a name="GET2" id="GET2">Axiom</a>
</h4>
<p class="axiom">
In HTTP, anything which does not have side-effects should use
GET
</p>
<p>
This means that for people implementing systems in which
users request information and execute operations using forms,
when the form simply requests information it must result in a
GET operation. Indeed this is very much to be favored over a
POST operation because the result of a GET operation has a
URI and may be linked to or, for example, put into a
bookmark. Using POST for such a request violates the <a href=
"Axioms.html#Universality2">axiom of universality</a> above.
</p>
<p>
However, when the result of a form is to execute an
operation, which changes the Web or a relationship of a user
to anyone else, then the GET operation may not be used and
POST or another method, either through HTTP or mail, must be
used. Only by sticking to this rule can such systems
interoperate with caches and other agents which exploit the
repeatability of HTTP GET and of URI dereferencing in the
future.<br />
</p>
<p>
The axiom above about URIs pointing in principle to
conceptually the same thing has a corollary which says that
URIs do not always have to point to <i>exactly</i> the same
set of bits. This means that URIs can be "generic". See the
<a href="Generic.html">discussion of generic URIs</a>.<br />
</p>
<h3>
<a name="opaque" id="opaque">The Opacity Axiom</a><br />
</h3>
<p>
The concept of an identifier referring to a resource is very
fundamental in the World Wide Web. Identifiers will refer to
resources of all different sorts. Any addressable thing will
have an identifier. There are mechanisms we have just
discussed for extending the spaces of identifiers into name
spaces which have different properties. Different spaces may
address different sorts of objects, and the relationship
between the identifier and the object, such as the uniqueness
of the object and the concept of identity, may vary. A very
important axiom of the Web is that in general:
</p>
<h4>
Axiom: Opacity of URIs
</h4>
<p class="axiom">
The only thing you can use an identifier for is to refer to
an object. When you are not dereferencing, you should not
look at the contents of the URI string to gain other
information.
</p>
<p>
For the bulk of Web use URIs are passed around without anyone
looking at their internal contents, the content of the string
itself. This is known as the <b>opacity</b>. Software should
be made to treat URIs as generally as possible, to allow the
most reuse of existing or future schemes.
</p>
<p>
For example, within an HTTP identifier, even when access is
made to the object, the client machine looks at the first
part of the identifier to determine which server machine to
talk to and from then on the rest of the string is defined to
be opaque to the client. That is, the client does not look
inside it; it cannot deduce any information from the
characters in that identifier. It has been very tempting from
time to time for people to write software in which a client
will look at a string such as ".html" on the end of an
identifier, and come to the conclusion that it might be a
hypertext markup file when dereferenced. But such
breaking of the rule could lead to a broken architecture
in which the generality of URIs is something one can no
longer depend on.
</p>
<p>
Opacity of the URIs opens the door to new URI schemes, and it
opens the door to excitingly different interpretations of
HTTP URI spaces. For example, servers can use the opaque
string to carry all kinds of parameters to spaces with new
topologies.
</p>
<p>
As a result of this axiom, the many pieces of metadata, that is
information about the object that a client might be tempted
to infer from the actual string value of the URI but can't,
have to be made available through the HTTP protocol. That is
the purpose of some of the headers of the HTTP protocol and
is discussed in the next section.
</p>
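<p>
A client that respects opacity might decide how to treat a
response along the following lines. This is only a sketch: the
function names are made up, the headers are assumed to arrive as
a plain dictionary keyed by "Content-Type", and the fallback
type for a missing header is an assumption.
</p>
<pre>
```python
def media_type(headers):
    """Read the media type from the response metadata."""
    value = headers.get("Content-Type", "application/octet-stream")
    # Drop parameters such as the charset and normalise the case.
    return value.split(";", 1)[0].strip().lower()


def is_html(uri, headers):
    """Decide from the headers alone; the URI string is not inspected."""
    return media_type(headers) == "text/html"
```
</pre>
<p>
The <code>uri</code> argument is accepted and then ignored on
purpose: an identifier ending in ".html" and one ending in a
query string are treated alike.
</p>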
<p>
Another reason for keeping the URI opaque concerns
other address spaces: within an HTTP server's
address space, for example, the rest of the URI can be used as a coded
representation of a name in some local space. Typically, when
that is done, when the server serves as a gateway into an
existing space, then it is extremely useful to be able to use
the string in any way consistent with the URI syntax rules to
represent coded names from the other space. The server can
encode, within the URI, complex locations in some legacy
system which is being mapped into an information space for
the first time. So, for example, names which come from names
in some sort of a database might by coincidence end up with
.html with no implication that there is a hypertext markup
language document involved, just that the particular encoding
used happened to produce that string of bits.
</p>
<h4>
<a name="Query" id="Query">Query strings</a>
</h4>
<p>
An important case is the treatment of the question mark in
HTML forms. There is a convention that information returned
from HTML forms is returned by encoding it and appending it
to the URI. The question mark within the URI is used to
separate the basic URI from parameters which are appended to
it to perform an operation. A typical use is for a search,
and the string following the question mark is often known as
a query string.
</p>
<p>
When a query string and fragment identifier are used, the
function evaluated on dereferencing a URL
</p>
<pre>
http://foo/bar?baz#frag
is
select(get("foo", query("bar", "baz")), "frag")
</pre>
<p>
where
</p>
<ul>
<li>query(resource, querystring) is evaluated by the
resource "bar"
</li>
<li>"bar" is opaque to all except the server "foo" ;
</li>
<li>"baz" is a format understood by client and by the
resource "foo/bar";
</li>
<li>get(server, restofuri) is executed by the client engine
which understands "foo" but not "bar"
</li>
<li>select(resource, fragmentid) is evaluated on the client
by the resource's handling code
</li>
</ul>
<p>
Query strings are clearly not opaque to the client. However,
they should be opaque to (for example) proxies.
</p>
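<p>
The division of labour in that expression can be sketched with
Python's standard URL splitter. The function names
<code>decompose</code> and <code>request_target</code> are
assumptions for this example; the point shown is which party
sees which part of the string.
</p>
<pre>
```python
from urllib.parse import urlsplit


def decompose(uri):
    """Split a URI into the parts named in the expression above."""
    parts = urlsplit(uri)
    return {
        "server": parts.netloc,               # understood by the client
        "resource": parts.path.lstrip("/"),   # opaque to all but the server
        "query": parts.query,                 # client and resource agree on it
        "fragment": parts.fragment,           # handled on the client
    }


def request_target(uri):
    """Return (server, rest of URI): what the client's get() works with."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The fragment is left out: select() is applied on the client
    # after the resource has been retrieved.
    return parts.netloc, target
```
</pre>
<p>
For the example URI, the server "foo" is asked for "/bar?baz",
and "frag" never leaves the client.
</p>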
<p>
Apart from searches, other operations are performed, for
example by those filling out HTML forms which are set up to
have an HTTP "GET" action. This is done in situations in
which the results of that operation of the URI are
quasi-static. In other words, the resource referred to by the
complete URI (including the question mark and the query string
after it) follows the axiom of slow change above: the result
of performing the operation is repeatable in some fashion.
</p>
<p>
It is tempting and often done to assume that the result of
such an operation will be more transient than that of a URI
with less or without a query string. To make this assumption
breaks the Opacity rule in general. Not only that, but this
is in many cases a completely wrong assumption. For example,
the query string is sometimes used to indicate parameters
such as a personalized sub-space which is being browsed.
Unfortunately, because the Opacity rule has been broken by
clients and caches which don't cache documents whose URIs
contain question marks, the question mark has sometimes been
deliberately inserted in order to defeat caches. This
creeping use of non-standard and axiom-breaking conventions
could clearly be damaging to other systems which use the
question mark for other reasons for perfectly cacheable
documents.
</p>
<h2>
<a name="relative" id="relative">Hierarchies and Relative
URIs</a>
</h2>
<p>
While discussing the universality of Universal Resource
Identifiers, it is as well to discuss the place of the
Universal Syntax as this has been the source of some
misunderstanding as to the intent and advantages behind this.
The URI Syntax, now famous through its HTTP form, uses slashes
to indicate a hierarchically structured name or address. Apart
from that, the strings between the slashes are opaque. There
is nothing to say that the string between a double slash and
a slash must be in all URI schemes a fully qualified domain
name; there is nothing to relate strings between single
slashes to parts of a unix file name. The reason that the
slashes have been instituted as common universal syntax for a
hierarchical boundary is that hierarchical schemes are common
and that relative naming within a hierarchical space has many
advantages.
</p>
<p>
Relative naming allows small groups of documents which are
located close within a tree to refer to each other without
being aware of their absolute position within any absolute
tree. It turns out that for scalability of the creation of
material on the Web, this is essential. This has been found
both for file names in most modern operating systems and for
HTTP URLs, and one can also reasonably assume that it will be
true for any other hierarchical scheme. Therefore, it is
important that the generic concept of a hierarchical scheme
is kept separate for future use from specific schemes which
involve forms that may become outdated, such as fully qualified
domain names.<br />
</p>
<h4>
An example of using relative URIs
</h4>
<p>
Let us take, for example, the exercise of mapping an
international telephone number onto a URL. International
telephone numbers are hierarchical. For example, the meaning
of and the format of a telephone number depends on the
country, but there is a universal format for a telephone
number in the world which can be understood everywhere. This
format is, in fact, a plus sign indicating that one starts at
the top of the hierarchy: that this is an absolute
international telephone number. It is followed by the country
code, the area code (if any) and the telephone number.
Mapping this onto the URL syntax, the double slash would be
used to indicate that one is starting from the top of the
tree, so the number
</p>
<p>
+1 (617) 253-5708
</p>
<p>
would be written with a double slash instead of the plus and
then slashes at the other hierarchical boundaries:
</p>
<pre>
phone://1/617/253-5708
</pre>
<p>
(The dash here is used for decoration. In practice people
like to break telephone numbers up in various ways for
readability, even though the punctuation has no hierarchical
significance.) Of course, there could be other mappings used,
but let us look at how this particular mapping using the
slashes would be used in relative URIs. Suppose we are in a
context in which that telephone number is the default.
Suppose, for example, that we have declared that number to be the
absolute base telephone number ("base URL") within a
conversation: typically, we are talking to somebody who lives
in the same area code.
</p>
<p>
Using the relative URL parsing rules, we can refer to another
local telephone number simply as, for example, 861-5000 with
no punctuation. This is just what we do in practice. We can
refer to a telephone number within the same country as
/800/123-4567. Although these are not quite the conventions
currently used for telephone numbers, they are just as
compact as the various conventions of putting brackets around
the area code, and would probably be parsed correctly by a
human.
</p>
<p>
To indicate an international number we simply start with a
double slash. For example,
</p>
<pre>
phone://41/22/767-6111
</pre>
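<p>
The local, national and international forms above can be sketched in a
few lines of Python. This is only an illustration, not a definitive
implementation: the helper name <code>resolve_phone</code> is
hypothetical, and it assumes the fixed three-level hierarchy
(country, area, local number) of the example mapping, with the base
number taken from the text.
</p>

```python
BASE = "phone://1/617/253-5708"  # the base number used in the text


def resolve_phone(reference, base=BASE):
    """Resolve a phone reference against an absolute phone: URI.

    Hypothetical helper: assumes the base always has exactly the three
    levels country/area/local used in the example mapping.
    """
    scheme, _, rest = base.partition("://")
    country, area, _local = rest.split("/")

    if reference.startswith(scheme + ":"):
        return reference                                # already absolute
    if reference.startswith("//"):
        return f"{scheme}:{reference}"                  # international: keep the scheme only
    if reference.startswith("/"):
        return f"{scheme}://{country}{reference}"       # national: keep the country code
    return f"{scheme}://{country}/{area}/{reference}"   # local: keep country and area code
```

<p>
With that base, <code>resolve_phone("861-5000")</code> gives
<code>phone://1/617/861-5000</code>, and
<code>resolve_phone("/800/123-4567")</code> gives
<code>phone://1/800/123-4567</code>.
</p>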
<p>
Now, suppose instead we had used another system. Suppose we had
decided that, for consistency, we would simply use the plus
sign and, for example, parentheses around an area code. You
could still use the convention of simply
omitting the area code for a local telephone number, and
you could use a plus sign to indicate an international
telephone number. But if you put <code>phone:</code> in front of
it, then to have it correctly parsed by a URL parser you would
always have to use the full international form. Now, there
may be some who would prefer to always see the full
international form in telephone numbers because telephone
numbers are of fairly limited length. However, the principle
of relative names or local telephone numbers being useful is
established beyond question. There is also perhaps as much
public use of the double slash in URIs as there is of the
plus sign in international telephone numbers. (Within the
United States of America there seem to be relatively few
people who understand the significance of the plus sign or
for that matter know what their country code is!)
</p>
<p>
So, in general, when looking at new naming schemes which may
have a hierarchical nature we should regard the slash and the
double slash as common syntax. It may be that we can
transition to a shorter form in which, for example, a double
slash is assumed after the colon in a fully qualified URL in
order to address the worry that the URL syntax is clumsy when
you include the scheme name prefix.
</p>
<h4>
<a name="myth1" id="myth1">Myth:</a>
</h4>
<p class="axiom">
Myth: "The // must only be used to introduce a fully
qualified domain name."
</p>
<h4>
Grandfathering hierarchies: generalizing the scheme
</h4>
<p>
It is worth noting that the syntax with the double slash can
in fact be extended for use with a triple slash if one wanted
to be able to start at any level in a much more complicated
hierarchical structure. For example, suppose international
telephone numbers were to be extended to cover a planetary
code in the future. Then the planetary code could be attached
to the front of the international code. The triple slash
could introduce the interplanetary code, and the double slash
would introduce the international code. Indeed, this is how
the double slash came to be: when hierarchical naming schemes
such as those in unix file systems were extended to a network
file system on the Apollo Domain, the extra slash was
introduced. Similarly, Microsoft NT networking now uses a
double backslash in exactly the same way.
</p>
<p>
RFC1630 is an informational RFC I wrote about URIs in WWW
because getting consensus on the philosophy of all this in
open forum was going to take a long time at best. It contains
an algorithm for parsing relative URIs which in fact would
parse a relative URI in an environment with any arbitrary
number of consecutive slashes. (The only problem with this
scheme, as with others which use the same delimiter for
beginning and ending strings, is that one cannot represent an
empty string. This is already a problem with the file syntax
when an empty string is used for the host name, resulting in
three consecutive slashes.)
</p>
<p>
To quote RFC1630:
</p>
<blockquote>
If the scheme parts are different, the whole absolute URI
must be given. Otherwise, the scheme is omitted, and:
<p>
If the partial URI starts with a non-zero number of
consecutive slashes, then everything from the context URI
up to (but not including) the first occurrence of exactly
the same number of consecutive slashes which has no greater
number of consecutive slashes anywhere to the right of it
is taken to be the same and so prepended to the partial URL
to form the full URL. Otherwise:
</p>
<p>
The last part of the path of the context URI (anything
following the rightmost slash) is removed, and the given
partial URI appended in its place, and then:
</p>
<p>
Within the result, all occurrences of "xxx/../" or "/." are
recursively removed, where xxx, ".." and "." are complete
path elements.
</p>
</blockquote>
<p>
The algorithm may not be perfect in its handling of "." and
"..", but it applies to any number of slashes.
</p>
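<p>
As a rough sketch, the quoted steps might be written in Python as
follows. It is one reading of the text rather than a reference
implementation. The function names <code>resolve</code> and
<code>remove_dots</code> are hypothetical, query strings and fragments
are ignored, and raising an error when the context has no matching run
of slashes is this sketch's own choice, since the quoted text does not
cover that case.
</p>

```python
import re


def remove_dots(uri):
    """Repeatedly delete "/." and "xxx/../" where each is a complete path element."""
    previous = None
    while uri != previous:
        previous = uri
        # "/." counts only when "." is a whole element (followed by "/" or the end).
        uri = re.sub(r"/\.(?=/|$)", "", uri, count=1)
        # "xxx/../" counts only when xxx is a whole element other than "." or "..".
        uri = re.sub(r"(^|/)(?!\.\.?/)[^/]+/\.\./", r"\1", uri, count=1)
    return uri


def resolve(partial, context):
    """Resolve a partial URI against a context URI, following the RFC 1630 text."""
    # A partial URI that carries a scheme is treated as already absolute.
    if re.match(r"[A-Za-z][A-Za-z0-9+.-]*:", partial):
        return partial

    leading = len(partial) - len(partial.lstrip("/"))
    if leading == 0:
        # Replace whatever follows the rightmost slash of the context.
        return remove_dots(context[: context.rfind("/") + 1] + partial)

    # Every run of consecutive slashes in the context, as (position, length).
    runs = [(m.start(), len(m.group())) for m in re.finditer(r"/+", context)]
    for i, (start, length) in enumerate(runs):
        # First run of exactly this length with no longer run to its right.
        if length == leading and max(n for _, n in runs[i:]) == leading:
            return remove_dots(context[:start] + partial)

    # Not covered by the quoted text; raising is an assumption of this sketch.
    raise ValueError("context has no boundary of %d slashes" % leading)
```

<p>
Taking a made-up planetary code <code>earth</code> as in the
telephone example, resolving <code>//41/22/767-6111</code> against
<code>phone:///earth//1/617/253-5708</code> keeps everything up to the
double slash and gives <code>phone:///earth//41/22/767-6111</code>.
Resolving <code>../800/123-4567</code> against
<code>phone://1/617/253-5708</code> gives
<code>phone://1/800/123-4567</code>.
</p>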
<h3>
<a name="matrix" id="matrix">Matrix spaces and Semicolons</a>
</h3>
<p>
There are a lot of web sites in which documents -- often
virtual documents -- vary along several dimensions. They are
naturally arranged not in a tree but in a matrix. The URI for
a map, for example, might be:
</p>
<pre>
//moremaps.com/map/color;lat=50;long=20;scale=32000
</pre>
<p>
(I had an idea to make a special form of relative URIs for
these. See <a href="MatrixURIs.html">Matrix URIs</a> for the
idea, not a feature of the web as of 2001.)
</p>
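<p>
To make the matrix reading of the map URI above concrete, here is a
small Python sketch which splits the last path segment into a name and
its semicolon-separated parameters. The function name
<code>split_matrix_segment</code> is hypothetical, and the sketch
assumes that the parameters appear only on the final segment and that
each has the form <code>key=value</code>.
</p>

```python
def split_matrix_segment(uri):
    """Split the last path segment into its name and its ;key=value parameters.

    Hypothetical helper: assumes parameters appear only on the final segment.
    """
    head, _, last = uri.rpartition("/")
    name, *params = last.split(";")
    # partition("=") yields (key, "=", value); the slice keeps key and value.
    pairs = dict(p.partition("=")[::2] for p in params if p)
    return head, name, pairs
```

<p>
For the example URI this yields the parent
<code>//moremaps.com/map</code>, the name <code>color</code>, and the
three dimensions <code>lat</code>, <code>long</code> and
<code>scale</code> with their values kept as strings.
</p>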
<h2>
<a name="Properties" id="Properties">The properties of
different URI schemes</a>
</h2>
<p>
As noted above, the concept of a URI itself does not define
the particular identity properties which exist between a URI
and the resource associated with it. The <a href=
"Axioms.html#Universality">axiom above</a> leaves the owner
of the URI to define it. However, different URI schemes are
defined and implemented in different ways, and this itself
can impose restrictions on the mapping.
</p>
<p>
Some of the properties of URI to resource mappings which vary
from space to space were discussed in RFC1630. Some schemes
(such as HTTP) leave answers up to the information publisher
(URI owner).
</p>
<p>
There is a lot of flexibility and growth to be gained by
allowing any sort of URI, not just one from a particular scheme,
in most circumstances. Similarly, one should not make
assumptions about the schemes involved. Which schemes are
used is a parameter of how the technology is applied.
The choice of type of URI in a practical use of a language is an
important flexibility point.
</p>
<table border="1">
<caption>
Comparison of some URI schemes
</caption>
<tbody>
<tr>
<th>
Scheme prefix
</th>
<th>
Identity relationship: what does the URI correspond to?
</th>
<th>
Reuse
</th>
<th>
Persistence
</th>
</tr>
<tr>
<td>
http:
</td>
<td>
Generic document as defined by publisher. Generic
URIs possible with content negotiation
</td>
<td>
defined by publisher
</td>
<td>
defined by publisher
</td>
</tr>
<tr>
<td>
ftp:
</td>
<td>
sequence of bits
</td>
<td>
defined by publisher
</td>
<td>
defined by publisher
</td>
</tr>
<tr>
<td>
uuid:
</td>
<td>
expectation of uniqueness has to be upheld by publisher
</td>
<td>
defined by publisher
</td>
<td>
(no dereference)
</td>
</tr>
<tr>
<td>
sha1:
</td>
<td>
sequence of bits
</td>
<td>
mathematically extremely unlikely
</td>
<td>
(no dereference)
</td>
</tr>
<tr>
<td>
mid:
</td>
<td>
Email message. Should be 1:1 modulo recoding, and
header addition/deletion
</td>
<td>
Can happen after 2 years according to the spec, but
absolutely not recommended
</td>
<td>
(no dereference)
</td>
</tr>
<tr>
<td>
mailto:
</td>
<td>
mailbox as used in email protocols
</td>
<td>
Socially unacceptable
</td>
<td>
(no dereference)
</td>
</tr>
<tr>
<td>
telnet:
</td>
<td>
connection endpoint for interactive login service
</td>
<td>
defined by publisher
</td>
<td>
(no dereference)
</td>
</tr>
</tbody>
</table>
<h3 id="it">
How not to do it
</h3>
<p>
A typical abuse of URIs, breaking this rule, occurs when a
document format provides one URI space for a "name" and one
for a "location".
</p>
<p>
<code>&lt;a href="uri1" urn="foo"&gt;</code>
</p>
<p>
or for example the SGML reference to a "public identifier"
and a "system identifier".
</p>
<p>
The Web way is to have a reference to one URI. If in the same
document you want to include other information, such as
identifiers which are in some ways equivalent, then you embed
that in your document as <a href="Metadata.html">metadata</a>, to be
discussed later.
</p>
<p>
That allows the exact relationship to be expressed without
ambiguity, with much more power and generality, and with
consistency across applications.
</p>
<h3 id="also">
See also
</h3>
<ul>
<li>
<a href="/Provider/Style/URI.html">Cool URIs don't
change</a> - persistence in the HTTP space
</li>
<li>
<a href="AxiomsHAF">Hall of Flame</a> -- how not to do it
</li>
</ul>
<hr />
<p>
<small>$Id: Axioms.html,v 1.36 2009/03/17 17:25:35 timbl Exp
$</small>
</p>
<p>
<a href="Fragment.html">Next: Fragment Identifiers</a>
</p>
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
</body>
</html>