<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<title>
The Evolution of a specification -- Commentary on Web
architecture
</title>
<link rel="stylesheet" href="di.css" type="text/css" />
<meta http-equiv="Content-Type" content=
"text/html; charset=us-ascii" />
</head>
<body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">
<address>
Tim Berners-Lee
<p>
Date: March 1998. Last edited: $Date: 2009/08/27 21:38:07 $
</p>
<p>
Status: . Editing status: incomplete first draft. This
explains the rationale for XML namespaces and RDF schemas,
and derives requirement on them from a discussion of the
process by which we arrive at standards.
</p>
</address>
<p>
<a href="./">Up to Design Issues</a>
</p>
<h3>
Commentary
</h3>
<p>
<i>(These ideas were mentioned in a <a href=
"../Talks/1998/0415-Evolvability/slide1-1.htm">keynote on
"Evolvability"</a> at WWW7 and this text follows closely
enough for you to give yourself the talk below using those
slides. More or less. If and when we get a video from WWW7 of
the talk, maybe we'll be able to serve that up in
parallel.)</i>
</p>
<hr />
<h1>
Evolvability
</h1>
<h3>
<a name="Introduction" id="Introduction">Introduction</a>
</h3>
<p>
The World Wide Web Consortium was founded in 1994 on the
mandate to lead the <b>Evolution</b> of the Web while
maintaining its <b>Interoperability</b> as a universal space.
"Interoperability" and "Evolvability" were two goals for all
W3C technology, and whilst there was a good understanding of
what the first meant, it was difficult to define the second
in terms of technology.
</p>
<p>
Since then W3C has had first hand experience of the tension
between these two goals, and has seen the process by which
specifications have been advanced, fragmented and later
reconverged. This has led to a desire for a technological
solution which will allow specifications to evolve with the
speed and freedom of many parallel developments, but also
such that any message, whether "standard" or not, at least
has a well defined meaning.
</p>
<p>
There have been technologies dubbed "futureproof" for years
and years, whether they are languages or backplane busses.
I expect you the reader to share my cynicism when
encountering any such claim. We must work through
exactly what we mean: what we expect to be able to do which
we could not do before, and how that will make evolution more
possible and less painful.
</p>
<h2>
<a name="Free" id="Free">Free extension</a>
</h2>
<p>
A rule, explicit or implicit, in all the email-like Internet
protocols has always been that if you found a mail header (or
something) which you did not understand, you should ignore
it. This obviously allows people to add all sorts of records
to things in a very free way, and so we can call it the rule
of free extension. It has the advantage of rapid prototyping
and incremental deployment, and the disadvantage of
ambiguity, confusion, and an inability to add a mandatory
feature to an existing protocol. I adopted the rule for HTML
when initially designing it - and used it myself all the
time, adding elements one by one. This is one way in which
HTML was unlike a conventional SGML application, but it
allowed the dramatic development of HTML.
</p>
<h3>
<a name="cycle" id="cycle">The HTML cycle</a>
</h3>
<p>
The development of HTML between 1994 and 1998 took place in a
cycle, fuelled by the tension between the competitive urge of
companies to outdo each other and the common need for
standards for moving forward. The cycle starts simply
because the HTML standard is open and usable by anyone: this
means that any engineer, in any company or waiting for a bus
can think of new ways to extend HTML, and try them out.
</p>
<p>
The next phase is that some of these many ideas are tried out
in prototypes or products, using the free extension rule,
under which any unrecognised extensions will be ignored by
everything which does not understand them. The result is a
dramatic growth in features. Some of these become product
differentiators, during which time their originators are loth
to discuss the technology with the competition. Some features
die in the market and disappear from the products. Those
successful features have a fairly short lifetime as product
differentiators, as they are soon emulated in some equivalent
(though different) feature in competing products.
</p>
<p>
After this phase of the cycle, there are three or four ways
of doing the same thing, and engineers in each company are
forced to spend their time writing three or four different
versions of the same thing, and coping with the software
architectural problems which arise from the mix of different
models. This wastes program size, and confuses users. In the
case, for example, of the TABLE tag, a browser meeting one in
a document had no idea which table extension it was, so the
situation could become ambiguous. If the interpretation of
the table was important for the safe interpretation of the
document, the server would never know whether it had been
done, as an unaware client would blithely ignore it in any
case. This internal software mess resulting from having to
implement multiple models also threatens future development.
It turns the stable consistent base for future development
into something fragmented and inconsistent: it is difficult
to design new features in such an environment.
</p>
<p>
Now the marketing pressure which prevented discussions is
off, and there is a strong call for the engineers to
get around the W3C table, and iron out a common way of doing
things. As this happens, a system is designed which puts
together the best aspects of each system, plus a few
weeks' experience, so everyone is in the end happier with the
result. The companies all go away making public promises to
implement it, even though the engineering staff will be under
pressure to add the next feature and start the next cycle. The
result is published as a common specification open to anyone
to implement. And so the cycle starts again.
</p>
<p>
This is not the way all W3C activities have worked, but it
was the particular case with HTML, and it illustrates some of
the advantages and disadvantages of the free extension
rule.
</p>
<h3>
<a name="Breaking" id="Breaking">Breaking the cycle</a>
</h3>
<p>
The HTML cycle as a method of arriving at consensus on a
document has its drawbacks. By 1998, there were reasons to
change the cycle. The work in the W3C, which had started off
in 1994 with several years backlog of work, had more or less
caught up, and was beginning to lead, rather than trail,
developments. The work was seen less as fire fighting and
more as consolidation. By this time the spec was growing to a
size where the principle of modularity was seriously
flouted. Any new developments clearly had to be separate
modules. Already style information had been moved out into
the Cascading Style Sheets language, the programming
interface work was a separate Document Object Model activity,
and guidelines for accessibility were tackled by a separate
group.
</p>
<p>
In the future it was clear that we needed somehow to set up a
modular system which would allow one to add to HTML new
standard modules. At the same time, it was clear that with
XML available as a manageable version of SGML as a base for
anyone to define their own tag sets, there was likely to be a
deluge of application-specific and industry-specific XML
based languages. The idea of all this happening under the free
extension rule was frightening. Most applications would
simply add new tags to HTML. If we continued the process of
retrospectively roping them into a new, bigger standard, the
document would grow without limit and become totally
unmanageable. The rule of free extension was no longer
appropriate.
</p>
<h1>
<a name="wdi" id="wdi">Well defined interfaces</a>
</h1>
<p>
Now let us compare this situation with the way development
occurs in the world of distributed computing, specifically
remote procedure call (RPC) and distributed object oriented
systems. In these systems, the distributed system (equivalent
to the server plus the client for the web) is viewed as a
single software system which happens to be spread over
several physical machines. [nelson - courier, etc]
</p>
<p>
The network protocols are defined automatically as a function
of the software interfaces which happen to end up being
between modules on different machines. Each interface, local
or remote, has a well documented structure, and the list of
functions (procedures, methods or whatever) and parameters
are defined in machine-processable form. As the system is
built, the compiler checks that the interfaces required by
one module are exactly provided by another module. The
interface, in each version of its development, typically has
an identifying (typically very long) unique number.
</p>
<p>
The interface defines the parameters of a remote call, and
therefore defines exactly what can occur in a message from
one module to another. There is no free extension. If the
interface is changed, and a new module made, any module on
the other side of the interface will have to be changed too,
or you can't build the system.
</p>
<p>
The great advantage of this is that when the system has been
built, you expect it to work. There is no wondering whether a
table is being displayed - if you have called the table
module, you know exactly what the module is supposed to do,
and there is no way the system could be without that module.
Given the chaos of the HTML development world, you can
imagine that many people were hankering after the well
defined interfaces of the distributed computing technology.
</p>
<p>
With well-defined interfaces, either everything works, or
nothing. This was in fact at least formally the case with
SGML documents. Each had a document type definition (DTD)
referred to at the top, which defined in principle exactly
what could and could not be in the document. PICS labels were
similar in that they are self-describing: they actually have
a URI at the top which points to a machine-readable
description of what can and can't be in that PICS label.
When you see one of these documents, as when you get an RPC
message with an interface number on it, you can check whether
you understand the interface or not. Another interesting thing
you can do, if you don't have a way of processing it, is to
look it up in some index and dynamically download the code to
process it.
</p>
<p>
The existence of the Web makes all this much smoother:
instead of inventing arbitrary names for interfaces, you can
use a real URI which can be dereferenced and return the
master definition of the interface in real time. The Web can
become a decentralised registry of interfaces (languages)
and code modules.
</p>
<p>
The need was clearly for the best of both worlds. One must be
able to freely extend a language, but do so with an extension
language which is itself well defined. If for example,
documents which were HTML 2.0 plus Netscape's version of
tables version 2.01 were identified as such, much of the
problem of ambiguity would have been resolved, but the rest
of the world left free to make their own table extensions.
This was the goal of the namespaces work in XML.
</p>
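<p>
A minimal sketch of what such a declaration might look like,
using the xmlns syntax which XML namespaces eventually
adopted; both namespace URIs here are hypothetical, invented
only for illustration:
</p>
<pre>
<doc xmlns="http://www.w3.org/hypothetical/HTML2.0"
     xmlns:t="http://netscape.example.com/tables/2.01">
  <p>Sales by quarter:</p>
  <t:table>
    <t:tr><t:td>Q1</t:td><t:td>$10M</t:td></t:tr>
  </t:table>
</doc>
</pre>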
<h3>
<a name="ModularityInHTML" id="ModularityInHTML">Modularity
in HTML</a>
</h3>
<p>
To be able to use the namespaces work in the extension of
HTML, HTML has to transition from being an SGML application
(with certain constraints) to being an XML based language.
This will not only give it a certain ease of parsing, but
allow it to build on the modularity introduced by namespaces.
</p>
<p>
In fact, already in April of 1998 there was a W3C
Recommendation for "MathML", defined as an XML language and
obviously aimed at being usable in the context of an HTML
document, but for which there was no defined way to write a
combined HTML+MathML document. MathML was already waiting for
XML namespaces.
</p>
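<p>
Such a combined document might, once namespaces were
available, look roughly like the sketch below. The HTML
namespace URI is hypothetical; the MathML one is the URI
later assigned to MathML, and the declaration syntax shown is
the one XML namespaces eventually adopted:
</p>
<pre>
<html xmlns="http://www.w3.org/hypothetical/html"
      xmlns:m="http://www.w3.org/1998/Math/MathML">
  <p>The area of a circle is pi times
    <m:math>
      <m:msup><m:mi>r</m:mi><m:mn>2</m:mn></m:msup>
    </m:math>.
  </p>
</html>
</pre>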
<p>
XML namespaces will allow an author (or authoring tool,
hopefully) to declare exactly what set of tags he or she is
using in a document. Later, schemas should allow a browser to
decide what to do as a fall back when finding vocabulary
which it does not understand.
</p>
<p>
It is expected that new extensions to HTML be introduced as
namespaces, possibly languages in their own right. The intent
is that the new languages, where appropriate, will be able to
use the existing work on style sheets, such as CSS, and the
existing DOM work which defines a programming interface.
</p>
<h2>
<a name="Mixing" id="Mixing">Language mixing</a>
</h2>
<p>
Language mixing is an important facility for HTML and for the
evolution of all other Web and application technology. It
must allow, in a mixed language document, for both languages
to be well defined. A mixed language document is quite
analogous to a program which makes calls to two runtime
libraries, so it is not rocket science. It is not like an RPC
message, which in most systems is very strongly typed from a
single rigid definition. (An RPC message can be represented
as a structured document but not, in general, vice-versa)
</p>
<p>
Language mixing is a reality. Real HTML pages are often HTML
with Javascript, or HTML plus CSS, or both. They just aren't
declared as such. In real life, many documents are made from
multiple vocabularies, only some of which one understands. I
don't understand half the information in the tax form - but I
know enough to know what applies to me. The invoice is a good
example. Many different coloured copies of the same document
used to serve as a packing list, restocking sheet, invoice,
and delivery note. Different parts of a company would
understand different bits: the financial division would check
amounts and signatures, the store would understand the part
numbers, and sales and marketing would define the
relationship between the part numbers and prices.
</p>
<p>
No longer can the Web tolerate the laxness with which HTML
and HTTP have been extended. However, it cannot constrain
itself to a system as rigid as a classical distributed object
oriented system.
</p>
<p>
The <a href="Extensible.html">note on namespaces</a> defines
some requirements of a language framework which allows new
schemata to be developed quite independently, and mixed within
one document. This note elaborates on the sorts of things
which have to be possible when the evolution occurs.
</p>
<h3>
<a name="power" id="power">The Power of schema languages</a>
</h3>
<p>
You may notice that nowhere in the architecture do XML or RDF
specify what language the schema should be written in. This
is because much of the future power of the system will lie in
the power of the schema and related documents, so it
is important to leave that open as a path for the future. In
the short term, you can think of a schema being written in
HTML and English. Indeed, this is enough to tie the
significance of documents written in the schema to the law of
the land and make the document an effective part of serious
commercial or other social interaction. You can imagine a
schema being in a sort of SGML DTD language which tells a
computer program what constraints there are on the structure
of documents, but nothing about their meaning. This allows a
certain crude validity check to be made on a document but
little else.
</p>
<p>
Now let us imagine further power which we could put into a
schema language.
</p>
<h2>
<a name="PartialUnderstanding" id=
"PartialUnderstanding">Partial Understanding</a>
</h2>
<p>
A crucial first milestone for the system is partial
understanding. Let's use the scenario of an invoice, like the
<a href="Extensible.html#Scenario">scenario in the
"Extensible languages" note</a>. An invoice refers to two
schemata: one is a well-known invoice schema and the other a
proprietary part number schema. The requirement is that an
invoice processing program can process the invoice without
needing to understand the part description.
</p>
<p>
Somehow the program must find out that the invoice is from
its point of view just as valid as an invoice with the
details of the part description stripped out.
</p>
<h3>
<a name="Optional" id="Optional">Optional parts</a>
</h3>
<p>
One possibility is to mark the part description as "optional"
on the text. We could imagine a well-known way of doing this.
It could be done in the document itself [as usual, using an
arbitrary syntax:]
</p>
<pre>
<item>
  <partnumber>8137498237</partnumber>
  <optional>
    <xml:using href="http://aeroco.com/1998/03/sy4" as="a">
      <a:partdesc>
      ...
      </a:partdesc>
    </xml:using>
  </optional>
</item>
</pre>
<p>
There are problems with this. One is that we are relying on
the invoice schema to define what an invoice is and isn't and
what it means. It would be nice if the designer of the
invoice could say whether the item should contain a part
description or not, or whether it is possible to add things
into the item description or not. But in general if there is
something to be said we like to allow it to be said anywhere
(like metadata). But for the optionalness to be expressed
elsewhere would save the writer of every invoice the bother
of having to say so explicitly.
</p>
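<p>
A sketch of how the optionalness might instead be expressed
in the invoice schema itself; the element names and the
"ifUnknown" marking here are invented for illustration, not a
real schema language:
</p>
<pre>
<element name="item">
  <contains name="partnumber" occurs="exactly one"/>
  <contains name="partdesc" occurs="zero or one" ifUnknown="ignore"/>
</element>
</pre>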
<h3>
<a name="Partial" id="Partial">Partial Understanding</a>
</h3>
<p>
The other more fundamental problem is that the notion of
"optional" is subjective. We can be more precise about
"partial understanding" by saying that the invoice processing
system needs to convert the document which contains things it
doesn't understand into a document which it does completely
understand: a valid invoice. However, another agent may wish
to convert the same detailed invoice into, say, a delivery
note: in this case, quite different information would be
"optional".
</p>
<p>
To be more specific, then, we need to be able to describe a
transformation from one document to another which preserves
"valididy" in some sense. A simple form of transformation is
the removal of sections, but obviously there can be all kinds
of levels of transformation language, ranging from the crudest
to the Turing complete. Whatever the language, the statement
is that, given a document x, some f(x) can be deduced.
</p>
<h3>
<a name="Least" id="Least">Principle of Least Power</a>
</h3>
<p>
In practice, this suggests that one should leave the actual
choice of the transformation language as a flexibility point.
However, as with most choices of computer language, the
general "principle least power" applies:
</p>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
When expressing something, use the least powerful
language you can.
</td>
</tr>
</tbody>
</table>
<p>
<i>(@@justify in greater depth in footnote)</i>
</p>
<p>
While being able to express a very complex function may feel
good, the result will in general be less useful. As Lao-Tse
puts it, "<a href="Evolution.html#within">Usefulness from
what is not there</a>". From the point of view of translation
algorithms, one useful property is for them to be reversible. In
the case in which you are trying to prove something (such as
access to a web site or financial credibility) you need to be
able to derive a document of a given form. The rules you use
are the pieces of the web of trust and you are looking for a
path through the web of trust. Clearly, one approach is to
enumerate all the things which can be deduced from a given
document, but it is faster to have an idea of which
algorithms to apply. Simple ones have input and output
patterns. A deletion rule is a very simple case:
</p>
<p align="center">
s/\(.*\)foo\(.*\)/\1\2/
</p>
<p>
This is stream editor language for "Remove 'foo' from any
string leaving what was on either side". If this rule is
allowed, it means that "foo" is optional. @@@ to be continued
</p>
<p>
Optional features and Partial Understanding
</p>
<ul>
<li>Goal: V1 software partially understands V2 document
</li>
<li>Optional features visible as such
</li>
<li>Example: "Mandatory" Internet Draft
</li>
<li>Example: SMIL (P.Rec. 1998/4/9)
</li>
<li>Conversion from unknown language to known language.
</li>
</ul>
<h1>
<a name="ToII" id="ToII">Test of Independent Invention</a>
</h1>
<p>
The test of independent invention is a thought experiment
which tests one aspect of the quality of a design. When you
design something, you make a number of important
architectural decisions, such as how many wheels a car has,
and that an arch will be used between the pillars of the
vault. You make other arbitrary decisions such as the color
of the car, the side of the road everyone will drive on, whether
to open the egg at the big end or the little end.
</p>
<p>
Suppose it just happens that another group is designing the
same sort of thing, tackling the same problem, somewhere
else. They are quite unknown to you and you to them, but just
suppose that being just as smart as you, they make all the
same important architectural decisions. This you can expect
if you believe that these decisions make logical sense.
Imagine that they have the same philosophy: it is largely the
philosophy which we are testing. However, imagine that they
make all the arbitrary decisions differently. They complement
bit 7. They drive on the other side of the road. They
use red buoys on the starboard side, and use 575 lines per
screen on their televisions.
</p>
<p>
Now imagine that the two systems both work (locally), and
being successful, grow and grow. After a while, they meet.
Suddenly you discover each other. Suddenly, people want to
work across both systems. They want to connect two road
systems, two telephone systems, two networks, two webs. What
happens?
</p>
<p>
I tried originally to make WWW pass the test. Suppose someone
had (and it was quite likely) invented a World Wide Web
system somewhere else with the same principles. Suppose they
called it the Multi Media Mesh <sup>(tm)</sup> and based it
on Media Resource Identifiers<sup>(tm)</sup>, the MultiMedia
Transport Protocol<sup>(tm)</sup>, and a Multi Media Markup
Language<sup>(tm)</sup>. After a few years, the Web and the
Mesh meet. What is the damage?
</p>
<ul>
<li>A huge battle, involving the abandonment of projects,
conversion or loss of data?
</li>
<li>Division of the world by a border commission into two
separate communities?
</li>
<li>Smooth integration with only incremental effort?
</li>
</ul>
<p>
(see also <a href="../People/Berners-Lee/UU.html">WWW and
Unitarian Universalism</a>)
</p>
<p>
Obviously we are looking for the latter option. Fortunately,
we could immediately extend URIs to include "mmtp://" and
extend MRIs to include "http://". We could make gateways, and
immediately configure the better browsers to go
through a gateway when finding a URI of the new type. The URI
space is universal: it covers all addresses of all accessible
objects. But it does not have to be the only universal space.
Universal, but not unique. We could add MMML as a MIME type.
And so on. However, if we required all Web servers to
synchronise through one and only one master lock server in
Waltdorf, we would have found the Mesh required
synchronisation through a master server in Melbourne. It would
have failed.
</p>
<p>
No system completely passes the ToII - it is always some
trouble to convert.
</p>
<h3>
<a name="real" id="real">Not just a thought experiment</a>
</h3>
<p>
As the Web becomes the basis for many, many applications to be
built on top of it, the phenomenon of independent invention
will recur again and again. We have to build technology so as
to make it easy for systems to pass the test, and so survive
real life in an evolving world.
</p>
<p>
If systems cannot pass the TOII, then we can only achieve
worldwide interoperability when one original design has
beaten the others. This can happen if we all sit
down together as a worldwide committee and do a "top
down" design of the whole thing before we start. This works
for a new idea but not for the automation of something which,
like pharmacy or trade, has been going on for centuries and
is just being represented in the Semantic Web. For example,
the library community has had endless trouble trying to agree
on a single library card format (MARC record) worldwide.
</p>
<p>
Another way it can happen is if one system is dropped
completely, leading to a complete loss of the effort put
into it. When in the late 1980s Europe eventually abandoned
its suite of ISO protocols for networking because they just
could not interwork with the Internet, a huge amount of work
was lost. Many problems, solved in Europe but not in the US
(including network addresses of more than 32 bits) had to be
solved again on the Internet at great cost. Sweden actually
changed from driving on the left to driving on the right. All
over the world, people have changed word processor formats
again and again but only at the cost of losing access to huge
amounts of legacy information. The test of independent
invention is not just a thought experiment, it is happening
all the time.
</p>
<h1>
<a name="requirements" id="requirements">From philosophy to
requirement</a>
</h1>
<p>
So now let us get more specific about what we really need in
the underlying technology of the Semantic Web to allow
systems in the future to pass the test of independent
invention.
</p>
<h3>
<a name="smarter" id="smarter">We will be smarter</a>
</h3>
<p>
Our first assumption is that we will be smarter in the
future. This means that we will produce better systems. We
will want to move on from version 1 to version 2, from
version n to version n+1.
</p>
<p>
What happens now? A group of people use version 4 of a word
processor and share some documents. One touches a document
using a new version 5 of the same program. One of the other
people tries to load it using version 4 of the software. The
version 4 program reads the file, and finds it is a version 5
file. It declares that there is no way it can read the
file, as it was produced in the future, and there is no way it
can predict the future to know how to read a version 5 file.
A flag day occurs: everyone in the group has to upgrade
immediately - and often they never even planned to.
</p>
<p>
So the first requirement is for a version 4 program to be
able to read a version 5 file. Of course there will be some
features in version 5 that the version 4 program will not be
able to understand. But most of the time, we actually find
that what we want to achieve can be done by partial
understanding - understanding those parts of the document
which correspond to functions which exist in version 4. But
even though we know partial understanding would be
acceptable, with most systems we don't know how to do even
that.
</p>
<h3>
<a name="others" id="others">We are not the smartest</a>
</h3>
<p>
The philosophical assumption that we may not be smarter than
everyone else (a huge step for some!) leads us to realise
that others will have great ideas too, and will independently
invent the same things. It forces us to consider the test of
independent invention.
</p>
<p>
The requirement for the system to pass the ToII is for one
program which we write to be able to read somehow (partially
if not totally) data written by the program written by the
other folks. This simple operation is the key to
decentralised evolution of our technology, and to the whole
future of the Web.
</p>
<p>
So we have deduced two requirements for the system from our
simple philosophical assumptions:
</p>
<ul>
<li>We will be smarter in the future
<ul>
<li>Technology: Moving Version 1 to Version 2
</li>
</ul>
</li>
<li>We are not smarter than everyone else
<ul>
<li>Decentralized evolution
</li>
<li>Technology: Moving between parallel Version A and
Version B
</li>
</ul>
</li>
</ul>
<h3>
<a name="sofar" id="sofar">The story so far</a>
</h3>
<p>
Where are we with the requirements for evolvability so far? We
are looking for a technology which has free but well defined
extension. We want to do it by allowing documents to use
mixed vocabularies. We have already found out (from PICS work
for example) that we need to be able to know whether
extension vocabulary is mandatory or can be ignored. We want
to use the Web for any registry, rather than any central
point. The technology has to allow an application to be
able to convert the output of a future version of itself, or
the output of an equivalent program written independently,
into something it can process, just by looking up schema
information.
</p>
<h2>
<a name="data" id="data">Evolution of data</a>
</h2>
<p>
Now let us look at the world of data on the Web, the <a href=
"Semantic.html">Semantic Web</a>, which I expect we expect to
become a new force in the next few years. By "data" as
opposed to "documents", I am talking about information on the
Web in a form specifically to aid automated processing rather
than human browsing. "Data" is characterised by information
with a well defined structure, where the atomic parts have
well defined types, such as numbers and choices from finite
sets. "Data", as in a relational database, normally has well
defined meaning which has rarely been written down. When
someone creates a new database, they have to give the data
type of each column, but don't have to explain what the field
name actually means in any way. So there is a well defined
semantics but not one which can be accessed. In fact, the
only time you tell the machine anything about the semantics
is when you define which two columns of different tables are
equivalent in some way, so that they can be used for example
as the basis for joining the two databases. (That the meaning
of data is only defined relative to the meaning of other data
is of course quite normal - we don't expect machines to have
any built in understanding of what "zip code" might mean
apart from where you can read it and write it and what you
can compare it with). Notice that what happens with real
databases is that they are defined by users one day, and they
evolve. They are rarely the result of a committee sitting
down and deciding on a set of concepts to use across a
company or an industry, and then designing the data schema.
The schema is created on the fly by the user.
</p>
<p>
We can distinguish two ways in which the word "schema" has
been used:
</p>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
Syntactic Schema: A document, real or imagined, which
constrains the structure and/or type of data. <i>(pl.:
Schemata)</i>.
</td>
</tr>
</tbody>
</table>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
Semantic schema: A document, real or imagined, which
defines the inferences from one schema to another,
thus defining the semantics of one syntactic schema in
terms of another.
</td>
</tr>
</tbody>
</table>
<p>
I will use the word for the first only. In fact, a syntactic
schema defines a class of document, and is often accompanied by
human documentation which provides some rough semantics.
</p>
<p>
There is a huge amount ("legacy" would unfairly suggest
obsolescence) of data in relational databases. A certain
amount of it is being exported onto the web as virtual
hypertext. There are many applications which allow one to
make hypertext views of different aspects of a database, so
that each server request is met by performing a database query,
and then formatting the result as a report in HTML, with
appropriate style and decoration.
</p>
<h2>
Data about data: Metadata
</h2>
<p>
Information about information is interesting in two ways.
Firstly, it is interesting because the Web society
desperately needs it to be able to manage social aspects of
information such as endorsement (PICS labels, etc), ownership
and access rights to information, privacy policies (P3P,
etc), structuring and cataloguing information and a hundred
other uses which I will not try to enumerate. This first
aspect is discussed elsewhere. (See <a href=
"http://www.w3.org/DesignIssues/Metadata.html">Metadata
architecture</a> about general treatment of metadata and
labels, and the <a href="../TandS/Overview.html">Technology
and Society domain</a> for an overview of many of the social
drivers and related projects and technology)
</p>
<p>
The second interest in metadata is that it is data. If we are
looking for a language for putting data onto the Web, in a
machine understandable way, then metadata happens to be a
first application area. Also, because metadat ais fundamental
to most data on eth web, it is the focus of W3C effort, while
many other forms of data are regarded as applications rather
than core Web archietcure, and so are not.
</p>
<h3>
Publishing data on the web
</h3>
<p>
Suppose for example that you run a server which provides
online stock prices. Your application which today provides
fancy web pages with a company's data in text and graphs (as
GIFs) could tomorrow produce the same page as XML data, in
tabular form, for machine access. The same page could even be
produced at the same URL in two formats using content
negotiation, or you could have a typed link between the
machine-understandable and person-understandable versions.
</p>
<p>
The XML version contains at the top (or somewhere) a pointer
to a schema document. This pointer makes the document
"self-describing". It is this pointer which is the key to any
machine "understanding" of the page. By making the schema a
first class object, in other words by giving its URL and
nothing else, we are leaving the door open to many
possibilities. Now it is time to look at the various sorts of
schema document which it could point to.
</p>
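<p>
For example, a machine-oriented version of a stock quote page
might look something like the sketch below. The element names
and the schema URI are made up for illustration; the point is
simply the pointer at the top:
</p>
<pre>
<quotes xmlns="http://stockserver.example.com/1998/quotes">
  <quote>
    <symbol>XYZ</symbol>
    <price currency="USD">42.25</price>
    <time>1998-04-30T16:00Z</time>
  </quote>
</quotes>
</pre>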
<h2>
Levels of schema language
</h2>
<p>
Computer languages can be classified into various types, with
various capabilities, and the sort we choose for the schema
document, and the information we allow in it, fundamentally
affects not just what the semantic web can be but, more
importantly, how it can grow.
</p>
<p>
The schema document can, broadly, be one of the following:
</p>
<ol>
<li>Notional only: imaginary, non-existent but named.
</li>
<li>Human readable
</li>
<li>Machine-understandable and defining structure
</li>
<li>Machine-understandable, also defining which parts are optional
</li>
<li>A Turing-complete recipe for conversion into other
languages
</li>
<li>A logical model of the document
</li>
</ol>
<p>
We'll go over the pros and cons of each, because none of
these should be overlooked, but some are often far better
than others.
</p>
<h3>
Schema 1: URI only
</h3>
<ul>
<li>No supporting documentation
</li>
<li>Allows compatibility yes/no test
</li>
</ul>
<p>
This may sound like a silly, trivial example, but like many
trivial examples, it is not silly. If you just name your
schema somewhere in URI space, then you have identified it.
This doesn't offer a lot of help to anyone to find any
documentation online, but one fundamental function is
possible. Anyone can check compatibility: they can compare
the schema against a list of schemata they do understand, and
return yes or no.
</p>
<p>
In fact, they can also use an index to look up information
about the schema, including information about suitable
software to download to add understanding of the document. In
fact this level is the level which many RPC systems use: the
interface is given a unique but otherwise random number which
cannot be dereferenced directly.
</p>
<p>
So this is the level of machine-understanding typical of
distributed computing systems and should not be
underestimated. There are lots of parts of URI space you can
use for this: you might own some http: space (but never
actually serve the document at that point), but if you
don't, you can always generate a URI in a mid: or cid: space
or if desperate in one of the hash spaces.
</p>
<h3>
Schema option 2: Human readable
</h3>
<p>
The next step up from just using the Schema identifier as a
document type identifier is to make that URI one which will
dereference to a human-readable document. If you're a
computer, big deal. But as well as allowing a strict
compatibility test (test for equality of the schema URI),
this also allows human beings to get involved if there is any
argument as to what a document means. This can be significant!
For example, the schema could point to a complete technical
spec which is crammed with legalese about what the document
does and does not imply and commit to. At the end of the day,
all machine-understandable descriptions of documents are all
very well, but until the day that they bootstrap themselves
into legality, they must all in the end be defined in terms
of human-readable legalese to have social effect. Human
legalese is the schema language of our society. This is level
2.
</p>
<h3>
Schema option 3: Define structure
</h3>
<p>
Now we move into the meat of the schema system when we start
to discuss schema documents which are machine readable. Now
we are starting to enable some machine understanding and
automatic processing of document types which have not been
pre-programmed by people. It begins.
</p>
<p>
The next level we consider is that when your browser (agent,
whatever) dereferences the namespace URI, it finds a schema
which defines the structure of the document. This is a bit
like an SGML Document Type Definition (DTD). It allows you
to do everything which the levels 1 and 2 allowed, if it has
sufficient comments in it to allow human arguments to be
settled.
</p>
<p>
In addition, a system which has a way of defining structure
allows everyone to have one and only one parser to handle all
manner of documents. Any document coming across the threshold
can be parsed into a tree.
</p>
<p>
More than that, it allows a document to be validated against
allowed structures. If a memo contains two subject fields, it
is not valid. This is one of the principal uses of DTDs in
SGML.
</p>
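<p>
For instance, a DTD fragment along these lines (the element
names are purely illustrative) allows exactly one subject per
memo, so a memo with two subject fields would fail
validation:
</p>
<pre>
<!ELEMENT memo    (to+, from, subject, body)>
<!ELEMENT to      (#PCDATA)>
<!ELEMENT from    (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body    (#PCDATA)>
</pre>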
<p>
In some cases, there may be another spin-off. You can imagine
that if the schema document lists the allowed structure of
the document, and the types (and maybe names) of each
element, then this would allow an agent to construct on the
fly a graphical user interface for editing such a document.
This was the intent with PICS rating systems: at least, a
parent coming across a new rating system would be given a
human-readable description of the various parameters and
would be able to select appropriate settings.
</p>
<h3>
Schema option 4: Structure + Optional flags
</h3>
<p>
The "optional" flag is a term I use here for a common crucial
step which can make the difference between chaos and smooth
evolution. All you need to do is to mark in the schema of a
new version of the language which elements of the language
can be ignored if you don't understand them. This simple step
allows a processor which handled the old language, given the
schema of the new language, to filter it so as to produce a
document it can legitimately understand.
</p>
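<p>
A sketch of a version 2 schema carrying such flags; the
schema syntax, the attribute names and the URI are invented
for illustration only:
</p>
<pre>
<schema name="invoice" version="2"
        basedOn="http://example.com/schemas/invoice-v1">
  <!-- new in version 2; a version 1 processor may simply skip it -->
  <element name="discountCode" mayIgnore="yes"/>
  <!-- new in version 2; without it the document cannot be used safely -->
  <element name="currency" mayIgnore="no"/>
</schema>
</pre>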
<p>
Now we have a technology which has all the benefits to date,
plus it can handle that elusive <strong>version 2 to version
1 conversion</strong> problem!
</p>
<h3>
Schema option 5: Turing-complete language
</h3>
<p>
In languages there is always the balance between the
declarative, limited language, whose formulae can be easily
manipulated, and the powerful programming language whose
programs cannot be analyzed in general, but which have to be
left to run to see what they do. Each end of the spectrum has
its benefits. In describing a language in terms of another,
one way is to provide a black box program, say in Java or
Javascript, which will convert from one to the other.
</p>
<p>
Filters written in Turing-complete languages generally have
to be trusted, as you can't see what rules they are based on
by looking at them. But they can do weird and wonderful
things. (They can also crash and loop forever of course!).
</p>
<p>
A good language for conversion from one XML-based language to
another is XSL. It started off as a template-like system for
building one document from another (and can be very simple)
but is in fact Turing-complete.
</p>
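<p>
As a sketch, an XSL transformation which copies a document
unchanged except for dropping one element might look like
this, using the XSLT syntax as it later stabilised; the
partdesc element is hypothetical, echoing the invoice example
above:
</p>
<pre>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- copy every node and attribute as it is... -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- ...except partdesc elements, which are dropped -->
  <xsl:template match="partdesc"/>
</xsl:stylesheet>
</pre>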
<p>
When you do publish a program to convert language A to
language B, then anyone who trusts it has that capability. A
disadvantage is that they never know how it works. You can't
deduce things about the individual components of the
languages. You can't therefore infer much indirectly about
relationships to other languages. The only way such a filter
can be used is to get whatever you have into language A and
then put it through the filter. This might be useful. But it
isn't as fascinating as the option of blowing language A
open.
</p>
<h3>
Schema option 6: Expose logic of document
</h3>
<p>
What is fundamentally more exciting is to write down as
explicitly as possible what the new language means. Sorry, let
me take that back, in case you think that I am talking about
some absolute meaning of meaning. If you know me, I am not.
All I mean is that we write in a machine-processable logical
way the equivalences and conversions which are possible in
and out of language A, to and from other languages.
</p>
<p>
A specific case of course, is when we document the
relationship between version 2 and version 1. The schema
document for version 2 could explain that all the terms are
synonyms, except for some new terms which can be converted to
nothing (i.e. are optional) and some which affect the meaning
of the document completely and so if you don't understand
them you are stuck.
</p>
<p>
In a more general case, take a language like iCalendar in RDF
(were it in RDF), which is for describing events as would be
in a personal organizer. A schema for the language might
declare equivalences between a calendar's concept of group
MEMBERship and an access control system's concept of group
membership; it might declare the equivalence of the concept
of LOCATION to be the text description of a Geographical
Information Systems standard's location, and it may declare
an INDIVIDUAL to be a superset of the HR department's concept
of employee. These bits of information are the stuff of the
semantic web, as they allow inference to stretch across the
globe and conclude things which we knew as a whole but no one
person knew. This is what RDF and the Semantic Web logic
built on top of it is all about.
</p>
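<p>
One such assertion might be written roughly as follows, using
the RDF syntax as it later stabilised. The calendar and HR
schema URIs are invented, and rdfs:subClassOf here stands in
for whatever relation the schema author actually intends:
</p>
<pre>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <!-- every Employee (HR schema) is also an Individual (calendar schema) -->
  <rdfs:Class rdf:about="http://example.com/schemas/hr#Employee">
    <rdfs:subClassOf
        rdf:resource="http://example.com/schemas/calendar#Individual"/>
  </rdfs:Class>
</rdf:RDF>
</pre>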
<hr />
<p>
So, what will semantic web engines be able to do? They will
not all have the same inference abilities or algorithms. They
will share a core concept of an RDF statement - an assertion
that a given <em>resource</em> has a <em>property</em> with a
given <em>value</em>. They will use this as a common way of
exchanging data even when their inference rules are not
compatible. An agent will be able to read a document in a new
version of a language, by looking up on the web the
relationship with the old version that it can natively read.
It will be able to combine many documents into a single graph
of knowledge, and draw deductions from the combination. And
even though it might not be able to find a proof of a given
hypothesis, when faced with an elaborated proof it will be
able to check its veracity.
</p>
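<p>
A single such statement, written in the RDF syntax as it
later stabilised (the invoice URI and property name are
invented for illustration), is just resource, property and
value:
</p>
<pre>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:i="http://example.com/schemas/invoice#">
  <!-- "invoice 42 has a total of 10000" -->
  <rdf:Description rdf:about="http://example.com/invoices/42">
    <i:total>10000</i:total>
  </rdf:Description>
</rdf:RDF>
</pre>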
<p>
At this stage (1998) we need relational database experts in
the XML and RDF groups, [2000 -- include ontology and
conceptual graph and knowledge representation experts].
</p>
<h2 id="maps">
Evolvability in the real world
</h2>
<p>
Examples abound of language mixing and evolution in the real
world which make the need for these capabilities clear. There
is a great and unused overlap in the concepts used by, for
example, personal information managers, email systems, and so
on. These capabilities would allow information to flow
between these applications.
</p>
<p>
You just have to look at the history of a standard such as
MARC record for library information to see that the tension
between agreeing on a standard (difficult and only possible
for a common subset) and allowing variations (quick but not
interoperable) would be eased by allowing language mixing. A
card could be written out in a mixture of standard and local
terms.
</p>
<p>
The real world is full of times when conventions have been
developed separately and the relationships have been deduced
afterward: hence the market for third party converters of
disk formats, scheduler files, and so on.
</p>
<h1>
<a name="Engines" id="Engines">Engines of the future</a>
</h1>
<p>
I have left open the discussion as to what inference power
and algorithms will be useful on the semantic web precisely
because it will always be an open question. When a language
is sufficiently expressive to be able to express the state of
the real world and real problems then there will be no one
query engine which will be able to solve real problems.
</p>
<p>
We can, however, guess at how systems might evolve. No one at
the beginning of the Web foresaw the search engines which
could index almost all the web, so these guesses may be very
inaccurate!
</p>
<p>
We note that logical systems provide provably good answers,
but don't scale to large problems. We see that search
engines, remarkably, do scale - but at the moment produce
very unreliable answers. Now, on a semantic web we can
imagine a combination of the two. For example, a search
engine could retrieve all the documents which reference the
terms used in the query, and then a logical system could act on
that closed finite world of information to determine a
reliable solution if one exists.
</p>
<p>
In fact I think we will see a huge market for interesting new
algorithms, each to take advantage of particular
characteristics of particular parts of the Web. New
algorithms around electronic commerce may have directly
beneficial business models, so there will be incentive for
their development.
</p>
<p>
Imagine some questions we might want to ask an engine of the
future:
</p>
<ul>
<li>Can Joe access the party photos?
</li>
<li>Who are all the people who can?
</li>
<li>Is there a green car for sale for around $15000 in
Queensland?
</li>
<li>Did someone driving a blue car send us an invoice for
over $10000?
</li>
<li>What was the average temperature in 1997 in Brisbane?
</li>
<li>Please fill in my tax form!
</li>
</ul>
<p>
All these involve bridging barriers between domains of
knowledge, but they do not involve very complex logic --
except for the tax form, that is. And who knows, perhaps in
the future the tax code will have to be presented as a
formula on the semantic web, just as it is expected now that
one makes such a public human-readable document available on
the Web.
</p>
<h2 id="Conclusion">
Conclusion
</h2>
<p>
There are some requirements on the Semantic Web design which
must be upheld if the technology is to be able to evolve
smoothly. They involve both the introduction of new versions
of one language, and also the merging of two originally
independent languages. XML Namespaces and RDF are designed to
meet these requirements, but a lot more thought and careful
design will be needed before the system is complete.
</p>
<hr />
<blockquote>
<h4>
<a name="within" id="within">The Space Within</a>
</h4>
<p>
Thirty spokes share the wheel's hub;<br />
It is the center hole that makes it useful.<br />
Shape clay into a vessel;<br />
It is the space within that makes it useful.<br />
Cut doors and windows for a room;<br />
It is the holes that make it useful.<br />
Therefore profit comes from what is there;<br />
Usefulness from what is not there.
</p>
</blockquote>
<address>
Lao-Tse
</address>
<p>
(UU-STLT#600)
</p>
<p>
...
</p>
<p>
Imagine that the EU and the US independently define RDF
schemata for an invoice. Invoices are traded around Europe
with a schema pointer at the top which identifies the schema.
Indeed, the schema may be found on the web.
</p>
<hr />
<hr />
<p>
<a href="Metadata.html">Next: Metadata architecture</a>
</p>
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
<p>
<a href="../People/Berners-Lee">Tim BL</a>
</p>
</body>
</html>