<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>
Semantic Web roadmap
</title>
<meta http-equiv="Content-Type" content=
"text/html; charset=us-ascii" />
<link href="di.css" rel="stylesheet" type="text/css" />
</head>
<body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">
<address>
Tim Berners-Lee
<p>
<small>Date: September 1998. Last modified: $Date:
1998/10/14 20:17:13 $</small>
</p>
<p>
Status: An attempt to give a high-level plan of the
architecture of the Semantic WWW. Editing status: Draft.
Comments welcome
</p>
</address>
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
<hr />
<h1>
Semantic Web Road map
</h1>
<p>
<i>A road map for the future, an architectural plan untested
by anything except thought experiments.</i>
</p>
<p>
This was written as part of a requested road map for future
Web design, from a level of 20,000ft. It was spun off from an
Architectural overview for an area which required more
elaboration than that overview could afford.
</p>
<p>
Necessarily, from 20,000 feet, large things seem to get a
small mention. It is architecture, then, in the sense of how
things hopefully will fit together. So we should recognize
that while it might be slowly changing, this is also a living
document.
</p>
<p>
This document is a plan for achieving a set of connected
applications for data on the Web in such a way as to form a
consistent logical web of data (semantic web).
</p>
<h3>
<a name="Introduction" id="Introduction">Introduction</a>
</h3>
<p>
The Web was designed as an information space, with the goal that
it should be useful not only for human-human communication, but
also that machines would be able to participate and help. One of
the major obstacles to this has been the fact that most
information on the Web is designed for human consumption, and
even if it was derived from a database with well-defined
meanings (in at least some terms) for its columns, the structure
of the data is not evident to a robot browsing the web. Leaving
aside the artificial intelligence problem of training machines
to behave like people, the Semantic Web approach instead
develops languages for expressing information in a
machine-processable form.
</p>
<p>
This document gives a road map - a sequence for the
incremental introduction of technology to take us, step by
step, from the Web of today to a Web in which machine
reasoning will be ubiquitous and devastatingly powerful.
</p>
<p>
It follows the note on the <a href=
"Architecture.html">architecture</a> of the Web, which
defines existing design decisions and principles for what has
been accomplished to date.
</p>
<h2>
<a name="SemanticWeb" id="SemanticWeb">Machine-Understandable
information: Semantic Web</a>
</h2>
<p>
The Semantic Web is a web of data, in some ways like a global
database. The rationale for creating such an infrastructure is
given elsewhere [Web future talks &amp;c]; here I only outline
the architecture as I see it.
</p>
<h2>
<a name="Assertion" id="Assertion">The basic assertion
model</a>
</h2>
<p>
When looking at a possible formulation of a universal Web of
semantic assertions, the principle of minimalist design
requires that it be based on a common model of great
generality. Only when the common model is general can any
prospective application be mapped onto the model. The general
model is the Resource Description Framework.
</p>
<p>
<i>See the</i> <a href="../TR/WD-rdf-syntax/"><i>RDF Model
and Syntax Specification</i></a>
</p>
<p>
Being general, this is very simple. Being simple, there is
nothing much you can do with the model itself without layering
many things on top. The basic model contains just the concept of
an <b>assertion</b>, and the concept of <b>quotation</b> -
making assertions about assertions. This is introduced because
(a) it will be needed later anyway and (b) most of the initial
RDF applications are for data about data ("metadata") in which
assertions about assertions are basic, even before logic.
(Because, for the target applications of RDF, assertions are
part of a description of some resource, that resource is often
an implicit parameter and the assertion is known as a
<b>property</b> of a resource.)
</p>
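<p>
The assertion model can be sketched with ordinary tuples: an
assertion is a (subject, property, value) triple, and quotation
is modelled by letting an assertion itself stand as the subject
of another assertion. The names used here ("author",
"assertedBy") are invented for illustration, not standard
vocabulary.
</p>

```python
# A minimal sketch of the assertion model, using plain tuples.
# Property names ("author", "assertedBy") are illustrative only.

# An assertion: resource "doc1" has the property "author" with value "Alice".
assertion = ("doc1", "author", "Alice")

# Quotation: an assertion about an assertion. The inner triple is the
# subject of the outer one, here recording who made the claim.
quoted = (assertion, "assertedBy", "catalogue-service")

subject, prop, value = quoted
print(subject)      # the inner assertion, itself a triple
print(prop, value)  # assertedBy catalogue-service
```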
<p>
As far as mathematics goes, the language at this point has no
negation or implication, and is therefore very limited. Given
a set of facts, it is easy to say whether a proof exists or
not for any given question, because neither the facts nor the
questions can have enough power to make the problem
intractable.
</p>
<p>
Applications at this level are very numerous. Most of the
<a href="Architecture.html#Metadata">applications for the
representation of metadata</a> can be handled by RDF at this
level. Examples include card index information (the Dublin
Core), Privacy information (P3P), associations of style
sheets with documents, intellectual property rights labeling
and PICS labels. We are talking about the representation of
data here, which is typically simple: not languages for
expressing queries or inference rules.
</p>
<p>
RDF documents at this level do not have great power, and
sometimes it is less than evident why one should bother to
map an application in RDF. The answer is that we expect this
data, while limited and simple within an application, to be
combined, later, with data from other applications into a
Web. Applications which run over the whole web must be able
to use a common framework for combining information from all
these applications. For example, access control logic may use
a combination of privacy and group membership and data type
information to actually allow or deny access. Queries may
later allow powerful logical expressions referring to data
from domains in which, individually, the data representation
language is not very expressive. The purpose of this document
is partly to show the plan by which this might happen.
</p>
<h2>
<a name="Schema" id="Schema">The Schema layer</a>
</h2>
<p>
The basic model of RDF allows us to do a lot on the blackboard,
but does not give us many tools. It gives us a model of
assertions and quotations on which we can map the data in any
new format.
</p>
<p>
We next need a schema layer to declare the existence of a new
property, and at the same time to say a little more about it. We
want to be able to constrain the way it is used. Typically we
want to constrain the types of object it can apply to. These
meta-assertions make it possible to do rudimentary checks on a
document. Much as in SGML the "DTD" allows one to check whether
elements have been used in appropriate positions, so in RDF a
schema will allow us to check that, for example, a driver's
license has the name of a person, and not a model of car, as its
"name".
</p>
<p>
It is not clear to me exactly what primitives have to be
introduced, and whether much useful language can be defined at
this level without also defining the next level. There is
currently an <a href="http://www.w3.org/RDF/Group/Schema/">RDF
Schema working group</a> in this area. The schema language
typically makes simple assertions about permitted combinations.
If the SGML DTD is used as a model, the schema can be in a
language of very limited power. The constraints expressed in the
schema language are easily expanded into more powerful
logical-layer expressions (the next layer), but one chooses at
this point, in order to limit the power, not to do that. For
example: one can say in a schema that a property foo is unique.
Expanded, that means that for any x, if y is the foo of x, and z
is the foo of x, then y equals z. This uses logical expressions
which are not available at this level, but that is OK so long as
the schema language is, for the moment, going to be handled by
specialized schema engines only, not by a general reasoning
engine.
</p>
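<p>
The uniqueness constraint above can be checked by a specialized
schema engine without any general reasoning. A sketch, with the
property name foo taken from the example and the data invented:
</p>

```python
# A sketch of a specialized "schema engine" that enforces a uniqueness
# constraint: if y is the foo of x and z is the foo of x, then y equals z.
# The property name "foo" follows the example in the text; data is invented.

def violates_uniqueness(triples, prop):
    """Map each subject with more than one value for `prop` to its values."""
    seen = {}
    for subject, p, value in triples:
        if p == prop:
            seen.setdefault(subject, set()).add(value)
    return {s: vals for s, vals in seen.items() if len(vals) > 1}

data = [
    ("licence42", "foo", "Alice"),
    ("licence42", "foo", "Bob"),    # conflicts with the line above
    ("licence43", "foo", "Carol"),
]
# licence42 has two values for the unique property "foo", so it is reported.
print(violates_uniqueness(data, "foo"))
```

<p>
No negation or implication is needed by the engine itself; the
logical expansion of the constraint stays implicit in the check.
</p>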
<p>
When we do this sort of thing with a language - and I think
it will be very common - we must be careful that the language
is still well defined logically. Later on, we may want to
make inferences which can only be made by understanding the
semantics of the schema language in logical terms, and
combining it with other logical information.
</p>
<h2>
<a name="Conversion" id="Conversion">Conversion language</a>
</h2>
<p>
A requirement of namespaces work for <a href=
"Evolution.html">evolvability</a> is that one must, with
knowledge of common RDF at some level, be able to follow
rules for converting a document in one RDF schema into
another one (which presumably one has an innate understanding
of how to process).
</p>
<p>
By the principle of least power, this language can in fact be
made to have implication (inference rules) without having
negation. (This might seem a fine point to make, when in fact
one can easily write a rule which defines inference from a
statement A of another statement B which actually happens to be
false, even though the language has no way of actually stating
"False". Formally, however, the language still lacks the power
needed to write a paradox, which comforts some people. In the
following, though, as the language gets more expressive, we rely
not on an inherent inability to make paradoxical statements, but
on applications specifically limiting the expressive power of
particular documents. Schemas provide a convenient place to
describe those restrictions.)
</p>
<p>
<img src="diagrams/zipcode.png" alt=
"Links between the table for Emp" align="left" />A simple
example of the application of this layer is when two databases,
constructed independently and then put on the web, are linked by
semantic links which allow queries on one to be converted into
queries on the other. Here, someone noticed that "where" in the
<em>friends</em> table and "zip" in a <em>places</em> table mean
the same thing. Someone else documented that "zip" in the
<em>places</em> table meant the same thing as "zip" in the
<em>employees</em> table, and so on as shown by the arrows.
Given this information, a search for any employee called Fred
with zip 02139 can be widened from <em>employees</em> to include
<em>friends</em>. All that is needed is some RDF "equivalent"
property.
</p>
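<p>
The widening step can be sketched as a transitive closure over
documented equivalences. The table and column names follow the
example; the closure computation is one possible implementation
choice, not a prescribed algorithm.
</p>

```python
# A sketch of widening a query via an "equivalent" property: columns
# documented as equivalent are grouped, so a search on one column can
# also consult the others. Names follow the zipcode example in the text.

equivalences = [
    (("friends", "where"), ("places", "zip")),
    (("places", "zip"), ("employees", "zip")),
]

def equivalents(col, pairs):
    """All columns transitively equivalent to `col` (including itself)."""
    group = {col}
    changed = True
    while changed:
        changed = False
        for a, b in pairs:
            if (a in group) != (b in group):  # one side known, merge the other
                group |= {a, b}
                changed = True
    return group

cols = equivalents(("employees", "zip"), equivalences)
# A search on employees.zip can now also consult places.zip and friends.where.
print(sorted(cols))
```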
<h2>
<a name="Logical" id="Logical">The logical layer</a>
</h2>
<p>
The next layer, then, is the logical layer. We need ways of
writing logic into documents to allow such things as, for
example, rules for the deduction of one type of document from a
document of another type; the checking of a document against a
set of rules of self-consistency; and the resolution of a query
by conversion from terms unknown into terms known. Given that we
have quotation in the language already, the next layer is
predicate logic (not, and, etc.) and the next layer
quantification (for all x, y(x)).
</p>
<p>
The applications of RDF at this level are basically limited only
by the imagination. A simple example of the application of this
layer is when two databases, constructed independently and then
put on the web, are linked by semantic links which allow queries
on one to be converted into queries on the other. Many things
which may have seemed to need a new language become suddenly
simply a question of writing down the right RDF. Once you have a
language which has the great power of predicate calculus with
quotation, then when defining a new language for a specific
application, two things are required:
</p>
<ul>
<li>One must settle on the (limited) power of the reasoning
engine which the receiver must have, and define a subset of
full RDF which will be expected to be understood;
</li>
<li>One will probably want to define some abbreviated
functions to efficiently transmit expressions within the set
of documents within the constrained language.
</li>
</ul>
<p>
<i>See also, if unconvinced:</i>
</p>
<ul>
<li>
<a href="RDFnot.html"><i>What the Semantic Web is
not</i></a> - answering some FAQs
</li>
</ul>
<p>
The metro map below shows a key loop in the semantic web. The
Web part, on the left, shows how a URI is, using HTTP, turned
into a representation of a document as a string of bits with
some MIME type. It is then parsed into XML and then into RDF, to
produce an RDF graph or, at the logic level, a logical formula.
The right-hand side, the Semantic part, shows how the RDF graph
contains a reference to the URI. It is the trust from the key,
combined with the meaning of the statements contained in the
document, which may cause a Semantic Web engine to dereference
another URI.
</p>
<p>
<img src="diagrams/loop.gif" alt=
"Diagram: a URI is dereferenced to a document, which is parsed into an RDF graph" />
</p>
<h3>
<a name="Validation" id="Validation">Proof Validation - a
language for proof</a>
</h3>
<p>
The RDF model does not say anything about the form of reasoning
engine, and this is obviously an open question, as there is no
definitively perfect algorithm for answering questions - or,
basically, finding proofs. At this stage in the development of
the Semantic Web, though, we do not tackle that problem. In most
applications, construction of a proof is done according to some
fairly constrained rules, and all that the other party has to do
is validate the proof. This is trivial.
</p>
</p>
<p>
For example, when someone is granted access to a web site,
they can be given a document which explains to the web server
why they should have access. The proof will be a chain [well,
DAG] of assertions and reasoning rules with pointers to all
the supporting material.
</p>
<p>
The same will be true of transactions involving privacy, and
most of electronic commerce. The documents sent across the net
will be written in a complete language. However, they will be
constrained so that, if queried, the results will be computable,
and in most cases they will be proofs. The HTTP "GET" will
contain a proof that the client has a right to the response. The
response will be a proof that the response is indeed what was
asked for.
</p>
</p>
<h3>
<a name="Inference" id="Inference">Evolution rules
Language</a>
</h3>
<p>
RDF at the logical level already has the power to express
inference rules. For example, you should be able to say such
things as "If the zipcode of the organization of x is y then
the work-zipcode of x is y". As noted above, just scattering
the Web with such remarks will in the end be very
interesting, but in the short term won't produce repeatable
results unless we restrict the expressiveness of documents to
solve particular application problems.
</p>
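<p>
Within such a restricted setting, the work-zipcode rule can be
sketched as a single forward-chaining step over a set of
triples. The property names are taken from the example above;
the representation is illustrative only.
</p>

```python
# A sketch of applying the inference rule from the text: if the zipcode
# of the organization of x is y, then the work-zipcode of x is y.

def apply_rule(triples):
    """One forward-chaining step: derive work-zipcode triples."""
    orgs = {s: v for s, p, v in triples if p == "organization"}
    zips = {s: v for s, p, v in triples if p == "zipcode"}
    derived = set()
    for person, org in orgs.items():
        if org in zips:
            derived.add((person, "work-zipcode", zips[org]))
    return derived

facts = {
    ("fred", "organization", "w3c"),
    ("w3c", "zipcode", "02139"),
}
print(apply_rule(facts))  # {('fred', 'work-zipcode', '02139')}
```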
<p>
Two fundamental functions we require RDF engines to be able to
perform are:
</p>
<ol>
<li>for a version <i>n</i> implementation to be able to read
enough RDF schema to be able to deduce how to read a version
<i>n+1</i> document;
</li>
<li>for a type A application developed quite independently of
a type B application which has the same or similar function
to be able to read and process enough schema information to
be able to process data from the type B application.
</li>
</ol>
<p>
(See <a href="Evolution.html">evolvability article</a>)
</p>
<p>
The RDF logic level is sufficient to be usable as a language for
making inference rules. Note it does not address the heuristics
of any particular reasoning engine, which is an open field made
all the more open and fruitful by the Semantic Web. In other
words, RDF will allow you to write rules but won't tell anyone
at this stage in which order to apply them.
</p>
</p>
<p>
Where for example a Library of Congress schema talks of an
"author", and a British Library schema talks of a "creator", a
small bit of RDF would be able to say that for any person x and
any resource y, if x is the (LoC) author of y, then x is the
(BL) creator of y. This is the sort of rule which solves the
evolvability problems. Where would a processor find it? In the
case of a program which finds a version 2 document and wants to
find the rules to convert it into a version 1 document, the
version 2 schema would naturally contain or point to the rules.
In the case of retrospective documentation of the relationship
between two independently invented schemas, pointers to the
rules could of course be added to either schema, but if that is
not (socially) practical, then we have another example of the
annotation problem. This can be solved by third-party indexes
which can be searched for connections between two schemata. In
practice of course search engines provide this function very
effectively - you would just have to ask a search engine for all
references to one schema and check the results for rules which
link the two.
</p>
<h3>
<a name="Query" id="Query">Query languages</a>
</h3>
<p>
An obvious derived language is a query language. A query can be
thought of as an assertion about the result to be returned.
Fundamentally, RDF at the logical level is sufficient to
represent this in any case. However, in practice a query engine
has specific algorithms and indexes available with which to
work, and can therefore answer specific sorts of query.
</p>
<p>
It may of course be useful in practice to develop a vocabulary
which helps in either of two ways:
</p>
<ol>
<li>It allows common powerful query types to be expressed
succinctly with fewer pages of mathematics, or
</li>
<li>It allows certain constrained queries to be expressed,
which are interesting because they have certain computability
properties.
</li>
</ol>
<p>
SQL is an example of a language which does both.
</p>
<p>
It is clearly important that the query language be defined in
terms of RDF logic. For example, to query a server for the
author of a resource, one would ask for an assertion of the
form "x is the author of p1" for some x. To ask for a
definitive list of all authors, one would ask for a set of
authors such that any author was in the set and everyone in
the set was an author. And so on.
</p>
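<p>
Such a query can be sketched as pattern matching over
assertions, with variables marked by a leading "?" - an
illustrative convention here, not part of any standard. The
data is invented.
</p>

```python
# A sketch of a query as an assertion pattern: find all x such that
# "x is the author of p1". Variables are strings starting with "?".

def match(triples, pattern):
    """Return one binding dict per triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for pat, val in zip(pattern, triple):
            if isinstance(pat, str) and pat.startswith("?"):
                binding[pat] = val      # variable: bind it
            elif pat != val:
                break                   # constant mismatch: reject triple
        else:
            results.append(binding)
    return results

store = [
    ("Alice", "author", "p1"),
    ("Bob", "author", "p2"),
    ("Carol", "author", "p1"),
]
print(match(store, ("?x", "author", "p1")))
# [{'?x': 'Alice'}, {'?x': 'Carol'}]
```

<p>
A real query engine differs mainly in having indexes so it need
not scan every assertion, and in restricting which patterns it
will accept.
</p>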
<p>
In practice, the diversity of algorithms in search engines on
the web, and of proof-finding algorithms in pre-web logical
systems, suggests that in a semantic web there will be many
forms of agent able to provide answers to different forms of
query.
</p>
<p>
One useful step is the specification of specific query engines,
for example for searches to a finite level of depth in a
specified subset of the Web (such as a web site). Of course
there could be several alternatives for different occasions.
</p>
<p>
Another metastep is the specification of a query engine
description language -- basically a specification, in a general
way, of the sort of query the engine can answer. This would open
the door to agents chaining together searches and inference
across many intermediate engines.
</p>
<h2>
<a name="Signature" id="Signature">Digital Signature</a>
</h2>
<p>
Public key cryptography is a remarkable technology which
completely changes what is possible. While one can add a
digital signature block as decoration on an existing
document, attempts to add the logic of trust as icing on the
cake of a reasoning system have to date been restricted to
systems limited in their generality. For reasoning to be able
to take trust into account, the common logical model requires
extension to include the keys with which assertions have been
signed.
</p>
<p>
Like all logic, the basis of this may not seem appealing at
first until one has seen what can be built on top. This basis is
the introduction of keys as first-class objects (where the URI
can be the literal value of a public key), and the introduction
of general reasoning about assertions attributable to keys.
</p>
<p>
In an implementation, this means that the reasoning engine will
have to be tied to the signature-verification system. Documents
will be parsed not just into trees of assertions, but into trees
of assertions about who has signed what assertions. Proof
validation will, for inference rules, check the logic, but for
assertions that a document has been signed, check the signature.
</p>
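<p>
The coupling of assertion-processing to signature verification
can be sketched as follows. An HMAC with a shared key stands in
here for a public-key signature, purely to keep the example
self-contained; the key, names, and serialization are all
invented.
</p>

```python
# A sketch of accepting an assertion only when its signature verifies.
# An HMAC stands in for a public-key signature; everything here is
# illustrative, not a real trust system.
import hashlib
import hmac

def sign(key, triple):
    """Sign a triple serialized as a simple delimited string."""
    msg = "|".join(triple).encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify(key, triple, signature):
    """Accept the assertion only if the signature checks out."""
    return hmac.compare_digest(sign(key, triple), signature)

key = b"shared-secret"  # stand-in for a public/private key pair
assertion = ("alice", "memberOf", "w3c-team")
sig = sign(key, assertion)

# A reasoner would treat this as: "the holder of `key` asserts the triple".
print(verify(key, assertion, sig))                             # True
print(verify(key, ("mallory", "memberOf", "w3c-team"), sig))   # False
```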
<p>
The result will be a system which can express and reason
about relationships across the whole range of public-key
based security and trust systems.
</p>
<p>
Digital signature becomes interesting when RDF is developed
to the level that a proof language exists. However, it can be
developed in parallel with RDF for the most part.
</p>
<p>
In the W3C, input to the digital signature work comes from many
directions, including experience with DSig 1.0 signed PICS
labels, and various submissions for digitally signed documents.
</p>
<h3>
<a name="Indexes" id="Indexes">Indexes of terms</a>
</h3>
<p>
Given a worldwide semantic web of assertions, the search
engine technology currently (1998) applied to HTML pages will
presumably translate directly into indexes not of words, but
of RDF objects. This itself will allow much more efficient
searching of the Web as though it were one giant database,
rather than one giant book.
</p>
<p>
The Version A to Version B translation requirement has now
been met, and so when two databases exist as for example
large arrays of (probably virtual) RDF files, then even
though the initial schemas may not have been the same, a
retrospective documentation of their equivalence would allow
a search engine to satisfy queries by searching across both
databases.
</p>
<h2>
<a name="Engines" id="Engines">Engines of the Future</a>
</h2>
<p>
While search engines which index HTML pages find many answers to
searches and cover a huge part of the Web, they return many
inappropriate answers. There is no notion of "correctness" to
such searches. By contrast, logical engines have typically been
able to restrict their output to provably correct answers, but
have suffered from the inability to rummage through the mass of
intertwined data to construct valid answers. The combinatorial
explosion of possibilities to be traced has been quite
intractable.
</p>
<p>
However, the scale upon which search engines have been
successful may force us to reexamine our assumptions here. If
an engine of the future combines a reasoning engine with a
search engine, it may be able to get the best of both worlds,
and actually be able to construct proofs in a certain number
of cases of very real impact. It will be able to reach out to
indexes which contain very complete lists of all occurrences
of a given term, and then use logic to weed out all but those
which can be of use in solving the given problem.
</p>
<p>
So while nothing will make the combinatorial explosion go away,
many real-life problems can be solved using just a few (say two)
steps of inference out on the wild web, the rest of the
reasoning being in a realm in which proofs are given, or there
are constraints and well-understood computable algorithms. I
also expect a strong commercial incentive to develop engines and
algorithms which will efficiently tackle specific types of
problem. This may involve making caches of intermediate results,
much like the search engines' indexes of today.
</p>
<p>
Though there will still not be a machine which can guarantee
to answer arbitrary questions, the power to answer real
questions which are the stuff of our daily lives and
especially of commerce may be quite remarkable.
</p>
<hr />
<p>
In this series:
</p>
<ul>
<li>
<a href="RDFnot.html"><i>What the Semantic Web is
not</i></a> - answering some FAQs of the unconvinced.
</li>
<li>
<a href="Evolution.html">Evolvability</a>: properties of
the language for evolution of the technology
</li>
<li>
<a href="Architecture.html">Web Architecture from 50,000
feet</a>
</li>
</ul>
<h2>
<a name="References" id="References">References</a>
</h2>
<p>
<a href="http://www.cyc.com/tech.html#cycl" name="cyc" id=
"cyc">The CYC Representation Language</a>
</p>
<p>
<a href="http://logic.stanford.edu/kif/kif.html" name="kif"
id="kif">Knowledge Interchange Format (KIF)</a>
</p>
<p>
@@
</p>
<h2>
Acknowledgements
</h2>
<p>
This plan is based on discussions with the W3C team and various
W3C member companies. Thanks also to David Karger and Daniel
Jackson of MIT/LCS.
</p>
<hr />
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
</body>
</html>