<html>
<head><title>W3C QL98 Query Position Paper: RDF - Enabling Inferencing</title></head>
<body bgcolor="#ffffff">
<h1>Enabling Inferencing</h1>
Authors:<BR>
R.V. Guha (Netscape) <guha@netscape.com><BR>
Ora Lassila (Nokia) <ora.lassila@research.nokia.com><BR>
Eric Miller (OCLC) <emiller@oclc.org><BR>
Dan Brickley (ILRT, University of Bristol)
<daniel.brickley@bristol.ac.uk><BR>
<BR>
Date: 18 Nov 1998<BR>
<P>
Status of this document:<BR>
This is a position paper for the W3C Query Languages meeting in
Boston, December 3-4th 1998.</P>
<h3>Abstract</h3>
<BLOCKQUOTE>
The world wide web today is a network of hyperlinked resources. The
content of these resources is for the most part opaque to
computers. Browsers display them and search
engines locate occurrences of words within them, but the level
of "machine understanding" of the content, if any, is
very limited. A search engine, for example, might know that a resource
contained the textual string "<code>lion</code>" but not that it was a
representation of a <em>lion</em>, where lions are known to be members of
the class of <em>mammals</em>. By enabling richer representation such as
this, RDF makes it possible to express queries that go beyond simple
text-matching.
<BR><BR>
This paper presents an overview of the
query services that might be built on top of XML/RDF data. It does not
present a specific proposal for an RDF query language; instead, it
argues for a query language that is expressed in terms of the RDF
logical data model rather than one particular concrete syntax.
</BLOCKQUOTE>
<h4>Evolving the Web from a Document Repository to a Knowledge Base</h4>
With the advent of RDF and XML, we now have an opportunity
to encode or annotate the content in a more machine
understandable way. This will help the web evolve from
a set of opaque pages to a rich knowledge base.
This in turn will enable many new interesting applications,
one of the most important of which will be the
precise search and retrieval of content.
<P>
<h5>Example</h5>
<blockquote>
The content of images is typically very opaque to computers.
Searching for images that contain particular kinds of scenes
or items is usually done by searching for words which might occur
on a page which refers to the image. This method is highly
inaccurate. If the image were associated with a piece of RDF
that clearly specified its content, significantly more precise
retrieval would be possible. E.g., a photo of a lion could
be annotated as depicting a lion. The following piece of RDF
does this.
</blockquote>
<center><table cellpadding="5" border="1" bgcolor="#80ffff" width="95%">
<tr><td><pre>
<!-- somewhere on the web.. some RDF statements about a picture -->
<rdf:RDF xmlns:cx="http://www.wwc.org/cat.rdf#"
xmlns:P="http://www.images.org/image-desc-schema.rdf#"
xmlns:vocab="http://vocab.org/useful#"
xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#">
<P:Photograph rdf:about="http://www.imagelib.com/lion1.jpg">
<P:depicts>
<cx:Lion>
<vocab:color rdf:resource="http://vocab.org/useful#tan"/>
<vocab:gender rdf:resource="http://vocab.org/useful#female"/>
</cx:Lion>
</P:depicts>
<P:depicts rdf:resource="http://registries.org/people/Fred"/>
</P:Photograph>
</rdf:RDF>
<!-- a picture of a tan coloured female lion and a
person identified only by URI -->
<!-- somewhere else on the web... -->
<rdf:RDF xmlns:vocab="http://vocab.org/useful#"
xmlns:rdf="http://www.w3.org/TR/WD-rdf-syntax#">
<vocab:Person rdf:about="http://registries.org/people/Fred">
<vocab:gender rdf:resource="http://vocab.org/useful#male"/>
<vocab:name> Fred </vocab:name>
</vocab:Person>
</rdf:RDF>
</pre>
See the <A HREF="#footnote">footnote</A> for an explanation of the syntax.
</table></center>
<P>
With this information, a search engine could do a very
precise search for pictures of lions. Searching for the word
"lion" on the other hand retrieves 784150 pages in Alta Vista,
most of which are references to the Lions Club, Lion King, etc.
</P>
<P>
While this kind of simple matching-based retrieval
would be useful, it does not come close to exploiting
the full potential of having machine understandable content (MUC).
</P>
<P>
In order to fully exploit this, we need to be able to
build inferential services on top of this MUC. Such a service
would combine the "raw" MUC with a set of axioms/rules,
enabling machines to infer knowledge that is implicit
in the MUC.
</P>
<h5>Example: inference-based image retrieval</H5>
<P>Imagine an appropriately captioned photograph of
a child's birthday party. Now consider someone searching
for an image with decorations. Typically, children's birthday
parties have decorations. However, since the caption does
not explicitly state that the image scene contains decorations,
a simple matching-based algorithm will not find this image.
<BR>
An inferential service could draw on a rich set of rules
about the world (including events like birthday parties) to
infer that the photograph probably includes some decorations,
thereby improving the retrieval.
</P>
<h5>Example: class hierarchies</h5>
<P>The RDF Schema specification language provides facilities for
machine-readable vocabularies to be specified using a hierarchical type
system. This allows a resource to be described as a member of some
specific class
(e.g. 'Snow Leopard') and have its membership of more general classes (e.g.
'Big Cat', 'Mammal') implied by the RDF type system. This makes it
feasible to express searches for resources using general categories, and
have the results include resources whose membership of those broad
categories is inferred from their membership of some more detailed
sub-category.</P>
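<P>The class-hierarchy inference described above can be sketched in a few lines of Python.
This is a minimal illustration, not an RDF Schema implementation: the
<code>SUBCLASS_OF</code> table, the resource names and both function names are hypothetical,
and each class is assumed to have at most one direct superclass, which RDF Schema
does not require.</P>

```python
# Hypothetical class hierarchy: each class maps to its direct superclass.
SUBCLASS_OF = {
    "SnowLeopard": "BigCat",
    "BigCat": "Mammal",
}


def superclasses(cls):
    """Return every class implied by following subclass links upward."""
    implied = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        implied.append(cls)
    return implied


def instances_of(target, typed_resources):
    """Find resources whose stated class is `target` or a subclass of it."""
    return sorted(
        res for res, cls in typed_resources.items()
        if cls == target or target in superclasses(cls)
    )


# Each resource is described only by its most specific class.
typed_resources = {"photo1#cat": "SnowLeopard", "photo2#car": "Vehicle"}
mammals = instances_of("Mammal", typed_resources)
print(mammals)
```

<P>A search for the broad category 'Mammal' finds the resource that was only ever
described as a 'Snow Leopard', because its membership of the broader class is inferred.</P>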
<h3>Logical vs Physical Models</h3>
Inferencing engines typically work by applying a set of "rules"
(statements such as "if (a and b) then infer c") to a set of
ground atomic facts. This is a gross simplification of inferencing, but it will
suffice for the purposes of this paper. The application of
rules to derive new conclusions can occur either when
a query is posed or when new facts are obtained. The set of rules
and facts can be centralized or distributed.
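<P>A minimal sketch of this simplified picture, assuming facts are plain strings and
each rule is a (premises, conclusion) pair. The function name <code>forward_chain</code>
and the sample rules are illustrative assumptions. The sketch applies rules eagerly,
as an engine might when new facts are obtained.</P>

```python
def forward_chain(facts, rules):
    """Apply (premises, conclusion) rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


facts = {"a", "b"}
rules = [
    (("a", "b"), "c"),  # if (a and b) then infer c
    (("c",), "d"),      # if c then infer d
    (("e",), "f"),      # never fires: e is not known
]
closure = forward_chain(facts, rules)
print(sorted(closure))
```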
<P>
Inferencing engines always work on a "logical model".
The logical model is a (typically set-theoretic) abstraction.
<P>
The logical model is by definition an abstract entity.
Logical models are typically grounded in
one or more concrete syntaxes (aka physical models).
<P>
W3C logical models are based on RDF and syntax models are based on XML.
<P>
The distinction between the Logical Model and the Syntax Model
has evolved over decades of work in math and computer science and
is found wherever representation of information is involved.
<P>
<UL>
<LI> <B>Analogy:</B>
In relational databases, we have <em> Relational Algebra </em>
which provides the logical model and syntaxes such as tab-delimited
tables which provide the physical model. The former is important
for defining query languages (such as SQL) and the latter is important
for transferring content across the wire.
</UL>
<P>
Any particular concrete manipulation is always on a physical model.
Therefore, it is often tempting to either confuse the
two or try to make do with just the physical model. However,
there are several reasons why complex applications such as
inferencing engines are based on the logical model.
<OL>
<LI>There is a one-to-many mapping from a logical model to the concrete
representations of that logical model. A concrete syntax needs
to make certain commitments that a logical model need not.
The logical model hides details of the physical model
that don't carry any semantics. This in turn makes it easier
to build applications such as inferencing engines and higher
level query languages.
<P>
Examples:
<UL>
<LI> In Relational Databases the "order" of rows in a table carries
no semantics. However, in a tab-delimited file the rows have
to appear in <i>some</i> order. By operating at the relational
level (as opposed to the file format level), SQL hides an aspect
of the data that does not carry any semantics.
<LI> In predicate logic, "A or B" is semantically equivalent to "B or A",
even though the two physical strings are very different.
A logical model for predicate logic (such as Tarskian semantics)
hides this difference.
</UL>
<P>
<LI> Logical models provide support for certain operations which may
be difficult or impossible to properly define at a physical level.
<P>
For example, if the logical model of a knowledge base is a directed
labelled graph (as with RDF), the aggregation of multiple knowledge
bases can be defined cleanly as graph superposition at the logical
level, even though it would be hard to define the concept of "aggregating"
two XML files.
<P>
<LI> Most interesting inferencing systems have infinite deductive
closures. This means that the deductive closure (the set of
all the statements implied) can never have a complete concrete representation.
In such cases, a query language for determining whether a
proposition is in the deductive closure
necessarily needs to be in terms of the logical model.
</OL>
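<P>The first two points can be illustrated with a small sketch in which a knowledge base
is modelled as a Python set of (subject, property, value) triples. That representation
and the function name <code>aggregate</code> are assumptions made for illustration.
Aggregation then reduces to set union, and the order or repetition of statements in any
serialisation makes no difference. Anonymous resources would need extra care and are
left out here.</P>

```python
# Two knowledge bases from different sources, as sets of triples.
kb_images = {
    ("http://www.imagelib.com/lion1.jpg", "P:depicts",
     "http://registries.org/people/Fred"),
}
kb_people = {
    ("http://registries.org/people/Fred", "vocab:gender", "vocab:male"),
    ("http://registries.org/people/Fred", "vocab:name", "Fred"),
}


def aggregate(*kbs):
    """Superimpose graphs: nodes with the same URI merge, duplicate arcs collapse."""
    merged = set()
    for kb in kbs:
        merged |= kb
    return merged


merged = aggregate(kb_images, kb_people)
# Order and duplication of the inputs carry no meaning in the logical model.
same = aggregate(kb_people, kb_images, kb_people) == merged
print(len(merged), same)
```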
<P>
<B>
Given the importance of the logical model, it is clear that we need
query languages not just for XML but also for RDF.
</B>
<BR>
<BR>
<h3>Query Languages for RDF</h3>
<P>
This position paper suggests a general outline for an RDF querying
system.
RDF's simple yet powerful data model allows for an equally simple
yet powerful query language. The query language is based on a single
query mechanism: <em>subgraph matching</em>.
</P>
<P>
Every query is against an RDF knowledge base (KB), which in turn could be
an aggregation
of two or more RDF knowledge bases. Every RDF/XML block (i.e., the RDF
within an <rdf:RDF> ... </rdf:RDF> element) can be thought of as a serialised
RDF knowledge base.
</P>
<P>
The query is itself simply an RDF model (i.e., a directed labelled
graph), some of whose resources and properties may represent
<em>variables</em>.
Every query has two outputs:
<OL>
<LI> A subgraph (of the KB against which the query is issued) which
matches the query.
<LI> A table of sets of legal bindings for the variables, i.e., when these
bindings are applied to the variables in the query, we get (1).
</OL>
</P>
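<P>A minimal sketch of subgraph matching under some simplifying assumptions: the knowledge
base is a set of (subject, property, value) triples, the query is a list of triple
patterns, and any term starting with '$' is a variable. The function names and the
shortened identifiers (such as <code>lion1.jpg</code>) are illustrative only. The function
returns both outputs described above: the matching subgraph and the table of bindings.</P>

```python
def is_var(term):
    """Terms beginning with '$' are treated as query variables."""
    return term.startswith("$")


def match(query, kb):
    """Return (matched_subgraph, bindings) for a list of triple patterns."""
    solutions = [{}]
    for pattern in query:
        extended = []
        for binding in solutions:
            for triple in kb:
                candidate = dict(binding)
                ok = True
                for q, t in zip(pattern, triple):
                    if is_var(q):
                        # Bind the variable, or check it against its earlier value.
                        if candidate.setdefault(q, t) != t:
                            ok = False
                            break
                    elif q != t:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    # Output (1): the query with each set of bindings substituted in.
    subgraph = {
        tuple(b.get(term, term) for term in pattern)
        for b in solutions
        for pattern in query
    }
    return subgraph, solutions


kb = {
    ("lion1.jpg", "rdf:type", "P:Photograph"),
    ("lion1.jpg", "P:depicts", "_:lion"),
    ("_:lion", "vocab:gender", "vocab:female"),
    ("lion1.jpg", "P:depicts", "Fred"),
    ("Fred", "vocab:gender", "vocab:male"),
}
query = [
    ("$x", "P:depicts", "$y"),
    ("$x", "rdf:type", "P:Photograph"),
    ("$y", "vocab:gender", "vocab:male"),
]
subgraph, bindings = match(query, kb)
print(bindings)
```

<P>This naive matcher tries every triple for every pattern. A practical engine would
index the graph, but the two outputs would have the same shape.</P>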
<P>
Here are a couple of salient points about the query language outlined above.
</P>
<UL>
<LI> It can be used for a wide range of queries, from simple graph traversal to
complex Datalog-like queries. For the sake of efficiency, a concrete query
language might add a number of "utility" functions for the simple graph
traversal.
<LI> One of the results of a query is itself an RDF knowledge base. This means
that it is possible to issue a query against the result of another query. In this
sense, this query language is similar to relational query languages. This feature
will make it possible to construct recursive queries.
</UL>
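<P>The second point can be sketched as follows, again assuming a knowledge base is a set
of triples. The helper <code>select</code> is hypothetical and handles a single triple
pattern, with <code>None</code> standing in for an unnamed variable. The sample data
(including <code>map7.gif</code>) is invented for illustration. Because the result of
a query is itself a set of triples, it can be queried again.</P>

```python
def select(kb, pattern):
    """Return the subgraph of kb matching one triple pattern (None = any value)."""
    return {
        triple for triple in kb
        if all(p is None or p == v for p, v in zip(pattern, triple))
    }


kb = {
    ("lion1.jpg", "P:depicts", "Fred"),
    ("lion1.jpg", "rdf:type", "P:Photograph"),
    ("map7.gif", "P:depicts", "Boston"),
    ("Fred", "vocab:gender", "vocab:male"),
}

# First query: every "depicts" statement. The result is itself a knowledge base,
depictions = select(kb, (None, "P:depicts", None))
# so a second query can be issued against it.
of_fred = select(depictions, (None, None, "Fred"))
print(of_fred)
```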
<P>
RDF Schema constructs such as <em>subClassOf</em> and <em>subPropertyOf</em>
allow some simple inferences. In future, more complex rules will be
expressible and more powerful inference engines will become possible.
Ideally, the query language used by an inferencing system to access
the knowledge base should be the same as the query language the inferencing
system responds to.
</P>
<P>
To enable this, a query can take an additional parameter which specifies
whether its answer should be based on either the "raw RDF graph" or on
the deductive closure of the knowledge base.
</P>
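<P>One way such a parameter might look, as a sketch only: the flag name
<code>use_closure</code>, the function names and the sample triples are assumptions, and the
only rule applied is that class membership propagates along <em>subClassOf</em>. With
this single rule the closure is finite, so it can be computed outright. The richer rule
sets discussed earlier would need on-demand inference instead.</P>

```python
def deductive_closure(kb):
    """Add rdf:type statements implied by rdfs:subClassOf until a fixed point."""
    closed = set(kb)
    while True:
        inferred = {
            (s, "rdf:type", sup)
            for (s, p, cls) in closed if p == "rdf:type"
            for (sub, q, sup) in closed
            if q == "rdfs:subClassOf" and sub == cls
        }
        if inferred <= closed:
            return closed
        closed |= inferred


def instances(kb, cls, use_closure=False):
    """Answer from the raw graph, or from its deductive closure if requested."""
    graph = deductive_closure(kb) if use_closure else kb
    return sorted(s for (s, p, o) in graph if p == "rdf:type" and o == cls)


kb = {
    ("leo", "rdf:type", "cx:Lion"),
    ("cx:Lion", "rdfs:subClassOf", "cx:BigCat"),
    ("cx:BigCat", "rdfs:subClassOf", "cx:Mammal"),
}
raw_answer = instances(kb, "cx:Mammal")
inferred_answer = instances(kb, "cx:Mammal", use_closure=True)
print(raw_answer, inferred_answer)
```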
<h4>Examples</h4>
<P><B>Note:</B> The syntax of these queries could easily be represented in RDF/XML
syntax. For the purposes of this paper we use a simple syntax
in which '$x' and '$y' represent variables and properties are shown
using namespace prefixes.</P>
<H5>Query 1: A lion</H5>
The query: <B>rdf:type($y, cx:Lion).</B>
<BR><BR>
<EM>
(the example syntax used here means
"Find resources 'y' which have a
http://www.w3.org/TR/WD-rdf-syntax#type
property whose value is the resource
http://www.wwc.org/cat.rdf#Lion").</EM>
<BR>
<BR>returns:
<center><table cellpadding="5" border="1" bgcolor="#80ffff" width="95%">
<tr><td><pre>
<rdf:RDF xmlns:cx = "http://www.wwc.org/cat.rdf#"
xmlns:rdf = "http://www.w3.org/TR/WD-rdf-syntax#">
<cx:Lion/>
</rdf:RDF>
and
(($y . [anonymous-resource]))
</pre>
</table></center>
<H5>Query 2: A photograph depicting a male</H5>
The query: <B>P:depicts($x, $y) and rdf:type($x, P:Photograph)
and vocab:gender($y, vocab:male)
</B>
<BR><BR>
(meaning: <EM>"Find values for 'x' and 'y' where resource
'x' has an http://www.images.org/image-desc-schema.rdf#depicts property
whose value is 'y', and where 'x' has an
http://www.w3.org/TR/WD-rdf-syntax#type property with value
http://www.images.org/image-desc-schema.rdf#Photograph and
'y' has an http://vocab.org/useful#gender property whose value is
http://vocab.org/useful#male"
</EM>)
<BR><BR>
returns:
<center><table cellpadding="5" border="1" bgcolor="#80ffff" width="95%">
<tr><td><pre>
<rdf:RDF
xmlns:cx = "http://www.wwc.org/cat.rdf#"
xmlns:P = "http://www.images.org/image-desc-schema.rdf#"
xmlns:vocab = "http://vocab.org/useful#"
xmlns:rdf = "http://www.w3.org/TR/WD-rdf-syntax#">
<P:Photograph rdf:about="http://www.imagelib.com/lion1.jpg">
<P:depicts>
<cx:Lion>
<vocab:color rdf:resource="http://vocab.org/useful#tan"/>
<vocab:gender rdf:resource="http://vocab.org/useful#female"/>
</cx:Lion>
</P:depicts>
<P:depicts>
<vocab:Person rdf:about="http://registries.org/people/Fred">
<vocab:gender rdf:resource="http://vocab.org/useful#male"/>
</vocab:Person>
</P:depicts>
</P:Photograph>
</rdf:RDF>
<!-- note that the sub-graph returned here includes information
from two sources; statements about the photograph and about
Fred when taken together tell us that this is a photograph of
a male -->
</pre>
and<BR>
(($x . [http://www.imagelib.com/lion1.jpg])($y . [http://registries.org/people/Fred]))
</table></center>
<BR>
Similarly, the query "<B>P:depicts($x, $y) and rdf:type($y, cx:Lion) and
vocab:gender($y, vocab:female)</B>" would retrieve only illustrations of
female <em>Lions</em>.
<BR><BR>
<h3>Conclusion</h3>
With the advent of simple and powerful data models such as RDF and formal,
flexible syntaxes such as XML, we now have an opportunity to encode or
annotate web content in a more machine understandable way. These
standards
provide the ability to layer inferencing services that will facilitate
the evolution of the web from a set of opaque pages to a rich knowledge base.
As such, the web of today, the vast unstructured mass of information, may
in the future be transformed into something more manageable - and thus
something far more useful.
<P>
<BR><BR>
</P>
<HR NOSHADE />
<h3>Notes</H3>
<A NAME="footnote"></A>
<P>The following is a human-readable interpretation of the RDF used in the
example...</P>
<P>The first block of RDF uses four vocabularies to state that there is
a resource (http://www.imagelib.com/lion1.jpg) which is a member of the class
'Photograph' and which depicts an object that is a member of the class
'Lion' and which in turn has a color property with value 'tan', and a
gender
property with value 'female'. The photograph also depicts a second object
identified only by URI (http://registries.org/people/Fred). A second
source of information provides further RDF statements about
[http://registries.org/people/Fred]. In this case, we learn a name
("Fred") and that Fred is male.</P>
<h3>References</h3>
W3C Data Formats (W3C NOTE 29-October-1997)
<A
HREF="http://www.w3.org/TR/NOTE-rdfarch">http://www.w3.org/TR/NOTE-rdfarch</A>
<BR><BR>
Resource Description Framework (RDF) Schemas; <A
HREF="http://www.w3.org/TR/WD-rdf-schema">http://www.w3.org/TR/WD-rdf-schema</A>
<BR><BR>
Resource Description Framework (RDF) Model and Syntax;
<A
HREF="http://www.w3.org/TR/WD-rdf-syntax/">http://www.w3.org/TR/WD-rdf-syntax/</A>
<BR><BR>
Extensible Markup Language (XML) 1.0;
<A HREF="http://www.w3.org/TR/1998/REC-xml-19980210">http://www.w3.org/TR/1998/REC-xml-19980210</A>
</body>
</html>