<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<meta http-equiv="content-type" content=
"text/html; charset=us-ascii" />
<title>
Relational Databases and the Semantic Web (in Design Issues)
</title>
<style type="text/css">
/*<![CDATA[*/
.work {background-color: #FFFFC1}
/*]]>*/
</style>
<link href="di.css" rel="stylesheet" type="text/css" />
</head>
<body bgcolor="#DDFFDD" text="#000000">
<address>
Tim Berners-Lee Created
<p>
<small>Date: September 1998.</small>
</p>
</address>
<p>
$Id: RDB-RDF.html,v 1.25 2009/08/27 21:38:09 timbl Exp $
</p>
<address>
<p>
Status: . Editing status: Comments please. A parenthetical
discussion to the <a href="Architecture.html">Web
Architecture at 50,000 feet</a> and the <a href=
"Semantic.html">Semantic Web roadmap</a>.
</p>
</address>
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
<hr />
<h1>
Relational Databases on the Semantic Web
</h1>
<p>
There are many other data models which RDF's Directed
Labelled Graph (DLG) model compares closely with, and maps
onto. See a summary in
</p>
<ul>
<li>
<a href="RDFnot.html">What the Semantic Web can
represent</a>
</li>
</ul>
<p>
One is the Relational Database (RDB) model.
</p>
<h2>
<a name="ER" id="ER">The Semantic Web and Entity-Relationship
models</a>
</h2>
<p>
Is the RDF model an entity-relationship model? Yes and no. It
is great as a basis for ER-modelling, but because RDF is used
for other things as well, RDF is more general. RDF is a model
of entities (nodes) and relationships. If you are used to the
"ER" modelling system for data, then the RDF model is
basically an opening of the ER model to work on the Web. A
typical ER model involves entity types, and for each entity
type there is a set of relationships (slots in the typical
ER diagram). The RDF model is the same, except that
relationships are first class objects: they are identified by
a URI, and so anyone can make one. Furthermore, the set of
slots of an object is not defined when the class of an object
is defined. The Web works through anyone being (technically)
allowed to say anything about anything. This means that a
relationship between two objects may be stored apart from any
other information about the two objects. This is different
from object-oriented systems often used to implement ER
models, which generally assume that information about an
object is stored in an object: the definition of the class of
an object defines the storage implied for its properties.
</p>
<p>
For example, one person may define a vehicle as having a
number of wheels and a weight and a length, but not foresee a
color. This will not stop another person making the assertion
that a given car is red, using the color vocabulary from
elsewhere.
</p>
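<p>
A minimal sketch of this point, assuming statements are held as
plain (subject, property, value) tuples. The identifiers
<code>ex:car1</code>, <code>veh:wheels</code>,
<code>col:color</code> and the function <code>describe</code> are
invented for illustration and are not part of any vocabulary
mentioned here.
</p>

```python
from collections import defaultdict

# Hypothetical statements published by two independent sources.
vehicle_facts = {
    ("ex:car1", "veh:wheels", 4),
    ("ex:car1", "veh:weight", 1200),
    ("ex:car1", "veh:length", 4.2),
}
# A second party adds a colour, which the vehicle vocabulary never foresaw.
colour_facts = {("ex:car1", "col:color", "red")}


def describe(subject, *sources):
    """Merge statements from any number of sources and collect
    every property value known for one subject."""
    merged = set().union(*sources)
    properties = defaultdict(set)
    for s, p, o in merged:
        if s == subject:
            properties[p].add(o)
    return dict(properties)


description = describe("ex:car1", vehicle_facts, colour_facts)
```

<p>
Nothing in the vehicle source had to change for the colour
statement to become part of the merged description.
</p>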
<p>
Apart from this simple but significant change, many concepts
involved in ER modelling carry across directly onto the
Semantic Web model.
</p>
<h2>
The Semantic Web and Relational Databases
</h2>
<p>
The semantic web data model is very directly connected with
the model of relational databases. A relational database
consists of tables, which consist of rows, or records. Each
record consists of a set of fields. The record is nothing but
the content of its fields, just as an RDF node is nothing but
the connections: the property values. The mapping is very
direct:
</p>
<ul>
<li>a record is an RDF node;
</li>
<li>the field (column) name is an RDF propertyType; and
</li>
<li>the record field (table cell) is a value.
</li>
</ul>
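<p>
The mapping above can be sketched in a few lines. This is only
an illustration: the function name <code>rows_to_triples</code>
and the way subject and property URIs are built (row key appended
to the table URI, column name as a fragment) are assumptions made
for the example, not a defined standard.
</p>

```python
from typing import Any, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, Any]


def rows_to_triples(table_uri: str, key_column: str,
                    rows: Iterable[Dict[str, Any]]) -> List[Triple]:
    """Turn each record into a node, each column into a property,
    and each non-empty cell into a value."""
    triples: List[Triple] = []
    for row in rows:
        # Assumed convention: one node per record, named by its key.
        subject = f"{table_uri}/{row[key_column]}#item"
        for column, value in row.items():
            if value is None:
                continue  # an empty cell yields no statement
            predicate = f"{table_uri}#{column}"
            triples.append((subject, predicate, value))
    return triples


employees = [{"empno": 1234, "email": "joe@example.com", "age": 45}]
triples = rows_to_triples(
    "http://www.example.com/personnel/employees", "empno", employees)
```

<p>
Each row of three cells becomes three statements about the same
node, so a table of N full rows and M columns yields N times M
statements.
</p>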
<p>
Indeed, one of the main driving forces for the Semantic Web
has always been the expression, on the Web, of the vast
amount of relational database information in a way that can
be processed by machines.
</p>
<p>
RDF's serialization format -- its syntax in XML -- is a very
suitable format for expressing relational database
information.
</p>
<h3>
Special aspects of the RDB model
</h3>
<p>
Relational database systems manage RDF data, but in a
specialized way. In a table, there are many records with the
same set of properties. An individual cell (which corresponds
to an RDF property) is not often thought of on its own. SQL
queries can join tables and extract data from tables, and the
result is generally a table. So, in practice, RDB software is
typically optimized for operations on a small number of tables,
some of which may have a large number of elements.
</p>
<p>
A fundamental aspect of a database table is that often the
data in a table can be definitive. Neither RDF nor RDB models
have simple ways of expressing this. For example, not only
does a row in a table indicate that there is a red car whose
Massachusetts plate is "123XYZ", but the table may also carry
the unwritten semantics that if any car has a Massachusetts
plate then it must be in the table. (If any RDF node has a
"Massachusetts plate number" property then that node is a
member of the table.) The scope of the uniqueness of a value is
in fact a very interesting property.
</p>
<p>
The original RDB model defined by E.F. Codd included
datatyping with inheritance, which he had intended would be
implemented in the RDB products to a greater extent than it
has. For example, typically a person's home address house
number may be typed as an integer, and their shoe size may
also be typed as an integer. One can as a result join
two tables through those fields, or list people whose shoe
size equals their house number. Practical RDB systems leave
it to the application builder to only make operations which
make sense. Once a database is exported onto the Web, it
becomes possible to do all kinds of strange combinations, so
a stronger typing becomes very useful: it becomes a set of
inference rules.
</p>
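<p>
As a rough sketch of what stronger typing could add, suppose each
column carries a semantic type alongside its storage datatype.
The <code>Column</code> class, the <code>can_join</code> function
and the type names <code>ex:ShoeSize</code> and
<code>ex:HouseNumber</code> are hypothetical, introduced only for
this example.
</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    storage_type: str   # what a typical RDB checks, e.g. "integer"
    semantic_type: str  # what the value means (assumed annotation)


def can_join(a: Column, b: Column, strict: bool = True) -> bool:
    """Allow a join only if the datatypes match and, in strict
    mode, the semantic types match as well."""
    if a.storage_type != b.storage_type:
        return False
    return (not strict) or a.semantic_type == b.semantic_type


shoe = Column("shoe", "integer", "ex:ShoeSize")
house = Column("house_number", "integer", "ex:HouseNumber")

loose = can_join(shoe, house, strict=False)  # datatype-only check
strict = can_join(shoe, house)               # semantic check as well
```

<p>
With the datatype-only check the shoe-size and house-number join
is accepted; with the semantic check it is rejected.
</p>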
<p>
In a pure RDB model, every table has a primary key: a column
whose value can be used to uniquely identify every row. Some
products do not enforce this, leading to an ambiguity in the
significance of duplicate rows. A curious feature is that the
primary key can be changed without changing the identity of a
row. (A person can change their name for example). SQL allows
tables to be set up so that such changes can cascade through
the local system to preserve referential integrity. This
clearly won't work on the Web. One solution is to use a row
ID -- which many systems do in fact use although SQL doesn't
expose it in a standard way. Another is for the application
to constrain the primary key not to change. Another is to
put up with links breaking.
</p>
<p>
RDB systems have datatypes at the atomic (unstructured)
level, as RDF and XML will/do. Combination rules tend in RDBs
to be loosely enforced, in that a query can join tables by
any columns which match by datatype -- without any check on
the semantics. You could for example create a list of houses
that have the same number of rooms as an employee's shoe
size, for every employee, even though the sense of that would
be questionable.
</p>
<p>
The new SQL99 standard is going to include new
object-oriented features, such as inherited typing and
structured contents of cells - arrays and structs. This mixes
the RDB model with things from the OO world. I don't deal with
that here, in that the RDF model works as a lowest common
denominator, being able to express either and both.
</p>
<h3>
Schemas and Schemas
</h3>
<p>
A difference between XML/RDF schemas (and SGML) on the one
hand and database schemas on the other is the expectation
that there will be a relatively small number of XML/RDF
schemas. Many web sites will export documents whose structure
is defined by the same schema, and this is in fact what
provides the interoperability.
</p>
<p>
A database schema is, as far as I know, created
independently for each database. Even if a million companies
clone the same form of employee database, there will be a
million schemas, one for each database.
</p>
<p>
It may be that RDF will fill a simple role in simply
expressing the equivalence of the terms in each database
schema.
</p>
<h3>
Exposing a database on the Web
</h3>
<p>
In order to be able to access a table, and make extra
statements about it which will enable its use in more and
more ways, the essential objects of the table must be
exported as first class objects on the Web.
</p>
<p>
When mapping any system onto the Web, the mapping into URI
space is critical. Here we are doing this common operation
generically for all relational databases. It would obviously
be useful for this to be done in a consistent way across
multiple vendors: an area for possible standardization.
</p>
<p>
Here is a random example I may have gotten wrong, based on
what I understand of the naming within databases. The database
itself is defined within a schema which is listed in a
catalog.
</p>
<table border="1">
<caption>
Mapping an RDB into the Web - strawman
</caption>
<tbody>
<tr>
<td>
Catalog
</td>
<td>
http://www.acme.com/mycat
</td>
<td></td>
</tr>
<tr>
<td>
Schema
</td>
<td>
http://www.acme.com/mycat/schema1
</td>
<td></td>
</tr>
<tr>
<td>
Database
</td>
<td>
http://www.acme.com/mycat/schema1/empdb/
</td>
<th>
Relative:
</th>
</tr>
<tr>
<td>
Table
</td>
<td>
/mycat/schema1/empdb/emps
</td>
<td>
emps
</td>
</tr>
<tr>
<td>
Column name
</td>
<td>
/mycat/schema1/empdb/emps/shoe
</td>
<td>
emps/shoe
</td>
</tr>
<tr>
<td>
View
</td>
<td>
/mycat/schema1/empdb/emps2
</td>
<td>
emps2
</td>
</tr>
<tr>
<td>
Row
</td>
<td>
/mycat/schema1/empdb/emps/rowid=123
</td>
<td>
emps/rowid=123
</td>
</tr>
<tr>
<td>
Cell
</td>
<td>
/mycat/schema1/empdb/emps/rowid=123;col=shoe
</td>
<td>
emps/rowid=123;col=shoe
</td>
</tr>
<tr>
<td>
Arbitrary query
</td>
<td>
/mycat/schema1/empdb/?select+empno+from<em>[...]</em>
</td>
<td>
?select<em>[...]</em>
</td>
</tr>
</tbody>
</table>
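<p>
The strawman table can be read as a small set of string-building
rules. The sketch below reproduces the relative forms from the
table; the function names and the choice to percent-encode each
component are assumptions made for illustration.
</p>

```python
from urllib.parse import quote

# Database URI taken from the strawman table above.
DB_BASE = "http://www.acme.com/mycat/schema1/empdb/"


def table_ref(table: str) -> str:
    return quote(table, safe="")


def column_ref(table: str, column: str) -> str:
    return f"{table_ref(table)}/{quote(column, safe='')}"


def row_ref(table: str, rowid) -> str:
    return f"{table_ref(table)}/rowid={quote(str(rowid), safe='')}"


def cell_ref(table: str, rowid, column: str) -> str:
    return f"{row_ref(table, rowid)};col={quote(column, safe='')}"


def absolute(ref: str, base: str = DB_BASE) -> str:
    """Resolve a relative reference against the database URI
    (plain concatenation, valid because the base ends in '/')."""
    return base + ref


cell = cell_ref("emps", 123, "shoe")
row = absolute(row_ref("emps", 123))
```

<p>
Because every reference is relative to the database URI, moving
the database only changes <code>DB_BASE</code>.
</p>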
<p>
2002 version, see <a href=
"http://www.w3.org/2000/10/swap/dbork/dbview.py">real
code</a> implemented by Dan Connolly:
</p>
<table border="1">
<tbody>
<tr>
<th>
<a name="table" id="table">What</a>
</th>
<th>
Uriref relative to http://www.acme.com/wherever/
</th>
<th>
rdf:type
</th>
</tr>
<tr class="work">
<td>
<p>
Database description of database "personnel"
</p>
</td>
<td>
personnel
<p>
(say - whatever)
</p>
</td>
<td>
soc:Work, rdfdocument, db:DatabaseDescription
</td>
</tr>
<tr>
<td>
The conceptual database (a table of tables??)
</td>
<td>
personnel#_database
<p>
(Arbitrary, must not clash, linked by
<code><strong>db:describes</strong></code> from
personnel)
</p>
</td>
<td></td>
</tr>
<tr class="work">
<td>
A document giving all the data in the database. May
support PUT?
</td>
<td>
personnel/_data
<p>
(Arbitrary, must not clash with table names, linked
by <strong><code>db:allData</code></strong> from
personnel)
</p>
</td>
<td>
soc:Work, rdfdocument
</td>
</tr>
<tr>
<td>
The concept of the table "employees": The class of
exactly those things which are in the table.
</td>
<td>
<p>
personnel/employees#.table
</p>
<p>
(was: personnel#employees, but changed to allow it to
be deref'd to give useful data)
</p>
<p>
(defined in personnel)
</p>
</td>
<td>
rdfs:Class, db:Table
</td>
</tr>
<tr class="work">
<td>
A description of the table. Optimization: includes the
current size of the table. Identifies primary key if
any.
</td>
<td>
personnel/employees
<p>
(<strong>Convention</strong>. The bit of the
classname before the #)
</p>
</td>
<td>
soc:Work, rdfdocument, db:TableDescription
</td>
</tr>
<tr class="work">
<td>
A description of all the tables. Just an (optional)
optimization.
</td>
<td>
personnel/_all
<p>
(Arbitrary, must not clash, linked by
<code><strong>db:tableSchemas</strong></code> from
personnel/employees)
</p>
</td>
<td>
soc:Work, rdfdocument, db:TableDescription
</td>
</tr>
<tr>
<td>
The concept of a column in the table, the Property
something has iff that is recorded in the table.
</td>
<td>
personnel/employees#email
<p>
(Defined in personnel/employees)
</p>
</td>
<td>
rdf:Property, db:Column
</td>
</tr>
<tr class="work">
<td>
A document giving all the data in the table. May
support PUT
</td>
<td>
personnel/employees/_data
<p>
(Arbitrary, must not clash, linked by
<strong><code>db:tableData</code></strong> from
personnel/employees)
</p>
</td>
<td>
soc:Work, rdfdocument,
</td>
</tr>
<tr class="work">
<td>
A document giving the data in the row for which the
primary key is 1234. (Iff primary key exists). May
support PUT
</td>
<td>
personnel/employees/1234
<p>
(<strong>Convention.</strong> Note the primary key
value must be encoded suitably!)
</p>
</td>
<td>
soc:Work, rdfdocument
</td>
</tr>
<tr>
<td>
The concept of the thing described by that row.
</td>
<td>
<p>
personnel/employees/1234#item
</p>
<p>
(<strong>Convention</strong>)
</p>
<p>
(when primary key exists, then employees#_data etc
use this URIref for the item 1234 instead of making
anonymous nodes)
</p>
<p>
(employees/_data#1234?@@)
</p>
</td>
<td>
personnel/employees#_Class
</td>
</tr>
<tr class="work">
<td>
A document giving the information in just one cell
</td>
<td>
personnel/employees/1234/email
<p>
(<strong>Convention</strong>)
</p>
</td>
<td>
[ is rdf:domain of personnel/employees#email ]
</td>
</tr>
<tr class="work">
<td>
Arbitrary query
</td>
<td>
personnel/_sql?select+empno+from<em>[...]</em>
<p>
(arbitrary, linked by
<code><strong>db:sqlService</strong></code> from
personnel if supported.)
</p>
</td>
<td>
soc:Work, rdfdocument
</td>
</tr>
<tr class="work">
<td>
Arbitrary HTML form field match (select * from employees
where email like "*fred*") [@details]
</td>
<td>
personnel/_fquery?email=*fred*;name=Joe
<p>
(arbitrary, linked by
<code><strong>db:formService</strong></code> from
personnel if supported)
</p>
</td>
<td>
soc:Work, rdfdocument
</td>
</tr>
<tr>
<td>
POST point for RDF data, either new data, or assertions
that some (n3) Formula is a log:Falsehood.
</td>
<td>
<p>
personnel/_postme
</p>
<p>
(arbitrary, linked by
<code><strong>db:deltaService</strong></code> from
personnel if supported. Could be same URI
<code>personnel</code> in fact, as we are dealing
with a different method)
</p>
</td>
<td>
db:postable
</td>
</tr>
</tbody>
</table>
<p>
@@@ How to use typing to indicate that the URI in the table
is a (relative?) URI to another object, not a string?
</p>
<p>
@@@ This works fine when implemented live on a database.
However, it is a little tricky to emulate in a typical
file-based web server because of the use of "personnel" in
this case both as directory and as
</p>
<p>
One of the things which makes life easier is to make the
mapping so that the relative URI syntax can be used to
advantage. For example, here, everything within the database
(the scope of an SQL statement) can be written as a short
URI.
</p>
<p>
There is a question as to how much of the SQL query syntax
should be turned into an identifier. For example, is a query on
a primary key really an identifier? Is the extraction of a
single cell really an identifier? It would be useful to be
able to treat them as such. However, it would be wiser to use
the "?" convention to indicate a generalized SQL idempotent
query. (A URL should <a href="Axioms.html#get">of course</a>
<em>never</em> be used to refer to the results of a
table-changing operation such as UPDATE or DELETE. In this
case, if HTTP were used, an SQL query should IMHO be POSTed
to the database URI. Of course, you can use your favorite
networked database access protocol.)
</p>
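<p>
A sketch of that rule, assuming HTTP is the access protocol: a
read-only query becomes a "?" URI to be fetched with GET, and
anything else is sent as a POST body to the database URI. The
function <code>plan_request</code> is hypothetical, and looking
only at the first keyword is a deliberately naive test that a
real system would need to replace with proper SQL analysis.
</p>

```python
from urllib.parse import quote_plus


def plan_request(db_uri: str, sql: str):
    """Return (method, uri, body) for one SQL statement."""
    statement = sql.strip()
    if not statement:
        raise ValueError("empty SQL statement")
    verb = statement.split(None, 1)[0].upper()
    if verb == "SELECT":
        # Idempotent query: the URI itself names the result.
        return ("GET", db_uri + "?" + quote_plus(statement), None)
    # Table-changing operation: never encoded into a URI.
    return ("POST", db_uri, statement)


db = "http://www.acme.com/mycat/schema1/empdb/"
read = plan_request(db, "select empno from emps")
write = plan_request(db, "DELETE FROM emps WHERE empno = 123")
```

<p>
The read request carries the query in its URI in the
<code>?select+empno+from</code> style used above; the write
request keeps the plain database URI and carries the statement as
its body.
</p>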
<p>
In the above, the column name of the table could be referred to
using the table as a namespace, a row for example being
</p>
<pre>
&lt;foo
    xmlns:t="http://www.example.com/mycat/personnel/employees"&gt;
  &lt;t:email&gt;joe@example.com&lt;/t:email&gt;
  &lt;t:age&gt;45&lt;/t:age&gt;
&lt;/foo&gt;
</pre>
<p>
and one row of the result of joining this table (of
people) and another table (about people) by their primary
keys would use namespaces from both tables:
</p>
<pre>
&lt;foo
    xmlns:t="http://www.example.com/mycat/personnel/employees"
    xmlns:u="http://www.acme.com/mycat/schema1/empdb/likes"&gt;
  &lt;t:email&gt;joe@example.com&lt;/t:email&gt;
  &lt;t:age&gt;45&lt;/t:age&gt;
  &lt;u:music&gt;blues&lt;/u:music&gt;
&lt;/foo&gt;
</pre>
<hr />
<h2>
Later related work:
</h2><a href=
"http://www.cs.man.ac.uk/~ocorcho/documents/SWDB2004_BarrasaEtAl.pdf">R2O,
an Extensible and Semantically Based Database-to-Ontology
Mapping Language.</a> Barrasa J, Corcho O,
Gómez-Pérez A. Second Workshop on
Semantic Web and Databases (SWDB2004). Toronto, Canada. August
2004.
<hr />
<p>
<em>This has been elaborated with help of an RDB tutorial and
discussion from Andrew Eisenberg/Sybase</em>.
</p>
<hr />
<p>
See also: <a href="RDF-XML.html">Why RDF is more than XML</a>
</p>
<p>
<a href="Overview.html">Up to Design Issues</a>; back to
<a href="Architecture.html">Architecture from 50,000ft</a>
</p>
<p>
timbl
</p>
</body>
</html>