<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<meta http-equiv="Content-Type" content="text/html" />
<title>
Semantic Web: Why RDF is more than XML
</title>
<link href="di.css" rel="stylesheet" type="text/css" />
</head>
<body bgcolor="#DDFFDD" text="#000000">
<address>
Tim Berners-Lee
<p>
<small>Date: September 1998. Last modified: $Date:
1998/10/14 20:17:13 $</small>
</p>
<p>
Status: An attempt to explain the difference between the
XML and RDF models. Editing status: Draft. Comments
welcome!
</p>
</address>
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
<hr />
<h1>
Why the RDF model is different from the XML model
</h1>
<p>
This note is an attempt to answer the question, "Why should I
use RDF - why not just XML?". This question has been around
ever since RDF started. At the W3C Query
Language workshop, there was a clear difference of view
between those who wanted to query documents and those who
wanted to extract the "meaning" in some form and query that.
This is typical. I wrote this note in a frustrated attempt to
explain what the RDF model was for those who thought in terms
of the XML model. I later listened to those who thought in
terms of the XML model, and tried to write it the other way
around in <a href="XML-Semantics.html">another note</a>. This
note assumes the XML data model in all its complexity,
and the RDF syntax as in RDF Model and Syntax, in all its
complexity. It doesn't try to map one directly onto the other
-- it expresses the RDF model using XML.
</p>
<p>
Let me take as an example a single RDF assertion. Let's try
"The author of the <i>page</i> is <i>Ora</i>". This is
traditional. In RDF this is a triple
</p>
<pre>
triple(author, page, Ora)
</pre>
<p>
which you can think of as represented by the diagram
</p>
<p align="center">
<img src="diagrams/aac.gif" width="265" height="73" alt=
"page ---has author---> Ora" border="0" />
</p>
<p>
How would this information typically be represented in
XML?
</p>
<pre>
&lt;author&gt;
  &lt;uri&gt;page&lt;/uri&gt;
  &lt;name&gt;Ora&lt;/name&gt;
&lt;/author&gt;
</pre>
<p>
or maybe
</p>
<pre>
&lt;document href="page"&gt;
  &lt;author&gt;Ora&lt;/author&gt;
&lt;/document&gt;
</pre>
<p>
or maybe
</p>
<pre>
&lt;document&gt;
  &lt;details&gt;
    &lt;uri&gt;href="page"&lt;/uri&gt;
    &lt;author&gt;
      &lt;name&gt;Ora&lt;/name&gt;
    &lt;/author&gt;
  &lt;/details&gt;
&lt;/document&gt;
</pre>
<p>
or maybe
</p>
<pre>
&lt;document&gt;
  &lt;author&gt;
    &lt;uri&gt;href="page"&lt;/uri&gt;
    &lt;details&gt;
      &lt;name&gt;Ora&lt;/name&gt;
    &lt;/details&gt;
  &lt;/author&gt;
&lt;/document&gt;
</pre>
<p>
or maybe
</p>
<pre>
&lt;document href="http://www.w3.org/test/page" author="Ora" /&gt;
</pre>
<h2>
The XML Graph
</h2>
<p>
These are all perfectly good XML documents - and to a person
reading them they mean the same thing. To a machine parsing
them, they produce different XML trees. Suppose you look at
the XML tree
</p>
<pre>
&lt;v&gt;
  &lt;x&gt;
    &lt;y&gt; a="ppppp"&lt;/y&gt;
    &lt;z&gt;
      &lt;w&gt;qqqqq&lt;/w&gt;
    &lt;/z&gt;
  &lt;/x&gt;
&lt;/v&gt;
</pre>
<p>
It's not so obvious what to make of it. The element names
were a big hint for a human reader.
</p>
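<p>
As a rough illustration of the point that equivalent-looking
documents parse to different trees, the sketch below feeds two of
the encodings above to Python's standard
<code>xml.etree.ElementTree</code> parser. The helper
<code>shape</code> is a name invented for this example; it is not
part of any XML or RDF specification.
</p>

```python
import xml.etree.ElementTree as ET

# Two of the encodings shown above for "the author of page is Ora".
VARIANT_A = "<author><uri>page</uri><name>Ora</name></author>"
VARIANT_B = '<document href="page"><author>Ora</author></document>'


def shape(element):
    """Return the parsed tree as nested (tag, attributes, text, children) tuples.

    Hypothetical helper for this sketch only.
    """
    text = (element.text or "").strip()
    children = [shape(child) for child in element]
    return (element.tag, dict(element.attrib), text, children)


tree_a = shape(ET.fromstring(VARIANT_A))
tree_b = shape(ET.fromstring(VARIANT_B))

print(tree_a)
print(tree_b)
# A human reads the same fact in both; the parser sees two unrelated trees.
print("same tree?", tree_a == tree_b)
```

<p>
Nothing in either tree says which node is the subject and which is
the value; that knowledge lives only in the reader's head or in a
schema.
</p>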
<p>
<b>Without looking at the schema</b>, you know things about
the document structure, but nothing else. You can't tell what
to deduce. You don't know whether <i>ppppp</i> is a <i>y</i>
of <i>qqqqq</i>, or <i>qqqqq</i> is a <i>z</i> of
<i>ppppp</i> or what. You can't even really tell what real
questions can be asked. A source of some confusion is that in
the xyz example above, there are lots of questions you
<i>can</i> ask. They are questions like,
</p>
<ul>
<li>Is there a w element within a details element?
</li>
<li>What is the content of the w element within the first x
element?
</li>
<li>What is the content of the w element following the first
y element which contains an x element whose a attribute is
"ppppp"?
</li>
<li>and so on.
</li>
</ul>
<p>
These are all questions about the <i>document</i>. If you
know the document schema (a big <i>if</i>), and if that
schema only gives you a limited number of ways of
expressing the same thing (another big <i>if</i>), then
asking these questions can be in fact equivalent to asking
questions like
</p>
<ul>
<li>What is the author of <i>page</i>?
</li>
</ul>
<p>
This is hairy. It is possible because there is a mapping from
XML documents to semantic graphs. In brief, it is hairy
because
</p>
<ul>
<li>The mapping is many to one
</li>
<li>You need a schema to know what the mapping is
</li>
<li>(The schemas we are talking about for XML at the moment
do not include that anyway and would have to have a whole
inference language added)
</li>
<li>The expression you need for querying something in terms
of the XML tree is necessarily more complicated than the
expression you need for querying something in terms of the
RDF tree.
</li>
</ul>
<p>
This last is a big one. If you try to write down the
expression for the author of a document where the information
is in some arbitrary XML schema, you can probably do it
though it may or may not be very pretty. If you try to
combine more than one property into a combined expression,
(give me a list of books by the same author as this one),
saying it in XML gets too clumsy to consider.
</p>
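<p>
To give a feel for how a combined query looks once the information
is held as triples, here is a minimal sketch. The book identifiers
and the helper names <code>objects</code>, <code>subjects</code>
and <code>same_author_books</code> are made up for the example, and
triples are written in the (property, subject, object) order used
earlier in this note.
</p>

```python
# Hypothetical data: (property, subject, object), as in triple(author, page, Ora).
TRIPLES = [
    ("author", "book1", "Ora"),
    ("author", "book2", "Ora"),
    ("author", "book3", "Tim"),
]


def objects(triples, prop, subject):
    """All objects o such that (prop, subject, o) is asserted."""
    return {o for (p, s, o) in triples if p == prop and s == subject}


def subjects(triples, prop, obj):
    """All subjects s such that (prop, s, obj) is asserted."""
    return {s for (p, s, o) in triples if p == prop and o == obj}


def same_author_books(triples, book):
    """Books sharing at least one author with `book`, excluding `book` itself."""
    result = set()
    for author in objects(triples, "author", book):
        result |= subjects(triples, "author", author)
    result.discard(book)
    return result


print(sorted(same_author_books(TRIPLES, "book1")))
```

<p>
The query is two pattern matches joined on a shared node. It never
mentions how the statements were nested in whatever documents they
came from.
</p>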
<p>
(Think of trying to define the addition of numbers by regular
expression operations on the strings. It's possible for
addition. When you get to multiplication it gets ridiculous -
to solve the problem you would end up reinventing numbers as
a separate type.)
</p>
<p>
Looking at the simple XML encoding above,
</p>
<pre>
&lt;author&gt;
  &lt;uri&gt;page&lt;/uri&gt;
  &lt;name&gt;Ora&lt;/name&gt;
&lt;/author&gt;
</pre>
<p>
it could be represented as a graph
</p>
<p>
<img src="diagrams/xml1.gif" alt=
"A graph of the XML tree with 3 element nodes each with name and some with content"
width="" height="0" />
</p>
<p>
We can represent the tree more concisely if we make a
shorthand by writing the name of each element inside its
circle:
</p>
<p>
<img src="diagrams/aab.gif" width="" height="0" />
</p>
<p>
Of course the RDF tree which this represents (although it
isn't obvious from the XML tree except to those who know) is
</p>
<p align="center">
<img src="diagrams/aac.gif" width="265" height="73" alt=
"page ---has author---> Ora" border="0" />
</p>
<p>
Here we have made a shorthand again by making the
label for each part its URI.
</p>
<p>
The complexity of querying the XML tree is because there are
in general a large number of ways in which the XML maps onto
the logical tree, and the query you write has to be
independent of the choice of them. So much of the query is an
attempt to basically convert the set of all possible
representations of a fact into one statement. This is just
what RDF does. It gives you some standard ways of writing
statements so that however they occur in a document, they
produce the same effect in RDF terms. The same RDF tree
results from many XML trees.
</p>
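<p>
The many-to-one mapping can be sketched as follows, assuming a
hand-written extractor per document shape. The extractor functions
and the <code>EXTRACTORS</code> table are hypothetical; they stand
in for the schema knowledge an XML-only system would need for every
encoding it might meet.
</p>

```python
import xml.etree.ElementTree as ET


def from_author_record(root):
    # Handles: <author><uri>page</uri><name>Ora</name></author>
    return ("author", root.findtext("uri"), root.findtext("name"))


def from_document_record(root):
    # Handles: <document href="page"><author>Ora</author></document>
    return ("author", root.get("href"), root.findtext("author"))


# One hand-written rule per root element name (an assumption of this sketch).
EXTRACTORS = {
    "author": from_author_record,
    "document": from_document_record,
}


def to_triple(xml_text):
    """Map one XML encoding onto the (property, subject, object) triple it conveys."""
    root = ET.fromstring(xml_text)
    return EXTRACTORS[root.tag](root)


triple_a = to_triple("<author><uri>page</uri><name>Ora</name></author>")
triple_b = to_triple('<document href="page"><author>Ora</author></document>')

print(triple_a)
print(triple_b)
print("same triple?", triple_a == triple_b)
```

<p>
Every new encoding needs another entry in the table. A syntax that
marks subjects, properties and objects itself removes the need for
the table.
</p>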
<p>
Wouldn't it be nice if we could label our XML so that when
the parser read it, it could find the assertions (triples)
and distinguish their subjects and objects, so as to just
deduce the logical assertions without needing RDF? This is
just what RDF does, though.
</p>
<h2>
The RDF Graph
</h2>
<p>
In fact RDF is very flexible - it can represent this triple
in many ways in XML so as to be able to fit in with
particular applications, but just to pick one way, you could
write the above as
</p>
<pre>
&lt;Description about="http://www.w3.org/test/page" Author ="Ora" /&gt;
</pre>
<p>
I have missed out the stuff about namespaces. In fact as
anyone can create or own the verbs, subjects and objects in a
distributed Web, any term has to be identified by a URI
somehow. In real life this example works out to something
more like
</p>
<pre>
&lt;?xml version="1.0"?&gt;
&lt;Description
    xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
    xmlns:s="http://docs.r.us.com/bibliography-info/"
    about="http://www.w3.org/test/page"
    s:Author ="http://www.w3.org/staff/Ora" /&gt;
</pre>
<p>
You can think that the "Description" RDF element gives the
clue to the parser as to how to find the subjects, objects
and verbs in what follows.
</p>
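<p>
As a sketch of how a parser might use that clue, the following
reads the example above with Python's standard
<code>xml.etree.ElementTree</code> and pulls out the triple. It
handles only this one abbreviated form, treating every
namespace-qualified attribute of a Description as a property; it is
an illustration under that assumption, not an RDF parser. The
function name <code>description_triples</code> is invented here.
</p>

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the example above.
RDF_NS = "http://www.w3.org/TR/WD-rdf-syntax#"

DOCUMENT = """<?xml version="1.0"?>
<Description
    xmlns="http://www.w3.org/TR/WD-rdf-syntax#"
    xmlns:s="http://docs.r.us.com/bibliography-info/"
    about="http://www.w3.org/test/page"
    s:Author="http://www.w3.org/staff/Ora" />
"""


def description_triples(element):
    """Yield (property, subject, object) for one abbreviated Description element.

    Assumption of this sketch: each namespace-qualified attribute is a property
    whose subject is the value of the unqualified `about` attribute.
    """
    if element.tag != "{%s}Description" % RDF_NS:
        return
    subject = element.get("about")
    for name, value in element.attrib.items():
        if name.startswith("{"):  # ElementTree writes qualified names as {uri}local
            namespace, local = name[1:].split("}")
            yield (namespace + local, subject, value)


triples = list(description_triples(ET.fromstring(DOCUMENT)))
for triple in triples:
    print(triple)
```

<p>
The property comes out as a full URI, formed from the namespace and
the local name, which is what lets independently defined
vocabularies coexist.
</p>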
<p>
This is pretty much the most shorthand way of using the base
RDF in XML. There are others which are longer, but more
efficient when you have, for instance, sets of many
properties of the same object. The useful thing is that of
course they all convey the same triple
</p>
<p align="center">
<img src="diagrams/aac.gif" width="265" height="73" alt=
"page ---has author---> Ora" border="0" />
</p>
<p>
It is a mess when you use questions about a document to try
to ask questions about what the document is trying to convey.
It will work. In a way. But flagging the grammar explicitly
(RDF syntax is a way of doing this) is a whole lot better.
</p>
<p>
Things you can do with RDF which you can't do with XML
include
</p>
<ul>
<li>You can parse the semantic tree, which ends up giving you
a set of (possibly mutually referential) triples, and then you
can use the ones you want, ignoring the ones you don't
understand.
</li>
</ul>
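<p>
A minimal sketch of the point in the list above, assuming an
application that understands only two properties. The property
names and values are invented for the example.
</p>

```python
# Hypothetical triples in (property, subject, object) order.
TRIPLES = [
    ("author", "page", "Ora"),
    ("title", "page", "Test page"),
    ("x-rating", "page", "5"),  # a property this application has never heard of
]

# The properties this (hypothetical) application knows how to use.
KNOWN_PROPERTIES = {"author", "title"}


def understood(triples, known):
    """Keep the statements whose property is known; silently skip the rest."""
    return [t for t in triples if t[0] in known]


usable = understood(TRIPLES, KNOWN_PROPERTIES)
print(usable)
```

<p>
Because each triple stands alone, dropping the unknown one does not
disturb the others. Skipping an unknown element in an XML tree
gives no such guarantee, since the element may have changed the
meaning of its neighbours.
</p>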
<p>
Problems with basing your understanding on the structure
include
</p>
<ul>
<li>Without having gone to the trouble of getting the schema,
or having an application hand-programmed to recognise a
particular document type, you can't pick up any semantic
information from a document;
</li>
<li>When an XML schema changes, it could typically introduce
new intermediate elements (like "details" in the tree above
or "div" in HTML). These may or may not invalidate any
query which has been based on the structure of the document.
</li>
<li>If you haven't gone to the trouble of making a semantic
model, then you may not have a well defined one.
</li>
</ul>
<p>
I'll end this with some examples of the last problem. Clearly
they can be avoided by good design even in an XML system
which does not use RDF. Using RDF makes things easier.
</p>
<h2>
Get it right
</h2>
<p>
If you haven't gone to the trouble of making a semantic
model, then you may not have a well defined one. What does
that mean? I can give some general examples of ambiguities
which crop up in practice. In RDF, you need a good idea about
what is being said about what, and they would tend not to
arise.
</p>
<p>
Look at a label on the jam jar which says: "Expires 1999".
What expires: the label, or the jam? Here the ambiguity is
between a statement about a statement about a document, and a
statement about a document.
</p>
<p>
Another example is an element which qualifies another,
apparently independent, element. When information is assembled in a set of
independently thrown in records often ambiguities can arise
because of the lack of logic. HTTP headers (or email headers)
are a good example. These things can work when one program
handles all the records, but when you start mixing records
you get trouble. In XML it is all too easy to fall into the
trap of having two elements, one describing the author, and a
separate one as a flag that the "author" element in fact
means not the direct author but that of a work translated to
make the book in question. Suddenly, the "author" tag, which
used to allow you to conclude that the author of a Finnish
document must speak Finnish, now can be invalidated by an
element somewhere else on the record.
</p>
<p>
Another symptom of a specification where the actual semantics
may not be as obvious as at first sight is ordering. When we
hear that the order of a set of records is important, but the
records seem to be defined independently, how can that be?
Independent assertions are always valid taken individually or
in any order. In a server configuration file, for example, the
statement which looks like "any member has access to the
page" might really mean "any member has access to the page
unless there is an earlier rule in this file which has matched
the page". That isn't what the spec said, but it did mention
that the rules were processed in order until one applied.
Represented logically, in fact there is a large nested
conditional. There is implicit ordering when mail headers
say, "this message is encrypted", "this message is
compressed", "this message is ASCII encoded", "this message
is in HTML". In fact the message is an ASCII encoded version
of an encrypted version of a compressed version of a message
in HTML. In email headers the logic of this has to be written
into the fine print of the specification.
</p>
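<p>
The configuration-file case can be sketched like this. The rule
format, paths and group names are invented for illustration and do
not describe any particular server.
</p>

```python
# Hypothetical access rules: (path prefix, group allowed), processed in order.
RULES = [
    ("/private/", "staff"),
    ("/", "member"),  # reads like "any member has access to the page"
]


def allowed_group(path, rules):
    """Return the group named by the first rule whose prefix matches the path."""
    for prefix, group in rules:
        if path.startswith(prefix):
            return group
    return None


print(allowed_group("/private/report", RULES))
print(allowed_group("/private/report", list(reversed(RULES))))
```

<p>
Swapping the two rules changes the answer for the same page, so
the second rule was never an independent assertion. Its real
meaning includes "and no earlier rule matched", which is the nested
conditional described above.
</p>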
<h2>
Order in documents
</h2>
<p>
There is something fundamentally different between giving a
machine a knowledge tree, and giving a person a document. A
document for a person is generally serialized so that, when
read serially by a human being, the result will be to build
up a graph of associations in that person's head. The order
is important.
</p>
<p>
For a graph of knowledge, order is not important, so long as
the nodes in common between different statements are
identified consistently. (There are concepts of ordered lists
which are important, although in RDF they break down at the
fine level of detail to an unordered set of statements like
"The first element of L is x", "The third element of L is z",
etc., so order disappears at the lowest level.) In
machine-readable documents a list of ostensibly independent
statements where order is important often turns out to be
statements which are by no means independent.
</p>
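<p>
A small sketch of how an ordered list survives as an unordered set
of statements. The tuple layout (list, position, value) and the
function <code>rebuild</code> are assumptions made for this
example; RDF itself spells such statements differently.
</p>

```python
import random

# "The first element of L is x", "The second element of L is y", and so on,
# written as hypothetical (list, position, value) statements.
STATEMENTS = [
    ("L", 1, "x"),
    ("L", 2, "y"),
    ("L", 3, "z"),
]


def rebuild(statements, list_name):
    """Recover the ordered list from statements given in any order."""
    items = {position: value
             for (name, position, value) in statements
             if name == list_name}
    return [items[position] for position in sorted(items)]


shuffled = STATEMENTS[:]
random.shuffle(shuffled)  # the order in which statements arrive carries no meaning

print(rebuild(shuffled, "L"))
print(rebuild(shuffled, "L") == rebuild(STATEMENTS, "L"))
```

<p>
The position travels inside each statement, so the list can be
reassembled however the statements are stored or transmitted.
</p>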
<p>
Some people have been reluctant to consider using an RDF tree
because they do not wish to give up the order, but my
assumption is that this is from constraints on processing
human readable documents. These documents are typically not
ripe for RDF conversion anyway.
</p>
<p>
Conclusion:
</p>
<p>
Sometimes it seems there is a set of people for whom the
semantic web is the only graph which they would consider, and
another for whom the document tree (or graph if you include
links) is all they would consider. But it is important to
recognise the difference.
</p>
<hr />
<p>
In this series:
</p>
<ul>
<li>
<a href="RDFnot.html"><i>What the Semantic Web is
not</i></a> - answering some FAQs of the unconvinced.
</li>
<li>
<a href="Evolution.html">Evolvability</a>: properties of
the language for evolution of the technology
</li>
<li>
<a href="Architecture.html">Web Architecture from 50,000
feet</a>
</li>
</ul>
<h2>
Not put in yet:
</h2>
<p>
<i>.@@@ RDF does not have to be serialized in XML but ...</i>
</p>
<hr />
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
</body>
</html>