<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<meta content="text/html; charset=utf-8" http-equiv=
"Content-Type" />
<title>
Linked Data - Design Issues
</title>
<link rel="Stylesheet" href="di.css" type="text/css" />
<link rel="icon" href="diagrams/lod/597992118v2_64x64_Back.png"/>
<meta http-equiv="Content-Type" content="text/html" />
</head>
<body style=
"color: rgb(0, 0, 0); background-color: rgb(221, 255, 221);">
<address>
Tim Berners-Lee<br />
Date: 2006-07-27, last change: $Date: 2009/06/18 18:24:33
$<br />
Status: personal view only. Editing status: imperfect but
published.
</address>
<p>
<a href="./">Up to Design Issues</a>
</p>
<hr /> <!-- http://www.cafepress.co.uk/w3c_shop.480759174 http://www.cafepress.com/+shirt,480756337 -->
<a href="http://www.cafepress.com/w3c_shop">
<img alt="Get a 5* mug" border="none" src="diagrams/lod/597992118v2_350x350_Back.jpg" align="right"/>
</a>
<h1>
Linked Data
</h1>
<p>
The Semantic Web isn't just about putting data on the web. It
is about making links, so that a person or machine can
explore the web of data. With linked data, when you
have some of it, you can find other, related, data.
</p>
<p>
Like the web of hypertext, the web of data is constructed
with documents on the web. However, unlike the web of
hypertext, where links are
relationship anchors in hypertext documents written in
<small>HTML</small>, for data they are links between
arbitrary things described by <small>RDF</small>. The
<small>URI</small>s identify any kind of object or
concept. But for <small>HTML</small> or
<small>RDF</small>, the same expectations apply to make the
web grow:
</p>
<ol>
<li>
<p>
Use <small>URI</small>s as names for things
</p>
</li>
<li>
<p>
Use <small>HTTP</small> <small>URI</small>s so that
people can look up those names.
</p>
</li>
<li>
<p>
When someone looks up a <small>URI</small>, provide
useful information, using the standards (RDF*, SPARQL)
</p>
</li>
<li>
<p>
Include links to other <small>URI</small>s, so that they
can discover more things.
</p>
</li>
</ol>
<p>
Simple. In fact, though, a surprising amount of data
isn't linked in 2006, because of problems with one or more of
the steps. This article discusses solutions to these
problems, details of implementation, and factors affecting
choices about how you publish your data.
</p>
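<p>
As a rough illustration of rules 2 and 4, the sketch below takes the triples served by one document and lists the terms that are not <small>HTTP</small> <small>URI</small>s, and the terms that point somewhere other than the document itself. It is a minimal sketch, assuming every term is a full <small>URI</small> string (literals and blank nodes are left out); the fam: namespace and the urn: name in the example data are made up for illustration.
</p>

```python
from urllib.parse import urlsplit


def check_rules(doc_uri, triples):
    """Rough check of rules 2 and 4 for the triples one document serves.

    Assumes every term is a full URI string (no literals or blank nodes).
    Returns (non_http, external): terms that cannot be looked up over HTTP,
    and terms that link to something outside this document.
    """
    doc = urlsplit(doc_uri)
    non_http, external = set(), set()
    for triple in triples:
        for term in triple:
            parts = urlsplit(term)
            if parts.scheme not in ("http", "https"):
                non_http.add(term)   # rule 2: not a look-up-able HTTP name
            elif (parts.netloc, parts.path) != (doc.netloc, doc.path):
                external.add(term)   # rule 4: a link to another document
    return non_http, external


# Hypothetical data served at http://example.org/jones
triples = [
    ("http://example.org/jones#denise", "http://example.org/fam#child",
     "http://example.org/jones#edwin"),
    ("http://example.org/jones#denise", "http://example.org/fam#child",
     "http://example.org/smith#carol"),
    ("http://example.org/jones#denise", "http://example.org/fam#child",
     "urn:example:frank"),
]
print(check_rules("http://example.org/jones", triples))
```

<p>
Here the urn: name is flagged under rule 2, while the fam: property and Carol's identifier in the smith document count as outgoing links under rule 4.
</p>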
<h2>
The four rules
</h2>
<p>
I'll refer to the steps above as rules, but they are
expectations of behavior. Breaking them does not
destroy anything, but misses an opportunity to make
data interconnected. This in turn limits the ways it
can later be reused in unexpected ways. It is the
unexpected re-use of information which is the value
added by the web.
</p>
<p>
The first rule, to identify things with
<small>URI</small>s, is pretty much understood by most
people doing semantic web technology. If it doesn't use
the universal <small>URI</small> set of symbols, we don't
call it Semantic Web.
</p>
<p>
The second rule, to use <small>HTTP</small>
<small>URI</small>s, is also widely understood.
The only deviation has been, since the web
started, a constant tendency for people to invent new
<small>URI</small> schemes (and sub-schemes within the
<span style="font-family: monospace;">urn:</span>
scheme) such as <small>LSID</small>s and handles and
<small>XRI</small>s and <small>DOI</small>s and so on, for
various reasons. Typically, these involve not
wanting to commit to the established Domain Name System
(<small>DNS</small>) for delegation of authority but to
construct something under separate control. Sometimes
it has to do with not understanding that <small>HTTP</small>
<small>URI</small>s are names (not addresses) and that
<small>HTTP</small> name lookup is a complex, powerful and
evolving set of standards. This issue is discussed at length
elsewhere [@@ref TAG finding, etc.], and time does not allow us to delve into it here.
</p>
<p>
The third rule, that one should serve information on the web
against a <small>URI</small>, is, in 2006, well followed for
most ontologies, but, for some reason, not for some major
datasets. One can, in general, look up the
properties and classes one finds in data, and get information
from the <small>RDF</small>, <small>RDFS</small>, and
<small>OWL</small> ontologies including the relationships
between the terms in the ontology.
</p><p>
The basic format here is RDF/XML, with its popular alternative
serialization N3 (or Turtle). Large datasets provide
a SPARQL query service, but the basic linked data should
be provided as well.
</p>
<p>
Many research and evaluation projects in the first few years of
Semantic Web technologies produced ontologies and
significant data stores, but the data, if available at all,
is buried in a zip archive somewhere, rather than being
accessible on the web as linked data. The Biopax
project and the CSAktive data on computer science research
people and projects were two examples. [The CSAktive data is
now (2007) available as linked data]
</p>
<p>
There is also a large and increasing number of
<small>URI</small>s of non-ontology data which can be looked
up. <a href=
"http://ontoworld.org/wiki/Semantic_wiki">Semantic wikis</a>
are one example. The "Friend of a friend"
(<small>FOAF</small>) and <span style=
"font-style: italic;">Description of a Project</span>
(<small>DOAP</small>) ontologies are used to build social
networks across the web. Typical <a href=
"http://en.wikipedia.org/wiki/List_of_social_networking_websites">
social network portals</a> do not provide links to other
sites, nor expose their data in a standard form.
</p>
<p>
LiveJournal and Opera Community are two portal web sites
which do in fact publish their data in <small>RDF</small> on
the web. (Plaxo has a trial scheme, and I'm not sure
whether they support <span style=
"font-style: italic;">knows</span> links). This means that I
can write in my <small>FOAF</small> file that I know
Håkon Lie by using his <small>URI</small> in the Opera
Community data, and a person or machine browsing that data
can then follow that link and find all his friends.
<i>[Update:]</i>
Also, the Opera Community site allows you to register
the RDF URI for yourself on another site. This means
that public data about you from different sites can be linked
together into one web, and a person or machine starting with
your Opera identity can find the others.
<!--
Well, all of his friends? Not really: only his
friends who are in the Opera Community. The system
doesn't yet him store the <small>URI</small>s of people on
different systems. So while the social network is open to
incoming links, and while it is internally browseable, it
doesn't make outgoing links.
-->
</p>
<p>
The fourth rule, to make links elsewhere, is necessary
to connect the data we have into a web, a serious, unbounded
web in which one can find all kinds of things, just as
on the hypertext web we have managed to build.
</p>
<p>
In hypertext web sites it is considered generally rather bad
etiquette not to link to related external material. The
value of your own information is very much a function of what
it links to, as well as the inherent value of the information
within the web page. So it is also in the Semantic Web.
</p>
<p>
So let's look at the ways of linking data, starting with the
simplest way of making a link.
</p>
<h3>
Basic web look-up
</h3>
<p>
The simplest way to make linked data is to use, in one file,
a <small>URI</small> which points into another.
</p>
<p>
When you write an <small>RDF</small> file, say
&lt;http://example.org/smith&gt;, then you can use local
identifiers within the file, say #albert, #brian and
#carol. In N3 you might say
</p>
<pre>
<#albert> fam:child <#brian>, <#carol>.
</pre>
<p>
or in <small>RDF/XML</small>
</p>
<pre>
&lt;rdf:Description rdf:about="#albert"&gt;
  &lt;fam:child rdf:resource="#brian"/&gt;
  &lt;fam:child rdf:resource="#carol"/&gt;
&lt;/rdf:Description&gt;
</pre>
<p>
The <small>WWW</small> architecture now gives a global
identifier "http://example.org/smith#albert" to Albert.
This is a valuable thing to do, as anyone on the planet
can now use that global identifier to refer to Albert and
give more information.
</p>
<p>
For example, in the
document &lt;http://example.org/jones&gt; someone might
write:
</p>
<pre>
&lt;#denise&gt; fam:child &lt;#edwin&gt;, &lt;smith#carol&gt;.
</pre>
<p>
or in <small>RDF/XML</small>
</p>
<pre>
&lt;rdf:Description rdf:about="#denise"&gt;
  &lt;fam:child rdf:resource="#edwin"/&gt;
  &lt;fam:child rdf:resource="http://example.org/smith#carol"/&gt;
&lt;/rdf:Description&gt;
</pre>
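<p>
The step from local to global identifiers is ordinary relative-<small>URI</small> resolution against the document's own <small>URI</small>. A minimal sketch using Python's standard urljoin, applied to the two example documents above (the function name global_id is just for illustration):
</p>

```python
from urllib.parse import urljoin


def global_id(doc_uri, ref):
    """Resolve an identifier written inside a document to a global URI."""
    return urljoin(doc_uri, ref)


# "#albert" written inside http://example.org/smith
print(global_id("http://example.org/smith", "#albert"))
# -> http://example.org/smith#albert

# "smith#carol" written inside http://example.org/jones
print(global_id("http://example.org/jones", "smith#carol"))
# -> http://example.org/smith#carol
```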
<p>
Clearly it is reasonable for anyone who comes across the
identifier "http://example.org/smith#carol" to:
</p>
<ol>
<li>Form the <small>URI</small> of the document by truncating
before the hash
</li>
<li>Access the document to obtain information about #carol
</li>
</ol>
<p>
We call this dereferencing the <small>URI</small>. This
is basic semantic web.
</p>
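<p>
The two dereferencing steps can be sketched in Python as follows. The first step is pure string handling; the second assumes network access and a server that returns <small>RDF</small>, and the Accept header shown is one reasonable choice, not something the convention requires.
</p>

```python
from urllib.parse import urldefrag
from urllib.request import Request, urlopen


def document_uri(uri):
    """Step 1: form the document's URI by truncating before the hash."""
    return urldefrag(uri).url


def dereference(uri, timeout=10):
    """Step 2: fetch the document that should describe `uri`.

    Returns the raw bytes; parsing the RDF is left to an RDF library.
    """
    req = Request(document_uri(uri),
                  headers={"Accept": "application/rdf+xml, text/turtle;q=0.9"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read()


print(document_uri("http://example.org/smith#carol"))
# -> http://example.org/smith
```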
<p>
There are several variations.
</p>
<h3>
Variation: URIs without Hashes and HTTP 303
</h3>
<p>
There are some circumstances in which dividing identifiers
into documents doesn't work very well. There may
logically be one global symbol per document, and
there is a reluctance to include a # in the
<small>URI</small> such as
</p>
<p>
http://wordnet.example.net/antidisesablishmentarianism#word
</p>
<p>
Historically, the early Dublin Core and <small>FOAF</small>
vocabularies did not have # in their <small>URI</small>s. In any event,
when <small>HTTP</small> <small>URI</small>s without hashes are
used for abstract concepts, and there is a document that
carries information about them, then:
</p>
<ol>
<li>An <small>HTTP</small> <small>GET</small> request
on the <small>URI</small> of the concept returns
<span style="font-family: monospace;">303 See Other</span>
and gives the <small>URI</small> of the document in the
<span style="font-family: monospace;">Location:</span> header.
</li>
<li>The document is retrieved as normal
</li>
</ol>
<p>
This method has the advantage that <small>URI</small>s can be
of any form. It has the disadvantage that an
<small>HTTP</small> request must be made for every
single one. In the case of Dublin Core, for example,
dc:title, dc:creator, etc. are in fact served by the same
ontology document, but one does not know that until each
has been fetched and has returned an <small>HTTP</small> redirection.
</p>
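<p>
On the server side, the 303 pattern amounts to a lookup from a concept's path to the path of the document describing it. The sketch below uses Python's built-in http.server; the paths and the placeholder document body are hypothetical and are not the actual Dublin Core layout. Two terms redirecting to the same document show why a client needs one request per term before it learns that they share a description.
</p>

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical layout: two concept URIs described by one document.
CONCEPTS = {
    "/terms/title": "/terms.rdf",
    "/terms/creator": "/terms.rdf",
}
DOCUMENTS = {"/terms.rdf": b"(RDF describing title and creator goes here)"}


def route(path):
    """Return (status, headers, body) for a GET on `path`."""
    if path in CONCEPTS:
        # The URI names a concept, not a document: point at its description.
        return 303, {"Location": CONCEPTS[path]}, b""
    if path in DOCUMENTS:
        return 200, {"Content-Type": "application/rdf+xml"}, DOCUMENTS[path]
    return 404, {}, b""


class LinkedDataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers, body = route(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

<p>
To try it locally, one could run HTTPServer(("localhost", 8000), LinkedDataHandler).serve_forever() and request http://localhost:8000/terms/title; Python's urllib, like most browsers, follows the 303 to /terms.rdf automatically.
</p>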
<h3>
Variation: FOAF and rdfs:seeAlso
</h3>
<p>
The <a href=
"http://foaf-project.org/">Friend-Of-A-Friend</a> convention
uses a form of data link, but not using either of the
two forms mentioned above. To refer to another person
in a <small>FOAF</small> file, the convention was to give two
properties, one pointing to the document they are described
in, and the other for identifying them within that document.
</p>
<pre>
&lt;#i&gt; foaf:knows [
    foaf:mbox &lt;mailto:joe@example.com&gt;;
    rdfs:seeAlso &lt;http://example.com/foaf/joe&gt; ].
</pre>
<p>
Read, "I know that which has email joe@example.com and
about which more information is in
&lt;http://example.com/foaf/joe&gt;".
</p>
<p>
For privacy, people often don't put their email
addresses on the web directly, but instead give a one-way hash
(<small>SHA-1</small>) of their email address.
This clever trick allows people who know their email address
already to work out that it is the same person, without
giving the email away to others.
</p>
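<p>
A minimal Python sketch of the hashing trick. Per the <small>FOAF</small> specification, foaf:mbox_sha1sum is the <small>SHA-1</small> of the whole mailto: <small>URI</small>, conventionally written as 40 lowercase hex digits; the function names here are illustrative only.
</p>

```python
import hashlib


def mbox_sha1sum(email):
    """SHA-1 of the mailto: URI, as 40 lowercase hex digits."""
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()


def same_person(known_email, published_sha1):
    """True if an address we already know matches a published hash."""
    return mbox_sha1sum(known_email) == published_sha1.strip().lower()


published = mbox_sha1sum("joe@example.com")   # what Joe's friend would publish
print(same_person("joe@example.com", published))   # True
print(same_person("jane@example.com", published))  # False
```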
<pre>
&lt;#i&gt; foaf:knows [
    foaf:mbox_sha1sum "2738167846123764823647"; # @@ dummy
    rdfs:seeAlso &lt;http://example.com/foaf/joe&gt; ].
</pre>
<p>
This linking system was very successful, forming a
growing social network, and dominating, in 2006, the
linked data available on the web.
</p>
<p>
However, the system has the snag that it does not give
<small>URI</small>s to people, and so basic links to them
cannot be made.
</p>
<p>
I recommend (e.g. in weblogs on <a href=
"http://dig.csail.mit.edu/breadcrumbs/node/62">Links on the
Semantic Web</a>, <a href=
"http://dig.csail.mit.edu/breadcrumbs/node/71">Give yourself
a URI</a>, and <a href=
"http://dig.csail.mit.edu/breadcrumbs/node/72">Backward and
Forward links in RDF just as important</a>) that those making
a <small>FOAF</small> file give themselves a
<small>URI</small> as well as using the <small>FOAF</small>
convention. Similarly, when you refer to a
<small>FOAF</small> file which gives a
<small>URI</small> to a person, use it in your reference to
that person, so that clients which just use
<small>URI</small>s and don't know about the
<small>FOAF</small> convention can follow the link.
</p>
<h2>
Browsable graphs
</h2>
<p>
Now that we have looked at ways of making a link, let's look
at the choices of when to make a link.
</p>
<p>
One important pattern is a set of data which you can explore
as you go link by link by fetching data. Whenever one
looks up the URI for a node in the RDF graph, the server
returns information about the arcs out of that node, and the
arcs in. In other words, it returns any RDF statements
in which the term appears as either subject or object.
</p>
<p>
Formally, call a graph G <span style=
"font-style: italic;">browsable</span> if, for the URI
of any node in G, if I look up that URI I will be returned
information which describes the node, where describing a node
means:
</p>
<ol>
<li>Returning all statements where the node is a subject or
object; and
</li>
<li>Describing all blank nodes attached to the node by one
arc.
</li>
</ol>
<p class="detail">
(The subgraph returned has been referred to as a "minimum
spanning graph" (MSG) [@@ref] or an RDF molecule [@@ref],
depending on whether nodes are considered identified if they
can be expressed as a path of functional or reverse
inverse-functional properties. A concise bounded description, which
only follows links from subject to object, does not
work.)
</p>
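<p>
The definition can be made concrete with a small Python function over an in-memory set of triples. It is a sketch under simple assumptions: terms are plain strings, blank nodes are written with the _: prefix as in N-Triples, prefixed names such as ex:memberOf are hypothetical and treated as opaque strings, and "describing" a blank node is taken to apply the same rule recursively.
</p>

```python
def is_blank(term):
    # Blank nodes are written "_:name", as in N-Triples.
    return term.startswith("_:")


def describe(node, graph):
    """What a browsable server would return when `node` is looked up.

    graph is a set of (subject, predicate, object) tuples. The result holds
    every statement with `node` as subject or object, plus the description
    of each blank node reached through one of those statements.
    """
    result, seen, todo = set(), {node}, [node]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if current in (s, o):
                result.add((s, p, o))
                for end in (s, o):
                    if is_blank(end) and end not in seen:
                        seen.add(end)
                        todo.append(end)
    return result


TIM = "http://example.org/people#tim"
ALICE = "http://example.org/people#alice"
DIG = "http://example.org/groups#dig"
graph = {
    (TIM, "ex:memberOf", DIG),
    (ALICE, "ex:memberOf", DIG),
    (TIM, "ex:address", "_:a1"),
    ("_:a1", "ex:city", "Cambridge"),
}
# Looking up the group returns both membership arcs pointing into it.
print(sorted(describe(DIG, graph)))
```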
<p>
In practice, when data is stored in two documents, this means
that any <small>RDF</small> statements which relate things in
the two files must be repeated in each. So, for
example, in my <small>FOAF</small> page I mention that I am a
member of the <small>DIG</small> group, and that information
is repeated on the <small>DIG</small> group data. Thus,
someone starting from the concept of the group can also find
out that I am a member. In fact, someone who starts off
with my <small>URI</small> can find all the people who are in
the same group.
</p>
<h3>
Limitations on browseable data
</h3>
<p>
So statements which relate things in the two documents must
be repeated in each. This clearly is against the first rule
of data storage: don't store the same data in two different
places: you will have problems keeping it consistent.
This is indeed an issue with browsable data. A
set of completely browsable data with links in both
directions has to be completely consistent, and that takes
coordination, especially if different authors or different
programs are involved.
</p>
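<p>
One way to see the consistency burden is a check that, across a set of documents, finds statements relating things in two documents that only one of them serves. A minimal sketch, assuming hash <small>URI</small>s (so each thing's document is found by truncating at the #) and triples held as Python tuples; the ex: predicate and the example <small>URI</small>s are hypothetical.
</p>

```python
from urllib.parse import urldefrag


def home(term):
    """Document a hash URI lives in; None for anything else."""
    if term.startswith(("http://", "https://")):
        return urldefrag(term).url
    return None


def missing_statements(docs):
    """docs maps each document URI to the set of triples it serves.

    A statement relating things that live in two different documents
    should be served by both. Returns {document: statements it lacks}.
    """
    missing = {}
    for doc, triples in docs.items():
        for s, p, o in triples:
            for other in {home(s), home(o)}:
                if other in docs and other != doc and (s, p, o) not in docs[other]:
                    missing.setdefault(other, set()).add((s, p, o))
    return missing


TIM = "http://example.org/tim#i"
DIG = "http://example.org/dig#group"
docs = {
    "http://example.org/tim": {(TIM, "ex:memberOf", DIG)},
    "http://example.org/dig": set(),   # the group data lacks the backlink
}
print(missing_statements(docs))
```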
<p>
We can have completely browsable data, however, where it is
automatically generated. The <a href=
"http://dig.csail.mit.edu/2006/dbview/dbview.py">dbview</a>
server, for example, provides browsable virtual
documents containing the data from any arbitrary
relational database.
</p>
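<p>
Generated data stays consistent because both directions of a link come from the same stored value. The sketch below is not dbview's code; it is a hypothetical illustration using Python's built-in sqlite3, where a person's foreign key to a group yields the forward link in the person's description and the backlink in the group's description. The table names, the ex: predicates and the <small>URI</small> pattern are all assumptions.
</p>

```python
import sqlite3

BASE = "http://example.org/db/"   # hypothetical URI space for rows


def row_uri(table, key):
    return f"{BASE}{table}/{key}#it"


def setup():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE grp (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT,
                             grp TEXT REFERENCES grp(id));
        INSERT INTO grp VALUES ('dig', 'DIG');
        INSERT INTO person VALUES ('tim', 'Tim', 'dig'), ('alice', 'Alice', 'dig');
    """)
    return db


def describe_person(db, pid):
    """Forward links: the person's own row."""
    me = row_uri("person", pid)
    triples = set()
    for name, gid in db.execute("SELECT name, grp FROM person WHERE id = ?", (pid,)):
        triples.add((me, "ex:name", name))
        triples.add((me, "ex:memberOf", row_uri("grp", gid)))
    return triples


def describe_group(db, gid):
    """The group's own row plus backlinks from the person table."""
    g = row_uri("grp", gid)
    triples = {(g, "ex:name", name) for (name,) in
               db.execute("SELECT name FROM grp WHERE id = ?", (gid,))}
    # Backlinks come from the same foreign-key column as the forward links,
    # so the two descriptions cannot disagree about who is a member.
    for (pid,) in db.execute("SELECT id FROM person WHERE grp = ?", (gid,)):
        triples.add((row_uri("person", pid), "ex:memberOf", g))
    return triples


db = setup()
print(sorted(describe_group(db, "dig")))
```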
<p>
When we have data from multiple sources, then we have
compromises. These are often settled by common sense,
asking the question,
</p>
<blockquote>
<p>
"If someone has the URI of that thing, what relationships
to what other objects is it useful to know about?"
</p>
</blockquote>
<p>
Sometimes, social questions determine the answer.
I have links in my <small>FOAF</small> file that I know
various people. They don't generally repeat that
information in their <small>FOAF</small> files. Someone may
say that they know me, which is an assertion which, in the
<small>FOAF</small> convention, is theirs to assert, and the
reader's to trust or not.
</p>
<p>
Other times, the number of arcs makes it impractical.
A <small>GPS</small> track gives thousands of times at which
my latitude and longitude are known. Every person loading my
<small>FOAF</small> file can expect to get my business card
information, but not all those trackpoints. It is reasonable
to have a pointer from the track (or even each point) to the
person whose position is represented, but not the other
way.
</p>
<p>
One pattern is to have links of a certain property in a
separate document. A person's homepage doesn't list
all their publications, but instead puts a link to a
separate document listing them. There is an
understanding that <span style=
"font-family: monospace;">foaf:made</span> gives a work of
some sort, but <span style=
"font-family: monospace;">foaf:pubs</span> points to a
document giving a list of works. Thus, someone
searching for a <span style=
"font-family: monospace;">foaf:made</span> link would do well
to follow a <span style=
"font-family: monospace;">foaf:pubs</span> link. It
might be useful to formalize the notion with a statement like
</p>
<pre>
foaf:made link:listDocumentProperty foaf:pubs.
</pre>
<p>
in one of the ontologies.
</p>
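<p>
A client could use such a statement roughly as follows: when looking for values of a property, it also follows any link of the corresponding list-document property and looks there. This is a sketch only; the mapping mirrors the hypothetical link:listDocumentProperty statement above, and fetch stands in for dereferencing a <small>URI</small> and parsing the result into triples.
</p>

```python
# Hypothetical, mirroring the statement above:
#   foaf:made link:listDocumentProperty foaf:pubs.
LIST_DOCUMENT_PROPERTY = {"foaf:made": "foaf:pubs"}


def find_values(subject, prop, fetch):
    """Values of `prop` for `subject`, also looking in list documents.

    fetch(uri) returns the set of triples obtained by looking up uri.
    """
    triples = fetch(subject)
    values = {o for s, p, o in triples if s == subject and p == prop}
    list_prop = LIST_DOCUMENT_PROPERTY.get(prop)
    if list_prop:
        for s, p, doc in triples:
            if s == subject and p == list_prop:
                values |= {o2 for s2, p2, o2 in fetch(doc)
                           if s2 == subject and p2 == prop}
    return values


ME = "http://example.org/tim#i"
web = {   # a made-up web: the homepage data and a separate list document
    ME: {(ME, "foaf:made", "http://example.org/work1"),
         (ME, "foaf:pubs", "http://example.org/tim-pubs")},
    "http://example.org/tim-pubs": {(ME, "foaf:made", "http://example.org/work2")},
}
print(find_values(ME, "foaf:made", lambda uri: web.get(uri, set())))
```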
<h3>
Query services
</h3>
<p>
Sometimes the sheer volume of data makes serving it as lots
of files possible, but cumbersome for efficient remote
queries over the dataset. In this case, it seems
reasonable to provide a <small>SPARQL</small> query service.
To make the data be effectively linked, someone who
only has the <small>URI</small> of something must be
able to find their way to the <small>SPARQL</small>
endpoint.
</p>
<p>
Here again the <small>HTTP</small> 303 response can be used,
to refer the enquirer to a document with metadata about
which query service endpoints can provide what information
about which classes of <small>URI</small>s.
</p>
<p>
Vocabularies for doing this have not yet been
standardized.
</p>
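<p>
Whatever vocabulary ends up pointing clients at the endpoint, the query itself travels over ordinary <small>HTTP</small>: the <small>SPARQL</small> Protocol lets a client send the query as the query parameter of a GET request. A minimal sketch that only builds the request; the endpoint <small>URL</small> is hypothetical.
</p>

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Everything the endpoint knows about one URI (hypothetical endpoint).
QUERY = ('SELECT ?p ?o WHERE { ?s ?p ?o '
         'FILTER(STR(?s) = "http://example.org/smith#carol") }')
req = sparql_request("http://example.org/sparql", QUERY)
print(req.full_url)
# Sending it would be: urllib.request.urlopen(req).read()
```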
<h2>Is your Linked Open Data 5 Star?</h2>
<p>
(Added 2010.)
This year, in order to encourage people, especially
government data owners, along the road
to good linked data, I have developed this star rating system.
</p><p>
Linked Data is defined above. Linked <em>Open</em> Data (LOD) is
Linked Data which is released under an open licence, which
does not impede its reuse for free. Creative Commons CC-BY is an example of an
open licence, as is the UK's <a href="http://www.nationalarchives.gov.uk/doc/open-government-licence/">
Open Government Licence</a>.
Linked Data does not, of course, in general have to be open: there is a
lot of important use of linked data internally, and for personal and group-wide
data. You can have 5-star Linked Data without it being open.
However, if it claims to be Linked Open Data, then it does have to be open
to get any star at all.
</p>
<p>
Under the star scheme, you get one (big!) star if the information
has been made public at all, even if it is a photo of a scan of
a fax of a table, as long as it has an open licence.
Then you get more stars as you make it progressively more
powerful and easier for people to use.
</p>
<style>
.stars {color: gold; font-size: 18pt; text-align: right; margin-right: 20px;}
</style>
<table>
<tr>
<td class="stars">★</td>
<td>Available on the web (whatever format) <i>but with an open licence, to be Open Data</i></td>
</tr>
<tr>
<td class="stars">★★</td>
<td>Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)</td>
</tr>
<tr>
<td class="stars">★★★</td>
<td>As (2), plus non-proprietary format (e.g. CSV instead of Excel)</td>
</tr>
<tr>
<td class="stars">★★★★</td>
<td>All the above, plus: use open standards from W3C (RDF and SPARQL)
to identify things, so that people can point at your stuff</td>
</tr>
<tr>
<td class="stars">★★★★★</td>
<td>All the above, plus: Link your data to other people’s data to provide context</td>
</tr>
</table>
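<p>
Because each star builds on the ones before it, the scheme can be read as a cumulative checklist. A sketch in Python; the field names are hypothetical labels for the table rows, and claim_open reflects the rule above that data claiming to be Linked Open Data gets no stars without an open licence.
</p>

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical flags, one per row of the table above.
    on_web: bool
    open_licence: bool
    structured: bool       # machine-readable, e.g. a spreadsheet, not a scan
    non_proprietary: bool  # e.g. CSV rather than a vendor format
    uses_uris: bool        # things named with URIs, served as RDF/SPARQL
    links_out: bool        # linked to other people's data


def stars(d, claim_open=True):
    """Star rating; each star counts only if all earlier ones are earned."""
    if not d.on_web or (claim_open and not d.open_licence):
        return 0
    count = 1
    for earned in (d.structured, d.non_proprietary, d.uses_uris, d.links_out):
        if not earned:
            break
        count += 1
    return count


csv_file = Dataset(True, True, True, True, False, False)
print(stars(csv_file))   # 3
```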
<p>
How well does your data do? You can buy <a href="http://www.cafepress.co.uk/w3c_shop.480759174">
5 star data mugs</a>, T-shirts
and bumper stickers from the W3C shop at cafepress: use them to
get your colleagues and fellow conference-goers thinking about 5 star linked data.
(Profits also help W3C :-).
</p>
<p>
Now in 2010, people have been pressing me, for government data,
to add a new requirement: that there should be metadata about the
data itself, and that that metadata should be
available from a major catalog. Any open dataset (or even datasets which are not
but should be open) can be registered at ckan.net.
Government datasets from the UK and US should be registered at
data.gov.uk or data.gov respectively.
I expect other countries to develop their own registries.
Yes, there should be metadata about your dataset.
That may be the subject of a new note in this series.
</p>
<h2>
Conclusion
</h2>
<p>
Linked data is essential to actually connect the semantic
web. It is quite easy to do with a little thought, and
becomes second nature. Various common sense
considerations determine when to make a link and when not to.
</p>
<p>
The <a href=
"http://dig.csail.mit.edu/2005/ajar/ajaw/tab">Tabulator</a>
client (running in a suitable browser) allows you to
browse linked data using the above conventions, and can be
used to check that your linked data works.
</p>
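<p>
The kind of check a linked data browser makes possible can also be scripted: start from one <small>URI</small>, dereference it, and follow every <small>HTTP</small> <small>URI</small> in the data, noting the documents that cannot be fetched. This is not how Tabulator works internally; it is a hypothetical sketch in which fetch stands in for an <small>HTTP</small> GET plus an <small>RDF</small> parser, and a dictionary plays the part of the web.
</p>

```python
from collections import deque
from urllib.parse import urldefrag


def check_links(start, fetch, limit=100):
    """Breadth-first walk of linked data starting from `start`.

    fetch(doc_uri) returns the set of triples served at doc_uri, or None
    when the lookup fails. Returns the set of documents that failed.
    At most `limit` documents are fetched.
    """
    broken, seen, queue = set(), set(), deque([start])
    while queue and limit > len(seen):
        doc = urldefrag(queue.popleft()).url
        if doc in seen:
            continue
        seen.add(doc)
        triples = fetch(doc)
        if triples is None:
            broken.add(doc)
            continue
        for triple in triples:
            for term in triple:
                if term.startswith(("http://", "https://")):
                    queue.append(term)
    return broken


# A made-up web of two documents, one of which links to a missing one.
web = {
    "http://example.org/jones": {
        ("http://example.org/jones#denise", "ex:child",
         "http://example.org/smith#carol")},
    "http://example.org/smith": {
        ("http://example.org/smith#carol", "ex:sibling",
         "http://example.org/gone#x")},
}
print(check_links("http://example.org/jones#denise", web.get))
# -> {'http://example.org/gone'}
```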
<p>
References
</p>
<p>
[Ding2005] Li Ding et al., <a href=
"http://ebiquity.umbc.edu/paper/html/id/240/"><span style=
"font-style: italic;">Tracking RDF Graph Provenance using RDF
Molecules</span></a>, UMBC Tech Report TR-CS-05-06
</p>
<hr />
<h2>
Followup
</h2>
<p>
2006-02 Rob Crowell adapts Dan Connolly's DBView (2004) which
maps SQL data into linked RDF, adding backlinks.
</p>
<p>
2006-09-05 Chris Bizer et al adapt <a href=
"http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/">D2R
Server</a> to provide a linked data view of a database.
</p>
<p>
2006-10-10 Chris Bizer et al produce the <a href=
"http://sites.wiwiss.fu-berlin.de/suhl/bizer/ng4j/semwebclient/">
Semantic Web Client Library</a>, "Technically, the library
represents the Semantic Web as a single Jena RDF graph or
Jena Model." The code fetches web documents as needed to
answer queries.
</p>
<p>
2007-01-15 Yves Raimond has produced a <a href=
"http://moustaki.org/swic/">Semantic Web client for SWI
Prolog</a> with similar functionality.
</p>
<p>I gave a talk at the 2009 O'Reilly eGovernment 2.0 conference
in Washington DC about "Just a Bag of Chips" @@ref, and about the 5 star scheme.
Following that, the InkDroid blog posted a summary (with CSS) of the 5 star scheme,
which is adapted here.
</p>
<hr />
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
<p>
<a href="../People/Berners-Lee">Tim BL</a>
</p>
</body>
</html>