<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xml:lang="en-US" xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<head>
<title>W3C RDB2RDF Incubator Group Report</title>
<link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-XGR" />
</head>
<body>
<div class="head">
<p><a href="http://www.w3.org/"><img src="http://www.w3.org/Icons/w3c_home" alt="W3C" height="48" width="72" /></a><a href="http://www.w3.org/2005/Incubator/XGR/"><img height="48" width="160" alt="W3C Incubator Report" src="http://www.w3.org/2005/Incubator/images/XGR" /></a></p>
<h1><a name="title" id="title"></a>W3C RDB2RDF Incubator Group Report</h1>
<h2><a name="w3c-doctype" id="w3c-doctype"></a>W3C Incubator Group Report 26 January 2009</h2>
<dl>
<dt>
This version:
</dt>
<dd>
<a href="http://www.w3.org/2005/Incubator/rdb2rdf/XGR-rdb2rdf-20090126/">http://www.w3.org/2005/Incubator/rdb2rdf/XGR-rdb2rdf-20090126/</a>
</dd>
<dt>
Latest version:
</dt>
<dd>
<a href="http://www.w3.org/2005/Incubator/rdb2rdf/XGR-rdb2rdf/">http://www.w3.org/2005/Incubator/rdb2rdf/XGR-rdb2rdf/</a>
</dd>
<dt>
Previous version:
</dt>
<dd>
This is the first public version
</dd>
<dt>
Editor:
</dt>
<dd>
Ashok Malhotra, Oracle
</dd>
</dl>
<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a>
© <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup>
(<a href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>,
<a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,
<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
<a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.</p>
</div>
<hr />
<h2><a name="abstract" id="abstract"></a>Abstract</h2>
<p>This is the final report from the RDB2RDF Incubator Group (XG).
The XG recommends that the W3C initiate a Working Group (WG) to standardize a language for
mapping Relational Database schemas into RDF and OWL. </p>
<h2><a id="status" name="status">Status of This Document</a></h2>
<p><em>This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of <a href="/2005/Incubator/XGR/">Final Incubator Group Reports</a> is available. See also the <a href="http://www.w3.org/TR/">W3C technical reports index</a> at http://www.w3.org/TR/.</em></p>
<p>Publication of this document by W3C as part of the <a href="http://www.w3.org/2005/Incubator/">W3C Incubator Activity</a> indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. Participation in Incubator Groups and publication of Incubator Group Reports at the W3C site are benefits of <a href="http://www.w3.org/Consortium/join">W3C Membership</a>.</p>
<p>Incubator Groups have as a <a href="http://www.w3.org/2005/Incubator/procedures.html#Patent">goal</a> to produce work that can be implemented on a Royalty Free basis, as defined in the W3C Patent Policy. Participants in this Incubator Group have made no statements about whether they will offer licenses according to the <a href="http://www.w3.org/Consortium/Patent-Policy-20030520.html#sec-Requirements">licensing requirements of the W3C Patent Policy</a> for portions of this Incubator Group Report that are subsequently incorporated in a W3C Recommendation.</p>
<p>This is the final recommendation from the RDB2RDF XG.</p>
<div class="toc">
<h2 class="notoc">
<a id="contents" name="contents">Table of Contents</a>
</h2>
<ul id="toc" class="toc">
<li class="tocline"><a href="#recommendation"><b>1. Recommendation</b></a>
<ul class="toc">
<li class="tocline"><a href="#usecases"><b>1.1 Use Cases</b></a>
<ul class="toc">
<li class="tocline"><a href="#biomedical">1.1.1 Integrating Databases to Research Nicotine Dependency</a></li>
<li class="tocline"><a href="#triplify">1.1.2 Triplify: Exposing Relational Data on the Web</a></li>
<li class="tocline"><a href="#enterprise">1.1.3 Integration of Enterprise Information Systems</a></li>
<li class="tocline"><a href="#ordnance">1.1.4 Ordnance Survey Use Case</a></li>
</ul>
</li>
<li class="tocline"><a href="#liaisons"><b>1.2 Liaisons</b></a></li>
<li class="tocline"><a href="#startingpoints"><b>1.3 Starting Points</b></a></li>
</ul>
</li>
<li class="tocline"><a href="#references"><b>References</b></a></li>
<li class="tocline"><a href="#acknowledgments"><b>Acknowledgments</b></a></li>
</ul>
</div>
<hr /><div class="body"><div class="div1">
<h2><a name="recommendation" id="recommendation"></a>1 Recommendation</h2><p>
The RDB2RDF XG recommends that the W3C initiate a Working Group (WG) to
standardize a language for mapping Relational Database schemas into RDF and
OWL. Such a standard will enable the vast amounts of data stored in
Relational databases to be published easily and conveniently on the Web. It
will also facilitate integrating data from separate Relational databases and
adding semantics to Relational data.</p><p>This recommendation is based on a survey of the state of the art
conducted by the XG <a href="#StateOfArt">[StateOfArt]</a> as well as on the use cases
discussed below.</p><p>The mapping language defined by the WG would facilitate the development of
several types of products. It could be used to translate Relational data into
RDF to be stored in a triple store, an approach sometimes called
Extract-Transform-Load (ETL).
Alternatively, it could be used to define a virtual mapping that is queried
using SPARQL, with each SPARQL query translated into SQL queries on the
underlying Relational data. Other products could be layered on top of these
capabilities to query and deliver the data in different ways and to integrate
it with other kinds of information on the Semantic Web.</p><p>The mapping language should be complete with respect to the
relational algebra. It should have a human-readable syntax as well as XML and
RDF representations of that syntax for purposes of discovery and machine
generation.</p><p>There is a strong suggestion that the mapping language be expressed in
rules as defined by the W3C <a href="#RIF">[RIF]</a> WG. The syntax need not
follow the RIF syntax, but there should be a round-trippable mapping
between the mapping language and a RIF dialect.
The output of the mapping should be defined in terms of an RDFS/OWL
schema.</p><p>
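A rough sketch of the Extract-Transform-Load route described above is shown here in Python, using an in-memory SQLite database. The <code>employee</code> table, its columns, the base URI <code>http://example.org/db/</code> and the convention of mapping a table to a class and each column to a property are all assumptions made for illustration. None of them is part of any proposed mapping language, and a Python set stands in for a real triple store.</p>
<pre>
```python
import sqlite3

# Hypothetical base URI for minted resources and vocabulary terms.
BASE = "http://example.org/db/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def extract(conn, table):
    """Extract: yield every row of `table` as a dict keyed by column name.
    The table name is interpolated directly, so it must be trusted."""
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    for row in cur:
        yield dict(zip(cols, row))


def transform(table, key, row):
    """Transform: turn one row into (subject, predicate, object) triples,
    using an assumed table-as-class, column-as-property convention."""
    subject = f"{BASE}{table}/{row[key]}"
    triples = [(subject, RDF_TYPE, f"{BASE}vocab/{table}")]
    for col, value in row.items():
        if value is not None:  # an SQL NULL yields no triple
            triples.append((subject, f"{BASE}vocab/{table}#{col}", value))
    return triples


def load(store, triples):
    """Load: add triples to the store (a plain set in this sketch)."""
    store.update(triples)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Alice", "Sales"), (2, "Bob", None)])

store = set()
for row in extract(conn, "employee"):
    load(store, transform("employee", "id", row))
```
</pre>
<p>The second row yields only three triples because its NULL <code>dept</code> value is skipped. A virtual mapping would instead leave the rows in the database and translate each SPARQL query into SQL at query time, so the data never has to be copied.</p><p>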
It should be possible to define subsets of the language for simple applications such as
Web 2.0 sites. This feature of the language will be validated by creating a library
of mappings for widely used applications such as Drupal, WordPress and phpBB.</p><p>
The mapping language will allow customization with regard to names and data
transformation. In addition, the language must be able to expose vendor-specific
SQL features, such as full-text and spatial support, as well as vendor-defined
datatypes.</p><p>
The final language specification should include guidance with regard to
mapping Relational data to a subset of OWL such as OWL 2 QL or OWL 2 RL.</p><p>
The language must provide a mechanism for creating identifiers for database
entities. The generation of identifiers should be designed to support the
implementation of the Linked Data principles <a href="#LinkedData">[LinkedData]</a>. Where
possible, the language will encourage the reuse of public identifiers for
long-lived entities such as persons, corporations and geographic locations; see
<a href="#liaisons"><b>1.2 Liaisons</b></a>.</p><p>
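The following Python sketch shows one way such identifiers might be minted. It builds an HTTP URI from the table name and the row's primary-key values, and it substitutes a shared public identifier when one is known. The base URI, the <code>col=value;...</code> path layout, the <code>PUBLIC_IDS</code> registry and the function name <code>row_uri</code> are hypothetical. A real deployment might instead consult a naming service such as those listed under Liaisons.</p>
<pre>
```python
from urllib.parse import quote

BASE = "http://example.org/db/"  # hypothetical base URI

# Hypothetical registry of public identifiers for long-lived entities,
# keyed by (table, primary-key values as strings).
PUBLIC_IDS = {
    ("country", ("GB",)): "http://example.org/public/country/GB",
}


def row_uri(table, key_cols, row):
    """Mint a stable URI for a row from its table and primary-key values."""
    key = tuple(str(row[c]) for c in key_cols)
    # Reuse a shared public identifier when one is known for this entity.
    if (table, key) in PUBLIC_IDS:
        return PUBLIC_IDS[(table, key)]
    # Percent-encode names and values so spaces, "/" or ";" cannot break
    # the URI; the parts of a composite key are joined with ";".
    parts = ";".join(f"{quote(c, safe='')}={quote(v, safe='')}"
                     for c, v in zip(key_cols, key))
    return f"{BASE}{quote(table, safe='')}/{parts}"


print(row_uri("order_line", ["order_id", "line_no"], {"order_id": 17, "line_no": 2}))
# http://example.org/db/order_line/order_id=17;line_no=2
print(row_uri("city", ["name"], {"name": "Milton Keynes"}))
# http://example.org/db/city/name=Milton%20Keynes
print(row_uri("country", ["code"], {"code": "GB"}))
# http://example.org/public/country/GB
```
</pre>
<p>Because the URI depends only on the primary key, the same row receives the same identifier whether the data is materialized or mapped virtually. This stability is what lets other data sets link to it.</p><p>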
The proposed Working Group will also create a set of test cases that could be
used to verify conformance.</p><div class="div2">
<h3><a name="usecases" id="usecases"></a>1.1 Use Cases</h3><p>To bootstrap exploitation of the Web as a globally accessible linked
database, we need a few essentials:</p><ul><li>Web-accessible data needs to increase in granularity and
cross-linkage.</li><li>Web applications and solutions must produce structured, interlinked data as
extensions of existing functionality.</li><li>Web users must be shielded from the underlying complexity of injecting
structured linked data into the Web.</li></ul><div class="div3">
<h4><a name="biomedical" id="biomedical"></a>1.1.1 Integrating Databases to Research Nicotine Dependency</h4><p>
Complex biological queries generally require the integration of information
from several sources. To understand the genetic basis of nicotine dependence,
gene and pathway information needed to be integrated and three complex
biological queries answered using the integrated knowledge base. The gene
information source NCBI Entrez Gene, which holds gene-related records for about 2
million genes, needed to be integrated with pathway information sources such
as KEGG (Kyoto Encyclopedia of Genes and Genomes). Comparing results across
model organisms required homology information from NCBI HomoloGene,
which contains homology data for several completely sequenced eukaryotic
organisms.</p><p>
An ontology-driven approach was used to integrate the two gene resources
(Entrez Gene and HomoloGene) and the three pathway resources (KEGG, Reactome
and BioCyc). An OWL ontology called the Entrez Knowledge Model (EKoM) was
created for the gene resources and integrated with the extant BioPAX ontology
designed for pathway resources. The integrated schema was populated with data
from the pathway resources, which are publicly available in a BioPAX-compatible format,
and from the gene resources, for which a population procedure was created.
</p><p>
SPARQL was used to formulate queries to investigate the genetic basis of
nicotine dependence over the integrated knowledge base:</p>
<ul>
<li>Which genes participate in a large number of pathways?
</li>
<li>Which genes are "hub genes" from the perspective of gene interaction?
</li>
<li>Which genes are expressed in the brain, in the context of neurobiology of nicotine dependence and various neurotransmitters in the central nervous system?
</li>
</ul>
<p>
The approach proved successful: the queries readily identified hub genes,
i.e., genes whose gene products participate in many pathways or interact
with many other gene products. See
<a href="#NicoteneDependence">[NicotineDependence]</a> for details.</p>
</div>
<div class="div3">
<h4><a name="triplify" id="triplify"></a>1.1.2 Triplify: Exposing Relational Data on the Web</h4>
<p>In order to make the Semantic Web useful to ordinary Web users, RDF and OWL
have to be deployed on the Web on a much larger scale. Web applications such
as Content Management Systems, online shops and community applications (e.g.,
wikis, blogs, fora) already store their data in relational databases <a href="#triplifypaper">[Triplify]</a>. Providing a standardized way to map the relational data
structures behind these Web applications into RDF, RDF Schema and OWL will
facilitate broad deployment, enrich the Web with RDF data and ontologies,
and enable novel semantic browsing and search applications.</p><p>By supporting the long tail of Web applications, and thus counteracting the
centralization of Web 2.0 applications, the planned RDB2RDF standard
will help give control over data back to end users and thus promote a
democratization of the Web.</p><p>To support this use case, the mapping language should be easy to
implement for lightweight Web applications and have a shallow learning
curve to foster early adoption by Web developers.</p></div><div class="div3">
<h4><a name="enterprise" id="enterprise"></a>1.1.3 Integration of Enterprise Information Systems</h4><p>
Efficient information and data exchange between application systems within and
across enterprises is of paramount importance in an increasingly networked
and IT-dominated business environment. Existing Enterprise Information Systems
such as CRM, CMS and ERP systems use Relational database backends for
persistence. RDF and Linked Data can provide data exchange and integration
interfaces for such systems that are easy to implement and use,
especially in settings where loose and flexible coupling of the systems is
required.</p><p>Insight can often be gained by integrating data from databases built for
different purposes in separate corporate silos. For example, integrating data
from a bug database with a customer database may help explain ordering
behavior as a function of the bugs customers encountered.</p><p>
In Supply Chain Management (SCM), for example, it is vital to exchange product
catalogs and other goods-related information within a network of
interconnected businesses involved in the ultimate provision of product and
service packages. Such information is stored in relational databases and
sometimes already exchanged electronically, but a variety of different
technologies are used (e.g., proprietary files, XML files, DB dumps, Web
services). Realizing a completely electronic information flow requires
significant initial investment and currently limits the flexibility of
businesses (e.g., with regard to changes in business partners). The envisioned
RDB2RDF mapping language, applied in conjunction with existing RDB-based SCM
systems, will support the use of RDF and unique identifiers for realizing
flexible information flows accompanying supply chains.
</p><p>
The mapping language to be standardized by the proposed WG will simplify the
publishing of enterprise data and information from Relational data backends
and, thus, facilitate the interlinking and exchange of information between
business information systems. In this scenario, on-demand transformation of
relational data to RDF, scalability and completeness with respect to the
relational algebra are central requirements.</p></div>
<div class="div3">
<h4><a name="ordnance" id="ordnance"></a>1.1.4 Ordnance Survey Use Case</h4>
<p>
Ordnance Survey, the national mapping agency of the UK, operates a very
large geographical information system based on Oracle Spatial.
The database contains topographical features, soil type and land use
information. These types of information are independently
maintained and use separate terminologies. They describe the same land area,
but the boundaries of the objects used to represent land
use, soil type and topography do not coincide: for example, a pasture
might span two distinct types of soil.</p><p>One application that requires integrating this information is modeling the
filtration of pollutants from agricultural land into water bodies. The soil
type determines the degree of filtration, the land use determines the type
of pollutant, and the topography determines whether
a field is next to a water body.</p><p>An ontology exists for describing the types of objects in each database.
The benefit of mapping the data to RDF lies in simpler querying and
integration of the data. The very high volume of data makes an ETL
approach impractical; moreover, the Oracle Spatial database offers spatial
joins, which are generally not available in RDF stores.</p><p>
Thus, SPARQL queries expressed in terms of the land
use, soil type and topography ontologies must be converted into
single SQL statements, with all joining and filtering taking place in the
relational database. In the process, high-level concepts need to be
translated into SQL conditions on data that is not readily human-readable.</p><p>
Examples of business questions this use case should answer are:</p><ul>
<li>What is the total length of river bank bordered by permeable soil used for
grazing along a certain river?</li>
<li>What types of crops are being cultivated within 100 m of water, and what is the total
land use for each crop?</li>
<li>Which water bodies are subject to high environmental load from
agriculture, as indicated by little current and extensive use of adjacent
land?</li></ul><p>
From the viewpoint of RDB to RDF mapping, this use case highlights the need to
integrate data from different databases built for different purposes. It
also emphasizes the need for extensibility in the mapping language to support
RDBMS vendor-specific features. In the present case, Oracle expresses a spatial join
using a special type of derived table not found
in standard SQL, so the customization needed goes deeper than supporting
calls to native SQL functions.</p><p>
The inference requirement consists primarily of expanding class membership
into conjunctions and disjunctions (ANDs and ORs) of conditions on the relational data. In
some cases, these conditions are spatial, such as "borders on" or "is contained
in". Users should be familiar with the ontologies but
should not need to know the classification codes used in the
databases.
</p></div>
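<p>As a hypothetical illustration of this expansion, the Python sketch below rewrites an ontology class into a nested SQL boolean condition over classification codes, so the user names only the class. The class names, the <code>land_use</code> and <code>soil</code> columns and the codes are invented for the example and do not reflect the Ordnance Survey schema. Spatial conditions are left out because they would need vendor-specific SQL.</p>
<pre>
```python
# Each class is either a leaf condition on the data or an "and"/"or"
# combination of other classes. All names and codes are hypothetical.
CLASSES = {
    "Grazing": ("leaf", "land_use.code IN ('LU21', 'LU22')"),
    "Arable": ("leaf", "land_use.code = 'LU10'"),
    "PermeableSoil": ("leaf", "soil.code IN ('S3', 'S7')"),
    "Agricultural": ("or", ["Grazing", "Arable"]),
    "AgriculturalOnPermeableSoil": ("and", ["Agricultural", "PermeableSoil"]),
}


def expand(cls):
    """Recursively rewrite class membership as an SQL boolean expression."""
    kind, body = CLASSES[cls]
    if kind == "leaf":
        return body
    joiner = " AND " if kind == "and" else " OR "
    return "(" + joiner.join(expand(c) for c in body) + ")"


print(expand("AgriculturalOnPermeableSoil"))
# ((land_use.code IN ('LU21', 'LU22') OR land_use.code = 'LU10') AND soil.code IN ('S3', 'S7'))
```
</pre>
<p>The resulting expression would form the <code>WHERE</code> clause of the single SQL statement sent to the database, which keeps the filtering inside the RDBMS as the use case requires.</p>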
</div>
<div class="div2">
<h3><a name="liaisons" id="liaisons"></a>1.2 Liaisons</h3>
<p>
The WG must track the evolution of SPARQL and liaise with the Data Access Working Group (DAWG)
as well as the OWL WG. The proposed WG will also track work on assigning
unique identifiers to well-known entities, such as the Entity Name System (ENS) associated
with the OKKAM project
<a href="#okkam">[OKKAM]</a> and the Common Naming Project started by Neuro Commons
<a href="#CommonNaming">[Common Naming Project]</a>.</p></div>
<div class="div2">
<h3><a name="startingpoints" id="startingpoints"></a>1.3 Starting Points</h3><p>
The WG will take as its starting point the mapping languages developed by the
<a href="#D2RQ">[D2RQ]</a> and <a href="#Virtuoso">[Virtuoso]</a> efforts.</p></div></div>
<div class="div1">
<h2><a name="references" id="references"></a>References</h2>
<dl><dt class="label"><a name="CommonNaming" id="CommonNaming"></a>Common Naming Project</dt><dd>
<a href="http://neurocommons.org/page/Common_Naming_Project"><cite>Neuro Commons Common Naming Project
</cite></a>, Science Commons, Sept 17, 2008.
(See http://neurocommons.org/page/Common_Naming_Project.)</dd><dt class="label"><a name="D2RQ" id="D2RQ"></a>D2RQ</dt><dd>
<a href="http://www4.wiwiss.fu-berlin.de/bizer/D2RQ/spec/"><cite> The D2RQ Platform v0.5.1, User Manual and Language
Specification
</cite></a>, Chris Bizer, Richard Cyganiak, Jörg Garbers, Oliver Maresch.
(See http://www4.wiwiss.fu-berlin.de/bizer/D2RQ/spec/.)</dd><dt class="label"><a name="RIF" id="RIF"></a>RIF</dt><dd>
<a href="http://www.w3.org/2005/rules/wiki/RIF_Working_Group"><cite>W3C Rule Interchange Format Working Group</cite></a>
(See http://www.w3.org/2005/rules/wiki/RIF_Working_Group.)</dd><dt class="label"><a name="LinkedData" id="LinkedData"></a>LinkedData</dt><dd>
<a href="http://www.w3.org/DesignIssues/LinkedData.html"><cite>Design Issues for Linked Data</cite></a>, Tim Berners-Lee
(See http://www.w3.org/DesignIssues/LinkedData.html.)</dd>
<dt class="label"><a name="StateOfArt" id="StateOfArt"></a>StateOfArt</dt>
<dd><a href="http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf"><cite>Mapping Relational Data to RDF and OWL: A Literature
Survey</cite></a>, Satya Sahoo, Wolfgang Halb.
(See http://www.w3.org/2005/Incubator/rdb2rdf/RDB2RDF_SurveyReport.pdf.)</dd>
<dt class="label"><a name="okkam" id="okkam"></a>OKKAM</dt><dd>
<a href="http://www.okkam.org/"><cite>An Entity Name System (ENS)
for the Semantic Web</cite></a>, Paolo Bouquet, Heiko Stoermer, Barbara
Bazzanella, January 2008.
(See http://www.okkam.org/.)</dd>
<dt class="label"><a name="Virtuoso" id="Virtuoso"></a>Virtuoso</dt>
<dd>
<a href="http://virtuoso.openlinksw.com"><cite>Virtuoso Meta Schema Language</cite></a>.
(See http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSSQL2RDF.)
</dd>
<dt class="label"><a name="triplifypaper" id="triplifypaper"></a>Triplify
</dt><dd>
<a href="http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf"><cite>Triplify: Lightweight Linked Data Publication from Relational
Databases</cite></a>, Auer, Dietzold, Lehmann, Hellmann, Aumueller, submitted to WWW 2009.
(See http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf.)</dd><dt class="label"><a name="NicoteneDependence" id="NicoteneDependence"></a>NicotineDependence</dt><dd>
<a href="http://dx.doi.org/10.1016/j.jbi.2008.02.006"><cite>An ontology-driven semantic mashup of gene and biological
pathway information: Application to the domain of nicotine dependence
</cite></a>, Satya S. Sahoo, Olivier Bodenreider, Joni L. Rutter, Karen J.
Skinner and Amit P. Sheth.
(See http://dx.doi.org/10.1016/j.jbi.2008.02.006.)</dd></dl></div>
<h2>
<a id="acknowledgments" name="acknowledgments">Acknowledgments</a>
</h2>
<p>
The editor would like to thank the members of the RDB2RDF XG who have contributed to the ideas in this report.
We would also like to thank the guests who have come and presented their work to the RDB2RDF XG.
</p>
</div>
</body></html>