<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content=
    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
    <meta http-equiv="content-type" content=
    "text/html; charset=us-ascii" />
    <title>
      Relational Databases and the Semantic Web (in Design Issues)
    </title>
    <style type="text/css">
/*<![CDATA[*/

    .work {background-color: #FFFFC1}
    /*]]>*/
    </style>
    <link href="di.css" rel="stylesheet" type="text/css" />
  </head>
  <body bgcolor="#DDFFDD" text="#000000">
    <address>
      Tim Berners-Lee Created
      <p>
        <small>Date: September 1998.</small>
      </p>
    </address>
    <p>
      $Id: RDB-RDF.html,v 1.25 2009/08/27 21:38:09 timbl Exp $
    </p>
    <address>
      <p>
        Status: . Editing status: Comments please. A parenthetical
        discussion to the <a href="Architecture.html">Web
        Architecture at 50,000 feet</a> and the <a href=
        "Semantic.html">Semantic Web roadmap</a>.
      </p>
    </address>
    <p>
      <a href="Overview.html">Up to Design Issues</a>
    </p>
    <hr />
    <h1>
      Relational Databases on the Semantic Web
    </h1>
    <p>
      There are many other data models which RDF's Directed
      Labelled Graph (DLG) model compares closely with, and maps
      onto. See a summary in
    </p>
    <ul>
      <li>
        <a href="RDFnot.html">What the Semantic Web can
        represent</a>
      </li>
    </ul>
    <p>
      One is the Relational Database (RDB) model.
    </p>
    <h2>
      <a name="ER" id="ER">The Semantic Web and Entity-Relationship
      models</a>
    </h2>
    <p>
      Is the RDF model an entity-relationship model? Yes and no. It
      is great as a basis for ER-modelling, but because RDF is used
      for other things as well, RDF is more general. RDF is a model
      of entities (nodes) and relationships. If you are used to the
      "ER" modelling system for data, then the RDF model is
      basically an opening of the ER model to work on the Web. A
      typical ER model involves entity types, and for each entity
      type there is a set of relationships (slots in the typical
      ER diagram). The RDF model is the same, except that
      relationships are first class objects: they are identified by
      a URI, and so anyone can make one. Furthermore, the set of
      slots of an object is not defined when the class of an object
      is defined. The Web works through anyone being (technically)
      allowed to say anything about anything. This means that a
      relationship between two objects may be stored apart from any
      other information about the two objects. This is different
      from object-oriented systems often used to implement ER
      models, which generally assume that information about an
      object is stored in an object: the definition of the class of
      an object defines the storage implied for its properties.
    </p>
    <p>
      For example, one person may define a vehicle as having a
      number of wheels and a weight and a length, but not foresee a
      color. This will not stop another person making the assertion
      that a given car is red, using the color vocabulary from
      elsewhere.
    </p>
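    <p>
      As a rough illustration of the vehicle example, the sketch
      below holds statements from two independent sources as sets
      of (subject, property, value) triples and merges them. All
      URIs, vocabularies and values here are hypothetical and
      chosen only for the example.
    </p>

```python
# Hypothetical URIs throughout; the two "sources" stand in for two
# independent publishers who never coordinated with each other.
CAR = "http://www.example.com/cars/123XYZ#item"
VEHICLE = "http://vehicles.example.org/terms#"
COLOR = "http://colors.example.net/terms#"

# The first author defined wheels, weight and length, and never foresaw a color.
source_a = {
    (CAR, VEHICLE + "wheels", 4),
    (CAR, VEHICLE + "weight", 1200),
    (CAR, VEHICLE + "length", 4.2),
}

# A second author, elsewhere, asserts the color using a different vocabulary.
source_b = {
    (CAR, COLOR + "color", "red"),
}

# Merging is just set union: no class definition has to be changed.
merged = source_a | source_b

properties_of_car = sorted(p for (s, p, o) in merged if s == CAR)
print(properties_of_car)
```

    <p>
      Nothing in the first source had to anticipate the color
      property; the statement about color lives apart from the
      other statements until the two sets are combined.
    </p>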
    <p>
      Apart from this simple but significant change, many concepts
      involved in ER modelling carry across directly onto the
      Semantic Web model.
    </p>
    <h2>
      The Semantic Web and Relational Databases
    </h2>
    <p>
      The semantic web data model is very directly connected with
      the model of relational databases. A relational database
      consists of tables, which consist of rows, or records. Each
      record consists of a set of fields. The record is nothing but
      the content of its fields, just as an RDF node is nothing but
      the connections: the property values. The mapping is very
      direct:
    </p>
    <ul>
      <li>a record is an RDF node;
      </li>
      <li>the field (column) name is an RDF propertyType; and
      </li>
      <li>the record field (table cell) is a value.
      </li>
    </ul>
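    <p>
      The mapping above can be sketched in a few lines. This is
      only an illustration under stated assumptions: the base URI,
      the sample record, and the decision to skip empty cells are
      all hypothetical, and the URI pattern borrows the
      "employees/1234#item" and "employees#email" conventions from
      the 2002 table later in this document.
    </p>

```python
# Hypothetical base URI and record; none of these values come from a real database.
BASE = "http://www.example.com/personnel/employees"

def row_to_triples(primary_key, row):
    """Turn one record into (subject, property, value) triples."""
    subject = f"{BASE}/{primary_key}#item"      # the record becomes an RDF node
    triples = []
    for column, cell in row.items():
        if cell is None:
            continue  # an empty cell yields no triple (an assumption of this sketch)
        prop = f"{BASE}#{column}"               # the column name becomes the property
        triples.append((subject, prop, cell))   # the cell becomes the value
    return triples

employee = {"email": "joe@example.com", "age": 45, "shoe": None}
triples = row_to_triples(1234, employee)
for triple in triples:
    print(triple)
```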
    <p>
      Indeed, one of the main driving forces for the Semantic Web
      has always been the expression, on the Web, of the vast
      amount of relational database information in a way that can
      be processed by machines.
    </p>
    <p>
      RDF's serialization format -- its syntax in XML -- is a very
      suitable format for expressing relational database
      information.
    </p>
    <h3>
      Special aspects of the RDB model
    </h3>
    <p>
      Relational database systems manage RDF data, but in a
      specialized way. In a table, there are many records with the
      same set of properties. An individual cell (which corresponds
      to an RDF property) is not often thought of on its own. SQL
      queries can join tables and extract data from tables, and the
      result is generally a table. So, in practice, RDB software
      is typically optimized for doing operations with a small
      number of tables, some of which may have a large number of
      elements.
    </p>
    <p>
      A fundamental aspect of a database table is that often the
      data in a table can be definitive. Neither RDF nor RDB models
      have simple ways of expressing this. For example, not only
      does a row in a table indicate that there is a red car whose
      Massachusetts plate is "123XYZ", but the table may also carry
      the unwritten semantics that if any car has a Massachusetts
      plate then it must be in the table. (If any RDF node has a
      "Massachusetts plate number" property then that node is a
      member of the table.) The scope of the uniqueness of a value
      is in fact a very interesting property.
    </p>
    <p>
      The original RDB model defined by E.F. Codd included
      datatyping with inheritance, which he had intended would be
      implemented in the RDB products to a greater extent than it
      has. For example, typically a person's home address house
      number may be typed as an integer, and their shoe size may
      also be typed as an integer. One can as a result join
      two tables through those fields, or list people whose shoe
      size equals their house number. Practical RDB systems leave
      it to the application builder to only make operations which
      make sense. Once a database is exported onto the Web, it
      becomes possible to do all kinds of strange combinations, so
      a stronger typing becomes very useful: it becomes a set of
      inference rules.
    </p>
    <p>
      In a pure RDB model, every table has a primary key: a column
      whose value can be used to uniquely identify every row. Some
      products do not enforce this, leading to an ambiguity in the
      significance of duplicate rows. A curious feature is that the
      primary key can be changed without changing the identity of a
      row. (A person can change their name for example). SQL allows
      tables to be set up so that such changes can cascade through
      the local system to preserve referential integrity. This
      clearly won't work on the Web. One solution is to use a row
      ID -- which many systems do in fact use although SQL doesn't
      expose it in a standard way. Another is for the application
      to constrain the primary key not to change. Another is to
      put up with links breaking.
    </p>
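    <p>
      A minimal sketch of the cascading behaviour, using SQLite
      through Python's standard <code>sqlite3</code> module. The
      table names, the data, and the external URI are hypothetical;
      the point is only that the cascade reaches references inside
      the database and nothing beyond it.
    </p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE people (name TEXT PRIMARY KEY);
    CREATE TABLE likes (
        person TEXT REFERENCES people(name) ON UPDATE CASCADE,
        music  TEXT
    );
    INSERT INTO people VALUES ('Joe Smith');
    INSERT INTO likes  VALUES ('Joe Smith', 'blues');
""")

# The person changes their name, which here is also the primary key.
conn.execute("UPDATE people SET name = 'Joe Jones' WHERE name = 'Joe Smith'")

# Inside the database the reference follows the change.
local_ref = conn.execute("SELECT person FROM likes").fetchone()[0]
print(local_ref)

# A URI minted from the old key and held by someone else is outside the
# database's reach, so nothing updates it (hypothetical URI).
external_link = "http://www.example.com/personnel/people/Joe%20Smith#item"
print(external_link)
conn.close()
```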
    <p>
      RDB systems have datatypes at the atomic (unstructured)
      level, as RDF and XML will/do. Combination rules tend in RDBs
      to be loosely enforced, in that a query can join tables by
      any columns which match by datatype -- without any check on
      the semantics. You could for example create a list of houses
      that have the same number of rooms as an employee's shoe
      size, for every employee, even though the sense of that would
      be questionable.
    </p>
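    <p>
      The shoe-size join can be tried directly. The sketch below
      uses SQLite via Python's <code>sqlite3</code> module with
      made-up tables and data; it is meant only to show that the
      database accepts the join because both columns are integers.
    </p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emps   (empno   INTEGER PRIMARY KEY, name TEXT, shoe INTEGER);
    CREATE TABLE houses (houseno INTEGER PRIMARY KEY, rooms INTEGER);
    INSERT INTO emps   VALUES (1, 'Joe', 9), (2, 'Ann', 7);
    INSERT INTO houses VALUES (10, 9), (11, 5), (12, 7);
""")

# Both columns are INTEGER, so the join is accepted even though comparing
# a shoe size with a room count has no sensible meaning.
rows = conn.execute("""
    SELECT emps.name, houses.houseno
    FROM emps JOIN houses ON emps.shoe = houses.rooms
    ORDER BY emps.empno
""").fetchall()
print(rows)
conn.close()
```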
    <p>
      The new SQL99 standard is going to include new
      object-oriented features, such as inherited typing and
      structured contents of cells - arrays and structs. This
      extends the RDB model with things from the OO world. I don't
      deal with that here in that the RDF model works as a lowest
      common denominator being able to express either and both.
    </p>
    <h3>
      Schemas and Schemas
    </h3>
    <p>
      A difference between XML/RDF schemas (and SGML) on the one
      hand and database schemas on the other is the expectation
      that there will be a relatively small number of XML/RDF
      schemas. Many web sites will export documents whose structure
      is defined by the same schema, and this is in fact what
      provides the interoperability.
    </p>
    <p>
      A database schema is, as far as I know, created
      independently for each database. Even if a million companies
      clone the same form of employee database, there will be a
      million schemas, one for each database.
    </p>
    <p>
      It may be that RDF will fill a simple role in simply
      expressing the equivalence of the terms in each database
      schema.
    </p>
    <h3>
      Exposing a database on the Web
    </h3>
    <p>
      In order to be able to access a table, and make extra
      statements about it which will enable its use in more and
      more ways, the essential objects of the table must be
      exported as first class objects on the Web.
    </p>
    <p>
      When mapping any system onto the Web, the mapping into URI
      space is critical. Here we are doing this common operation
      generically for all relational databases. It would obviously
      be useful for this to be done in a consistent way across
      multiple vendors - an area for possible
      standardization.
    </p>
    <p>
      Here is a random example I may have gotten wrong, based on
      what I understand of the naming within databases. The database
      itself is defined within a schema which is listed in a
      catalog.
    </p>
    <table border="1">
      <caption>
        Mapping an RDB into the Web - strawman
      </caption>
      <tbody>
        <tr>
          <td>
            Catalog
          </td>
          <td>
            http://www.acme.com/mycat
          </td>
          <td></td>
        </tr>
        <tr>
          <td>
            Schema
          </td>
          <td>
            http://www.acme.com/mycat/schema1
          </td>
          <td></td>
        </tr>
        <tr>
          <td>
            Database
          </td>
          <td>
            http://www.acme.com/mycat/schema1/empdb/
          </td>
          <th>
            Relative:
          </th>
        </tr>
        <tr>
          <td>
            Table
          </td>
          <td>
            /mycat/schema1/empdb/emps
          </td>
          <td>
            emps
          </td>
        </tr>
        <tr>
          <td>
            Column name
          </td>
          <td>
            /mycat/schema1/empdb/emps/shoe
          </td>
          <td>
            emps/shoe
          </td>
        </tr>
        <tr>
          <td>
            View
          </td>
          <td>
            /mycat/schema1/empdb/emps2
          </td>
          <td>
            emps2
          </td>
        </tr>
        <tr>
          <td>
            Row
          </td>
          <td>
            /mycat/schema1/empdb/emps/rowid=123
          </td>
          <td>
            emps/rowid=123
          </td>
        </tr>
        <tr>
          <td>
            Cell
          </td>
          <td>
            /mycat/schema1/empdb/emps/rowid=123;col=shoe
          </td>
          <td>
            emps/rowid=123;col=shoe
          </td>
        </tr>
        <tr>
          <td>
            Arbitrary query
          </td>
          <td>
            /mycat/schema1/empdb/?select+empno+from<em>[...]</em>
          </td>
          <td>
            ?select<em>[...]</em>
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      2002 version, see <a href=
      "http://www.w3.org/2000/10/swap/dbork/dbview.py">real
      code</a> implemented by Dan Connolly:
    </p>
    <table border="1">
      <tbody>
        <tr>
          <th>
            <a name="table" id="table">What</a>
          </th>
          <th>
            Uriref relative to http://www.acme.com/wherever/
          </th>
          <th>
            rdf:type
          </th>
        </tr>
        <tr class="work">
          <td>
            <p>
              Database description of database "personnel"
            </p>
          </td>
          <td>
            personnel
            <p>
              (say - whatever)
            </p>
          </td>
          <td>
            soc:Work, rdfdocument, db:DatabaseDescription
          </td>
        </tr>
        <tr>
          <td>
            The conceptual database (a table of tables??)
          </td>
          <td>
            personnel#_database
            <p>
              (Arbitrary, must not clash, linked by
              <code><strong>db:describes</strong></code> from
              personnel)
            </p>
          </td>
          <td></td>
        </tr>
        <tr class="work">
          <td>
            A document giving all the data in the database. May
            support PUT?
          </td>
          <td>
            personnel/_data
            <p>
              (Arbitrary, must not clash with table names, linked
              by <strong><code>db:allData</code></strong> from
              personnel)
            </p>
          </td>
          <td>
            soc:Work, rdfdocument
          </td>
        </tr>
        <tr>
          <td>
            The concept of the table "employees": The class of
            exactly those things which are in the table.
          </td>
          <td>
            <p>
              personnel/employees#.table
            </p>
            <p>
              (was: personnel#employees, but changed to allow it to
              be deref'd to give useful data)
            </p>
            <p>
              (defined in personnel)
            </p>
          </td>
          <td>
            rdfs:Class, db:Table
          </td>
        </tr>
        <tr class="work">
          <td>
            A description of the table. Optimization: includes the
            current size of the table. Identifies primary key if
            any.
          </td>
          <td>
            personnel/employees
            <p>
              (<strong>Convention</strong>. The bit of the
              classname before the #)
            </p>
          </td>
          <td>
            soc:Work, rdfdocument, db:TableDescription
          </td>
        </tr>
        <tr class="work">
          <td>
            A description of all the tables. Just an (optional)
            optimization.
          </td>
          <td>
            personnel/_all
            <p>
              (Arbitrary, must not clash, linked by
              <code><strong>db:tableSchemas</strong></code> from
              personnel/employees)
            </p>
          </td>
          <td>
            soc:Work, rdfdocument, db:TableDescription
          </td>
        </tr>
        <tr>
          <td>
            The concept of a column in the table, the Property
            something has iff that is recorded in the table.
          </td>
          <td>
            personnel/employees#email
            <p>
              (Defined in personnel/employees)
            </p>
          </td>
          <td>
            rdf:Property, db:Column
          </td>
        </tr>
        <tr class="work">
          <td>
            A document giving all the data in the table. May
            support PUT
          </td>
          <td>
            personnel/employees/_data
            <p>
              (Arbitrary, must not clash, linked by
              <strong><code>db:tableData</code></strong> from
              personnel/employees)
            </p>
          </td>
          <td>
            soc:Work, rdfdocument,
          </td>
        </tr>
        <tr class="work">
          <td>
            A document giving the data in the row for which the
            primary key is 1234. (Iff primary key exists). May
            support PUT
          </td>
          <td>
            personnel/employees/1234
            <p>
              (<strong>Convention.</strong> Note the primary key
              value must be encoded suitably!)
            </p>
          </td>
          <td>
            soc:Work, rdfdocument
          </td>
        </tr>
        <tr>
          <td>
            The concept of the thing described by that row.
          </td>
          <td>
            <p>
              personnel/employees/1234#item
            </p>
            <p>
              (<strong>Convention</strong>)
            </p>
            <p>
              (when primary key exists, then employees#_data etc
              use this URIref for the item 1234 instead of making
              anonymous nodes)
            </p>
            <p>
              (employees/_data#1234?@@)
            </p>
          </td>
          <td>
            personnel/employees#_Class
          </td>
        </tr>
        <tr class="work">
          <td>
            A document giving the information in just one cell
          </td>
          <td>
            personnel/employees/1234/email
            <p>
              (<strong>Convention</strong>)
            </p>
          </td>
          <td>
            [ is rdf:domain of personnel/employees#email ]
          </td>
        </tr>
        <tr class="work">
          <td>
            Arbitrary query
          </td>
          <td>
            personnel/_sql?select+empno+from<em>[...]</em>
            <p>
              (arbitrary, linked by
              <code><strong>db:sqlService</strong></code> from
              personnel if supported.)
            </p>
          </td>
          <td>
            soc:Work, rdfdocument
          </td>
        </tr>
        <tr class="work">
          <td>
            Arbitrary HTML form field match (select * from employees
            where email like "*fred*") [@details]
          </td>
          <td>
            personnel/_fquery?email=*fred*;name=Joe
            <p>
              (arbitrary, linked by
              <code><strong>db:formService</strong></code> from
              personnel if supported)
            </p>
          </td>
          <td>
            soc:Work, rdfdocument
          </td>
        </tr>
        <tr>
          <td>
            POST point for RDF data, either new data, or assertions
            that some (n3) Formula is a log:Falsehood.
          </td>
          <td>
            <p>
              personnel/_postme
            </p>
            <p>
              (arbitrary, linked by
              <code><strong>db:deltaService</strong></code> from
              personnel if supported. Could be same URI
              <code>personnel</code> in fact, as we are dealing
              with a different method)
            </p>
          </td>
          <td>
            db:postable
          </td>
        </tr>
      </tbody>
    </table>
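    <p>
      The conventions in the table above might be generated by a
      handful of helper functions. This is a sketch, not the code
      linked above: the function names are invented for
      illustration, and percent-encoding is only one possible
      reading of "encoded suitably" for primary key values.
    </p>

```python
from urllib.parse import quote, urljoin

BASE = "http://www.acme.com/wherever/"  # base used in the table header

def table_description(db, table):
    """Document describing the table, e.g. personnel/employees."""
    return urljoin(BASE, f"{db}/{table}")

def table_class(db, table):
    """The class of exactly those things which are in the table."""
    return table_description(db, table) + "#.table"

def column_property(db, table, column):
    """The property corresponding to a column."""
    return table_description(db, table) + "#" + column

def row_document(db, table, key):
    """Document giving the data in one row, identified by primary key."""
    # safe="" so that "/" and other reserved characters in a key are escaped too
    # (percent-encoding is an assumption of this sketch).
    return table_description(db, table) + "/" + quote(str(key), safe="")

def row_item(db, table, key):
    """The thing described by the row."""
    return row_document(db, table, key) + "#item"

def cell_document(db, table, key, column):
    """Document giving the information in just one cell."""
    return row_document(db, table, key) + "/" + column

print(table_class("personnel", "employees"))
print(column_property("personnel", "employees", "email"))
print(row_item("personnel", "employees", 1234))
print(cell_document("personnel", "employees", 1234, "email"))
print(row_document("personnel", "employees", "a/b c"))
```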
    <p>
      @@@ How to use typing to indicate that the URI in the table
      is a (relative?) URI to another object, not a string?
    </p>
    <p>
      @@@ This works fine when implemented live on a database.
      However, it is a little tricky to emulate in a typical
      file-based web server because of the use of "personnel" in
      this case both as directory and as
    </p>
    <p>
      One of the things which makes life easier is to make the
      mapping so that the relative URI syntax can be used to
      advantage. For example, here, everything within the database
      (the scope of an SQL statement) can be written as a short
      URI.
    </p>
    <p>
      There is a question as to how much of the SQL query syntax
      should be turned into identifiers. For example, is a query on
      a primary key really an identifier? Is the extraction of a
      single cell really an identifier? It would be useful to be
      able to treat them as such. However, it would be wiser to use
      the "?" convention to indicate a generalized SQL idempotent
      query. (A URL should <a href="Axioms.html#get">of course</a>
      <em>never</em> be used to refer to the results of a
      table-changing operation such as UPDATE or DELETE. In this
      case, if HTTP were used, an SQL query should IMHO be POSTed
      to the database URI. Of course, you can use your favorite
      networked database access protocol.)
    </p>
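    <p>
      One way to picture the "?" convention: a read-only query is
      appended to the database URI, and anything else is refused
      so that it can be POSTed instead. The helper name and the
      check on the first keyword are assumptions of this sketch; a
      real system would need a much more careful test of whether a
      statement changes data.
    </p>

```python
from urllib.parse import quote_plus

DB = "http://www.acme.com/mycat/schema1/empdb/"  # database URI from the strawman table

def query_uri(db_uri, sql):
    """Return a GET-able URI for a read-only query, or raise ValueError."""
    words = sql.split()
    # Naive check (an assumption of this sketch): only plain SELECT statements
    # are treated as safe to identify with a URI.
    if not words or words[0].upper() != "SELECT":
        raise ValueError("table-changing statements should be POSTed, not put in a URI")
    return db_uri + "?" + quote_plus(sql)

print(query_uri(DB, "select empno from emps"))
```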
    <p>
      In the above, the column name of the table could be referred to
      using the table as a namespace, a row for example being
    </p>
    <pre>
&lt;foo
  xmlns:t="http://www.example.com/mycat/personnel/employees"&gt;
  &lt;t:email&gt;joe@example.com&lt;/t:email&gt;
  &lt;t:age&gt;45&lt;/t:age&gt;
&lt;/foo&gt;
</pre>
    <p>
      and one row of the result of joining this table (of
      people) and another table (about people) by their primary
      keys would use namespaces from both tables:
    </p>
    <pre>
&lt;foo
  xmlns:t="http://www.example.com/mycat/personnel/employees"
  xmlns:u="http://www.acme.com/mycat/schema1/empdb/likes"&gt;
  &lt;t:email&gt;joe@example.com&lt;/t:email&gt;
  &lt;t:age&gt;45&lt;/t:age&gt;
  &lt;u:music&gt;blues&lt;/u:music&gt;
&lt;/foo&gt;
</pre>
    <hr />
    <h2>
      Later related work:
    </h2><a href=
    "http://www.cs.man.ac.uk/~ocorcho/documents/SWDB2004_BarrasaEtAl.pdf">R2O,
    an Extensible and Semantically Based Database-to-Ontology
    Mapping Language.</a> Barrasa J, Corcho O,
    G&oacute;mez-P&eacute;rez A. Second Workshop on
    Semantic Web and Databases (SWDB2004). Toronto, Canada. August
    2004.
    <hr />
    <p>
      <em>This has been elaborated with help of an RDB tutorial and
      discussion from Andrew Eisenberg/Sybase</em>.
    </p>
    <hr />
    <p>
      See also: <a href="RDF-XML.html">Why RDF is more than XML</a>
    </p>
    <p>
      <a href="Overview.html">Up to Design Issues</a>; back to
      <a href="Architecture.html">Architecture from 50,000ft</a>
    </p>
    <p>
      timbl
    </p>
  </body>
</html>