<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content=
    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
    <title>
      Interpretation properties -- Ideas about Web architecture
    </title>
    <link rel="Stylesheet" href="di.css" type="text/css" />
    <meta http-equiv="Content-Type" content=
    "text/html; charset=us-ascii" />
  </head>
  <body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">
    <address>
      Tim Berners-Lee<br />
      Date: 1998, last change: $Date: 2009/08/27 21:38:07 $<br />
      Status: personal view only. Editing status: first draft.
    </address>
    <p>
      <a href="./">Up to Design Issues</a>
    </p>
    <h3>
      Design Issues - Ideas about Web Architecture
    </h3>
    <p>
      <em>This page assumes an imaginary namespace referred to as
      play: which is used only for the sake of example. The reader
      is assumed to be able to guess its specification.</em>
    </p>
    <hr />
    <h1>
      <a name="Interpreta" id="Interpreta">Interpretation
      properties</a>
    </h1>
    <p>
      <em>Abstract: Natural languages, encodings, and similar
      relationships between one abstract thing and another, are
      best modeled in RDF as properties. I call these
      Interpretation properties in that they express the
      relationship between one value and that value interpreted (or
      processed in the imagination) in a specific way.</em>
    </p>
    <h2>
      <a name="problem" id="problem">The problem of annotating
      natural language</a>
    </h2>
    <p>
      There has to date (2000/02) been a consistent muddle in the
      RDF community about how to represent the natural language of
      a string. In XML it is simple, because you never have to
      exactly explain what you mean. You can mark up a span of text
      and declare it to be French.
    </p>
    <blockquote>
      <p>
        His name was &lt;html:span
        xml:lang="fr"&gt;Jean-Fran&amp;ccedilla;ois&lt;/html:span&gt;
        but we called him Dan.
      </p>
    </blockquote>
    <p>
      Under pressure from the XML community to be standard, the RDF
      spec included this attribute as the official RDF way to
      record that a string was in a given language. This was a
      mistake, as the attribute was thrown into the syntax but not
      into the model which the spec was defining.
    </p>
    <p>
      Consider the <a href="Identity.html#this">example</a> in the
      <a href="Identity.html">identity section</a>,
    </p>
    <pre>
&lt;rdf:Description&gt;
   &lt;rdf:type resource="http://www.people.org/types#person"/&gt;
   &lt;play:name&gt;Ora Yrj&ouml; Uolevi Lassila&lt;/play:name&gt;
   &lt;play:mailbox resource="mailto:ora.lassila@research.nokia.com"/&gt;
   &lt;play:homePage resource="http://www.w3.org/People/Lassila"/&gt;
&lt;/rdf:Description&gt;
</pre>
    <p>
      Now that represents five nodes in the RDF graph: the
      anonymous node for Ora himself (who has no web address), and
      the four nodes at the far ends of the arcs which specify that
      this thing is of type person and has the given common name,
      email address and home page.
    </p>
    <p>
      Where do we add the language property? Of course we could add
      a language attribute to the XML, but that would be lost on
      translation into the RDF model: no triple would result.
    </p>
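    <p>
      To make the loss concrete, here is a deliberately naive Python
      sketch. It is not a real RDF/XML parser: the
      <code>naive_triples</code> helper, the <code>desc</code> wrapper
      element and the namespace URI used for <code>play:</code> are all
      assumptions. It turns each child element into one triple. The XML
      parser does read <code>xml:lang</code>, but because no rule maps
      that attribute to a statement, no triple carries the language.
    </p>

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical namespace URI for the imaginary play: vocabulary.
SNIPPET = """<desc xmlns:play="http://example.org/play#">
  <play:name xml:lang="fi">Ora Yrj\u00f6 Uolevi Lassila</play:name>
  <play:homePage>http://www.w3.org/People/Lassila</play:homePage>
</desc>"""


def naive_triples(xml_text, subject="_:ora"):
    """Turn each child element into (subject, predicate, text); attributes are ignored."""
    root = ET.fromstring(xml_text)
    return [(subject, child.tag, child.text) for child in root]


triples = naive_triples(SNIPPET)
name_element = ET.fromstring(SNIPPET)[0]

print(name_element.get(XML_LANG))        # fi: the XML parser saw the language
print(any("fi" in t for t in triples))   # False: no triple records it
```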
    <h3>
      <a name="Attempt2" id="Attempt2">Attempt 1: a property of the
      person?</a>
    </h3>
    <p>
      Many specifications such as iCalendar (see my notes@link)
      would add another property to the definition of the person.
    </p>
    <pre>
&lt;rdf:Description&gt;
   &lt;rdf:type resource="http://www.people.org/types#person"/&gt;
   &lt;play:name&gt;Ora Yrj&ouml; Uolevi Lassila&lt;/play:name&gt;
   &lt;play:namelang&gt;fi&lt;/play:namelang&gt;
   &lt;play:mailbox&gt;ora.lassila@research.nokia.com&lt;/play:mailbox&gt;
   &lt;play:homePage&gt;http://www.w3.org/People/Lassila/&lt;/play:homePage&gt;
&lt;/rdf:Description&gt;
</pre>
    <p>
      Here, the property <em>play:namelang</em> is defined to mean
      "A has a name which is in natural language B". In the
      iCalendar spec, the definition is more complex in that the
      <em>lang</em> property is in some cases the language of a
      name and in other cases that of the object's description.
      This is a modeling muddle. The nice thing about doing it this
      way is that the structure is kept flat, and pre-XML systems
      such as RFC 822 (email etc.) headers have a syntax which can
      only cope with a flat structure.
    </p>
    <p>
      There are many drawbacks to this muddle. Ora may have two
      names, one in Finnish and another in English, and the model
      cannot express that. Because the attribute is apparently
      tied to the person and not obviously attached to the name,
      automatic processing of such a thing is ruled out. Clearly,
      the structure does not reflect the facts of the case.
    </p>
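    <p>
      The muddle shows up in a small Python sketch of the flat model
      as a set of triples. The <code>play:</code> property names follow
      the imaginary namespace, and the second name is invented purely
      for illustration. With two names and one
      <code>play:namelang</code>, a query for the Finnish name can only
      return both.
    </p>

```python
# The flat model as a set of (subject, predicate, object) triples.
# The play: names are hypothetical, and so is the second name.
triples = {
    ("_:ora", "play:name", "Ora Yrj\u00f6 Uolevi Lassila"),
    ("_:ora", "play:name", "Ora Lassila"),   # hypothetical second name
    ("_:ora", "play:namelang", "fi"),
}


def objects(subject, predicate):
    """All objects for the given subject and predicate, sorted for stable output."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def names_in(lang):
    """Which of Ora's names are in `lang`? The flat model cannot narrow it down."""
    if lang in objects("_:ora", "play:namelang"):
        return objects("_:ora", "play:name")
    return []


print(names_in("fi"))   # both names: "fi" is attached to Ora, not to either name
```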
    <h3>
      <a name="Attempt1" id="Attempt1">Attempt 2: a property of the
      string?</a>
    </h3>
    <p>
      The second attempt is to make a graph which expresses the
      language as a property of the string itself. Clearly, "Ora
      Yrj&ouml; Uolevi Lassila" is Finnish, is it not? Yes, Ora is
      Finnish, but that is different. What we need to say is that
      the string is in the Finnish language. The problem, then,
      becomes that RDF does not allow literal text to be the
      subject of a statement. Never mind, RDF in fact invents the
      <em>rdf:value</em> property which allows us to specify that a
      node is really text, but say other things about it too. This
      is done by introducing an intermediate node.
    </p>
    <pre>
&lt;rdf:Description&gt;
   &lt;rdf:type resource="http://www.people.org/types#person" /&gt;
   &lt;play:name rdf:parseType="Resource"&gt;
       &lt;rdf:value&gt;Ora Yrj&ouml; Uolevi Lassila&lt;/rdf:value&gt;
       &lt;play:lang&gt;fi&lt;/play:lang&gt;
   &lt;/play:name&gt;
   &lt;play:mailbox resource="mailto:ora.lassila@research.nokia.com"/&gt;
   &lt;play:homePage resource="http://www.w3.org/People/Lassila"/&gt;
&lt;/rdf:Description&gt;
</pre>
    <p>
      There we have it, and in an RDF graph at least very pretty it
      looks. And indeed, we could work with this, apart from the
      fact that we have made another modeling error. It is not true
      that the language is a property of the text string. After
      all, the string "Tim": is that English (short for Timothy)
      or French (short for "Timoth&eacute;")? I don't need to add a
      long list of text strings which can be interpreted as one
      language or as another. A system which made the assertion
      that the string itself was fundamentally English would simply
      not be representing the case.
    </p>
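    <p>
      A hypothetical Python sketch of Attempt 2 shows where it breaks:
      if the language is recorded against the string itself, a string
      can carry only one language, so "Tim" cannot be both an English
      and a French short form. The <code>declare</code> helper and its
      table are illustrative only.
    </p>

```python
# Attempt 2 in miniature: the language is recorded against the string itself,
# so each string can carry at most one language. (Hypothetical table.)
string_lang = {}


def declare(text, lang):
    """Record `lang` as *the* language of `text`; reject a contradiction."""
    previous = string_lang.setdefault(text, lang)
    if previous != lang:
        raise ValueError(f"{text!r} is already declared {previous}, not {lang}")


declare("Tim", "en")        # read as short for Timothy
try:
    declare("Tim", "fr")    # read as the French short form: the model objects
except ValueError as err:
    print(err)              # 'Tim' is already declared en, not fr
```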
    <h3>
      <a name="Attempt" id="Attempt">Attempt 3: a relationship
      between them.</a>
    </h3>
    <p>
      In fact, the situation is that Ora's name is a natural
      language object, which is the interpretation according to
      Finnish of the string "Ora Yrj&ouml; Uolevi Lassila". In
      other words, Finnish, the language, is the relationship between
      Ora's name and the string. In RDF, we model a binary
      relationship with a property.
    </p>
    <pre>
&lt;rdf:Description&gt;
   &lt;rdf:type resource="http://www.people.org/types#person"/&gt;
   &lt;play:name rdf:parseType="Resource"&gt;
       &lt;lang:fi&gt;Ora Yrj&ouml; Uolevi Lassila&lt;/lang:fi&gt;
   &lt;/play:name&gt;
   &lt;play:mailbox&gt;ora.lassila@research.nokia.com&lt;/play:mailbox&gt;
   &lt;play:homePage&gt;http://www.w3.org/People/Lassila/&lt;/play:homePage&gt;
&lt;/rdf:Description&gt;
</pre>
    <p>
      This works much better. Ora has a name which is the Finnish
      "Ora". This allows an RDF system to create a node for that
      string, with a "Finnish" link from the concept of Ora the
      person, maybe a Danish link from the concept of the currency,
      and an Old English link from the concept of weight (1/15
      pound), not to mention a Latin link from the concept of the
      shore.
    </p>
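    <p>
      The structure of Attempt 3 can be sketched in Python under some
      assumptions: the natural language object is modelled as a frozen
      dataclass pairing a language code with the string, and
      <code>interpret</code> is a hypothetical stand-in for a
      <code>lang:</code> property. The language lives on the link, so
      the single string "Ora" can take part in several relationships at
      once.
    </p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpreted:
    """The abstract thing obtained by reading `text` in natural language `lang`."""
    lang: str   # a language code, e.g. "fi"
    text: str


def interpret(lang, text):
    """Hypothetical stand-in for a lang: interpretation property."""
    return Interpreted(lang, text)


# One string, several interpretation links, several distinct abstract things.
readings = {
    "Ora the person": interpret("fi", "Ora"),
    "the currency": interpret("da", "Ora"),
    "the weight": interpret("ang", "Ora"),   # "ang": Old English
    "the shore": interpret("la", "Ora"),
}

print({r.text for r in readings.values()})   # {'Ora'}: one string node
print(len(set(readings.values())))           # 4: four different interpretations
```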
    <p>
      One problem we may feel is that we would like the language to
      be a string, so that we can reference the ISO spec for language
      codes, but there is of course no reason why the spec for the
      lang: namespace should not reference the same spec.
    </p>
    <p>
      Another problem we might feel is that it is reasonable for
      play:name to expect a string, and in most cases it may
      get a string: what is the poor system supposed to do in order
      to accommodate finding a natural language object in place of
      a string? I guess making a class which includes all strings
      and all natural language objects is the best way to go. Any
      use of a string which did not also allow such natural language
      objects would make life much more difficult for multilingual
      software, so this is a serious problem.
    </p>
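    <p>
      A minimal Python sketch of that combined class, assuming a
      hypothetical <code>NLText</code> type for natural language
      objects: code which expects text accepts either a bare string or
      a natural language object, and uses the language only when it is
      present. The <code>display_name</code> helper is illustrative.
    </p>

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class NLText:
    """Hypothetical natural language object: `text` read in language `lang`."""
    lang: str
    text: str


# The "class which includes all strings and all natural language objects".
Textual = Union[str, NLText]


def display_name(value: Textual, preferred_lang: str = "en") -> str:
    """Accept either a bare string or a natural language object."""
    if isinstance(value, NLText):
        tag = "" if value.lang == preferred_lang else f" [{value.lang}]"
        return value.text + tag
    return value  # a plain string: no language information to use


print(display_name("Ora Lassila"))
print(display_name(NLText("fi", "Ora Yrj\u00f6 Uolevi Lassila")))
```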
    <p>
      <em>[[This leads us on to another interesting question of
      packaging in RDF. There is a requirement in XML packaging and
      in email packaging and it seems quite similarly in RDF that
      when you ask me for something of type X I must be able to
      give you something of type package which happens to include
      the X you asked for and also some information for your
      edification. But that is another story. @@@ elaborate and
      define properties or syntax@@@]]</em>
    </p>
    <p>
      What is really important is that we are using the ability of
      RDF to talk about abstract things, just as when we identified
      people by the resources they were associated with, but
      avoided pretending that any person had a definitive URI.
    </p>
    <h2 id="Interpreta1">
      Datatypes as interpretation properties<sup><a href="#L380"
      name="L382" id="L382">*</a></sup>
    </h2>
    <p>
      By <em>datatypes</em> here I mean the atomic types of a
      programming language, or for example XML Datatypes
      (XML Schema Part 2). Defining datatypes involves defining
      constraints on an input string (for example specifying what a
      valid date is as a regular expression) and specifying the
      mathematical abstract individuals which instances of a type
      represent. One can model the relationship between the string
      representation and the abstract value using a property.
    </p>
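    <p>
      As a sketch of a datatype viewed this way, here is a hypothetical
      integer type in Python: a regular expression for the lexical
      constraint (an assumption in the style of a decimal integer, not
      quoted from XML Schema) and a function relating each valid string
      to the abstract value it represents.
    </p>

```python
import re

# Hypothetical lexical constraint: optional sign, then decimal digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def integer_value(lexical: str) -> int:
    """Interpretation property: relate a decimal string to the integer it denotes."""
    if not INTEGER_LEXICAL.fullmatch(lexical):
        raise ValueError(f"not a valid integer representation: {lexical!r}")
    return int(lexical)


print(integer_value("10"))                          # 10
print(integer_value("010") == integer_value("10"))  # True: two strings, one value
```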
    <table border="0" width="100%">
      <tbody>
        <tr>
          <td valign="middle">
            <pre>
&lt;rdf:Description about="#myshoe"&gt;
   &lt;shoe:size&gt;10&lt;/shoe:size&gt;
&lt;/rdf:Description&gt;
</pre>
          </td>
          <td valign="middle">
            <span class="N3">&lt;#myshoe&gt; shoe:size "10".</span>
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      This doesn't tell us what it is 10 of. We could go through
      life without any model of types: we could define a shoe size
      as being a decimal string for a number of inches. There are
      many questions and tradeoffs which datatype designers face,
      for example:
    </p>
    <ul>
      <li>Can you tell the type of a value from the string
      representation in every case? (eg 1.4e4 vs 1.4d4 for
      precision)
      </li>
      <li>Are the values of different datatypes distinct? (Eg, is 1
      = 1.0?)
      </li>
      <li>Is the set of datatypes extensible? (Eg, can you add
      complex numbers or prime numbers?)
      </li>
      <li>Does representation equality imply value equality?
      </li>
      <li>Does value equality imply representation equality? (Is
      the only allowed representation the canonical one?)
      </li>
    </ul>
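    <p>
      Two of these questions can be made concrete with a toy model.
      This Python sketch is an illustration, not XML Schema's rules:
      only two datatypes exist, Python's <code>int</code> and
      <code>decimal.Decimal</code> stand in for their value spaces, and
      the hypothetical <code>disjoint_types</code> switch records one
      possible design answer to "is 1 = 1.0?".
    </p>

```python
from decimal import Decimal

# Hypothetical interpretation functions: string -> abstract value.
PARSERS = {"integer": int, "decimal": Decimal}


def interpret(datatype, lexical):
    """Return the value tagged with the datatype that produced it."""
    return (datatype, PARSERS[datatype](lexical))


def same_value(a, b, disjoint_types):
    """Equality under one of two design answers to 'is 1 = 1.0?'."""
    if disjoint_types and a[0] != b[0]:
        return False
    return a[1] == b[1]


one = interpret("integer", "1")
one_point_oh = interpret("decimal", "1.0")
print(same_value(one, one_point_oh, disjoint_types=False))   # True
print(same_value(one, one_point_oh, disjoint_types=True))    # False

# Value equality without representation equality: "1.0" and "1.00" differ as strings.
print(same_value(interpret("decimal", "1.0"),
                 interpret("decimal", "1.00"), disjoint_types=True))   # True
```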
    <p>
      It would be nice to be able to model these questions in
      general in the semantic web, in order to describe the
      properties of data in arbitrary systems. We can introduce
      interpretation properties which link a string to its decimal
      interpretation as a number, or to a length including units.
      The problem is that the RDF graph which most folks use is the
      one above. The object of shoe:size is "10".
    </p>
    <p>
      The simplistic system corresponding exactly to the <a href=
      "#Attempt2">Attempt 1 above</a>, is to declare that shoe:size
      is of class integer. This implies (we then say) that any
      value is a decimal string. Given the string and the type we
      can conclude the abstract value, the integer ten. This works.
      It is the system used by XML Schema datatypes, whose answers to the
      questions above are as I understand it [No, Yes, Yes, Yes,
      No]. A snag is that you can't compare two values unless you
      know the datatypes.
    </p>
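    <p>
      The range-based system can be sketched in Python; the
      <code>RANGE</code> table and <code>value_of</code> helper are
      hypothetical. Given the declared type, the value follows from the
      string; without it, the two literals below could only be compared
      as strings.
    </p>

```python
# Hypothetical range declaration: shoe:size values are integers.
RANGE = {"shoe:size": int}


def value_of(prop, literal):
    """Given the string and the declared type, conclude the abstract value."""
    return RANGE[prop](literal)


mine = ("#myshoe", "shoe:size", "10")
yours = ("#yourshoe", "shoe:size", "010")

print(mine[2] == yours[2])                                          # False: different strings
print(value_of(mine[1], mine[2]) == value_of(yours[1], yours[2]))   # True: both are ten
```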
    <p>
      To model the representation explicitly in the RDF it seems
      you have to introduce another node and arc, which is a pain.
    </p>
    <table border="0" width="100%">
      <tbody>
        <tr>
          <td valign="middle">
            <pre>
&lt;rdf:Description about="#myshoe"&gt;
   &lt;shoe:size&gt;
      &lt;rdf:value&gt;10&lt;/rdf:value&gt;
   &lt;/shoe:size&gt;
&lt;/rdf:Description&gt;
</pre>
          </td>
          <td valign="middle">
            <span class="N3">&lt;#myshoe&gt; shoe:size [ rdf:value
            "10" ].</span>
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      We can then define rdf:value to express that there is some
      datatype relation which relates the size of the shoe to "10".
      All datatype relations are subProperties of rdf:value with
      this system. Once it is in that form, the datatype information
      can be added to the graph. You have the choice of asserting
      that the object is of a given class, and deducing that the
      datatype relation must be a certain one. You can nest
      interpretation properties - interpreting a string as a
      decimal and then as a length in feet. But this is not
      possible without that extra node. One wonders about radically
      changing the way all RDF is parsed into triples, so as to
      introduce the extra abstract node for every literal --
      frightful. One wonders about declaring "10" to be a generic
      resource, an abstraction associated with the set of all
      things for which "10" is a representation under some datatype
      relation. This is frightful too: you don't have "equals" any
      more in the sense you used to have it.
    </p>
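    <p>
      Nesting interpretation properties amounts to composing functions.
      In this Python sketch the <code>Length</code> class and the
      <code>decimal_of</code>, <code>feet</code> and
      <code>inches</code> helpers are hypothetical, and lengths are
      stored in inches only to keep the arithmetic exact. Each step from
      string to number to length corresponds to one interpretation
      property; in an RDF graph each intermediate result would need a
      node of its own.
    </p>

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Length:
    """Hypothetical abstract length, held in inches to keep arithmetic exact."""
    inches: Decimal


def decimal_of(s: str) -> Decimal:
    """First interpretation property: a string read as a decimal number."""
    return Decimal(s)


def feet(d: Decimal) -> Length:
    """Second interpretation property: a number read as a length in feet."""
    return Length(d * 12)


def inches(d: Decimal) -> Length:
    """Alternative second step: a number read as a length in inches."""
    return Length(d)


# Nesting: "10" -> the number ten -> a length of ten feet.
ten_feet = feet(decimal_of("10"))
print(ten_feet == inches(decimal_of("120")))   # True: same length, different strings
```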
    <p>
      Instead of adding an extra arc in series with the original,
      we can leave all Properties such as shoe:size as being rather
      vague relations between the shoe and some string
      representation, and then use a functional property (say
      <code>rdf:actual</code>) to relate shoe:size to a (more
      useful) property whose object is a typed abstract value.
    </p>
    <pre>
{ &lt;#myshoe&gt; shoe:size "10" } log:implies
{ &lt;#myshoe&gt; [is rdf:actual of shoe:size] [rdf:value "10"] } .
</pre>
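    <p>
      The effect of this rule on a set of triples can be sketched in
      Python. Assumptions: triples are plain tuples, blank nodes get
      generated names such as <code>_:v0</code>, and the hypothetical
      property name <code>shoe:actualSize</code> stands in for the
      anonymous property written <code>[is rdf:actual of
      shoe:size]</code>.
    </p>

```python
import itertools

# Hypothetical functional mapping from a "vague" property to its "actual" one.
ACTUAL = {"shoe:size": "shoe:actualSize"}
_fresh = itertools.count()


def apply_actual_rule(triples):
    """For each (s, p, lit) with p in ACTUAL, add (s, ACTUAL[p], n) and (n, rdf:value, lit)."""
    added = set()
    for s, p, o in triples:
        if p in ACTUAL:
            node = f"_:v{next(_fresh)}"   # the extra abstract node
            added.add((s, ACTUAL[p], node))
            added.add((node, "rdf:value", o))
    return set(triples) | added


graph = apply_actual_rule({("#myshoe", "shoe:size", "10")})
for triple in sorted(graph):
    print(triple)
```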
    <p>
      <em>@@@ No clear way forward for describing datatypes in
      RDF/DAML (2001/1) @@</em>
    </p>
    <h2>
      <a name="More" id="More">More examples</a>
    </h2>
    <p>
      Interpretation properties is the name I have arbitrarily
      chosen for this sort of use. I am not sure whether it is a
      good term, but I want to encourage their use. Base 64
      encoding is another example. It comes up in many places;
      XML Digital Signature is one.
    </p>
    <pre>
&lt;rdf:description&gt;
   &lt;play:name parseType="Resource"&gt;
      &lt;lang:fi  parseType="Resource"&gt;
        &lt;enc:base64&gt;jksdfhher78f8e47fy87eysady87f7sea&lt;/enc:base64&gt;
      &lt;/lang:fi&gt;
    &lt;/play:name&gt;
&lt;/rdf:description&gt;
</pre>
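    <p>
      Read inside out, the markup says: undo the base 64 encoding, then
      read the resulting string as Finnish. A Python sketch of that
      reading follows. The <code>enc_base64</code> and
      <code>lang_fi</code> helpers and the <code>NLText</code> type are
      hypothetical, the decoded bytes are assumed to be UTF-8, and since
      the encoded text in the example is a placeholder, the sketch
      encodes its own sample instead.
    </p>

```python
import base64
from dataclasses import dataclass


@dataclass(frozen=True)
class NLText:
    """Hypothetical natural language object: `text` read in language `lang`."""
    lang: str
    text: str


def enc_base64(encoded: str) -> str:
    """Inner interpretation: undo base 64, then (an assumption) decode as UTF-8."""
    return base64.b64decode(encoded, validate=True).decode("utf-8")


def lang_fi(text: str) -> NLText:
    """Outer interpretation: read the string as Finnish."""
    return NLText("fi", text)


# Apply the interpretations inside out, as the nesting in the markup does.
encoded = base64.b64encode("Ora Yrj\u00f6 Uolevi Lassila".encode("utf-8")).decode("ascii")
name = lang_fi(enc_base64(encoded))
print(name.text)
```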
    <p>
      Another example is type coercion. Suppose there is a need to
      take something of type datetime and use it as a date:
    </p>
    <pre>
&lt;rdf:description&gt;
   &lt;play:event parseType="Resource"&gt;
       &lt;play:start parseType="Resource"&gt;
          &lt;play:date&gt;2000-01-31 12:00ET&lt;/play:date&gt;
       &lt;/play:start&gt;
       &lt;play:summary&gt;The Bryn Poeth Uchaf Folk festival&lt;/play:summary&gt;
   &lt;/play:event&gt;
&lt;/rdf:description&gt;
</pre>
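    <p>
      A Python sketch of the coercion, assuming <code>play:date</code>
      relates a date to the date-time from which it is taken. The
      <code>play_date</code> helper is hypothetical, and the "ET" zone
      suffix is simply dropped rather than interpreted.
    </p>

```python
from datetime import date, datetime


def play_date(dt: datetime) -> date:
    """Hypothetical coercion property: the calendar date of a date-time."""
    return dt.date()


# Assumption: only the "YYYY-MM-DD HH:MM" part is parsed; the zone is ignored.
start = datetime.strptime("2000-01-31 12:00", "%Y-%m-%d %H:%M")
print(play_date(start))   # 2000-01-31
```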
    <p>
      Such properties often have uniqueness and/or unambiguity
      properties. <em>enc:base64</em>, for example, is clearly a
      reversible transformation. It relates two strings, one
      printable and the other a byte string with no other
      constraints. The byte string could not in general be
      represented in an XML document. The definition of
      <em>enc:base64</em> is that A, when encoded in base 64, yields
      B. This allows any processor, given B, to derive A. The
      specification of the encoding namespace (here referred to by
      prefix <em>enc:</em>) could be that any conforming processor
      must be able to accept a base64 encoding of a string in any
      place that a string is acceptable.
    </p>
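    <p>
      Both properties of <code>enc:base64</code> can be shown with
      Python's standard <code>base64</code> module. The
      <code>encode</code>, <code>decode</code> and
      <code>accept_string</code> helpers are hypothetical, and
      representing "a base64 encoding of a string" as an
      <code>("enc:base64", B)</code> tuple is an assumption made only
      for this sketch.
    </p>

```python
import base64


def encode(a: bytes) -> str:
    """A enc:base64 B: B is the printable base 64 encoding of A."""
    return base64.b64encode(a).decode("ascii")


def decode(b: str) -> bytes:
    """The mapping is reversible, so B alone is enough to recover A."""
    return base64.b64decode(b, validate=True)


raw = bytes([0, 255, 10, 13, 60])   # a NUL byte cannot appear in an XML 1.0 document
printable = encode(raw)
print(printable)                    # AP8KDTw=
print(decode(printable) == raw)     # True: the round trip is lossless


def accept_string(value) -> bytes:
    """Hypothetical conforming processor: a plain string, or an
    ("enc:base64", B) pair, is accepted anywhere a string is."""
    if isinstance(value, tuple) and value[0] == "enc:base64":
        return decode(value[1])
    return value.encode("utf-8")


print(accept_string(("enc:base64", "T3Jh")) == accept_string("Ora"))   # True
```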
    <p>
      Interpretation properties make it clear what is going on. For
      example,
    </p>
    <pre>
&lt;rdf:description about="http://www.w3.org/"&gt;
   &lt;play:xml-cannonicalized parseType="Resource"&gt;
      &lt;enc:hash-sha-1 parseType="Resource"&gt;
         &lt;enc:base64&gt;jd8734djr08347jyd4&lt;/enc:base64&gt;
      &lt;/enc:hash-sha-1&gt;
   &lt;/play:xml-cannonicalized&gt;
&lt;/rdf:description&gt;
</pre>
    <p>
      clearly makes a statement, using properties quite
      independently defined for the various processes, that the
      base64 encoding of the SHA-1 hash of the canonicalized form
      of the W3C home page is jd8734djr08347jyd4. Compare this
      with the HTTP situation, in which the headers cannot be
      nested, and the encodings and compression and other things
      applied to the body are mentioned as unordered annotations,
      and the spec has to provide a way of making the right
      conclusion about which happened in what order.
    </p>
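    <p>
      The ordered pipeline this markup describes can be sketched in
      Python. Assumptions: <code>xml.etree.ElementTree.canonicalize</code>
      (the standard library's C14N 2.0 serializer, Python 3.8 or later)
      stands in for whatever canonicalization the <code>play:</code>
      property would specify, and a toy XML string replaces the W3C home
      page, so the digest is not the one in the example. The three
      helper functions are hypothetical.
    </p>

```python
import base64
import hashlib
from xml.etree.ElementTree import canonicalize   # Python 3.8+


def xml_canonicalized(doc: str) -> str:
    """Stand-in for the canonicalization step, using C14N 2.0."""
    return canonicalize(xml_data=doc)


def hash_sha1(text: str) -> bytes:
    return hashlib.sha1(text.encode("utf-8")).digest()


def enc_base64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


# The nesting fixes the order: canonicalize, then hash, then encode.
page = '<html   lang="en"><body>Hello</body></html>'   # toy input, not the W3C page
digest = enc_base64(hash_sha1(xml_canonicalized(page)))
print(digest)

# A serialisation differing only in whitespace inside a tag gives the same
# digest, because canonicalization happens before hashing.
same = '<html lang="en" ><body>Hello</body></html>'
print(enc_base64(hash_sha1(xml_canonicalized(same))) == digest)   # True
```

    <p>
      Because each step is a separately defined property applied in a
      stated order, no extra rule is needed to say which happened first.
    </p>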
    <h2>
      Units of Measure (2006)
    </h2>
    <p>
      This pattern applies very well to units of measure.
    </p>
    <p>
      See, for example, a simple ontology <a href=
      "http://www.w3.org/2007/ont/unit">http://www.w3.org/2007/ont/unit</a>
      of units of measure.
    </p>
    <h2>
      <a name="Conclusion" id="Conclusion">Conclusion</a>
    </h2>
    <p>
      Representing the interpretation of one string as an abstract
      thing can be done easily with RDF properties. This helps make
      a clean, accurate model. However, using the concept for
      datatypes in RDF is incompatible with RDF as we know it
      today.
    </p>
    <hr />
    <p>
      See also:
    </p>
    <ul>
      <li>
        <a href="Identity.html">Expressing the identity of real
        things</a>
      </li>
    </ul>
    <p>
      <em>@@@Needs circle-and-arrow pictures for each attempt.</em>
    </p>
    <p>
      <a name="L380" href="#L382" id="L380">Note.</a> This section
      followed a discussion with DWC about "<em><a href=
      "/2001/01/ct24">Using XML Schema Datatypes in RDF and
      DAML+OIL</a></em>".
    </p>
    <p>
      <a href="mailto:gruber@ksl.stanford.edu">Thomas R. Gruber</a>
      and Gregory R. Olsen, KSL <a href=
      "http://www-ksl.stanford.edu/knowledge-sharing/papers/engmath.html">
      "An Ontology for Engineering Mathematics"</a> in Jon Doyle,
      Piero Torasso, &amp; Erik Sandewall, Eds., <em>Fourth
      International Conference on Principles of Knowledge
      Representation and Reasoning</em>, Gustav Stresemann
      Institut, Bonn, Germany, Morgan Kaufmann, 1994. <em>A non-RDF
      but thorough treatment, including units of measure as scalar
      quantities.</em>
    </p>
    <p>
      Compare with <a href=
      "http://icosym-nt.cvut.cz/kifb/en/ont/sumo-units-of-measure.html">
      SUMO units of Measure</a>, which seems to have units as
      instances, and multipliers such as kilo, giga, etc. as
      functions.
    </p>
    <p>
      A little off-topic, on linear and area measure, John Baez's
      <a href="http://www.math.ucr.edu/home/baez/inches.html">"Why
      are there 63360 inches per mile?"</a> is good reading.
    </p>
    <hr />
    <p>
      <a href="Overview.html">Up to Design Issues</a>
    </p>
    <p>
      <a href="../People/Berners-Lee">Tim BL</a>
    </p>
    <p>
      (names of certain characters may have been misspelled to
      protect the innocent ;-)
    </p>
  </body>
</html>