<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content=
    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
    <title>
      Interpretation properties -- Ideas about Web architecture
    </title>
    <link rel="Stylesheet" href="di.css" type="text/css" />
    <meta http-equiv="Content-Type" content=
    "text/html; charset=us-ascii" />
  </head>
  <body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">
    <address>
      Tim Berners-Lee<br />
      Date: 1998, last change: $Date: 2009/08/27 21:38:07 $<br />
      Status: personal view only. Editing status: first draft.
    </address>
    <p>
      <a href="./">Up to Design Issues</a>
    </p>
    <h3>
      Design Issues - Ideas about Web Architecture
    </h3>
    <p>
      <em>This page assumes an imaginary namespace referred to as
      play: which is used only for the sake of example. The reader
      is assumed to be able to guess its specification.</em>
    </p>
    <hr />
    <h1>
      <a name="Interpreta" id="Interpreta">Interpretation
      properties</a>
    </h1>
    <p>
      <em>Abstract: Natural languages, encodings, and similar
      relationships between one abstract thing and another, are
      best modeled in RDF as properties. I call these
      Interpretation properties in that they express the
      relationship between one value and that value interpreted (or
      processed in the imagination) in a specific way.</em>
    </p>
    <h2>
      <a name="problem" id="problem">The problem of annotating
      natural language</a>
    </h2>
    <p>
      There has to date (2000/02) been a consistent muddle in the
      RDF community about how to represent the natural language of
      a string. In XML it is simple, because you never have to
      explain exactly what you mean. You can mark up a span of text
      and declare it to be French.
    </p>
    <blockquote>
      <p>
        His name was &lt;html:span
        xml:lang="fr"&gt;Jean-Fran&amp;ccedilla;ois&lt;/html:span&gt;
        but we called him Dan.
      </p>
    </blockquote>
    <p>
      Under pressure from the XML community to be standard, the RDF
      spec included this attribute as the official RDF way to
      record that a string was in a given language. This was a
      mistake, as the attribute was thrown into the syntax but not
      into the model which the spec was defining.
    </p>
    <p>
      Consider the <a href="Identity.html#this">example</a> in the
      <a href="Identity.html">identity section</a>,
    </p>
    <pre>
&lt;rdf:Description&gt;
   &lt;rdf:type resource="http://www.people.org/types#person" /&gt;
   &lt;play:name&gt;Ora Yrj&ouml; Uolevi Lassila&lt;/play:name&gt;
   &lt;play:mailbox resource="mailto:ora.lassila@research.nokia.com"/&gt;
   &lt;play:homePage resource="http://www.w3.org/People/Lassila"/&gt;
&lt;/rdf:Description&gt;
</pre>
    <p>
      Now that represents five nodes in the RDF graph: the
      anonymous node for Ora himself (who has no web address) and
      four nodes joined to it by four arcs, specifying that this
      thing is of type person, and has a common name, email
      address and home page as given.
    </p>
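    <p>
      As a cross-check, the same description can be sketched in N3,
      where each predicate-object pair inside the brackets is one
      arc leaving the anonymous node. The URI bound to the play:
      prefix is a placeholder assumed for illustration only, since
      the namespace is imaginary.
    </p>
    <pre>
@prefix rdf:  &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix play: &lt;http://example.org/play#&gt; .   # placeholder URI (assumption)

# One anonymous node (Ora) with four arcs leaving it.
[ rdf:type      &lt;http://www.people.org/types#person&gt; ;
  play:name     "Ora Yrj&ouml; Uolevi Lassila" ;
  play:mailbox  &lt;mailto:ora.lassila@research.nokia.com&gt; ;
  play:homePage &lt;http://www.w3.org/People/Lassila&gt; ] .
</pre>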
    <p>
      Where do we add the language property? Of course we could add
      a language attribute to the XML, but that would be lost on
      translation into the RDF model: no triple would result.
    </p>
    <h3>
      <a name="Attempt2" id="Attempt2">Attempt 1: a property of the
      person?</a>
    </h3>
    <p>
      Many specifications such as iCalendar (see my notes@link)
      would add another property to the definition of the person.
    </p>
    <pre>
&lt;rdf:Description&gt;
   &lt;rdf:type resource="http://www.people.org/types#person" /&gt;
   &lt;play:name&gt;Ora Yrj&ouml; Uolevi Lassila&lt;/play:name&gt;
   &lt;play:namelang&gt;fi&lt;/play:namelang&gt;
   &lt;play:mailbox&gt;ora.lassila@research.nokia.com&lt;/play:mailbox&gt;
   &lt;play:homePage&gt;http://www.w3.org/People/Lassila/&lt;/play:homePage&gt;
&lt;/rdf:Description&gt;
</pre>
    <p>
      Here, the property <em>play:namelang</em> is defined to mean
      "A has a name which is in natural language B". In the
      iCalendar spec, the definition is more complex in that the
      <em>lang</em> property is in some cases the language of a
      name and in other cases that of the object's description.
      This is a modeling muddle. The nice thing about doing it this
      way is that the structure is kept flat, and pre-XML systems
      such as RFC822 (email etc) headers have a syntax which can
      only cope with this.
    </p>
    <p>
      There are many drawbacks to this muddle. Ora may have two
      names, one in Finnish and another in English, and the model
      cannot express that. Because the attribute is
      apparently tied to the person and not obviously attached to
      the name, automatic processing of such a thing is ruled out.
      Clearly, the structure does not reflect the facts of the
      case.
    </p>
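    <p>
      The two-name case can be sketched in N3 to show where the
      information goes missing. The second name string is only a
      stand-in, and the URI bound to play: is a placeholder assumed
      for illustration.
    </p>
    <pre>
@prefix rdf:  &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix play: &lt;http://example.org/play#&gt; .   # placeholder URI (assumption)

# Two names and two languages, all hanging directly off the person.
[ rdf:type      &lt;http://www.people.org/types#person&gt; ;
  play:name     "Ora Yrj&ouml; Uolevi Lassila" , "(an English form of the name)" ;
  play:namelang "fi" , "en" ] .

# Nothing in the graph records which play:namelang value
# belongs to which play:name value.
</pre>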
    <h3>
      <a name="Attempt1" id="Attempt1">Attempt 2: a property of the
      string?</a>
    </h3>
    <p>
      The second attempt is to make a graph which expresses the
      language as a property of the string itself. Clearly, "Ora
      Yrj&ouml; Uolevi Lassila" is Finnish, is it not? Yes, Ora is
      Finnish, but that is different. What we need to say is that
      the string is in the Finnish language. The problem, then,
      becomes that RDF does not allow literal text to be the
      subject of a statement. Never mind, RDF in fact invents the
      <em>rdf:value</em> property which allows us to specify that a
      node is really text, but say other things about it too. This
      is done by introducing an intermediate node.
    </p>
    <pre>
&lt;rdf:Description&gt;
   &lt;rdf:type resource="http://www.people.org/types#person" /&gt;
   &lt;play:name rdf:parseType="Resource"&gt;
       &lt;rdf:value&gt;Ora Yrj&ouml; Uolevi Lassila&lt;/rdf:value&gt;
       &lt;play:lang&gt;fi&lt;/play:lang&gt;
   &lt;/play:name&gt;
   &lt;play:mailbox resource="mailto:ora.lassila@research.nokia.com"/&gt;
   &lt;play:homePage resource="http://www.w3.org/People/Lassila"/&gt;
&lt;/rdf:Description&gt;
</pre>
    <p>
      There we have it, and in an RDF graph at least very pretty it
      looks. And indeed, we could work with this, apart from the
      fact that we have made another modeling error. It is not true
      that the language is a property of the text string. After
      all, the string "Tim": is that English (short for "Timothy")
      or French (short for "Timoth&eacute;")? I don't need to add a
      long list of text strings which can be interpreted as one
      language or as another. A system which made the assertion
      that the string itself was fundamentally English would simply
      not be representing the case.
    </p>
    <h3>
      <a name="Attempt" id="Attempt">Attempt 3: a relationship
      between them.</a>
    </h3>
    <p>
      In fact, the situation is that Ora's name is a natural
      language object, which is the interpretation according to
      Finnish of the string "Ora Yrj&ouml; Uolevi Lassila". In
      other words, Finnish the language is the relationship between
      Ora's name and the string. In RDF, we model a binary
      relationship with a property.
    </p>
    <pre>
&lt;rdf:Description&gt;
   &lt;rdf:type resource="http://www.people.org/types#person" /&gt;
   &lt;play:name rdf:parseType="Resource"&gt;
       &lt;lang:fi&gt;Ora Yrj&ouml; Uolevi Lassila&lt;/lang:fi&gt;
   &lt;/play:name&gt;
   &lt;play:mailbox&gt;ora.lassila@research.nokia.com&lt;/play:mailbox&gt;
   &lt;play:homePage&gt;http://www.w3.org/People/Lassila/&lt;/play:homePage&gt;
&lt;/rdf:Description&gt;
</pre>
    <p>
      This works much better. Ora has a name which is the Finnish
      "Ora". This allows an RDF system to create a node for that
      string, and a "Finnish" link from the concept of Ora the
      person, maybe a Danish link from the concept of the currency,
      and an Old English link from the concept of weight (1/15
      pound), not to mention a Latin link from the concept of the
      shore.
    </p>
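    <p>
      A rough N3 sketch shows how this form copes with the two-name
      case that defeated the flat structure: each name is its own
      node, and the language is the arc from that node to the
      string. The property lang:en is assumed here by analogy with
      lang:fi, the second string is only a stand-in, and both
      prefix URIs are placeholders.
    </p>
    <pre>
@prefix rdf:  &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix play: &lt;http://example.org/play#&gt; .   # placeholder URI (assumption)
@prefix lang: &lt;http://example.org/lang#&gt; .   # placeholder URI (assumption)

# Each play:name points at a natural language object; the
# language property relates that object to its string.
[ rdf:type  &lt;http://www.people.org/types#person&gt; ;
  play:name [ lang:fi "Ora Yrj&ouml; Uolevi Lassila" ] ,
            [ lang:en "(an English form of the name)" ] ] .
</pre>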
    <p>
      A problem we may feel is we would like the language to be a
      string, so that we can reference the ISO spec for all such
      things, but there is of course no reason why the spec for the
      lang: space should not reference the same spec.
    </p>
    <p>
      Another problem we might feel is that it is reasonable for
      the play:name to expect a string, and in most cases it may
      get a string: what is the poor system supposed to do in order
      to accommodate finding a natural language object in place of
      a string? I guess making a class which includes all strings
      and all natural language objects is the best way to go. Any
      use of a string which did not also allow such a natural
      language object makes life much more difficult for
      multilingual software, so this is a serious problem.
    </p>
    <p>
      <em>[[This leads us on to another interesting question of
      packaging in RDF. There is a requirement in XML packaging and
      in email packaging and it seems quite similarly in RDF that
      when you ask me for something of type X I must be able to
      give you something of type package which happens to include
      the X you asked for and also some information for your
      edification. But that is another story.@@@ elaborate and
      define properties or syntax@@@]]</em>
    </p>
    <p>
      What is really important is that we are using the ability of
      RDF to talk about abstract things, just as when we identified
      people by the resources they were associated with, but
      avoided pretending that any person had a definitive URI.
    </p>
    <h2 id="Interpreta1">
      Datatypes as interpretation properties<sup><a href="#L380"
      name="L382" id="L382">*</a></sup>
    </h2>
    <p>
      By <em>datatypes</em> here I mean the atomic
      types in a programming language, or for example XML Datatypes
      (XML schema part 2). Defining datatypes involves defining
      constraints on an input string (for example specifying what a
      valid date is as a regular expression) and specifying the
      mathematical abstract individuals which instances of a type
      represent. One can model the relationship between the
      abstract value and the string which represents it using a
      property.
    </p>
    <table border="0" width="100%">
      <tbody>
        <tr>
          <td valign="middle">
            <pre>
&lt;rdf:Description about="#myshoe"&gt;
   &lt;shoe:size&gt;10&lt;/shoe:size&gt;
&lt;/rdf:Description&gt;
</pre>
          </td>
          <td valign="middle">
            <span class="N3">&lt;#myshoe&gt; shoe:size "10".</span>
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      This doesn't tell us what it is 10 of. We could go through
      life without any model of types: we could define a shoe size
      as being a decimal string for a number of inches. There are
      many questions and tradeoffs which datatype designers face,
      for example:
    </p>
    <ul>
      <li>Can you tell the type of a value from the string
      representation in every case? (eg 1.4e4 vs 1.4d4 for
      precision)
      </li>
      <li>Are the values of different datatypes distinct? (Eg, is 1
      = 1.0?)
      </li>
      <li>Is the set of datatypes extensible? (Eg, can you add
      complex numbers or prime numbers?)
      </li>
      <li>Does representation equality imply value equality?
      </li>
      <li>Does value equality imply representation equality? (Is
      the only allowed representation the canonical one?)
      </li>
    </ul>
    <p>
      It would be nice to be able to model these questions in
      general in the semantic web, in order to describe the
      properties of data in arbitrary systems. We can introduce
      interpretation properties which link a string to its decimal
      interpretation as a number, or a length including units. The
      problem is that the RDF graph which most folks use is the one
      above. The object of shoe:size is "10".
    </p>
    <p>
      The simplistic system, corresponding exactly to <a href=
      "#Attempt2">Attempt 1 above</a>, is to declare that shoe:size
      is of class integer. This implies (we then say) that any
      value is a decimal string. Given the string and the type we
      can conclude the abstract value, the integer ten. This works.
      It is the system used by XML datatypes, whose answers to the
      questions above are, as I understand it, [No, Yes, Yes, Yes,
      No]. A snag is that you can't compare two values unless you
      know the datatypes.
    </p>
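    <p>
      One way to picture this simplistic system in N3 is a range
      declaration on the property, assuming the RDF Schema
      vocabulary. The class play:Integer and both the shoe: and
      play: prefix URIs are hypothetical.
    </p>
    <pre>
@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix shoe: &lt;http://example.org/shoe#&gt; .   # placeholder URI (assumption)
@prefix play: &lt;http://example.org/play#&gt; .   # placeholder URI (assumption)

# The type is attached to the property, not to the value.
shoe:size rdfs:range play:Integer .   # play:Integer: hypothetical class of integers

# The graph still holds only the string "10"; a reader must look up
# the declaration above before it can interpret or compare values.
&lt;#myshoe&gt; shoe:size "10" .
</pre>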
    <p>
      To model the representation explicitly in the RDF it seems
      you have to introduce another node and arc, which is a pain.
    </p>
    <table border="0" width="100%">
      <tbody>
        <tr>
          <td valign="middle">
            <pre>
&lt;rdf:Description about="#myshoe"&gt;
   &lt;shoe:size&gt;
      &lt;rdf:value&gt;10&lt;/rdf:value&gt;
   &lt;/shoe:size&gt;
&lt;/rdf:Description&gt;
</pre>
          </td>
          <td valign="middle">
            <span class="N3">&lt;#myshoe&gt; shoe:size [ rdf:value
            "10" ].</span>
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      We can then define rdf:value to express that there is some
      datatype relation which relates the size of the shoe to "10".
      All datatype relations are subProperties of rdf:value with
      this system. Once it is in that form, the datatype information
      can be added to the graph. You have the choice of asserting
      that the object is of a given class, and deducing that the
      datatype relation must be a certain one. You can nest
      interpretation properties, interpreting a string as a
      decimal and then as a length in feet. But this is not
      possible without that extra node. One wonders about radically
      changing the way all RDF is parsed into triples, so as to
      introduce the extra abstract node for every literal:
      frightful. One wonders about declaring "10" to be a generic
      resource, an abstraction associated with the set of all
      things for which "10" is a representation under some datatype
      relation. This is frightful too: you don't have "equals" any
      more in the sense you used to have it.
    </p>
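    <p>
      The nesting might look like this in N3. The properties
      play:decimal and play:inches are hypothetical names for a
      decimal datatype relation and a unit relation (inches are
      used to match the shoe size example; feet would nest the same
      way), and the prefix URIs are placeholders.
    </p>
    <pre>
@prefix rdf:  &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix shoe: &lt;http://example.org/shoe#&gt; .   # placeholder URI (assumption)
@prefix play: &lt;http://example.org/play#&gt; .   # placeholder URI (assumption)

# A specific datatype relation refines the vague rdf:value.
play:decimal rdfs:subPropertyOf rdf:value .

# The size is a length; read in inches it is a number; that number,
# read as a decimal, is written "10". Each step needs its own node.
&lt;#myshoe&gt; shoe:size [ play:inches [ play:decimal "10" ] ] .
</pre>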
    <p>
      Instead of adding an extra arc in series with the original,
      we can leave all Properties such as shoe:size as being rather
      vague relations between the shoe and some string
      representation, and then use a functional property (say
      <code>rdf:actual</code>) to relate shoe:size to a (more
      useful) property whose object is a typed abstract value.
    </p>
    <pre>
{ &lt;#myshoe&gt; shoe:size "10" } log:implies
{ &lt;#myshoe&gt; [is rdf:actual of shoe:size] [rdf:value "10"] } .
</pre>
    <p>
      <em>@@@ No clear way forward for describing datatypes in
      RDF/DAML (2001/1) @@</em>
    </p>
    <h2>
      <a name="More" id="More">More examples</a>
    </h2>
    <p>
      Interpretation properties was the name I have arbitrarily
      chosen for this sort of use. I am not sure whether it is a
      good word. But I want to encourage their use. Base 64
      encoding is another example. It comes up everywhere, but XML
      Digital Signature is one place.
    </p>
    <pre>
&lt;rdf:Description&gt;
   &lt;play:name parseType="Resource"&gt;
      &lt;lang:fi  parseType="Resource"&gt;
        &lt;enc:base64&gt;jksdfhher78f8e47fy87eysady87f7sea&lt;/enc:base64&gt;
      &lt;/lang:fi&gt;
   &lt;/play:name&gt;
&lt;/rdf:Description&gt;
</pre>
    <p>
      Another example is type coercion. Suppose there is a need to
      take something of type datetime and use it as a date:
    </p>
    <pre>
&lt;rdf:Description&gt;
   &lt;play:event parseType="Resource"&gt;
       &lt;play:start parseType="Resource"&gt;
          &lt;play:date&gt;2000-01-31 12:00ET&lt;/play:date&gt;
       &lt;/play:start&gt;
       &lt;play:summary&gt;The Bryn Poeth Uchaf Folk festival&lt;/play:summary&gt;
   &lt;/play:event&gt;
&lt;/rdf:Description&gt;
</pre>
    <p>
      Such properties often have uniqueness and/or unambiguity
      properties. <em>enc:base64</em> for example is clearly a
      reversible transformation. It relates two strings, one
      printable and the other a byte string with no other
      constraints. The byte string could not in general be
      represented in an XML document. The definition of
      <em>enc:base64</em> is that A when encoded in base 64 yields
      B. This allows any processor, given B, to derive A. The
      specification of the encoding namespace (here referred to by
      prefix <em>enc:</em>) could be that any conforming processor
      must be able to accept a base64 encoding of a string in any
      place that a string is acceptable.
    </p>
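    <p>
      Such uniqueness and unambiguity could themselves be stated in
      the graph. The sketch below assumes the OWL vocabulary, which
      this page does not otherwise use, and a placeholder URI for
      the enc: prefix.
    </p>
    <pre>
@prefix owl: &lt;http://www.w3.org/2002/07/owl#&gt; .
@prefix enc: &lt;http://example.org/enc#&gt; .   # placeholder URI (assumption)

# Functional: a given byte string has only one base64 form.
# Inverse functional: a given base64 form decodes to only one byte string.
enc:base64 a owl:FunctionalProperty , owl:InverseFunctionalProperty .
</pre>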
    <p>
      Interpretation properties make it clear what is going on. For
      example,
    </p>
    <pre>
&lt;rdf:Description about="http://www.w3.org/"&gt;
   &lt;play:xml-canonicalized parseType="Resource"&gt;
      &lt;enc:hash-sha-1 parseType="Resource"&gt;
         &lt;enc:base64&gt;jd8734djr08347jyd4&lt;/enc:base64&gt;
      &lt;/enc:hash-sha-1&gt;
   &lt;/play:xml-canonicalized&gt;
&lt;/rdf:Description&gt;
</pre>
    <p>
      clearly makes a statement, using properties quite
      independently defined for the various processes, that the
      base64 encoding of the SHA-1 hash of the canonicalized form
      of the W3C home page is jd8734djr08347jyd4. Compare this
      with the HTTP situation in which the headers cannot be
      nested, and the encodings and compression and other things
      applied to the body are mentioned as unordered annotations,
      and the spec has to provide a way of making the right
      conclusion about which happened in what order.
    </p>
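    <p>
      Written in N3, the order of the processing steps is simply
      the order of nesting, read from the outside in. The prefix
      URIs below are placeholders assumed for illustration.
    </p>
    <pre>
@prefix play: &lt;http://example.org/play#&gt; .   # placeholder URI (assumption)
@prefix enc:  &lt;http://example.org/enc#&gt; .    # placeholder URI (assumption)

# page, then its canonical form, then the SHA-1 hash, then base64 text
&lt;http://www.w3.org/&gt; play:xml-canonicalized
    [ enc:hash-sha-1 [ enc:base64 "jd8734djr08347jyd4" ] ] .
</pre>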
    <h2>
      Units of Measure (2006)
    </h2>
    <p>
      This pattern applies very well to units of measure.
    </p>
    <p>
      See, for example, a simple ontology <a href=
      "http://www.w3.org/2007/ont/unit">http://www.w3.org/2007/ont/unit</a>
      of units of measure.
    </p>
    <h2>
      <a name="Conclusion" id="Conclusion">Conclusion</a>
    </h2>
    <p>
      Representing the interpretation of one string as an abstract
      thing can be done easily with RDF properties. This helps make
      a clean accurate model. However, using the concept for
      datatypes in RDF is incompatible with RDF as we know it
      today.
    </p>
    <hr />
    <p>
      See also:
    </p>
    <ul>
      <li>
        <a href="Identity.html">Expressing the identity of real
        things</a>
      </li>
    </ul>
    <p>
      <em>@@@Needs circle-and-arrow pictures for each attempt.</em>
    </p>
    <p>
      <a name="L380" href="#L382" id="L380">Note.</a> This section
      followed a discussion about "<em><a href=
      "/2001/01/ct24">Using XML Schema Datatypes in RDF and
      DAML+OIL</a></em>" with DWC.
    </p>
    <p>
      <a href="mailto:gruber@ksl.stanford.edu">Thomas R. Gruber</a>
      and Gregory R. Olsen, KSL <a href=
      "http://www-ksl.stanford.edu/knowledge-sharing/papers/engmath.html">
      "An Ontology for Engineering Mathematics"</a> in Jon Doyle,
      Piero Torasso, &amp; Erik Sandewall, Eds., <em>Fourth
      International Conference on Principles of Knowledge
      Representation and Reasoning</em>, Gustav Stresemann
      Institut, Bonn, Germany, Morgan Kaufmann, 1994. <em>A non-RDF
      but thorough treatment including units of measure as scalar
      quantities.</em>
    </p>
    <p>
      Compare with <a href=
      "http://icosym-nt.cvut.cz/kifb/en/ont/sumo-units-of-measure.html">
      SUMO units of Measure</a> which seems to have units as
      instances, and multipliers such as kilo, giga, etc as
      functions.
    </p>
    <p>
      A little off-topic: on linear and area measure, John Baez's
      <a href="http://www.math.ucr.edu/home/baez/inches.html">"Why
      are there 63360 inches per mile?"</a> is good reading.
    </p>
    <hr />
    <p>
      <a href="Overview.html">Up to Design Issues</a>
    </p>
    <p>
      <a href="../People/Berners-Lee">Tim BL</a>
    </p>
    <p>
      (names of certain characters may have been misspelled to
      protect the innocent ;-)
    </p>
  </body>
</html>