<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content=
    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
    <title>
      -- Axioms of Web architecture
    </title>
    <link rel="Stylesheet" href="di.css" type="text/css" />
    <meta http-equiv="Content-Type" content="text/html" />
  </head>
  <body bgcolor="#DDFFDD" text="#000000">
    <address>
      Tim Berners-Lee<br />
      Date: 1998, last change: $Date: 2009/08/27 21:38:07 $<br />
      Status: personal view only. Editing status: first draft.
    </address>
    <p>
      <a href="./">Up to Design Issues</a>
    </p>
    <h3>
      Web Design Issues
    </h3>
    <p>
      <em>This page assumes an imaginary namespace referred to as
      play:, which is used only for the sake of example. The reader
      is assumed to be able to guess its specification. Much of this
      page was originally a note to the <a href=
      "Syntax.html">Strawman Syntax</a>.</em>
    </p>
    <hr />
    <h3>
      <a name="Identifier" id="Identifier">Identifiers - what is
      identified?</a>
    </h3>
    <p>
      When XML is used to represent a directed labelled graph which
      is used to represent information about things, then one must
      be able to make statements about parts of an XML document,
      parts of the DLG (such as RDF nodes) and of course the
      objects described.
    </p>
    <p>
      In most cases, what is meant seems obvious to the human reader. The jam
      jar label text does not (normally) read "jam jar label text"
      or "jam jar label" or "jam jar" but "jam".
    </p>
    <p>
      Take the case of a statement about a person in an imaginary
      syntax:
    </p>
    <pre>
&lt;z:person id="foo"&gt;
   &lt;head&gt;
      &lt;play:author&gt;Zoe&lt;/play:author&gt;
   &lt;/head&gt;
   &lt;play:name&gt;Albert&lt;/play:name&gt;
   &lt;play:mailbox resource="mailto:adoe@bar.com"/&gt;
   &lt;play:son-name&gt;Bill&lt;/play:son-name&gt;
   &lt;play:daughter-name&gt;Claire&lt;/play:daughter-name&gt;
   &lt;play:father&gt;
      &lt;z:name value="Joe"/&gt;
      &lt;z:wrote href="#foo"&gt;
      &lt;z:friend resource="#foo"/&gt;
   &lt;/play:father&gt;
&lt;/z:person&gt;
</pre>
    <p>
      The XML element has one attribute and six child elements.
      The RDF node has five properties (stated here). The person
      Albert has two children. What do we refer to if we refer to
      "#foo"? Of course, in XML we refer to the element; but when we
      make RDF statements, we normally want to refer to the RDF node,
      or rather the object described by the node, in RDF terms the
      <em>resource</em>.
    </p>
    <p>
      Of course, in a typical Unix programming language we would
      simply add a syntax character to distinguish the forms of
      reference: #foo would be the node, and @#foo (or something)
      would be the object referred to. But in this case we are
      trying to do everything with RDF and with what XML leaves us,
      and so we would lose a few points by instead adding some
      totally new syntax. What we <em>can</em> do is to use
      different attribute names for the different forms of
      reference. The attribute names I used above are as follows:
    </p>
    <table border="1">
      <caption>
        Forms of reference to the object of a property
      </caption>
      <tbody>
        <tr>
          <td>
            <code>value</code>
          </td>
          <td>
            literal string
          </td>
        </tr>
        <tr>
          <td>
            <code>href</code>
          </td>
          <td>
            taking the string as a URI with or without fragment
            identifier, the text (or XML fragment or whatever
            medium) to which it refers.
          </td>
        </tr>
        <tr>
          <td>
            <code>resource</code>
          </td>
          <td>
            taking a string as a URI with fragment identifier, the
            abstract RDF object (rdf:resource) corresponding to the
            identified XML document fragment.
          </td>
        </tr>
      </tbody>
    </table>
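    <p>
      The table amounts to a dispatch on the attribute name. The
      following Python sketch is an illustration only: the class
      names are hypothetical, and http://example.org/people.xml is
      an assumed URI for the document holding the example above. It
      shows that <code>href="#foo"</code> and
      <code>resource="#foo"</code> carry the same URI string yet
      denote different things.
    </p>
    <pre>
```python
from dataclasses import dataclass
from urllib.parse import urljoin

# Illustrative classes for the three forms of reference; the names are
# hypothetical and not part of any RDF or XML specification.

@dataclass(frozen=True)
class Literal:
    text: str        # value=     the string itself

@dataclass(frozen=True)
class XmlPart:
    uri: str         # href=      the text or XML subtree the URI points at

@dataclass(frozen=True)
class Described:
    uri: str         # resource=  the abstract object that subtree describes

def interpret(attr: str, raw: str, base: str):
    """Return what an attribute value refers to, according to its name."""
    if attr == "value":
        return Literal(raw)
    absolute = urljoin(base, raw)    # "#foo" is resolved against the base URI
    if attr == "href":
        return XmlPart(absolute)
    if attr == "resource":
        return Described(absolute)
    raise ValueError(f"unknown form of reference: {attr}")

BASE = "http://example.org/people.xml"   # hypothetical document URI
wrote = interpret("href", "#foo", BASE)       # from the z:wrote element
friend = interpret("resource", "#foo", BASE)  # from the z:friend element
```
</pre>
    <p>
      In this sketch <code>wrote</code> and <code>friend</code>
      share the URI http://example.org/people.xml#foo but compare
      unequal: one stands for the z:person element, the other for
      Albert.
    </p>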
    <p>
      Here I have used "href" to allow RDF to refer to the XML
      model. This is important because, for example, it is bits of
      XML which one digitally signs, not (in signed XML) bits of RDF.
      Also, it is useful for RDF to be able to talk about XML
      elements. It brings up the question of what an RDF fragment
      identifier means.
    </p>
    <h2 id="clash">
      RDF and XML fragment identifiers clash
    </h2>
    <p>
      <em>This highlights (2000/02) a bug in the relationship
      between XML and RDF</em>
    </p>
    <p>
      Consider what is identified by
    </p>
    <p style="text-align: center">
      <code>http://.../foo.rdf#bar</code>
    </p>
    <p>
      when <code>...foo.rdf</code> contains among other things the
      following:
    </p>
    <pre>
&lt;rdf:description rdf:id="bar"&gt;
   &lt;rdf:type resource="...#person"&gt;
   &lt;y:common-name&gt;Ora Lassila&lt;/y:common-name&gt;
   &lt;y:mailbox&gt;ora.lassila@research.nokia.com&lt;/y:mailbox&gt;
&lt;/rdf:description&gt;
</pre>
    <p>
      The meaning of the fragment identifier is taken from the
      specification associated with the MIME type.
    </p>
    <p>
      Therefore, if this is taken as a document of type
      application/rdf, then the fragment identifier identifies the
      thing (a person in this case, Ora) described by the RDF node.
      This is how references are used in RDF.
    </p>
    <p>
      However, if it is considered to be of type text/xml, then the
      fragment identifier is defined by the XML spec, and so
      references an element whose xml:id attribute has the value
      "bar". It happens that <code>rdf:id</code> is
      <em>not</em> defined to be an xml:id but is defined to "act
      like one", whatever that means, by the RDF spec. So it isn't
      clear whether the reference would be to the XML subtree
      (consisting of the rdf:description element and its
      contents), would be undefined, or would possibly be a
      reference to some other element which happened to have
      id="bar".
    </p>
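    <p>
      To make the clash concrete, here is a minimal Python sketch.
      Two hypothetical dictionaries stand in for a parsed foo.rdf,
      example.org stands in for the elided host, and application/rdf
      is used as the type name the text gives; none of this is a
      real parser. The same URI yields a different referent
      depending only on which media type is assumed.
    </p>
    <pre>
```python
from urllib.parse import urldefrag

# Hypothetical stand-ins for the parsed foo.rdf: which XML element carries
# each ID, and which thing each RDF node describes.
XML_ELEMENTS_BY_ID = {"bar": "rdf:description element (and its contents)"}
THINGS_BY_NODE_ID = {"bar": "Ora Lassila, the person"}

def dereference(uri: str, media_type: str) -> str:
    """Resolve the fragment identifier using the rules of the media type."""
    _document, fragment = urldefrag(uri)
    if media_type == "application/rdf":
        return THINGS_BY_NODE_ID[fragment]
    if media_type == "text/xml":
        # Only defined if rdf:id counts as an XML ID, which is left unclear.
        return XML_ELEMENTS_BY_ID[fragment]
    raise ValueError(f"no fragment rules known for {media_type}")

URI = "http://example.org/foo.rdf#bar"   # hypothetical host for http://.../foo.rdf
as_rdf = dereference(URI, "application/rdf")
as_xml = dereference(URI, "text/xml")
```
</pre>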
    <p>
      To have a different interpretation of a URI as a function of
      the notional type of the document belies the fact that the point
      of using XML syntax for RDF was that RDF documents should be
      XML documents! Of course we embed RDF in regular XML
      documents. So this distinction is nonsense.
    </p>
    <p>
      Of course, the RDF spec can simply use the XML definition
      indirectly and refer to the RDF node described by the XML
      element. However, this is not powerful enough for RDF,
      because RDF needs to be able to make statements about XML
      documents and XML elements. For example, I might want to
      state that I wrote the above snippet. It would be very
      tempting to write that I am the author of foo.rdf#bar. But I
      am not the author of Ora Lassila. RDF uses parseType to
      resolve this for inline data: parseType=Resource indicates
      that the reference is to the RDF object, and
      parseType=Literal indicates that it is to the XML. The matter
      could be resolved with an interpretation property which
      expresses the relationship between an XML subtree and the RDF
      object which it describes. While it would be good to define
      that property, RDF syntax needs a shortcut. I would propose
      that "resource=", which is used to point to a resource, also
      be used for a resource fragment id, and that a new syntax be
      introduced to refer to the actual RDF node: maybe "object=",
      which happens here to correspond to the (subject, predicate,
      object) sense as well as to a "thing" sense. (The former is
      the reason for choosing it: the attribute should express the
      relationship, not the class of the thing referred to in
      general!)
    </p>
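    <p>
      The proposal can be sketched, under loud assumptions, as two
      attributes that differ only in whether they follow the
      interpretation property. In this Python sketch the
      <code>DESCRIBES</code> table is a hypothetical stand-in for
      that property, and _:ora and _:tim are made-up node labels.
    </p>
    <pre>
```python
# Hypothetical sketch of the proposed attributes. DESCRIBES plays the part
# of the interpretation property linking an XML subtree to the RDF object
# it describes; "_:ora" and "_:tim" are made-up node labels.

SNIPPET = "http://example.org/foo.rdf#bar"   # hypothetical host
DESCRIBES = {SNIPPET: "_:ora"}

def referent(attr: str, uri: str) -> str:
    if attr == "resource":
        return uri                 # the resource itself: here, the XML subtree
    if attr == "object":
        return DESCRIBES[uri]      # the RDF object the subtree describes
    raise ValueError(f"unknown attribute: {attr}")

statements = [
    # Tim is the author of the snippet, not of Ora.
    (referent("resource", SNIPPET), "play:author", "_:tim"),
    # The name belongs to Ora, not to the snippet.
    (referent("object", SNIPPET), "play:name", "Ora Lassila"),
]
```
</pre>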
    <h2>
      <a name="Naming" id="Naming">Naming properties and
      elements</a>
    </h2>
    <p>
      We have a similar problem in the XML-RDF relationship when
      looking at identity at the schema level.
    </p>
    <p>
      In RDF M&amp;S 1.0, a property name defined in a namespace is
      formed by directly concatenating the namespace URI with the
      local tag name of the XML element.
    </p>
    <p>
      One natural way to use this is to end the namespace URI with
      "#" so that the local tag name becomes the fragment
      identifier. When the schema is written in XML, this implies
      that the tag name, being a simple alphanumeric, will identify
      something in the document by its XML ID. This is a constraint
      on the schema language: the XML ID of an element must be
      usable as a reference to the thing being defined.
    </p>
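    <p>
      A short Python sketch of the concatenation rule, using a
      hypothetical namespace URI. Splitting the result with
      <code>urldefrag</code> shows which document and which XML ID
      the property name points at; the last line shows what the same
      rule produces when the namespace URI does not end in "#".
    </p>
    <pre>
```python
from urllib.parse import urldefrag

def property_uri(namespace_uri: str, local_name: str) -> str:
    """RDF Model and Syntax 1.0 rule: concatenate namespace URI and local name."""
    return namespace_uri + local_name

NS = "http://example.org/schemas/memo#"    # hypothetical namespace ending in "#"
prop = property_uri(NS, "priority")
schema_document, xml_id = urldefrag(prop)  # which document, and which XML ID

# Without the trailing "#" the local name simply runs on into the path.
run_on = property_uri("http://example.org/schemas/memo", "priority")
```
</pre>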
    <p>
      When there is a 1:1 mapping between RDF properties and XML
      element types, there is a choice of
    </p>
    <ol>
      <li>giving them the same URI and distinguishing which is
      referred to by context (as in resource= and object= above),
      or
      </li>
      <li>giving them different URIs which are algorithmically
      related, for example assuming that #foo-element means the
      element defining #foo, using a convention specified in the
      schema languages, or
      </li>
      <li>giving them totally distinct URIs which can be connected
      by an assertion in the schema, or an in
      </li>
    </ol>
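    <p>
      Option 2 can be sketched as a pair of Python functions. The
      "-element" suffix is the example convention from the list, not
      something any schema language defines, and the URIs used are
      hypothetical.
    </p>
    <pre>
```python
from urllib.parse import urldefrag

SUFFIX = "-element"   # the "#foo-element" convention of option 2 (hypothetical)

def element_type_uri(property_uri: str) -> str:
    """Derive the URI of the element type that defines a property."""
    doc, frag = urldefrag(property_uri)
    if not frag:
        raise ValueError("property URI has no fragment identifier")
    return f"{doc}#{frag}{SUFFIX}"

def property_uri_for(element_uri: str) -> str:
    """Inverse mapping: strip the suffix to recover the property URI."""
    doc, frag = urldefrag(element_uri)
    if not frag.endswith(SUFFIX):
        raise ValueError("not an element-type URI under this convention")
    return f"{doc}#{frag[:-len(SUFFIX)]}"
```
</pre>
    <p>
      One cost of such a convention shows up even in this sketch: a
      property whose own name happened to end in "-element" could not
      be told apart from an element type, so the schema language
      would have to reserve the suffix.
    </p>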
    <p>
      Given that it is interesting to use RDF to make statements
      about XML element types, having different names is appealing.
      As writing down the relationship every time is unappealing,
      the algorithmic link looks attractive.
    </p>
    <h3>
      <a name="generic" id="generic">A generic problem with XML
      identifiers</a>
    </h3>
    <p>
      <span class="detail">(I notice in passing that XML has
      currently a mixture of identifier spaces which is a little
      confusing.</span>
    </p>
    <p class="detail">
      The element and attribute namespace is very well handled in
      terms of abbreviations, and is grounded in URI space, using
      the XML namespaces spec.
    </p>
    <p class="detail">
      The URI space is of course the same space, but when a value is
      typed as a URI, then it cannot use the abbreviation system of
      the element namespace.)
    </p>
    <h3>
      <a name="IDREF" id="IDREF">IDREF considered harmful</a>
    </h3>
    <p>
      The local identifier space is a subset of URI space. When an
      attribute is defined as a URI, the simple "#" prefix gives
      access to the local ID space, while still allowing great
      power of expression by reference to anything else on the Web.
      When the "idref" form is used, this is not possible. The
      idref form is a weak form IMHO and not wise for new designs
      which are not to be deliberately constraining.
    </p>
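    <p>
      The difference can be sketched in Python with the standard
      <code>urllib.parse.urljoin</code>, assuming a hypothetical
      document URI. A URI-typed attribute reaches the local ID space
      through "#foo" and everything else through an absolute URI,
      whereas an IDREF can only name an ID declared in the same
      document.
    </p>
    <pre>
```python
from urllib.parse import urljoin

BASE = "http://example.org/memo.xml"   # hypothetical URI of the document

def resolve_uri_attr(value: str) -> str:
    """URI-typed attribute: "#id" stays local, anything else can point anywhere."""
    return urljoin(BASE, value)

def resolve_idref_attr(value: str, local_ids: set[str]) -> str:
    """IDREF-typed attribute: can only name an ID declared in this document."""
    if value not in local_ids:
        raise ValueError(f"{value!r} is not an ID in this document")
    return f"{BASE}#{value}"
```
</pre>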
    <p>
      Others have noticed this problem and there have even been
      suggestions which confused the URI prefix and the namespace
      prefix. In fact the problem can be solved [ref eric
      whiteboard] with an escape of some sort. One possibility is
      ambushing a void URI scheme name by using a colon prefix
      (suggested by Eric Prud'hommeaux):
    </p>
    <p class="detail">
      <code>href=":rdf:description"</code>
    </p>
    <p>
      would be a perfectly valid URI (in an XML context) which
      referenced the rdf:description URI using the defined rdf:
      namespace. I feel this is messy, as it would have to be
      subject to different handling than any other URI: its
      expansion would be done in an XML-specific way.
    </p>
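    <p>
      A minimal Python sketch of how such an expansion might work,
      assuming a map of the namespace declarations in scope. Having
      to pass that map in is the XML-specific handling objected to
      above: an ordinary URI resolver would not have it.
    </p>
    <pre>
```python
# Hypothetical sketch of the colon-prefix escape. Expanding a value needs
# the XML namespace declarations in scope, unlike any other URI reference.

NAMESPACES = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}

def expand(value: str, namespaces: dict[str, str]) -> str:
    """Expand ":prefix:local" through the namespace map; leave other URIs alone."""
    if not value.startswith(":"):
        return value
    prefix, sep, local = value[1:].partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError(f"cannot expand {value!r}")
    return namespaces[prefix] + local

expanded = expand(":rdf:description", NAMESPACES)
```
</pre>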
    <p>
      The other thing you need is the ability, when using an element
      name which only occurs once, to name it by its full URI
      without changing the default namespace. It would clearly be
      logical to be able to just write
    </p>
    <p class="detail">
      <code>&lt;http://foo.com/schemas/memo6.2#priority&gt;a&lt;/[...]&gt;</code>
    </p>
    <p>
      Because what follows uses the full power of what precedes
      with generality, we may need to see the first in use before
      the paper is over. But I can't see making the second change
      to XML.)
    </p>
    <hr />
    <p>
      <span class="detail">[We could derive</span> <em class=
      "detail">resource</em> <span class="detail">as a shorthand
      for indicating that the object referred to is that referred
      to by the given element, with an explicit coercion:</span>
    </p>
    <p class="detail"></p>
    <pre class="detail">
&lt;z:person id="foo"&gt;
   &lt;head&gt;
      &lt;z:author&gt;Zoe&lt;/z:author&gt;
   &lt;/head&gt;
   &lt;z:name&gt;Albert&lt;/z:name&gt;
   &lt;z:mailbox&gt;mailto:adoe@bar.com&lt;/z:mailbox&gt;
   &lt;z:son-name&gt;Bill&lt;/z:son-name&gt;
   &lt;z:daughter-name&gt;Claire&lt;/z:daughter-name&gt;
   &lt;z:father&gt;
      &lt;z:name value="Joe"/&gt;
      &lt;z:wrote href="#foo"/&gt;
      &lt;z:friend&gt;
         &lt;rdf:node href="#foo"/&gt;
      &lt;/z:friend&gt;
   &lt;/z:father&gt;
&lt;/z:person&gt;
</pre>
    <p>
      <span class="detail">This could formally model what is going
      on, but it is a mess: every RDF arc has to be doubled! Rref is
      in fact more fundamental and basic to RDF, and href is an
      added level-breaker.]</span>
    </p>
    <p>
      In my opinion, this analysis shows how confusing it is that
      the abstract object in RDF is known as a resource, when that
      is different from the "Resource" which is the R in URI.
    </p>
    <p>
      RDF M&amp;S solves a similar problem to this with ID for the
      object and BAGID for the container of statements.
    </p>
    <p>
      RDF uses the "resource" to indicate an object described by a
      node in the RDF graph. In the above example, "#foo" in the
      XML language identifies the element &lt;z:person ...&gt;. However,
      in RDF #foo refers (I understand) to the person themselves.
      This means that the
    </p>
    <p>
      <em>@@@ - Note the relationship between an object and a URI
      is non-perfect except for an abstract resource.... give
      examples (home page, mailbox, common name).... Compare with
      SQL - no identifiers, only use properties.@@@</em>
    </p>
    <h2>
      <a name="Expressing" id="Expressing">Expressing Identity of
      real things</a>
    </h2>
    <p>
      Resources (as in Universal Resource Identifier) are precisely
      the things identified by URIs. Web pages and email messages are
      thought of as resources. RDF unfortunately uses the term for
      anything which can be talked about: any concept, no matter how
      abstract. RDF was originally designed as a solution for
      metadata (information about information), where the subject
      of discourse was by definition a Web resource. There was then
      no problem with terminology. As we use RDF to describe things
      other than web pages, we in fact use properties to identify
      them; for example, we use email addresses to identify people.
      We must not muddle the email address with the person.
    </p>
    <p>
      Consider <a name="this" id="this">this example</a>
    </p>
    <pre>
&lt;rdf:description&gt;
   &lt;rdf:type resource="http://www.people.org/types#person"/&gt;
   &lt;play:name&gt;Ora Yrj&ouml; Uolevi Lassila&lt;/play:name&gt;
   &lt;play:mailbox resource="mailto:ora.lassila@research.nokia.com"/&gt;
   &lt;play:homePage resource="http://www.w3.org/People/Lassila"/&gt;
&lt;/rdf:description&gt;
</pre>
    <p>
      Now that represents, in the RDF graph, the anonymous node for
      Ora himself (who has no web address) and the four arcs from it
      specifying that this thing is of type person and has a name,
      email address and home page as given.
    </p>
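    <p>
      Written out as (subject, predicate, object) triples, the
      description is a single subject with four arcs leaving it. In
      this Python sketch the label _:ora is made up for the anonymous
      node.
    </p>
    <pre>
```python
# The rdf:description above as (subject, predicate, object) triples.
# "_:ora" is a made-up label for the anonymous node standing for Ora.
ORA = "_:ora"
triples = [
    (ORA, "rdf:type", "http://www.people.org/types#person"),
    (ORA, "play:name", "Ora Yrj\u00f6 Uolevi Lassila"),
    (ORA, "play:mailbox", "mailto:ora.lassila@research.nokia.com"),
    (ORA, "play:homePage", "http://www.w3.org/People/Lassila"),
]

nodes_described = {s for s, _p, _o in triples}   # only the one anonymous node
arcs_from_ora = len(triples)                     # one arc per property
```
</pre>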
    <p>
      Some of the properties are unambiguous in some way: two
      people who have the same mailbox may be assumed to be the
      same person. (I won't get into the rat-hole of what identity
      properties should be assumed for what identifiers - that is
      not core to the discussion)
    </p>
    <p>
      I imagine that many processors will use their knowledge
      (preprogrammed or from a schema) about uniqueness of such
      properties to draw conclusions. For example, suppose we define
      <em>play:mailbox</em> such that no two people are
      allowed to share the same play:mailbox value. Then the
      information
    </p>
    <pre>
&lt;rdf:description&gt;
   &lt;rdf:type resource="[...]book"/&gt;
   &lt;play:author parseType="Resource"&gt;
       &lt;play:mailbox
     resource="mailto:ora.lassila@research.nokia.com"/&gt;
   &lt;/play:author&gt;
&lt;/rdf:description&gt;

&lt;rdf:description&gt;
   &lt;play:name&gt;Ora Lassila&lt;/play:name&gt;
   &lt;play:mailbox resource="mailto:ora.lassila@research.nokia.com"/&gt;
   &lt;play:homePage resource="http://www.w3.org/People/Lassila/"/&gt;
&lt;/rdf:description&gt;
</pre>
    <p>
      allows the system to conclude that the name of the author of
      the book is Ora Lassila.
    </p>
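    <p>
      The conclusion can be sketched as merging nodes that share a
      value of the unique property, an operation often called
      "smushing". This Python sketch uses a small union-find; the
      node labels are made up, the book's type triple is omitted,
      and treating play:mailbox as unique is the assumption stated
      in the text, not something RDF itself provides.
    </p>
    <pre>
```python
# Hypothetical sketch: nodes that share a play:mailbox value are merged,
# using a small union-find. Node labels such as "_:a" are made up.

def smush(triples, unique=frozenset({"play:mailbox"})):
    """Merge nodes sharing a value of a property no two people may share."""
    parent = {}

    def find(node):
        while parent.get(node, node) != node:
            node = parent[node]
        return node

    first_holder = {}
    for s, p, o in triples:
        if p in unique:
            holder = first_holder.setdefault((p, o), s)
            root_a, root_b = find(holder), find(s)
            if root_a != root_b:
                parent[root_b] = root_a      # s is the same person as holder
    return {(find(s), p, find(o)) for s, p, o in triples}

MAILBOX = "mailto:ora.lassila@research.nokia.com"
data = [
    ("_:book", "play:author", "_:a"),
    ("_:a", "play:mailbox", MAILBOX),
    ("_:p", "play:name", "Ora Lassila"),
    ("_:p", "play:mailbox", MAILBOX),
    ("_:p", "play:homePage", "http://www.w3.org/People/Lassila/"),
]
merged = smush(data)
author = next(o for s, p, o in merged if s == "_:book" and p == "play:author")
name = next(o for s, p, o in merged if s == author and p == "play:name")
```
</pre>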
    <p>
      This actually exposes what is really happening when we say as
      a short cut that "the author is
      ora.lassila@research.nokia.com". What we mean is that the
      author is somebody with that internet mailbox. Making such
      a two-step process explicit exposes the actual nature of the
      identity relationships, and also their limitations. This is, in my
      opinion, a much cleaner way to model the data. Sometimes we
      need a shortcut:
    </p>
    <pre>
&lt;rdf:description rdf:type="[...]book"&gt;
   &lt;rdf:type&gt;[...]book&lt;rdf:type/&gt;
   &lt;play:author-mailbox
     resource="mailto:ora.lassila@research.nokia.com"/&gt;
&lt;/rdf:description&gt;
</pre>
    <p>
      RDF treats people as "resources", and this terminology, which
      normally means, tautologically, "things with URIs", makes
      people expect Ora to be synonymous with his web page or his
      mailbox. This is not, of course, good design. When a shortcut
      is made such as author-mailbox, it is important to realize
      what is happening. In this model, there is no RDF node for
      the author himself. The fact that there is a person involved
      who wrote the book and has the mailbox has to be expressed
      elsewhere. The RDF schema may indeed be a good place for
      that, once we have the vocabulary, as that will make the
      expansion of the short cut evident to any processing
      machinery.
    </p>
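    <p>
      Assuming a schema could say that play:author-mailbox
      abbreviates play:author followed by play:mailbox, the
      expansion might look like this Python sketch. The
      <code>SHORTCUTS</code> table is hypothetical; no existing RDF
      Schema term is being modelled. Expansion mints a fresh blank
      node, which is exactly the node for the author that the
      shortcut leaves out.
    </p>
    <pre>
```python
from itertools import count

# Hypothetical schema knowledge: shortcut property maps to a two-step chain.
SHORTCUTS = {"play:author-mailbox": ("play:author", "play:mailbox")}

def expand_shortcuts(triples):
    """Rewrite each shortcut arc as two arcs through a new blank node."""
    fresh = (f"_:x{i}" for i in count(1))      # new blank-node labels
    out = []
    for s, p, o in triples:
        if p in SHORTCUTS:
            first, second = SHORTCUTS[p]
            node = next(fresh)                 # the author, now given a node
            out += [(s, first, node), (node, second, o)]
        else:
            out.append((s, p, o))
    return out

MAILBOX = "mailto:ora.lassila@research.nokia.com"
expanded = expand_shortcuts([("_:book", "play:author-mailbox", MAILBOX)])
```
</pre>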
    <p>
      The unambiguous nature of the <em>play:mailbox</em> meant
      that it could be used as a way of identifying something. As a
      raw URI, <code>mailto:ora.lassila@research.nokia.com</code>
      identifies the abstract mailbox to and from which email can be
      sent. However, the <em>play:mailbox</em> property allows one
      to identify a person. Its unambiguousness allows us to step
      from the literal "written by <strong>a</strong> person whose
      email is ora@w3.org" to the more useful "written by
      <strong>the</strong> person whose mailbox is ora@w3.org".
    </p>
    <hr />
    <p>
      <a href="Overview.html">Up to Design Issues</a>
    </p>
    <p>
      <a href="../People/Berners-Lee">Tim BL</a>
    </p>
  </body>
</html>