Meaning.html 25.1 KB

Raw Blame History Permalink

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content=
    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
    <title>
      The meaning of a document -- Axioms of Web architecture
    </title>
    <link rel="Stylesheet" href="di.css" type="text/css" />
    <meta http-equiv="Content-Type" content=
    "text/html; charset=us-ascii" />
  </head>
  <body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">
    <address>
      Tim Berners-Lee<br />
      Date: 1999, last change: $Date: 2009/08/27 21:38:08 $<br />
      Status: personal view only. Editing status: first draft.
      <em>Written partly when the Namespace argument came around
      again and I realized that where there</em>
    </address>
    <p>
      <a href="./">Up to Design Issues</a>
    </p>
    <h3>
      Axioms of Web Architecture: the meaning of a document
    </h3>
    <p>
      <em>Abstract: The meaning of a document is then the product
      of some text in some language) and the meaning of the
      language. The text is found in a document and the language
      defined in a document called a schema.</em>
    </p>
    <hr />
    <h1>
      Meaning
    </h1>
    <p>
      <em>Grounding the meaning of a document in URI space.</em>
    </p>
    <p>
      What is the meaning of a document?
    </p>
    <p>
      The meaning of a document on the Web can be defined more
      precisely than an arbitrary paper document. Because we have
      the benefit of a global namespace (URIs), things become
      possible which were not before. One example is global
      hypertext; another is the rigid (though rarely absolute)
      specification of meaning. Just as a hypertext document can
      now exactly point to another document when it makes a
      reference (instead of making some vague natural language
      reference to it), so can a formal document make a precise
      reference to the language it uses.
    </p>
    <p>
      A writer of a document uses the language to convey his intent
      to the reader. It is essential that the intent of the writer
      can be well defined for both parties and in general for a
      third party.
    </p>
    <p>
      The "<dfn>language</dfn>" here I means the set of symbols,
      the syntactic rules which constrain their combination, and
      some semantics which are conveyed by defining their
      interpretation in one or more other formal language, or in
      some natural language.
    </p>
    <table border="5">
      <tbody>
        <tr>
          <td>
            The meaning of a document is then the product of the
            text of the document (in some language) and the meaning
            of the language.
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      On the Web, <a href="Axioms.html#Universality2">important
      things are identified by URIs</a>. This should clearly apply
      both to the document itself and to the language. The party
      which defines what a URI refers to I call the publisher, or
      owner of the URI. HTTP allows a delegated system of authority
      for ownership (DNS) to define ownership of URIs, and it also
      provides a network protocol to retrieve documents
      representing that identified by the URI. The text a document
      is defined by its publisher and the meaning of the language
      is defined by the publisher of the language.
    </p>
    <p>
      Natural languages are constantly evolving and rather vague,
      in that no one (except <em>Scrabble</em> players) use a
      particular dictionary as a definitive set of meanings. In
      practice, the meaning of a word in a natural language is the
      sum of the associations of that word -- logical or poetic --
      in the mind of the reader or writer. Of course society works
      on the basis of a very strong similarity of the webs of
      association in different people's minds.
    </p>
    <p>
      In the semantic web, however, meaning is not vague: the idea
      is that languages must be defined formally and as precisely
      as possible. The semantic web consists of some "terminal"
      languages which are defined solely in natural language terms,
      and some languages for which there are machine-readable
      interpretations into other formal languages. Whereas programs
      processing documents in the first sort of language will
      typically have to be hand coded, documents in the second set
      may be processed automatically to convert them into languages
      in the first set.
    </p>
    <p>
      URIs can be of various sorts, with various properties
      depending on their scheme (and, for http URIs, the
      publisher), but some URIs can be dereferenced to a definitive
      document. The document resulting from dereferencing the URI
      for a language is a place where the publisher of the language
      can put definitive information about the meaning of a
      language.
    </p>
    <h3>
      <a name="Language1" id="Language1">Language and document
      subsets</a>
    </h3>
    <p>
      As languages evolve, there can be many languages which are
      similar. "Similarity" doesn't mean much, but something which
      is well defined is when a document in one language A can be
      treated precisely as though it had been in another language
      B.
    </p>
    <h3>
      <a name="Meaning1" id="Meaning1">Meaning in XML</a>
    </h3>
    <p>
      In XML, a language is a "namespace", and the document about
      the language is called a "schema". In XML, one document can
      contain a mixture of languages, and so the schema if written
      in XML may contain information about syntactic constraints
      (in XML-schema language) and/or RDF properties (in rdf-schema
      language), or any combination of the above. (<a href=
      "#Language">note</a>)
    </p>
    <p>
      XML puts no constraints on a language apart from syntactic
      structure. There is not (without RDF and logic or some other
      higher level) any overall framework into which new languages
      can be introduced. So, the question of <strong>what an XML
      document means depends</strong> first upon the fully
      qualified name of the <strong>document element</strong>. No
      semantics can be attached to any of its descendents in the
      document tree except in as much as is defined by the
      specification of that element type in that namespace. One
      cannot talk about the "meaning" of a subtree of a document
      without understanding the semantics of the language. In fact,
      because languages only necessarily define meaning for
      documents, the only way one can talk about the meaning of a
      subset of a document is to define a how those parts of the
      document can be reassembled into a second whole document.
      This is what must be done when a digital signature is applied
      to a document.
    </p>
    <h3>
      <a name="Meaning" id="Meaning">The Meaning of Digital
      Signature</a>
    </h3>
    <p>
      The language defines semantics. On the simple philosophy that
      one place is enough, It is not the place of a digital
      signature to define semantics. A digital signature on a
      document may give a party reason to use the information
      therein for purposes it would not have otherwise. The issuer
      of a public key may also put constraints on what sort of
      guarantees are made by signature with a given key. But the
      signature itself must not affect the semantics - the meaning
      - of a document. To allow it to would be to create an
      inconsistency between the intent of the writer of the
      original document and the meaning of the signed document. So,
      signatures themselves have no meaning. The meaning has to be
      ascibed to them by other documents. For example, I may say,
      "If an organization is a member of W3C according to a
      document signed with this key, then that organzation is
      indeed a member". That is a trust statement which gives the
      key a connection into the world of meaning of documents.
    </p>
    <h3>
      <a name="Style" id="Style">Style as meaning</a>
    </h3>
    <p>
      (Although few people would think of presentation style of a
      document as its "meaning", and many of us spend a lot of time
      emphasising the difference between style and content and
      semantics, in fact much of what applies to style applies to
      semantics. Therefore the "meaning in terms of presentation"
      is a good test case for the architecture of the system. (For
      many documentation systems, the only semantics required is
      "H2 means a big bold block on the left"!) Style sheets
      provide an "interpretation"of a document by mapping it onto
      another well-defined language of formatting properties. The
      style sheet language gives a good definition (in English) of
      what is needed. This is an interesting comparison, and I
      mention it as a place where architectural conssistency should
      be maintained, but it isn't what I normally mean by
      "meaning".)
    </p>
    <h3>
      <a name="Logical" id="Logical">Logical meaning</a>
    </h3>
    <p>
      When XML is used to encode logic, then a document is a
      formula and the (see <a href="Logic.html">Logic on the
      web</a>). Then, the way new predicates and constants interact
      is defined by the logic. The way fundamental new parts of the
      language (such as quantification) are added is part of a more
      general question of how arbitrary languages interact.
      Examples we have seen are the mixing of XHTML and XSL. What
      is the result - XHTML or XSL? A document or a style sheet?
      Both?
    </p>
    <h3>
      <a name="Mixing" id="Mixing">Mixing Languages</a>
    </h3>
    <p>
      XML puts no contarints on a language apart from syntactic
      structure. There is not (without for example RDF and logic)
      some overall framework into which new languages can be
      introduced. This means that every language has to define how
      it canbe extended by mixing with other languages. Typically
      it will indicate the element types which can be subclassed by
      extensions and therefore incorporated into documents wherever
      that element type is allowed.
    </p>
    <p>
      One particular example of such a type is common to almost all
      languages. This is the sentence, the fully qualified
      assertion or statement, the formula with no free variables.
      Almost all whole documents count as such, though an
      interesting counterexample is a style sheet which represent a
      function: it specified the result document as a functin of an
      input document, and so itself cannot be said to be a
      stand-alone statement. (If I sent you a message consisting
      only af a stylesheet with no coverletter, what would it
      signify? What would it mean if I digitally signed it?)
    </p>
    <p>
      With that exception, it clearly makes sense to allow any
      language which has the concept of a sentence -- maybe any
      language at all - to allow sentences from other languages to
      be included anywhere where a sentence of its own could go.
      <strong>This should be a generic feature of XML
      schemas</strong>.
    </p>
    <p>
      (It is would be against the minimalist principle for XML
      generically to define other common subclasses. Note that the
      RDF spec does define properties and node types and the
      concept of subclassing in RDF. HTML defines things like block
      and inline elements, which can be subclassed in extensions;
      SVG and SMIL probably define similar concepts. The
      significance of this when looking at downloaded support code
      would be that, for example, in a set of Java classes
      implementing HTML, that any subclass of "Inline element"
      would export the same software API to allow it to be
      justified and line wrapped in a text flow object. So there is
      a natural correspondence between element type subclassing and
      support class subclassing, but the tow must remain distinct.
      Language specifications must always define what a language
      means without refering to implementations if they can
      possibly avoid it)
    </p>
    <p>
      Note that without the assurances given by such information
      you cannot just go around embedding one language in another.
      Every language has to address the issue which the concept of
      RDF transparency potentially solves for RDF. A surrounding
      XML context must have the ability to quote, deny, negate or
      whatever any element. In fact, nothing in XML says that the
      menaing of a fragment is not affected by thing anywhere else
      in a document. Nothing suggests that the process of removing
      sub-trees creates a valid document. (How does xml fragment
      deal with this?)
    </p>
    <h3>
      <a name="Grounded" id="Grounded">Grounded documents</a>
    </h3>
    <p>
      We can say a document is "grounded" if its meaning is
      completely defined because every term used is explicitly,
      directly or indirectly, an explicit direct or indirect
      referece to its definition in a document on the Web. Clearly
      a definition of "grounding" depends on the set of documents
      one considers acceptable definitions. "Grounded in W3C
      Recommendations" would imply that the closure under [i.e. set
      of all the things you can possibly end up with by repeated
      applications of] the operation of looking up definitions
      would be a subset of the set of W3C recommendations.
    </p>
    <p>
      This is the basis for the entire web and internet
      architecture stack today. (See also: <a href=
      "Stack.html">Stack</a>) . All commercial use on the web is
      largely to be considered in this light, that the meaning of
      each messaeg sent across the Internet is well-defined by a
      series of specifications.
    </p>
    <p>
      (A sense of grounding also can be appliyed seperately to
      different sorts of "understanding". When "understanding"
      means presentation to a human for human understanding, a
      presentation-grounded documents points to all information
      such as schemata and style sheets which will enable it to be
      presented.)
    </p>
    <h3>
      Grounding as a myth: the Web of Meaning
    </h3>
    <p>
      The concept of grounded documents is important for
      predicatble systems, but it is a bad model for the web -- or
      for life -- in the long run. Words in a <em>natural</em>
      langauge such as English are not grounded in a unique base
      set<a href="#Grounding">*</a>. Every time you look one up in
      the dictionary all you find are more words. The world is
      web-like, and any attempt by the Web to constrain it to be
      tree-like is bound to force a misrepresentation of realtity.
      This is the Wittgenstein view of meaning. Understanding this
      view sometimes confuses people about the very systematic way
      in which meaning in Internet protocols is defined by layers
      and layers of specs.
    </p>
    <p>
      In fact, the two views both apply, one nested inside the
      other. Yes, meaning is use - but in the Internet protocols,
      society has set up social constraints - laws and other
      expectations - which constrain use to be according to the
      specs. This is a social constraint which your computer is
      under when you use the Internet, just as when you fill out a
      tax form you don't have a choice as to how to interpret the
      meaning of "Adjusted Gross Income on line 39 of a US IRS form
      1040". There is a whole department of the government which
      defines what it is and which socially owns the term. So while
      the
    </p>
    <p>
      What will change with the Semantic Web's development is that
      its grounding in legacy systems will fade into history. Right
      now, the meaning of "Invoice total vale" is effectively
      defined by the software which you plug your RDF document
      into, and how it treats invoices. This is an important way to
      bootstrap the semantic web with useful terms. That will
      become less important as many different software poducts
      share teh same term. In the end, it is weblike form which
      will characterize the semantic web. Everyone will be defining
      things in terms of other things which they feel are useful
      and stable enough. It will be impossible to insist that there
      be a global ordering between more basic and less basic
      specifications -- and to do so would stop the web scaling. No
      one will agree on a directed <em>acyclic</em> graph
      determining what terms are "more basic" than others. For any
      set of definitions in one direction, there can always be some
      reverse definitions which can be seen by others as just as
      valid.
    </p>
    <p>
      So, while the concept of documents grounded in a given base
      set is important for interoperability, it must not be seen as
      a goal to force the semantic web into an acyclic structure.
      There will be no single Dewy decimal system for the semantic
      web. The concepts of well-defined stable specifications will
      still be essential. So will respect for the definitions of
      terms. The difference will be that any one will chose their
      own set of langauges they consider "basic", and find ways of
      defining other languages they come across in terms of those.
      A rich web of conversions, translations will grow up to
      support this. The web of trust will provdie tools for
      navigating within and selecting from this web in a safe way.
      And of course, global standarsdw il wlways make like much
      easier where they can be made.
    </p>
    <h3>
      FAQ: Surely meaning is only defined by use?
    </h3>
    <p>
      <em>This is all very well</em>, runs a popular line,
      <em>except that to talk about "meaning" at all is basically
      bogus</em>. <em>The meaning of words, and therefore
      languages, is defined by use - by how people actuall respond
      to them, by how they are processed. Surely the only way I can
      guarantee that someone will interpret a document in a
      particular way is to have some out-of-band agreement with
      them first?</em>
    </p>
    <p>
      Philosophically, it is indeed the case that you need some
      out-of-band (not in the message itself) agreement. In real
      life, though, in fact there a lot of widely-held agreements.
      In fact, the law is a set of agreements which you are deemed
      to accept whether you formally agree or not. So when you are
      sent a tax form, you can't argue that the language of the tax
      form is not one you interpret in that way. they just stick
      you in jail.
    </p>
    <p>
      The web works like one big agreement. By connecting your
      computer to it and getting email from POP and IMAP ports,
      there is an understanding that what you get are MIME
      messages, and the same thing when you pick up web page using
      HTTP. So by using the web you are entering a world where the
      assumption can be made that messages are to be interpreted by
      a set of specifications. the specifications are (currently)
      generally written in english, and imperfect, but basically
      debate about them is practically about details, not aboutteh
      philosophy as to whether they apply. So that is why one can
      in practice talk about meaning.
    </p>
    <h3>
      FAQ: Doesn't the meaning of a document depend on its context?
    </h3>
    <p>
      Of course it does. If i exclose a phtocopy of a document as
      an attachment, it doesn't mean I am sending you that letter.
    </p>
    <p>
      However, theer are a lot of contexts for a document which
      have the same implication for the meaning of that document.
      Publication, by email to a public list, or HTTP, or FTP, or
      printing on paper and nailing to a tree, in each case leaves
      the meaning of a document defined in the same way. These
      contexts, in which a document is published by a party, or a
      message converyed from one party to another, are so common
      and basic that the meaning of the document in these contexts
      is referred to simply as the meaning of the document (or
      message).
    </p>
    <p>
      The webarchitecture separately enumerates the ways in which
      these contexts actually work under he hood (publication using
      HTTP, etc) and teh way documents are interpreted and dealt
      with once published. That way, XML langauegs don't ahve to
      keep referring to "meaning when received with a 200 code in
      HTTP".
    </p>
    <hr />
    <h2>
      See also
    </h2>
    <ul>
      <li>
        <a href="Metadata.html#Self-descr">Self-describing
        information in "Metadata"</a>
      </li>
      <li>
        <a href="Evolution.html">Evolvability</a>
      </li>
    </ul>
    <h2>
      Footnotes
    </h2>
    <h3>
      <a name="Name-less" id="Name-less">Name-less and Address-less
      systems</a>
    </h3>
    <p>
      (Technically, it is possible to create a network with
      "source-based routing" in which everything whether server or
      document is identified by an md5 checksum or other random
      unique ID, and network nodes learn to send packets with full
      routing instructions. This is a little like the old email
      addresses which specified a routing path like
      timbl@cernvax!mcvax!mitmail!whatever. The process of
      hypertext link involves the client A contacting the server B
      of the source document of the link and finding the path which
      B had stored as a way to get from B to the server C of the
      link's destination document. Then the client A can contact C
      first through the root ABC but then from local information
      and information from B and C can maybe derive a more
      efficient route AC. Such a system has different scaling
      properties as a subset of teh information about the network
      must reside in the network hosts rather than in the routers.
      Its efficeny and scaling properties rely on features of the
      topology of the web such as locality of reference.)
    </p>
    <h3>
      <a name="Language" id="Language">Language identity crisis in
      XML</a>
    </h3>
    <p>
      (There is currently (1999/9) much debate in the XML world
      over exactly what defines a language, the proposed answers
      ranging though: the publisher of the namespace including any
      information in the definitive schema; a separate note of a
      schema; a schema plus a different namsepcae URI document plus
      a version plus an HTML profile; and "nothing". If this debate
      resolves itself such that athe identity of a language is not
      clearly defined. In that case the XML namespace mechanism may
      prove an insufficiently firm foundation for the semantic web,
      or any application of data on the web.)
    </p>
    <h3 id="Grounding">
      Grounding of words in English
    </h3>
    <p>
      (Distracton: Is there a set of english words in the OED
      which, if understood, allow one to understand any definition
      by sufficient recursive dereferencing?)
    </p>
    <h3>
      References:
    </h3>
    <p>
      DNS mess: Weaving the Web p126, etc.
    </p>
    <p>
      <a href=
      "http://www.ietf.org/mail-archive/ietf-announce/msg05299.html">
      Carpenter, Brian, et. al , "IAB Technical Comment on the
      Unique DNS Root", IETF-announce, 1999/9/27.</a>
    </p>
    <h2>
      Fodder
    </h2>
    <p>
      [@@ Dan's quote (Ted N?) about all things being hopelesly?
      intertwigled@@ :-) .. maybe some Bhuddist quotation about
      interconnectedness...]
    </p>
    <p>
      "I'm very glad you asked me that, Mrs Rawlinson. The term
      `holistic' refers to my conviction that what we are concerned
      with here is the fundamental interconnectedness of all
      things. I do not concern myself with such petty things as
      fingerprint powder, telltale pieces of pocket fluff and inane
      footprints. I see the solution to each problem as being
      detectable in the pattern and web of the whole. The
      connections between causes and effects are often much more
      subtle and complex than we with our rough and ready
      understanding of the physical world might naturally suppose,
      Mrs Rawlinson. Let me give you an example. If you go to an
      acupuncturist with toothache he sticks a needle instead into
      your thigh. Do you know why he does that, Mrs Rawlinson? No,
      neither do I, Mrs Rawlinson, but we intend to find out. A
      pleasure talking to you, Mrs Rawlinson. Goodbye." -- Douglas
      Adams, _Dirk Gentley's Holistic Detective Agency
    </p>
    <p>
      <a href="http://www.xent.com/nov99/0596.html">quoted in
      Fork</a>
    </p>
    <p>
      @@ Statistiscs from OED
    </p>
    <p>
      <a href=
      "http://www.eastgate.com/ht99/slides/Welcome.htm">Mark
      Bernstein, "Everything is intertwingled"</a>.Opening Keynote,
      Hypertext '99, Darmstadt, Germany. February 23, 1999.
    </p>
    <hr />
    <p>
      <a href="Overview.html">Up to Design Issues</a>
    </p>
    <p>
      <a href="../People/Berners-Lee">Tim BL</a>
    </p>
  </body>
</html>