<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content=
    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
    <title>
      Web design issues; What a semantic Web can represent
    </title>
    <meta http-equiv="Content-Type" content="text/html" />
    <link href="di.css" rel="stylesheet" type="text/css" />
  </head>
  <body bgcolor="#DDFFDD" text="#000000">
    <address>
      Tim Berners-Lee
      <p>
        <small>Date: September 1998. Last modified: $Date:
        1998/09/17 20:10:41 $</small>
      </p>
      <p>
        Status: . Editing status: Comments please. A parenthetical
        discussion to the <a href="Architecture.html">Web
        Architecture at 50,000 feet</a> and the <a href=
        "Semantic.html">Semantic Web roadmap</a>.
      </p>
    </address>
    <p>
      <a href="Overview.html">Up to Design Issues</a>
    </p>
    <hr />
    <p>
      Parenthetically, so as not to disturb the flow of what a
      semantic web <i>is</i>,...what it is not, and how other data
      models map into directed labelled graphs.
    </p>
    <h1>
      What the Semantic Web can represent
    </h1>
    <p>
      There are many other data models which RDF's Directed
      Labelled Graph (DLG) model compares closely with, and maps
      onto. This page is written with the intention of enumerating
      the similarities and differences between the models, to
      indicate how the mapping might be done and what extra
      information must be added in the process. Where the other
      models are related to previous unmet promises of computer
      science, now passed into folklore as unsolvable problems, they
      suggest a fear that the goal of a Semantic Web is
      inappropriate.
    </p>
    <p>
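      One consistent difference between the Semantic Web and many
      data models for programming languages is the "closed world
      assumption": those models typically assume that anything not
      stated is false, whereas the Semantic Web makes no such
      assumption.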
    </p>
    <h3>
      <a name="Semantic" id="Semantic">A Semantic Web is not
      Artificial Intelligence</a>
    </h3>
    <p>
      The concept of machine-understandable documents does not
      imply some magical artificial intelligence which allows
      machines to comprehend human mumblings. It only indicates a
      machine's ability to solve a well-defined problem by
      performing well-defined operations on existing well-defined
      data. Instead of asking machines to understand people's
      language, it involves asking people to make the extra effort.
    </p>
    <p>
      Even though it is simple to define, RDF at the level with the
      power of a semantic web will be a complete language, capable
      of expressing paradox and tautology, and in which it will be
      possible to phrase questions whose answers would, for a
      machine, require a search of the entire web and an
      unimaginable amount of time to resolve. This should not deter
      us from making the language complete. Each mechanical RDF
      application will use a schema to restrict its use of RDF to a
      deliberately limited language. However, when links are made
      between the RDF webs, the result will be an expression of a
      huge amount of information. It is clear that because the
      Semantic Web must be able to include all kinds of data to
      represent the world, the language itself must be completely
      expressive.
    </p>
    <h3>
      <a name="semantic2" id="semantic2">A semantic Web will not
      require every application to use expressions of arbitrary
      complexity</a>
    </h3>
    <p>
      Even though the language itself allows expressions of
      arbitrary complexity and computability, applications which
      generate RDF will in practice be limited to generating simple
      expressions such as access control lists, privacy
      preferences, and search criteria. This does not mean that
      where a "not" is needed, it should not be drawn from a
      standard vocabulary so that any RDF engine will be able to
      recognise it as a "not".
    </p>
    <p>
      (more)
    </p>
    <h3>
      <a name="semantic1" id="semantic1">A semantic Web will not
      require proof generation to be useful: proof validation will
      be enough.</a>
    </h3>
    <p>
      The first uses, such as access control on web sites, involve
      validation of a previously prepared proof, not a requirement
      to answer an arbitrary question or to find the path to
      construct a valid proof. It is well known that to search for
      and generate a proof for an arbitrary question is typically an
      intractable process for many real world problems, and RDF
      does not require this (unsolvable) problem to be solved to be
      useful.
    </p>
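    <p>
      The difference between validating and generating a proof can
      be sketched in a few lines. This is only an illustration
      under stated assumptions: the facts, the rules and the
      function name <code>validate_proof</code> are hypothetical,
      and a rule is modelled as nothing more than a tuple of
      premises and a conclusion. The checker makes a single pass
      over the prepared steps and never searches.
    </p>

```python
# Hypothetical toy model (not part of RDF): facts are plain strings and
# a rule is a pair (premises, conclusion).
RULES = [
    (("alice is staff",), "alice may read /reports"),
    (("alice may read /reports", "report7 is in /reports"),
     "alice may read report7"),
]
FACTS = {"alice is staff", "report7 is in /reports"}


def validate_proof(proof, facts, rules):
    """Check a previously prepared proof in one pass, without any search.

    proof is a list of (rule_index, conclusion) steps. Each step must use
    a known rule whose premises are given facts or earlier conclusions.
    """
    known = set(facts)
    for rule_index, conclusion in proof:
        if rule_index not in range(len(rules)):
            return False  # refers to a rule that does not exist
        premises, rule_conclusion = rules[rule_index]
        if rule_conclusion != conclusion:
            return False  # the rule does not yield this conclusion
        if not all(premise in known for premise in premises):
            return False  # a premise has not been established yet
        known.add(conclusion)
    return True


good_proof = [(0, "alice may read /reports"), (1, "alice may read report7")]
bad_proof = [(1, "alice may read report7")]  # skips a required step

print(validate_proof(good_proof, FACTS, RULES))
print(validate_proof(bad_proof, FACTS, RULES))
```

    <p>
      The work done is proportional to the length of the proof
      supplied, whereas finding such a proof in the first place may
      require exploring many combinations of rules.
    </p>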
    <h3>
      <a name="semantic" id="semantic">A semantic web is not an
      exact rerun of a previous failed experiment</a>
    </h3>
    <p>
      Other concerns at this point are raised about the
      relationship to knowledge representation systems: has this
      not been tried before with projects such as <a href=
      "Semantic.html#kif">KIF</a> and <a href=
      "Semantic.html#cyc">cyc</a>? The answer is yes, it has, more
      or less, and such systems have been developed a long way.
      They should feed the semantic Web with design experience and
      the Semantic Web may provide a source of data for reasoning
      engines developed in similar projects.
    </p>
    <p>
      Many KR systems had a problem merging or interrelating two
      separate knowledge bases, as the model was that any concept
      had one and only one place in a tree of knowledge. They
      therefore did not scale, or pass the test of independent
      invention [see evolvability]. The RDF world, by contrast, is
      designed with this in mind, and with the retrospective
      documentation of relationships between originally independent
      concepts.
    </p>
    <h3>
      <a name="Knowledge" id="Knowledge">Knowledge Representation
      goes Global</a>
    </h3>
    <p>
      Knowledge representation is a field which currently seems to
      have the reputation of being initially interesting, but which
      did not seem to shake the world to the extent that some of
      its proponents hoped. It made sense but was of limited use on
      a small scale, and never made it to the large scale. This is
      exactly the state which the hypertext field was in before the
      Web. Each field had made certain centralist assumptions -- if
      not in the philosophy, then in the implementations -- which
      prevented it from spreading globally. But each field was
      based on fundamentally sound ideas about the representation
      of knowledge. The Semantic Web is what we will get if we
      apply the same globalization process to Knowledge
      Representation that the Web initially applied to Hypertext.
      We remove the centralized concepts of absolute truth, total
      knowledge, and total provability, and see what we can do with
      limited knowledge.
    </p>
    <h2>
      <a name="ER" id="ER">The Semantic Web and Entity-Relationship
      models</a>
    </h2>
    <p>
      Is the RDF model an entity-relationship model? Yes and no. It
      is great as a basis for ER-modelling, but because RDF is used
      for other things as well, RDF is more general. RDF is a model
      of entities (nodes) and relationships. If you are used to the
      "ER" modelling system for data, then the RDF model is
      basically an opening of the ER model to work on the Web. A
      typical ER model involves entity types, and for each entity
      type there is a set of relationships (slots in the typical
      ER diagram). The RDF model is the same, except that
      relationships are first class objects: they are identified by
      a URI, and so anyone can make one. Furthermore, the set of
      slots of an object is not defined when the class of an object
      is defined. The Web works through anyone being (technically)
      allowed to say anything about anything. This means that a
      relationship between two objects may be stored apart from any
      other information about the two objects. This is different
      from object-oriented systems often used to implement ER
      models, which generally assume that information about an
      object is stored in an object: the definition of the class of
      an object defines the storage implied for its properties.
    </p>
    <p>
      For example, one person may define a vehicle as having a
      number of wheels and a weight and a length, but not foresee a
      color. This will not stop another person making the assertion
      that a given car is red, using the color vocabulary from
      elsewhere.
    </p>
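    <p>
      A minimal sketch of the vehicle example, assuming statements
      are modelled as plain (subject, property, value) tuples. All
      the URIs and the numeric values are hypothetical and chosen
      only for illustration; the point is that the colour statement
      comes from a second vocabulary and is stored separately, yet
      merges without any change to the vehicle definition.
    </p>

```python
# All URIs and values here are hypothetical, used only for illustration.
VEHICLE = "http://example.org/vehicle#"
COLOR = "http://example.com/color#"
CAR = "http://example.net/cars/42"

# Statements published by the person who defined the vehicle vocabulary.
vehicle_statements = {
    (CAR, VEHICLE + "wheels", 4),
    (CAR, VEHICLE + "weight", 1200),
    (CAR, VEHICLE + "length", 4.2),
}

# A statement made later by someone else, stored apart from the others.
color_statements = {
    (CAR, COLOR + "color", "red"),
}

# Merging two sets of statements is a set union; no class is redefined.
graph = vehicle_statements | color_statements


def properties_of(node, statements):
    """Collect the properties asserted about node into a dict."""
    return {prop: value
            for subject, prop, value in statements
            if subject == node}


print(properties_of(CAR, graph)[COLOR + "color"])
```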
    <p>
      Apart from this simple but significant change, many concepts
      involved in ER modelling carry across directly onto the
      Semantic Web model.
    </p>
    <h2>
      <a name="Semantic1" id="Semantic1">The Semantic Web and
      Relational Databases</a>
    </h2>
    <p>
      The semantic web data model is very directly connected with
      the model of relational databases. A relational database
      consists of tables, which consist of rows, or records. Each
      record consists of a set of fields. The record is nothing but
      the content of its fields, just as an RDF node is nothing but
      the connections: the property values. The mapping is very
      direct:
    </p>
    <ul>
      <li>a record is an RDF node;
      </li>
      <li>the field (column) name is an RDF propertyType; and
      </li>
      <li>the record field (table cell) is a value.
      </li>
    </ul>
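    <p>
      The mapping in the list above can be sketched as follows.
      This is an illustration only: the function name
      <code>rows_to_triples</code>, the base URI and the sample
      table are assumptions, and a real mapping would also need a
      policy for choosing node and property URIs.
    </p>

```python
def rows_to_triples(table_name, key_column, rows,
                    base="http://example.org/db/"):
    """Turn table rows into (node, propertyType, value) triples.

    The URI scheme is a hypothetical choice: one node per record, named
    after its key, and one property per column name.
    """
    triples = []
    for row in rows:
        node = f"{base}{table_name}/{row[key_column]}"  # record is a node
        for column, cell in row.items():
            prop = f"{base}{table_name}#{column}"  # column is a property
            triples.append((node, prop, cell))  # cell is the value
    return triples


# Hypothetical sample table, each row given as a dict of column to cell.
employees = [
    {"id": 1, "name": "Ann", "shoe_size": 7},
    {"id": 2, "name": "Bob", "shoe_size": 9},
]

triples = rows_to_triples("employee", "id", employees)
for triple in triples:
    print(triple)
```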
    <p>
      Indeed, one of the main driving forces for the Semantic Web
      has always been the expression, on the Web, of the vast
      amount of relational database information in a way that can
      be processed by machines.
    </p>
    <p>
      RDF's serialization format -- its syntax in XML -- is a very
      suitable format for expressing relational database
      information.
    </p>
    <p>
      Relational database systems manage RDF data, but in a
      specialized way. In a table, there are many records with the
      same set of properties. An individual cell (which corresponds
      to an RDF property) is not often thought of on its own. SQL
      queries can join tables and extract data from tables, and the
      result is generally a table. So, in practice, RDB software is
      typically optimized for doing operations with a small number
      of tables, some of which may have a large number of elements.
    </p>
    <p>
      RDB systems have datatypes at the atomic (unstructured)
      level, as RDF and XML will/do. Combination rules tend in RDBs
      to be loosely enforced, in that a query can join tables by
      any columns which match by datatype -- without any check on
      the semantics. You could for example create a list of houses
      that have the same number of rooms as an employee's shoe
      size, for every employee, even though the sense of that would
      be questionable.
    </p>
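    <p>
      The shoe-size example can be tried with an in-memory SQLite
      database through Python's standard <code>sqlite3</code>
      module. The table names, column names and rows are
      hypothetical. The database accepts the join because both
      columns hold integers, and nothing checks whether the
      comparison makes sense.
    </p>

```python
import sqlite3

# Hypothetical tables and data, held in memory only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, shoe_size INTEGER);
    CREATE TABLE house (address TEXT, rooms INTEGER);
    INSERT INTO employee VALUES ('Ann', 7), ('Bob', 9);
    INSERT INTO house VALUES ('1 Elm St', 7), ('2 Oak Ave', 5),
                             ('3 Pine Rd', 9);
""")

# The join matches on datatype alone: rooms and shoe sizes are both
# integers, so the query is accepted although its meaning is doubtful.
rows = conn.execute("""
    SELECT employee.name, house.address
    FROM employee JOIN house ON house.rooms = employee.shoe_size
    ORDER BY employee.name
""").fetchall()

print(rows)
conn.close()
```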
    <p>
      The Semantic Web is not designed just as a new data model -
      it is specifically appropriate to the linking of data of many
      different models. One of the great things it will allow is to
      add information relating different databases on the Web, to
      allow sophisticated operations to be performed across them.
    </p>
    <h2>
      <a name="Inference" id="Inference">RDF is not an Inference
      system</a>
    </h2>
    <p>
      I am not proposing any FOPC (first-order predicate calculus)
      or HOL (higher-order logic) inference engine. I just note
      that HOL allows integration of multiple systems which use
      different inference engines spanning the range from SQL to
      AI. For example, a simple HOL would allow any SHOE rules,
      data and results to be expressed, and a proof found by a SHOE
      engine to be verified by anyone.
    </p>
    <h3>
      <a name="Surely" id="Surely">Surely all first-order or
      higher-order predicate calculus based systems (such as KIF)
      have failed historically to have wide impact?</a>
    </h3>
    <p>
      The same was true of hypertext systems between 1970 and 1990,
      ie before the Web. Indeed, the same objection was raised to
      the Web, and the same reasons apply for pressing on with the
      dream.
    </p>
    <p>
      The problem with all such systems was that they were
      conceptually or physically centralized. They required global
      link consistency.
    </p>
    <p>
      Guess what? KIF is very centralized in its approach to
      organizing knowledge (the cyc ontology for example suggests
      that everyone agree on the same terms for common English
      words, which RDF does not) and it does not promote its
      concepts to being first class web objects, i.e. it doesn't
      use URIs to identify them. To webize KIF or KR in general is,
      in many ways, the same as to webize hypertext. Replace
      identifiers with URIs. Remove any requirement for global
      consistency. Put in a significant effort into getting
      critical mass. Sit back.
    </p>
    <h3>
      Surely, many things expressible in FOPC are not efficiently
      computable?
    </h3>
    <p>
      Dead right. The goal of the semantic web is to express real
      life. Many things in real life, real questions which we will
      face, are not efficiently computable. There are two solutions
      to this. The classical (pre-web) solution is to constrain the
      language of expression so that all queries terminate in
      finite time. The weblike solution is to allow the expression
      of facts and rules in an overall language which is
      sufficiently flexible and powerful to express real life, and
      then to create subsets of the web in which specific
      constraints give you specific computational properties.
    </p>
    <p>
      An analogy is with the human-information systems which
      existed before the web. Most forced one to keep one's data in
      a hierarchy (sometimes of fixed depth) or a matrix (often
      with a specific number of dimensions). This gave consistency
      properties within the information system. I bet DARPA had
      many of these systems and still does. The only way they could
      be integrated was to express them in terms of a much more
      powerful language - global hypertext. Hypertext did not have
      any of these reassuring properties. People were frightened
      about getting lost in it. You could follow links forever. As
      it turns out, it is of course true that you can follow links
      forever in the Web. And on the Semantic Web an inference
      engine will not necessarily terminate. However, on the Web
      there are many subsystems, such as many websites, where life
      is very ordered and predictable, searches give definitive
      results and there are no dangling links. But there is a HUGE
      advantage from exposing all this information in a way that
      allows it to be unified with all the other systems, ordered
      and unordered.
    </p>
    <h3>
      We should not expect a base inference level to include
      non-decidable computations
    </h3>
    <p>
      I have no expectation of any inference capability in the SW
      core design. The semantic web does not have HOL inference as
      a standard. I would expect any SW compliant device to be able
      to <em>validate</em> a HOL proof, but not <em>generate</em>
      one.
    </p>
    <p>
      If you take a non-HOL-complete language and extend it to HOL,
      unless you have first defined where you are going (by
      defining the HOL language and expressing SHOE in it first)
      you will very likely end up with a rather baroque HOL
      language.
    </p>
    <h3>
      The FOPC inference model is extremely intolerant of
      inconsistency [i.e. P(x) &amp; NOT (P(x)) -&gt; Q], the
      semantic web has to tolerate many kinds of inconsistency.
    </h3>
    <p>
      Toleration of inconsistency can only be done by fuzzy
      systems. We need a semantic web which will provide
      guarantees, and about which one can reason with logic. (A
      fuzzy system might be good for finding a proof -- but then it
      should be able to go back and justify each deduction
      logically to produce a proof in the unifying HOL language
      which anyone can check.) Any real SW system will work not by
      believing anything it reads on the web but by checking the
      source of any information. (I wish people would learn to do
      this on the Web as it is!) So in fact, a rule will allow a
      system to infer things only from statements of a particular
      form signed by particular keys. Within such a system, an
      inconsistency is a serious problem, not something to be
      worked around. If my bank says my bank balance is $100 and my
      computer says it is $200, then we need to figure out the
      problem. Same with launching missiles, IMHO. The semantic web
      model is that a URI dereferences to a document which parses
      to a directed labeled graph of statements. The statements can
      have URIs as parameters, so they can make statements about
      documents and about other statements. So you can express
      trust and reason about it, and limit your information to
      trusted consistent data.
    </p>
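    <p>
      A rough sketch of limiting inference to trusted sources,
      using the bank balance example. Everything here is an
      assumption made for illustration: the signer is modelled as a
      plain label rather than a verified cryptographic signature,
      and the names <code>Statement</code>,
      <code>TRUSTED_SIGNERS</code> and <code>trusted_values</code>
      are hypothetical. Statements from unknown sources are
      ignored, and any disagreement that remains among trusted
      sources is reported rather than tolerated.
    </p>

```python
from collections import namedtuple

# Hypothetical model: each statement records who asserted it. A real
# system would verify a signature instead of trusting this label.
Statement = namedtuple("Statement", "subject prop value signer")

statements = [
    Statement("account:me", "balance", 100, "key:bank"),
    Statement("account:me", "balance", 200, "key:my-computer"),
    Statement("account:me", "balance", 1000000, "key:stranger"),
]

# Only statements signed by these keys may be used for inference.
TRUSTED_SIGNERS = {"key:bank", "key:my-computer"}


def trusted_values(subject, prop, statements, trusted_signers):
    """Return the set of values asserted by trusted signers only."""
    return {s.value for s in statements
            if s.subject == subject
            and s.prop == prop
            and s.signer in trusted_signers}


values = trusted_values("account:me", "balance", statements,
                        TRUSTED_SIGNERS)

# More than one trusted value is a real inconsistency to be resolved.
consistent = len(values) == 1
print(sorted(values), consistent)
```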
    <h3>
      Again, extension to higher order logic makes sense to me,
      requirement of FOPC inference model seems dangerous.
    </h3>
    <p>
      Most KR systems confuse information with inference tips. When
      a system stores a rule <em>a daughter of one's daughter is
      one's granddaughter</em> it is typically not just stored as
      that statement, but in a table of rules to be used by the
      algorithm at a particular time (for example whenever a parent
      of a daughter is found). The classification between data and
      various types of rule is a sort of meta level information
      which is generally not itself expressed in the language. Two
      systems must be able to interchange the logical meaning of
      the rule, even when the type of rule may be unknown to each
      other's inference engines. (Of course, the rule expressed in
      general logic may be recognizable as a rule by another system
      and absorbed as such.) The example above is logically
    </p>
    <p>
      &forall;a,b,c (d(a,b) &amp; d(b,c) =&gt;
      gd(a,c))
    </p>
    <p>
      while for example a SHOE-based system and an Algernon-based
      system may have quite different systems for applying rules at
      different times.
    </p>
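    <p>
      To illustrate that the logical meaning of the rule is
      independent of when an engine chooses to apply it, here is a
      small sketch with two hypothetical strategies. Neither is
      meant to represent how SHOE or Algernon actually work; the
      function names and the sample facts are assumptions. A pair
      (a, b) stands for d(a,b), read as "a is a daughter of b".
    </p>

```python
def all_granddaughters(daughter_facts):
    """Eager use of the rule: derive every gd(a, c) in advance."""
    return {(a, c)
            for (a, b) in daughter_facts
            for (b2, c) in daughter_facts
            if b == b2}


def is_granddaughter(a, c, daughter_facts):
    """On-demand use of the same rule: look for a linking b when asked."""
    return any((a, b) in daughter_facts and (b, c) in daughter_facts
               for (_, b) in daughter_facts)


# Hypothetical facts: carol is a daughter of beth, beth of ann.
d = {("carol", "beth"), ("beth", "ann")}

gd = all_granddaughters(d)
print(gd)
print(is_granddaughter("carol", "ann", d))
```

    <p>
      Both functions encode the same statement; they differ only in
      the inference tip, that is, in when the work is done.
    </p>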
    <h2>
      <a name="CG" id="CG">Conceptual Graphs and the Semantic
      Web</a>
    </h2>I have written <a href="CG.html">a separate set of
    notes</a> about the relationship between Conceptual Graphs and
    the Semantic Web.
    <hr />
    <p>
      A few unsorted references - see also other pages in this set.
    </p>
    <ul>
      <li>
        <a href=
        "http://www.cs.umd.edu/projects/plus/SHOE/index.html">SHOE:
        simple hypertext ontology extensions</a>
      </li>
    </ul>
    <p>
      Shoe
    </p>
    <p>
      References on KR on the Web from Tim Finin:
    </p>
    <p>
      Here are some relevant papers from the <a href=
      "http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-23/">
      IJCAI-99 Workshop on Intelligent Information Integration</a>.
      The first is a nice overview...
    </p>
    <ul>
      <li>
        <a href=
        "http://www.cs.vu.nl/~frankh/postscript/IJCAI99-III.html">Practical
        Knowledge Representation for the Web</a>, Frank van
        Harmelen and Dieter Fensel,
      </li>
      <li>
        <a href=
        "http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-23/crainfield-ijcai99-iii.pdf">
        UML as an Ontology Modelling Language</a>, Stephen
        Cranefield, Martin Purvis,
      </li>
      <li>
        <a href=
        "http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-23/fensel-ijcai99-iii.ps">
        On2broker: Semantic-Based Access to Information Sources at
        the WWW</a>, Dieter Fensel, Jurgen Angele, Stefan Decker,
        Michael Erdmann, Hans-Peter Schnurr, Steffen Staab, Rudi
        Studer, Andreas Witt,
      </li>
    </ul>
    <p>
      and here are some others of possible interest...
    </p>
    <p>
      Embedding Knowledge in Web Documents, Philippe Martin and
      Peter Eklund, Eighth International World Wide Web Conference,
      Toronto, May 11-14, 1999.
    </p>
    <p>
      Ontobroker: Or How to Enable Intelligent Access to the WWW,
      Dieter Fensel, Stefan Decker, Michael Erdmann, and Rudi
      Studer, Eleventh Workshop on Knowledge Acquisition, Modeling
      and Management, Voyager Inn, Banff, Alberta, Canada, Saturday
      18th to Thursday 23rd April, 1998
    </p>
    <p>
      and if we want a good overview of cyc as a backgrounder
    </p>
    <p>
      CYC: A Large-Scale Investment in Knowledge Infrastructure,
      Douglas B. Lenat, CACM, 1995. I have a local copy at
      http://www.cs.umbc.edu/471/papers/cyc95.pdf
    </p>
  </body>
</html>