<?xml version="1.0" encoding="us-ascii"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Webizing existing systems - Design Issues </title>
  <link rel="Stylesheet" href="di.css" type="text/css" />
  <meta http-equiv="content-type" content="text/html; charset=us-ascii" />
</head>

<body bgcolor="#DDFFDD" text="#000000" xml:lang="en" lang="en">
<address>
  Tim Berners-Lee<br />
  Date: 1998, last change: $Date: 2010/03/09 14:07:04 $<br />
  Status: personal view only. Editing status: first draft. 
</address>

<p><a href="./">Up to Design Issues</a> </p>
<hr />

<h1><a name="Webizing" id="Webizing">Webizing existing systems</a> </h1>

<p><em>This discusses the introduction of URIs as names in a system to scale
it to the web.</em> </p>

<p>The web is extended in two ways - by adding new bits of technology to the
existing stuff, and by "webizing" existing applications and systems. Webizing
is really important, not only as a way of bootstrapping the web using a large
amount of legacy information, but because the existing systems have been
researched and designed over the years and it is really important we do not
lose the knowledge accrued during that process. </p>

<p>The essential process in webizing is to take a system which is designed as
a closed world, and then ask what happens when it is considered as part of an
open world. Practically, the effect on a computer language is to replace the
names/tokens/identifiers with URIs. Thus, where before reference could only be
made to something in the same document/program/module one can with equal ease
make reference to something in a different one somewhere in that abstract
space which is the Web. </p>

<p>In a clean case, this will be done so that the URI for an object is rather
naturally related to its representation in the original language. For
example, the element with ID "foo" in bar.xml is bar.xml#foo. However, to do
the same for an attribute defined in a DTD or schema is more difficult,
because of the complex nature of the spaces and subspaces for element and
attribute names in XML. It is great when the webized language is very similar
to the original language, and ideal when it actually compiles. Dan Connolly's
2000/8 <a href="#Connolly,">webization of KIF</a> uses URIs for identifiers.
However, because URIs are case sensitive and KIF tokens are not, an accurate
translation had to escape lower case letters with backslashes, which made the
result less readable. Changing the underlying language in small ways can make
the translation much less cumbersome! </p>
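
<p>The clean case, where the element with ID "foo" in bar.xml becomes
bar.xml#foo, can be sketched as follows. This is only an illustration: the
base URI <code>http://example.org/docs/bar.xml</code> and the helper
<code>webize_id</code> are assumptions made for the example, not part of any
existing system.</p>

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical base URI for the document bar.xml; host and path are assumptions.
BASE = "http://example.org/docs/bar.xml"


def webize_id(base_uri: str, local_id: str) -> str:
    """Turn a document-local ID into a global URI by using it as a fragment."""
    return urljoin(base_uri, "#" + local_id)


foo_uri = webize_id(BASE, "foo")
print(foo_uri)

# Going the other way recovers the document URI and the local name.
document_uri, fragment = urldefrag(foo_uri)
print(document_uri, fragment)
```

<p>The local name survives unchanged as the fragment, which is what makes
the mapping "rather naturally related" to the original representation.</p>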

<p>Here is a slightly flippant view on the webize() function, each row of
which probably needs an essay of explanation, but provided here without
any.</p>

<table border="4">
  <caption></caption>
  <tbody>
    <tr>
      <td>x</td>
      <td>webize(x)</td>
    </tr>
    <tr>
      <td>Hypertext</td>
      <td>WWW</td>
    </tr>
    <tr>
      <td>Data</td>
      <td><a href="LinkedData.html">Linked data</a></td>
    </tr>
    <tr>
      <td>Top-down structured design</td>
      <td>Bottom-up ontology design</td>
    </tr>
    <tr>
      <td>Data Hiding</td>
      <td>Data Re-use</td>
    </tr>
    <tr>
      <td>Goto Considered Harmful</td>
      <td>Goto drives the economy</td>
    </tr>
    <tr>
      <td>unix file system</td>
      <td><a href="CloiudStorage.html">ACL'd r/w linked data</a></td>
    </tr>
    <tr>
      <td>Large-scale structure: Hierarchy</td>
      <td><a href="Fractal.html">Large-scale structure: Scale free</a></td>
    </tr>
    <tr>
      <td>"Tired"</td>
      <td>"Wired"</td>
    </tr>
  </tbody>
</table>

<h3><a name="Example" id="Example">Example - webizing a database</a> </h3>

<p>Imagine that a database is to be made available on the web in RDF. Suppose
the database itself will have a URI of http://weather.org/current. An SQL
database is essentially a closed world, in that the various things in it were
not designed to be linked to from outside. An SQL statement </p>
<pre>SELECT temp, zip  FROM weather WHERE temp  &gt; 30</pre>

<p>makes reference to terms which have meaning within the database. There is
no reference in that statement to the database - that is simply part of the
context. </p>

<p>Now suppose we determine what the URI will be for the pieces of the
database, perhaps current/weather for a table, and current/weather.temp for a
column in a table. We could then extend the syntax (excuse my SQL - I am
making this up) </p>
<pre><span style="color: #FF0000">USING c FOR http://weather.org/current</span><br style="color: #FF0000" />

<span style="color: #FF0000">USING u FOR http://places.org/usa</span><br />

SELECT <span style="color: #FF0000">c:</span>readings.temp, <span style="color: #FF0000">u:</span>location.lat, <span style="color: #FF0000">u:</span>location.long
  FROM JOIN <span style="color: #FF0000">c:</span>readings, <span style="color: #FF0000">u:</span>location
  WHERE <span style="color: #FF0000">c:</span>readings.zip = <span style="color: #FF0000">u:</span>location.zip
  AND <span style="color: #FF0000">c:</span>readings.temp &gt; 30;</pre>

<p>This is an (incorrect I expect @@@) SQL statement which links out of the
local database to combine it with information from a remote one. This syntax
I am sure won't work in practice, but should illustrate the principle.
Namespace prefixes c and u are introduced for two reasons: for brevity, as
repeating the full URIs in the code would have been too cumbersome; and for
syntactic reasons, as URIs tend to contain characters which would be
ambiguous with other syntax or are not allowed in SQL column names. </p>
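
<p>How the prefixes from the USING lines above might be expanded back into
full URIs can be sketched like this. The function <code>expand</code> is
hypothetical, and the rule that the local part is appended after a "/" is an
assumption chosen so that <code>c:readings.temp</code> yields the column URI
used later in this example; the made-up syntax itself does not define it.</p>

```python
# Prefix bindings taken from the USING lines in the example above.
PREFIXES = {
    "c": "http://weather.org/current",
    "u": "http://places.org/usa",
}


def expand(qname: str, prefixes: dict) -> str:
    """Expand a prefixed name such as 'c:readings.temp' into a full URI.

    Assumption: the local part is appended to the namespace URI after a
    '/' separator. The original text does not specify the joining rule.
    """
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("unknown or missing prefix in " + repr(qname))
    return prefixes[prefix].rstrip("/") + "/" + local


print(expand("c:readings.temp", PREFIXES))
print(expand("u:location.zip", PREFIXES))
```

<p>Keeping the prefix table separate from the query text is what lets the
query stay short while every name in it remains globally unambiguous.</p>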

<p>Of course, whether SQL run across a set of scattered databases is actually
valuable may be questionable - it may not optimize as well as some other
query languages. However, suddenly the things defined by the database are
available to the outside world. For example, the concept of temperature
reading as used by weather.org in its database of current conditions </p>

<p><code>http://weather.org/current/readings.temp</code> </p>

<p>is now a concept, an RDF property in fact, which is available for all the
world to refer to. These references need not all be in SQL. Because the
schema for the database will declare it to be an RDF property or something
equivalent, many different systems can use the information and refer to the
concept. </p>
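
<p>As a rough sketch of what "available to the outside world" could mean,
the rows of a table can be restated as (subject, property, value) triples
whose property is the column URI. The in-memory table, the sample rows and
the row URI pattern <code>ROW_PATTERN</code> are all assumptions made up for
this illustration; only the column URI comes from the text.</p>

```python
import sqlite3

# Column URI from the text; the row URI pattern below is purely hypothetical.
TEMP = "http://weather.org/current/readings.temp"
ROW_PATTERN = "http://weather.org/current/readings#zip-{zip}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (zip TEXT PRIMARY KEY, temp REAL)")
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("02139", 31.5), ("94105", 18.0)],
)


def triples(connection):
    """Yield one (subject, property, value) triple per row of readings."""
    query = "SELECT zip, temp FROM readings ORDER BY zip"
    for zip_code, temp in connection.execute(query):
        yield (ROW_PATTERN.format(zip=zip_code), TEMP, temp)


for triple in triples(conn):
    print(triple)
```

<p>Nothing in the triples depends on SQL any more, so a system that has never
seen the database can still refer to the same property.</p>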

<h4 id="Notes">Notes specifically on this example </h4>

<p>I note, before we leave this example, that there are two concepts
important to a table. One is the type of thing described by a row. A row in
the readings table, for example, defines a weather reading, something which
has a location, a temperature and a humidity. The other concept is the set of
objects which are actually in the table. In the classic SQL example of the
employees table, there is an RDF class employee, a subclass of person, and
also the fact that someone works for the company iff they are in the table.
</p>

<p>A second note on exporting databases. When you really put something on the
web, there is often, for flexibility and security, a layer between what you
expose and the internal storage. Web pages are not files, though they are
often closely related to files and have the same form - a string of bytes and
a MIME type. Exposed remote operations are not local procedures, though they
are closely related to them and have the same form - a service URI and a
method name and parameters. Similarly, one would probably export a derived
view of a database in many cases - one which would have the form of a
database. This allows different engineering decisions to be made on the
external manifestation (persistent and what the customer wants) and the
internal form (efficient and convenient for you). </p>
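
<p>One way to picture such a layer is an SQL view over the internal table.
The sketch below uses an in-memory SQLite database; the table
<code>raw_readings</code>, its columns and the sample row are hypothetical
names invented for the example.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Internal storage: laid out for the operator's convenience.
# All table and column names here are hypothetical.
conn.execute(
    "CREATE TABLE raw_readings "
    "(sensor_id INTEGER, zip TEXT, temp_c REAL, operator_note TEXT)"
)
conn.execute("INSERT INTO raw_readings VALUES (7, '02139', 31.5, 'recalibrated')")

# Exported form: a derived view exposing only what outsiders should rely on.
conn.execute(
    "CREATE VIEW readings AS SELECT zip, temp_c AS temp FROM raw_readings"
)

cursor = conn.execute("SELECT * FROM readings")
exported_columns = [column[0] for column in cursor.description]
exported_rows = cursor.fetchall()
print(exported_columns, exported_rows)
```

<p>The internal table can later be reorganised freely, as long as the view
keeps presenting the same columns under the same names.</p>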

<h2 id="webize">Webizing nested languages </h2>

<p>Sometimes this is easy and sometimes it is hard. It is hard, for example,
when the language uses nested scoping to great effect. In this case there is
a very large amount of context which is completely different between the
beginning and end of a link. The <em>go to</em> instruction is
considered harmful [<a href="#GTCH">ref</a>] by Dijkstra, who wrote that it
"<em>as it stands is just too primitive; it is too much an invitation to make
a mess of one's program</em>." This of course is true of the hypertext link
too, in a way. Both allow an open webbed world which typically, if used with
no restraint, removes rules which give sanity and analysability to a language
and allow optimization of the code compiled. So, just as some languages prevent
one from jumping into or out of an inner loop of a program, so it may make no
sense to allow a link to be made into something within a nested structure,
because the referenced thing just does not have any meaning when taken out of
context. </p>

<p>When dealing with languages which have nested context, it may be necessary
either to define how something inside is represented independently of context,
or to make it impossible. </p>

<p>Be careful, though, before jumping to this conclusion. In many cases, it
is important to webize nested objects completely. For example, in a 3d scene
language, an object may be within a scene within an object within a scene and
still have identity which is important to be able to refer to. In a hypertext
document, there is a nested context which for example affects the style, and
the reference is made to the destination anchor not as an isolated piece of
hypertext, but in the context of the whole document. </p>

<p>The principle that on the Web, anything must be able to say anything about
anything means that these innermost nested objects must have URIs. </p>
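
<p>A minimal sketch of giving every nested object a URI, assuming a made-up
scene description and assuming that the fragment is simply the path of names
from the root joined with "/". Neither the data nor the naming rule comes
from any real scene language.</p>

```python
# Hypothetical scene description: names and nesting are made up.
SCENE = {
    "name": "house",
    "children": [
        {"name": "kitchen", "children": [{"name": "table", "children": []}]},
        {"name": "garden", "children": []},
    ],
}


def assign_uris(node, base_uri, path=()):
    """Return a dict mapping a URI to each node, however deeply nested.

    Assumption: the fragment is the '/'-joined path of names from the root.
    """
    here = path + (node["name"],)
    uris = {base_uri + "#" + "/".join(here): node}
    for child in node["children"]:
        uris.update(assign_uris(child, base_uri, here))
    return uris


uris = assign_uris(SCENE, "http://example.org/scenes/house.scene")
for uri in uris:
    print(uri)
```

<p>Because the path is part of the URI, the innermost object keeps its
context while still being something that can be pointed at from outside.</p>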

<p>It may also be the case that an attempt to webize a language reveals bad
points in the design which really need to be ironed out anyway for the cause
of good software engineering. If a name in some module has in fact quite
different meanings when used in different contexts, then it isn't suitable
for webizing as it is, and maybe two separate derived URIs should be made in
the mapping. Maybe the language should actually be cleaned up so that the
concepts are distinct. </p>

<p>A very simple case is in a documentation control system, when humans use
the same document name ("the pipe size draft") to refer to a particular
document and also to the set of documents from </p>

<p>An exercise for the reader is to contemplate a familiar language and
determine whether it is webized, and if not, what it would take, and what
would be the cleanest way of doing it. Try looking at XML schemas (what is
the URI of an element type?). </p>

<p>When stuck, have recourse to common sense. Ask what the construct actually
represents in a global context, if anything. This might mean clarifying the
language itself. </p>

<h2><a name="Conclusion" id="Conclusion">Conclusion</a> </h2>

<p>Webizing a language involves turning from a system which assumes a closed
world to one which will operate as part of the open web. Some cases are
easier than others. Webizing one application gives one a good idea of what
sorts of design decisions force a closed world assumption and make webizing
difficult, and what by contrast makes a weblike application which immediately
benefits from the rest of everything out there. </p>
<hr />

<h2 id="References">References </h2>

<p><a name="GTCH" id="GTCH">GTCH</a>: Edsger W. Dijkstra, "<a
href="http://www.acm.org/classics/oct95/">Go To Statement Considered
Harmful</a>", <em>Communications of the ACM</em>, Vol. 11, No. 3, March 1968,
pp. 147-148. </p>

<p><a name="Connolly,">Connolly, Dan,</a> "<a
href="/2000/07/hs78/KIF.html">Knowledge Interchange Format (KIF) as an RDF
Schema</a>", 2000/8 </p>

<p><a href="Overview.html">Up to Design Issues</a> </p>

<p><a href="../People/Berners-Lee">Tim BL</a> </p>

<p>2000/8/31 </p>
</body>
</html>