<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<HEAD>
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html;charset=iso-8859-1">
  <!-- style borrowed from NOTE in parts -->
  <STYLE type=text/css>
        .example {
        BACKGROUND-COLOR: #f9f5de; BORDER-BOTTOM: 1px solid; BORDER-LEFT: 1px solid; BORDER-RIGHT: 1px solid; BORDER-TOP: 1px solid; COLOR: #5d0091; MARGIN-LEFT: 10%; WIDTH: 65%
        }
	BODY {
 	margin: 2em 1em 2em 70px;
  	font-family: sans-serif;
  	color: black;
  	background: white;
  	background-position: top left;
  	background-attachment: fixed;
  	background-repeat: no-repeat;
	}
	</STYLE>
  <TITLE>Nodes and Arcs 1989-1999: WWW history and RDF</TITLE>
</HEAD>
<BODY>
<DIV class="head">

<!-- lose that official looking icon for now -->

<!--IMG src="http://www.w3.org/Icons/WWW/w3c_home" ALT ="W3C" class="W3CIcon" -->


  <H1>
Nodes and Arcs 1989-1999
  </H1>
</DIV>

<H2>The WWW Proposal and RDF: Then and Now</H2>

<P>
    Initial version: 1999-11-12, Dan Brickley
<A HREF="mailto:danbri@w3.org"><TT>danbri@w3.org</TT></A><BR>

<P>
    <STRONG>Status:</STRONG> <BR>
This is a work in progress and a personal view of the
technical relationship between RDF and older ideas from Web architecture.
It is an early release as an informal discussion document for
feedback from the <A HREF="/RDF/Interest/"> RDF Interest Group</A>. This is <EM>not</EM> a formal
publication of any working group, or of the W3C itself. Some typos
remain...

</P>

<P>This document is provided as a background discussion motivating
 the <A HREF="/1999/11/11-WWWProposal/">WWW Proposal in RDF</A>
document. It was originally a sub-section of that work but grew too long
and was reworked as a standalone commentary. As such, there is some
duplication with that document which should be removed in any future
versions.
</P>

<H3>Information Management: Then and Now</H3>

  <P>
  The <A HREF="http://www.w3.org/History/1989/proposal.html">original
proposal of the WWW</A> from 1989 included a figure showing how
information about a Web of relationships amongst named objects could unify
a number of information management tasks.
</P>


<P>
<IMG SRC="/History/1989/Image1.gif" ALT="nodes and arcs figure from the
WWW proposal" >
</P>


<H3>RDF, WWW and Knowledge Management</H3>

<P>
Having <A HREF="/1999/11/11-WWWProposal/">
re-represented this data using RDF</A>, what can we do with
it that we couldn't before? The <A HREF="/RDF/">RDF</A> pages list a
number of query and logic oriented applications that suggest approaches to
WWW knowledge management unavailable in 1989. For example, we can show a
simple <A HREF="rdfqdemo.html">Javascript-based RDF Query demonstrator</A>
that queries this RDF database. (Note that this is work in progress
and currently functions in only a subset of Javascript/ECMAScript
browsers.)

</P>

<P>
The remainder of this document revisits some of the initial aims of the
WWW, and connects these to the architecture adopted for the Resource
Description Framework.
</P>


<H3>A digression: RDF in context</H3>

<P>
<STRONG>Note</STRONG>:
The following discussion is only one interpretation of the relationship
between the RDF data modeling system and the original system of knowledge
management outlined in the WWW proposal.
Readers are encouraged to consult the <A
HREF="http://www.w3.org/History/1989/proposal.html">original WWW
proposal</A> before continuing, and to reach their own conclusions about
this perspective on RDF.
</P>

<P>
A few relevant excerpts from the WWW proposal are reproduced here for convenience.
</P>

<BLOCKQUOTE>
CERN is a wonderful organisation. It involves several thousand people,
many of them very creative, all working toward common goals.
Although they are nominally organised into a hierarchical management
structure, this does not constrain the way people will communicate,
and share information, equipment and software across groups.
</BLOCKQUOTE>

<BLOCKQUOTE>

The actual observed working structure of the organisation is a multiply
connected "web" whose interconnections evolve with time. In this
environment, a new person arriving, or someone taking on a new task, is
normally given a few hints as to who would be useful people to
talk to. Information about what facilities exist and how to find out about
them travels in the corridor gossip and occasional newsletters, and
the details about what is required to be done spread in a similar way. All
things considered, the result is remarkably successful, despite
occasional misunderstandings and duplicated effort.
</BLOCKQUOTE>


<BLOCKQUOTE>
A problem, however, is the high turnover of people. When two years is a
typical length of stay, information is constantly being lost. The
introduction of the new people demands a fair amount of their time and
that of others before they have any idea of what goes on. The
technical details of past projects are sometimes lost forever, or only
recovered after a detective investigation in an emergency. Often, the
information has been recorded, it just cannot be found.
</BLOCKQUOTE>


<P>
This scenario is a familiar one. The challenges faced by CERN in 1989 are
common to many companies and organizations in 1999. We now have widespread
access to Internet information sources, typically accessed via the World
Wide Web. However, the WWW has not yet provided a solution to the
challenges it was initially proposed to address.</P>

<P>
 Word-of-mouth information
is supplemented by online information sources, but access to these is
still through relatively crude search systems. Indeed, a common complaint
about the WWW is the crudeness of the 'search engines' on which most users
rely for information discovery.

Searching for
keywords and phrases amongst the Web
pages of a large company or organization, let alone the <EM>entire</EM> Web,
will often result in a huge number of documents being discovered. Often
these bear no obvious relationship to the information needs of the user.
</P>

<P>
The original WWW proposal suggested that it should be possible to pose
questions to an information management system and have them answered by a
mechanism that understands something of the complex
 web of interrelationships that exist between people, documents,
organizations and other entities.
</P>

<H3>Ask the Web?</H3>

<P>
Currently, users search for data on the Web by asking questions that are
of the form: "which documents contain <EM>these</EM> words and
phrases?"
</P>

<P>
The Resource Description Framework (<A HREF="/RDF/">RDF</A>), following
the original WWW design, suggests that we can do better than this. What
questions might we want to ask the Web? A few were sketched in the WWW
proposal...
</P>

<BLOCKQUOTE>

<P>
The sort of information we are discussing answers, for example, questions
like
</P>

<UL>
<LI>     Where is this module used?
<LI>     Who wrote this code? Where does he work?
<LI>     What documents exist about that concept?
<LI>     Which laboratories are included in that project?
<LI>     Which systems depend on this device?
<LI>     What documents refer to this one?
</UL>
</BLOCKQUOTE>

<P>
With the exception of the last item on this wishlist ('which documents
refer to this one'), the current Web (or Web search engines) does not allow
such questions to be easily answered. There is however a close affinity between
the model recently adopted in RDF and the structures described (but
which were until recently unimplemented) in the WWW proposal. The WWW
proposal notes that 'Linked Information Systems' can be applied to this
set of problems...
</P>

<BLOCKQUOTE>
In providing a system for manipulating this sort of information, the hope
would be to allow a pool of information to develop which could
grow and evolve with the organisation and the projects it describes.
For this to be possible, the method of storage must not place its own
restraints on the information. This is why a "web" of notes with links
(like references) between them is far more useful than a fixed
hierarchical system. When describing a complex system, many people resort
to diagrams with circles and arrows. Circles and arrows leave
one free to describe the interrelationships between things in a way that
tables, for example, do not. The system we need is like a diagram of
circles and arrows, where circles and arrows can stand for anything.
</BLOCKQUOTE>

<P>
The proposal then goes on to describe a number of 'node' types and 'arrow'
types such as might be used to represent diagrammatically the entities and
relationships typical of a complex organisation such as CERN...
</P>

<BLOCKQUOTE>
We can call the circles nodes, and the arrows links. Suppose each node is
like a small note, summary article, or comment. I'm not over
concerned here with whether it has text or graphics or both. Ideally, it
represents or describes one particular person or object. Examples of
nodes can be:
</BLOCKQUOTE>


<BLOCKQUOTE>
<UL>
<LI>     People </LI>
<LI>     Software modules </LI>
<LI>     Groups of people </LI>
<LI>     Projects </LI>
<LI>     Concepts </LI>
<LI>     Documents </LI>
<LI>     Types of hardware </LI>
<LI>     Specific hardware objects </LI>
</UL>
</BLOCKQUOTE>


<P>
The proposal also lists a number of relationship types that might hold
between these various types of thing. For some pair of entities A and B,
they might stand in one of any number of relationships. It might be true
that 'A'...
</P>

<BLOCKQUOTE>
<UL>
<LI>     depends on B
<LI>     is part of B
<LI>     made B
<LI>     refers to B
<LI>     uses B
<LI>     is an example of B
</UL>
</BLOCKQUOTE>

<P>
In doing so, the WWW proposal makes an interesting claim: that the complex
mesh of information relating people, software, documents, concepts,
organizations and other types of stuff could be understood through a very
simple metaphor. The metaphor is that of a <EM>web</EM> of named
relationships connecting uniquely identified things. This is, not by
coincidence, exactly the model for representing information that was
adopted in RDF.
</P>

<H3>Nodes and Arrows; Entities and Relationships</H3>

<P>
There are a number of different terminologies for talking about the same
broad family of approaches to information management. The WWW proposal uses the terminology
 of 'node and arrow' diagrams, such as that reproduced above. Many in the
database and data modeling communities talk of 'entity - relationship'
modeling. RDF models are often represented graphically as 'node and arc'
diagrams. In RDF contexts we also talk about the entities represented by
nodes as 'Resources', and the relationships and attributes shown as
arcs/arrows are called 'Properties'.
</P>

<P>
Despite terminological differences, RDF can be seen as the eventual
formalization of this long-delayed component of the Web architecture. RDF
is the W3C's <A HREF="/Press/1999/RDF-REC">recommended</A> technology for
describing 'data about data', or metadata. The notion of 'data about data'
is somewhat confusing in a Web context. It is often useful to think about
RDF models as a form of 'self describing' data. To understand this, it is
important to appreciate the central role played by <EM>identifiers</EM> in
the Web architecture.
</P>

<BLOCKQUOTE>
 "The Web works best when anything of value and identity is a first
        class object.  If something does not have a URI, you can't refer to it,
        and the power of the Web is the less for that."<BR>
        -- TimBL, Dec 1996<BR>
        <A HREF="http://www.w3.org/DesignIssues/Axioms">http://www.w3.org/DesignIssues/Axioms</A>
</BLOCKQUOTE>


<P>
On the Web, everything is considered to be a 'resource', ie. a thing
that can be identified, and through identification, be used. The vast 'nodes and
arrows' diagram that constitutes the current World Wide Web consists mostly of documents
connected by links whose type is relatively meaningless (the label is "href",
which merely means "links to"). With the development of RDF and XML, we
can anticipate a richer Web in which these nameable interrelationships are modelled in
RDF and written down in XML syntax using RDF and XLink. </P>

<P>The Web model is for all online resources to have unique
identifiers. In addition, unique identifiers can be assigned to a variety
of non electronic resources. The URI specification defines a
convention for representing these identifiers as short textual
strings; social and legal conventions define policies for assigning these
identifiers to resources of all kinds: eg. documents, concepts and
countless other entities. The URI system, like the Web itself, is
designed to be extensible: as new ways of identifying objects (eg. DOIs,
URNs etc) are proposed, Web URIs can accommodate these.
</P>

<P>
The crucial point is that every individual node, every <EM>type of
node</EM>, and every <EM>type of arc</EM> in the 'nodes and arcs' diagram be
uniquely identifiable. The WWW familiar to users in 1999 is built on this
principle: everything that exists to the Web is identified on the Web
using URI identifiers. For example, a mailbox is identified with a
'mailto:' identifier, web pages are typically identified using 'http:'
names. The power of the Web comes from this simple, almost trivial,
principle: that <EM> unique identification is extremely useful for
information management</EM>.
</P>

<H3>RDF: Metadata as self-describing data</H3>

<P>
RDF is about self describing data in the sense that the principle of
unique identification which underpins the Web is applied to the practice
of modeling information. Although the RDF model of 'nodes and arcs' is
almost unchanged from that outlined in the WWW proposal document, RDF
takes things much further. By combining the principle of unique
identification with the nodes and arrows representation system, we gain a
powerfully simple perspective on information management.
</P>

<P>
We say that RDF's information model is self-describing because both the
types of relationships (arrows, arcs) and the types of nodes that
we see in node and arrow diagrams are themselves considered 'first class'
objects, uniquely identifiable and therefore describable. We make the
building blocks of our data modeling system into identifiable things on
the Web by giving them URI names, so that different computer systems
across the world can each make unambiguous use of the same types of nodes
and arcs.
</P>

<P> For example, when two
objects are connected, as in the original diagram, by a 'wrote' arrow (eg. "Tim Berners-Lee"
--wrote--&gt; "This document" ), the relationship we call "wrote" is given
a Web identifier of its own. In 1999, we can use RDF and URIs to do this,
and the <A HREF="/XML/">XML</A> data format to interchange such
information between computers. The
<A HREF="http://purl.org/dc/">Dublin Core Metadata Initiative</A>, for
example, have defined a set of concepts such as
'Title', 'Creator', 'Description', 'Date'. So, instead of just writing the
simple label 'creator', RDF uses a Web
identifier: 'http://purl.org/dc/elements/1.0/Creator'. This gives us a
node on the Web which represents the relationship
'Creator' that holds between creative agents (persons, organizations) and
the works they create. Since we now have a URI for the notion of
 'Creator', other communities can describe relationships between this and
other nodes in the Web.
</P>
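<P>
The benefit of naming the relationship with a Web identifier can be
illustrated with a minimal sketch. It assumes, purely for illustration, two
hypothetical systems (<CODE>site_a</CODE> and <CODE>site_b</CODE>) that each
record who created the WWW proposal, and a hypothetical <CODE>merge</CODE>
helper; only the Dublin Core 'Creator' URI and the proposal's address are
taken from the text above.
</P>

```python
DC_CREATOR = "http://purl.org/dc/elements/1.0/Creator"
PROPOSAL = "http://www.w3.org/History/1989/proposal.html"

# Two hypothetical systems describing the same document with local labels.
# Nothing tells a program that "creator" and "author" mean the same thing.
site_a_local = [(PROPOSAL, "creator", "Tim Berners-Lee")]
site_b_local = [(PROPOSAL, "author", "Tim Berners-Lee")]

# The same two systems, this time naming the relationship with a shared URI.
site_a_web = [(PROPOSAL, DC_CREATOR, "Tim Berners-Lee")]
site_b_web = [(PROPOSAL, DC_CREATOR, "Tim Berners-Lee")]


def merge(*sources):
    """Return the union of all statements; identical statements collapse."""
    merged = set()
    for source in sources:
        merged.update(source)
    return merged


local_merged = merge(site_a_local, site_b_local)
web_merged = merge(site_a_web, site_b_web)

# With local labels the two statements stay distinct; with the shared
# identifier they are recognised as one and the same statement.
print(len(local_merged), len(web_merged))
```

<P>
With local labels the merged data holds two apparently unrelated statements;
with the shared identifier it holds one.
</P>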

<P>
Why is this self-describing? Since the notion of Creator here is just
another node or resource on the Web, RDF (ie. nodes and arrows) itself can be used to make
statements about that thing. We might want to annotate it with a label, or textual
description (in one or more natural languages). Or we might want to relate it to other resources. This is
exactly what we see in the original WWW diagram: the node drawn as
"Hypertext" is shown as having an "includes" arrow pointing to "Linked
Information". This is a representation of the notion of a Linked Information
System, such as the proposed WWW itself. A number of nodes are also drawn
representing <EM>examples</EM> (or instances) of linked information
systems, eg. ENQUIRE, Hypercard. Similarly, the node representing the
category of "Hierarchical Systems" (examples being GroupTalk, UUCP/News,
CERNDOC, VAX/Notes) is itself a "first class resource" in the diagram.
</P>
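<P>
One way to picture this self-description is the rough sketch below, in which
the 'Creator' relationship appears both as the arc of one statement and as
the subject of others. The <CODE>example.org</CODE> property URIs, the
descriptive text and the <CODE>describe</CODE> helper are hypothetical,
invented for this illustration; they are not part of RDF or Dublin Core.
</P>

```python
DC_CREATOR = "http://purl.org/dc/elements/1.0/Creator"
PROPOSAL = "http://www.w3.org/History/1989/proposal.html"

# Hypothetical property URIs, invented for this sketch only.
LABEL = "http://example.org/terms#label"
DESCRIPTION = "http://example.org/terms#description"

# Each statement is a (subject, property, value) triple.
STATEMENTS = [
    # Creator used as an arc between a document and a person...
    (PROPOSAL, DC_CREATOR, "Tim Berners-Lee"),
    # ...and Creator used as a node that is itself described.
    (DC_CREATOR, LABEL, "Creator"),
    (DC_CREATOR, DESCRIPTION, "Relates a work to the agent that created it"),
]


def describe(resource, statements):
    """Collect every (property, value) pair whose subject is the resource."""
    return [(prop, value) for subject, prop, value in statements
            if subject == resource]


def properties_used(statements):
    """Return the set of identifiers that appear in the arc position."""
    return {prop for _, prop, _ in statements}


# The same identifier can be looked up as a node and found in use as an arc.
creator_description = describe(DC_CREATOR, STATEMENTS)
for prop, value in creator_description:
    print(prop, "=", value)
```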

<H3>Asking the Web and RDF Query</H3>

<P>
So... assuming we compose node-and-arc views of our diverse information
systems, and assuming we give unique identifiers to everything that
matters to our information management needs, what does this buy us?
</P>

<P>
If we give unique identifiers (URIs) to...:
</P>
<UL>
<LI>types of thing (eg. the set of Hypertext systems, the set of People or organizations)</LI>
<LI>the relationships that stand between those things (eg. 'describes',
'wrote', 'includes', 'unifies'...)</LI>
<LI>particular examples of those types of thing (eg. CERNDOC, Hypercard)</LI>
</UL>

<P>
...then we have an RDF-ready information system. We can use the universal
syntax provided by XML to write down and exchange messages that contain
information that can be interpreted according to
this model, and we can use the nodes-and-arcs model to provide a common
'interpretation strategy' for a wide range of information management
scenarios.
</P>


<H4>For example...</H4>

<P>
If we want to ask for the identifiers of all things that are 'unified' by an
'information system' which is described in a document written by some
individual, we could couch this as a query consisting of URI identifiers and
'question marks' or variables.
</P>

<P>
For example (in a fictional syntax):
</P>

<P>
<CODE>
 type(?X, InformationSystem), unifies(?X,?Y), describes(?Z,?X), wrote(?P,?Z).
</CODE>
</P>
<P>or</P>
<P>
<CODE>
 type(?X, InformationSystem), unifies(?X,?Y), describes(?Z,?X), wrote("Tim Berners-Lee",?Z).
</CODE>
</P>

<P>
This is a computerish way of asking for groups of objects 'X','Y','Z','P',
where P wrote Z, Z describes X, X 'unifies' Y, and X is an Information
System. Run against the data in the original figure from the WWW proposal,
this query matches several combinations of nodes. Written out in full, the
answer might look something like the following.
</P>

<PRE>
<CODE>
X= A Proposal: Mesh
Y= ENQUIRE
Z= 'This document' (ie. http://www.w3.org/History/1989/proposal.html)
P= Tim Berners-Lee

X= A Proposal: Mesh
Y= CERNDOC
Z= 'This document' (ie. http://www.w3.org/History/1989/proposal.html)
P= Tim Berners-Lee

X= A Proposal: Mesh
Y= VAX/Notes
Z= 'This document' (ie. http://www.w3.org/History/1989/proposal.html)
P= Tim Berners-Lee

X= A Proposal: Mesh
Y= UUCP/News
Z= 'This document' (ie. http://www.w3.org/History/1989/proposal.html)
P= Tim Berners-Lee
</CODE>
</PRE>
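<P>
A query of this kind can be answered by simple pattern matching over a set of
statements. The following is a minimal sketch, not a description of any
existing RDF query implementation. The <CODE>FACTS</CODE> are transcribed
from the answers listed above rather than from the figure itself, names are
abbreviated as in the fictional syntax, and the helpers <CODE>unify</CODE>
and <CODE>solve</CODE> are hypothetical names chosen for this illustration.
</P>

```python
# Statements written as (property, subject, value), mirroring the
# fictional syntax property(subject, value) used in the query.
FACTS = {
    ("type", "A Proposal: Mesh", "InformationSystem"),
    ("unifies", "A Proposal: Mesh", "ENQUIRE"),
    ("unifies", "A Proposal: Mesh", "CERNDOC"),
    ("unifies", "A Proposal: Mesh", "VAX/Notes"),
    ("unifies", "A Proposal: Mesh", "UUCP/News"),
    ("describes", "This document", "A Proposal: Mesh"),
    ("wrote", "Tim Berners-Lee", "This document"),
}

QUERY = [
    ("type", "?X", "InformationSystem"),
    ("unifies", "?X", "?Y"),
    ("describes", "?Z", "?X"),
    ("wrote", "?P", "?Z"),
]


def is_variable(term):
    """Variables are written with a leading question mark."""
    return term.startswith("?")


def unify(pattern, fact, bindings):
    """Extend bindings so that pattern matches fact, or return None."""
    extended = dict(bindings)
    for wanted, actual in zip(pattern, fact):
        if is_variable(wanted):
            # Bind the variable, or check it against its earlier binding.
            if extended.setdefault(wanted, actual) != actual:
                return None
        elif wanted != actual:
            return None
    return extended


def solve(patterns, facts, bindings=None):
    """Yield every set of variable bindings that satisfies all patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        extended = unify(first, fact, bindings)
        if extended is not None:
            yield from solve(rest, facts, extended)


answers = sorted(solve(QUERY, FACTS), key=lambda b: b["?Y"])
for answer in answers:
    print(answer)
```

<P>
Each printed answer binds all four variables, one answer per system that the
proposal 'unifies'. Replacing <CODE>"?P"</CODE> with
<CODE>"Tim Berners-Lee"</CODE> in the last pattern gives the second form of
the query.
</P>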

<H3>Conclusions: Querying RDF information models</H3>

<P>
We have seen a few brief examples based around the <A
HREF="/History/1989/Image1.gif">image</A> included in the original WWW
proposal. The simple query presented here shows the way in which a
question might be asked of a system that is organised around the 'nodes
and arcs' model common to both the WWW proposal and RDF.
</P>

<P>
The RDF system does not yet include a specification for querying RDF
models. However, a number of <A HREF="/RDF/#sw">projects and
applications</A> exist that are exploring mechanisms for implementing RDF
query. Most of them take a similar form to the above scenario; the only
difference is that within the formal RDF model, URI identifiers must be
used to unambiguously identify each node and each relationship-type
(eg. 'creator' becomes 'http://purl.org/dc/elements/1.0/Creator'). In the
simple query example above, we abbreviate these URIs for increased
readability.
</P>
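<P>
The abbreviation step can be sketched as a simple lookup from short names to
full URIs. Only the 'Creator' URI below comes from this document; the
<CODE>example.org</CODE> entries and the <CODE>expand</CODE> helpers are
hypothetical placeholders, since no URIs for 'wrote' or 'describes' are
given here.
</P>

```python
# Abbreviation table. Only the Creator URI comes from the text; the
# example.org entries are hypothetical placeholders.
FULL_NAMES = {
    "creator": "http://purl.org/dc/elements/1.0/Creator",
    "wrote": "http://example.org/terms#wrote",
    "describes": "http://example.org/terms#describes",
}


def expand(term, names=FULL_NAMES):
    """Return the full URI for an abbreviated name.

    Variables (leading "?") and terms without a known expansion are
    returned unchanged.
    """
    if term.startswith("?"):
        return term
    return names.get(term, term)


def expand_pattern(pattern):
    """Expand every term of a (property, subject, value) query pattern."""
    return tuple(expand(term) for term in pattern)


expanded = expand_pattern(("creator", "?Doc", "?Who"))
print(expanded)
```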


<ADDRESS>
<A HREF="mailto:danbri@w3.org">danbri@w3.org</A> November 1999
</ADDRESS>
</BODY>
</HTML>