NameMyth.html 20 KB

Raw Blame History Permalink

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content=
    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
    <meta http-equiv="Content-Type" content="text/html" />
    <title>
      The Name Myth -- Axioms of Web architecture
    </title>
    <link href="di.css" rel="stylesheet" type="text/css" />
  </head>
  <body bgcolor="#DDFFDD" text="#000000">
    <address>
      Tim Berners-Lee
      <p>
        Date: December 19, 1996
      </p>
      <p>
        Status: personal view. Editing status: Italic text is
        rough. Reques complete edit and possibly massaging, but
        content is basically there.
      </p>
    </address>
    <p>
      <a href="Overview.html">Up to Design Issues</a>
    </p>
    <h3>
      Axioms of Web Architecture: 2
    </h3>
    <hr />
    <h1>
      The Myth of Names and Addresses<br />
    </h1>
    <p>
      The discussion above about the universality of URIs
      (Universal Resource Identifiers) mentions briefly how URIs
      are designed to encompass both things we think of as
      addresses and those we think of as names. Much of the
      discussion of this issue has been clouded by attempts to
      distinguish names from addresses. The term "identifier" was
      picked in an attempt to side-step this issue but
      historically, that did not prevent a quagmire of circular
      discussion which in some circles paralyzed any forward
      progress. Therefore, in this section let me state the
      philosophy which to my mind sets this problem in the right
      light and should prevent further fruitless discussion.
      <i><br /></i>
    </p>
    <p>
      There is the commonly held belief that names and addresses
      are different and distinct. We learn the importance of the
      difference between identifiers in a programming language and
      addresses within a computer memory. We learn the difference
      in properties between fully qualified domain names on the
      internet and internet protocol addresses. This can lead us
      easily into imagining that there are two types of objects:
      Names, which once attached to an object follow it for its
      life wherever it should reside, and "addresses" which change
      frequently whenever an object moves or is copied or
      replicated from one "location" to another.
    </p>
    <p>
      However, the only true location is a point in three
      dimensional space, and within computer systems and especially
      networked computer systems there is a very large number of
      complex indirection between almost anything we would call a
      name <i>or</i> an address and the actual physical location of
      the memory cell which stores it. At one end of the spectrum a
      computer memory address often is really an address within a
      virtual memory space allocated to a particular project, and
      when used is translated by the hardware into a physical
      memory address, or for that matter into an address, into a
      piece of memory which is being moved out into somewhere and
      swapping the file on disk storage. Filenames are mapped
      though mount tables and directory files into "inodes" which
      are mapped onto track and sector locations. Internet protocol
      addresses [IP Addresses] similarly are not bound absolutely
      to a given computer: they can be re-allocated within the
      constraints that because they are used for routing, there is
      information connecting parts of the IP address with routing
      information and so the computer corresponding to a given IP
      address cannot be moved far in the routing structure. So, we
      see that the constraint on how you can re-use an address is a
      function of what information is in the address. When most
      programs or people mention IP addresses, they simply quote
      four decimal numbers, each between naught and 255 without
      worrying about the internal structure. So, the information
      within the IP address which prevents it being re-used in a
      different area is to most people not explicit: It is, if you
      like, hidden within there as the reason why IP addresses
      can't be used. When we want to use something to refer to a
      computer but still be able to move the computer or at least
      the thing corresponding to that identification across from
      one part of the internet to another, we use our domain name.
      The domain name system, being completely independent of the
      routing system, allows us to allocate any IP address at all
      to a computer of a given domain name. Therefore, if we
      believe the naming myth the domain name is a name and the IP
      address is truly an address.
    </p>
    <h2>
      <a name="Anecdotes" id="Anecdotes">Two anecdotes about names
      and addresses</a>
    </h2>
    <p>
      Two real-life anecdotes illustrate the dangers of making this
      assumption. When there were only a few web servers and I kept
      a registry of all those which I knew, I was contacted by a
      group in Australia who were putting up a server with some
      interesting botanical information. They sent me some details
      of the server to be put into the list and they gave me the IP
      address of the machine. My email reply explained that I
      always prefer to refer to servers by their domain name rather
      than their IP address and asked them for the domain name of
      the server. They replied that the domain name they would use
      would depend on the department within the university which
      was responsible for maintaining the server but due to a
      university re-organization, it was not at this point clear
      which department that would be. However, they explained that
      they could guarantee that the IP address of the server would
      remain unchanged for a long time.
    </p>
    <p>
      Several years later, the list of servers now abandoned as a
      single list of all World Wide Web servers was among the
      now-extensive web of information maintained on the server
      known as info.cern.ch, the first World Wide Web server set up
      at the start of the World Wide Web project. At this time the
      responsibility for the coordination of World Wide Web
      protocols was shifting from CERN to MIT/LCS and the embryonic
      World Wide Web Consortium. For a while, CERN continued to
      maintain the server, but later the master sources for that
      information were maintained in America. Soon after this the
      authorities at CERN requested that the name info.cern.ch
      should no longer be used to refer to this information, as it
      was no longer under control of CERN and they could no longer
      assume responsibility for it. In fact, there was a policy
      that names in the cern.ch domain should never be allowed to
      refer to Internet addresses which were not physically on the
      CERN site. Therefore all hypertext pointers into the
      info.cern.ch space have had to be changed over the course of
      time to point to the <code>w3.org</code> space.
    </p>
    <p>
      These two examples show the "name" of objects having to be
      changed even though the objects retained their essential
      identity. The reason was in each case imbedded information in
      the name: the domain name on the server contains authority
      information about the maintainer of the computer whose
      address corresponds to the domain name. If the authority for
      an object changes, whether it "moves" on not, then there may
      be a need to change its name under these circumstances. It
      turns out that for almost any naming or addressing system in
      which there is some information (other than random numbers or
      dates of creation of the objects) built into the name that
      the name might have to be changed when the facts
      corresponding to that information change. Therefore it
      becomes simply a matter of choice between naming or
      addressing systems as to what sort of information you wish to
      include implicitly or explicitly within your "name" or
      "address".
    </p>
    <h2>
      <a name="Why" id="Why">Why Names Change</a><br />
    </h2>
    <p>
      <small>See also:</small>
    </p>
    <ul>
      <li>
        <small>In the Syyle Guide for Online Hypertext, <a href=
        "../Provider/Style/Overview.html"><i>Cool URLs don't
        change</i></a></small>
      </li>
    </ul>
    <p>
      It is worth looking at some of the reasons for names in
      practical use to change or need to be changed. Some World
      Wide Web servers have unwisely simply mapped the URL space
      onto a Unix filename space, and the results of this,
      especially in the early days, were URLs which might look like
      this:
    </p>
    <p>
      http://pegasus.cs.foo.edu/disk1/students/romeo/cool/latest/readthis.html
    </p>
    <p>
      Looking at the segments of this name we can see as many
      reasons for the name to need to be changed.
    </p>
    <p>
      The "http:" will only be changed if the document is later
      served up using a different protocol and, in fact, that is
      probably one of the least likely pieces to change.
    </p>
    <p>
      "Pegasus", the name of the computer, probably has a
      significance within the university as a computer dedicated to
      some particular tasks such as supporting personal student
      activities, and maybe maintained by a particular department
      or may even be a name from a project for which the computer
      was originally put into use before it became shared with
      general user space. So, "pegasus" will be changed whenever
      the function of supporting this particular student's web
      pages has to be shared with other functions.
    </p>
    <p>
      "Cs" indicates the computer science department, so the
      document is bound to the computer science department. It may
      not be something which the computer science department has a
      lot of interest in, and the student may well transfer his or
      her interests to other departments in the future.
    </p>
    <p>
      The name of the university "foo.edu" will probably last for a
      good while, though whether the university wants to continue
      to be associated with the document for more than two or three
      years is questionable.
    </p>
    <p>
      The next section of the path, "disk1", is clearly a mistake.
      In fact, of course, disc1 is just a name which can be
      attached to any physical disk, but by grouping together all
      the students on a certain disk in this arbitrary way, one
      makes a binding between all the documents which they create
      which will have to be broken whenever the computer is
      reorganized. In fact, the relocation tables which most
      servers support allow much translation of names to take place
      and make this sort of path quite unnecessary.
    </p>
    <p>
      The next element identifies Romeo as a student which may
      change even though he continues to study for the rest of his
      life, and then the next path element "romeo" identifies the
      author of the document. As in the case with CERN above, the
      original author of a document may later not wish to keep
      maintenance or responsibility for ongoing versions. For
      example, the document may be submitted to an organization
      which publishes it and formally takes over responsibility for
      its upkeep; it may achieve a status of some kind as a
      standard or an accepted thesis which causes its maintainers
      to change. The original author may in fact deliberately
      simply pass on authorship of the document to someone else. In
      any of these cases the name would have to change, and all
      references to that name would break.
    </p>
    <p>
      The student himself has not been very wise with his choice of
      path name. For many people, what is "cool" changes with time
      and for most people what is "latest" changes with time.
    </p>
    <p>
      Perhaps the unlikely to change piece of information in the
      URL "readthis" as it contains no information at all, just
      like the proverbial "click here". Effectively, it is a random
      name assigned to the document and as such, is perhaps the
      safest part of the path.
    </p>
    <p>
      The last element of the path, "html" is not strictly
      necessary with most servers, as at least some servers will,
      given a URL of &nbsp;"readthis" , &nbsp;serve up the data
      from a file which is called "readthis.html". Here the student
      is making it difficult for himself later to change the format
      or formats in which the file is available, without at least
      some confusion. Suppose, for example, that he later decides
      that the information is worth providing in audio format for
      blind readers. The CERN server can easily be configured so
      that clients specifically&nbsp;requesting audio formats in
      preference to HTML can be served as preferentially whereas
      more normal clients will get the HTML. So, here again is a
      part of the path which may be later regretted.
    </p>
    <p>
      You can play this game with almost any name and address in
      any system, and it is interesting to ask yourself in each
      case: to what extent do I call this a "name" and to what
      extend do I call it an "address"? So, in conclusion we see
      that any information explicitly owned or implicitly included
      in a name is a threat to its longevity. &nbsp;We see
      &nbsp;that the difference between a "name" and an "address"
      is not so fundamental. &nbsp;That is why
    </p>
    <table border="1" cellpadding="2">
      <tbody>
        <tr>
          <td>
            When a new URI scheme is defined, the specification
            defining ity should describe the name-like and
            address-like properties of URIs in the new scheme, so
            that that those using them can know what to be able to
            expect.
          </td>
        </tr>
      </tbody>
    </table>
    <h2>
      <a name="What" id="What">What's in a name?</a><br />
    </h2>
    <p>
      Why is information included then? Generally, the information
      is included because in order to discover anything about the
      name, one has to "dereference" the name. Typically this uses
      some official or unofficial set of indexes distributed or
      otherwise to look up the name. Many names are hierarchical in
      the authority which allocates them. DNS names are a good
      example. Road names within towns are another good example.
      Therefore to find out where the new "North Street" is located
      in small town one goes to the town for the definitive answer.
      For information as to where the server "pegasus.cs.foo.edu"
      is, one must send a message directly or indirectly to a
      server controlled by the Foo University.
    </p>
    <p>
      Is it possible to omit all such information from a name?
      Certainly. Message identifiers in mail have only the need to
      be unique. So, whereas hierarchical names and time stamps may
      be used to help make such identifiers unique, you cannot
      dereference the names at all. Perhaps we should call these
      "identifiers" rather than "names". Within a certain context,
      it is extremely useful to be able to refer to a mail message
      by its mail identifier. We say that these identifiers support
      the notion of equality: even though they cannot be
      dereferenced, you can test two mail messages to find out
      whether they are in fact the same simply by testing their
      identifiers. You can also within a finite set of mail
      messages look up a message of a given identifier. You just
      can't do this on a global scale. So this then is the essence
      of the naming problem:<br />
    </p>
    <table border="1" cellpadding="2">
      <tbody>
        <tr>
          <td>
            The naming problem: if you put information in a name,
            it decreases its longevity; if you don't you can't
            dereference it to a resource.
          </td>
        </tr>
      </tbody>
    </table>
    <h3>
      <a name="social" id="social">Naming: A social and contracual
      Issue</a>
    </h3>
    <p>
      Many, many solutions to the naming problem have been
      attempted and successfully deployed in different
      circumstances. At one end of the scale, it would be in fact
      possible using a huge network of hash tables around the
      world, to keep a hash index of all randomly generated unique
      names. The problem with this idea is that there would have to
      be one single funding model and one homogeneous quality of
      service for all names. There would be no way to pay more for
      a more persistent name.
    </p>
    <p>
      At the other end of the scale, hierarchical systems such as
      the domain name system, and the x500 name system, have been
      implemented. Suppose one wants to use a name which can be
      dereferenced and therefore must put some information in it.
      That information will lead us to some authority or some root
      to dereferencing the name. How can we maintain the lifetime
      of that name as something which can be dereferenced? The only
      way is that we have a contract with all the agencies which
      are involved in supporting the systems which dereference that
      name that they should continue their operation giving a
      certain quality of service for a certain period of time.
    </p>
    <p>
      Suppose the Foo Alumni Association ran a URL service in which
      a special name such as
      "http://alumni.foo.edu/1998/romeo/202-aab" would be available
      to any graduating paying their dues, and maintained
      indefinitely (perpetual care) on receipt of a suitable
      endowment.
    </p>
    <p>
      Of course, as organizations disolve and mutate, there is
      nothing to stop one organization from taking over the support
      of &nbsp;the archives another. &nbsp;Forthis purpose, it
      would be very useful to have a syntax for putting a date into
      a domain name. &nbsp;This would allow a system to find an
      archive server. &nbsp;Imaging that, failing to find
      "info.cern.ch", one could search back and find an entry
      "info.cern.ch.1994" which pointed to www.w3.org as a current
      server holding archive information for info.cern.ch as it was
      in 1994, with, of course, &nbsp;pointers to newer versions of
      the documents.
    </p>
    <h3>
      <a name="QoS" id="QoS">Quality of Service</a>
    </h3>
    <p>
      Looking at an "http:" URL, while some look more sensible than
      others, it is not immediately evident whether great pains are
      being taken to make the name very persistent. &nbsp;We have
      just discussed such a range of reasons why names can change,
      and clearly the social and contractual arrangements can be
      quite involved, so it is clearly difficult to simply define a
      quality of service for naming. &nbsp;However, defining some
      well known quality of service levels would be a very useful
      task. This is the sort of task ideally suited to a group of
      trechnologies, librraians or archivists.
    </p>
    <p>
      &nbsp;In any event, for identifiers in the http space and
      many others, it would be useful to be able to assert what the
      quality of service is. This is information about a URI and a
      resource. &nbsp;Like the <a href=
      "Generic.html#Dimensions">information about generic URIs</a>,
      it is about the sort of identity between the URI and the
      resource.
    </p>
    <table border="1" cellpadding="2">
      <tbody>
        <tr>
          <td>
            Metadata should be used to express the quality of
            service for the binding between a URI and a resource.
          </td>
        </tr>
      </tbody>
    </table>
    <h2>
      <i><br /></i>
    </h2>
    <hr />
    <p>
      <a href="Metadata.html">Next: Metadata architecture</a>
    </p>
    <p>
      <a href="Overview.html">Up to Design Issues</a>
    </p>
  </body>
</html>