<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content=
    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
    <title>
      The stack of specifications - Design Issues
    </title>
    <link rel="Stylesheet" href="di.css" type="text/css" />
    <meta http-equiv="Content-Type" content="text/html" />
  </head>
  <body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">
    <address>
      Tim Berners-Lee Date: 2002/05, last change: $Date: 2003/01/06
      19:40:09 $<br />
      Status: personal view only. Editing status: rough.
    </address>
    <p>
      <em>Abstract: This is a backgrounder explaining where the web
      specifications fit into Internet technology as a whole.
      It explains the philosophy of electronic communications
      having well-defined meaning grounded in a stack of
      interconnected specifications. This is all normally -- and
      quite justifiably -- taken for granted by Web engineers. But
      it needs to be emphasised when the Internet is abused,
      for example by spammers who forge email headers, or companies
      who cheat protocol timeouts in order to claim greater
      performance, and in doing so, break the system. This article
      debunks the idea that "it's OK to interpret things this way as
      more and more people are doing it".</em>
    </p>
    <p>
      <em>It was originally the subject of a keynote address at the
      International World Wide Web Conference in Hawai'i, April
      2002.</em>
    </p>
    <p>
      <a href="./">Up to Design Issues</a>
    </p>
    <hr />
    <h1>
      The Stack of Specifications
    </h1>
    <p>
      Bits mean something.
    </p>
    <p>
      When you connect a cat-5 ethernet cable to your computer, you
      effectively commit to taking part, with your computer, in a
      very special system. It is a system in which the meaning of
      messages is determined, in advance, by specifications. This
      is a principle which is so basic to network computer systems
      that it is rarely stated. But as the stack of specifications
      gets higher and higher, and as electronic commerce, legally
      enforceable agreements, and socially sensitive issues such as
      privacy and fraud become matters of public concern, it is
      worth reiterating for the record.
    </p>
    <p>
      The Internet works because of interoperability between
      different computers, despite different hardware, operating
      systems, local language context, and software supplier. Users
      of the Web implicitly sign on to these shared languages, the
      specifications, whenever they use the Internet.
    </p>
    <p>
      There is this little philosophy joining many specifications,
      without which the Web falls apart.
    </p>
    <p>
      Let's take an example.
    </p>
    <h3>
      You have an ethernet cable
    </h3>
    <p>
      You walk into a meeting room, and you are offered a thin
      cat-5 cable with a 10BASE-T connector. This is an Ethernet
      connector which only takes Ethernet packets. The only way to
      use it to communicate is for your computer to send packets
      which are formatted to the Ethernet specification. The
      Ethernet specification is a large document (similar to
      <strong>IEEE standard 802.3</strong>) put together by a bunch
      of engineers, and once they were done Ethernet existed as a
      standard, and computers which know nothing about each other
      could exchange packets over local area networks.
    </p>
    <p>
      The Ethernet specification defines the format of an Ethernet
      packet, which has a little header information, but mostly
      carries information on behalf of you, the user. The spec also,
      importantly, defines some rules of behaviour. For example,
      Ethernet doesn't work if more than one computer tries to
      transmit at once. There is a rule that if you find that
      happens, everyone involved backs off and comes back after a
      random interval. Each computer is supposed to wait on average
      the same amount of time before trying again. Of course, you
      could cheat by pretending that your random number
      happened to be really small every time, and on average your
      computer would end up getting through more and blocking
      everyone else out, just like people who always seem to be the
      one talking in a meeting. But that would be cheating, and
      contrary to the Ethernet specification. By connecting to an
      Ethernet cable, there is an understanding that your computer
      will stick to the rules.
    </p>
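    <p>
      The back-off rule is usually described as <em>truncated
      binary exponential backoff</em>. The Python sketch below
      assumes the commonly cited 802.3 parameters: delays counted in
      slot times, a random range that doubles with each collision up
      to a cap of 2<sup>10</sup> slots, and the frame abandoned after
      16 attempts. The function name is made up for illustration. A
      station whose version always returned 0 would be exactly the
      cheat described above.
    </p>
    <pre>
import random


def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Return how many slot times to wait after the n-th collision.

    Sketch of truncated binary exponential backoff: pick a random
    integer between 0 and 2**min(n, 10) - 1 inclusive.
    """
    if collisions >= 16:
        # After 16 attempts the frame is given up and an error reported.
        raise RuntimeError("excessive collisions: frame dropped")
    exponent = min(collisions, 10)   # the range stops growing after 10
    # An honest station draws uniformly; a cheat would always pick 0.
    return rng.randint(0, 2 ** exponent - 1)


rng = random.Random(42)
waits = [backoff_slots(n, rng) for n in range(1, 6)]
print(waits)
</pre>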
    <p>
      An ethernet packet can be sent to anyone on the same wired or
      wireless local area network. How does a computer know what to
      do with a packet when it gets it? How does it know how to
      interpret that packet? Well, there is a field in the packet
      which tells it, in a coded way, what the use of the packet
      is, and therefore how to interpret it.
    </p>
    <p>
      Of course, there are lots of uses of the Ethernet, but a very
      common use of an Ethernet packet is to use it to carry an
      Internet packet. Ethernet packets can only cross the local
      area network, while Internet packets are forwarded anywhere
      in the world. So, there is a particular code - a particular
      value for the field in the Ethernet packet - which tells any
      receiving computer that the data is actually an Internet
      packet. This means that to understand anything more about what
      the packet means, you have to read another spec: the
      <strong>Internet Protocol (IP, RFC791).</strong>
    </p>
    <p>
      @@@ The complete graph of interdependencies between
      specifications.
    </p>
    <h3>
      You send an Internet packet
    </h3>
    <p>
      So suppose you send an Internet packet. You put the Ethernet
      address of the local "router" computer into the Ethernet
      address field, but within the "data" part of the Ethernet
      packet is the IP packet, and inside that is an Internet
      address field, which takes the IP address (the thing like
      18.96.237.175) which identifies the destination computer.
      Although the Ethernet packet you send it in only gets as far
      as the "router" computer on the local net, that computer
      passes the IP contents on, from computer to computer across
      interconnected networks, until it arrives on the right local
      network for its actual destination.
    </p>
    <p>
      So how does that computer know what to do with it? Well,
      there is a field in the IP packet which carries a coded value
      to tell the computer receiving it what to do with it.
    </p>
    <pre>
From Internet Protocol (RFC791):
A summary of the contents of the internet header follows:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |Version|  IHL  |Type of Service|          Total Length         |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |         Identification        |Flags|      Fragment Offset    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |  Time to Live |    <strong>Protocol</strong>   |         Header Checksum       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                       Source Address                          |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                    Destination Address                        |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                    Options                    |    Padding    |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                      Example Internet Datagram Header

                                 Figure 4.
</pre>
    <p>
      And there are a lot of things you can do with an IP packet,
      but a very common one is to use that IP packet to set up, or
      to be a part of, a reliable stream of communication using the
      <strong>Transmission Control Protocol (TCP) (RFC
      793).</strong>
    </p>
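    <p>
      The same pattern repeats one level up. A minimal Python sketch,
      assuming an IPv4 header without options laid out as in Figure
      4: it decodes the fixed 20 bytes and uses the
      <strong>Protocol</strong> field to choose the next
      specification. The header checksum is left at zero and not
      verified, and <code>IP_PROTOCOLS</code> is a hand-picked
      excerpt of the IANA protocol-numbers list.
    </p>
    <pre>
import struct

# Hand-picked excerpt of the IANA protocol-numbers registry.
IP_PROTOCOLS = {
    1: "Internet Control Message Protocol (RFC 792)",
    6: "Transmission Control Protocol (RFC 793)",
    17: "User Datagram Protocol (RFC 768)",
}

# Fixed part of the RFC 791 header: 20 bytes in network (big-endian) order.
IPV4_HEADER = "!BBHHHBBH4s4s"


def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed IPv4 header fields and slice out the payload."""
    (ver_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(IPV4_HEADER, packet[:20])
    version, ihl = divmod(ver_ihl, 16)   # high nibble, low nibble
    header_bytes = ihl * 4               # IHL counts 32-bit words
    return {
        "version": version,
        "header_bytes": header_bytes,
        "ttl": ttl,
        "protocol": protocol,
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
        "payload": packet[header_bytes:total_length],
    }


# Version 4, IHL 5, total length 24, TTL 64, protocol 6, checksum left as 0.
header = struct.pack(IPV4_HEADER, 0x45, 0, 24, 1, 0, 64, 6, 0,
                     bytes([192, 0, 2, 1]), bytes([18, 96, 237, 175]))
pkt = parse_ipv4_header(header + b"TCP!")
print(IP_PROTOCOLS[pkt["protocol"]])   # Transmission Control Protocol (RFC 793)
</pre>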
    <h3>
      You send a TCP packet
    </h3>
    <p>
      When you send, or your computer sends, a packet in the TCP
      protocol, there is an understanding that that packet conforms
      to the protocol. That means a couple of things. It means that
      you agree that the packet's contents are to be interpreted
      according to the TCP protocol specification. It also means
      that you agree to abide by the rules of the specification,
      which determine, rather like with the Ethernet protocol, how
      long your computer will wait before re-sending a packet which
      didn't seem to get there. If your computer re-sends too
      early, then it hogs the Internet and slows down everyone
      else. If your computer sends a packet to start a new
      connection when it doesn't really want to, then the
      destination computer will prepare a lot of memory to receive
      all the data you are going to send, and wait. If you keep
      doing it, then that computer can just run out of memory and
      stop working. So you can cheat and you can do real damage by
      breaking protocols.
    </p>
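    <p>
      RFC 793 gave an example procedure for this timeout based on a
      smoothed round-trip time; the rule most stacks follow today is
      the later refinement in RFC 6298. A Python sketch of that
      computation, assuming a clock granularity of 0.1 s and a 60 s
      cap on back-off (RFC 6298 allows any cap of at least 60 s).
      The class name is hypothetical. Re-sending before this timer
      expires is the cheat described above.
    </p>
    <pre>
from dataclasses import dataclass
from typing import Optional

ALPHA, BETA, K = 1 / 8, 1 / 4, 4   # smoothing constants from RFC 6298
MIN_RTO = 1.0                      # seconds; RFC 6298 lower bound
CLOCK_G = 0.1                      # assumed clock granularity, seconds
MAX_RTO = 60.0                     # assumed cap; RFC 6298 allows any cap of at least 60 s


@dataclass
class RtoEstimator:
    """Tracks the retransmission timeout (RTO) for one TCP connection."""
    srtt: Optional[float] = None     # smoothed round-trip time
    rttvar: Optional[float] = None   # round-trip time variation
    rto: float = 1.0                 # initial RTO before any measurement

    def on_rtt_sample(self, r: float) -> float:
        """Fold one round-trip measurement r (seconds) into the RTO."""
        if self.srtt is None:
            self.srtt, self.rttvar = r, r / 2
        else:
            # RTTVAR is updated first because it uses the previous SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        self.rto = max(MIN_RTO, self.srtt + max(CLOCK_G, K * self.rttvar))
        return self.rto

    def on_timeout(self) -> float:
        """Timer expired: double the RTO before retransmitting."""
        self.rto = min(self.rto * 2, MAX_RTO)
        return self.rto


est = RtoEstimator()
print(est.on_rtt_sample(0.5))   # 1.5
</pre>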
    <h3>
      Introducing IANA: The Port number registry
    </h3>
    <p>
      So your computer must stick to the TCP specification. When it
      does that, the TCP protocol assures that the two computers
      have a reliable connection without any missing bits. What
      they use it for is no concern of TCP, apart from the fact
      that the TCP protocol specifies, within the TCP packet (which
      is inside the IP packet, which is inside the Ethernet packet),
      a special field holding a coded value, the <strong>port
      number</strong>. There is a convention, which is written into
      the TCP specification, (@@check and quote wording) that the
      meaning of the port number is determined by a table which is
      changed from time to time, but kept by the <strong>Internet
      Assigned Numbers Authority</strong> (IANA). Without going
      into the politics of the changes and control around IANA, it
      is just worth noting that this is, architecturally, a
      "flexibility point", where the community can introduce a new
      protocol to run on top of TCP/IP without having to write it
      into a new version of the TCP/IP specification itself.
    </p>
    <p>
      The port number registry is on the web (@@ link) but also, on
      a Unix computer, there is a list of the well-known ports in
      the file /etc/services.
    </p>
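    <p>
      On most Unix systems, Python's
      <code>socket.getservbyport(25, "tcp")</code> consults that file
      and typically returns <code>'smtp'</code>. To stay independent
      of the local machine, the sketch below instead parses a short
      excerpt written in the same whitespace-separated format
      (service name, port/protocol, optional aliases, <code>#</code>
      comments). The excerpt and the function name are illustrative.
    </p>
    <pre>
# A few lines in the format of /etc/services (excerpt written for this sketch).
SERVICES_TEXT = """\
# service-name  port/protocol  [aliases ...]
smtp            25/tcp         mail
http            80/tcp         www www-http   # WorldWideWeb HTTP
"""


def parse_services(text: str) -> dict:
    """Map (port, protocol) -> service name, skipping comments and blank lines."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        if not line:
            continue
        name, port_proto, *_aliases = line.split()
        port, proto = port_proto.split("/")
        table[(int(port), proto)] = name
    return table


services = parse_services(SERVICES_TEXT)
print(services[(25, "tcp")])   # smtp
print(services[(80, "tcp")])   # http
</pre>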
    <p>
      When you send a TCP/IP packet there is therefore an
      understanding that if you send to one of the well-known
      port numbers, then you are going to use it in the way defined
      by the specification listed in the IANA registry. For
      example, port number 25 indicates that you are going to use
      it to transfer some email, and that you undertake to
      communicate according to the Simple Mail Transfer Protocol
      specification.
    </p>
    <h3>
      You send an email message
    </h3>
    <p>
      You get the picture. One specification, once you commit to
      it, depending on the values of certain fields, invokes
      further specifications. By committing originally to using an
      ethernet cable, you commit to your computer using, on your
      behalf, the various other specifications. In the case in
      which your computer sends email, it may for example open a
      TCP/IP connection to port 25, and then use the Simple Mail
      Transfer Protocol (SMTP, RFC821). This specification
      indicates that the body of the SMTP communication is
      formatted according to the email message specification,
      RFC822. RFC 822 specifies the headers on email messages. It
      specifies, for example, that a given "From" field indicates
      the email address of the sender of the message.
    </p>
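    <p>
      Python's standard <code>email</code> package follows the RFC
      822 layout: header lines, a blank line, then the body. The
      sketch below, with made-up addresses, pulls out the address
      that the <code>From</code> field asserts. Note that the parser
      simply believes the field, which is why a forged one misleads.
    </p>
    <pre>
from email import message_from_string
from email.utils import parseaddr

# An RFC 822-style message: header lines, a blank line, then the body.
# (On the wire each line ends in CRLF; plain newlines are fine for parsing.)
RAW = (
    "From: alice@example.com\n"
    "To: joe@example.org\n"
    "Subject: Lunch?\n"
    "\n"
    "Are you free at noon?\n"
)

msg = message_from_string(RAW)
display_name, sender = parseaddr(msg["From"])
print(sender)                      # alice@example.com
print(msg["Subject"])              # Lunch?
print(msg.get_payload().strip())   # Are you free at noon?
</pre>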
    <p>
      It is possible, of course, to cheat with the SMTP protocol.
      It is possible to lie about who is sending the message, for
      instance to send a message which appears to come from a
      friend. This breaks the protocol. It breaks it, here, in a way
      which is very clear to people: it sneaks past their personal
      email filtering, and also any automated filtering, tricking
      them into reading a message. This is a security violation. It
      can use up a person's time, energy, bandwidth and disk space
      for the commercial gain (indirectly through advertising and
      sales) of the perpetrator.
    </p>
    <p>
      The Internet specifications, to which any Internet user
      implicitly agrees in using the Internet at all, define what
      the fields in an email message mean. To put incorrect
      information in these fields is to make a misrepresentation,
      just as it would have been in any other medium. It should be
      subject to the same penalties as lying or fraud in any other
      medium.
    </p>
    <p>
      When the Internet was young and used by research
      institutions, its misuse would inconvenience other users and
      lead to reprobation and the disdain of one's peers. Now that
      the Internet is such a large force in society, it is
      possible to make a lot of money and create a lot of damage by
      protocol abuse. You can compare a lie in an Internet message,
      depending on how it is done, to forging a check, connecting
      to the electricity supply on the other side of the meter, or to
      poisoning the water supply. Society must therefore be careful
      to be absolutely clear about the illegality of such misuse.
    </p>
    <h3>
      <a name="publish2" id="publish2">You publish a Web page</a>
    </h3>
    <p>
      When you publish a web page, just as when you send an email
      message, the web page or the message generally carries a
      meaning. Well, it can be a picture or a poem which is more
      artistic than linguistic, but in a large number of cases the
      meaning is a well-defined part of a communication between
      parties. It may be a human-readable document, like the page
      describing a pair of pants you are about to buy from a
      store, or it may be machine-processable, like the Online
      Financial Exchange (OFX) format bank statement your financial
      software downloads from your bank.
    </p>
    <p>
      Of course, you would find it hard work to make sense of the
      OFX file if you just read it without the help of the
      financial agent, and your financial agent wouldn't make much
      sense of the catalog page. Something must allow us to
      distinguish how web pages and emails should be interpreted,
      just as a computer has to figure out how to make sense of an
      Ethernet packet. And just the same sort of thing indeed
      happens.
    </p>
    <p>
      When you publish a web page, you give it an HTTP URI. You pick
      a URI from the space of URIs which are yours to define. Some
      people have space on their own domain, some people have the
      right to pick URIs in part of someone else's domain. But the
      URI is one which you own or over which you have authority.
      You are not allowed to pick one in someone else's space.
    </p>
    <p>
      Whoever owns the domain has the authority to define which
      computer serves information in it. They have the authority,
      then, to have a computer (a web server) which is configured
      to act on their behalf. It is then assumed that the computer
      acts on their behalf: the server is the agent of the
      publisher. Its job is to tell any browser that asks what you
      have said is a representation of the document for a given
      URI.
    </p>
    <p>
      When someone follows a link to your web page, their browser
      opens a TCP/IP connection to TCP port 80 on the machine which
      is registered as serving the domain (www.whatever.com, etc.) in
      question. Their agent, their browser, asks your agent, the
      server, to give it some representation of the web page for
      that URI.
    </p>
    <p>
      Why? Because the URI specification says that what you can
      tell about a URI depends on the first bit, in this case
      <code>http:</code>. It indicates that an <strong>IANA URI
      scheme registry</strong> is used to tell you what
      specification applies.
    </p>
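    <p>
      In code, reading "the first bit" of a URI is a single split. A
      Python sketch using <code>urllib.parse.urlsplit</code>, which
      lower-cases the scheme; the <code>URI_SCHEMES</code> dictionary
      is a hypothetical stand-in for the IANA registry, and the URI
      is an example.
    </p>
    <pre>
from urllib.parse import urlsplit

# Hypothetical excerpt of the IANA URI scheme registry: scheme -> governing spec.
URI_SCHEMES = {
    "http": "HTTP 1.1",
    "mailto": "mailto URL scheme",
    "ftp": "File Transfer Protocol",
}


def spec_for(uri: str) -> str:
    """The part before the first colon selects the specification to read next."""
    scheme = urlsplit(uri).scheme   # urlsplit lower-cases the scheme
    try:
        return URI_SCHEMES[scheme]
    except KeyError:
        raise ValueError(f"unregistered scheme {scheme!r}") from None


print(spec_for("http://www.example.com/page.html"))   # HTTP 1.1
</pre>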
    <p>
      The IANA registry indicates that the <code>http:</code>
      scheme calls out the <strong>HTTP 1.1
      specification</strong>, RFC@@@.
    </p>
    <p>
      HTTP 1.1 says that (unless otherwise specified) the client
      contacts the server on TCP port 80. The IANA registry of port
      numbers, just as it allocates port 25 to mail transfer,
      allocates 80 to HTTP. The HTTP spec is therefore mutually
      assumed by both parties. This spec describes what a request
      means and, when the request is successful, what the
      response message sent back to the browser means.
    </p>
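    <p>
      Putting those steps together: a browser takes the port from the
      scheme when the URI does not give one, then writes a request in
      the form HTTP 1.1 defines. The Python sketch below only builds
      the bytes; sending them would take something like
      <code>socket.create_connection((host, port))</code>. The
      function name and the minimal header set are assumptions for
      illustration.
    </p>
    <pre>
from urllib.parse import urlsplit

# Default port per scheme, as HTTP 1.1 and the IANA port registry assign it.
DEFAULT_PORTS = {"http": 80}


def build_get_request(uri: str):
    """Return (host, port, request_bytes) for a minimal HTTP/1.1 GET of uri."""
    parts = urlsplit(uri)
    host = parts.hostname
    # "Unless otherwise specified": an explicit :port in the URI wins.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    host_header = host if parts.port is None else f"{host}:{parts.port}"
    request = (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return host, port, request


host, port, request = build_get_request("http://www.example.com/page.html")
print(host, port)   # www.example.com 80
print(request.decode("ascii"))
</pre>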
    <p>
      According to HTTP 1.1, in that response, there is a field
      (<strong>Content-type</strong>) which indicates how the body
      of the response should be interpreted. For each valid value
      of that field, there is an <strong>IANA content-type
      registry</strong> value which explains which specification
      applies to the body of the message. This is just the same
      system as for email.
    </p>
    <p>
      When the value of the field is <code>text/html</code>, it
      indicates that the message is a hypertext document ("web
      page") which is to be presented to the human being and
      interpreted then by the human being in the usual human way.
      If the field indicates it is an OFX file, then that means
      that the OFX specification determines what it means, and you
      need a program or something which understands what the fields
      of the OFX documents mean. In neither case can you argue that
      you didn't know. So long as the writers of the specification
      do a good job (and goodness knows they work hard enough at
      it) then there can be no argument as to what the actual
      fields in your bank statement mean.
    </p>
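    <p>
      A sketch of that dispatch in Python, using the standard
      <code>email</code> header parser (the same parser Python's
      <code>http.client</code> uses for response headers). The
      handlers are hypothetical, and <code>application/x-ofx</code>
      is assumed here as the label for OFX files; its
      <code>x-</code> prefix marks it as unregistered, so check what
      a given bank actually sends.
    </p>
    <pre>
from email.parser import Parser

# Response headers as a finance program might receive them (illustrative).
RESPONSE_HEADERS = "Content-Type: application/x-ofx; charset=us-ascii\n\n"


def present_to_person(body: bytes) -> str:   # hypothetical handler
    return "render as a hypertext page for a human reader"


def read_ofx_fields(body: bytes) -> str:     # hypothetical handler
    return "pass to software that knows what the OFX fields mean"


# Media type -> handler, standing in for "which specification applies".
HANDLERS = {
    "text/html": present_to_person,
    "application/x-ofx": read_ofx_fields,
}

headers = Parser().parsestr(RESPONSE_HEADERS, headersonly=True)
media_type = headers.get_content_type()   # lower-cased, parameters dropped
print(media_type)                         # application/x-ofx
print(HANDLERS[media_type](b""))
</pre>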
    <h3>
      <a name="publish1" id="publish1">You publish an XML
      document</a>
    </h3>
    <p>
      When you publish a document in XML, then there is another
      layer involved. Many different languages -- or even mixtures
      of languages -- can be sent structured as XML. The MIME type
      of the document can just be "application/xml", which doesn't
      tell the reader how to interpret it. For that, you have to
      look at the outermost element of the XML document. The
      namespace declaration gives a URI indicating the namespace.
    </p>
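    <p>
      With Python's <code>xml.etree.ElementTree</code>, the namespace
      of the outermost element shows up in the root tag, written as
      <code>{namespace-URI}localname</code>. The sketch below builds a
      tiny RDF document so that it runs on its own; any received XML
      would do. The <code>APPLICATIONS</code> table mapping
      namespaces to programs is hypothetical.
    </p>
    <pre>
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical mapping from root-element namespace to the application to launch.
APPLICATIONS = {RDF_NS: "RDF processor", XHTML_NS: "web browser"}


def root_namespace(document: bytes) -> str:
    """Return the namespace URI of the outermost element, or '' if it has none."""
    tag = ET.fromstring(document).tag   # e.g. '{http://...#}RDF'
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


# Serialize a one-element RDF document, then read it back as a receiver would.
sample = ET.tostring(ET.Element(f"{{{RDF_NS}}}RDF"))
print(APPLICATIONS[root_namespace(sample)])   # RDF processor
</pre>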
    <p>
      Note the difference between the use of a URI and a central
      registry. Because the namespace is identified by a URI, the
      web becomes the registry. Anyone can make a new XML
      namespace. Also, one can use a URI, such as a HTTP URI, which
      can be dereferenced. This allows the information which would
      have been in the registry to be put into a web document. (The
      W3C TAG is currently debating the issue of the best format to
      use for this meta information, but HTML, RDDL and RDF have
      been used in various combinations.) But broadly there are two
      types of information. There may be a specification (or a
      reference to one) to tell a human reader what the language is
      and how to interpret it. There may also be data - a schema
      which describes the grammar of the language, or even the
      start of a logical definition of what the language means.
    </p>
    <p>
      But whatever information may or may not be available
      automatically, in an XML world, a system has to look into the
      document, at the namespace of the outermost element, to know
      how to interpret it. This generally means what application to
      launch - not to mention what icon to use to represent the
      document to a person.
    </p>
    <p>
      An example of a machine-readable document with important
      semantics is an online P3P web site privacy policy. This is
      an XML document which gives, for each category of personal
      information, the sort of thing the web site promises to do or
      not do with it. It can be scanned by a browser more easily
      than a person can read a privacy policy. It is a useful
      feature, as it saves everyone's time and increases public
      confidence in responsible web sites. It clearly depends on
      the meaning of the terms being well defined by the
      specification.
    </p>
    <p>
      <em>(Problem: this doesn't always happen: MathML and XHTML as
      XML in practice.@@ links)</em>
    </p>
    <h3>
      <a name="publish" id="publish">You publish an RDF
      document</a>
    </h3>
    <p>
      Now let's talk semantics. Harder semantics - for logical
      systems. Some XML documents are RDF documents. RDF/XML is an
      XML-based language for data. It is very simple: each document
      is just a set of "triples". A triple gives the value of some
      property of some object - or some relationship between some
      object and some other object. The triples are independent, so
      interpreting the document is, as the RDF spec explains, just a
      question of interpreting each triple.
    </p>
    <p>
      How do you figure out what a triple means? Well, the property
      (or relationship) is identified by a URI. And whoever made up
      the URI gets to say what the property means, that is, what
      any triple using that property means.
    </p>
    <p>
      So if I make a property http://www.w3.org/2002/05/example#color
      and define that the color property is a name out of the
      Pantone(tm) list of colors, and you send someone an order in
      RDF for a hat which has
      <em>http://www.w3.org/2002/05/example#color</em> of
      <em>blue256</em>, then you are specifying blue256 on the
      Pantone scale. No one can argue that you meant some other
      scale of blue. Normally the argument is made much easier by
      my actually writing a document
      http://www.w3.org/2002/05/example in which I explain what
      #color means. No one can argue, in their catalog, that "By
      suit, we mean something which is black, whatever
      <em>http://www.w3.org/2002/05/example#color</em> someone
      might say it is". The meaning of the triple is determined by
      the property, not by the subject or the object of the triple.
    </p>
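    <p>
      The rule that only the property decides the meaning can be
      shown with a toy interpreter. In this Python sketch,
      <code>PROPERTY_DEFINITIONS</code> stands in for what the owner
      of each property URI has published, the hat is a made-up
      blank-node label <code>_:hat1</code>, and each triple is read
      independently of the others.
    </p>
    <pre>
# The property URI from the text; its definition stands in for the document
# the property's owner would publish at http://www.w3.org/2002/05/example.
EX = "http://www.w3.org/2002/05/example#"
PROPERTY_DEFINITIONS = {
    EX + "color": "a color name from the Pantone(tm) list",
}

# An order as a set of independent (subject, property, object) triples.
order = {
    ("_:hat1", EX + "color", "blue256"),
}


def interpret(triples):
    """Read each triple on its own; only the property decides what it means."""
    for subject, prop, obj in sorted(triples):
        meaning = PROPERTY_DEFINITIONS.get(prop)
        if meaning is None:
            yield f"{subject}: unknown property {prop}"
        else:
            yield f"{subject} has {obj}, read as {meaning}"


for statement in interpret(order):
    print(statement)
</pre>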
    <table border="2">
      <caption>
        A <a name="section" id="section">section</a> through the
        stack
      </caption>
      <tbody>
        <tr>
          <th>
            Specification
          </th>
          <th>
            Field
          </th>
          <th>
            Where to look up values
          </th>
          <th>
            Example value
          </th>
          <th>
            Example value calls out
          </th>
        </tr>
        <tr>
          <td>
            Ethernet (cf. IEEE 802.3)
            <p>
              and either DIX(RFC894) or 802.2,3 <a href=
              "http://www.ietf.org/rfc/rfc1042.txt">RFC1042</a>
            </p>
          </td>
          <td>
            Ethernet type (or protocol identification field for
            LLC) 16-bit Ethertype
          </td>
          <td>
            IEEE registry
            <p>
              Assignment by RAC process @@link
            </p>
          </td>
          <td>
            0x800
          </td>
          <td>
            <a href="http://www.faqs.org/rfcs/rfc791.html">Internet
            Protocol (RFC791)</a>
          </td>
        </tr>
        <tr>
          <td>
            <a href="http://www.faqs.org/rfcs/rfc791.html">Internet
            Protocol (RFC791)</a>
          </td>
          <td>
            Protocol
          </td>
          <td>
            IANA protocol-numbers
          </td>
          <td>
            <a href=
            "http://www.iana.org/assignments/protocol-numbers">6</a>
          </td>
          <td>
            Transmission Control protocol (RFC793)
          </td>
        </tr>
        <tr>
          <td>
            <a href=
            "http://www.ietf.org/rfc/rfc0793.txt">Transmission
            Control protocol (RFC793)</a>
          </td>
          <td>
            port
          </td>
          <td>
            IANA registry
            <p>
              port-numbers
            </p>
          </td>
          <td>
            <a href=
            "http://www.iana.org/assignments/port-numbers">80</a>
          </td>
          <td>
            HTTP 1.1
          </td>
        </tr>
        <tr>
          <td>
            <a href="/Protocols/rfc2616/rfc2616.html">HTTP 1.1</a>
          </td>
          <td>
            content-type
          </td>
          <td>
            IANA registry
            <p>
              mime types
            </p>
          </td>
          <td>
            application/xml
          </td>
          <td>
            XML 1.0+NS
          </td>
        </tr>
        <tr>
          <td>
            <a href="/TR/REC-xml">XML</a> 1.0+<a href=
            "/TR/REC-xml-names">NS</a>
          </td>
          <td>
            xmlns
          </td>
          <td>
            The Web
          </td>
          <td>
            ...@@..rdf
          </td>
          <td>
            RDF M&amp;S 1.0
          </td>
        </tr>
        <tr>
          <td>
            <a href="/TR/REC-rdf-syntax">RDF MS 1.0</a>
          </td>
          <td>
            property
          </td>
          <td>
            The Web
          </td>
          <td>
            rdf:type
          </td>
          <td>
            RDF MS 1.0 section 4.1
          </td>
        </tr>
        <tr>
          <td>
            <a href="/TR/REC-rdf-syntax/#type">RDF MS 1.0
            definition of rdf:type</a>
          </td>
          <td>
            object
          </td>
          <td>
            The Web
          </td>
          <td>
            cyc:Person
          </td>
          <td>
            cyc ontology
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      Looking at the table which summarizes the steps we have been
      through, you will see the specs are connected by some field
      which points to the next spec through some list or registry.
      For the more recent layers, the registry has been replaced by
      the Web.
    </p>
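    <p>
      The table can also be read as data: each row names the field
      to look at and the specification its example value calls out,
      so interpretation is a walk down the chain. A Python sketch
      whose rows copy the table's example values (the XML row's
      namespace value is left unspecified, as in the table); the
      list structure and the function name are illustrative only.
    </p>
    <pre>
# (specification, field it defines, example value, specification that value calls out)
STACK = [
    ("Ethernet (IEEE 802.3)", "Ethernet type", "0x800", "Internet Protocol (RFC 791)"),
    ("Internet Protocol (RFC 791)", "Protocol", "6", "Transmission Control Protocol (RFC 793)"),
    ("Transmission Control Protocol (RFC 793)", "port", "80", "HTTP 1.1"),
    ("HTTP 1.1", "content-type", "application/xml", "XML 1.0 + Namespaces"),
    ("XML 1.0 + Namespaces", "xmlns", "(an RDF namespace URI)", "RDF Model and Syntax 1.0"),
    ("RDF Model and Syntax 1.0", "property", "rdf:type", "RDF definition of rdf:type"),
    ("RDF definition of rdf:type", "object", "cyc:Person", "cyc ontology"),
]


def walk(start: str):
    """Follow the chain: each field value names the next specification to read."""
    calls_out = {spec: (field, value, target) for spec, field, value, target in STACK}
    spec = start
    while spec in calls_out:
        field, value, target = calls_out[spec]
        yield f"{spec}: {field} = {value} -> {target}"
        spec = target


for step in walk("Ethernet (IEEE 802.3)"):
    print(step)
</pre>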
    <h2 id="hooks">
      The hooks - identifiers
    </h2>
    <p>
      That's an interesting trend. If you like, we can see the
      technology move through three stages of civilization, in
      terms of the identifiers which are used for concepts.
    </p>
    <ol>
      <li>Using numbers or strings
      </li>
      <li>Using URIs - identify the same thing in all contexts
      </li>
      <li>Using dereferenceable URIs
      </li>
    </ol>
    <p>
      The early protocols used numbers and strings, which required a
      central registry. That worked, because the only common
      concepts were those in the standard protocols, and those had
      to be common across the net for interoperability. In these
      areas still there is a strong argument for central control.
    </p>
    <p>
      As we move on to later protocols, the protocols themselves
      become more diverse. This is partly because they are at a
      higher application level. The centralized model starts to
      break down, as witnessed by some of the social difficulties of
      getting an IANA allocation of a MIME type for an embryonic W3C
      specification. So new protocols allow new applications to be
      defined using URIs, allowing anyone who has access to a bit
      of domain space to allocate them.
    </p>
    <p>
      The third stage of civilization is the one at which the
      identifiers can be looked up on the web. This is quite useful
      for engineers who encounter new languages. It doesn't really
      justify its existence, though, until one has technology --
      Semantic Web technology -- in which an automated agent can
      pick up metadata about the languages on the fly, and use that
      metadata to enhance its processing of data in that language.
    </p>
    <p>
      (What if I don't have a web site? This is becoming less and
      less of a problem. There are all kinds of existing ways of
      allocating an identifier. But the persistence of such
      information is, and always will be, like the cleanliness of
      water and air, an important social issue.)
    </p>
    <h2>
      <a name="When" id="When">When the chain does NOT connect</a>
    </h2>
    <p>
      We have seen how any user of the Internet is bound to a
      series of specifications which define the meanings of terms,
      and hence allow his or her equipment and agents to
      interoperate with others. This stack prevents one from
      sending a nasty email to someone and then protesting that the
      message didn't mean anything. So if the stack is so strict,
      how <em>does</em> one send a nasty email message when one
      <em>doesn't</em> mean it? There are plenty of times when you want
      to include an attachment to which you refer, but for
      which you claim neither authorship nor responsibility.
      Understanding the exceptions is as important as understanding
      the general rule. Many protocols have ways of breaking the
      chain, of including information which is not part of the
      meaning of the message.
    </p>
    <p>
      In email it is an <strong>attachment</strong>. An email
      always has a cover note, the basic message, which conveys
      the actual message. Any attachment is normally used only as
      the main message directs. It might be "Hey,
      Joe, what do you think of this paper?", or "Look at this
      stupid program - but whatever you do don't run it!"
    </p>
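    <p>
      Python's standard email package keeps the cover note and the
      attachment as separate MIME parts. In this sketch the addresses,
      filename and attachment bytes are placeholders. A receiving
      program can pull out the cover note with get_body() and treat
      whatever iter_attachments() returns as material the sender refers
      to rather than asserts; that last step is a convention of the
      receiving program, not something MIME itself enforces.
    </p>

```python
from email.message import EmailMessage

# The cover note is the part the sender stands behind.
msg = EmailMessage()
msg["From"] = "alice@example.org"
msg["To"] = "joe@example.org"
msg["Subject"] = "Have a look"
msg.set_content("Look at this stupid program - but whatever you do don't run it!")

# The attachment is carried along (placeholder bytes) but not asserted.
msg.add_attachment(
    b"print('hello')\n",
    maintype="application",
    subtype="octet-stream",
    filename="stupid_program.py",
)

# Receiving side: read the cover note, treat attachments as inert data.
cover_note = msg.get_body(preferencelist=("plain",))
attachments = list(msg.iter_attachments())
print(cover_note.get_content().strip())
for part in attachments:
    print(part.get_content_disposition(), part.get_filename())
```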
    <p>
      Currently (2002) XML doesn't have a common standard for what
      has been called in that context "<strong>packages</strong>".
      This is a pity. It is on the agenda of the XML Protocol Working
      Group, as it is seen as essential for SOAP operations. One must be
      able to include documents stapled to a SOAP request or
      response which are not simply to be acted on.
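    <p>
      The packaging requirement can be illustrated without committing to
      any particular standard. In the sketch below the element names
      message, body, package and document are invented for illustration
      and are not taken from SOAP or the XML Protocol drafts. The
      processor acts only on orders found directly in the body; an order
      inside a stapled document is visible to it but is only listed,
      never executed.
    </p>

```python
import xml.etree.ElementTree as ET


def build_message():
    """Build a hypothetical message: one live order plus one stapled document."""
    message = ET.Element("message")
    body = ET.SubElement(message, "body")
    ET.SubElement(body, "order", item="widget", quantity="3")
    package = ET.SubElement(message, "package")
    doc = ET.SubElement(package, "document", name="old-quote.xml")
    # This order is quoted material, e.g. evidence for a dispute.
    ET.SubElement(doc, "order", item="widget", quantity="300")
    return message


def orders_to_execute(message):
    # Only orders that are direct children of the body element are acted on.
    return [(o.get("item"), int(o.get("quantity")))
            for o in message.findall("./body/order")]


def stapled_documents(message):
    # Packaged documents are reported by name but not interpreted.
    return [d.get("name") for d in message.findall("./package/document")]


message = build_message()
print(orders_to_execute(message))
print(stapled_documents(message))
```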
    </p>
    <p>
      At the Semantic Web level, those who have played with the
      <a href="Notation3.html">Notation3</a> language will
      recognize the curly brackets as the packaging, or
      <strong>quoting</strong>. Whereas a document
    </p>
    <pre>
my:car  srgb:color "000044".
</pre>
    <p>
      asserts that the car in question is blue, the document
    </p>
    <pre>
my:form67  :says {my:car  srgb:color "000044"}.
</pre>
    <p>
      does not. It merely says something about the statement that
      the car is blue.
    </p>
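    <p>
      The distinction that the curly brackets draw can be mirrored in a
      toy triple store. Here a graph is a Python set of (subject,
      predicate, object) triples and a quoted formula is a frozenset of
      triples used as an object. This representation and the helper
      names asserted and quoted are assumptions for illustration, not
      how any particular N3 processor stores formulas.
    </p>

```python
# Hypothetical in-memory model of the two N3 documents above.
CAR_IS_BLUE = ("my:car", "srgb:color", "000044")

# my:car srgb:color "000044".
asserting_doc = {CAR_IS_BLUE}

# my:form67 :says {my:car srgb:color "000044"}.
quoting_doc = {("my:form67", ":says", frozenset({CAR_IS_BLUE}))}


def asserted(graph, triple):
    """True only if the triple is a top-level statement of the graph."""
    return triple in graph


def quoted(graph, triple):
    """True if the triple appears inside some quoted formula in the graph."""
    return any(isinstance(obj, frozenset) and triple in obj
               for _, _, obj in graph)


print(asserted(asserting_doc, CAR_IS_BLUE))
print(asserted(quoting_doc, CAR_IS_BLUE))
print(quoted(quoting_doc, CAR_IS_BLUE))
```

    <p>
      Asking the second document whether the car is blue gets "no"; the
      statement is present only as something form 67 says.
    </p>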
    <p>
      So being able to refer to something without asserting it,
      whether you call it attachment, packaging, or quoting, is an
      important feature of a language. The fact that you can do
      this removes the last excuse for anyone claiming not to have
      meant whatever they did say in the main message!
    </p>
    <h2 id="Conclusion">
      Conclusion
    </h2>
    <p>
      Internet messages and Web documents are represented in
      computer languages with well-defined specifications. Use of
      the Internet and the Web implies an acceptance of the
      specifications as authoritative.
    </p>
    <p>
      The specifications are linked together by identifiers which
      in earlier specs were numbers or strings, but in later specs are URIs,
      ideally URIs which can be looked up on the Web. The ability
      to make these linked specifications requires the
      specifications to be designed very independently. This is
      simply the software engineering practice of information
      hiding between layers.
    </p>
    <p>
      The trend for the higher layers is toward more and more
      machine-processable metadata about such languages, which can
      be retrieved automatically and will aid in processing. Some
      of these will relate the semantics of terms in one vocabulary
      to terms in another, in a web-like way.
    </p>
    <p>
      The fact that, as we move into the applications, we see more
      and more diverse uses of the Web and the Net does not
      diminish our reliance on sound standards in the supporting
      infrastructure.
    </p>
    <hr />
    <h2>
      Related
    </h2>
    <ul>
      <li>The Meaning of a document
      </li>
      <li>The meaning of an XML document
      </li>
    </ul>
    <h2>
      References
    </h2>
    <p>
      The table above contains hypertext links to some
      specifications used as examples.
    </p>
    <p>
      See also:
    </p>
    <ul>
      <li>The RDF concepts document. @@
      </li>
    </ul>
    <hr />
    <p>
      <a href="Overview.html">Up to Design Issues</a>
    </p>
    <p>
      <a href="../People/Berners-Lee">Tim BL</a>
    </p>
  </body>
</html>