<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator" content=
    "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
    <title>
      Web architecture: Metadata
    </title>
    <link href="di.css" rel="stylesheet" type="text/css" />
    <meta http-equiv="Content-Type" content="text/html" />
  </head>
  <body bgcolor="#DDFFDD" text="#000000">
    <address>
      Tim Berners-Lee
      <p>
        Date started: January 6, 1997
      </p>
      <p>
        Status: personal view, but corresponds generally to
        the W3C architecture for metadata.
      </p>
      <p>
        Additions are at the end about consistency in
        label/metaset/collection syntax and semantics.
      </p>
      <p>
        The syntaxes used in this document are meant to illustrate
        the architecture and be clear but are otherwise random.
        This note was written before the more general <a href=
        "Semantic.html">Semantic Web</a> note.
      </p>
    </address>
    <p>
      <a href="Overview.html">Up to Design Issues</a>
    </p>
    <h3>
      Axioms of Web Architecture: Metadata
    </h3>
    <hr />
    <h1>
      Metadata Architecture
    </h1>
    <h4 id="Preface">
      Preface
    </h4>
    <p>
      <em>This document was written before the Semantic Web
      Roadmap, but is an introduction to the same ideas. Both
      introduce the world of machine-readable data on the web. This
      document introduces the concepts in the historical sequence
      at W3C, where the first driving applications of semantic web
      were metadata, and the first driving metadata applications
      were endorsement labels (<a href="#PICS">PICS</a>)</em>.
    </p>
    <h2>
      Documents, Metadata, and Links<br />
    </h2>
    <p>
      The thing which you get when you follow a link, when you
      de-reference a URI, has a lot of names. Formally we call it a
      <b>resource</b>. Sometimes it is referred to as a document
      because many of the things currently on the Web are human
      readable documents. Sometimes it is referred to as an object
      when the object is something which is more machine readable
      in nature or has hidden state. I will use the words document
      and resource interchangeably in what follows and sometimes
      may slip into using "object".
    </p>
    <p>
      One of the characteristics of the World Wide Web is that
      resources, when you retrieve them, do not stand simply by
      themselves without explanation, but there is information
      about the resource. Information about information is
      generally known as <b>Metadata</b>. Specifically, in the web
      design,
    </p>
    <h4>
      Definition
    </h4>
    <table border="1" cellpadding="2">
      <tbody>
        <tr>
          <td>
            Metadata is machine understandable information about
            web resources or other things
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      The phrase "machine understandable" is key. &nbsp;We are
      talking here about information which software agents can use
      in order to make life easier for us, ensure we obey our
      principles and the law, check that we can trust what we are
      doing, and make everything work more smoothly and rapidly.
      Metadata has well defined semantics and structure.
    </p>
    <p>
      Metadata was called "Metadata" because it started life, and
      is currently still chiefly, information about web resources,
      so data about data. &nbsp;In the future, when the metadata
      languages and engines are more developed, it should also form
      a strong basis for a web of machine understandable
      information about anything: about the people, things,
      concepts and ideas. &nbsp;We keep this fact in our minds in
      the design, even though the first step is to make a system
      for information about information.
    </p>
    <p>
      For an example of metadata, when an object is retrieved using
      the HTTP protocol, the protocol allows information about its
      date, its expiry date, its owner, and other arbitrary
      information to be sent by the server. The world of the World
      Wide Web is therefore a world of information and some of that
      information is information about information. In order to
      have a coherent picture of this, we need a few axioms about
      metadata. The first axiom is that:
    </p>
    <h4>
      Axiom
    </h4>
    <table border="1" cellpadding="2">
      <tbody>
        <tr>
          <td>
            metadata is data.
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      That is to say, information about information is to be
      counted in all respects as information. There are various
      parts of this.
    </p>
    <p>
      One is that metadata, being regarded as data, can be stored
      in a resource. So, one resource may contain information about
      itself or about another resource. In current practice on the
      World Wide Web there are three ways in which one gets
      metadata. The first is the data about a document contained
      within the document itself, for example in the HEAD part of
      an HTML document or within word processor documents. The
      second is that during the HTTP transfer the server transfers
      some metadata to the client about the object which is being
      transferred. This, during an HTTP GET, is transferred from
      the server to the client and, during a PUT or a POST, is
      transferred from the client to the server. One of the things
      which we have to rationalize in our architecture of the World
      Wide Web is who exactly is making the statement: whose
      statement, whose property is that metadata? The third way in
      which metadata is found is when it is looked up in another
      document. This practice was not very common until the PICS
      initiative defined label formats specifically for
      representing information about World Wide Web resources. The
      PICS architecture specifically allows for PICS labels, which
      are resources about other resources, to be buried within the
      resource itself, to be retrieved as separate resources, or to
      be passed over during the HTTP transaction. To conclude,
    </p>
    <table border="1" cellpadding="2">
      <tbody>
        <tr>
          <td>
            Metadata about one document can occur within the
            document, or within a separate document, or it may be
            transferred accompanying the document.<br />
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      Put another way, metadata can be a first class object.
    </p>
    <p>
      The second part of the above axiom is:
    </p>
    <table border="1" cellpadding="2">
      <tbody>
        <tr>
          <td>
            Metadata can describe metadata
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      That is, metadata itself may have attributes such as
      ownership and an expiry date, and so there is meta-metadata.
      We don't distinguish many levels, however: we just say that
      metadata is data, and from that it follows that it can
      have other data about itself. This gives the Web a certain
      consistency.
    </p>
    <h2>
      The Form of Metadata<br />
    </h2>
    <p>
      Metadata consists of assertions about data, and such
      assertions typically, when represented in computer systems,
      take the form of a name or type of assertion and a set of
      parameters, just as in the natural language a sentence takes
      the form of a verb and a subject, an object and various
      clauses.
    </p>
    <h4>
      <a name="independent" id="independent">Axiom</a>
    </h4>
    <table border="1" cellpadding="2">
      <tbody>
        <tr>
          <td>
            The architecture is of metadata represented as a set of
            independent assertions.
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      This model implies that in general, two assertions about the
      same resource can stand alone and independently. When they
      are grouped together in one place, the combined assertion is
      simply the sum (actually the logical AND) of the independent
      ones. Therefore (because AND is commutative) collections of
      assertions are essentially unordered sets. This design
      decision rules out, for example, in simple sets of data,
      assertions which are somehow cumulative, or in which later
      ones override earlier ones. Each assertion stands
      independently of others.
    </p>
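    <p>
      A minimal sketch of this model in Python, assuming each
      assertion is written as a tuple of assertion type, subject URI
      and parameters. The URIs, the attribute names and the helper
      <code>combine</code> are hypothetical and serve only to show
      that grouping is an unordered AND.
    </p>

```python
# Each assertion is an immutable tuple: (assertion type, subject URI, parameters...).
# The URIs and assertion names below are hypothetical.
a1 = ("author", "http://example.org/doc", "Alice")
a2 = ("expires", "http://example.org/doc", "1997-12-31")
a3 = ("author", "http://example.org/other", "Bob")

# A collection of assertions is an unordered set; grouping is a logical AND.
first = frozenset([a1, a2])
second = frozenset([a3])


def combine(*collections):
    """Return the conjunction (set union) of several assertion sets."""
    result = frozenset()
    for collection in collections:
        result = result | collection
    return result


# Neither the order of combination nor the order of listing matters.
both_ways_equal = combine(first, second) == combine(second, first)
listing_order_irrelevant = frozenset([a2, a1]) == first
```

    <p>
      Because the collections are sets, repeating an assertion adds
      nothing, and no assertion can override another.
    </p>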
    <p>
      We will see below how logical expressions are formed to
      combine assertions in more varied ways, and syntactic rules
      which allow the subject at least of the assertion to be made
      implicit. But neither of these changes the basic operation of
      combining assertions in unordered AND lists.
    </p>
    <h3>
      <a name="Attributes" id="Attributes">Attributes</a>
    </h3>
    <p>
      Assertions about resources are often referred to as
      attributes of the resource. That is, the type of assertion is
      an assertion that the object, the resource in question, has a
      particular named property such as its author, and in that
      case the parameter is the name or identity of the author.
      Similarly, if the attribute is the document's date of expiry
      then the parameter is that date.
    </p>
    <p>
      Often, a group of assertions about the same resource occur
      together, in which case the syntax generally omits the URI of
      that resource as it is implicit. In these cases, when it is
      clear from the context about which resource the assertion is
      being made, the assertion often takes the form of a list of
      attributes and values. In RFC822 format messages, such as
      mail messages and HTTP messages, metadata is transferred
      where the attribute name is an RFC822 header name and the
      rest of the RFC822 line is the value of the attribute, such
      as Date: and From: and To: information. The attribute value
      pair model is that used by most activities defining the
      semantics of metadata today.<br />
    </p>
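    <p>
      As an illustration, RFC822-style headers can be read as
      attribute-value pairs with the Python standard library. The
      header block and the helper <code>header_pairs</code> are
      made up for this sketch; the subject of each pair (the message
      body) stays implicit.
    </p>

```python
from email.parser import Parser

# A hypothetical RFC822-style header block; names and values are made up.
raw_headers = (
    "Date: Mon, 06 Jan 1997 12:00:00 GMT\n"
    "From: author@example.org\n"
    "To: reader@example.org\n"
    "\n"
)


def header_pairs(text):
    """Parse RFC822-style headers into a list of (attribute, value) pairs."""
    message = Parser().parsestr(text, headersonly=True)
    return list(message.items())


pairs = header_pairs(raw_headers)
```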
    <p>
      I use the word "assertion" to emphasize the fact that the
      attribute value pair when it is transferred is a statement
      made by some party. It does not simply and directly imply
      that the resource at any given time has that value for the
      given attribute. It must be seen as a statement by a
      particular party with or without implicit or explicit
      guarantees as to validity. Throughout the World Wide Web, as
      trust becomes an important issue, it will be important for
      software -- and people -- to keep track of and take into
      account who said what in terms of data and metadata. So, our
      model of data of a resource is something about which
      typically we know the creator or the person responsible, and
      typically the date on which the information was created,
      which implies, in the case of a piece of information which
      makes an assertion, the date at which the assertion was made.
    </p>
    <p>
      An assertion
    </p>
    <blockquote>
      (A u1 p q ...)
    </blockquote>
    <p>
      typically has as explicit parameters,
    </p>
    <ul>
      <li>the URI of the resource about which the assertion is made
      (u1).
      </li>
      <li>some identifier (A) for the type of assertion being made,
      such as author or date or expiry date.
      </li>
      <li>other parameters (p, q,...) according to the type of
      assertion.
      </li>
    </ul>
    <p>
      As implicit or explicit parameters,
    </p>
    <ul>
      <li>The party making the assertion
      </li>
      <li>The date/time of the assertion
      </li>
      <li>etc...
      </li>
    </ul>
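    <p>
      One possible way to hold such an assertion in a program is
      sketched below. The class <code>Assertion</code>, its field
      names and the helper <code>with_context</code> are
      assumptions of this sketch, not a defined format: the explicit
      parameters are always present, while the party and date may be
      left implicit and filled in from context.
    </p>

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Assertion:
    # Explicit parameters: type A, subject u1, then parameters p, q, ...
    kind: str                      # A: identifier for the type of assertion
    subject: str                   # u1: URI of the resource described
    params: Tuple[str, ...] = ()   # p, q, ...: depend on the assertion type
    # Parameters that may be implicit (None) or explicit.
    asserted_by: Optional[str] = None
    asserted_at: Optional[str] = None


def with_context(assertion, party, when):
    """Fill in implicit parameters from the context an assertion arrived in."""
    return Assertion(
        assertion.kind,
        assertion.subject,
        assertion.params,
        assertion.asserted_by or party,
        assertion.asserted_at or when,
    )


# Hypothetical values throughout.
bare = Assertion("author", "http://example.org/doc", ("Alice",))
full = with_context(bare, "http://example.org/publisher", "1997-01-06")
```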
    <p>
      We can often make an analogy with programming languages. An
      assertion in metadata can be compared with a function call in
      a programming language. In object oriented languages, the
      object of the function has a special place among the
      parameters just as the subject of an assertion does in
      metadata. In object oriented languages, though, the set of
      possible functions depends on the object, whereas in metadata
      the set of assertion types is more or less unlimited, defined
      by independent choice of vocabulary. <em>Anyone can say
      anything about anything</em>.
    </p>
    <h3>
      A space for attribute names
    </h3>
    <p>
      It is appropriate for the Web architecture to define in this
      way the topology and the general concepts of links and
      metadata. What about the significance of individual
      relationships? Sometimes, as above, these are special,
      defined in the architecture, and having an architectural
      significance or a significance to the protocols. In other
      cases, the significance of relationships or indeed of
      attributes is part of other specifications, other design, or
      other applications, and must be defined easily by third
      parties. Therefore, the set of such relationship and
      attribute names must be extremely easily extensible, and
      therefore extensible in a decentralized manner. This is why
    </p>
    <table border="1" cellpadding="2">
      <tbody>
        <tr>
          <td>
            the URL space is an appropriate space for the
            definition of attribute names.
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      We have already (1997) several vocabularies of attribute
      names: for example, the HTML elements which can occur within
      the HEAD element, or as another example, the headers in an
      HTTP request which specify attributes of the object. These
      are defined within the scope of particular specifications.
      There is always pressure to extend these specifications in a
      flexible way. HTTP header names are generally extended
      arbitrarily by those doing experiments. The same can also be
      true of HTML elements, and extension mechanisms have been
      proposed for both. If we look generically at the very wide
      space of all such metadata attribute names, we find something
      in which the dictionary would be so large that ad hoc
      arbitrary extension would be just as chaotic as central
      registration would be stifling.
    </p>
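    <p>
      A small sketch of why URL space avoids both problems: two
      parties can each coin the same short name without colliding
      and without registering it anywhere. Both vocabulary URIs and
      the helper <code>local_name</code> are hypothetical.
    </p>

```python
# Two independent parties each define a property called "coolness".
# Both vocabulary URIs are hypothetical.
ALICE_COOLNESS = "http://alice.example.org/terms#coolness"
BOB_COOLNESS = "http://bob.example.org/vocab#coolness"


def local_name(attribute_uri):
    """Return the part after '#', which is unique only within one vocabulary."""
    return attribute_uri.rsplit("#", 1)[-1]


# Assertions keyed by (attribute URI, subject URI): the full URI keeps the
# two "coolness" properties distinct, with no central registry involved.
assertions = {
    (ALICE_COOLNESS, "http://example.org/doc"): "9",
    (BOB_COOLNESS, "http://example.org/doc"): "low",
}
```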
    <blockquote>
      <b>Aside: Comparison with Entity-Relationship models</b>.
      <p>
        This architecture, in which the assertion identifier is
        taken from (basically) URL space differs from the
        "Entity-relationship" (ER) model and many similar models
        like it, including most object-oriented programming
        systems. In an ER model, typically every object is typed
        and the type of an object defines the attributes it can have,
        and therefore the assertions which are being made about it.
        Once a person is defined as having a name, address and
        phone number, then the schema has to be altered or a new
        derived type of person must be introduced before one can
        make assertions about the race, color or credit card number
        of a person. The scope of the attribute name is the entity
        type, just as in OOP the scope of a method name is an
        object type (or interface). By contrast, in the web, the
        hypertext link allows statements of new forms to be made
        about any object, even though (before anything other than
        syntax checking) this may lead to nonsense or paradox. One
        can define a property "coolness" within one's own part of
        the web, and then make statements about the "coolness" of
        any object on the web.
      </p>
      <p>
        This design difference is in essence a resurfacing of the
        decision to make links monodirectional, sacrificing
        consistency for scalability.
      </p>
      <p>
        An advantage of ER systems is that they allow one to work,
        in the user interface for example, with a set of properties
        which "should" be defined for each entity. You can define
        these in the Metadata's predicate calculus by defining an
        expression for a "well specified" object. ("For all
        <i>X</i> such that <i>X</i> is a customer <i>X</i> is
        well-specified if there exists <i>n</i> such that <i>n</i>
        is the name of <i>X</i> and there exists <i>t</i> such that
        <i>t</i> is the telephone number of <i>X</i> and...")
      </p>
      <p>
        end of aside.
      </p>
    </blockquote>
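    <p>
      The "well specified" expression in the aside can be sketched
      as a predicate over a set of assertions. The triple layout,
      the property names and the <code>urn:example</code>
      identifiers below are assumptions chosen for illustration.
    </p>

```python
# Assertions as (property, subject, value) triples; all names are hypothetical.
facts = {
    ("type", "urn:example:c1", "customer"),
    ("name", "urn:example:c1", "Alice"),
    ("telephone", "urn:example:c1", "+1 555 0100"),
    ("type", "urn:example:c2", "customer"),
    ("name", "urn:example:c2", "Bob"),
}

REQUIRED = ("name", "telephone")


def has_property(facts, subject, prop):
    """True if there exists some value v with (prop, subject, v) asserted."""
    return any(p == prop and s == subject for p, s, _ in facts)


def well_specified(facts, subject):
    """A subject is well specified if every required property exists for it."""
    return all(has_property(facts, subject, prop) for prop in REQUIRED)


# "For all X such that X is a customer": check each customer in turn.
customers = sorted(s for p, s, v in facts if p == "type" and v == "customer")
report = {c: well_specified(facts, c) for c in customers}
```

    <p>
      Nothing stops anyone from adding further assertions about
      either customer; the predicate only states what "should" be
      present.
    </p>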
    <h3>
      <a name="MetadataHeaders" id="MetadataHeaders">Metadata
      ("Entity") headers in HTTP</a>
    </h3>
    <p>
      In the above it is important to realize that the HTTP headers
      which contain what can be considered as metadata ("entity
      headers") should be separated quite distinctly from HTTP
      headers which do not. HTTP headers which contain metadata
      contain information which can follow the document around. For
      example, it is reasonable for a cache to pass such
      information on without treatment, it is reasonable for
      clients or other programs which process data to store those
      headers as metadata with the document for later processing.
      The content of those headers does not have to be associated
      with that particular HTTP transaction. By contrast, the
      RFC822 headers in HTTP which deal specifically with the
      transaction or deal specifically with the TCP link between
      the two application programs have a shorter scope and can
      only be regarded as parameters of the HTTP method. Making
      this separation clear will not only make it easier to
      understand HTTP and how it should be processed; it will also
      make it clear which pieces of HTTP can be used easily and
      transparently by other protocols which may use different
      methods with different parameters. The clarification of the
      architecture of HTTP such that both the metadata and the
      methods can be extended into other domains is an important
      part of the work of the World Wide Web Consortium. The
      Internet protocols SMTP and NNTP and HTTP as well as many new
      and proposed protocols share much of the semantics of the
      RFC822 headers. Formalizing the shared space and making it
      clear that there is a single design for a particular header,
      rather than four designs which are independent and happen to
      look very similar, requires a general architecture, some
      careful thought, and is essential for the future design of
      protocols. It will allow protocol design to happen in small
      groups which can take for granted the bulk of previous work
      and concentrate on independent new design.
    </p>
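    <p>
      The separation can be sketched as a simple partition of a
      header set. The list of entity header names below is partial
      and illustrative, and where exactly the boundary falls is an
      assumption of this sketch; <code>split_headers</code> is a
      hypothetical helper.
    </p>

```python
# A partial, illustrative list of HTTP entity header names; the exact
# boundary drawn here is an assumption of this sketch.
ENTITY_HEADERS = {
    "content-type",
    "content-language",
    "content-encoding",
    "content-length",
    "expires",
    "last-modified",
}


def split_headers(headers):
    """Separate headers that travel with the document from per-transaction ones."""
    metadata, transaction = {}, {}
    for name, value in headers.items():
        target = metadata if name.lower() in ENTITY_HEADERS else transaction
        target[name] = value
    return metadata, transaction


# A made-up response header set.
response_headers = {
    "Content-Type": "text/html",
    "Last-Modified": "Mon, 06 Jan 1997 12:00:00 GMT",
    "Connection": "close",
    "Transfer-Encoding": "chunked",
}
metadata, transaction = split_headers(response_headers)
```

    <p>
      A cache or client could store <code>metadata</code> alongside
      the document and discard <code>transaction</code> once the
      exchange is over.
    </p>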
    <h4>
      Authorship of HTTP entity headers
    </h4>
    <p>
      It may be possible to remove or at least encompass the
      apparent anomaly of metadata transferred from an HTTP server
      by creating a special link type which links the document
      itself to the set of attributes which the server would give
      in the HTTP headers. In other words, the server would be able
      to say, "here is a document, here is some metadata about it,
      and the metadata about it has the following URL". This would
      allow one, for example, to request a signed copy of the HTTP
      headers. It would allow one to ask about the intellectual
      property rights of those headers, and the authorship of those
      headers.
    </p>
    <p>
      It is important to be completely clear about the authorship
      of the HTTP headers. The server should be seen as a software
      agent acting on behalf of a party which is the publisher or
      document author: the definer of the URI to resource identity
      mapping. The webmaster is only an administrator who is
      responsible for ensuring that (through an appropriately
      configured server) the transactions on the wire faithfully
      represent the statements and wishes of that party.
    </p>
    <h2>
      Links<br />
    </h2>
    <p>
      An assertion of relationship between two resources is known
      as a <b>link</b>.
    </p>
    <p>
      In this case, it is a triple
    </p>
    <blockquote>
      (<i>A u1 u2</i>)
    </blockquote>
    <p>
      of:
    </p>
    <ul>
      <li>the type of assertion being made, that is, the
      relationship which is being asserted,
      </li>
      <li>the first URI,
      </li>
      <li>and the second URI.
      </li>
    </ul>
    <p>
      These sorts of assertions, links, are the basis of navigation
      in the World Wide Web; they can be used for building
      structure within the World Wide Web and also for creating a
      semantic Web which can express knowledge about the world
      itself. That is to say, links may be used both for the
      structure of data, in which case they are metadata, but also
      they may be used as a form of data.
    </p>
    <p>
      Links, like all metadata, can be transferred in three ways.
      They can be embedded in a document, which is one end of the
      link, they can be transferred in an HTTP message, for example
      what is called the header of the document, and they can be
      stored in a third document. This latter method has not been
      used widely on the World Wide Web to date.
    </p>
    <h2>
      Goal: <a name="Self-descr" id="Self-descr">Self-describing
      information</a><br />
    </h2>
    <p>
      A critical part of the design of the whole system is the way
      that the semantics of metadata or indeed of data are defined.
      The semantics of metadata in our RFC822 headers in mail
      messages and in HTTP messages are defined by hand in English
      in the specifications of those protocols. The PICS system
      takes this one stage further in terms of flexibility by
      allowing a message to contain a pointer to the document which
      defines, in human readable terms, the semantics of each
      assertion made within a <a href="#PICS">PICS</a> label. In
      the future we would like to move toward a state in which any
      metadata or eventually any form of machine readable data
      carries a reference to the specification of the semantics of
      all the assertions made within it.
    </p>
    <p>
      For example, suppose that when a link is defined between two
      documents, the relationship which is being asserted is
      defined in such a way that it can be looked up on the World
      Wide Web (i.e. using some form of URI). Then someone, or some
      program, which has not come across that relationship before
      can follow the link and extend its understanding or
      functionality to take advantage of this new form of
      assertion.
    </p>
    <p>
      In the case of PICS, one can dynamically pick up a human
      readable definition of what that assertion really means. In
      PICS (and in theory in SGML using DTDs), one can also pick up
      a machine readable definition of what form that assertion can
      take, what syntax, what types of parameters it can take. This
      allows a human interface to a new PICS scheme to be built on the
      fly. To go one step further, one could, given a suitable
      logic or knowledge representation language, pick up a machine
      readable definition of the semantics of that assertion in
      terms of other relationships.
    </p>
    <p>
      The advantage of such self-describing information is that it
      allows development of new applications and new functionality
      independently by many groups across the web. Without
      self-describing information, development must wait for large
      companies or standards committees to meet and agree on
      common semantics.
    </p>
    <p>
      Of course a pragmatic way of extending software to handle new
      forms of information is to dynamically download the code to
      support a software object which can handle such data for one.
      Whereas this is a powerful technique, and one which will be
      used increasingly, it is not sufficient. It is not sufficient
      because one has to trust the implementation of the object,
      and the state.
    </p>
    <h4>
      Goal
    </h4>
    <table border="1" cellpadding="2">
      <tbody>
        <tr>
          <td>
            As much as possible of the syntax and semantics should
            be able to be acquired by reference from a metadata
            document.
          </td>
        </tr>
      </tbody>
    </table>
    <h3>
      Building Applications using Link Relationships
    </h3>
    <p>
      It turns out that a very large number of applications both
      built on top of the web and also built within the
      infrastructure of the Web can largely be built by defining
      new relationship types. Examples of these are the document
      versioning problem which can be largely solved by defining
      link values relating documents to previous and future
      versions and to lists of versions; intellectual property
      rights, distribution terms, and other labeling which can be
      solved by making a link from one document to the document
      containing the metadata.
    </p>
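    <p>
      As a sketch of the versioning case, assume a single
      relationship type meaning "the second document is the previous
      version of the first". The relationship URI, the document URIs
      and the helper <code>version_history</code> are all
      hypothetical.
    </p>

```python
# Links are (relationship, u1, u2) triples. The relationship URI and the
# document URIs are hypothetical.
PREVIOUS = "http://example.org/rel#previous-version"

links = [
    (PREVIOUS, "http://example.org/doc/v3", "http://example.org/doc/v2"),
    (PREVIOUS, "http://example.org/doc/v2", "http://example.org/doc/v1"),
]


def version_history(links, start):
    """Follow previous-version links from start back to the oldest version."""
    previous = {u1: u2 for rel, u1, u2 in links if rel == PREVIOUS}
    history, seen = [start], {start}
    while history[-1] in previous:
        older = previous[history[-1]]
        if older in seen:  # guard against cyclic links
            break
        history.append(older)
        seen.add(older)
    return history


history = version_history(links, "http://example.org/doc/v3")
```

    <p>
      The application logic is entirely generic link-following; only
      the meaning of the relationship type is specific to
      versioning.
    </p>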
    <hr />
    <h3>
      Summary so far
    </h3>
    <ol>
      <li>Metadata is data
      </li>
      <li>Metadata may refer to any resource which has a URI
      </li>
      <li>Metadata may be stored in any resource no matter to which
      resource it refers
      </li>
      <li>Metadata can be regarded as a set of assertions, each
      assertion being about a resource &nbsp;(A &nbsp;<i>u1</i>
      &nbsp;...).
      </li>
      <li>Assertions which state a named relationship between two
      resources are known as links &nbsp;(A <i>u1 u2</i>)
      </li>
      <li>Assertion types (including link relationships) should be
      first class objects in the sense that they should be able to
      be defined in addressable resources and referred to by the
      address of that resource &nbsp;A in { u }
      </li>
      <li>The development of new assertion types and link
      relationships should be done in a consistent manner so that
      these sorts of assertions can be treated generically by people
      and by software.
      </li>
    </ol>
    <hr />
    <p>
      <i>Rough from here on down</i>
    </p>
    <h3 id="Label">
      Label syntax: Assertions about a common subject
    </h3>
    <p>
      When labeling information, it is often useful to make a lot
      of statements about the same object. It is also useful to be
      able to make the same set of &nbsp;statements about a set of
      resources. For example, the assertions
    </p>
    <pre>
(A1 u1  a b ... )
 (A2 u1  c d )
 (A3 u1  a f g h )
 
</pre>
    <p>
      might be written
    </p>
    <pre>
(for u1
         (A1 a b ... )
         (A2 c d )
         (A3 a f g h )
 )
 
</pre>
    <p>
      Therefore in the syntax of an actual assertion the subject is
      implicit. This is just the case with RFC822 headers which
      implicitly refer to the following body, and with HTML "HEAD"
      element contents which implicitly refer to the containing
      document. &nbsp;(Though notice there is a fundamental
      difference, discussed <a href=
      "w:/DesignIssues/temp.html#mesages">below</a>, between a
      general label and a message header because the message header
      is definitive.)
    </p>
    <p>
      So it is wise to recognise the label as a case which should
      be specifically optimized in the syntax. <em>[In RDF this is
      indeed the case: the subject is established as a
      context, and then many properties are given within that
      context. -2000/9]</em>
    </p>
    <p>
      Assertions, when the subject is implicit, are known as
      attribute-value pairs as discussed above. Let's use the term
      "label" for a set of assertions with the subject extracted.
      &nbsp;Like the label on a jam jar, it contains information
      but there must be something else (in this case its
      placement on the jar) which tells you to what it applies.
      &nbsp;(The PICS label in fact contained other information
      too, including the subject and meta-meta-data about the
      authorship of the label.)
    </p>
    <p>
      Local definition:
    </p>
    <table border="1" cellpadding="2">
      <tbody>
        <tr>
          <td>
            A label is a set of assertions with a common implicit
            subject. &nbsp;In this architecture it is a set of
            attribute-value pairs.
          </td>
        </tr>
      </tbody>
    </table>
    <p>
      <i>(There is a convention that you can write "Jam" on a jam
      jar label. &nbsp;You don't write "Jam jar" or "Jam Jar
      label". &nbsp; Even though I once saw a label on a cardboard
      box with the words "Equipment shipping box label" on it!)</i>
    </p>
    <h3>
      Authorship of Metadata
    </h3>
    <p>
      It follows from the fact that metadata is data that there can
      be metadata about it. &nbsp;Some of this metadata becomes
      crucial when we consider a trust model. &nbsp;The logic we
      need includes the author of metadata:
    </p>
    <p>
      p1: (A u1 . . .)
    </p>
    <p>
      where p1 is, in a system with low trust, the author as
      stated, but in a cryptographically secure system is a
      principal represented by a key.
    </p>
    <p>
      On the web, the granularity of information is the resource.
      Authorship and access control generally use this granularity.
      Therefore, typically, the trust one places in an assertion is
      a function of the document which asserted it, and of the
      metadata about that document. However, when information is
      then combined from many resources, one needs a language which
      allows the source of the original to be recorded. Like
      blockquote in HTML, this separates the data itself from the
      resource, so the resource does not assert the data directly
      but asserts that it was asserted.
    </p>
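    <p>
      One way to picture the form "p1: (A u1 ...)" and the
      blockquote-like quoting of it is the following Python sketch.
      The class names, the quote helper and the "aggregator"
      resource are all hypothetical; they are not part of any
      existing syntax.
    </p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    kind: str          # assertion type, e.g. "A"
    subject: str       # the resource described, e.g. "u1"
    params: tuple = ()

@dataclass(frozen=True)
class Attributed:
    author: str        # stated author, or a key identifier in a secure system
    statement: object  # an Assertion, or another Attributed (a quotation)

def quote(resource, attributed):
    """The quoting resource asserts only that the statement was asserted."""
    return Attributed(author=resource, statement=attributed)

def directly_asserted_by(item):
    """Walk through quotations to whoever asserted the innermost Assertion."""
    while isinstance(item.statement, Attributed):
        item = item.statement
    return item.author

original = Attributed("p1", Assertion("A", "u1"))
collected = quote("aggregator", original)
```

    <p>
      Here the collecting resource is the author only of the outer
      wrapper; responsibility for the assertion itself stays with
      p1, which is what a trust calculation would need to see.
    </p>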
    <h2>
      Analysing labels
    </h2>
    <p>
      See <a href="Labels.html">Analysing PICS labels as generic
      Metadata</a>
    </p>
    <p>
      where we look at PICS labels and try to sift out the actual
      semantics of them. This is a thought experiment generating
      requirements. The conclusions are that information such as
      authorship and date information in fact form a tree of
      assertions about assertions, and it is important to be clear
      about the structure of that tree. The notion of a message is
      brought up there too, but not followed up as it is not
      germane to the discussion at this point.
    </p>
    <h2>
      Algebraic Manipulations
    </h2>
    <p>
      If you can make assumptions about the properties of labels
      then you can manipulate them, possibly without knowing
      everything about their meaning. &nbsp;Properties such as
      commutativity, transitivity and associativity would be very
      useful to have easily available: perhaps in the syntax, or
      failing that in the schema.
    </p>
    <p>
      [See <a href="Semantic.html">Semantic Web roadmap</a> for
      higher levels of logic]
    </p>
    <p>
      For example, given a label saying a pair of jeans has a 32
      inch waist and a price of $28, I can deduce a label which
      just has the price of $28. &nbsp;But given a label which says
      that the punishment for the crime is 2 months in jail and a
      fine of $3000, I can't deduce one that says that the
      punishment is 2 months in jail.
    </p>
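    <p>
      The difference between the two cases can be sketched as a
      property that software consults before dropping part of a
      label. In this Python illustration the "independent" flag
      stands in for whatever the syntax or schema would declare; the
      flag, the function name and the attribute names are
      assumptions made for the example.
    </p>

```python
def project(label, keep, independent):
    """Derive a smaller label holding only the attributes named in `keep`.

    `independent` stands for a (hypothetical) schema-level property saying
    that each attribute-value pair is true on its own. Without it, dropping
    a pair could change the meaning, so the deduction is refused.
    """
    if not independent:
        raise ValueError("pairs are not independently true; cannot drop any")
    return {name: value for name, value in label.items() if name in keep}

jeans = {"waist_inches": 32, "price_usd": 28}
punishment = {"jail_months": 2, "fine_usd": 3000}

# The jeans label is a plain conjunction, so the price alone still holds.
price_only = project(jeans, {"price_usd"}, independent=True)

# The punishment is one compound value, so no part of it may be dropped.
try:
    project(punishment, {"jail_months"}, independent=False)
    refused = False
except ValueError:
    refused = True
```

    <p>
      The function never looks at what the attributes mean; it needs
      only the one declared property, which is the kind of limited
      knowledge this section is arguing for.
    </p>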
    <p>
      A typical use of metadata will be to provide a statement
      along with its proof to be verified by another party.
      &nbsp;Being able to process these things efficiently and with
      limited knowledge will be crucial.
    </p>
    <p>
      The most practical way to do this is to create a basic
      common vocabulary for the logical functions. Sometimes known
      as the "RDF upper layers", these are mentioned in the
      <a href="Semantic.html">note on the Semantic Web.</a>
    </p>
    <h4>
      Ordered/Unordered
    </h4>
    <p>
      The <a href="#independent">axiom of independence of
      assertions</a> above gives us that in any set of assertions,
      as assertions are independently true, specific assertions may
      be removed or reordered, leaving the document just as valid
      (though possibly less informative).
    </p>
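    <p>
      As a small Python illustration of the reordering half of this,
      a document can be compared as a set of assertions rather than
      as a sequence. The attribute names and values below are made
      up for the example.
    </p>

```python
# Assertions as (attribute, value) pairs; names and values are made up.
doc_a = [("title", "Metadata"), ("author", "p1"), ("date", "1997-01")]
doc_b = [("date", "1997-01"), ("title", "Metadata"), ("author", "p1")]

def same_assertions(x, y):
    """Ignore order: two documents agree if they hold the same assertions."""
    return frozenset(x) == frozenset(y)

as_sequences_equal = (doc_a == doc_b)        # order-sensitive comparison
as_sets_equal = same_assertions(doc_a, doc_b)  # order-insensitive comparison
```

    <p>
      Compared as sequences the two documents differ; compared as
      sets of independent assertions they say the same thing.
    </p>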
    <p>
      Examples of unordered things currently are: RFC822 message
      header lines, SGML attributes. Examples of ordered things
      are: HTTP header lines and SGML elements.
    </p>
    <p>
      Do we need a form in which we can make an assertion which has
      many parameters which are in fact not mutable in any way?
    </p>
    <h2>
      Summary of Requirements
    </h2>
    <p>
      There should be ways of representing the above things
      (messages, labels, specifying labels, and statements) and of
      distinguishing between them.
    </p>
    <p>
      As much as possible of the syntax and semantics should be
      able to be acquired by reference from a metadata document.
    </p>
    <p>
      It must be possible to mix multiple vocabularies within the
      same scope.
    </p>
    <p>
      The syntax and structure should be such that as many
      manipulations as possible can be done without having to know
      the semantics of the vocabulary in use.
    </p>
    <p>
      A common vocabulary for basic logic and knowledge
      representation functionality will be required.
    </p>
    <hr />
    <h2>
      References
    </h2>
    <p>
      <a name="PICS" id="PICS">PICS</a> - The PICS project was a
      project to define standards for interchange of endorsement
      information, aimed at the content filtering problem. See the
      PICS home page.
    </p>
    <hr />
    <address>
      Tim BL, &nbsp;January 1997
      <p>
        Last edit $Date: 2009/08/27 21:38:08 $
      </p>
    </address>
  </body>
</html>