<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="generator"
    content="HTML Tidy for Linux/x86 (vers 1st March 2002), see www.w3.org" />
    <title>
      Key Free Trust in the Semantic Web
    </title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
    <link href="http://www.w3.org/StyleSheets/base.css" rel="stylesheet"
    type="text/css" />
  <style type="text/css">
<!--
    body {margin: 0; padding: 0; color: black; background: white;}
    h1, h2, h3, h4, h5, p, pre, table, div, ol, ul, dl, dt, dd
        { margin-left: 8%; margin-right: 22%; padding:0;}
    em {font-weight: normal; font-style: italic;}
    u,ins,.ins  { background: white; color: red;}
    del,strike,.strike   { background: white; color: silver; text-decoration: line-through;}
    code     {font-weight: normal; }
    .def        { background: #FFFFFF; font-weight: bold}
    .link-sec   { font-style: italic;}
    .link-def   { background: #FFFFFF; color: teal;  font-style: italic;}
    .comment    { background: #FFFFF5; color: black; padding: .7em; border:
                  navy thin solid;}
    .discuss    { color: blue; background: yellow; }
    .xml-example,.xml-dtd { margin-left: -1em; padding: .5em; white-space:
                            pre; border: none;}
    .xml-dtd    { background: #efeff8; color: black;}

-->

  </style>


  </head>
  <body xml:lang="en" lang="en">
    <h1>
      Finding Bacon's Key
    </h1>
    <h2>
      Does Google Show How the Semantic Web Could Replace Public Key Infrastructure?
    </h2>
    <p>
      Joseph M. Reagle Jr., &lt;reagle@w3.org&gt;
    </p>
    <h2>
      Abstract
    </h2>
    <p>
      This document briefly introduces the topic of trusted Semantic Web applications
      that do not require the existence of a complex public key infrastructure. It
      derives from a discussion with Tim Berners-Lee and has been improved by comments
      from folks in this <a
      href="http://lists.w3.org/Archives/Public/www-rdf-interest/2002Apr/0016.html">thread</a>,
      but I'm solely responsible for any errors.
    </p>
    <h2>
      Trust
    </h2>
    <p>
      The question of <a
      href="http://www.firstmonday.dk/issues/issue2/markets/#wit">what is trust</a> has
      been the subject of many a graduate thesis. For simplicity's sake I will rely
      upon the following definition:
    </p>
    <dl>
      <dt>
        Trust (worthiness)
      </dt>
      <dd>
        The degree to which an agent considers an assertion to be true for a given
        context. While the term "trust" is often used to denote a very high degree of
        confidence, there is an associated risk of the assertions being wrong.
      </dd>
    </dl>
    <p>
      In traditional cryptographic applications the trust in a statement is
      commensurate with the trust in the reputation of its author via a cryptographic
      binding. This assurance is accomplished via a digital signature which requires
      that:
    </p>
    <ol>
      <li>
        A cryptographic key be strongly bound to a statement via a digital signature
        algorithm.
      </li>
      <li>
        Only the specific person has access to the given key.
      </li>
    </ol>
    <p>
      Consequently the following properties are ensured: authenticity (the trust in the
      person, who keeps their key private, is extended to the binding between the key
      and statement), integrity (any change to the key or statement will result in a
      different signature), and sometimes non-repudiation (if the key is indeed under
      the sole control of the person, then the person cannot deny the binding because
      how else would it have been created?). Frequently, this cryptographic binding is
      associated with a semantic such as "I believe", "I assert", "It is true", or "I
      notarize". (I tend to think in terms of the semantic "I believe"; however, one can often
      cast a semantic of one type as another: "I believe 'I notarize this was presented
      to me on this date and time.'")
    </p>
    <h2>
      Key Based Trust
    </h2>
    <p>
      How is this cryptographic digital signature created such that it has these
      properties? Public key algorithms are based on <a
      href="http://www.x5.net/faqs/crypto/q7.html">trap-door one-way functions</a>:
    </p>
    <blockquote>
      <p>
        "The public key gives information about the particular instance of the
        function; the private key gives information about the trap door. Whoever knows
        the trap door can perform the function easily in both directions, but anyone
        lacking the trap door can perform the function only in the forward direction.
        The forward direction is used for encryption and signature verification; the
        inverse direction is used for decryption and signature generation." &mdash; <a
        href="http://www.x5.net/faqs/crypto/q7.html">Cryptography FAQ</a>.
      </p>
    </blockquote>
    <p>
      Consequently, a single person (and only that person) can bind their key to a
      statement that anyone else, possessing the public key, can confirm! This is
      brilliant, but of course the problem with this scenario is that when I want to
      confirm <a
      href="http://backissues.worldlink.co.uk/articles/250100180310/22.htm">Kevin
      Bacon's</a> signature, how do I know I possess his <em>real</em> public key? On
      the Internet today there are many cryptographic keys out there purporting to
      belong to famous people. There may even be some cryptographically signed
      documents that can be confirmed to be bound to one of those Kevin Bacon keys, but
      did the <em>real</em> Bacon sign that document? Probably not. How is this problem
      addressed? Not easily.
    </p>
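    <p>
      To make the trap-door idea concrete, here is a minimal sketch of signing and
      verifying. It assumes textbook-sized RSA numbers and SHA-256 as the digest;
      the constants and the function names <code>sign</code> and <code>verify</code>
      are mine, chosen for illustration, and the parameters are far too small to be
      secure. It shows only the shape of the operation: the private exponent
      produces the signature, and the public values are enough to check it.
    </p>

```python
import hashlib

# Textbook RSA parameters, far too small for real use (assumption: toy values).
P, Q = 61, 53
N = P * Q          # public modulus, 3233
E = 17             # public exponent
D = 2753           # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1


def digest_as_int(statement: str) -> int:
    """Hash the statement and reduce it into the range of the toy modulus."""
    raw = hashlib.sha256(statement.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % N


def sign(statement: str) -> int:
    """Inverse direction: only the holder of D can compute this."""
    return pow(digest_as_int(statement), D, N)


def verify(statement: str, signature: int) -> bool:
    """Forward direction: anyone holding (N, E) can check the binding."""
    return pow(signature, E, N) == digest_as_int(statement)


statement = "my latest movie is stupid"
signature = sign(statement)
print(verify(statement, signature))
```

    <p>
      Note what the sketch does not answer: <code>verify</code> tells me the
      statement matches the key pair, not whose key pair it is.
    </p>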
    <p>
      The two common approaches to finding the right public key are:
    </p>
    <ol>
      <li>
        Public Key Infrastructure (<strong>PKI</strong>) typically entails a
        hierarchically organized infrastructure for organizing trust relationships. For
        example, what if I wanted to confirm a key I found? When I was hired by MIT I
        was given a floppy disk with MIT's public key. I trust this. This key also
        signed other keys (i.e., a <strong>certificate</strong>) such that I can
        transitively trust those keys as well. When I find a key purporting to belong
        to Kevin Bacon I note that it is signed by the Actors' Guild. Of course, how do
        I know that's the real Actors' Guild? Fortunately, the MIT key has signed the
        Department of Education (DoE) key, which signed the Department of Commerce's
        key (DoC), which signed the Actors' Guild key. If I successfully verify all the
        signatures on these certificates (i.e., a <strong>certificate chain</strong>) I
        can be confident I have Kevin Bacon's public key! I can then use that to
        confirm if the document was signed by Kevin Bacon.
      </li>
      <li>
        <a href="http://www.rubin.ch/pgp/weboftrust.en.html">Web of Trust</a> was
        popularized by the PGP privacy application and uses a transitive trust
        model similar to PKI's, but without the hierarchical structure. Instead, it is informal
        and decentralized. Typically, when users of PGP meet at conferences
        they have key signing parties where they can easily and personally identify
        each other and add the appropriate signatures to each other's keys. If I'm not
        sure that the key I found is really Kevin Bacon's, perhaps I know someone, who
        knows someone, (through <a href="http://smallworld.sociology.columbia.edu/">six
        degrees</a>), that does!
      </li>
    </ol>
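    <p>
      The certificate chain in the first approach can be sketched as a walk from a
      found key up to a key I already trust. This is only a structural model: the
      table <code>SIGNED_BY</code>, the set <code>TRUSTED_ROOTS</code>, and the
      function <code>chain_to_root</code> are hypothetical names, and a real
      implementation would also verify the signature on every certificate along
      the way.
    </p>

```python
# Hypothetical issuer records: each subject maps to whoever signed its key.
SIGNED_BY = {
    "Kevin Bacon": "Actors' Guild",
    "Actors' Guild": "DoC",
    "DoC": "DoE",
    "DoE": "MIT",
}
TRUSTED_ROOTS = {"MIT"}  # the key handed to me on a floppy disk


def chain_to_root(subject: str, max_depth: int = 10) -> list[str]:
    """Return the path from subject up to a trusted root, or [] if none exists."""
    path = [subject]
    while path[-1] not in TRUSTED_ROOTS:
        issuer = SIGNED_BY.get(path[-1])
        if issuer is None or issuer in path or len(path) == max_depth:
            return []  # unsigned key, a loop, or a chain that is too long
        path.append(issuer)
    return path


print(chain_to_root("Kevin Bacon"))
print(chain_to_root("Alan Smithee"))
```

    <p>
      The Web of Trust differs mainly in where the records come from: instead of one
      issuer per key, each key may carry many signatures, so the walk becomes a
      search through a graph rather than a climb up a tree.
    </p>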
    <h2>
      Preponderance Based Trust
    </h2>
    <p>
      If public key infrastructures, transitive trust, and certificate chaining sound
      complex, they are! First, the infrastructure or density of the web of certificates
      must be sufficient to be able to confirm keys. Second, extended trust
      relationships can be nonintuitive to humans. We like immediate and intuitive
      reasons for trust. While infrastructural mechanisms can address institutional
      requirements for liability, they don't appeal to us viscerally. Fortunately, PGP
      offered an even simpler method of engendering confidence in a key without the
      need for other signatures: <strong>fingerprints</strong>!
    </p>
    <p>
      A critical concept in cryptography is that of a digest value (hash result or
      fingerprint):
    </p>
    <blockquote>
      <p>
        "A (mathematical) function which maps values from a large (possibly very large)
        domain into a smaller range. A 'good' hash function is such that the results of
        applying the function to a (large) set of values in the domain will be evenly
        distributed (and apparently at random) over the range." [X509] ... A
        cryptographic hash is "good" ... [when] any change to an input data object
        will, with high probability, result in a different hash result, so that the
        result of a cryptographic hash makes a good checksum for a data object."
        &mdash; <a href="http://www.ietf.org/rfc/rfc2828.txt">RFC2828.</a>
      </p>
    </blockquote>
    <p>
      Strong hash functions are almost always used with a digital signature algorithm
      because it can be computationally expensive to perform a cryptographic signature
      on the whole of a document. Instead, one can take its digest value and sign
      <em>that</em> instead. Integrity is maintained because any alteration to the
      document will yield a different digest value (and consequent signature) and it's
      <em>very</em>&nbsp;difficult to find another document which hashes to the same
      value.
    </p>
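    <p>
      A short sketch of the integrity property, assuming SHA-256 from Python's
      standard <code>hashlib</code> module as the hash function (the choice of
      algorithm and the sample sentences are mine): a small alteration to the
      document gives a different digest value.
    </p>

```python
import hashlib


def digest(document: str) -> str:
    """Return the SHA-256 digest of a document as a hex string."""
    return hashlib.sha256(document.encode("utf-8")).hexdigest()


original = "I will pay $40,000 if these statements are incorrect."
altered = "I will pay $40 if these statements are incorrect."

print(digest(original))
print(digest(altered))
print(digest(original) == digest(altered))
```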
    <p>
      However, digests are useful independently of a signature. The nifty PGP fingerprint
      feature enabled a person to leave the fingerprint (the digest value) of their
      public key all over the Internet. Consequently, if I find a purported Bacon key
      and generate the fingerprint and then find postings on the Internet about or by
      Bacon with that fingerprint, my confidence that I possess his real key can be
      very high. Personally, I'd intuitively trust a key and its fingerprint, if I
      found the fingerprint on his official web page, repeated on his fans' pages, and
      included in his posting to a celebrity mailing list.
    </p>
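    <p>
      The check I describe can be sketched as follows. This is not the PGP
      fingerprint algorithm: I assume a SHA-256 digest truncated to sixteen bytes
      purely so that it prints like the fingerprints of the day, and the key
      material, the source names, and the posting texts are all invented for
      illustration.
    </p>

```python
import hashlib


def fingerprint(key_material: bytes) -> str:
    """Digest of the key, shown as sixteen space-separated byte pairs."""
    hex_digest = hashlib.sha256(key_material).hexdigest().upper()
    return " ".join(hex_digest[i:i + 2] for i in range(0, 32, 2))


def independent_sightings(candidate: str, postings: dict[str, str]) -> list[str]:
    """List the sources whose text contains the candidate fingerprint."""
    return sorted(source for source, text in postings.items() if candidate in text)


purported_key = b"-----purported Kevin Bacon public key-----"
fp = fingerprint(purported_key)
postings = {
    "official-page": "Kevin Bacon. Key fingerprint: " + fp,
    "fan-page": "His key fingerprint is " + fp + ", as posted on his own site.",
    "mailing-list": "Sig follows. Fingerprint " + fp,
    "unrelated-page": "No fingerprint here.",
}
print(independent_sightings(fp, postings))
```

    <p>
      The more sources that repeat the same fingerprint, and the less they depend on
      one another, the higher my confidence in the key.
    </p>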
    <p>
      You can find a <a
      href="http://groups.google.com/groups?q=Reagle+++++E0+D5+B2+05+B6+12+DA+65++BE+4D+E3+C1+6A+66+25+4E&amp;hl=en&amp;lr=lang_en&amp;scoring=r&amp;selm=%24m2n1705-.3.0.32.19961213001747.00937d30%40rpcp.mit.edu&amp;rnum=4">
      PGP fingerprint from me on Usenet back in 1996</a>! How do you know it is not an
      imposter? It's improbable: the desire to impersonate me <em>now</em> would have
      required a determined effort to post messages that sound as if they've been
      written by me, (otherwise they'd be identified as fraudulent), for the past six
      years!
    </p>
    <h2>
      The Semantic Web
    </h2>
    <p>
      The Semantic Web envisions a web of machine processable information in the form
      of statements:
    </p>
    <blockquote>
      <p>
        "The Semantic Web will bring structure to the meaningful content of Web pages,
        creating an environment where software agents roaming from page to page can
        readily carry out sophisticated tasks for users... The Semantic Web is not a
        separate Web but an extension of the current one, in which information is given
        well-defined meaning, better enabling computers and people to work in
        cooperation." &mdash; <a
        href="http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html">The
        Semantic Web</a>.
      </p>
    </blockquote>
    <p>
      On the Web today we have a rich interconnection of rather stupid hyperlinks
      (i.e., "go there") between Web pages; these pages are in many natural languages
      (e.g., English, Japanese) and include descriptions that are useful only to
      humans, not to our computers. Even so, the popular Web search service <a
      href="http://www.google.com/technology/">Google</a> is able to make great use of
      these simple interconnections of hyperlinks to help us find words.
    </p>
    <blockquote>
      <p>
        "Google bridges the divide between human-generated indexes and
        machine-generated analysis. Y'see, the Web is full of people like you and me,
        making links between documents; human beings, making decisions about documents,
        voting with their links. When I link to some arbitrary document, it's an
        indication that I think that it's in some way authoritative. When you link to a
        document I wrote, you're indicating that I'm in some way authoritative. The
        Internet is already structured in a meaningful way, but that structure is
        obscured. Google teases out the relationship between the URLs, examining the
        webs of authority: this person is linked to by 50,000 others, and he links to
        this other person over here, which indicates that person one is a pretty sharp
        individual, one who's inspired 50,000 human beings to take time out of their
        busy schedules to link to him; and person one thinks that person two is on the
        ball, which suggests that person two knows what she's on about." &mdash; <a
        href="http://www.oreillynet.com/lpt/a/network/2002/03/08/cory_google.html">How
        I Learned to Stop Worrying and Love the Panopticon</a>
      </p>
    </blockquote>
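    <p>
      The idea of "voting with links" can be sketched with a simplified iterative
      ranking. This is not Google's actual algorithm, only a small model in its
      spirit: the function <code>rank_pages</code>, the damping value of 0.85, the
      number of rounds, and the page names are all assumptions of mine.
    </p>

```python
def rank_pages(links: dict[str, list[str]], damping: float = 0.85,
               rounds: int = 50) -> dict[str, float]:
    """Iteratively share each page's score among the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    score = {page: 1.0 / len(pages) for page in pages}
    for _ in range(rounds):
        fresh = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            targets = links.get(page) or pages  # a page with no links votes for all
            share = damping * score[page] / len(targets)
            for target in targets:
                fresh[target] += share
        score = fresh
    return score


links = {
    "fan-page-1": ["bacon-official"],
    "fan-page-2": ["bacon-official"],
    "imdb": ["bacon-official"],
    "bacon-official": [],
}
scores = rank_pages(links)
print(max(scores, key=scores.get))
```

    <p>
      In this toy web the page that everyone links to ends up with the highest
      score, which is the sense in which links act as votes of authority.
    </p>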
    <p>
      What if we had more than simple hyperlinks, and the words that describe things
      like contact information, schedules, interests, and relationships were also
      interconnected and easily processed by computers? And what if, just as in Google,
      statements about a "sharp" person endorsing a person "on the ball" could be made or
      inferred? Not only would search engines become more accurate, but we could have
      our programs easily organize the abundance of information available to us now but
      hidden in a babble of inconsistent formats. (For example, I typically enter the
      flight itineraries I receive in email into both a Web form and my PDA; this is
      redundant!)
    </p>
    <h2>
      The Semantic Web and Trust
    </h2>
    <p>
      Not surprisingly, most security applications, whether they entail authorization,
      access control, or any other form of trust, are built from statements. The Semantic Web has tools that
      allow one to make simple statements of the form: "X has the property Y".
      Computers can then help humans by making use of that information: "Find all pages
      with property Y".
    </p>
    <p>
      Imagine two projects at MIT: one is working on giving credentials to employees
      such that "Joseph is an employee of MIT", another is determining who has access
      to the online reference materials, "MIT Employees can access library services."
      Given the nature of large institutions one might not be surprised to find those
      working on these two projects know nothing of each other. But if both are
      written using Semantic Web tools, we need not worry about system
      incompatibilities. The development of an application to determine "Joseph has
      read access to MIT library services" is natural to the technology: neither system
      needs to be re-architected, one simply writes a few rules!
    </p>
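    <p>
      Here is a minimal sketch of what "a few rules" might look like, assuming
      statements are held as simple (subject, property, value) triples. The property
      names <code>employeeOf</code>, <code>employeesMayRead</code>, and
      <code>hasReadAccess</code> are invented for this example and do not come from
      any real vocabulary.
    </p>

```python
# Statements as (subject, property, value) triples; all names are illustrative.
credentials = {("Joseph", "employeeOf", "MIT")}
library_policy = {("MIT", "employeesMayRead", "MIT library services")}


def infer_read_access(statements: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Rule: if P employeeOf O and O employeesMayRead S, then P hasReadAccess S."""
    inferred = set()
    for person, prop, org in statements:
        if prop != "employeeOf":
            continue
        for holder, policy, service in statements:
            if policy == "employeesMayRead" and holder == org:
                inferred.add((person, "hasReadAccess", service))
    return inferred


print(infer_read_access(credentials | library_policy))
```

    <p>
      Neither project's statements had to change; the rule simply reads both sets
      together.
    </p>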
    <p>
      Additionally, one of the strengths of the Semantic Web is that one can make
      statements about statements. One of the simplest statements one can make about
      another statement is: "Statement X has the fingerprint: Z".
    </p>
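    <p>
      A statement about a statement's fingerprint can be sketched the same way. I
      assume here that a statement is a triple, that joining its three parts with
      newlines is an acceptable canonical form (it would be ambiguous if a part
      could itself contain a newline), and that SHA-256 is the digest; the names
      <code>statement_digest</code>, <code>fingerprint_statement</code>, and
      <code>hasFingerprint</code> are hypothetical.
    </p>

```python
import hashlib

Triple = tuple[str, str, str]


def statement_digest(statement: Triple) -> str:
    """Digest a canonical, newline-separated serialization of the triple."""
    canonical = "\n".join(statement).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def fingerprint_statement(name: str, statement: Triple) -> Triple:
    """Return the statement 'name hasFingerprint digest' about another statement."""
    return (name, "hasFingerprint", statement_digest(statement))


x = ("Joseph", "employeeOf", "MIT")
about_x = fingerprint_statement("statement-X", x)
print(about_x)
```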
    <h2>
      Key Free Trust in the Semantic Web
    </h2>
    <p>
      I've written about 1600 words so far to identify concepts that I will use towards
      a simple hypothesis. Those concepts are:
    </p>
    <ol>
      <li>
        Trust is one's confidence in a statement.
      </li>
      <li>
        Cryptographic signatures permit one to associate a level of trust with a
        statement (represented in digital form) akin to that of the reputation of its
        author/key.
      </li>
      <li>
        It's hard to know if one has the real public key of someone else.
      </li>
      <li>
        The Semantic Web can be a rich, decentralized, archived, and interconnected
        source of machine processable statements. Many of those statements will relate
        to the identity, relations, capabilities, and authorizations of agents (human
        or computer).
      </li>
      <li>
        Cryptography need not be the only basis by which we evaluate trust. I've shown
        that the preponderance of a fingerprint or a link to a site permits a
        relatively high level of confidence in the owner of the key (i.e., PGP
        fingerprints) or relevance of a site (i.e., Google).
      </li>
      <li>
        The Semantic Web permits statements that describe the digest value and
        trustworthiness of other statements; it will be permeated with annotated
        fingerprints.
      </li>
    </ol>
    <dl>
      <dt>
        My hypothesis:
      </dt>
      <dd>
        The pervasive use of digest values to identify the statements in the Semantic
        Web will engender a preponderance of evidence for trust <em>without</em>
        cryptography.<br />
        <br />
      </dd>
    </dl>
    <p>
      There is a major and minor consequent of this hypothesis. The major consequent is
      that complex public key infrastructure may not be necessary. Instead, the
      Semantic Web can mirror the informal and decentralized character of PGP's Web of
      Trust with some improvements: it is available on and inter-related to the rest of
      the Web, redundantly archived, harvested and processed by roving agents and
      engines that can trivially repurpose it or offer other value added services. For
      example, institutions that demand liability assurances can easily build
      applications: "The cost of these statements is $4." and "I will pay $40,000 if
      these statements are incorrect." The minor consequent is that the cryptographic
      signatures themselves might not be necessary to make a reasonable trust
      evaluation about a statement that has had time to grow into the tangled root
      structure of the Web. One might be willing to rely upon information if there is a
      dense set of inter-related statements of the form: "the information with this
      digest value (Z), was trustworthy for my purposes."
    </p>
    <p>
      Of course, the presence of a digital signature (beyond the simple digest value)
      would increase one's confidence and the signature itself is a relatively
      inexpensive operation. So the true import would be the simplification of a
      mechanism for obtaining keys. It can be simple, bottom-up, and decentralized; if
      need be, decentralized and extensible systems can simulate hierarchical and
      closed systems much more easily than vice versa.
    </p>
    <h2>
      Gaming the System
    </h2>
    <p>
      How secure would this system be? Nothing is perfectly secure. People can work for
      years to build a reputation such that they can cheat once and gain more in that
      single act than their reputation is otherwise worth. Or, a community can band
      together to discredit the reputation of someone they dislike. This has nothing to
      do with cryptography; it is a matter of human nature and game theory.
    </p>
    <p>
      Recently, a businessman who owned a real store and who also had many loyal Web
      customers disappeared. <a
      href="http://www.msnbc.com/local/wdiv/a1085793.asp?cp1=1">Stewart Richardson</a>
      built up a solid reputation with many rave reviews on the eBay auction site; he
      was known for the timely completion of Web transactions. Now he is gone and so is
      the $200,000 from his most recent auction.
    </p>
    <p>
      Last year, tens of thousands of people banded together for some amusing political
      antics: during the last presidential election, if you asked Google for a "dumb
      motherfucker", the George W. Bush Presidential Campaign On-Line Store was
      the top return. Interestingly, the system corrected itself and the <a
      href="http://www.google.com/search?q=dumb+motherfucker">search term now returns
      articles about the phenomenon</a>, which seems entirely appropriate!
    </p>
    <h2>
      Revocation
    </h2>
    <p>
      Security applications often require a mechanism to <a
      href="http://csrc.nist.gov/pki/PKImodels/">revoke</a> a previous "statement." For
      example, when I no longer consider my 1024 bit <a
      href="http://pgp.mit.edu:11371/pks/lookup?search=reagle&amp;op=index&amp;fingerprint=on&amp;exact=on">
      key</a> to be strong enough, how do I uproot this statement from the Semantic Web
      and replace it with my new key? As I've written elsewhere, <a
      href="http://cyber.law.harvard.edu/people/reagle/regulation-19990326.html#_Deprecation">
      it can be hard to deprecate pre-existing information</a>, as <a
      href="http://cyber.law.harvard.edu/people/reagle/inet-quotations-19990709.html">they
      say</a>: "You can't take something off the Internet - it's like taking pee out of
      a pool." However, one can make a new statement, "<a
      href="http://pgp.mit.edu:11371/pks/lookup?op=get&amp;search=0x3E111335">old
      key</a> is obsoleted by <a
      href="http://pgp.mit.edu:11371/pks/lookup?op=get&amp;search=0xDB2CAD7F">new
      key</a>". The problem then is one of ecology and economics. Would there be an
      incentive for the always evolving branches of the Semantic Web to gravitate
      towards this new statement? To give you an example, I recently wanted to
      determine if a vegetarian restaurant that I had heard of still existed. When I <a
      href="http://www.google.com/search?q=%22five+seasons%22+brookline">queried Google
      for "Five Seasons"</a>, the top returns were references to old pages describing
      what a great place it was. Only at the bottom of the listing did I find
      references to new restaurants that now occupied its location. Few people are
      going to bother to link to something that no longer exists! The same
      characteristic might pertain to the Semantic Web.
    </p>
    <p>
      However, there are possible solutions. Just as the W3C uses a "Latest Version"
      hyperlink within its specifications so that people can always find the latest
      version of that specification, one could do the same for trust statements. Many
      of the recent <a href="http://csrc.nist.gov/pki/PKImodels/">on-line certificate
      or built-in expiration certificate mechanisms</a> can be emulated: a statement
      may include properties that specify its duration or an on-line resource that must
      be used to determine if the statement has been deprecated.
    </p>
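    <p>
      Both mechanisms can be sketched together: following "obsoleted by" statements
      to the latest key, and checking a built-in expiration. The two key IDs are the
      ones linked above; the tables <code>OBSOLETED_BY</code> and
      <code>EXPIRES</code>, the function names, and the expiry date are assumptions
      made up for the example.
    </p>

```python
from datetime import date

# Hypothetical statement store; the two key IDs are the ones linked above.
OBSOLETED_BY = {"0x3E111335": "0xDB2CAD7F"}
EXPIRES = {"0xDB2CAD7F": date(2030, 1, 1)}  # assumed expiry, for illustration


def latest_key(key_id: str) -> str:
    """Follow 'obsoleted by' statements until no newer key is asserted."""
    seen = {key_id}
    while key_id in OBSOLETED_BY:
        key_id = OBSOLETED_BY[key_id]
        if key_id in seen:
            raise ValueError("obsoletion statements form a loop")
        seen.add(key_id)
    return key_id


def is_current(key_id: str, today: date) -> bool:
    """A key is current if nothing obsoletes it and it has not expired."""
    expiry = EXPIRES.get(key_id)
    expired = expiry is not None and today > expiry
    return latest_key(key_id) == key_id and not expired


print(latest_key("0x3E111335"))
print(is_current("0x3E111335", date(2002, 11, 25)))
```

    <p>
      The sketch only works if an agent actually finds the newer statement, which is
      the ecological problem described above.
    </p>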
    <h2>
      Finding Bacon's Key
    </h2>
    <p>
      To summarize how the application of my hypothesis might work, let's reconsider
      the problem of determining the authenticity and integrity of a statement from (an
      alleged) Kevin Bacon. Perhaps he said, "my latest movie is stupid." That would be
      an odd thing for an actor to say! So to confirm the statement I query a Semantic
      Web search engine and find a dense set of statements from otherwise reputable
      sources commenting on and confirming that statement.
    </p>
    <p>
      Still, the press tends to repeat the misinformation of their peers. Ironically
      enough, the fact that something is well known and commented on is
      <em>sometimes</em> a reason to distrust the information (e.g., urban myths).
      Fortunately, the statement has a digital signature. If I can find a key that I
      trust to be Kevin Bacon's, (independent of this latest Hollywood controversy),
      that validates the signature, I will be satisfied.
    </p>
    <p>
      Instead of validating the certificate chain of {MIT, DoE, DoC, Screen Actors
      Guild, and Kevin Bacon}, I query the Semantic Web for "Kevin Bacon PGP Key" and
      find a key that is <em>highly</em> inter-related. I can easily follow the source
      of references to that key to an official Web page, two large fan pages, and the
      <a href="http://us.imdb.com/Name?Bacon,+Kevin">Internet Movie Data Base</a>. And
      indeed, that key can be used to validate the disparaging statement! I now trust
      that Kevin Bacon made that statement. Is that trust perfect? No, but it's
      sufficient for my following of Hollywood gossip.
    </p>
    <p>
      (Out of curiosity, I do a similar query for the director Alan Smithee and find
      many poorly inter-related statements describing his filmography, and a few
      statements that <a
      href="http://www.salon.com/ent/feature/1998/10/09feature2.html">Alan Smithee is a
      Director/Writer Guild pseudonym</a>.)
    </p>
    <h2>
      Conclusion
    </h2>
    <p>
      It's easy to complain of complex public key infrastructures. It's also easy to
      wave one's hands about pie-in-the-sky solutions. In this paper, I do both with
      the excuse that I want my hypothesis to be easily understood.
    </p>
    <p>
      The ability to assume agents are always on-line is changing the way the security
      community thinks about digital trust. I want to push this assumption a little
      further: not only are security services and data objects on-line, but they are
      identifiable via a <a href="">URI</a>, easily referenced and annotated with other
      statements, accessible in a widely deployed syntax (e.g., XML), and structured as
      filaments in the Semantic Web. This could lead to a web of information dense
      enough to give one the confidence needed for decentralized, light-weight trust
      applications (furthering PGP's approach). Additionally, this
      can then be the foundation for hierarchical business and risk models (satisfying
      PKI's goal).
    </p>
    <hr />
    <p>
      last revised $Date: 2002/11/25 21:55:05 $ by $Author: reagle $
    </p>
  </body>
</html>