<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Linux/x86 (vers 1st March 2002), see www.w3.org" />
<title>
Key Free Trust in the Semantic Web
</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<link href="http://www.w3.org/StyleSheets/base.css" rel="stylesheet"
type="text/css" />
<style type="text/css">
<!--
body {margin: 0; padding: 0; color: black; background: white;}
h1, h2, h3, h4, h5, p, pre, table, div, ol, ul, dl, dt, dd
{ margin-left: 8%; margin-right: 22%; padding:0;}
em {font-weight: normal; font-style: italic;}
u,ins,.ins { background: white; color: red;}
del,strike,.strike { background: white; color: silver; text-decoration: line-through;}
code {font-weight: normal; }
.def { background: #FFFFFF; font-weight: bold}
.link-sec { font-style: italic;}
.link-def { background: #FFFFFF; color: teal; font-style: italic;}
.comment { background: #FFFFF5; color: black; padding: .7em; border:
navy thin solid;}
.discuss { color: blue; background: yellow; }
.xml-example,.xml-dtd { margin-left: -1em; padding: .5em; white-space:
pre; border: none;}
.xml-dtd { background: #efeff8; color: black;}
-->
</style>
</head>
<body xml:lang="en" lang="en">
<h1>
Finding Bacon's Key
</h1>
<h2>
Does Google Show How the Semantic Web Could Replace Public Key Infrastructure?
</h2>
<p>
Joseph M. Reagle Jr., &lt;reagle@w3.org&gt;
</p>
<h2>
Abstract
</h2>
<p>
This document briefly introduces the topic of trusted semantic web applications
that do not require the existence of a complex public key infrastructure. It
derives from a discussion with Tim Berners-Lee and has been improved by comments
from folks in this <a
href="http://lists.w3.org/Archives/Public/www-rdf-interest/2002Apr/0016.html">thread</a>,
but I'm solely responsible for any errors.
</p>
<h2>
Trust
</h2>
<p>
The question of <a
href="http://www.firstmonday.dk/issues/issue2/markets/#wit">what is trust</a> has
been the subject of many a graduate thesis. For simplicity's sake I will rely
upon the following definition:
</p>
<dl>
<dt>
Trust (worthiness)
</dt>
<dd>
The degree to which an agent considers an assertion to be true for a given
context. While the term "trust" is often used to denote a very high degree of
confidence, there is an associated risk of the assertions being wrong.
</dd>
</dl>
<p>
In traditional cryptographic applications the trust in a statement is
commensurate with the trust in the reputation of its author via a cryptographic
binding. This assurance is accomplished via a digital signature which requires
that:
</p>
<ol>
<li>
A cryptographic key be strongly bound to a statement via a digital signature
algorithm.
</li>
<li>
Only the specific person has access to the given key.
</li>
</ol>
<p>
Consequently the following properties are ensured: authenticity (the trust in the
person, who keeps their key private, is extended to the binding between the key
and statement), integrity (any change to the key or statement will result in a
different signature), and sometimes non-repudiation (if the key is indeed unique
to the control of the person, then the person cannot deny the binding because
how else would it have been created?). Frequently, this cryptographic binding is
associated with a semantic such as "I believe", "I assert", "It is true", or "I
notarize". (I tend to think in the semantic "I believe", however one can often
cast a semantic of one type as another: "I believe 'I notarize this was presented
to me on this date and time.'")
</p>
<h2>
Key Based Trust
</h2>
<p>
How is this cryptographic digital signature created such that it has these
properties? Public key algorithms are based on <a
href="http://www.x5.net/faqs/crypto/q7.html">trap-door one-way functions</a>:
</p>
<blockquote>
<p>
"The public key gives information about the particular instance of the
function; the private key gives information about the trap door. Whoever knows
the trap door can perform the function easily in both directions, but anyone
lacking the trap door can perform the function only in the forward direction.
The forward direction is used for encryption and signature verification; the
inverse direction is used for decryption and signature generation." — <a
href="http://www.x5.net/faqs/crypto/q7.html">Cryptography FAQ</a>.
</p>
</blockquote>
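<p>
The two directions of a trap-door function can be illustrated with textbook RSA.
This is only a sketch: the primes, the exponent, and the helper names
(<code>digest_as_int</code>, <code>sign</code>, <code>verify</code>) are
assumptions chosen for illustration, and a modulus this small offers no security.
</p>
<pre>
```python
import hashlib

# Toy textbook-RSA parameters (hypothetical and far too small to be secure).
P, Q = 61, 53
N = P * Q                           # public modulus
E = 17                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent: the "trap door"


def digest_as_int(message: str) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    raw = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % N


def sign(message: str) -> int:
    """Inverse direction: only the holder of D can compute this value."""
    return pow(digest_as_int(message), D, N)


def verify(message: str, signature: int) -> bool:
    """Forward direction: anyone holding (N, E) can check the binding."""
    return pow(signature, E, N) == digest_as_int(message)


statement = "my latest movie is stupid."
signature = sign(statement)
print(verify(statement, signature))            # the untouched statement verifies
print(verify(statement, (signature + 1) % N))  # a tampered signature does not
```
</pre>
<p>
Note that <code>verify</code> needs only the public values, which is exactly why
the question of whose public key one holds becomes the weak point.
</p>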
<p>
Consequently, a single person (and only that person) can bind their key to a
statement that anyone else, possessing the public key, can confirm! This is
brilliant, but of course the problem of this scenario is that when I want to
confirm <a
href="http://backissues.worldlink.co.uk/articles/250100180310/22.htm">Kevin
Bacon's</a> signature, how do I know I possess his <em>real</em> public key? On
the Internet today there are many cryptographic keys out there purporting to
belong to famous people. There may even be some cryptographically signed
documents that can be confirmed to be bound to one of those Kevin Bacon keys, but
did the <em>real</em> Bacon sign that document? Probably not. How is this problem
addressed? Not easily.
</p>
<p>
The two common approaches to finding the right public key are:
</p>
<ol>
<li>
Public Key Infrastructure (<strong>PKI</strong>) typically entails a
hierarchically organized infrastructure for organizing trust relationships. For
example, what if I wanted to confirm a key I found? When I was hired by MIT I
was given a floppy disk with MIT's public key. I trust this. This key also
signed other keys (i.e., a <strong>certificate</strong>) such that I can
transitively trust those keys as well. When I find a key purporting to belong
to Kevin Bacon I note that it is signed by the Actors' Guild. Of course, how do
I know that's the real Actors' Guild? Fortunately, the MIT key has signed the
Department of Education (DoE) key, which signed the Department of Commerce's
key (DoC), which signed the Actors' Guild key. If I successfully verify all the
signatures on these certificates (i.e., a <strong>certificate chain</strong>) I
can be confident I have Kevin Bacon's public key! I can then use that to
confirm if the document was signed by Kevin Bacon.
</li>
<li>
<a href="http://www.rubin.ch/pgp/weboftrust.en.html">Web of Trust</a> was
popularized by the PGP privacy application and uses a transitive trust model
similar to PKI's, but without the hierarchical structure. Instead, it is informal
and decentralized. Typically, when users of PGP meet together at conferences
they have key signing parties where they can easily and personally identify
each other and add the appropriate signatures to each other's keys. If I'm not
sure that the key I found is really Kevin Bacon's, perhaps I know someone, who
knows someone, (through <a href="http://smallworld.sociology.columbia.edu/">six
degrees</a>), that does!
</li>
</ol>
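<p>
Both approaches amount to finding a path of signatures from a key I already trust
to the key in question. The sketch below models only that path search; the
<code>SIGNED</code> table and the <code>trust_path</code> helper are hypothetical,
the six-hop limit is an arbitrary nod to "six degrees", and a real system would
also verify each signature along the way.
</p>
<pre>
```python
from collections import deque

# Hypothetical "X signed Y's key" edges, using the names from the example above.
SIGNED = {
    "MIT": ["DoE"],
    "DoE": ["DoC"],
    "DoC": ["Actors' Guild"],
    "Actors' Guild": ["Kevin Bacon"],
}


def trust_path(root, target, signed, max_hops=6):
    """Breadth-first search for a chain of signatures from root to target.

    Returns the chain as a list of names, or None if no chain exists
    within max_hops signatures.
    """
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 == max_hops:
            continue  # this chain already uses the maximum number of hops
        for subject in signed.get(path[-1], []):
            if subject not in seen:
                seen.add(subject)
                queue.append(path + [subject])
    return None


print(trust_path("MIT", "Kevin Bacon", SIGNED))
```
</pre>
<p>
With the edges above this prints the chain MIT, DoE, DoC, Actors' Guild, Kevin
Bacon. A hierarchy and a web of trust differ only in the shape of the
<code>SIGNED</code> graph, not in the search.
</p>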
<h2>
Preponderance Based Trust
</h2>
<p>
If public key infrastructures, transitive trust, and certificate chaining sound
complex, they are! First, the infrastructure or density of the web of certificates
must be sufficient to be able to confirm keys. Second, extended trust
relationships can be nonintuitive to humans. We like immediate and intuitive
reasons for trust. While the infrastructural mechanisms can address institutional
requirements for liability, they don't appeal to us viscerally. Fortunately, PGP
offered an even simpler method of engendering confidence in a key without the
need for other signatures: <strong>fingerprints</strong>!
</p>
<p>
A critical concept to cryptography is that of a digest value (hash result or
fingerprint):
</p>
<blockquote>
<p>
"A (mathematical) function which maps values from a large (possibly very large)
domain into a smaller range. A 'good' hash function is such that the results of
applying the function to a (large) set of values in the domain will be evenly
distributed (and apparently at random) over the range." [X509] ... A
cryptographic hash is "good" ... [when] any change to an input data object
will, with high probability, result in a different hash result, so that the
result of a cryptographic hash makes a good checksum for a data object."
— <a href="http://www.ietf.org/rfc/rfc2828.txt">RFC2828.</a>
</p>
</blockquote>
<p>
Strong hash functions are almost always used with a digital signature algorithm
because it can be computationally expensive to perform a cryptographic signature
on the whole of a document. Instead, one can take its digest value and sign
<em>that</em> instead. Integrity is maintained because any alteration to the
document will yield a different digest value (and consequent signature) and it's
<em>very</em> difficult to find another document which hashes to the same
value.
</p>
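<p>
The integrity property is easy to observe directly. The sketch below uses
SHA-256 from Python's standard library; the choice of hash and the
<code>digest</code> helper name are assumptions made for illustration.
</p>
<pre>
```python
import hashlib


def digest(document: str) -> str:
    """Return the SHA-256 digest of a document as a hex string."""
    return hashlib.sha256(document.encode("utf-8")).hexdigest()


original = "Joseph is an employee of MIT"
altered = "Joseph is an employee of MIT."   # one added character

print(digest(original))
print(digest(altered))
print(digest(original) == digest(altered))  # the two digests differ
```
</pre>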
<p>
However, digests are useful independent of a signature. The nifty PGP fingerprint
feature enabled a person to leave the fingerprint (the digest value) of their
public key all over the Internet. Consequently, if I find a purported Bacon key
and generate the fingerprint and then find postings on the Internet about or by
Bacon with that fingerprint, my confidence that I possess his real key can be
very high. Personally, I'd intuitively trust a key and its fingerprint, if I
found the fingerprint on his official web page, repeated on his fans' pages, and
included in his posting to a celebrity mailing list.
</p>
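<p>
That intuition can be sketched as a simple count of independent places that
publish the same fingerprint. Everything here is hypothetical: the key bytes, the
list of sightings, and the use of SHA-256 rather than PGP's own fingerprint
format.
</p>
<pre>
```python
import hashlib


def fingerprint(public_key: bytes) -> str:
    """Digest of the key material (SHA-256 here; an assumption, not PGP's format)."""
    return hashlib.sha256(public_key).hexdigest()


def corroborating_sources(candidate_key: bytes, sightings: dict) -> list:
    """Return the sources that publish the candidate key's fingerprint."""
    wanted = fingerprint(candidate_key)
    return sorted(src for src, seen in sightings.items() if seen == wanted)


# Hypothetical key material and hypothetical places a fingerprint was found.
real_key = b"-----hypothetical Bacon public key-----"
forged_key = b"-----hypothetical forged key-----"
sightings = {
    "official web page": fingerprint(real_key),
    "fan page": fingerprint(real_key),
    "celebrity mailing list": fingerprint(real_key),
    "anonymous posting": fingerprint(forged_key),
}

print(corroborating_sources(real_key, sightings))
print(corroborating_sources(forged_key, sightings))
```
</pre>
<p>
A plain count treats every source alike; how much weight each source deserves is
a separate judgment that this sketch leaves out.
</p>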
<p>
You can find a <a
href="http://groups.google.com/groups?q=Reagle+++++E0+D5+B2+05+B6+12+DA+65++BE+4D+E3+C1+6A+66+25+4E&hl=en&lr=lang_en&scoring=r&selm=%24m2n1705-.3.0.32.19961213001747.00937d30%40rpcp.mit.edu&rnum=4">
PGP fingerprint from me on Usenet back in 1996</a>! How do you know it is not an
imposter? It's improbable: the desire to impersonate me <em>now</em> would have
required a determined effort to post messages that sound as if they've been
written by me, (otherwise they'd be identified as fraudulent), for the past six
years!
</p>
<h2>
The Semantic Web
</h2>
<p>
The Semantic Web envisions a web of machine processable information in the form
of statements:
</p>
<blockquote>
<p>
"The Semantic Web will bring structure to the meaningful content of Web pages,
creating an environment where software agents roaming from page to page can
readily carry out sophisticated tasks for users... The Semantic Web is not a
separate Web but an extension of the current one, in which information is given
well-defined meaning, better enabling computers and people to work in
cooperation." — <a
href="http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html">The
Semantic Web</a>.
</p>
</blockquote>
<p>
On the Web today we have a rich interconnection of rather stupid hyperlinks
(i.e., "go there") between Web pages; these pages are in many natural languages
(e.g., English, Japanese) and include descriptions that are useful to humans
but not to our computers. Even so, the popular Web search service <a
href="http://www.google.com/technology/">Google</a> is able to make great use of
these simple interconnections of hyperlinks to help us find words.
</p>
<blockquote>
<p>
"Google bridges the divide between human-generated indexes and
machine-generated analysis. Y'see, the Web is full of people like you and me,
making links between documents; human beings, making decisions about documents,
voting with their links. When I link to some arbitrary document, it's an
indication that I think that it's in some way authoritative. When you link to a
document I wrote, you're indicating that I'm in some way authoritative. The
Internet is already structured in a meaningful way, but that structure is
obscured. Google teases out the relationship between the URLs, examining the
webs of authority: this person is linked to by 50,000 others, and he links to
this other person over here, which indicates that person one is a pretty sharp
individual, one who's inspired 50,000 human beings to take time out of their
busy schedules to link to him; and person one thinks that person two is on the
ball, which suggests that person two knows what she's on about." — <a
href="http://www.oreillynet.com/lpt/a/network/2002/03/08/cory_google.html">How
I Learned to Stop Worrying and Love the Panopticon</a>
</p>
</blockquote>
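<p>
The "voting with links" idea can be approximated by a simplified PageRank-style
iteration. This is a sketch under stated assumptions, not Google's actual
ranking: the damping factor, the iteration count, the page names, and the
<code>link_scores</code> helper are all illustrative.
</p>
<pre>
```python
def link_scores(links, damping=0.85, iterations=50):
    """Simplified PageRank-style power iteration over a link graph.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    scores = {page: 1.0 / len(pages) for page in pages}

    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            # A page with no outgoing links shares its score with every page.
            targets = links.get(page) or list(pages)
            share = damping * scores[page] / len(targets)
            for target in targets:
                new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical graph: three pages link to person_one, who links to person_two.
links = {
    "fan_a": ["person_one"],
    "fan_b": ["person_one"],
    "fan_c": ["person_one"],
    "person_one": ["person_two"],
    "person_two": [],
}
for page, score in sorted(link_scores(links).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```
</pre>
<p>
In this toy graph both person_one and person_two end up with higher scores than
the pages that merely link to them, which mirrors the endorsement chain described
in the quotation.
</p>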
<p>
What if we had more than simple hyperlinks, but those words that describe things
like contact information, schedules, interests, and relationships were also
interconnected and easily processed by computers? And, just like in Google,
statements about a "sharp" person endorsing a person "on the ball" can be made or
inferred? Not only would search engines become more accurate, but we could have
our programs easily organize the abundance of information available to us now but
hidden in a babble of inconsistent formats. (For example, I typically enter the
flight itineraries I receive in email into both a Web form and my PDA; this is
redundant!)
</p>
<h2>
The Semantic Web and Trust
</h2>
<p>
Not surprisingly, most security applications that entail authorization, access
control, or any trust application are statements. The Semantic Web has tools that
allow one to make simple statements of the form: "X has the property Y".
Computers can then help humans by making use of that information: "Find all pages
with property Y".
</p>
<p>
Imagine two projects at MIT: one is working on giving credentials to employees
such that "Joseph is an employee of MIT", another is determining who has access
to the online reference materials, "MIT Employees can access library services."
Given the nature of large institutions one might not be surprised to find those
working on these two projects know nothing of each other. But, if they are
written using semantic web tools we need not worry about system
incompatibilities. The development of an application to determine "Joseph has
read access to MIT library services" is natural to the technology: neither system
needs to be re-architected, one simply writes a few rules!
</p>
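<p>
A minimal sketch of such a rule, with statements modeled as plain
(subject, property, object) triples. The property names
(<code>employeeOf</code>, <code>employeesCanAccess</code>,
<code>hasReadAccess</code>) are a hypothetical vocabulary, not part of any real
schema.
</p>
<pre>
```python
# Statements as (subject, property, object) triples; the vocabulary is hypothetical.
credentials = {("Joseph", "employeeOf", "MIT")}                     # project one
policies = {("MIT", "employeesCanAccess", "MIT library services")}  # project two


def infer_access(statements):
    """Rule: if X employeeOf O and O employeesCanAccess S, then X hasReadAccess S."""
    inferred = set()
    for person, prop, org in statements:
        if prop != "employeeOf":
            continue
        for owner, policy, service in statements:
            if policy == "employeesCanAccess" and owner == org:
                inferred.add((person, "hasReadAccess", service))
    return inferred


# Neither source is modified; the two sets are merged and the rule is applied.
print(infer_access(credentials | policies))
```
</pre>
<p>
This prints the single inferred triple
<code>('Joseph', 'hasReadAccess', 'MIT library services')</code>; the two projects
only need to agree on the vocabulary, not on each other's systems.
</p>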
<p>
Additionally, one of the strengths of the Semantic Web is that one can make
statements about statements. One of the simplest statements one can make about
another statement is: "Statement X has the fingerprint: Z".
</p>
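<p>
As a sketch, a statement about a statement can be built by hashing a serialized
form of the first triple. The tab-joined serialization, the identifier
<code>urn:example:statement-1</code>, and the <code>hasFingerprint</code>
property are assumptions made for illustration.
</p>
<pre>
```python
import hashlib


def statement_fingerprint(triple):
    """Digest of a triple serialized in a simple, hypothetical canonical form."""
    canonical = "\t".join(triple)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


statement = ("Joseph", "employeeOf", "MIT")
z = statement_fingerprint(statement)

# "Statement X has the fingerprint: Z", expressed as another triple.
about_statement = ("urn:example:statement-1", "hasFingerprint", z)
print(about_statement)
```
</pre>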
<h2>
Key Free Trust in the Semantic Web
</h2>
<p>
I've written about 1600 words so far to identify concepts that I will use towards
a simple hypothesis. Those concepts are:
</p>
<ol>
<li>
Trust is one's confidence in a statement.
</li>
<li>
Cryptographic signatures permit one to associate a level of trust in a
statement (represented in digital form) akin to that of the reputation of its
author/key.
</li>
<li>
It's hard to know if one has the real public key of someone else.
</li>
<li>
The Semantic Web can be a rich, decentralized, archived, and interconnected
source of machine processable statements. Many of those statements will relate
to the identity, relations, capabilities, and authorizations of agents (human
or computer).
</li>
<li>
Cryptography need not be the only basis by which we evaluate trust. I've shown
that the preponderance of a fingerprint or a link to a site permits a
relatively high level of confidence in the owner of the key (i.e., PGP
fingerprints) or relevance of a site (i.e., Google).
</li>
<li>
The Semantic Web permits statements that describe the digest value of and
trust-worthiness of other statements; it will be permeated with annotated
fingerprints.
</li>
</ol>
<dl>
<dt>
My hypothesis:
</dt>
<dd>
The pervasive use of digest values to identify the statements in the Semantic
Web will engender a preponderance of evidence for trust <em>without</em>
cryptography.<br />
<br />
</dd>
</dl>
<p>
There is a major and minor consequent of this hypothesis. The major consequent is
that complex public key infrastructure may not be necessary. Instead, the
Semantic Web can mirror the informal and decentralized character of PGP's Web of
Trust with some improvements: it is available on and inter-related to the rest of
the Web, redundantly archived, harvested and processed by roving agents and
engines that can trivially repurpose it or offer other value added services. For
example, institutions that demand liability assurances can easily build
applications: "The cost of these statements is $4." and "I will pay $40,000 if
these statements are incorrect." The minor consequent is that the cryptographic
signatures themselves might not be necessary to make a reasonable trust
evaluation about a statement that has had time to grow into the tangled root
structure of the Web. One might be willing to rely upon information if there is a
dense set of inter-related statements of the form: "the information with this
digest value (Z), was trustworthy for my purposes."
</p>
<p>
Of course, the presence of a digital signature (beyond the simple digest value)
would increase one's confidence and the signature itself is a relatively
inexpensive operation. So the true import would be the simplification of a
mechanism for obtaining keys. It can be simple, bottom-up, and decentralized; if
need be, decentralized and extensible systems can simulate hierarchical and
closed systems much more easily than vice-versa.
</p>
<h2>
Gaming the System
</h2>
<p>
How secure would this system be? Nothing is perfectly secure. People can work for
years to build a reputation such that they can cheat once and gain more in that
single act than their reputation is otherwise worth. Or, a community can band
together to discredit the reputation of someone they dislike. This has nothing to
do with cryptography, but human nature and game theory.
</p>
<p>
Recently, a businessman who owned a real store and who also had many loyal Web
customers disappeared. <a
href="http://www.msnbc.com/local/wdiv/a1085793.asp?cp1=1">Stewart Richardson</a>
built up a solid reputation with many rave reviews on the eBay auction site; he
was known for the timely completion of Web transactions. Now he is gone and so is
the $200,000 from his most recent auction.
</p>
<p>
Last year, tens of thousands of people banded together for some amusing political
antics: if you asked Google for a "dumb motherfucker", during the last
presidential election the George W. Bush Presidential Campaign On-Line Store was
the top return. Interestingly, the system corrected itself and the <a
href="http://www.google.com/search?q=dumb+motherfucker">search term now returns
articles about the phenomenon</a>, which seems entirely appropriate!
</p>
<h2>
Revocation
</h2>
<p>
Security applications often require a mechanism to <a
href="http://csrc.nist.gov/pki/PKImodels/">revoke</a> a previous "statement." For
example, when I no longer consider my 1024 bit <a
href="http://pgp.mit.edu:11371/pks/lookup?search=reagle&op=index&fingerprint=on&exact=on">
key</a> to be strong enough, how do I uproot this statement from the Semantic Web
and replace it with my new key? As I've written elsewhere, <a
href="http://cyber.law.harvard.edu/people/reagle/regulation-19990326.html#_Deprecation">
it can be hard to deprecate pre-existing information</a>, as <a
href="http://cyber.law.harvard.edu/people/reagle/inet-quotations-19990709.html">they
say</a>: "You can't take something off the Internet - it's like taking pee out of
a pool." However, one can make a new statement, "<a
href="http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x3E111335">old
key</a> is obsoleted by <a
href="http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xDB2CAD7F">new
key"</a>. The problem then is of ecology and economics. Would there be an
incentive for the always evolving branches of the Semantic Web to gravitate
towards this new statement? To give you an example, I recently wanted to
determine if a vegetarian restaurant that I had heard of still existed. When I <a
href="http://www.google.com/search?q=%22five+seasons%22+brookline">queried Google
for "Five Seasons"</a>, the top returns were references to old pages describing
what a great place it was. Only at the bottom of the listing did I find
references to new restaurants that now occupied its location. Few people are
going to bother to link to something that no longer exists! The same
characteristic might pertain to the Semantic Web.
</p>
<p>
However, there are possible solutions. Just as the W3C uses a "Latest Version"
hyperlink within its specifications so that people can always find the latest
version of that specification, one could do the same for trust statements. Many
of the recent <a href="http://csrc.nist.gov/pki/PKImodels/">on-line certificate
or built-in expiration certificate mechanisms</a> can be emulated: a statement
may include properties that specify its duration or an on-line resource that must
be used to determine if the statement has been deprecated.
</p>
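<p>
Both ideas can be sketched together: follow "obsoleted by" statements to the
latest key, and treat a key as current only if nothing obsoletes it and it has
not expired. The key identifiers are the two linked above; the expiry date, the
<code>KEYS</code> table, and the helper names are hypothetical.
</p>
<pre>
```python
from datetime import date

# Hypothetical key statements: an optional expiry date and an "obsoleted by" pointer.
KEYS = {
    "0x3E111335": {"expires": None, "obsoleted_by": "0xDB2CAD7F"},
    "0xDB2CAD7F": {"expires": date(2030, 1, 1), "obsoleted_by": None},
}


def latest_key(key_id, keys):
    """Follow "obsoleted by" statements until reaching a key with no successor."""
    visited = set()
    while keys[key_id]["obsoleted_by"] is not None:
        if key_id in visited:
            raise ValueError("cycle in obsoletion statements")
        visited.add(key_id)
        key_id = keys[key_id]["obsoleted_by"]
    return key_id


def is_current(key_id, keys, today):
    """A key is current if nothing obsoletes it and it has not expired."""
    entry = keys[key_id]
    not_expired = entry["expires"] is None or entry["expires"] >= today
    return entry["obsoleted_by"] is None and not_expired


print(latest_key("0x3E111335", KEYS))
print(is_current("0x3E111335", KEYS, date(2002, 11, 25)))
```
</pre>
<p>
The sketch assumes the obsoletion statements can be found at all; whether the
rest of the Web gravitates toward them is the ecological question raised above.
</p>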
<h2>
Finding Bacon's Key
</h2>
<p>
To summarize how the application of my hypothesis might work, let's reconsider
the problem of determining the authenticity and integrity of a statement from (an
alleged) Kevin Bacon. Perhaps he said, "my latest movie is stupid." That would be
an odd thing for an actor to say! So to confirm the statement I query a Semantic
Web search engine and find a dense set of statements from otherwise reputable
sources commenting on and confirming that statement.
</p>
<p>
Still, the press tends to repeat the misinformation of their peers. Ironically
enough, the fact that something is well known and commented on is
<em>sometimes</em> a reason to distrust the information (e.g., urban myths).
Fortunately, the statement has a digital signature. If I can find a key that I
trust to be Kevin Bacon's, (independent of this latest Hollywood controversy),
that validates the signature, I will be satisfied.
</p>
<p>
Instead of validating the certificate chain of {MIT, DoE, DoC, Screen Actors
Guild, and Kevin Bacon}, I query the Semantic Web for "Kevin Bacon PGP Key" and
find a key that is <em>highly</em> inter-related. I can easily follow the source
of references to that key to an official Web page, two large fan pages, and the
<a href="http://us.imdb.com/Name?Bacon,+Kevin">Internet Movie Data Base</a>. And
indeed, that key can be used to validate the disparaging statement! I now trust
that Kevin Bacon made that statement. Is that trust perfect? No, but it's
sufficient for my following of Hollywood gossip.
</p>
<p>
(Out of curiosity, I do a similar query for the director Alan Smithee and find
many poorly inter-related statements describing his filmography, and a few
statements that <a
href="http://www.salon.com/ent/feature/1998/10/09feature2.html">Alan Smithee is a
Director/Writer Guild pseudonym</a>.)
</p>
<h2>
Conclusion
</h2>
<p>
It's easy to complain of complex public key infrastructures. It's also easy to
wave one's hands about pie-in-the-sky solutions. In this paper, I do both with
the excuse that I want my hypothesis to be easily understood.
</p>
<p>
The ability to assume agents are always on-line is changing the way the security
community thinks about digital trust. I want to push this assumption a little
further: not only are security services and data objects on-line, but they are
identifiable via a <a href="">URI</a>, easily referenced and annotated with other
statements, accessible in a widely deployed syntax (e.g., XML), and structured as
filaments in the Semantic Web. This could lead to a web of information that is
dense enough to give one sufficient confidence for decentralized (furthering
PGP's approach) light-weight trust applications. Additionally, this
can then be the foundation for hierarchical business and risk models (satisfying
PKI's goal).
</p>
<hr />
<p>
last revised $Date: 2002/11/25 21:55:05 $ by $Author: reagle $
</p>
</body>
</html>