<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<title>
The meaning of a document -- Axioms of Web architecture
</title>
<link rel="Stylesheet" href="di.css" type="text/css" />
<meta http-equiv="Content-Type" content=
"text/html; charset=us-ascii" />
</head>
<body bgcolor="#DDFFDD" text="#000000" lang="en" xml:lang="en">
<address>
Tim Berners-Lee<br />
Date: 1999, last change: $Date: 2009/08/27 21:38:08 $<br />
Status: personal view only. Editing status: first draft.
<em>Written partly when the Namespace argument came around
again and I realized that where there</em>
</address>
<p>
<a href="./">Up to Design Issues</a>
</p>
<h3>
Axioms of Web Architecture: the meaning of a document
</h3>
<p>
<em>Abstract: The meaning of a document is the product of
some text (in some language) and the meaning of the
language. The text is found in a document, and the language
is defined in a document called a schema.</em>
</p>
<hr />
<h1>
Meaning
</h1>
<p>
<em>Grounding the meaning of a document in URI space.</em>
</p>
<p>
What is the meaning of a document?
</p>
<p>
The meaning of a document on the Web can be defined more
precisely than that of an arbitrary paper document. Because we have
the benefit of a global namespace (URIs), things become
possible which were not before. One example is global
hypertext; another is the rigid (though rarely absolute)
specification of meaning. Just as a hypertext document can
now exactly point to another document when it makes a
reference (instead of making some vague natural language
reference to it), so can a formal document make a precise
reference to the language it uses.
</p>
<p>
A writer of a document uses the language to convey his intent
to the reader. It is essential that the intent of the writer
can be well defined for both parties and in general for a
third party.
</p>
<p>
The "<dfn>language</dfn>" here means the set of symbols,
the syntactic rules which constrain their combination, and
some semantics which are conveyed by defining their
interpretation in one or more other formal languages, or in
some natural language.
</p>
<table border="5">
<tbody>
<tr>
<td>
The meaning of a document is then the product of the
text of the document (in some language) and the meaning
of the language.
</td>
</tr>
</tbody>
</table>
<p>
On the Web, <a href="Axioms.html#Universality2">important
things are identified by URIs</a>. This should clearly apply
both to the document itself and to the language. The party
which defines what a URI refers to I call the publisher, or
owner of the URI. HTTP allows a delegated system of authority
for ownership (DNS) to define ownership of URIs, and it also
provides a network protocol to retrieve documents
representing whatever is identified by the URI. The text of a
document is defined by its publisher and the meaning of the
language is defined by the publisher of the language.
</p>
<p>
Natural languages are constantly evolving and rather vague,
in that no one (except <em>Scrabble</em> players) uses a
particular dictionary as a definitive set of meanings. In
practice, the meaning of a word in a natural language is the
sum of the associations of that word -- logical or poetic --
in the mind of the reader or writer. Of course society works
on the basis of a very strong similarity of the webs of
association in different people's minds.
</p>
<p>
In the semantic web, however, meaning is not vague: the idea
is that languages must be defined formally and as precisely
as possible. The semantic web consists of some "terminal"
languages which are defined solely in natural language terms,
and some languages for which there are machine-readable
interpretations into other formal languages. Whereas programs
processing documents in the first sort of language will
typically have to be hand coded, documents in the second set
may be processed automatically to convert them into languages
in the first set.
</p>
<p>
URIs can be of various sorts, with various properties
depending on their scheme (and, for http URIs, the
publisher), but some URIs can be dereferenced to a definitive
document. The document resulting from dereferencing the URI
for a language is a place where the publisher of the language
can put definitive information about the meaning of a
language.
</p>
<h3>
<a name="Language1" id="Language1">Language and document
subsets</a>
</h3>
<p>
As languages evolve, there can be many languages which are
similar. "Similarity" doesn't mean much, but something which
is well defined is when a document in one language A can be
treated precisely as though it had been in another language
B.
</p>
<h3>
<a name="Meaning1" id="Meaning1">Meaning in XML</a>
</h3>
<p>
In XML, a language is a "namespace", and the document about
the language is called a "schema". In XML, one document can
contain a mixture of languages, and so the schema, if written
in XML, may contain information about syntactic constraints
(in XML-schema language) and/or RDF properties (in rdf-schema
language), or any combination of the above. (<a href=
"#Language">note</a>)
</p>
<p>
XML puts no constraints on a language apart from syntactic
structure. There is not (without RDF and logic or some other
higher level) any overall framework into which new languages
can be introduced. So, the question of <strong>what an XML
document means depends</strong> first upon the fully
qualified name of the <strong>document element</strong>. No
semantics can be attached to any of its descendants in the
document tree except in as much as is defined by the
specification of that element type in that namespace. One
cannot talk about the "meaning" of a subtree of a document
without understanding the semantics of the language. In fact,
because languages only necessarily define meaning for
documents, the only way one can talk about the meaning of a
subset of a document is to define how those parts of the
document can be reassembled into a second whole document.
This is what must be done when a digital signature is applied
to a document.
</p>
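<p>
As a rough illustration of the point above that the meaning of
an XML document depends first on the fully qualified name of
its document element, the following Python sketch reads that
name using the standard <code>xml.etree.ElementTree</code>
module. The function name <code>document_language</code> and
the sample document are assumptions made for this example
only; they are not part of any specification.
</p>
<pre>
```python
import xml.etree.ElementTree as ET


def document_language(xml_text):
    """Return (namespace URI, local name) of the document element.

    ElementTree spells a qualified tag as "{namespace-uri}local-name";
    an element in no namespace has a bare tag and yields (None, tag).
    """
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        uri, local = root.tag[1:].split("}", 1)
        return uri, local
    return None, root.tag


# Build a tiny sample document rather than hard-coding markup.
XHTML_NS = "http://www.w3.org/1999/xhtml"
sample = ET.tostring(ET.Element("{%s}html" % XHTML_NS))
language, element = document_language(sample)
```
</pre>
<p>
Here <code>language</code> holds the namespace URI and
<code>element</code> the local name. Everything below the
document element would still have to be interpreted according
to whatever that namespace's specification says.
</p>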
<h3>
<a name="Meaning" id="Meaning">The Meaning of Digital
Signature</a>
</h3>
<p>
The language defines semantics. On the simple philosophy that
one place is enough, it is not the place of a digital
signature to define semantics. A digital signature on a
document may give a party reason to use the information
therein for purposes it would not have otherwise. The issuer
of a public key may also put constraints on what sort of
guarantees are made by signature with a given key. But the
signature itself must not affect the semantics - the meaning
- of a document. To allow it to would be to create an
inconsistency between the intent of the writer of the
original document and the meaning of the signed document. So,
signatures themselves have no meaning. The meaning has to be
ascribed to them by other documents. For example, I may say,
"If an organization is a member of W3C according to a
document signed with this key, then that organization is
indeed a member". That is a trust statement which gives the
key a connection into the world of meaning of documents.
</p>
<h3>
<a name="Style" id="Style">Style as meaning</a>
</h3>
<p>
(Although few people would think of presentation style of a
document as its "meaning", and many of us spend a lot of time
emphasising the difference between style and content and
semantics, in fact much of what applies to style applies to
semantics. Therefore the "meaning in terms of presentation"
is a good test case for the architecture of the system. (For
many documentation systems, the only semantics required is
"H2 means a big bold block on the left"!) Style sheets
provide an "interpretation" of a document by mapping it onto
another well-defined language of formatting properties. The
style sheet language gives a good definition (in English) of
what is needed. This is an interesting comparison, and I
mention it as a place where architectural consistency should
be maintained, but it isn't what I normally mean by
"meaning".)
</p>
<h3>
<a name="Logical" id="Logical">Logical meaning</a>
</h3>
<p>
When XML is used to encode logic, then a document is a
formula (see <a href="Logic.html">Logic on the
web</a>). Then, the way new predicates and constants interact
is defined by the logic. The way fundamental new parts of the
language (such as quantification) are added is part of a more
general question of how arbitrary languages interact.
Examples we have seen are the mixing of XHTML and XSL. What
is the result - XHTML or XSL? A document or a style sheet?
Both?
</p>
<h3>
<a name="Mixing" id="Mixing">Mixing Languages</a>
</h3>
<p>
XML puts no constraints on a language apart from syntactic
structure. There is not (without for example RDF and logic)
some overall framework into which new languages can be
introduced. This means that every language has to define how
it can be extended by mixing with other languages. Typically
it will indicate the element types which can be subclassed by
extensions and therefore incorporated into documents wherever
that element type is allowed.
</p>
<p>
One particular example of such a type is common to almost all
languages. This is the sentence, the fully qualified
assertion or statement, the formula with no free variables.
Almost all whole documents count as such, though an
interesting counterexample is a style sheet, which represents a
function: it specifies the result document as a function of an
input document, and so itself cannot be said to be a
stand-alone statement. (If I sent you a message consisting
only of a style sheet with no cover letter, what would it
signify? What would it mean if I digitally signed it?)
</p>
<p>
With that exception, it clearly makes sense to allow any
language which has the concept of a sentence -- maybe any
language at all -- to allow sentences from other languages to
be included anywhere where a sentence of its own could go.
<strong>This should be a generic feature of XML
schemas</strong>.
</p>
<p>
(It would be against the minimalist principle for XML
generically to define other common subclasses. Note that the
RDF spec does define properties and node types and the
concept of subclassing in RDF. HTML defines things like block
and inline elements, which can be subclassed in extensions;
SVG and SMIL probably define similar concepts. The
significance of this when looking at downloaded support code
would be that, for example, in a set of Java classes
implementing HTML, any subclass of "Inline element"
would export the same software API to allow it to be
justified and line wrapped in a text flow object. So there is
a natural correspondence between element type subclassing and
support class subclassing, but the two must remain distinct.
Language specifications must always define what a language
means without referring to implementations if they can
possibly avoid it.)
</p>
<p>
Note that without the assurances given by such information
you cannot just go around embedding one language in another.
Every language has to address the issue which the concept of
RDF transparency potentially solves for RDF. A surrounding
XML context must have the ability to quote, deny, negate or
whatever any element. In fact, nothing in XML says that the
meaning of a fragment is not affected by things anywhere else
in a document. Nothing suggests that the process of removing
sub-trees creates a valid document. (How does XML fragment
deal with this?)
</p>
<h3>
<a name="Grounded" id="Grounded">Grounded documents</a>
</h3>
<p>
We can say a document is "grounded" if its meaning is
completely defined because every term used is, directly or
indirectly, an explicit reference to its definition in a
document on the Web. Clearly
a definition of "grounding" depends on the set of documents
one considers acceptable definitions. "Grounded in W3C
Recommendations" would imply that the closure under [i.e. set
of all the things you can possibly end up with by repeated
applications of] the operation of looking up definitions
would be a subset of the set of W3C Recommendations.
</p>
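<p>
The closure described above can be sketched in a few lines of
Python. This is only an illustration under stated
assumptions: the lookup table <code>definitions</code>, the
function names <code>definition_closure</code> and
<code>is_grounded</code>, and the example.org URIs are all
invented here, and a real system would have to dereference
URIs rather than consult a hand-built table.
</p>
<pre>
```python
def definition_closure(start, definitions):
    """Collect every document reachable from `start` by looking up definitions.

    `definitions` maps a document URI to the URIs of the documents that
    define the terms it uses (a hypothetical, hand-built lookup table).
    """
    seen = set()
    pending = [start]
    while pending:
        doc = pending.pop()
        for ref in definitions.get(doc, ()):
            if ref not in seen:
                seen.add(ref)
                pending.append(ref)
    return seen


def is_grounded(start, definitions, base):
    """True if every definition reachable from `start` lies in `base`."""
    return definition_closure(start, definitions).issubset(base)


# Illustrative data only; these URIs are made up.
definitions = {
    "http://example.org/invoice": ["http://example.org/rec/rdf"],
    "http://example.org/rec/rdf": ["http://example.org/rec/xml"],
    "http://example.org/rec/xml": [],
}
recommendations = {"http://example.org/rec/rdf", "http://example.org/rec/xml"}
grounded = is_grounded("http://example.org/invoice", definitions, recommendations)
```
</pre>
<p>
Changing the base set changes the answer, which is the sense
in which "grounding" is always relative to the documents one
accepts as definitions.
</p>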
<p>
This is the basis for the entire web and internet
architecture stack today. (See also: <a href=
"Stack.html">Stack</a>.) All commercial use on the web is
largely to be considered in this light, that the meaning of
each message sent across the Internet is well-defined by a
series of specifications.
</p>
<p>
(A sense of grounding can also be applied separately to
different sorts of "understanding". When "understanding"
means presentation to a human for human understanding, a
presentation-grounded document points to all information
such as schemata and style sheets which will enable it to be
presented.)
</p>
<h3>
Grounding as a myth: the Web of Meaning
</h3>
<p>
The concept of grounded documents is important for
predictable systems, but it is a bad model for the web -- or
for life -- in the long run. Words in a <em>natural</em>
language such as English are not grounded in a unique base
set<a href="#Grounding">*</a>. Every time you look one up in
the dictionary all you find are more words. The world is
web-like, and any attempt by the Web to constrain it to be
tree-like is bound to force a misrepresentation of reality.
This is the Wittgenstein view of meaning. Understanding this
view sometimes confuses people about the very systematic way
in which meaning in Internet protocols is defined by layers
and layers of specs.
</p>
<p>
In fact, the two views both apply, one nested inside the
other. Yes, meaning is use - but in the Internet protocols,
society has set up social constraints - laws and other
expectations - which constrain use to be according to the
specs. This is a social constraint which your computer is
under when you use the Internet, just as when you fill out a
tax form you don't have a choice as to how to interpret the
meaning of "Adjusted Gross Income on line 39 of a US IRS form
1040". There is a whole department of the government which
defines what it is and which socially owns the term. So while
the
</p>
<p>
What will change with the Semantic Web's development is that
its grounding in legacy systems will fade into history. Right
now, the meaning of "Invoice total value" is effectively
defined by the software which you plug your RDF document
into, and how it treats invoices. This is an important way to
bootstrap the semantic web with useful terms. That will
become less important as many different software products
share the same term. In the end, it is weblike form which
will characterize the semantic web. Everyone will be defining
things in terms of other things which they feel are useful
and stable enough. It will be impossible to insist that there
be a global ordering between more basic and less basic
specifications -- and to do so would stop the web scaling. No
one will agree on a directed <em>acyclic</em> graph
determining what terms are "more basic" than others. For any
set of definitions in one direction, there can always be some
reverse definitions which can be seen by others as just as
valid.
</p>
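<p>
To make the point about acyclic graphs concrete, here is a
small Python sketch (Python 3.9 or later, using the standard
<code>graphlib</code> module). It tries to order terms from
"more basic" to "less basic" and reports when definitions
running in both directions make that impossible. The function
name <code>basic_first</code> and the two toy vocabularies are
assumptions for illustration only.
</p>
<pre>
```python
from graphlib import CycleError, TopologicalSorter


def basic_first(defined_by):
    """Order terms so each comes after the terms that define it.

    `defined_by` maps a term to the set of terms its definition uses.
    Returns None when the definitions are circular, so no such order exists.
    """
    try:
        return list(TopologicalSorter(defined_by).static_order())
    except CycleError:
        return None


# Made-up vocabularies: one layered, one with definitions in both directions.
layered = {"invoice": {"total"}, "total": set()}
weblike = {"invoice": {"total"}, "total": {"invoice"}}
layered_order = basic_first(layered)
weblike_order = basic_first(weblike)
```
</pre>
<p>
The layered vocabulary yields an order with "total" before
"invoice"; the weblike one yields no order at all, which is
the situation the paragraph above expects to become normal.
</p>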
<p>
So, while the concept of documents grounded in a given base
set is important for interoperability, it must not be seen as
a goal to force the semantic web into an acyclic structure.
There will be no single Dewey Decimal system for the semantic
web. The concepts of well-defined stable specifications will
still be essential. So will respect for the definitions of
terms. The difference will be that anyone will choose their
own set of languages they consider "basic", and find ways of
defining other languages they come across in terms of those.
A rich web of conversions and translations will grow up to
support this. The web of trust will provide tools for
navigating within and selecting from this web in a safe way.
And of course, global standards will always make life much
easier where they can be made.
</p>
<h3>
FAQ: Surely meaning is only defined by use?
</h3>
<p>
<em>This is all very well</em>, runs a popular line,
<em>except that to talk about "meaning" at all is basically
bogus</em>. <em>The meaning of words, and therefore
languages, is defined by use - by how people actually respond
to them, by how they are processed. Surely the only way I can
guarantee that someone will interpret a document in a
particular way is to have some out-of-band agreement with
them first?</em>
</p>
<p>
Philosophically, it is indeed the case that you need some
out-of-band (not in the message itself) agreement. In real
life, though, there are a lot of widely-held agreements.
In fact, the law is a set of agreements which you are deemed
to accept whether you formally agree or not. So when you are
sent a tax form, you can't argue that the language of the tax
form is not one you interpret in that way. They just stick
you in jail.
</p>
<p>
The web works like one big agreement. By connecting your
computer to it and getting email from POP and IMAP ports,
there is an understanding that what you get are MIME
messages, and the same thing when you pick up a web page using
HTTP. So by using the web you are entering a world where the
assumption can be made that messages are to be interpreted by
a set of specifications. The specifications are (currently)
generally written in English, and imperfect, but
debate about them is practically about details, not about the
philosophy as to whether they apply. So that is why one can
in practice talk about meaning.
</p>
<h3>
FAQ: Doesn't the meaning of a document depend on its context?
</h3>
<p>
Of course it does. If I enclose a photocopy of a document as
an attachment, it doesn't mean I am sending you that letter.
</p>
<p>
However, there are a lot of contexts for a document which
have the same implication for the meaning of that document.
Publication, by email to a public list, or HTTP, or FTP, or
printing on paper and nailing to a tree, in each case leaves
the meaning of a document defined in the same way. These
contexts, in which a document is published by a party, or a
message conveyed from one party to another, are so common
and basic that the meaning of the document in these contexts
is referred to simply as the meaning of the document (or
message).
</p>
<p>
The Web architecture separately enumerates the ways in which
these contexts actually work under the hood (publication using
HTTP, etc) and the way documents are interpreted and dealt
with once published. That way, XML languages don't have to
keep referring to "meaning when received with a 200 code in
HTTP".
</p>
<hr />
<h2>
See also
</h2>
<ul>
<li>
<a href="Metadata.html#Self-descr">Self-describing
information in "Metadata"</a>
</li>
<li>
<a href="Evolution.html">Evolvability</a>
</li>
</ul>
<h2>
Footnotes
</h2>
<h3>
<a name="Name-less" id="Name-less">Name-less and Address-less
systems</a>
</h3>
<p>
(Technically, it is possible to create a network with
"source-based routing" in which everything whether server or
document is identified by an md5 checksum or other random
unique ID, and network nodes learn to send packets with full
routing instructions. This is a little like the old email
addresses which specified a routing path like
timbl@cernvax!mcvax!mitmail!whatever. The process of
following a hypertext link involves the client A contacting the
server B of the source document of the link and finding the path
which B had stored as a way to get from B to the server C of the
link's destination document. Then the client A can contact C
first through the route ABC but then from local information
and information from B and C can maybe derive a more
efficient route AC. Such a system has different scaling
properties as a subset of the information about the network
must reside in the network hosts rather than in the routers.
Its efficiency and scaling properties rely on features of the
topology of the web such as locality of reference.)
</p>
<h3>
<a name="Language" id="Language">Language identity crisis in
XML</a>
</h3>
<p>
(There is currently (1999/9) much debate in the XML world
over exactly what defines a language, the proposed answers
ranging through: the publisher of the namespace including any
information in the definitive schema; a separate note of a
schema; a schema plus a different namespace URI document plus
a version plus an HTML profile; and "nothing". If this debate
resolves itself such that the identity of a language is not
clearly defined, then the XML namespace mechanism may
prove an insufficiently firm foundation for the semantic web,
or any application of data on the web.)
</p>
<h3 id="Grounding">
Grounding of words in English
</h3>
<p>
(Distraction: Is there a set of English words in the OED
which, if understood, allow one to understand any definition
by sufficient recursive dereferencing?)
</p>
<h3>
References:
</h3>
<p>
DNS mess: Weaving the Web p126, etc.
</p>
<p>
<a href=
"http://www.ietf.org/mail-archive/ietf-announce/msg05299.html">
Carpenter, Brian, et al., "IAB Technical Comment on the
Unique DNS Root", IETF-announce, 1999/9/27.</a>
</p>
<h2>
Fodder
</h2>
<p>
[@@ Dan's quote (Ted N?) about all things being hopelessly?
intertwingled@@ :-) .. maybe some Buddhist quotation about
interconnectedness...]
</p>
<p>
"I'm very glad you asked me that, Mrs Rawlinson. The term
`holistic' refers to my conviction that what we are concerned
with here is the fundamental interconnectedness of all
things. I do not concern myself with such petty things as
fingerprint powder, telltale pieces of pocket fluff and inane
footprints. I see the solution to each problem as being
detectable in the pattern and web of the whole. The
connections between causes and effects are often much more
subtle and complex than we with our rough and ready
understanding of the physical world might naturally suppose,
Mrs Rawlinson. Let me give you an example. If you go to an
acupuncturist with toothache he sticks a needle instead into
your thigh. Do you know why he does that, Mrs Rawlinson? No,
neither do I, Mrs Rawlinson, but we intend to find out. A
pleasure talking to you, Mrs Rawlinson. Goodbye." -- Douglas
Adams, <em>Dirk Gently's Holistic Detective Agency</em>
</p>
<p>
<a href="http://www.xent.com/nov99/0596.html">quoted in
Fork</a>
</p>
<p>
@@ Statistics from OED
</p>
<p>
<a href=
"http://www.eastgate.com/ht99/slides/Welcome.htm">Mark
Bernstein, "Everything is intertwingled"</a>. Opening Keynote,
Hypertext '99, Darmstadt, Germany. February 23, 1999.
</p>
<hr />
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
<p>
<a href="../People/Berners-Lee">Tim BL</a>
</p>
</body>
</html>