<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 13), see www.w3.org" />
<title>
Web architecture: Metadata
</title>
<link href="di.css" rel="stylesheet" type="text/css" />
<meta http-equiv="Content-Type" content="text/html" />
</head>
<body bgcolor="#DDFFDD" text="#000000">
<address>
Tim Berners-Lee
<p>
Date started: January 6, 1997
</p>
<p>
Status: personal view, but corresponds generally to
the W3C architecture for metadata.
</p>
<p>
Additions are at the end about consistency in
label/metaset/collection syntax and semantics.
</p>
<p>
The syntaxes used in this document are meant to illustrate
the architecture and be clear but are otherwise arbitrary.
This note was written before the more general <a href=
"Semantic.html">Semantic Web</a> note.
</p>
</address>
<p>
<a href="Overview.html">Up to Design Issues</a>
</p>
<h3>
Axioms of Web Architecture: Metadata
</h3>
<hr />
<h1>
Metadata Architecture
</h1>
<h4 id="Preface">
Preface
</h4>
<p>
<em>This document was written before the Semantic Web
Roadmap, but is an introduction to the same ideas. Both
introduce the world of machine-readable data on the web. This
document introduces the concepts in the historical sequence
at W3C, where the first driving applications of the Semantic Web
were metadata, and the first driving metadata applications
were endorsement labels (<a href="#PICS">PICS</a>)</em>.
</p>
<h2>
Documents, Metadata, and Links<br />
</h2>
<p>
The thing which you get when you follow a link, when you
de-reference a URI, has a lot of names. Formally we call it a
<b>resource</b>. Sometimes it is referred to as a document
because many of the things currently on the Web are human
readable documents. Sometimes it is referred to as an object
when the object is something which is more machine readable
in nature or has hidden state. I will use the words document
and resource interchangeably in what follows and sometimes
may slip into using "object".
</p>
<p>
One of the characteristics of the World Wide Web is that
resources, when you retrieve them, do not stand simply by
themselves without explanation: there is also information
about the resource. Information about information is
generally known as <b>Metadata</b>. Specifically, in the web
design,
</p>
<h4>
Definition
</h4>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
Metadata is machine understandable information about
web resources or other things
</td>
</tr>
</tbody>
</table>
<p>
The phrase "machine understandable" is key. We are
talking here about information which software agents can use
in order to make life easier for us, ensure we obey our
principles and the law, check that we can trust what we are
doing, and make everything work more smoothly and rapidly.
Metadata has well defined semantics and structure.
</p>
<p>
Metadata was called "Metadata" because it started life as, and
is currently still chiefly, information about web resources,
so data about data. In the future, when the metadata
languages and engines are more developed, it should also form
a strong basis for a web of machine understandable
information about anything: about people, things,
concepts and ideas. We keep this fact in our minds in
the design, even though the first step is to make a system
for information about information.
</p>
<p>
For an example of metadata, when an object is retrieved using
the HTTP protocol, the protocol allows information about its
date, its expiry date, its owner, and other arbitrary
information to be sent by the server. The world of the World
Wide Web is therefore a world of information and some of that
information is information about information. In order to
have a coherent picture of this, we need a few axioms about
metadata. The first axiom is that:
</p>
<h4>
Axiom
</h4>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
metadata is data.
</td>
</tr>
</tbody>
</table>
<p>
That is to say, information about information is to be
counted in all respects as information. There are various
parts of this.
</p>
<p>
One is that metadata, regarded as data, can be stored in a
resource. So, one resource may contain
information about itself or about another resource. In
current practice on the World Wide Web there are three ways
in which one gets metadata. The first is the data about a
document contained within the document itself, for example in
the HEAD part of an HTML document or within word processor
documents. The second is that during the HTTP transfer the
server transfers some metadata to the client about the object
which is being transferred. This, during an HTTP GET, is
transferred from the server to the client and, during a PUT
or a POST, is transferred from the client to the server. One
of the things which we have to rationalize in our
architecture of the World Wide Web is who exactly is making
the statement: whose statement, whose property, is that
metadata? The third way in which metadata is found is when it
is looked up in another document. This practice was not
very common until the PICS initiative set out to define label
formats specifically for representing information about World
Wide Web resources. The PICS architecture specifically allows
for PICS labels, which are resources about other resources, to
be buried within the resource itself, to be retrieved as
separate resources, or to be passed over during the HTTP
transaction. To conclude,
</p>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
Metadata about one document can occur within the
document, or within a separate document, or it may be
transferred accompanying the document.<br />
</td>
</tr>
</tbody>
</table>
<p>
Put another way, metadata can be a first class object.
</p>
<p>
The second part of the above axiom is:
</p>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
Metadata can describe metadata
</td>
</tr>
</tbody>
</table>
<p>
That is, metadata itself may have attributes such as
ownership and an expiry date, and so there is meta-metadata,
but we don't distinguish many levels: we just say that
metadata is data and that from that it follows that it can
have other data about itself. This gives the Web a certain
consistency.
</p>
<h2>
The Form of Metadata<br />
</h2>
<p>
Metadata consists of assertions about data, and such
assertions typically, when represented in computer systems,
take the form of a name or type of assertion and a set of
parameters, just as in natural language a sentence takes
the form of a verb and a subject, an object and various
clauses.
</p>
<h4>
<a name="independent" id="independent">Axiom</a>
</h4>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
The architecture is of metadata represented as a set of
independent assertions.
</td>
</tr>
</tbody>
</table>
<p>
This model implies that in general, two assertions about the
same resource can stand alone and independently. When they
are grouped together in one place, the combined assertion is
simply the sum (actually the logical AND) of the independent
ones. Therefore (because AND is commutative) collections of
assertions are essentially unordered sets. This design
decision rules out, for example, in simple sets of data,
assertions which are somehow cumulative or in which later ones
override earlier ones. Each assertion stands independently of
others.
</p>
<p>
We will see below how logical expressions are formed to
combine assertions in more varied ways, and syntactic rules
which allow the subject at least of the assertion to be made
implicit. But neither of these changes the basic operation of
combining assertions in unordered AND lists.
</p>
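<p>
As a minimal sketch of this axiom, assuming assertions are modelled
as plain tuples and a collection of them as a Python frozenset (the
URIs, attribute names and the helper "combine" are all hypothetical,
not part of any specification), combining collections is just set
union, and order plays no part:
</p>
<pre>
```python
# Each assertion is a tuple: (assertion_type, subject_uri, *parameters).
# The URIs and attribute names are hypothetical.
doc = "http://example.org/report"

set_a = frozenset({
    ("author", doc, "Alice"),
    ("expires", doc, "1997-12-31"),
})
set_b = frozenset({
    ("expires", doc, "1997-12-31"),
    ("language", doc, "en"),
})

def combine(*assertion_sets):
    """Logical AND of several collections: the plain set union."""
    combined = frozenset()
    for assertions in assertion_sets:
        combined = combined | assertions
    return combined

# AND is commutative, so the order of combination does not matter.
assert combine(set_a, set_b) == combine(set_b, set_a)
# Dropping an assertion leaves a smaller but still valid set.
assert frozenset({("author", doc, "Alice")}).issubset(combine(set_a, set_b))
```
</pre>
<p>
Because a set has no notion of "later", this representation cannot
express an assertion that overrides an earlier one, which is exactly
what the design decision rules out.
</p>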
<h3>
<a name="Attributes" id="Attributes">Attributes</a>
</h3>
<p>
Assertions about resources are often referred to as
attributes of the resource. That is, the type of assertion is
an assertion that the object, the resource in question, has a
particular named property such as its author, and in that
case the parameter is the name or identity of the author.
Similarly, if the attribute is the document's date of expiry
then the parameter is that date.
</p>
<p>
Often, a group of assertions about the same resource occur
together, in which case the syntax generally omits the URI of
that resource as it is implicit. In these cases, when it is
clear from the context about which resource the assertion is
being made, the assertion often takes the form of a list of
attributes and values. In RFC822 format messages, such as
mail messages and HTTP messages, metadata is transferred
where the attribute name is an RFC822 header name and the
rest of the RFC822 line is the value of the attribute, such
as Date: and From: and To: information. The attribute value
pair model is that used by most activities defining the
semantics of metadata today.<br />
</p>
<p>
I use the word "assertion" to emphasize the fact that the
attribute value pair when it is transferred is a statement
made by some party. It does not simply and directly imply
that the resource at any given time has that value for the
given attribute. It must be seen as a statement by a
particular party with or without implicit or explicit
guarantees as to validity. Throughout the World Wide Web, as
trust becomes an important issue, it will be important for
software -- and people -- to keep track of and take into
account who said what in terms of data and metadata. So, in our
model, the data of a resource is something for which
typically we know the creator or the person responsible, and
typically the date on which the information was created,
which implies, in the case of a piece of information which
makes an assertion, the date at which the assertion was made.
</p>
<p>
An assertion
</p>
<blockquote>
(A u1 p q ...)
</blockquote>
<p>
typically has as explicit parameters,
</p>
<ul>
<li>the URI of the resource about which the assertion is made
(u1).
</li>
<li>some identifier (A) for the type of assertion being made,
such as author or date or expiry date.
</li>
<li>other parameters (p, q,...) according to the type of
assertion.
</li>
</ul>
<p>
As implicit or explicit parameters, it typically has:
</p>
<ul>
<li>The party making the assertion
</li>
<li>The date/time of the assertion
</li>
<li>etc...
</li>
</ul>
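<p>
One way to picture these parameters, as a sketch only, is a small
record type. The class name "Assertion" and its field names are
illustrative assumptions, as are the example URIs; the explicit
parameters are required fields and the possibly implicit ones default
to None:
</p>
<pre>
```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Assertion:
    # Explicit parameters, following (A u1 p q ...).
    assertion_type: str           # A: identifier for the type of assertion
    subject: str                  # u1: URI of the resource described
    params: Tuple[str, ...] = ()  # p, q, ...: depend on the assertion type
    # Parameters that may be implicit (None) or explicit.
    asserted_by: Optional[str] = None  # the party making the assertion
    asserted_at: Optional[str] = None  # date/time of the assertion

# Hypothetical example values.
a = Assertion(
    assertion_type="author",
    subject="http://example.org/report",
    params=("Alice",),
    asserted_by="http://example.org/people/bob",
    asserted_at="1997-01-06",
)
print(a.assertion_type, a.subject, a.params)
```
</pre>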
<p>
We can often make an analogy with programming languages. An
assertion in metadata can be compared with a function call in
a programming language. In object oriented languages, the
object of the function has a special place among the
parameters just as the subject of an assertion does in
metadata. In object oriented languages, though, the set of
possible functions depends on the object, whereas in metadata
the set of assertion types is more or less unlimited, defined
by independent choice of vocabulary. <em>Anyone can say
anything about anything</em>.
</p>
<h3>
A space for attribute names
</h3>
<p>
It is appropriate for the Web architecture to define in
this way the topology and the general concepts of links and
metadata. What about the significance of individual
relationships? Sometimes, as above, these are special,
defined in the architecture, and having an architectural
significance or a significance to the protocols. In other
cases, the significance of relationships or indeed of
attributes is part of other specifications, other designs, or
other applications, and must be defined easily by third
parties. Therefore, the set of such relationship and
attribute names must be extremely easily extensible, and
therefore extensible in a decentralized manner. This is why
</p>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
the URL space is an appropriate space for the
definition of attribute names.
</td>
</tr>
</tbody>
</table>
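<p>
To illustrate why this helps, here is a small sketch in which two
independently defined vocabularies both use the short name "title".
The namespace URIs and the helper "attribute" are hypothetical; the
point is only that names formed from URIs cannot collide unless their
owners choose the same URI:
</p>
<pre>
```python
# Two independently defined vocabularies both use the short name "title".
# Both namespace URIs are hypothetical.
BOOKS = "http://example.org/vocab/books#"
JOBS = "http://example.com/schema/jobs#"

def attribute(namespace, local_name):
    """Form a globally unique attribute name from a namespace URI."""
    return namespace + local_name

label = {
    attribute(BOOKS, "title"): "An Example Book",  # title of a work
    attribute(JOBS, "title"): "Director",          # job title of a person
}

# No central registry was consulted, yet the names do not collide.
assert len(label) == 2
```
</pre>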
<p>
We have already (1997) several vocabularies of attribute
names: for example, the HTML elements which can occur within
the HEAD element, or as another example, the headers in an
HTTP request which specify attributes of the object. These
are defined within the scope of particular specifications.
There is always pressure to extend these specifications in a
flexible way. HTTP header names are generally extended
arbitrarily by those doing experiments. The same can also be
true of HTML elements and extension mechanisms have been
proposed for both. If we look generically at the very wide
space of all such metadata attribute names, we find something
in which the dictionary would be so large that ad hoc
arbitrary extension would be just as chaotic as central
registration would be stifling.
</p>
<blockquote>
<b>Aside: Comparison with Entity-Relationship models</b>.
<p>
This architecture, in which the assertion identifier is
taken from (basically) URL space differs from the
"Entity-relationship" (ER) model and many similar models
like it, including most object-oriented programming
systems. In an ER model, typically every object is typed
and the type of an object defines the attributes it can have,
and therefore the assertions which are being made about it.
Once a person is defined as having a name, address and
phone number, then the schema has to be altered or a new
derived type of person must be introduced before one can
make assertions about the race, color or credit card number
of a person. The scope of the attribute name is the entity
type, just as in OOP the scope of a method name is an
object type (or interface). By contrast, in the web, the
hypertext link allows statements of new forms to be made
about any object, even though (before anything other than
syntax checking) this may lead to nonsense or paradox. One
can define a property "coolness" within one's own part of
the web, and then make statements about the "coolness" of
any object on the web.
</p>
<p>
This design difference is in essence a resurfacing of the
decision to make links monodirectional, sacrificing
consistency for scalability.
</p>
<p>
An advantage of ER systems is that they allow one to work,
in the user interface for example, with a set of properties
which "should" be defined for each entity. You can define
these in the Metadata's predicate calculus by defining an
expression for a "well specified" object. ("For all
<i>X</i> such that <i>X</i> is a customer <i>X</i> is
well-specified if there exists <i>n</i> such that <i>n</i>
is the name of <i>X</i> and there exists <i>t</i> such that
<i>t</i> is the telephone number of <i>X</i> and...")
</p>
<p>
end of aside.
</p>
</blockquote>
<h3>
<a name="MetadataHeaders" id="MetadataHeaders">Metadata
("Entity") headers in HTTP</a>
</h3>
<p>
In the above it is important to realize that the HTTP headers
which contain what can be considered as metadata ("entity
headers") should be separated quite distinctly from HTTP
headers which do not. HTTP headers which contain metadata
contain information which can follow the document around. For
example, it is reasonable for a cache to pass such
information on without treatment, it is reasonable for
clients or other programs which process data to store those
headers as metadata with the document for later processing.
The content of those headers does not have to be associated
with that particular HTTP transaction. By contrast, the
RFC822 headers in HTTP which deal specifically with the
transaction or deal specifically with the TCP link between
the two application programs have a shorter scope and can
only be regarded as parameters of the HTTP method. Making
this separation clear will not only make it easier to
understand HTTP and how it should be processed; it will also
make it clear which pieces of HTTP can be used easily and
transparently by other protocols which may use different
methods with different parameters. The clarification of the
architecture of HTTP such that both the metadata and the
methods can be extended into other domains is an important
part of the work of the World Wide Web Consortium. The
Internet protocols SMTP and NNTP and HTTP as well as many new
and proposed protocols share much of the semantics of the
RFC822 headers. Formalizing the shared space and making it
clear that there is a single design for a particular header,
rather than four designs which are independent and happen to
look very similar, requires a general architecture, some
careful thought, and is essential for the future design of
protocols. It will allow protocol design to happen in small
groups which can take for granted the bulk of previous work
and concentrate on independent new design.
</p>
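<p>
The separation can be sketched as a simple partition of a header
set. This is an illustration under stated assumptions: the list of
entity header names is the one given in the later HTTP/1.1
specification (RFC 2616, section 7.1), not something defined in this
note, and the function name "split_headers" and the sample response
are hypothetical.
</p>
<pre>
```python
# Entity header fields as listed in the HTTP/1.1 specification (RFC 2616,
# section 7.1). Header names are case-insensitive, so compare in lower case.
ENTITY_HEADERS = frozenset({
    "allow", "content-encoding", "content-language", "content-length",
    "content-location", "content-md5", "content-range", "content-type",
    "expires", "last-modified",
})

def split_headers(headers):
    """Return (metadata, transaction) dicts from a dict of HTTP headers."""
    metadata, transaction = {}, {}
    for name, value in headers.items():
        target = metadata if name.lower() in ENTITY_HEADERS else transaction
        target[name] = value
    return metadata, transaction

# A hypothetical response.
response_headers = {
    "Content-Type": "text/html",
    "Last-Modified": "Mon, 06 Jan 1997 12:00:00 GMT",
    "Connection": "close",
    "Date": "Tue, 07 Jan 1997 09:30:00 GMT",
}
metadata, transaction = split_headers(response_headers)
print(sorted(metadata))     # headers that can be stored with the document
print(sorted(transaction))  # headers that belong to this exchange only
```
</pre>
<p>
A cache or client following this sketch would keep the first
dictionary with the stored document and discard the second once the
transaction is over.
</p>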
<h4>
Authorship of HTTP entity headers
</h4>
<p>
It may be possible to remove or at least encompass the
apparent anomaly of metadata transferred from an HTTP server
by creating a special link type which links the document
itself to the set of attributes which the server would give
in the HTTP headers. In other words, the server would be able
to say, "here is a document, here is some metadata about it,
and the metadata about it has the following URL". This would
allow one, for example, to request a signed copy of the HTTP
headers. It would allow one to ask about the intellectual
property rights of those headers, and the authorship of those
headers.
</p>
<p>
It is important to be completely clear about the authorship
of the HTTP headers. The server should be seen as a software
agent acting on behalf of a party which is the publisher or
document author: the definer of the URI to resource identity
mapping. The webmaster is only an administrator who is
responsible for ensuring that (through an appropriately
configured server) the transactions on the wire faithfully
represent the statements and wishes of that party.
</p>
<h2>
Links<br />
</h2>
<p>
An assertion of relationship between two resources is known
as a <b>link</b>.
</p>
<p>
In this case, it is a triple
</p>
<blockquote>
(<i>A u1 u2</i>)
</blockquote>
<p>
of:
</p>
<ul>
<li>the type of assertion being made, that is, the
relationship which is being asserted,
</li>
<li>the first URI,
</li>
<li>and the second URI.
</li>
</ul>
<p>
These sorts of assertions, links, are the basis of navigation
in the World Wide Web; they can be used for building
structure within the World Wide Web and also for creating a
semantic Web which can express knowledge about the world
itself. That is to say, links may be used both for the
structure of data, in which case they are metadata, but also
they may be used as a form of data.
</p>
<p>
Links, like all metadata, can be transferred in three ways.
They can be embedded in a document which is one end of the
link, they can be transferred in an HTTP message, for example
in what is called the header of the document, and they can be
stored in a third document. This last method has not been
used widely on the World Wide Web to date.
</p>
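<p>
The third option, links held apart from both of their end points,
can be sketched as a collection of triples. The relationship names,
the URIs and the helper "related" are hypothetical and only
illustrate the (A u1 u2) form:
</p>
<pre>
```python
# A link is a triple (relationship, first_uri, second_uri).
# All URIs and relationship names are hypothetical.
links = {
    ("previous-version", "http://example.org/spec-v2", "http://example.org/spec-v1"),
    ("author", "http://example.org/spec-v2", "http://example.org/people/alice"),
    ("previous-version", "http://example.org/spec-v3", "http://example.org/spec-v2"),
}

def related(links, relationship, first_uri):
    """Return the set of URIs that first_uri is linked to by relationship."""
    return {u2 for (a, u1, u2) in links if a == relationship and u1 == first_uri}

print(related(links, "previous-version", "http://example.org/spec-v3"))
```
</pre>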
<h2>
Goal: <a name="Self-descr" id="Self-descr">Self-describing
information</a><br />
</h2>
<p>
A critical part of the design of the whole system is the way
that the semantics of metadata or indeed of data are defined.
The semantics of metadata in our RFC822 headers in mail
messages and in HTTP messages are defined by hand in English
in the specifications of those protocols. The PICS system
takes this one stage further in terms of flexibility by
allowing a message to contain a pointer to the document which
defines, in human readable terms, the semantics of each
assertion made within a <a href="#PICS">PICS</a> label. In
the future we would like to move toward a state in which any
metadata or eventually any form of machine readable data
carries a reference to the specification of the semantics of
all the assertions made within it.
</p>
<p>
For example, suppose that when a link is defined between two
documents, the relationship which is being asserted is
defined in such a way that it can be looked up on the World
Wide Web (i.e. using some form of URI). Then someone or some
program which has not come across that relationship before
can follow the link and extend its understanding or
functionality to take advantage of this new form of
assertion.
</p>
<p>
In the case of PICS, one can dynamically pick up a human
readable definition of what that assertion really means. In
PICS (and in theory in SGML using DTDs), one can also pick up
a machine readable definition of what form that assertion can
take, what syntax, what types of parameters it can take. This
allows a human interface to a new PICS scheme to be built on the
fly. To go one step further, one could, given a suitable
logic or knowledge representation language, pick up a machine
readable definition of the semantics of that assertion in
terms of other relationships.
</p>
<p>
The advantage of such self-describing information is that it
allows development of new applications and new functionality
independently by many groups across the web. Without
self-describing information, development must wait for large
companies or standards committees to meet and agree on
common semantics.
</p>
<p>
Of course a pragmatic way of extending software to handle new
forms of information is to dynamically download the code to
support a software object which can handle such data for one.
Whereas this is a powerful technique, and one which will be
used increasingly, it is not sufficient. It is not sufficient
because one has to trust the implementation of the object,
and the state.
</p>
<h4>
Goal
</h4>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
As much as possible of the syntax and semantics should
be able to be acquired by reference from a metadata
document.
</td>
</tr>
</tbody>
</table>
<h3>
Building Applications using Link Relationships
</h3>
<p>
It turns out that a very large number of applications both
built on top of the web and also built within the
infrastructure of the Web can largely be built by defining
new relationship types. Examples of these are the document
versioning problem which can be largely solved by defining
link values relating documents to previous and future
versions and to lists of versions; intellectual property
rights, distribution terms, and other labeling which can be
solved by making a link from one document to the document
containing the metadata.
</p>
<hr />
<h3>
Summary so far
</h3>
<ol>
<li>Metadata is data
</li>
<li>Metadata may refer to any resource which has a URI
</li>
<li>Metadata may be stored in any resource no matter to which
resource it refers
</li>
<li>Metadata can be regarded as a set of assertions, each
assertion being about a resource (A <i>u1</i>
...).
</li>
<li>Assertions which state a named relationship between two
resources are known as links (A <i>u1 u2</i>)
</li>
<li>Assertion types (including link relationships) should be
first class objects in the sense that they should be able to
be defined in addressable resources and referred to by the
address of that resource A in { u }
</li>
<li>The development of new assertion types and link
relationships should be done in a consistent manner so that
these sorts of assertions can be treated generically by people
and by software.
</li>
</ol>
<hr />
<p>
<i>Rough from here on down</i>
</p>
<h3 id="Label">
Label syntax: Assertions about a common subject
</h3>
<p>
When labeling information, it is often useful to make a lot
of statements about the same object. It is also useful to be
able to make the same set of statements about a set of
resources. For example, the assertions
</p>
<pre>
(A1 u1 a b ... )
(A2 u1 c d )
(A3 u1 a f g h )
</pre>
<p>
might be written
</p>
<pre>
(for u1
(A1 a b ... )
(A2 c d )
(A3 a f g h )
)
</pre>
<p>
Therefore in the syntax of an actual assertion the subject is
implicit. This is just the case with RFC822 headers which
implicitly refer to the following body, and with HTML "HEAD"
element contents which implicitly refer to the containing
document. (Though notice there is a fundamental
difference, discussed <a href=
"w:/DesignIssues/temp.html#mesages">below</a>, between a
general label and a message header because the message header
is definitive.)
</p>
<p>
So it is wise to recognise the label as a case to
optimize specifically in the syntax. <em>[In RDF this
is indeed the case: the subject is established as a
context, and then many properties are given within that
context. -2000/9]</em>
</p>
<p>
Assertions, when the subject is implicit, are known as
attribute-value pairs as discussed above. Let's use the term
"label" for a set of assertions with the subject extracted.
Like the label on a jam jar, it contains information
but there must be something else (in this case its
placement on the jar) which tells you to what it applies.
(The PICS label in fact contained other information
too, including the subject and meta-metadata about the
authorship of the label.)
</p>
<p>
Local definition:
</p>
<table border="1" cellpadding="2">
<tbody>
<tr>
<td>
A label is a set of assertions with a common implicit
subject. In this architecture it is a set of
attribute-value pairs
</td>
</tr>
</tbody>
</table>
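<p>
The relation between a label and the full assertions it stands for
can be sketched as a pair of functions. The names "expand" and
"extract" are hypothetical, "u1" stands in for a real URI, and the
elided parameters of A1 in the example above are simply left out:
</p>
<pre>
```python
def expand(subject, label):
    """Turn a label (assertions without a subject) into full assertions."""
    return {(a, subject) + tuple(params) for (a, *params) in label}

def extract(assertions, subject):
    """Collect the assertions about one subject into a label."""
    return {(a,) + tuple(params) for (a, u, *params) in assertions if u == subject}

# Mirrors (for u1 (A1 a b) (A2 c d) (A3 a f g h)); "u1" stands for a URI.
label = {("A1", "a", "b"), ("A2", "c", "d"), ("A3", "a", "f", "g", "h")}
full = expand("u1", label)

assert ("A2", "u1", "c", "d") in full
assert extract(full, "u1") == label   # the round trip loses nothing
```
</pre>
<p>
Applying the same label to a set of resources is then a matter of
calling "expand" once per subject and taking the union of the
results.
</p>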
<p>
<i>(There is a convention that you can write "Jam" on a jam
jar label. You don't write "Jam jar" or "Jam Jar
label". Even though I once saw a label on a cardboard
box with the words "Equipment shipping box label" on it!)</i>
</p>
<h3>
Authorship of Metadata
</h3>
<p>
It follows from the fact that metadata is data that there can
be metadata about it. Some of this metadata becomes
crucial when we consider a trust model. The logic we
need includes the author of metadata:
</p>
<p>
p1: (A u1 . . .)
</p>
<p>
where p1 is, in a system with low trust, the author as
stated, but in a cryptographically secure system is a
principal represented by a key.
</p>
<p>
On the web, the granularity of information is the resource.
Authorship and access control generally use this granularity.
Therefore, typically, the trust one places in an assertion is a
function of the document which asserted it, and the metadata
about that document. However, when information is then
combined from many resources, one needs a language which
allows the source of the original to be recorded. Like
blockquote in HTML, this separates the data itself from the
resource, so the resource does not assert the data directly but
asserts that it was asserted.
</p>
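<p>
A rough sketch of this distinction, assuming a very simple trust
model in which a reader keeps a set of trusted principals: the types
"Statement" and "Quoted", the function "accepted", and all names and
URIs are hypothetical. A quoting resource is not treated as the
author of what it quotes:
</p>
<pre>
```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Statement:
    principal: str              # p1: who makes the assertion (name or key id)
    assertion: Tuple[str, ...]  # (A, u1, ...)

@dataclass(frozen=True)
class Quoted:
    """A resource reporting that a statement was made, without asserting it."""
    reported_by: str
    original: Statement

def accepted(items, trusted):
    """Keep assertions whose original principal is in the trusted set."""
    result = set()
    for item in items:
        source = item.original if isinstance(item, Quoted) else item
        if source.principal in trusted:
            result.add(source.assertion)
    return result

# Hypothetical principals and URIs.
s1 = Statement("alice", ("suitable-for-children", "http://example.org/page"))
s2 = Statement("mallory", ("suitable-for-children", "http://example.org/other"))
collected = [Quoted("aggregator", s1), Quoted("aggregator", s2)]

print(accepted(collected, trusted={"alice"}))
```
</pre>
<p>
Trusting the aggregator alone accepts nothing here, because the
aggregator only asserts that the statements were made.
</p>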
<h2>
Analysing labels
</h2>
<p>
See <a href="Labels.html">Analysing PICS labels as generic
Metadata</a>
</p>
<p>
where we look at PICS labels and try to sift out the actual
semantics of them. This is a thought experiment generating
requirements. The conclusions are that information such as
authorship and date information in fact form a tree of
assertions about assertions, and it is important to be clear
about the structure of that tree. The notion of a message is
brought up there too, but not followed up as it is not
germane to the discussion at this point.
</p>
<h2>
Algebraic Manipulations
</h2>
<p>
If you can make assumptions about the properties of labels
then you can manipulate them, possibly without knowing
everything about their meaning. Properties such as
commutativity, transitivity and associativity would be very
useful to have easily available: perhaps in the syntax, or
failing that in the schema.
</p>
<p>
[See <a href="Semantic.html">Semantic Web roadmap</a> for
higher levels of logic]
</p>
<p>
For example, given a label saying a pair of jeans has a 32
inch waist and a price of $28, I can deduce a label which
just has the price of $28. But given a label which says
that the punishment for the crime is 2 months in jail and a
fine of $3000, I can't deduce one that says that
the punishment is 2 months in jail.
</p>
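<p>
The difference between the two cases can be sketched as follows,
assuming the schema (or the syntax) supplies a flag saying whether the
attributes of a label are independent. The function "project", the
"independent" flag and the attribute names are hypothetical:
</p>
<pre>
```python
def project(label, wanted, independent):
    """Derive a smaller label holding only the wanted attributes.

    This is sound only when the schema says the attributes are independent
    assertions joined by AND; otherwise the parts may not stand alone.
    """
    if not independent:
        raise ValueError("attributes are not independent; cannot drop any")
    return {name: value for name, value in label.items() if name in wanted}

# Hypothetical attribute names for the two cases in the text.
jeans = {"waist-inches": 32, "price-usd": 28}
print(project(jeans, {"price-usd"}, independent=True))

punishment = {"jail-months": 2, "fine-usd": 3000}
try:
    project(punishment, {"jail-months"}, independent=False)
except ValueError as err:
    print("refused:", err)
```
</pre>
<p>
Software that knows only the flag, and nothing else about the
vocabulary, can still decide whether the smaller label is a valid
deduction.
</p>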
<p>
A typical use of metadata will be to provide a statement
along with its proof to be verified by another party.
Being able to process these things efficiently and with
limited knowledge will be crucial.
</p>
<p>
The most practical way to do this is to create a basic
common vocabulary for the logical functions. Sometimes known
as the "RDF upper layers", these are mentioned in the
<a href="Semantic.html">note on the Semantic Web.</a>
</p>
<h4>
Ordered/Unordered
</h4>
<p>
The <a href="#independent">axiom of independence of
assertions</a> above gives us that in any set of assertions,
as assertions are independently true, specific assertions may
be removed or reordered, leaving the document just as valid
(though possibly less informative).
</p>
<p>
Examples of unordered things currently are: RFC822 message
header lines, SGML attributes. Examples of ordered things
are: HTTP header lines and SGML elements.
</p>
<p>
Do we need a form in which we can make an assertion which has
many parameters which are in fact not mutable in any way?
</p>
<h2>
Summary of Requirements
</h2>
<p>
There must be ways of representing the above things
(messages, labels, specifying labels, and statements) and of
distinguishing between them.
</p>
<p>
As much as possible of the syntax and semantics should be
able to be acquired by reference from a metadata document.
</p>
<p>
It must be possible to mix multiple vocabularies within the
same scope.
</p>
<p>
The syntax and structure should be such that as many
manipulations as possible can be done without having to know
the semantics of the vocabulary in use.
</p>
<p>
A common vocabulary for basic logic and knowledge
representation functionality will be required.
</p>
<hr />
<h2>
References
</h2>
<p>
<a name="PICS" id="PICS">PICS</a> - The PICS project was a
project to define standards for interchange of endorsement
information, aimed at the content filtering problem. See the
PICS home page.
</p>
<hr />
<address>
Tim BL, January 1997
<p>
Last edit $Date: 2009/08/27 21:38:08 $
</p>
</address>
</body>
</html>