<?xml version="1.0" encoding="UTF-8"?><!--*- nxml -*-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Gleaning Resource Descriptions from Dialects of Languages
(GRDDL)</title>
<meta name="RCS-Id" content="$Id: Overview.html,v 1.3 2007/09/10 14:49:31 connolly Exp $"/>
<style type="text/css">
.issue {
background-color:#dfd;
border: thin solid black;
color:black;
}
.assertion {
background-color:#dfd;
color:black;
}
.ed {
background-color:#fdf;
border: thin solid black;
color:black;
}
.postponed {
background-color:#fee;
border: thin dotted black;
color:black;
}
.tech {
background-color:#fdd;
border: thin solid black;
color:black;
font-size: 80%
}
.designSketch {
background-color:#fdf;
border: thin solid black;
color:black;
}
.illustration {
margin-left:auto;
margin-right:auto;
text-align:center;
}
.example {
margin-left:auto;
margin-right:auto;
padding-top:0.5em;
padding-bottom:0.5em;
width:85%;
border-top:thin dashed black;
border-bottom:thin dashed black;
}
td pre { font-size: smaller }
dfn { font-weight: bold }
/* try to get coherence between the rule boxes */
table tr td.assertion { width: 500px }
</style>
<link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/base" />
<link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-REC" />
</head>
<body xml:lang="en" lang="en">
<div class="head">
<a href="http://www.w3.org/"><img alt="W3C"
src="http://www.w3.org/Icons/w3c_home" height="48" width="72" /></a>
<h1>Gleaning Resource Descriptions from Dialects of Languages (GRDDL)</h1>
<h2>W3C Recommendation 11 September 2007</h2>
<dl>
<dt>This Version:</dt>
<dd><a href="http://www.w3.org/TR/2007/REC-grddl-20070911/">http://www.w3.org/TR/2007/REC-grddl-20070911/</a>
</dd>
<dt>Latest Version:</dt>
<dd><a href="http://www.w3.org/TR/grddl/">http://www.w3.org/TR/grddl/</a>
</dd>
<dt>Previous Version:</dt>
<dd><a href="http://www.w3.org/TR/2007/PR-grddl-20070716/">http://www.w3.org/TR/2007/PR-grddl-20070716/</a>
</dd>
<dt>Editor:</dt>
<dd><a
href="/People/Connolly/">Dan Connolly</a></dd>
<dt>Authors:</dt>
<dd>see <a href="#changes">Acknowledgments</a></dd>
</dl>
<p>Please refer to the <a
href="http://www.w3.org/2001/sw/grddl-wg/grddl-errata"><strong>errata</strong></a>
for this document, which may include some normative corrections.</p>
<p>See also <a href=
"http://www.w3.org/2003/03/Translations/byTechnology?technology=grddl"
><strong>translations</strong></a>.</p>
<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2006-2007 <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.</p>
</div>
<hr />
<div><h2>Abstract</h2>
<p>GRDDL is a mechanism for <b>G</b>leaning <b>R</b>esource
<b>D</b>escriptions from <b>D</b>ialects of <b>L</b>anguages. This
GRDDL specification introduces markup based on existing standards for
declaring that an XML document includes data compatible with the
Resource Description Framework (RDF) and for linking to algorithms
(typically represented in XSLT), for extracting this data from the
document.</p>
<p>The markup includes a namespace-qualified attribute for use
in general-purpose XML documents and a profile-qualified
link relationship for use in valid XHTML documents. The GRDDL
mechanism also allows an XML namespace document
(or XHTML profile document) to declare that every document associated
with that namespace (or profile) includes gleanable data and for
linking to an algorithm for gleaning the data.</p>
<p>A corresponding <a href="#usecases">GRDDL Use Case Working
Draft</a> provides motivating examples. A <a href="#primer">GRDDL
Primer</a> demonstrates the mechanism on XHTML documents which include
widely-deployed dialects known as microformats. A
<a href="#GRDDL-TESTS">GRDDL Test Cases</a> document illustrates
specific issues in this design and provides materials to
aid in test-driven development of GRDDL-aware agents.
</p>
</div>
<div>
<h2 id="status">Status of This Document</h2>
<p><em>This section describes the status of this document at the time
of its publication. Other documents may supersede this document. A
list of current W3C publications and the latest revision of this
technical report can be found in the <a
href="http://www.w3.org/TR/">W3C technical reports index</a> at
http://www.w3.org/TR/.</em></p>
<p>This is a <a
href="http://www.w3.org/2005/10/Process-20051014/tr.html#RecsW3C">W3C
Recommendation</a>.</p>
<p>This document has been reviewed by W3C Members, by software
developers, and by other W3C groups and interested parties, and is
endorsed by the Director as a W3C Recommendation. It is a stable
document and may be used as reference material or cited from another
document. W3C's role in making the Recommendation is to draw attention
to the specification and to promote its widespread deployment. This
enhances the functionality and interoperability of the Web.</p>
<p>Comments on this document should be sent to
<a
href="mailto:public-grddl-comments@w3.org">public-grddl-comments@w3.org</a>,
a mailing list with a <a href=
"http://lists.w3.org/Archives/Public/public-grddl-comments">public
archive</a>.</p>
<p>This document was produced by the <a
href="http://www.w3.org/2001/sw/grddl-wg/">GRDDL Working Group</a>,
which is part of the <a href="http://www.w3.org/2001/sw/Activity">W3C
Semantic Web Activity</a>. The first release of this document as a
Working Draft was 24 Oct 2006 and the Working Group has
addressed a number of <a href=
"http://lists.w3.org/Archives/Public/public-grddl-comments/">comments
received</a> and <a
href="http://www.w3.org/2001/sw/grddl-wg/issues">issues</a> since then.
<span class="assertion" id="sotd_ex">Normative assertions are marked
up in this way.</span>
</p>
<p id="implExp">The Working Group's <a
href="http://www.w3.org/2001/sw/grddl-wg/td/test_results">implementation
report</a> demonstrates that the goals for interoperable
implementations, set in the <a
href="http://www.w3.org/TR/2007/CR-grddl-20070502/">May 2007 Candidate
Recommendation draft of this document</a>, were achieved.</p>
<p>GRDDL is intended to contribute to addressing Web Architecture
issues such as <a href=
"http://www.w3.org/2001/tag/issues.html?type=1#RDFinXHTML-35"
>RDFinXHTML-35</a>, <a href=
"http://www.w3.org/2001/tag/issues.html?type=1#namespaceDocument-8"
>namespaceDocument-8</a>, and
<a href=
"http://www.w3.org/2001/tag/issues.html?type=1#xmlFunctions-34"
>xmlFunctions-34</a> as well as issues postponed by the RDF Core
working group such as <a href=
"http://www.w3.org/2000/03/rdf-tracking/#rdfms-validating-embedded-rdf"
>rdfms-validating-embedded-rdf</a> and <a href=
"http://www.w3.org/2000/03/rdf-tracking/#faq-html-compliance"
>faq-html-compliance</a>.
<span class="postponed">In particular, the GRDDL Working Group has
postponed <a
href="http://www.w3.org/2001/sw/grddl-wg/issues#issue-faithful-infoset">issue-faithful-infoset</a>,
and anticipates that the resolution of TAG issue <a
href="http://www.w3.org/2001/tag/issues.html?type=1#xmlFunctions-34"
>xmlFunctions-34</a> will provide further clarification and
guidance.</span>
</p>
<p> This document was produced by a group operating
under the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>.
W3C maintains a
<a rel="disclosure"
href="http://www.w3.org/2004/01/pp-impl/39407/status">
public list of any patent disclosures</a> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>. </p>
<p>The <span id="issues">issues appendix</span> that used to
be part of this draft has been moved to a <a href="http://www.w3.org/2001/sw/grddl-wg/issues">Working
Group issues list</a>; specifically:
<a id="issue-whichlangs" href=
"http://www.w3.org/2001/sw/grddl-wg/issues#issue-whichlangs"
>issue-whichlangs</a>,
<a id="issue-output-formats" href=
"http://www.w3.org/2001/sw/grddl-wg/issues#issue-output-formats"
>issue-output-formats</a>,
<a id="issue-base-param" href=
"http://www.w3.org/2001/sw/grddl-wg/issues#issue-base-param"
>issue-base-param</a>,
<a id="issue-tx-element" href=
"http://www.w3.org/2001/sw/grddl-wg/issues#issue-tx-element"
>issue-tx-element</a>,
<a id="issue-html-nsdoc" href=
"http://www.w3.org/2001/sw/grddl-wg/issues#issue-html-nsdoc"
>issue-html-nsdoc</a>,
<a id="issue-faithful-infoset" href=
"http://www.w3.org/2001/sw/grddl-wg/issues#issue-faithful-infoset"
>issue-faithful-infoset</a>,
<a id="issue-mt-ns" href=
"http://www.w3.org/2001/sw/grddl-wg/issues#issue-mt-ns"
>issue-mt-ns</a>,
<a id="issue-conformance-labels" href=
"http://www.w3.org/2001/sw/grddl-wg/issues#issue-conformance-labels"
>issue-conformance-labels</a>,
<a id="issue-http-header-links" href=
"http://www.w3.org/2001/sw/grddl-wg/issues#issue-http-header-links"
>issue-http-header-links</a>
</p>
<!--
<p>
<span class="ed">Each assertion bears an ID. An index of rules would
be nice to have; in the interest of stability, the editor is not
adding it just yet. An <a href="spec_lean">extract of normative
material only</a> has been put on hold indefinitely.</span>
</p>
-->
</div>
<div>
<h2 id="toc">Table of Contents</h2>
<ol>
<li><a href="#intro">Introduction</a></li>
<li><a href="#grddl-xml">Adding GRDDL to well-formed XML</a></li>
<li><a href="#ns-bind">GRDDL for XML Namespaces</a></li>
<li><a href="#grddl-xhtml">Using GRDDL with valid XHTML</a></li>
<li><a href="#profile-bind">GRDDL for HTML Profiles</a></li>
<li><a href="#txforms">GRDDL Transformations</a></li>
<li><a href="#sec_agt">GRDDL-Aware Agents</a></li>
<li><a href="#sec">Security Considerations</a></li>
<li><a href="#grddlvocab">The GRDDL Vocabulary</a></li>
<li><a href="#bib">References</a></li>
</ol>
<ul>
<li>Appendix: <a href="#stylepi">Transformations for Styling versus
data extraction</a></li>
<li>Appendix: <a href="#base_misc">Base IRI considerations</a></li>
<li>Appendix: <a href="#changes">Acknowledgements and Change History</a></li>
</ul>
<div>Linked documents:</div>
<ul>
<li>Appendix: <a id="mechspec" href="spec_rules"
>About the Mechanical Rules</a></li>
</ul>
</div>
<div>
<h2 id="intro"><span class="gen">1. </span>Introduction: Data and Documents</h2>
<p>There are many domain-specific languages ("dialects") used in
practice among the many XML documents on the web. There are dialects
of XHTML, XML and RDF that are used to represent everything from
poetry to prose, purchase orders to invoices, spreadsheets to
databases, schemas to scripts, and linked lists to ontologies.</p>
<p>While this breadth of expression is quite liberating, inspiring new
dialects to represent information, it can
be a barrier to understanding across different domains or
fields. How, for example, does software discover the author of a poem,
a spreadsheet and an ontology? And how can software determine whether
authors of each are in fact the same?</p>
<p>The following are examples of how the same musical work might be
described in different XML dialects:</p>
<dl>
<dt>iTunes Music Library</dt>
<dd>
<pre>
&lt;key&gt;Artist&lt;/key&gt;
&lt;string&gt;The Jimi Hendrix Experience&lt;/string&gt;
&lt;key&gt;Album&lt;/key&gt;
&lt;string&gt;Are You Experienced?&lt;/string&gt;
</pre>
</dd>
<dt>Audioscrobbler</dt>
<dd>
<pre>
&lt;album&gt;
  &lt;artist mbid=""&gt;The Jimi Hendrix Experience&lt;/artist&gt;
  &lt;name&gt;Are You Experienced?&lt;/name&gt;
  ...
&lt;/album&gt;
</pre>
</dd>
<dt>Atom</dt>
<dd>
<pre>
&lt;entry ... &gt;
  &lt;title&gt;Are You Experienced?&lt;/title&gt;
  &lt;author&gt;
    &lt;name&gt;The Jimi Hendrix Experience&lt;/name&gt;
  &lt;/author&gt;
  ...
&lt;/entry&gt;
</pre>
</dd>
<dt>Open Office</dt>
<dd><pre>
&lt;office:document-meta ... &gt;
  &lt;office:meta&gt;
    &lt;dc:title&gt;Are You Experienced?&lt;/dc:title&gt;
    &lt;meta:initial-creator&gt;
      The Jimi Hendrix Experience
    &lt;/meta:initial-creator&gt;
    &lt;dc:creator&gt;The Jimi Hendrix Experience&lt;/dc:creator&gt;
  &lt;/office:meta&gt;
&lt;/office:document-meta&gt;
</pre>
</dd>
</dl>
<p>Although the examples above are obviously encodings of the same information,
there remains no clear mechanism through which computer software
might be able to determine this connection.</p>
<h3 id="intro_rdf">Resource Descriptions</h3>
<p>The Resource Description Framework<a href="#RDFC04">[RDFC04]</a>
provides a standard for making statements about resources in the form
of a subject-predicate-object expression. One way to represent the
fact "<cite>Are You Experienced?</cite>'s artist is The Jimi Hendrix
Experience" in RDF would be as a triple whose subject is <cite>Are You
Experienced?</cite>, whose predicate is "has artist," and whose object
is The Jimi Hendrix Experience. The predicate "has artist" expresses
a relationship between the subject (Are You Experienced?) and the
object (The Jimi Hendrix Experience). Using URIs to uniquely identify
the album, the artist and even the relationship would facilitate
software design because not everyone knows The Jimi Hendrix Experience
or even spells its name consistently.</p>
<p>Here's the information contained in the XML fragments above, this
time expressed as RDF:</p>
<pre class="example">
&lt;rdf:RDF
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"&gt;
  &lt;rdf:Description rdf:about=
    "http://musicbrainz.org/mm-2.1/album/6b050dcf-7ab1-456d-9e1b-c3c41c18eed2"&gt;
    &lt;dc:title&gt;Are You Experienced?&lt;/dc:title&gt;
    &lt;foaf:maker&gt;
      &lt;foaf:Agent rdf:about=
        "http://musicbrainz.org/mm-2.1/artist/33b3c323-77c2-417c-a5b4-af7e6a111cc9"&gt;
        &lt;foaf:name&gt;The Jimi Hendrix Experience&lt;/foaf:name&gt;
      &lt;/foaf:Agent&gt;
    &lt;/foaf:maker&gt;
  &lt;/rdf:Description&gt;
&lt;/rdf:RDF&gt;
</pre>
<p>Both the entities (subject and object resources) and relationships
(predicates) are identified using unambiguous URIs.</p>
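<p><em>As an informal illustration only, the four statements serialized by
the RDF/XML above can be written out as subject-predicate-object tuples. The
Python names below (<code>triples</code>, <code>objects</code>,
<code>maker_names</code>) are assumptions made for this sketch and are not
part of GRDDL or RDF; IRIs and literals are both modelled as plain strings,
which a real RDF library would keep distinct.</em></p>

```python
# Each RDF statement is modelled as a (subject, predicate, object) tuple.
# IRIs and literals are both plain strings here, which is a simplification.
ALBUM = "http://musicbrainz.org/mm-2.1/album/6b050dcf-7ab1-456d-9e1b-c3c41c18eed2"
ARTIST = "http://musicbrainz.org/mm-2.1/artist/33b3c323-77c2-417c-a5b4-af7e6a111cc9"
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

triples = [
    (ALBUM, DC + "title", "Are You Experienced?"),
    (ALBUM, FOAF + "maker", ARTIST),
    (ARTIST, RDF + "type", FOAF + "Agent"),
    (ARTIST, FOAF + "name", "The Jimi Hendrix Experience"),
]


def objects(graph, subject, predicate):
    """Return every object of the statements matching subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def maker_names(graph, work):
    """Follow foaf:maker from a work, then foaf:name from each maker."""
    names = []
    for maker in objects(graph, work, FOAF + "maker"):
        names.extend(objects(graph, maker, FOAF + "name"))
    return names


print(maker_names(triples, ALBUM))
```

<p><em>Because the album and the artist are named by IRIs rather than by
spelling, the lookup does not depend on how a particular dialect writes the
artist's name.</em></p>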
<p><em>Note that GRDDL follows HTML 4, RDF, and XML Schema in using
<em>Internationalized Resource Identifiers</em>, i.e. IRIs<a
class="norm" href="#rfc3987">[RFC3987]</a>. While in informal usage,
this specification uses the more familiar term <q>URI</q>
interchangeably with the recently standardized term <q>IRI</q>, the
formal rules use the relevant terms precisely.</em>
</p>
<p>The publishers of the XML above could also provide the same data in
RDF using RDF/XML or one of the other RDF syntaxes.
GRDDL provides a relatively inexpensive mechanism for bootstrapping
RDF content from uniform XML dialects, shifting the burden from
formulating RDF to creating transformation algorithms specifically for
each dialect.
</p>
<p>GRDDL works by associating transformations for an
individual document, either through direct inclusion of references or
indirectly through profile and namespace documents. Content authors
can nominate the transformations for producing RDF from their content
and use GRDDL to refer to them. </p>
<div><h3 id="sec_rend">Faithful Renditions</h3>
<p>By specifying a GRDDL transformation, the author of a document
states that the transformation will provide a faithful rendition in
RDF of information (or some portion of the information) expressed
through the XML dialect used in the source document.</p>
<p>Likewise, by specifying a GRDDL namespace transformation or profile
transformation, the creator of that namespace or profile states that
the transformation will provide a faithful RDF rendition of a class of
source documents which relate to that namespace or profile. A
namespace document or a profile document also provides a means for
its author to explain in prose the purpose of the transformation or
any policy statements.</p>
</div>
<div><h3 id="intro_spec">Preface and Companion Documents</h3>
<p>This GRDDL specification is a concise technical specification of
the GRDDL mechanism and its XML syntax. It specifies the GRDDL syntax
to use in valid XHTML and well-formed XML documents, as well as how to
encode GRDDL into namespaces and HTML profiles. The GRDDL
transformation link and security issues are also
discussed. Appendices provide links to extended examples and existing
software and services that employ GRDDL.</p>
<h4 id="intro_primer">GRDDL Primer</h4>
<p>The GRDDL Primer<a href="#primer">[primer]</a> is a step-by-step tutorial on
the GRDDL mechanism. It develops a number of examples from the
GRDDL Use Cases document to illustrate GRDDL techniques for
associating documents with transformations for extracting RDF.</p>
<h4 id="intro_uc">GRDDL Use Cases</h4>
<p>The use cases document<a href="#usecases">[usecases]</a> collects a
number of use cases with their goals and requirements for
GRDDL.
These use cases also illustrate how XML and XHTML documents can be
decorated with microformats, Embedded RDF or RDFa statements to support
GRDDL transformations in charge of extracting valuable data that can
then be used to automate a variety of tasks.</p>
<h4 id="intro_testcases">GRDDL Test Cases</h4>
<p>The GRDDL Test Cases<a class="inform" href="#GRDDL-TESTS">[GRDDL-TESTS]</a>
provides a collection of tests illustrating this specification.
Some of the tests may help clarify the intended
reading of the normative text.</p>
</div>
</div>
<div><h2 id="grddl-xml"><span class="gen">2. </span>Adding GRDDL to well-formed XML</h2>
<p>To associate a GRDDL transformation link with a well-formed XML
document, add to the root element a
<code>grddl</code> namespace declaration and a
<code>grddl:transformation</code> attribute whose value is an IRI
reference, or a space-separated list of IRI references, referring to
executable scripts or programs which are expected to transform the
source document into RDF. This method is suitable for use with any XML
dialect that can accommodate an extra namespace-qualified attribute on
the root element.</p>
<p>For example, this XML document,
located at
<tt>http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html</tt>,
is linked to two GRDDL transformations:</p>
<pre class="example">
&lt;html xmlns="http://www.w3.org/1999/xhtml"
  <b>xmlns:grddl='http://www.w3.org/2003/g/data-view#'</b>
  <b>grddl:transformation="glean_title.xsl
    http://www.w3.org/2001/sw/grddl-wg/td/getAuthor.xsl"</b>
&gt;
  &lt;head&gt;
    &lt;title&gt;Are You Experienced?&lt;/title&gt;
    <em>[...]</em>
&lt;/html&gt;
</pre>
<ol>
<li>It is linked to the transformation identified by
<tt>http://www.w3.org/2001/sw/grddl-wg/td/getAuthor.xsl</tt>.</li>
<li>To resolve the relative URI reference <tt>glean_title.xsl</tt>
to absolute form, we use the base URI of this XML element,
<tt>http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html</tt>.
Then this document is also linked to the GRDDL transformation
identified by the absolute form,
<tt>http://www.w3.org/2001/sw/grddl-wg/td/glean_title.xsl</tt>.</li>
</ol>
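<p><em>The two resolution steps just listed can be sketched informally in
Python. This is a minimal illustration, assuming the standard library's
<code>urllib.parse.urljoin</code> as a stand-in for the relative resolution
algorithm of RFC 3986; the variable names are chosen for this sketch
only.</em></p>

```python
from urllib.parse import urljoin

# Base URI of the root element: here simply the document's own location.
BASE = "http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html"

# The two references carried by the grddl:transformation attribute.
references = [
    "glean_title.xsl",
    "http://www.w3.org/2001/sw/grddl-wg/td/getAuthor.xsl",
]

# An absolute reference is returned unchanged; a relative one is
# resolved against the base URI.
resolved = [urljoin(BASE, ref) for ref in references]

for ref in resolved:
    print(ref)
```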
<div class="illustration">
<img src="figTitleAuthor.png" alt="diagram: link to multiple transformations" />
<p>extracting title and author information</p>
<small>(<a href="figTitleAuthor.svg">svg</a>)</small>
</div>
<p>As you will see in later sections, there are other ways to add GRDDL
to HTML documents, especially designed to leverage HTML's existing capabilities
and thereby overcome constraints imposed by the XML DTDs for some dialects of HTML.
See <a href="#grddl-xhtml">Using GRDDL with valid XHTML</a> and
<a href="#profile-bind">GRDDL for HTML Profiles</a>.
</p>
<p>The formal specification of this markup is given below. <em>An
informative mechanical version of each rule is given with the premise
and the conclusion written as SPARQL graph patterns<a
href="#SPARQL">[SPARQL]</a>. See the <a href="spec_rules">Mechanical
Rules</a> appendix for namespace prefix bindings and further
explanation.
These are included for those readers who find them helpful.
Other readers are encouraged to ignore them.
</em></p>
<table border="1">
<tr>
<th>Normative Statement</th><th>Mechanical Rule<br />(Informative)</th>
</tr>
<tr>
<td class="assertion" id="rule_GRDDL_transformation">
Given an XPath<a href="#XPATH">[XPATH]</a> root
node <var>N</var> with root element <var>E</var>,
if the expression
<pre>/*/@*[local-name()="transformation"
and namespace-uri()=
"http://www.w3.org/2003/g/data-view#"]</pre>
matches an attribute of
an element
<var>E</var>, then for each <a href="#stok">space-separated
token</a> <var>REF</var> in the value of that attribute, the resource
identified<a class="norm" href="#WEBARCH">[WEBARCH]</a> by the
absolute form (see section 5.2 Relative Resolution in <a class="norm"
href="#rfc3986">[RFC3986]</a>) of <var>REF</var> with respect to the
base IRI<a class="norm" href="#rfc3987">[RFC3987]</a>,<a class="norm" href="#XMLBASE">[XMLBASE]</a>
of <var>E</var> is a <dfn>GRDDL transformation</dfn> of
<var>N</var>.
<p id="stok">
<dfn>Space-separated tokens</dfn> are the maximal non-empty
subsequences not containing the whitespace characters #x9, #xA, #xD or
#x20.
</p>
</td>
<td>
<table class="rule">
<tr><td>
<pre>
(?N "/*") gspec:xpath ?E.
(?N """/*/@*[local-name()="transformation" and
namespace-uri()=
"http://www.w3.org/2003/g/data-view#"]""")
gspec:xpath [ fn:string ?V].
?V fn:normalize-space ?Vnorm.
(?Vnorm "[ \t\r\n]+") fn:tokenize [
list:member ?REF ].
?E fn:base-uri ?BASE.
(?REF ?BASE) fn:resolve-uri ?TXURI.
?TX log:uri ?TXURI.
</pre>
</td></tr>
<tr><td><hr /></td></tr>
<tr><td>
<pre>?N grddl:transformation ?TX.</pre>
</td></tr>
</table>
</td>
</tr>
</table>
<p>The <tt>glean_title.xsl</tt> transformation computes
the following RDF/XML document, given the XML document
above as input:</p>
<pre class="example">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="">
<dc:title>Are You Experienced?</dc:title>
</rdf:Description>
</rdf:RDF>
</pre>
<p>The graph serialized by that document is a <b>GRDDL result</b> of
the resource identified by
<tt>http://www.w3.org/2001/sw/grddl-wg/td/titleauthor.html</tt>. Note
that this serialization of the graph contains a relative URI reference
(in the value of the <tt>rdf:about</tt> attribute). The base IRI for
interpreting relative IRI references in a serialization of a
graph produced by a GRDDL transformation is the base IRI of the source
document.</p>
<p>The <tt>glean_title.xsl</tt> resource specifies a function from
XPath document nodes to RDF/XML documents, and hence to RDF graphs;
this function is called the <b>transformation property</b> of the XSLT
document. See the <a href="#txforms">GRDDL Transformations
section</a> for more details.</p>
<p>The general rule for using GRDDL with well-formed XML is:</p>
<table border="1">
<tr>
<td class="assertion" id="rule_result">
If an information resource (<a class="norm" href="#WEBARCH">[WEBARCH]</a>,
section 2.2) <var>IR</var>
is represented by an XML document with
an XPath root node <var>R</var>,
and <var>R</var> has a GRDDL transformation
with a <dfn>transformation property</dfn> <var>TP</var>,
and <var>TP</var> applied to <var>R</var> gives an
RDF Graph<a class="norm" href="#RDFC04">[RDFC04]</a>
<var>G</var>, then <var>G</var>
is a <dfn>GRDDL result</dfn> of <var>IR</var>.
</td>
<td>
<table class="rule">
<tr><td>
<pre>
?IR log:uri [ fn:doc ?R ].
?R grddl:transformation [ grddl:transformationProperty ?TP ].
?R ?TP ?G.
</pre>
</td>
</tr>
<tr><td><hr /></td></tr>
<tr>
<td>
<pre>
?IR grddl:result ?G .
</pre>
</td>
</tr>
</table>
</td>
</tr>
</table>
<p>The <tt>titleauthor.html</tt> resource has another GRDDL
result via the <tt>getAuthor.xsl</tt> transformation. These
results can be merged together into another result, by
this rule:</p>
<table border="1">
<tr>
<td class="assertion" id="rule_merge">
If <var>F</var> and <var>G</var> are <b>GRDDL results</b> of <var>IR</var>,
then the
<a class="norm"
href="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#defmerge">merge</a>
<a class="norm" href="#RDF-MT">[RDF-MT]</a>
of <var>F</var> and <var>G</var> is also a <b>GRDDL result</b> of <var>IR</var>.
</td>
<td>
<table class="rule">
<tr>
<td>
<pre>
?IR grddl:result ?F, ?G.
(?F ?G) log:conjunction ?H.</pre>
</td>
</tr>
<tr><td><hr /></td></tr>
<tr><td>
<pre>
?IR grddl:result ?H.</pre>
</td>
</tr>
</table>
</td>
</tr>
</table>
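<p>A minimal sketch of the merge rule follows. It assumes a
hypothetical in-memory representation in which a graph is a set of
<tt>(subject, predicate, object)</tt> string triples and blank nodes
are the terms that start with <tt>_:</tt>; real RDF libraries
represent terms differently. The point it illustrates is that a merge
keeps blank nodes from the two graphs apart before taking the
union.</p>

```python
def merge(f, g):
    """Merge two graphs given as sets of (s, p, o) string triples.

    Assumption of this sketch: blank nodes are the terms starting
    with "_:". Every blank node of g is given a fresh label so that
    it cannot collide with a blank node of f.
    """
    def bnodes(graph):
        return {t for triple in graph for t in triple if t.startswith("_:")}

    used = bnodes(f)
    renaming = {}
    counter = 0
    for b in sorted(bnodes(g)):
        fresh = "_:m%d" % counter
        while fresh in used:
            counter += 1
            fresh = "_:m%d" % counter
        renaming[b] = fresh
        used.add(fresh)
        counter += 1

    renamed = {tuple(renaming.get(t, t) for t in triple) for triple in g}
    return set(f) | renamed
```

<p>Triples without blank nodes that occur in both graphs appear once
in the merge, while two blank nodes that merely share a label remain
distinct.</p>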
</div>
<div><h2 id="ns-bind"><span class="gen">3. </span>Using GRDDL with XML Namespace Documents</h2>
<p>Transformations can be associated not only with individual
documents but also with whole dialects that share an XML namespace.
Any resource available for retrieval from a namespace URI is a
<dfn>namespace document</dfn>
(cf. section <a class="norm"
href="http://www.w3.org/TR/2004/REC-webarch-20041215/#namespace-document">4.5.4. Namespace
documents</a> in <a class="norm" href="#WEBARCH">[WEBARCH]</a>). For example, a
namespace document may have an XML Schema representation or an RDF
Schema representation, or perhaps both, using <a class="norm"
href="http://www.w3.org/TR/webarch/#def-coneg">content
negotiation</a>.</p>
<!-- er... the conneg link isn't really normative,
but the fixrefs.xsl script doesn't grok citing the
same document both normatively and informatively. -->
<p>To associate a GRDDL transformation with a whole dialect, include
a <code>grddl:namespaceTransformation</code> property in a GRDDL
result of the namespace document.</p>
<p id="sec_rdf_nsdoc">For example, consider this privacy policy written in P3Q, a
contrived analog to P3P<a href="#P3P">[P3P]</a>:</p>
<div class="example">
<pre><POLICIES xmlns="http://www.w3.org/2004/01/rdxh/p3q-ns-example">
<EXPIRY max-age="604800"/>
<em>...</em>
</pre></div>
<p>The namespace document for P3Q relates the <tt>grokP3Q.xsl</tt>
transformation to all P3Q documents:</p>
<div class="example">
<pre><rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dataview="http://www.w3.org/2003/g/data-view#">
<rdf:Description rdf:about="http://www.w3.org/2004/01/rdxh/p3q-ns-example">
<dataview:namespaceTransformation
rdf:resource="http://www.w3.org/2004/01/rdxh/grokP3Q.xsl"/>
</rdf:Description>
</rdf:RDF>
</pre></div>
<p>That is: every document whose root namespace name
is <tt>...p3q-ns-example</tt> has <tt>grokP3Q.xsl</tt>
as a <b>GRDDL transformation</b> implicitly, as illustrated
in this figure:</p>
<div class="illustration">
<img src="figGleanNsDoc.png" alt="diagram: glean via namespace" />
<br />transformation applied to namespace<br />
<small>(<a href="figGleanNsDoc.svg">svg</a>)</small></div>
<p>Some namespace documents, such as the XHTML namespace document
<tt>http://www.w3.org/1999/xhtml</tt>, have very many references to
them. If GRDDL-aware agents were to retrieve these documents every
time they processed a document referring to them, the origin servers
of those documents could become overloaded. GRDDL-aware agents
therefore should not retrieve such documents on every reference and
should retain some cache or local memory of the transformations those
documents indicate should be applied. To avoid misrepresentation of
published information, GRDDL-aware agents should ensure that this
local memory is up to date and should support user options to
configure or disable the cache. See also section <a class="norm"
href="http://www.w3.org/TR/webarch/#dereference-uri">3.1. Using a URI
to Access a Resource</a> of <a class="norm"
href="#WEBARCH">[WEBARCH]</a>.</p>
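<p>One way an agent might keep such local memory is sketched below.
The class name <tt>TransformationCache</tt>, the <tt>fetch</tt>
callable and the <tt>max_age</tt> freshness limit are all assumptions
made for illustration; this specification does not prescribe a
particular caching strategy, and an agent could equally revalidate
entries with conditional HTTP requests.</p>

```python
import time


class TransformationCache:
    """Remember which transformations a namespace document indicates.

    Hypothetical sketch: `fetch` is a caller-supplied callable that maps
    a namespace URI to a list of transformation URIs (for example by
    retrieving and processing the namespace document).
    """

    def __init__(self, fetch, max_age=3600.0, clock=time.monotonic, enabled=True):
        self._fetch = fetch
        self._max_age = max_age      # seconds an entry is considered fresh
        self._clock = clock
        self.enabled = enabled       # user option to disable the cache
        self._entries = {}           # namespace URI -> (timestamp, transformations)

    def transformations(self, ns_uri):
        now = self._clock()
        entry = self._entries.get(ns_uri)
        if self.enabled and entry is not None and now - entry[0] < self._max_age:
            return entry[1]          # fresh entry: no retrieval needed
        result = list(self._fetch(ns_uri))
        if self.enabled:
            self._entries[ns_uri] = (now, result)
        return result
```

<p>Setting <tt>enabled</tt> to false makes every lookup go back to
the origin server, which corresponds to the user option to disable
the cache mentioned above.</p>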
<p>The general case of namespace transformations is:</p>
<table border="1">
<tr>
<th>Normative Statement</th><th>Mechanical Rule<br />(Informative)</th>
</tr>
<tr>
<td class="assertion" id="rule_nstx">
If
<ul>
<li>an information resource <var>NSDOC</var>, identified by an IRI
<var>NS</var> has a <b>GRDDL result</b> that includes a triple
whose
<ul>
<li> subject is <var>NSDOC</var>, whose</li>
<li>predicate is the property
<tt><http://www.w3.org/2003/g/data-view#namespaceTransformation></tt>,
and whose</li>
<li>object is <var>TX</var>,</li>
</ul>
</li>
<li>and an information resource
<var>IR</var> has an XML representation with
root node <var>NODE</var> and with a root element
with a namespace name <var>NS</var>,</li>
</ul> then <var>TX</var> is a <b>GRDDL
transformation</b> of <var>NODE</var>.
</td>
<td>
<table class="rule">
<tr><td>
<pre>
?NSDOC log:uri ?NS;
grddl:result [
log:includes [
rdf:subject ?NSDOC;
rdf:predicate grddl:namespaceTransformation;
rdf:object ?TX]].
?IR log:uri [ fn:doc ?NODE].
(?NODE "/*") gspec:xpath ?E.
?E fn:namespace-uri ?NS.
</pre>
</td></tr>
<tr><td><hr /></td></tr>
<tr><td>
<pre>
?NODE grddl:transformation ?TX.
</pre>
</td></tr>
</table>
</td>
</tr>
</table>
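<p>The namespace rule can be sketched as follows, under stated
assumptions: a GRDDL result of each namespace document is already
available as a set of <tt>(subject, predicate, object)</tt> string
triples, held in a hypothetical dictionary keyed by namespace IRI. How
those results are obtained (retrieval, transformation, parsing) is
outside this sketch.</p>

```python
import xml.etree.ElementTree as ET

NS_TRANSFORMATION = "http://www.w3.org/2003/g/data-view#namespaceTransformation"


def root_namespace(xml_text):
    """Return the namespace name of the root element, or "" if it has none."""
    tag = ET.fromstring(xml_text).tag
    # ElementTree spells qualified names as "{namespace}local".
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def namespace_transformations(xml_text, results):
    """Return transformations indicated by the root namespace document.

    `results` is a hypothetical mapping from a namespace IRI to a GRDDL
    result of its namespace document, given as a set of string triples.
    """
    ns = root_namespace(xml_text)
    graph = results.get(ns, set())
    return sorted(o for (s, p, o) in graph
                  if s == ns and p == NS_TRANSFORMATION)
```

<p>For the P3Q policy above, with the triple from the P3Q namespace
document in <tt>results</tt>, the sketch yields the
<tt>grokP3Q.xsl</tt> transformation.</p>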
<p>Note that as a base case, the result of parsing an RDF/XML
document is a GRDDL result of that document:</p>
<table border="1">
<tr>
<th>Normative Statement</th><th>Mechanical Rule<br />(Informative)</th>
</tr>
<tr>
<td class="assertion" id="rule_rdfxbase">
If an information resource <var>IR</var> is represented
by a
<a class="norm" href="http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/#dfn-conforming-rdf-xml-document">conforming RDF/XML document</a><a href="#RDFX">[RDFX]</a>,
then the RDF graph represented by that document
is a <dfn>GRDDL result</dfn> of <var>IR</var>.
</td>
<td>
<table class="rule">
<tr><td>
<pre>
?IR log:uri [ fn:doc [ gspec:rdfParse ?G ] ].
</pre>
</td></tr>
<tr><td><hr /></td></tr>
<tr><td>
<pre>
?IR grddl:result ?G.
</pre>
</td></tr>
</table>
</td>
</tr>
</table>
<p>Note that while an <tt>application/rdf+xml</tt> media type is one
indication that a document is RDF/XML, section <a href=
"http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/#start"
>7.2.1 Grammar start</a> of <a href="#RDFX">[RDFX]</a> leaves open
"other means" by which an RDF/XML document may be identified. For the
purposes of the rule above, a root element whose local name is
<code>RDF</code> and whose namespace URI is
<code>http://www.w3.org/1999/02/22-rdf-syntax-ns#</code> is such a
means. For a case in point, see the <a href=
"http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests#grddlonrdf-xmlmediatype"
>grddlonrdf-xmlmediatype</a> test case.</p>
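<p>A check for the root-element means of identification described
above might look like the following sketch. The function name
<tt>looks_like_rdfxml</tt> is hypothetical, and the check only
inspects the root element; it does not establish that the document is
a conforming RDF/XML document.</p>

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def looks_like_rdfxml(xml_text):
    """True if the root element is RDF in the RDF namespace."""
    # ElementTree spells qualified names as "{namespace}local".
    return ET.fromstring(xml_text).tag == "{%s}RDF" % RDF_NS
```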
<div><h3 id="sec_xsd_nsdoc">Example: Using GRDDL with an XML Schema
namespace document</h3>
<p>A namespace transformation link may be discoverable by transforming
the namespace document itself. Note that this means that namespace
documents need not be written in RDF/XML directly.</p>
<p>Consider a purchase order that has a namespace document
represented in XML Schema, where the XML Schema bears
a <tt>data-view:transformation</tt>
attribute licensing extraction of statements that include
<tt>namespaceTransformation</tt> statements:</p>
<div class="example">
<pre>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns="http:.../Order-1.0"
targetNamespace="http:.../Order-1.0"
version="1.0"
...
xmlns:data-view="http://www.w3.org/2003/g/data-view#"
data-view:transformation="http://www.w3.org/2003/g/embeddedRDF.xsl" >
<xsd:element name="Order" type="OrderType">
<xsd:annotation>
<xsd:documentation>This element is the root element.</xsd:documentation>
</xsd:annotation>
...
<xsd:annotation>
<xsd:appinfo>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://www.w3.org/2003/g/po-ex">
<data-view:namespaceTransformation
rdf:resource="grokPO.xsl" />
</rdf:Description>
</rdf:RDF>
</xsd:appinfo>
</xsd:annotation>
<em>...</em>
</pre></div>
<p>Every purchase order using that schema as a namespace document
is linked to the <code>grokPO.xsl</code> transformation, as
illustrated below:</p>
<div class="illustration">
<img src="figGleanPO.png" alt="diagram: glean via namespace" />
<p>using GRDDL with an XML Schema</p>
<small>(<a href="figGleanPO.svg">svg</a>)</small></div>
</div>
</div>
<div><h2 id="grddl-xhtml"><span class="gen">4.</span> Using GRDDL with valid XHTML</h2>
<p>To accommodate the DTD-based syntax of XHTML<a
href="#XHTML">[XHTML]</a>, which precludes using attributes from
foreign namespaces, we use <code><a rel="ns-claim"
href="http://www.w3.org/2003/g/data-view">http://www.w3.org/2003/g/data-view</a></code>
as a metadata profile (cf. section <a class="norm"
href="http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#h-7.4.4.3">7.4.4.3
Meta data profiles</a> of <a href="#HTML4">[HTML4]</a>).</p>
<p>The general form of adding a GRDDL assertion to a valid XHTML
document is by specifying the GRDDL profile in the
<code>profile</code> attribute of the <code>head</code> element, and
<code>transformation</code> as the value of the <code>rel</code>
attribute of a <code>link</code> or <code>a</code> element whose
<code>href</code> attribute value is an IRI reference that refers to an
executable script or program which is expected to transform the source
document into RDF. This method is suitable for use
with valid XHTML documents which are constrained by an XML DTD.
</p>
<div><h3 id="sec_dubc_ex">An example Dublin Core META transformation</h3>
<p>For example, this document follows the conventions of
<a href="#RFC2731">[RFC2731]</a>, and it explicitly uses the GRDDL
profile and links to an XSLT transformation to
RDF/XML to signal that the transformation is a faithful
rendition:</p>
<pre class="example"><html xmlns="http://www.w3.org/1999/xhtml">
<head <b>profile="http://www.w3.org/2003/g/data-view"</b>>
<title>Some Document</title>
<link <b>rel="transformation"</b>
href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
<meta name="DC.Subject"
content="ADAM; Simple Search; Index+; prototype" />
...
</head>
...
</html></pre>
<p>The figure below shows the source document, the
<tt>dc-extract.xsl</tt> transformation, and the GRDDL result:</p>
<div class="illustration">
<img src="figGlean.png" alt="diagram: link to transformation" />
<p>Decoding HTML meta-data to RDF</p>
<small>(<a href="figGlean.svg">svg</a>)</small></div>
<p>This is what the data looks like in RDF/XML:</p>
<pre class="example"><rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="">
<dc:subject>ADAM; Simple Search; Index+; prototype</dc:subject>
</rdf:Description>
</rdf:RDF></pre>
</div>
<div><h3 id="sec_multi">Multiple transformations in XHTML</h3>
<p>An XHTML document may conform to a number of dialects
simultaneously and link to more than one GRDDL transformation. However,
since the <code>href</code> attribute of the <code>link</code> and
<code>a</code> elements accept only a single IRI reference, multiple
instances of these elements must be used to assert multiple links:</p>
<div class="example">
<pre><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://www.w3.org/2003/g/data-view">
<title>Joe Lambda's Home page [an example of RDF in XHTML]</title>
<link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokFOAF.xsl" />
<link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokCC.xsl" />
<link rel="transformation" href="http://www.w3.org/2003/12/rdf-in-xhtml-xslts/grokGeoURL.xsl" />
...
</pre></div>
<div class="illustration">
<img src="figMultiTxform.png" alt="diagram: link to multiple transformations" />
<p>multiple transformations</p>
<small>(<a href="figMultiTxform.svg">svg</a>)</small>
</div>
</div>
<div><h3 id="prof_rules">Rules for GRDDL with valid XHTML</h3>
<p>The general rule is:</p>
<table border="1">
<tr><td class="assertion" id="rule_tlrel">
Given XPath root node <var>N</var>, if
<var>N</var> has <a href="#rule_metadata_profile_name">metadata profile name</a>
<tt>http://www.w3.org/2003/g/data-view</tt>, then
for each <tt>a</tt> and <tt>link</tt> descendant element <var>E</var>
whose <a href=
"http://www.w3.org/TR/1999/REC-html401-19991224/struct/links.html#adef-rel">
<tt>rel</tt>
attribute</a><a class="norm" href="#HTML4">[HTML4]</a> has
<tt>transformation</tt> as one of its <a href="#stok">space-separated
values</a>,
the resource identified by the absolute form of the
<tt>href</tt> attribute with respect to the base IRI of <var>E</var>
is a <dfn>GRDDL transformation</dfn> of <var>N</var>.
</td>
<td>
<table class="rule">
<tr><td>
<pre>
?N gspec:profileName "http://www.w3.org/2003/g/data-view".
(?N
""".//*[namespace-uri()="http://www.w3.org/1999/xhtml" and
(local-name() = "a"
or local-name() = "link")]"""
) gspec:xpath ?E.
(?E "@rel") gspec:xpath [ fn:string [
fn:normalize-space ?E_REL ]].
(?E_REL "[ \t\r\n]+") fn:tokenize [
list:member "transformation" ].
(?E "@href") gspec:xpath [ fn:string ?T_REF ].
?E gspec:htmlBase ?BASE.
(?T_REF ?BASE) fn:resolve-uri ?TURI.
?T log:uri ?TURI.
</pre>
</td>
</tr>
<tr><td><hr /></td></tr>
<tr>
<td>
<pre>
?N grddl:transformation ?T.
</pre>
</td>
</tr>
</table>
</td>
</tr>
</table>
<p>Note that the base IRI of an element node in an XHTML document may
be influenced by factors such as a <a
href="http://www.w3.org/TR/1999/REC-html401-19991224/struct/links.html#edef-BASE"><tt>base</tt>
element</a><a class="norm" href="#HTML4">[HTML4]</a> or the <a
href="http://gbiv.com/protocols/uri/rev-2002/rfc2396bis.html#base-retrieval">retrieval
URI</a><a href="#rfc3986">[RFC3986]</a>. See the <a href="#base_misc">Base IRI considerations</a> appendix and test cases such as <a
href="http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests#htmlbase1">htmlbase1</a>
for further clarification.</p>
<p>The rule above depends on the following formalization of
metadata profiles in XHTML:</p>
<table border="1">
<tr><td class="assertion" id="rule_metadata_profile_name">
Given an XPath root node <var>N</var> of an XHTML document
(that is, an XML document whose root element has
a local name of <tt>html</tt> and a namespace name of
<tt>http://www.w3.org/1999/xhtml</tt>)
for each <a href="#stok">space-separated
token</a> <var>REF</var> in the value of the <tt>profile</tt>
attribute<a class="norm" href="#HTML4">[HTML4]</a>
of the <tt>head</tt> element <var>E</var>,
the absolute form of <var>REF</var> with respect to the
base IRI of <var>E</var> is a <dfn>metadata profile name</dfn> of
<var>N</var>.
</td>
<td>
<table class="rule">
<tr><td>
<pre>
(?N
"""
*[local-name()="html" and
namespace-uri()="http://www.w3.org/1999/xhtml"] /
*[local-name()="head" and
namespace-uri()="http://www.w3.org/1999/xhtml"]""")
gspec:xpath ?E.
(?E "@profile") gspec:xpath [ fn:string ?V ].
?E fn:base-uri ?BASE.
?V fn:normalize-space ?Vnorm.
(?Vnorm "[ \t\r\n]+") fn:tokenize [ list:member ?P_REF ].
(?P_REF ?BASE) fn:resolve-uri ?PROFID.
</pre>
</td>
</tr>
<tr><td><hr /></td></tr>
<tr>
<td>
<pre>
?N gspec:profileName ?PROFID.
</pre>
</td>
</tr>
</table>
</td></tr>
</table>
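<p>The two rules of this section can be sketched together in Python.
The function names are hypothetical, the caller is assumed to supply
the base IRI, and the sketch deliberately ignores the HTML
<tt>base</tt> element, so it is not a complete treatment of base IRI
determination.</p>

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML_NS = "http://www.w3.org/1999/xhtml"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def _tokens(value):
    """Space-separated tokens: maximal runs free of #x9, #xA, #xD, #x20."""
    return [t for t in re.split("[ \t\r\n]+", value or "") if t]


def profile_names(root, base_uri):
    """Metadata profile names of an XHTML document, per the rule above."""
    if root.tag != "{%s}html" % XHTML_NS:
        return []
    head = root.find("{%s}head" % XHTML_NS)
    if head is None:
        return []
    return [urljoin(base_uri, ref) for ref in _tokens(head.get("profile"))]


def xhtml_transformations(xhtml_text, base_uri):
    """Transformations linked via rel="transformation" under the GRDDL profile."""
    root = ET.fromstring(xhtml_text)
    if GRDDL_PROFILE not in profile_names(root, base_uri):
        return []
    wanted = ("{%s}a" % XHTML_NS, "{%s}link" % XHTML_NS)
    found = []
    for elem in root.iter():
        href = elem.get("href")
        if (elem.tag in wanted and href is not None
                and "transformation" in _tokens(elem.get("rel"))):
            found.append(urljoin(base_uri, href))
    return found
```

<p>A document that lacks the GRDDL profile yields no transformations
from this sketch, even if it contains links of type
<tt>transformation</tt>.</p>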
</div>
</div>
<div><h2 id="profile-bind"><span class="gen">5. </span>GRDDL for HTML Profiles</h2>
<p>XHTML provides the profile mechanism to link to the meaning of properties
and the set of legal values for those properties. As with namespace documents,
a profile document can effectively be written using XHTML with embedded RDF statements
and a GRDDL transformation to extract the definition of terms that are applicable.
Those terms can then be used in an XHTML document to convey profile-dependent meaning.
As discussed in
<a href="#grddl-xhtml">Using GRDDL with valid XHTML</a>, the GRDDL profile can be used
with XHTML documents to apply GRDDL semantics over <code>link</code> elements where
the value of <code>rel</code> attribute is <code>transformation</code>.
This very powerful and flexible mechanism integrates well with
<a class="inform" href="http://microformats.org/wiki/faqs-for-rdf#Are_there_Schemas_for_Microformats.3F">microformat profiles</a><a class="inform" href="#MF-RDF-FAQ">[MF-RDF-FAQ]</a> which overlay the normally semantically-poor HTML markup.</p>
<p>The following diagram illustrates an XFN document<a class="inform"
href="#XFN">[XFN]</a>, <tt>friends.html</tt> associated with the
<tt>grokXFN.xsl</tt> transformation indirectly via an XFN profile.
</p>
<div class="illustration">
<img src="figGleanProfile.png" alt="diagram: transformation linked indirectly via profile" />
<p>indirection via profile</p>
<small>(<a href="figGleanProfile.svg">svg</a>)</small>
</div>
<p>Adding a GRDDL <code>profileTransformation</code> assertion to a
profile document is much like <a href="#ns-bind">adding a
<code>namespaceTransformation</code> assertion to a namespace
document</a>. For a dialect defined by a valid XHTML profile
document, add
<code>profile="http://www.w3.org/2003/g/data-view"</code> to the
<code>head</code> element and make a link of type
<code>profileTransformation</code> to the transformation of the
dialect.</p>
<p>The general rule is:</p>
<table border="1">
<tr>
<td class="assertion" id="rule_profiletrans">
If
<ul>
<li>an information resource <var>PDOC</var>, identified by an IRI
<var>PNAME</var> has a <b>GRDDL result</b> that includes a triple
whose
<ul>
<li> subject is <var>PDOC</var>, whose</li>
<li>predicate is the property
<tt><http://www.w3.org/2003/g/data-view#profileTransformation></tt>,
and whose</li>
<li>object is <var>TX</var>,</li>
</ul>
</li>
<li>and an information resource
<var>IR</var> has an XML representation with
XPath root node <var>NODE</var> that has a
<a href="#rule_metadata_profile_name">metadata profile name</a>
<var>PNAME</var>,</li>
</ul> then <var>TX</var> is a <b>GRDDL
transformation</b> of <var>NODE</var>.
</td>
<td>
<table class="rule">
<tr><td>
<pre>
?PDOC log:uri ?PNAME;
grddl:result [
log:includes [
rdf:subject ?PDOC;
rdf:predicate grddl:profileTransformation;
rdf:object ?TX]].
?IR log:uri [ fn:doc ?NODE].
?NODE gspec:profileName ?PNAME.
</pre>
</td></tr>
<tr><td><hr /></td></tr>
<tr><td>
<pre>
?NODE grddl:transformation ?TX.
</pre>
</td></tr>
</table>
</td>
</tr>
</table>
</div>
<div><h2 id="txforms"><span class="gen">6. </span>GRDDL Transformations</h2>
<p>As noted above, each GRDDL transformation specifies a
<b>transformation property</b>, a function from XPath document nodes
to RDF graphs. This function need not
be total; it may have a domain smaller than all XML document
nodes. For example, use of <tt>xsl:message</tt> with
<tt>terminate="yes"</tt> may be used to signal that the input is
outside the domain of the transformation.
</p>
<p>Developers of transformations should make available representations
in widely-supported formats. XSLT version 1<a class="inform"
href="#XSLT1">[XSLT1]</a> is the format most widely supported by GRDDL-aware
agents as of this writing, though XSLT2<a
href="#XSLT2">[XSLT2]</a> deployment is increasing.
While technically JavaScript, C, or virtually any other programming
language may be used to express transformations for GRDDL, XSLT is
specifically designed to express XML to XML transformations and has
some good safety characteristics; XQuery has similar characteristics
to XSLT, though XQuery is less widely used in GRDDL implementations
at the time of this writing.
</p>
<table border="1">
<tr>
<td class="assertion" id="rule_txprop">
If
<ul>
<li><var>RDFXML</var> is the root XPath node of a
<a class="norm" href="http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/#dfn-conforming-rdf-xml-document">conforming RDF/XML document</a><a href="#RDFX">[RDFX]</a>
that represents an RDF Graph <var>G</var>, and</li>
<li><var>R</var> is the root node of some XML document
and <var>TXNODE</var> is the root node of
an XSLT transformation<a class="inform"
href="#XSLT1">[XSLT1]</a>, and</li>
<li><var>RDFXML</var> is the root node of the
XSLT result tree when <var>TXNODE</var>
is applied to <var>R</var>, and</li>
<li><var>TXDOC</var> is an information
resource
with <em>transformation property</em>
<var>TP</var>
represented by an XML document
with root node <var>TXNODE</var>
</li>
</ul>
then <var>TP</var> relates <var>R</var> to <var>G</var>.
</td>
<td>
<table class="rule">
<tr><td>
<pre>
?RDFXML gspec:rdfParse ?G.
(?TXNODE ?R) gspec:resultTree ?RDFXML.
?TXDOC grddl:transformationProperty ?TP;
log:uri [fn:doc ?TXNODE].
</pre>
</td>
</tr>
<tr><td><hr /></td></tr>
<tr>
<td>
<pre>
?R ?TP ?G
</pre>
</td>
</tr>
</table>
</td>
</tr>
</table>
<p>The rule above covers the case of a <em>transformation
property</em> that relates an XPath document node to an RDF graph via
an RDF/XML document. Transformations may use other, unspecified,
mechanisms. For example, see <a
href="http://www.w3.org/2001/sw/grddl-wg/td/grddl-tests#atomttl1">test
<tt>#atomttl1</tt></a>, in which the <tt>media-type</tt> attribute
of the <tt>xsl:output</tt> element bears a "text/rdf+n3" value to
indicate a media type other than "application/rdf+xml". GRDDL agents
that can process such a media type can then produce an RDF graph in
accordance with the media type. Non-XSLT transforms may indicate the
RDF graph in some other, unspecified, fashion.
</p>
<div class="postponed">
<p>At present, when an information resource
is represented by an XML document, the
corresponding XPath data model may not be fully determined, depending
on, for example, whether an agent elaborates inclusions, parameter
entities, fixed and default attributes, or checks digital signatures.
Put another way, if an author takes responsibility for the information
in an XML document, for what information exactly is the author taking
responsibility? And how can the author ensure that a GRDDL
transformation is able to meet GRDDL's <a href="#sec_rend">Faithful
Rendition assurance</a>?
</p>
<p>This specification is silent on the question of which XML
processors are employed by or for GRDDL-aware agents. Whether or not
processing of XInclude, XML Validity, XML Schema Validity, XML
Signatures or XML Decryption takes place
is currently unspecified. However, this specification anticipates that
the resolution of TAG issue
<a href="http://www.w3.org/2001/tag/issues.html?type=1#xmlFunctions-34">
xmlFunctions-34
</a>
and the definition, by the
<a href="http://www.w3.org/XML/Processing/">XML Processing Model Working
Group</a>, of a default processing model will provide further
clarification and guidance, and GRDDL-aware agents are expected to
comply with such guidance if it is issued.
There is no universal expectation that an XSLT
processor will call on such processing before executing a GRDDL
transformation. Therefore, it is suggested that GRDDL transformations
be written so that they perform all expected pre-processing, including
processing of related DTDs, Schemas and namespaces. Such measures can
be avoided for documents which do not require such pre-processing to
yield a faithful infoset, that is, for documents which do not
reference XInclude, DTDs, XML Schemas and so on.</p>
<p>
Document authors, particularly XHTML document authors,
who wish their documents to be unambiguous when used with GRDDL
should avoid dependencies on an external <a
href="http://www.w3.org/TR/2006/REC-xml-20060816/#dt-doctype"
>DTD subset</a>;
specifically:
</p>
<ul>
<li>
Explicitly include the XHTML namespace declaration in an XHTML document,
or an appropriate namespace in an XML document.
</li>
<li>
Avoid use of entity references, except those listed
in <a href=
"http://www.w3.org/TR/2006/REC-xml-20060816/#sec-predefined-ent">
section 4.6 of the XML specification</a>.
</li>
<li>
And, more generally,
follow the rules
listed for <a
href="http://www.w3.org/TR/2006/REC-xml-20060816/#vc-check-rmd">
the standalone document</a> validity constraint.
</li>
</ul>
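<p>As a rough aid for the second point, an author could scan a
document for entity references other than the five predefined ones.
The sketch below is a simple textual check with a hypothetical
function name; it is an approximation, since it does not parse the
document and would also report matches inside comments or CDATA
sections.</p>

```python
import re

# The five entities predefined by the XML specification (section 4.6).
PREDEFINED = {"lt", "gt", "amp", "apos", "quot"}


def nonpredefined_entities(xml_text):
    """List names of entity references that are not predefined in XML.

    Approximate textual scan; character references such as &#160; are
    not reported because they do not depend on a DTD.
    """
    names = re.findall(r"&([A-Za-z_][A-Za-z0-9_.\-]*);", xml_text)
    return [n for n in names if n not in PREDEFINED]
```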
<p><cite>XProc: An XML Pipeline Language</cite><a class="inform"
href="#XPROC">[XPROC]</a>, <em>a language for describing operations to
be performed on XML documents,</em> has recently been published as a
W3C Working Draft. It merits consideration for expressing more
complex or sophisticated transformations which require control over
the flow of processing through a variety of XML processing tools.
Using XProc, one could apply a sequence of operations such as XInclude,
validation, and transformation to a document, aborting if the result
of an intermediate stage is not valid, for example.</p>
</div>
</div>
<div><h2 id="sec_agt"><span class="gen">7. </span>GRDDL-Aware Agents</h2>
<p class="assertion" id="GRDDL_aware_agent">A <dfn>GRDDL-aware
agent</dfn> is a software module that computes <b>GRDDL results</b> of
information resources.</p>
<p>For example, a SPARQL query service might use a GRDDL-aware agent
for collecting RDF data. Or a Web browser might serve as a GRDDL-aware
agent for the purpose of collecting calendar and contact data. The
appropriate policy, for which results to compute and when, is likely to
involve waiting for a signal from the user more in the Web browser case
than in the query service case.
</p>
<div class="assertion" id="agt_obl">
<p>Subject to <a href="#sec">security considerations</a> below and
local policy as expressed in its configuration,
given an information resource <var>IR</var>, and
an XPath node <var>N</var> for a representation of <var>IR</var>,
a GRDDL-aware agent <b>should</b>:
</p>
<ol>
<li>Find each transformation associated with
<var>N</var>, i.e.
<ol>
<li>each transformation associated with <var>N</var> via the
<tt>grddl:transformation</tt> attribute as in the <a
href="#grddl-xml">Adding GRDDL to well-formed XML</a> section
</li>
<li>each transformation associated with <var>N</var> via HTML
links of type <tt>transformation</tt>, provided the document bears
the <tt>http://www.w3.org/2003/g/data-view</tt> profile, as in
the <a href="#grddl-xhtml">Using GRDDL with valid XHTML</a>
section.
</li>
<li>each transformation indicated by any available namespace
document, as in the <a href="#ns-bind">Using GRDDL with XML
Namespace Documents</a> section.</li>
<li>each transformation indicated by any XHTML profiles,
as in the <a href="#profile-bind">GRDDL for HTML Profiles</a>
section.
</li>
</ol>
</li>
<li>Selectively apply any or all discovered transformations to
obtain GRDDL results. Note that selection may be guided by the agent's
capabilities, local security policies and possibly user/client
intervention.
</li>
<li>Merge those GRDDL results.</li>
</ol>
</div>
<p>Note that discovery by namespace or profile document is recursive;
loops in the profile/namespace structure should be detected in order to avoid
infinite recursion.</p>
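<p>Loop detection can be as simple as remembering which documents
have already been visited. In the sketch below,
<tt>referenced_documents</tt> is a hypothetical caller-supplied
callable that returns the namespace and profile documents a given
document refers to; how it obtains them is not shown.</p>

```python
def discover(start, referenced_documents):
    """Visit `start` and every namespace/profile document reachable from it.

    Each document is visited at most once, so cycles in the
    profile/namespace structure cannot cause infinite recursion.
    """
    seen = set()
    order = []
    stack = [start]
    while stack:
        uri = stack.pop()
        if uri in seen:
            continue  # already visited: this is where a loop is cut
        seen.add(uri)
        order.append(uri)
        # Push in reverse so documents are visited in the order given.
        stack.extend(reversed(list(referenced_documents(uri))))
    return order
```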
<div><h3 id="extrace">Example: A GRDDL-aware Agent protocol trace</h3>
<p>While this declarative specification of GRDDL allows a variety of
implementation strategies, in this example we trace the behavior
common to a number of typical implementations.</p>
<p>Consider a GRDDL-aware agent that is asked for results from
<tt>http://www.w3.org/2003/g/po-doc.xml</tt>. It starts by
dereferencing that URI, noting that RDF/XML, HTML, and XML are
acceptable representations:</p>
<pre>
[00:00.000 - client connection from 127.0.0.1:39645]
GET <b>http://www.w3.org/2003/g/po-doc.xml</b> HTTP/1.1
Host: www.w3.org
Accept: <b>application/rdf+xml,application/xml,text/xml,application/xhtml+xml,text/html</b>
[00:00.055 - server connected]
HTTP/1.1 200 OK
Last-Modified: Tue, 07 Dec 2004 22:59:02 GMT
Content-Length: 1302
Content-Type: application/xml; qs=0.9
<purchaseOrder orderDate="1999-10-20"
<b>xmlns="http://www.w3.org/2003/g/po-ex"</b>>
<shipTo country="US">
<name>Alice Smith</name>
<street>123 Maple Street</street>
<em>...</em>
</pre>
<p>The XML document that comes back has no explicit transformation markup,
but the rules in <a href="#ns-bind">the XML Namespaces section</a> suggest
looking up results from the namespace document:</p>
<pre>
[00:00.000 - client connection from 127.0.0.1:39647]
GET <b>http://www.w3.org/2003/g/po-ex</b> HTTP/1.1
Host: www.w3.org
Accept: application/rdf+xml,application/xml,text/xml,application/xhtml+xml,text/html
[00:00.051 - server connected]
HTTP/1.1 200 OK
Content-Location: po-ex.xsd
Last-Modified: Tue, 07 Dec 2004 23:18:25 GMT
Content-Length: 2624
Content-Type: application/xml; qs=0.9
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:po="http://www.w3.org/2003/g/po-ex"
targetNamespace="http://www.w3.org/2003/g/po-ex"
elementFormDefault="qualified"
attributeFormDefault="unqualified"
xmlns:data-view="http://www.w3.org/2003/g/data-view#"
data-view:transformation="http://www.w3.org/2003/g/embeddedRDF.xsl"
>
<xs:annotation>
<xs:appinfo>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about="http://www.w3.org/2003/g/po-ex">
<data-view:namespaceTransformation
rdf:resource="grokPO.xsl" />
</rdf:Description>
</rdf:RDF>
</xs:appinfo>
</xs:annotation>
<em>...</em>
</pre>
<p>We don't yet have a result in the form of an RDF/XML document,
but this time we find an explicit <tt>transformation</tt>
attribute in the GRDDL namespace, so we follow that link,
noting that we accept XML representations:</p>
<pre>
[00:00.000 - client connection from 127.0.0.1:39649]
GET <b>http://www.w3.org/2003/g/embeddedRDF.xsl</b> HTTP/1.1
Host: www.w3.org
Accept: <b>application/xml</b>
[00:00.054 - server connected]
HTTP/1.1 200 OK
Last-Modified: Wed, 23 Mar 2005 18:49:12 GMT
Content-Length: 797
Content-Type: application/xml; qs=0.9
<xsl:transform
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
<em>...</em>
</pre>
<p>Applying that transformation yields...</p>
<pre>
<rdf:RDF
xmlns:data-view="http://www.w3.org/2003/g/data-view#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
<rdf:Description rdf:about="http://www.w3.org/2003/g/po-ex">
<data-view:namespaceTransformation rdf:resource="http://www.w3.org/2003/g/grokPO.xsl"/>
</rdf:Description>
</rdf:RDF>
</pre>
<p>... which tells us that <tt>.../grokPO.xsl</tt> is a transformation for
all documents in the <tt>.../po-ex</tt> namespace.</p>
<p>Continuing recursively, we examine the namespace document
for <tt>po-ex.xsd</tt>. As this is a well-known namespace document,
following the <a href="#sec">Security considerations section</a>,
we note the last modified date of our cached copy in the request,
and the origin server lets us know that our copy is current:
</p>
<pre>
[00:00.000 - client connection from 127.0.0.1:39651]
GET http://www.w3.org/2001/XMLSchema HTTP/1.1
Host: www.w3.org
Accept: application/rdf+xml,application/xml,text/xml,application/xhtml+xml,text/html
<b>If-modified-since: Fri, 16 Dec 2005 14:19:38 GMT</b>
[00:00.047 - server connected]
HTTP/1.1 304 Not Modified
Content-Location: XMLSchema.html
Expires: Wed, 07 Feb 2007 15:09:29 GMT
Cache-Control: max-age=21600
Vary: negotiate, accept, accept-charset
</pre>
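<p>The conditional request and the handling of the <tt>304</tt> response
might be sketched as follows. Both function names are hypothetical, and
the sketch leaves out the network exchange itself: it only shows how the
cached date is sent and how the status code selects between the cached
copy and a fresh body.</p>

```python
import urllib.request
from typing import Optional

ACCEPT = "application/rdf+xml,application/xml,text/xml,application/xhtml+xml,text/html"


def build_conditional_request(
    uri: str, cached_last_modified: Optional[str]
) -> urllib.request.Request:
    """Build a GET request that revalidates a cached copy when a date is known."""
    headers = {"Accept": ACCEPT}
    if cached_last_modified is not None:
        headers["If-Modified-Since"] = cached_last_modified
    return urllib.request.Request(uri, headers=headers)


def choose_representation(
    status: int, cached_body: Optional[bytes], response_body: bytes
) -> bytes:
    """Return the cached body on 304 Not Modified, otherwise the response body."""
    if status == 304:
        if cached_body is None:
            raise ValueError("304 received but nothing is cached")
        return cached_body
    return response_body


request = build_conditional_request(
    "http://www.w3.org/2001/XMLSchema", "Fri, 16 Dec 2005 14:19:38 GMT"
)
# urllib stores header names capitalized, hence the lookup key used here.
print(request.get_header("If-modified-since"))
```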
<p>Since our cached copy of the XML Schema namespace document
shows no associated GRDDL transformation, we return
to the namespace transformation from <tt>po-ex</tt>,
i.e. <tt>grokPO.xsl</tt>:</p>
<pre>
[00:00.000 - client connection from 127.0.0.1:39653]
GET http://www.w3.org/2003/g/grokPO.xsl HTTP/1.1
Host: www.w3.org
Accept: application/xml
[00:00.048 - server connected]
HTTP/1.1 200 OK
Last-Modified: Tue, 07 Dec 2004 23:33:28 GMT
Content-Length: 1739
Content-Type: application/xml; qs=0.9
<xsl:transform
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:po="http://www.w3.org/2003/g/po-ex"
xmlns:poF="http://www.w3.org/2003/g/po-ex#"
>
<xsl:output method="xml" indent="yes" />
<div xmlns="http://www.w3.org/1999/xhtml">
<h1>grokPO.xsl -- interpret purchase order format as RDF</h1>
<em>...</em>
</pre>
<p>Applying this transformation to <tt>po-doc.xml</tt> yields RDF/XML;
we parse this to an RDF graph (using the URI of the source document,
<tt>http://www.w3.org/2003/g/po-doc.xml</tt>, as the base URI) and
return the graph as a GRDDL result of <tt>po-doc.xml</tt>:</p>
<pre>
<rdf:RDF
xmlns:poF="http://www.w3.org/2003/g/po-ex#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
<rdf:Description rdf:nodeID="hOhqYGhx9">
<poF:city>Mill Valley</poF:city>
<poF:state>CA</poF:state>
<poF:zip>90952</poF:zip>
<poF:street>123 Maple Street</poF:street>
<poF:name>Alice Smith</poF:name>
</rdf:Description>
<em>...</em>
</pre>
<p>HTTP trace data was collected via <a
href="http://hathawaymix.org/Software/TCPWatch">TCPWatch</a> by Shane
Hathaway. For more details, see <a
href="http://www.w3.org/2001/sw/grddl-wg/td/testlist1#http_tracing">HTTP
tracing in the GRDDL test materials</a>.</p>
</div>
</div>
<div>
<h2 id="sec"><span class="gen">8. </span>Security considerations</h2>
<p>The execution of general-purpose programming languages as
interpreters for transformations exposes serious security risks.
Designers of GRDDL-aware agents are advised to guard against simply
sending GRDDL transformations to "off-the-shelf" interpreters. While
it is usually safe to pass documents from trusted sources through a
GRDDL transformation, implementors should consider all of the
following before adding the ability to execute arbitrary GRDDL
transformations linked from arbitrary Web documents.</p>
<p>GRDDL, like many Web technologies, fundamentally relies on the dereferencing of URIs.
Writers of GRDDL transformations are advised against employing URL operations
which are potentially dangerous, because these operations are more likely to be
unavailable in secure GRDDL implementations. Software executing GRDDL transformations
is advised to either completely disable all potentially dangerous URL operations or
take special care not to delegate any special authority to their operation. In particular,
operations to read or write URLs are more safely executed with the privileges associated
with an untrusted party, rather than the current user. Such disabling and/or checking
should be done completely outside of the reach of the transformation language itself;
care should be taken to ensure that no method exists for re-enabling full-function versions
of these operators.</p>
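<p>As an illustration of checking done outside the transformation
language, an agent might refuse to dereference any URL whose scheme is
not on a short allow-list. This is a minimal sketch under assumptions:
the helper name and the choice of permitted schemes are hypothetical,
and a real policy would likely consider hosts and ports as well.</p>

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only plain Web retrieval is permitted.
ALLOWED_SCHEMES = frozenset({"http", "https"})


def is_permitted_url(url: str) -> bool:
    """Return True only for absolute URLs whose scheme is on the allow-list."""
    scheme = urlsplit(url).scheme.lower()
    return scheme in ALLOWED_SCHEMES


print(is_permitted_url("http://www.w3.org/2003/g/grokPO.xsl"))
print(is_permitted_url("file:///etc/passwd"))
```

<p>Relative references have no scheme and are rejected by this check, so
they would need to be resolved against a base URI before being
tested.</p>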
<p>The remainder of this section outlines some, though probably not
all, of the possible problems with the execution of GRDDL transformations,
with particular reference to transformations in XSLT.</p>
<ol>
<li>With unconstrained use of GRDDL, untrusted
transformations may access URLs for which the end-user has read or write
permission, while the author of the transformation does not. This is
particularly pertinent for URLs from the file: scheme, but many other
schemes are also affected. Having read documents which its author did
not have permission to access, the untrusted code may transmit
the content of those documents to arbitrary Web servers by encoding the
content within a URL that is passed to the server.
</li>
<li>Dangerous operations in the XSLT language include, but may not be
limited to, the operations involving getting a URL:
<tt>document()</tt>, <tt>doc()</tt>, <tt>unparsed-text()</tt> and
<tt>unparsed-text-available()</tt>, and <tt>xsl:result-document</tt>
which involves writing to a URL. <tt>xsl:include</tt> and
<tt>xsl:import</tt> present fewer risks if they are processed before
execution of the transformation, rather than during it.
</li>
<li>Some transformation language implementations may provide facilities for loading
and executing other programming language code. For example,
an XSLT implementation may provide a method for executing Java code.
Such facilities are obviously open to abuse.
Designers of GRDDL transformations are advised against making use of
such features. Besides being implementation-specific, they are more likely to be
unavailable in secure implementations of the transformation language.
Software executing GRDDL transformations should protect against
such operators in case they are encountered.</li>
<li>XSLT implementations often provide their own extensions.
Designers of GRDDL transformations are advised not to make use of extensions
because they are not guaranteed to be present in all implementations.
Software executing GRDDL transformations should make sure that extensions
are secure and do not present any kind of threat.
</li>
<li>It is possible to write transformations that inordinately consume system resources
or that loop indefinitely. Both types of transformations have the potential to cause damage
if sent to unsuspecting recipients. Designers of GRDDL transformations are advised
to avoid the construction and dissemination of such transformations.
Software executing GRDDL transformations should provide appropriate mechanisms
to abort processing after a reasonable amount of time has elapsed. In addition,
GRDDL software should be limited to the consumption of only a reasonable amount
of any given system resource.</li>
<li>Finally, bugs may exist in some interpreters of a transformation language which
might be exploited to gain unauthorized access to a recipient's system.
Apart from noting this possibility, no specific action is advised to prevent this
aside from timely correction of such bugs as they are discovered.
</li>
</ol>
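<p>Before handing a stylesheet to an interpreter, an agent could scan it
for the URL operations named in the list above. The sketch below is a
coarse heuristic only, under stated assumptions: the function name is
hypothetical, the pattern can match text inside string literals, and a
scan of this kind does not replace disabling the operations in the
processor itself. The XML parser used here would also need hardening
before being applied to untrusted input.</p>

```python
import re
import xml.etree.ElementTree as ET
from typing import List

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
# Function names listed above as operations that involve getting a URL.
URL_FUNCTIONS = re.compile(
    r"\b(document|doc|unparsed-text|unparsed-text-available)\s*\("
)


def find_risky_constructs(stylesheet_text: str) -> List[str]:
    """List URL-reading function calls and xsl:result-document elements found."""
    findings: List[str] = []
    root = ET.fromstring(stylesheet_text)
    for element in root.iter():
        if element.tag == f"{{{XSL_NS}}}result-document":
            findings.append("xsl:result-document")
        # XPath expressions live in attribute values such as select or test.
        for value in element.attrib.values():
            for match in URL_FUNCTIONS.finditer(value):
                findings.append(match.group(1) + "()")
    return findings


SAMPLE = """<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:copy-of select="document('file:///etc/passwd')"/>
  </xsl:template>
</xsl:transform>"""

print(find_risky_constructs(SAMPLE))
```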
</div>
<div><h2 id="grddlvocab"><span class="gen">9.</span> The GRDDL Vocabulary</h2>
<p>The following is excerpted from the GRDDL profile/namespace
document:</p>
<blockquote>
<p>This document, <a rel="ns-claim" href="http://www.w3.org/2003/g/data-view">http://www.w3.org/2003/g/data-view</a>,
is a metadata profile in the sense of the HTML specification, in section
<a href="/TR/1999/REC-html401-19991224/struct/global.html#h-7.4.4.3">7.4.4.3 Meta data profiles</a>.</p>
<p>The following term is introduced here as an XHTML link relationship
name and RDF property name:</p>
<ul>
<li id="transformation" class="-rdf-Property">
<tt class="rdfs-label">transformation</tt>: <span
class="rdfs-comment">relates a source document to a
transformation, usually represented in <a
href="/TR/xslt">XSLT</a>, that relates the source document syntax
to the RDF graph syntax</span>. domain: <a rel="rdfs-domain"
href="#RootNode">RootNode</a>; range: <a
rel="rdfs-range" href="#Transformation">Transformation</a>
</li>
</ul>
<p>The following terms are introduced here as RDF properties:</p>
<ul>
<li id="namespaceTransformation" class="-rdf-Property">
<tt class="rdfs-label">namespaceTransformation</tt>: <span
class="rdfs-comment">relates a namespace to a transformation for
all documents in that namespace</span>. range: <a
rel="rdfs-range" href="#Transformation">Transformation</a>
</li>
<li id="profileTransformation" class="-rdf-Property">
<tt class="rdfs-label">profileTransformation</tt>: <span
class="rdfs-comment">relates a profile document to a
transformation for all documents bearing that profile</span>.
range: <a rel="rdfs-range"
href="#Transformation">Transformation</a>
</li>
<li id="result" class="-rdf-Property">
<tt class="rdfs-label">result</tt>: <span class="rdfs-comment">an
RDF graph obtained from an information resource by directly
parsing a representation in the standard RDF/XML syntax or
indirectly by parsing some other dialect using a transformation
nominated by the document</span>. domain: <a rel="rdfs-domain"
href="#InformationResource">InformationResource</a>; range: <a
rel="rdfs-range" href="#RDFGraph">RDFGraph</a>
</li>
<li id="transformationProperty" class="-owl-FunctionalProperty">
<tt class="rdfs-label">transformationProperty</tt> <span
class="rdfs-comment">relates a transformation to the algorithm
specified by the property that computes an RDF graph from an XML
document node</span> domain: <a rel="rdfs-domain"
href="#Transformation">Transformation</a> range: <a
rel="rdfs-range"
href="#TransformationProperty">TransformationProperty</a>
</li>
<li id="Transformation" class="-rdfs-Class">
<tt class="rdfs-label">Transformation</tt> <span
class="rdfs-comment">an <a rel="rdfs-subClassOf"
href="#InformationResource">InformationResource</a> that specifies
a transformation from a set of XML documents to RDF graphs</span>
Each Transformation has at least one <a rel="owl-onProperty"
href="#transformationProperty">transformationProperty</a> that is
a <a rel="owl-someValuesFrom"
href="#TransformationProperty">TransformationProperty</a>.
</li>
<li id="TransformationProperty" class="-rdfs-Class">
<tt class="rdfs-label">TransformationProperty</tt>
<span class="rdfs-comment">a <a rel="rdfs-subClassOf"
href="http://www.w3.org/2002/07/owl#FunctionalProperty"
>FunctionalProperty</a> that relates
<a href="#RootNode">XML document root nodes</a> to
<a href="#RDFGraph">RDF graphs</a></span>
</li>
</ul>
<p>The following terms are bound to concepts from existing standards:</p>
<ul>
<li id="RootNode" class="-rdfs-Class">
<tt class="rdfs-label">RootNode</tt> <span
class="rdfs-comment">the root of the tree in the XPath data
model</span>, per <a rel="rdfs-isDefinedBy"
href="http://www.w3.org/TR/1999/REC-xpath-19991116#root-node">section
5.1 Root Node in <cite>XML Path Language (XPath) Version
1.0</cite></a>
</li>
<li id="RDFGraph" class="-rdfs-Class">
<tt class="rdfs-label">RDFGraph</tt> <span class="rdfs-comment">a
set of RDF triples</span>, per <a rel="rdfs-isDefinedBy"
href="http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#dfn-rdf-graph">definition
in <cite>Resource Description Framework (RDF): Concepts and
Abstract Syntax</cite></a>
</li>
<li id="InformationResource" class="-rdfs-Class">
<tt class="rdfs-label">InformationResource</tt>
<span class="rdfs-comment">A resource which has the property that all of its essential characteristics can be conveyed in a message</span>, per <a rel="rdfs-isDefinedBy" href="http://www.w3.org/TR/2004/REC-webarch-20041215/#def-information-resource">definition in <cite>Architecture of the World Wide Web, Volume One</cite></a>
</li>
</ul>
</blockquote>
<p>The namespace document includes RDF data about the terms in the
GRDDL Vocabulary, but these RDF data do not include any triples whose
predicate is <tt>grddl:profileTransformation</tt>.</p>
<p>In the section on <a href="#ns-bind">Using GRDDL with XML Namespace
Documents</a>, only explicit <tt>grddl:namespaceTransformation</tt>
triples satisfy the premise of the rule. Likewise,
<tt>grddl:profileTransformation</tt> triples must be explicit in the
GRDDL result of a profile document in order to satisfy the premise of
the rule in the section on <a href="#profile-bind">GRDDL for
HTML Profiles</a>. Authors of GRDDL source documents are advised
against using RDFS or OWL expressions which imply such triples but do
not explicitly state them.
</p>
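<p>The restriction to explicit triples can be illustrated with a plain
lookup that performs no RDFS or OWL inference. In this sketch triples
are modelled as tuples of strings; the function name and the
<tt>example.org</tt> property are hypothetical.</p>

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

GRDDL = "http://www.w3.org/2003/g/data-view#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def namespace_transformations(
    triples: Iterable[Triple], namespace_uri: str
) -> List[str]:
    """Return objects of explicit grddl:namespaceTransformation triples only."""
    predicate = GRDDL + "namespaceTransformation"
    return [o for s, p, o in triples if s == namespace_uri and p == predicate]


NS = "http://www.w3.org/2003/g/po-ex"
graph: List[Triple] = [
    # Explicit triple, as in the protocol trace earlier in this document.
    (NS, GRDDL + "namespaceTransformation", "http://www.w3.org/2003/g/grokPO.xsl"),
    # Hypothetical sub-property: it implies a transformation only under RDFS
    # inference, so the lookup above does not see it.
    ("http://example.org/myTransform", RDFS + "subPropertyOf",
     GRDDL + "namespaceTransformation"),
    (NS, "http://example.org/myTransform", "http://example.org/other.xsl"),
]

print(namespace_transformations(graph, NS))
```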
</div>
<div><h2 id="bib"><span class="gen">10. </span>References</h2>
<h3 id="normativeRefs">Normative References</h3>
<dl class="bib">
<dt id="rfc3987">RFC3987</dt>
<dd><cite><a href="http://www.ietf.org/rfc/rfc3987.txt">Internationalized Resource Identifiers (IRIs)</a></cite> Internet RFC 3987 January 2005. Duerst, Suignard
</dd>
<dt id="rfc3986">RFC3986</dt>
<dd><cite><a href="http://www.apps.ietf.org/rfc/rfc3986.html">Uniform Resource Identifier (URI): Generic Syntax</a></cite> Internet RFC 3986 January 2005. Berners-Lee, Fielding, Masinter
</dd>
<dt>
<a name="WEBARCH" id="WEBARCH">WEBARCH</a>
</dt>
<dd>
<cite>
<a href="http://www.w3.org/TR/2004/REC-webarch-20041215/">Architecture of the World Wide Web, Volume One</a>
</cite>, N. Walsh, I. Jacobs, Editors, W3C Recommendation, 15 December 2004, http://www.w3.org/TR/2004/REC-webarch-20041215/ . <a href="http://www.w3.org/TR/webarch/">Latest version</a> available at http://www.w3.org/TR/webarch/ .</dd>
<dt>
<a name="RDFC04" id="RDFC04">RDFC04</a>
</dt>
<dd>
<cite>
<a href="http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/">Resource Description Framework (RDF): Concepts and Abstract Syntax</a>
</cite>, G. Klyne, J. J. Carroll, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/ . <a href="http://www.w3.org/TR/rdf-concepts/">Latest version</a> available at http://www.w3.org/TR/rdf-concepts/ .</dd>
<dt>
<a name="RDF-MT" id="RDF-MT">RDF-MT</a>
</dt>
<dd>
<cite>
<a href="http://www.w3.org/TR/2004/REC-rdf-mt-20040210/">RDF Semantics</a>
</cite>, P. Hayes, Editor, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-mt-20040210/ . <a href="http://www.w3.org/TR/rdf-mt/" title="Latest version of RDF Semantics">Latest version</a> available at http://www.w3.org/TR/rdf-mt/ .</dd>
<dt id="RDFX">RDFX</dt>
<dd>
<cite>
<a href=
"http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/">RDF/XML
Syntax Specification (Revised)</a></cite>, D. Beckett, Editor, W3C
Recommendation, 10 February 2004,
http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/ . <a
href="http://www.w3.org/TR/rdf-syntax-grammar" title="Latest
version of RDF/XML Syntax Specification (Revised)">Latest
version</a> available at http://www.w3.org/TR/rdf-syntax-grammar .
</dd>
<dt>
<a name="XMLBASE" id="XMLBASE">XMLBASE</a>
</dt>
<dd>
<cite>
<a href="http://www.w3.org/TR/2001/REC-xmlbase-20010627/">XML Base</a>
</cite>, J. Marsh, Editor, W3C Recommendation, 27 June 2001, http://www.w3.org/TR/2001/REC-xmlbase-20010627/ . <a href="http://www.w3.org/TR/xmlbase/" title="Latest version of XML Base">Latest version</a> available at http://www.w3.org/TR/xmlbase/ .</dd>
<dt>
<a name="XHTML" id="XHTML">XHTML</a>
</dt>
<dd>
<cite>
<a href="http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/">Modularization of XHTML™</a>
</cite>, S. Schnitzenbaumer, F. Boumphrey, T. Wugofski, S. McCarron, M. Altheim, S. Dooley, Editors, W3C Recommendation, 10 April 2001, http://www.w3.org/TR/2001/REC-xhtml-modularization-20010410/ . <a href="http://www.w3.org/TR/xhtml-modularization/">Latest version</a> available at http://www.w3.org/TR/xhtml-modularization/ .</dd>
<dt>
<a name="HTML4" id="HTML4">HTML4</a>
</dt>
<dd>
<cite>
<a href="http://www.w3.org/TR/1999/REC-html401-19991224">HTML 4.01 Specification</a>
</cite>, D. Raggett, A. Le Hors, I. Jacobs, Editors, W3C Recommendation, 24 December 1999, http://www.w3.org/TR/1999/REC-html401-19991224 . <a href="http://www.w3.org/TR/html401">Latest version</a> available at http://www.w3.org/TR/html401 .</dd>
<dt id="XPATH">XPATH</dt>
<dd>
<cite><a href="http://www.w3.org/TR/1999/REC-xpath-19991116">XML
Path Language (XPath) Version 1.0</a> </cite>, J. Clark,
S. J. DeRose, Editors, W3C Recommendation, 16 November 1999,
http://www.w3.org/TR/1999/REC-xpath-19991116 . <a
href="http://www.w3.org/TR/xpath" title="Latest version of XML Path
Language (XPath) Version 1.0">Latest version</a> available at
http://www.w3.org/TR/xpath .
</dd>
<dt>
<a name="XSLT1" id="XSLT1">XSLT1</a>
</dt>
<dd>
<cite>
<a href="http://www.w3.org/TR/1999/REC-xslt-19991116">XSL Transformations (XSLT) Version 1.0</a>
</cite>, J. Clark, Editor, W3C Recommendation, 16 November 1999, http://www.w3.org/TR/1999/REC-xslt-19991116 . <a href="http://www.w3.org/TR/xslt">Latest version</a> available at http://www.w3.org/TR/xslt .</dd>
</dl>
<h3 id="informativeRefs">Informative references</h3>
<p>The following documents provide additional background but are not
part of this specification.</p>
<dl class="bib">
<dt>
<a name="primer" id="primer">primer</a>
</dt>
<dd>
<cite>
<a
href="http://www.w3.org/TR/2006/WD-grddl-primer-20061002/">GRDDL
Primer</a>
</cite>, I. Davis, Editor, W3C Working Draft (work in progress), 2 October 2006, http://www.w3.org/TR/2006/WD-grddl-primer-20061002/ . <a href="http://www.w3.org/TR/grddl-primer/"
title="Latest version of GRDDL Primer">Latest version</a> available at http://www.w3.org/TR/grddl-primer/ .
</dd>
<dt>
<a name="usecases" id="usecases">usecases</a>
</dt>
<dd>
<cite>
<a
href="http://www.w3.org/TR/2007/NOTE-grddl-scenarios-20070406/">GRDDL
Use Cases: Scenarios of extracting RDF data from XML
documents</a> </cite>, F. Gandon, Editor, W3C Working Group
Note, 6 April 2007,
http://www.w3.org/TR/2007/NOTE-grddl-scenarios-20070406/ . <a
href="http://www.w3.org/TR/grddl-scenarios/" title="Latest
version of GRDDL Use Cases: Scenarios of extracting RDF data
from XML documents">Latest version</a> available at
http://www.w3.org/TR/grddl-scenarios/ .
</dd>
<dt>
<a name="GRDDL-TESTS" id="GRDDL-TESTS">GRDDL-TESTS</a>
</dt>
<dd>
<cite>
<a href="http://www.w3.org/TR/2007/REC-grddl-tests-20070911/">GRDDL Test Cases</a>
</cite>, C. Ogbuji, Editor, W3C Recommendation, 11 September 2007, http://www.w3.org/TR/2007/REC-grddl-tests-20070911/ . <a href="http://www.w3.org/TR/grddl-tests/"
title="Latest version of GRDDL Test Cases">Latest version</a> available at http://www.w3.org/TR/grddl-tests/ .</dd>
<dt id="SPARQL">SPARQL</dt>
<dd>
<cite>
<a href="http://www.w3.org/TR/2007/WD-rdf-sparql-query-20070326/">SPARQL Query Language for RDF</a>
</cite>, E. Prud'hommeaux, A. Seaborne, Editors, W3C Working Draft (work in progress), 26 March 2007, http://www.w3.org/TR/2007/WD-rdf-sparql-query-20070326/ . <a href="http://www.w3.org/TR/rdf-sparql-query/"
title="Latest version of SPARQL Query Language for RDF">Latest version</a> available at http://www.w3.org/TR/rdf-sparql-query/ .</dd>
<dt>
<a name="XSLT2" id="XSLT2">XSLT2</a>
</dt>
<dd>
<cite>
<a href="http://www.w3.org/TR/2007/REC-xslt20-20070123/">XSL Transformations (XSLT) Version 2.0</a>
</cite>, M. Kay, Editor, W3C Recommendation, 23 January 2007, http://www.w3.org/TR/2007/REC-xslt20-20070123/ . <a href="http://www.w3.org/TR/xslt20"
title="Latest version of XSL Transformations (XSLT) Version 2.0">Latest version</a> available at http://www.w3.org/TR/xslt20 .</dd>
<dt id="RFC2731">RFC2731</dt>
<dd>J. Kunze <cite><a
href="http://www.ietf.org/rfc/rfc2731.txt">Encoding Dublin Core
Metadata in HTML</a></cite> in 1999</dd>
<dt id="XFN">XFN</dt>
<dd>
<cite><a href="http://gmpg.org/xfn/intro">XFN: Introduction and
Examples</a>
</cite>
copyright GMPG 2003-2007. Eric, Tantek, and Matt
</dd>
<dt id="DCRDF">DCRDF</dt>
<dd><cite><a href="http://dublincore.org/documents/2002/07/31/dcmes-xml/">Expressing Simple Dublin Core in RDF/XML</a></cite>
Beckett, Miller, Brickley 2002-07-31</dd>
<dt>
<a name="P3P" id="P3P">P3P</a>
</dt>
<dd>
<cite>
<a href="http://www.w3.org/TR/2002/REC-P3P-20020416/">The Platform for Privacy Preferences 1.0 (P3P1.0)
Specification</a>
</cite>, M. Marchiori, Editor, W3C Recommendation, 16 April 2002, http://www.w3.org/TR/2002/REC-P3P-20020416/ . <a href="http://www.w3.org/TR/P3P/">Latest version</a> available at http://www.w3.org/TR/P3P/ .</dd>
<dt>
<a name="STYPI" id="STYPI">STYPI</a>
</dt>
<dd>
<cite>
<a href="http://www.w3.org/1999/06/REC-xml-stylesheet-19990629">Associating Style Sheets with XML documents</a>
</cite>, J. Clark, Editor, W3C Recommendation, 29 June 1999, http://www.w3.org/1999/06/REC-xml-stylesheet-19990629 . <a href="http://www.w3.org/TR/xml-stylesheet"
title="Latest version of Associating Style Sheets with XML documents">Latest version</a> available at http://www.w3.org/TR/xml-stylesheet .</dd>
<dt>
<a name="XPROC" id="XPROC">XPROC</a>
</dt>
<dd>
<cite>
<a href="http://www.w3.org/TR/2006/WD-xproc-20060928/">XProc: An XML Pipeline Language</a>
</cite>, N. Walsh, Editor, W3C Working Draft (work in progress), 28 September 2006, http://www.w3.org/TR/2006/WD-xproc-20060928/ . <a href="http://www.w3.org/TR/xproc/"
title="Latest version of XProc: An XML Pipeline Language">Latest version</a> available at http://www.w3.org/TR/xproc/ .</dd>
<dt id="MF-RDF-FAQ">MF-RDF-FAQ</dt>
<dd><cite><a href="http://microformats.org/wiki/faqs-for-rdf"> Microformat FAQs for RDF Fans</a></cite>, last modified 17:57, 30 May 2006</dd>
</dl>
</div>
<div><h2 id="stylepi">Appendix: Transformations for Styling versus data extraction (Informative)</h2>
<p>The xml-stylesheet processing instruction<a class="inform"
href="#STYPI">[STYPI]</a> is generally deployed for automated
presentation processing. This type of link is different from links to
GRDDL transformation algorithms, which are intended to facilitate
extracting data. Also, parsing the content of processing instructions
is not supported by XML tools such as XSLT processors, and grounding
processing instructions in URI space is not as straightforward as
using namespaces with attributes.
</p>
</div>
<div>
<h2 id="base_misc">Appendix: Base IRI considerations</h2>
<p>
In the <a href="#grddl-xml">Adding GRDDL to well-formed XML</a> section,
we have:
</p>
<blockquote>
<p>
The base IRI for interpreting relative IRI references in a
serialization of a graph produced by a GRDDL transformation
is the base IRI of the source document.
</p>
</blockquote>
<p>
This corresponds to RFC 3986, particularly
<a href="http://www.apps.ietf.org/rfc/rfc3986.html#sec-5.1">section 5.1</a>,
which illustrates the identification of a base URI, with the following picture:
</p>
<pre>
.----------------------------------------------------------.
| .----------------------------------------------------. |
| | .----------------------------------------------. | |
| | | .----------------------------------------. | | |
| | | | .----------------------------------. | | | |
| | | | | <relative-reference> | | | | |
| | | | `----------------------------------' | | | |
| | | | (<a href="http://www.apps.ietf.org/rfc/rfc3986.html#sec-5.1.1">5.1.1</a>) Base URI embedded in content | | | |
| | | `----------------------------------------' | | |
| | | (<a href="http://www.apps.ietf.org/rfc/rfc3986.html#sec-5.1.2">5.1.2</a>) Base URI of the encapsulating entity | | |
| | | (message, representation, or none) | | |
| | `----------------------------------------------' | |
| | (<a href="http://www.apps.ietf.org/rfc/rfc3986.html#sec-5.1.3">5.1.3</a>) URI used to retrieve the entity | |
| `----------------------------------------------------' |
| (<a href="http://www.apps.ietf.org/rfc/rfc3986.html#sec-5.1.4">5.1.4</a>) Default Base URI (application-dependent) |
`----------------------------------------------------------'
</pre>
<p>
During typical GRDDL processing, an intermediate RDF/XML serialization is
produced as the output of a transform.
To convert this serialization into an RDF graph, any relative references
in the serialization are resolved to IRIs. To identify
the appropriate base IRI for resolving a given relative reference,
first
check for a base URI embedded within this RDF/XML,
following XML Base, as permitted by RDF Syntax.
If there is no base URI embedded within this RDF/XML, then section
5.1.2 of RFC 3986 may apply, because the <em>encapsulating entity</em>
of this serialization is the root element of the input document. If
this element does not define a base URI, then its encapsulating
entity, the input document, may define a base IRI.
</p>
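<p>The order of precedence described above can be sketched as a simple
fall-through. This is an illustration only: the function and parameter
names are hypothetical, each parameter stands for one of the four
sources in the RFC 3986 picture, and <code>#item1</code> is an invented
relative reference.</p>

```python
from typing import Optional
from urllib.parse import urljoin


def select_base_iri(
    embedded: Optional[str],
    encapsulating: Optional[str],
    retrieval: Optional[str],
    default: Optional[str],
) -> str:
    """Pick the first available base, innermost source first (RFC 3986, 5.1.1 to 5.1.4)."""
    for candidate in (embedded, encapsulating, retrieval, default):
        if candidate:
            return candidate
    raise ValueError("no base IRI can be established")


# Nothing embedded in the RDF/XML or the source document: use the retrieval URI.
base = select_base_iri(None, None, "http://www.w3.org/2003/g/po-doc.xml", None)
print(urljoin(base, "#item1"))
```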
<p>
The original document may be an XHTML family document, or
it may be some other XML document.
</p>
<h3 id="base_xhtml">The Base IRI of an XHTML Family document</h3>
<p>For an XHTML family document,
the base IRI of the input document may be specified as the value
of the <code>href</code> attribute of the <code>&lt;base&gt;</code>
element (if any).
This is in accordance with section 5.1.1 of RFC 3986.
</p>
<p>
In many other cases, section 5.1.2 does not apply, and section 5.1.3
does apply.
Section 5.1.3 specifies the use of
the retrieval IRI as the base IRI.
Furthermore,
<a href="http://www.apps.ietf.org/rfc/rfc3986.html#sec-5.1.3">section 5.1.3</a>
of RFC 3986 specifies that:
</p>
<blockquote>
<p>
if the retrieval was the result of a redirected request,
the last URI used (i.e., the URI that
resulted in the actual retrieval of the representation)
is the base URI.
</p>
</blockquote>
<p>
The resulting IRI is used as the base IRI parameter for processing
the intermediate RDF/XML serialization.
</p>
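<p>For an XHTML family document, the choice between the
<code>base</code> element and the retrieval IRI might look like the
following sketch. The function name and the <code>example.org</code>
URIs are hypothetical, the input is assumed to be well-formed XHTML, and
the <code>href</code> value is assumed to be absolute.</p>

```python
import xml.etree.ElementTree as ET
from typing import List

XHTML_NS = "http://www.w3.org/1999/xhtml"


def xhtml_base_iri(document_text: str, retrieval_chain: List[str]) -> str:
    """Prefer the href of a base element in head; otherwise use the last URI retrieved."""
    root = ET.fromstring(document_text)
    base = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}base")
    if base is not None and base.get("href"):
        return base.get("href")
    # After redirects, the last URI used for retrieval is the base (RFC 3986, 5.1.3).
    return retrieval_chain[-1]


WITH_BASE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>'
    '<base href="http://example.org/canonical/"/></head><body/></html>'
)
WITHOUT_BASE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    "<body/></html>"
)
CHAIN = ["http://example.org/old", "http://example.org/new/doc.html"]

print(xhtml_base_iri(WITH_BASE, CHAIN))
print(xhtml_base_iri(WITHOUT_BASE, CHAIN))
```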
<h3 id="base_other_xml">The base IRI of other XML documents</h3>
<p>Other XML documents may use XML Base.
This is only recommended when the specific document format
permits the use of XML Base.
</p>
<p>When an <code>xml:base</code> attribute is present
on the root element of an XML document, this
specifies the base IRI for that document,
following section 5.1.1 of RFC 3986.
</p>
<p>When there is no <code>xml:base</code> attribute
on the root element, even if there is such an attribute on
a descendant element, then section 5.1.1 of RFC 3986 does not apply.
</p>
<p>
As in the XHTML case, we then have to consider sections
5.1.2, 5.1.3 and 5.1.4 of RFC 3986.
</p>
<p>
Of these, section 5.1.3 is the most common case,
and the note about redirected retrieval also applies.
</p>
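<p>The distinction between an <code>xml:base</code> attribute on the
root element and one on a descendant can be shown with a short lookup.
This is a sketch only; the function name and the sample documents are
hypothetical.</p>

```python
import xml.etree.ElementTree as ET
from typing import Optional

# ElementTree exposes xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def root_xml_base(document_text: str) -> Optional[str]:
    """Return xml:base from the root element only; descendants are not consulted."""
    return ET.fromstring(document_text).get(XML_BASE)


ON_ROOT = '<doc xml:base="http://example.org/data/"><item/></doc>'
ON_CHILD = '<doc><item xml:base="http://example.org/data/"/></doc>'

print(root_xml_base(ON_ROOT))
print(root_xml_base(ON_CHILD))
```

<p>When the lookup returns nothing, as for the second sample, the base
IRI of the document falls through to the later cases of section 5.1 of
RFC 3986.</p>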
<h3 id="pipeline">The base IRI in a processing pipeline</h3>
<p>
A GRDDL aware agent computes GRDDL results when
</p>
<blockquote>
<p>
given a URI <var>I</var> of an information resource <var>IR</var>, and
an XPath node <var>N</var> for a representation of <var>IR</var>
</p>
</blockquote>
<p>
To use a GRDDL aware agent in a processing pipeline,
as well as the XPath node <var>N</var>, it is also necessary
to specify a corresponding IRI <var>I</var>.
This is used as the base IRI when the other mechanisms
do not apply.
This corresponds to section 5.1.4 of RFC 3986.
It is even possible for the default IRI used to bear
no relationship with the XPath node <var>N</var>,
but in such a case, we
<a href="http://www.apps.ietf.org/rfc/rfc3986.html#sec-5.1.4">read</a>:
</p>
<blockquote>
<p>
As this definition is necessarily application-dependent, failing to define a base URI by using one of the other methods may result in the same content being interpreted differently by different types of applications.
</p>
<p>
A sender of a representation containing relative references is responsible for ensuring that a base URI for those references can be established.
</p>
</blockquote>
<h3 id="rcpbase">Responsibilities for correct processing of base IRIs</h3>
<h4 id="bdoc_auth">Document authors, including profile and namespace documents</h4>
<p>Document authors should, in general, include a base URI
if the document is retrievable from some other URI.</p>
<p>For an XHTML family document<a href="#XHTML">[XHTML]</a>, this is done using the <code>base</code> element.</p>
<p>For other XML documents, if the format supports <code>xml:base</code>
then this should be used. In general, experience suggests that there is
least confusion when this is done on the root element.
Document authors may also use <code>xml:base</code> attributes
elsewhere in their documents, as permitted by the document format,
with semantics as defined by XML Base<a href="#XMLBASE">[XMLBASE]</a>.
</p>
<p>For XML documents in formats that do not support <code>xml:base</code>,
and are not XHTML family documents, there is no support in GRDDL for
specifying an in-line base URI.</p>
<p>When a profile or namespace document can be accessed via multiple URIs,
for instance by a redirect, document authors should, in general,
provide a GRDDL result that specifies profile transformations or
namespace transformations for each of these URIs.
</p>
<h4 id="base_agt">GRDDL aware agents</h4>
<p>
When a GRDDL result is represented in RDF/XML
using the <a href="#rule_rdfxbase">rule for RDF/XML</a>,
a base URI may be needed for this representation in order to convert it
into an RDF graph, following the rules in the RDF/XML Syntax Specification<a href="#RDFX">[RDFX]</a>.
</p>
<p>
GRDDL results represented in other ways may also need a base URI.
</p>
<p>
Following the analysis above, a base URI for resolving a relative
reference is defined by following section 5.1 of RFC 3986.</p>
<p>
In many applications, it is highly undesirable
that GRDDL results depend on an application default URI
(section 5.1.4 of RFC 3986); some GRDDL
aware agents may treat this possibility as an error.
</p>
<h4 id="base_auth">GRDDL transformation authors</h4>
<p>
In general, when writing a GRDDL transformation for
an XHTML family document to RDF/XML, the best advice is to ignore
issues to do with the base URI.
The easiest approach is to produce relative URIs in the output,
corresponding to any relative URIs in the input,
and absolute URIs corresponding to any concepts built into
the transform.
Such relative URIs will be resolved, during the processing
performed by a GRDDL aware agent, against the correct base URI.
</p>
<p>
When writing a GRDDL transformation for an XML document format
that does not support xml:base, and has no means to represent
an in-line base URI, there is little choice but to ignore issues
of the correct base.
</p>
<p>
When writing a GRDDL transformation for an XML document format,
other than an XHTML family document,
that does not support xml:base, but has some other means to represent
an in-line base URI, then a GRDDL aware agent will be ignorant
of this means, and a well-written GRDDL transformation will attempt
to correct for this. When a base URI is specified in such a way,
one approach is to insert the base URI into the RDF/XML output as
the value of an <code>xml:base</code> attribute, so that
the RDF/XML parser will resolve relative URIs against that base,
and ignore the base URI passed by the GRDDL aware agent, which
will have been computed ignoring the conventions specific to this format.
</p>
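<p>
A minimal sketch of this approach follows, written in Python rather than
in a transformation language purely for illustration. It assumes the
transformation has already extracted the format-specific base URI into
the hypothetical variable <code>format_base</code>, and the helper name
<code>rdf_root_with_base</code> is likewise invented. The sketch places
the value on the root of the RDF/XML output as an <code>xml:base</code>
attribute.
</p>

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def rdf_root_with_base(format_base: str) -> ET.Element:
    """Build an rdf:RDF root element carrying xml:base (hypothetical helper)."""
    root = ET.Element("{%s}RDF" % RDF_NS)
    root.set("{%s}base" % XML_NS, format_base)
    return root


# Assumed value, taken from the source format's own base URI convention.
format_base = "http://example.org/data/"

root = rdf_root_with_base(format_base)
# A description with a relative rdf:about; the parser resolves it against xml:base.
description = ET.SubElement(root, "{%s}Description" % RDF_NS)
description.set("{%s}about" % RDF_NS, "item1")

print(ET.tostring(root, encoding="unicode"))
```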
<p>
When writing a GRDDL transformation for an XML document format
that does support xml:base, it must be remembered that
a GRDDL aware agent
is responsible for handling an xml:base on the root element.
If there is such an xml:base attribute, then the simplest
behaviour for a GRDDL transformation is to ignore it.
</p>
<p>
However, other xml:base attributes, not on the root element,
are the responsibility of the transform, since the GRDDL aware
agent ignores these.
Thus, these lower level xml:base attributes should be honored,
most easily by copying them into the output graph
in the appropriate place.
However, in general, xml:base attributes on ancestor nodes
also have to be taken into account, unless there is an intervening
xml:base attribute with an absolute URI as its value.
This is clearly non-trivial to get right: to assist,
the GRDDL library provides a module to be imported into your stylesheet,
see below.
</p>
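<p>
To make the ancestor rule concrete, here is a small Python sketch; it is
not the library module mentioned above, and the function name
<code>effective_base</code> and the sample element names are assumptions.
It folds the <code>xml:base</code> values from the root down to the
element of interest, so an intervening absolute value naturally replaces
whatever was inherited from higher up.
</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def effective_base(path_from_root, document_base):
    """Resolve xml:base values along the path from the root to an element.

    path_from_root: elements ordered root first, element of interest last.
    document_base: base URI of the document before any xml:base is applied.
    """
    base = document_base
    for element in path_from_root:
        value = element.get(XML_BASE)
        if value is not None:
            # An absolute value replaces the inherited base; a relative one refines it.
            base = urljoin(base, value)
    return base


# Invented sample tree: doc (absolute base) / section (relative base) / item.
doc = ET.Element("doc", {XML_BASE: "http://example.org/a/"})
section = ET.SubElement(doc, "section", {XML_BASE: "b/"})
item = ET.SubElement(section, "item")

print(effective_base([doc, section, item], "http://example.net/source.xml"))
```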
<p>
In all cases,
while often unnecessary,
if a transform is aware of an absolute
base URI, specified in its input, for the whole document,
it is never incorrect to use this base URI as the base URI for
the output, for example, by adding an appropriate <code>xml:base</code>
attribute to the <code>rdf:RDF</code> element.
</p>
<p>
Transforms that do this need to guard against the possibly incorrect
similar treatment of relative base URIs. For example, an
<code>xml:base=".."</code> attribute on the root element might, in the
interaction between a correct GRDDL aware agent and a poorly written
transform, be applied twice, resulting in relative references being
resolved at the wrong level in the directory hierarchy.
</p>
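<p>
The double application can be reproduced with a few lines of Python. The
URIs below are invented, and <code>base_to_copy</code> is a hypothetical
guard showing one way a transform might copy only absolute base URIs into
its output.
</p>

```python
from urllib.parse import urljoin, urlsplit

source_uri = "http://example.org/a/b/doc.xml"  # assumed retrieval URI
root_xml_base = ".."                           # relative xml:base on the root element

# The GRDDL aware agent applies the root xml:base once.
agent_base = urljoin(source_uri, root_xml_base)
correct = urljoin(agent_base, "img.png")

# A poorly written transform copies the relative xml:base into its output,
# so the RDF/XML parser applies ".." a second time.
doubled_base = urljoin(agent_base, root_xml_base)
wrong = urljoin(doubled_base, "img.png")


def base_to_copy(xml_base):
    """Return xml_base only if it is absolute (has a scheme), otherwise None."""
    return xml_base if urlsplit(xml_base).scheme else None


print(correct)
print(wrong)
print(base_to_copy(root_xml_base))
```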
</div>
<div class="changes">
<h2 id="changes">Acknowledgements and Change History</h2>
<p>A companion <cite><a
href="http://www.w3.org/2004/01/rdxh/specbg.html">GRDDL design history
and rationale</a></cite> discusses this design in the context of HTML,
PICS, and RDF since about 1997. The editor gratefully acknowledges the
many contributions of community members in the development of
GRDDL:</p>
<ul>
<li>In Dec 2000, Ann Navarro raised the <a
href="http://www.w3.org/2000/03/rdf-tracking/#faq-html-compliance">faq-html-compliance</a>
issue: <q>The suggested way of including RDF meta data in HTML is
not compliant with HTML 4.01 or XHTML</q>; in Apr 2001, Lee Jonas
raised issue <a
href="http://www.w3.org/2000/03/rdf-tracking/#rdfms-validating-embedded-rdf">rdfms-validating-embedded-rdf</a>:
<q>RDF embedded in XHTML and other XML documents is hard to
validate</q>.</li>
<li>In May 2003, Joseph Reagle convened a task force with a <a
href="http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2003May/0001.html">Kickoff
of public-rdf-in-xhtml-tf@w3.org</a> message. Dan Connolly
sent a <a
href="http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2003May/0002.html">relational data views of XHTML via XSLT</a> design sketch.
</li>
<li>In Nov 2003, <a href="/People/Dom/">Dominique
Hazaël-Massieux</a> wrote <cite><a
href="/2003/11/rdf-in-xhtml-proposal">An RDF-in-XHTML Proposal</a></cite>,
a predecessor of this spec.
</li>
<li>In Jan 2004, Dan Connolly integrated that draft into this one
and sent <a
href="http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2004Jan/0011.html">a
message calling for review</a>. Discussion with Tim Berners-Lee
led to generalizing from XHTML to all of XML and to
indirection via namespace/profile document.</li>
<li>In February 2004, the RDF Core specifications became W3C
Recommendations; the issues <a
href="http://www.w3.org/2000/03/rdf-tracking/#rdfms-validating-embedded-rdf">rdfms-validating-embedded-rdf</a>
and <a
href="http://www.w3.org/2000/03/rdf-tracking/#faq-html-compliance">faq-html-compliance</a>
were postponed.</li>
<li>A <a href="/TR/2004/NOTE-grddl-20040413/">13 April 2004 snapshot</a>
was published as a W3C Coordination Group Note to facilitate
exchange between the Semantic Web
Best Practices and Deployment Working
Group and the HTML Working Group.
</li>
<li>Also in February 2004, Connolly presented to the TAG a <a
href="http://www.w3.org/2004/01/rdxh/specbg.html">GRDDL design
history and rationale</a> which discusses contribution of this
design to Web Architecture issues such as <a
href="http://www.w3.org/2001/tag/issues.html?type=1#RDFinXHTML-35">RDFinXHTML-35</a>
and <a
href="http://www.w3.org/2001/tag/issues.html?type=1#namespaceDocument-8">namespaceDocument-8</a>.
Feedback from Norm Walsh has been valuable, and Noah Mendelsohn
noted a connection to the <cite>Cambridge Communiqué</cite> in a <a
href="http://lists.w3.org/Archives/Public/www-tag/2005Mar/0090.html">message
of 22 March</a>.
</li>
<li>Ben Adida started contributing use cases from Creative Commons
in a <a href="http://www.w3.org/2004/03/04-SWBPD">March 2004 meeting
of the Semantic Web Best Practices &amp; Deployment Working
Group</a>.</li>
<li>A <a
href="http://www.w3.org/TeamSubmission/2005/SUBM-grddl-20050516/">16
May 2005 snapshot</a> was published as a W3C Team Submission by Dom
and Dan.</li>
<li>In a <a
href="http://esw.w3.org/topic/SwigAtTp2006">March 2006 Semantic
Web Interest Group meeting</a>, Murray Maloney took an
interest in the connection with XML Schemas and the readability of
the specification, Brian McBride demonstrated some related
implementation experience with transforming documents to RDF,
and Ian Davis contributed the eRDF use case and profile.
</li>
</ul>
<p>The GRDDL Working Group convened in August 2006 with Harry Halpin as
chair and several of the contributors and implementors above
participating, plus Chimezie Ogbuji, Fabien Gandon, Brian Suda, and
Rachel Yager.</p>
<p>Jeremy Carroll provided detailed security considerations based on
<a class="inform" href="http://www.faqs.org/rfcs/rfc2046.html">RFC
2046</a> and implemented the HTTP header linking as proposed by Ian
Davis.</p>
<p>The Working Group published a <a
href="http://www.w3.org/TR/2006/WD-grddl-20061024/">24 October 2006
draft</a>. The <a
href="http://www.w3.org/2001/sw/grddl-wg/issues">issues list</a> shows
the major design decisions since then.</p>
<p>The only changes since the 16 July 2007 release, outside
the status section, are:</p>
<ul>
<li>a typo in the section
<a href="#sec_dubc_ex">An example Dublin Core META transformation</a>
and</li>
<li>the addition of XQuery among the languages mentioned in
section
<a href="#txforms"><span class="gen">6. </span>GRDDL Transformations</a>.
</li>
</ul>
</div>
</body>
</html>