<?xml version="1.0" encoding="UTF-8"?><!--*- nxml -*-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <title>Gleaning Resource Descriptions from Dialects of Languages
  (GRDDL)</title>
  <style type="text/css">
.issue {
  background-color:#dfd;
  border: thin solid black;
  color:black;
}

.designSketch {
  background-color:#fdf;
  border: thin solid black; 
  color:black;
}

.illustration {
 margin-left:auto;
 margin-right:auto;
 text-align:center; 
}

.example {
 margin-left:auto;
 margin-right:auto;
 padding-top:0.5em;
 padding-bottom:0.5em;
 width:70%;
 border-top:thin dashed black;
 border-bottom:thin dashed black;
}</style>
<link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-CG-NOTE" />
</head>

<body xml:lang="en" lang="en">

<div class="head">
<a href="http://www.w3.org/"><img alt="W3C" src="http://www.w3.org/Icons/w3c_home"
height="48" width="72" /></a>

<h1>Gleaning Resource Descriptions from Dialects of Languages (GRDDL)</h1>

<h2>W3C Coordination Group Note 13 April 2004</h2>
<dl>
  <dt>This Version:</dt>
    <dd><a href="http://www.w3.org/TR/2004/NOTE-grddl-20040413/">http://www.w3.org/TR/2004/NOTE-grddl-20040413/</a></dd>
  <dt>Latest Version:</dt>
    <dd><a
      href="http://www.w3.org/TR/grddl/">http://www.w3.org/TR/grddl/</a></dd>
  <dt>Authors:</dt>
    <dd><a href="/People/Dom/">Dominique Hazaël-Massieux</a></dd>
    <dd><a
      href="/People/Connolly/">Dan Connolly</a></dd>
</dl>

<p class="copyright"><a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a>
&#169; 2003, 2004 <a href="http://www.w3.org/"><acronym
title="World Wide Web Consortium">W3C</acronym></a><sup>&#174;</sup> (<a
href="http://www.csail.mit.edu/"><acronym
title="Massachusetts Institute of Technology">MIT</acronym></a>, <a
href="http://www.ercim.org/"><acronym
title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>,
<a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a
href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>,
<a
href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a>,
<a href="http://www.w3.org/Consortium/Legal/copyright-documents">document
use</a> and <a
href="http://www.w3.org/Consortium/Legal/copyright-software">software
licensing</a> rules apply.</p>
</div>
<hr />

<h2>Abstract</h2>

<p>This document presents GRDDL, a mechanism for encoding RDF statements in
XHTML and XML to be extracted by programs such as XSLT transformations.</p>

<div>
<h2>Status of This Document</h2>

<p><em>This section describes the status of this document at the time
of its publication. Other documents may supersede this document. A
list of current W3C publications and the latest revision of this
technical report can be found in the <a
href="http://www.w3.org/TR/">W3C technical reports index</a> at
<tt>http://www.w3.org/TR/</tt>.</em></p>


<p>As part of the work of the <a
href="http://www.w3.org/2001/sw/Activity">W3C Semantic Web
Activity</a>, the <a href="/2001/sw/CG/">Semantic Web Coordination Group</a> (Member-only) and the <a href="/MarkUp/">HTML Working
Group</a> started a task force on RDF in XHTML. This draft is a snapshot
of one of the designs discussed in that task force.</p>

<p>Please send review comments, implementation experience reports,
etc.  to <a href= "mailto:public-rdf-in-xhtml-tf@w3.org"
>public-rdf-in-xhtml-tf@w3.org</a>, a mailing list with <a
href="http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/">public
archive</a>.</p>

<p>The <a
href="http://esw.w3.org/topic/EmbeddingRDFinHTML">EmbeddingRDFinHTML</a>
wiki topic is also available as a shared space for collected wisdom on
related topics.</p>

<p>A related <a
href="http://www.w3.org/2004/01/rdxh/specbg.html">design history and
rationale</a> discusses the contribution of this draft to RDF issues such
as <a
href="http://www.w3.org/2000/03/rdf-tracking/#faq-html-compliance"
>faq-html-compliance</a> and <a
href="http://www.w3.org/2000/03/rdf-tracking/#rdfms-validating-embedded-rdf"
>rdfms-validating-embedded-rdf</a> and Web Architecture issues such as
<a href="http://www.w3.org/2001/tag/issues.html?type=1#RDFinXHTML-35"
>RDFinXHTML-35</a> and <a
href="http://www.w3.org/2001/tag/issues.html?type=1#namespaceDocument-8"
>namespaceDocument-8</a>.</p>


<p>This is something of a design sketch, but it is backed by running
code. We provide a pair of online services, <a
href="http://www.w3.org/2003/11/rdf-in-xhtml-demo">one demo for
XHTML</a> and <a
href="http://www.w3.org/2004/01/rdxh/grddl-xml-demo">one demo for
generic XML</a> on an experimental, best-effort basis.</p>

<p>The editors are aware of a few <span class="issue">remaining issues,
marked up like this <q>@@@</q></span>.</p>

<p>A <a href="#changes">log of changes</a> is appended.</p>

<p><em>Publication as a Coordination Group Note does not imply
endorsement by the W3C Membership. This is a draft document and may be
updated, replaced or obsoleted by other documents at any time. It is
inappropriate to cite this document as other than work in
progress.</em></p>


</div>

<div>
<h2 id="toc">Contents</h2>
<ol>
  <li><a href="#intro">Introduction</a></li>
  <li><a href="#grddl-xhtml">GRDDL for XHTML</a></li>
  <li><a href="#grddl-xml">GRDDL for XML</a></li>
  <li><a href="#ns-bind">GRDDL for XML Namespace Documents</a></li>
  <li><a href="#sec">Security Considerations</a></li>
  <li class="issue">@@ References</li>
</ol>
<ul>
  <li><a href="#changes">Changelog</a></li>
</ul>

<h3 id="toc-app">Supplementary Material</h3>
<ul>
  <li><a
    href="http://www.w3.org/2004/lambda/Sites/index.html">Example
    Homepage with Dublin Core, GeoURL, RSS, Creative Commons, etc.</a></li>
  <li><a id="notes" href="http://www.w3.org/2004/01/rdxh/specbg.html">Design History and Rationale</a></li>
</ul>
</div>

<div>
<h2 id="intro"><span class="gen">1.</span> Introduction</h2>

<p>An article by J. Kunze in 1999, <cite><a
href="http://www.ietf.org/rfc/rfc2731.txt">Encoding Dublin Core Metadata in
HTML</a></cite>, explains one way that the Dublin Core community encodes its
metadata in HTML documents. This metadata can also be expressed in the
Resource Description Framework (<a href="http://www.w3.org/RDF/">RDF</a>).</p>

<p>The mapping between the HTML encoding and the RDF encoding can be
represented as an XSLT transformation, <a
href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl">dc-extract.xsl</a>:</p>

<div class="illustration">
<img src="dc-extract.png" alt="diagram: HTML to RDF via dc-extract.xsl" /><br
/>
Decoding HTML metadata to RDF <br />
<small>(<a href="dc-extract.svg">svg</a>)</small></div>

<p>If the HTML author understood and agreed to these encoding conventions,
then their HTML document will conform to the syntactic conventions. In this
case, the mapping preserves the author's meaning. But an author may have
<em>accidentally</em> conformed to the syntactic conventions without any
knowledge of Dublin Core at all. In that case, the mapping most likely does
<em>not</em> preserve the author's meaning.</p>

<h2 id="grddl-xhtml"><span class="gen">2.</span> The GRDDL profile for
XHTML</h2>

<p>The HTML specification, in section <a href=
"http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#h-7.4.4.3"
>7.4.4.3 Meta data profiles</a> provides a mechanism for authors to
use particular metadata vocabularies and thereby indicate the author's
intent to use those terms in accordance with the conventions of the
community that originated the terms.</p>

<blockquote>
  <p>Authors may wish to define additional link types not described in this
  specification. If they do so, they should use a <a
  href="http://www.w3.org/TR/1999/REC-html401-19991224/struct/global.html#profiles">profile</a>
  to cite the conventions used to define the link types.</p>
</blockquote>

<p><dfn>GRDDL</dfn> is such a profile; it's a mechanism for <b>G</b>leaning
<b>R</b>esource <b>D</b>escriptions from <b>D</b>ialects of <b>L</b>anguages.
Use of the <tt><a
href="/2003/g/data-view">http://www.w3.org/2003/g/data-view</a></tt> profile
indicates that <em>RDF statements that result from transformation of the HTML
document to RDF by designated algorithms are part of the document's
meaning.</em></p>

<p>In this profile, the <tt>transformation</tt> link relationship relates a
document to an algorithm for gleaning resource descriptions from the
dialect the document is written in.</p>

<div class="illustration">
<img src="processing.png" alt="diagram: link to transformation" /><br />
Decoding HTML metadata to RDF <br />
<small>(<a href="processing.svg">svg</a>)</small>

</div>

<p class="issue">@@@ Should we namespace-qualify the token used in
<code>rel</code>? cf. <a
href="http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2004Jan/0005.html">Profiles
attribute: A format to be defined</a> Karl Dubost 15 Jan 2004.</p>

<p>For example:</p>
<pre class="example">&lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
  &lt;head profile="http://www.w3.org/2003/g/data-view"&gt;
    &lt;title&gt;Some Document&lt;/title&gt;
    &lt;link rel="transformation"
       href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" /&gt;
    &lt;meta name="DC.Subject"
       content="ADAM; Simple Search; Index+; prototype" /&gt;
    ...
  &lt;/head&gt;
  ...
&lt;/html&gt;</pre>

<p>The following RDF statement is part of the meaning of this document:</p>
<pre class="example">&lt;rdf:RDF
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  &gt;
  &lt;rdf:Description rdf:about=""&gt;
    &lt;dc:subject&gt;ADAM; Simple Search; Index+; prototype&lt;/dc:subject&gt;
  &lt;/rdf:Description&gt;
&lt;/rdf:RDF&gt;</pre>
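
<p>As a rough illustration of how such a statement could be produced, the
following XSLT 1.0 sketch handles only the <code>DC.Subject</code> case from
the example above. It is a simplification written for this illustration, not
the actual <code>dc-extract.xsl</code>, and it assumes the source document is
XHTML in the <code>http://www.w3.org/1999/xhtml</code> namespace:</p>
<pre class="example">&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    exclude-result-prefixes="h"&gt;

  &lt;!-- Describe the source document itself (rdf:about="") --&gt;
  &lt;xsl:template match="/"&gt;
    &lt;rdf:RDF&gt;
      &lt;rdf:Description rdf:about=""&gt;
        &lt;xsl:apply-templates
            select="h:html/h:head/h:meta[@name='DC.Subject']" /&gt;
      &lt;/rdf:Description&gt;
    &lt;/rdf:RDF&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Each DC.Subject meta element becomes one dc:subject property --&gt;
  &lt;xsl:template match="h:meta[@name='DC.Subject']"&gt;
    &lt;dc:subject&gt;&lt;xsl:value-of select="@content" /&gt;&lt;/dc:subject&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</pre>
<p>The sketch reads nothing but the source document, so its output depends
only on that document.</p>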

<p>Transformation algorithms <b>should</b> be represented in XSLT. While
JavaScript, C, or any other programming language can technically express the
relevant information, XSLT is specifically designed to express XML to XML
transformations and has some good safety characteristics. Other
representations <b>may</b> be used by prior agreement of all concerned
parties.</p>

<p>Transformation algorithms <b>should</b> be well-defined functions whose
only input is the source document. The use of the XSLT
<code>document()</code> function to incorporate other data at transformation
time is an <b>error</b>.</p>
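
<p>As a sketch of what this rule excludes, the following style sheet (the
<code>www.example.org</code> URI is hypothetical) copies statements from a
second document, so its result is no longer a function of the source document
alone:</p>
<pre class="example">&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"&gt;

  &lt;xsl:template match="/"&gt;
    &lt;rdf:RDF&gt;
      &lt;!-- error: incorporates data other than the source document --&gt;
      &lt;xsl:copy-of
          select="document('http://www.example.org/more.rdf')/rdf:RDF/*" /&gt;
    &lt;/rdf:RDF&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</pre>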

<p class="issue">Limitations on <code>xsl:import</code>?</p>

<p>Note that an XHTML document may conform to a number of dialects
simultaneously and link to more than one decoding algorithm. For example, the
fictional <a
href="http://www.w3.org/2004/lambda/Sites/index.html">Joe
Lambda's Homepage</a> demonstrates a mixture of Dublin Core, Creative
Commons, RSS, FOAF, and geoURL dialects.</p>
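
<p>One way to picture this, assuming that each decoding algorithm is named by
its own <code>link</code> element and using a hypothetical second style sheet
URI under <code>www.example.org</code>:</p>
<pre class="example">&lt;head profile="http://www.w3.org/2003/g/data-view"&gt;
  &lt;title&gt;Some Document&lt;/title&gt;
  &lt;link rel="transformation"
     href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" /&gt;
  &lt;link rel="transformation"
     href="http://www.example.org/2004/01/geo-extract.xsl" /&gt;
  ...
&lt;/head&gt;</pre>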
</div>

<div>
<h2 id="grddl-xml"><span class="gen">3.</span> The GRDDL attribute in XML</h2>

<p>The GRDDL profile mechanism is a special case of GRDDL designed to fit
within the syntax of XHTML 1.0. The general form of GRDDL is an attribute
suitable for use with a wide variety of XML dialects.</p>

<p>Use of the <code>interpreter</code> attribute in the
<code>http://www.w3.org/2003/g/data-view#</code> namespace on the root
element of an XML document indicates that <em>RDF statements that result from
transformation of the XML document to RDF by designated algorithms are part
of the document's meaning.</em></p>

<p>The value of the <code>data-view:interpreter</code> attribute designates a
list of algorithms by URI reference. <span class="issue">@@@IRI
reference?</span></p>
<p>For example: <em class="issue">update to P3Q example?</em></p>
<pre class="example"><code>&lt;svg xmlns="http://www.w3.org/2000/svg" 
   xmlns:data-view="http://www.w3.org/2003/g/data-view#" 
   data-view:interpreter="http://www.example.org/2004/01/svg2dc.xsl"
    width="4cm" height="8cm" 
    version="1.1" baseProfile="tiny" &gt;</code></pre>
</div>

<div>
<h2 id="ns-bind"><span class="gen">4.</span> XML Namespaces and embedded RDF</h2>

<p>The RDF property
<code>http://www.w3.org/2003/g/data-view#namespaceTransformation</code>
links an XML Namespace to an interpreter that may be applied to any document
which has its root element in that namespace, such that the output of the
interpreter will be an RDF/XML form of some (or all) of the information
content of the document.</p>

<p>For instance, given the XML Namespace
<code>http://www.example.net/fooML</code>,</p>
<div class="example">
<pre><code>&lt;rdf:Description rdf:about="http://www.example.net/fooML"&gt;
 &lt;namespaceTransformation xmlns='http://www.w3.org/2003/g/data-view#'
     rdf:resource='http://www.example.net/fooML2rdf.xsl' /&gt;
&lt;/rdf:Description&gt;</code></pre>
</div>
<p>asserts that if an XML document has a root element in the
<code>http://www.example.net/fooML</code> namespace and is run through
the XSLT style sheet <code>http://www.example.net/fooML2rdf.xsl</code>,
then the result will be valid RDF/XML, and the information it carries can be
considered to have been expressed by the document.</p>
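<p>As a sketch, a document such as the following (the <code>report</code> and
<code>title</code> element names are hypothetical) carries no GRDDL markup of
its own. An agent that has obtained the statement above, for example from the
namespace document, could still apply <code>fooML2rdf.xsl</code> to it:</p>
<pre class="example">&lt;report xmlns="http://www.example.net/fooML"&gt;
  &lt;title&gt;Some Document&lt;/title&gt;
&lt;/report&gt;</pre>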
</div>

<div>
<h2 id="sec"><span class="gen">5.</span> Security considerations</h2>

<p><a href="http://www.faqs.org/rfcs/rfc2046.html">RFC 2046</a>, in
section 9, Security Considerations, says:</p>

<blockquote>
<p>Implementors should pay special attention to the
security implications of any media types that can cause the remote
execution of any actions in the recipient's environment.  In such
cases, the discussion of the "application/postscript" type may serve
as a model for considering other media types with remote execution
capabilities.</p>
</blockquote>

<p>Given the expressive power of XSLT, and the possibility of accessing external
resources from an XSLT style sheet (e.g. through the <code>document</code>
function or the <code>xsl:import</code> mechanism), implementors should take
the appropriate measures to prevent malicious usage of this mechanism.</p>
</div>

<div>
<h2 id="changes"><em>Change History</em></h2>

<p>The <a href="http://www.w3.org/2003/11/rdf-in-xhtml-proposal">Nov
2003 draft</a> is a predecessor of this spec.</p>

<p>An <a href="http://www.w3.org/2004/01/rdxh/spec">editor's working draft</a> is also available; v1.11 was announced in <a
href="http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2004Jan/0011.html">a
message of 16 Jan 2004</a>.</p>

</div>
</body>
</html>