<html xmlns="http://www.w3.org/1999/xhtml"> <!--*- nxml -*-->
<head>
<title>Transforming XHTML to LaTeX and BibTeX</title>
<link rel="stylesheet" href="article.css"/>
<link rel="documentclass" title="llncs"/><!-- href? where does that come from? -->
<link rel="bibliographystyle" title="splncs" /> <!-- href? -->
<link rel="usepackage" title="graphicx" /><!-- href? -->
<link rel="usepackage" title="url" href="ftp://cam.ctan.org/tex-archive/macros/latex/contrib/misc/url.sty" />
</head>
<body>
<div class="online"><a href="/">W3C</a></div>
<div class="maketitle">
<h1>Transforming XHTML to LaTeX and BibTeX</h1>
<address><a rel="author" href="http://www.w3.org/People/Connolly/">Dan Connolly</a><br />
<small class="online">$Revision: 1.23 $ of $Date: 2008/04/24 21:28:36 $</small>
</address>
</div>
<div class="abstract"><h4>Abstract</h4>
<p>We transform XHTML to LaTeX and BibTeX to allow technical articles
to be developed using familiar XHTML authoring tools and
techniques.</p>
</div>
<div>
<h2>Introduction</h2>
<p>Occasionally a web page turns the corner from a casually drafted
idea to an article worthy of publication. Computer science conferences
often require submissions using specific LaTeX styles; for example,
the <a
href="http://iswc2004.semanticweb.org/submission/authors_instruction.php">ISWC2004
submission instructions</a> require that submitted papers follow
the Springer format for <a
href="http://www.springeronline.com/sgw/cda/frontpage/0,10735,5-164-2-72376-0,00.html">Lecture
Notes in Computer Science (LNCS)</a>.
<a href="http://www.w3.org/Style/XSL/">XSLT</a> is
a convenient notation to express a transformation from
XHTML to LaTeX.</p>
<p>Tools to transform from LaTeX to HTML are commonplace, but there
are far fewer to go the other way. A little bit of searching yielded
some work<a href="#Gur00">[Gur00]</a> that was designed to undo a
transformation to XHTML. It used an odd XHTML namespace and exhibited
various other quirks specific to reversing that transformation, but it
provided quite a boost up the LaTeX learning curve<a
href="#Mann94">[Mann94]</a>.</p>
<p>That code did not integrate with BibTeX. To take
advantage of the automatic bibliography formatting traditionally provided
by LaTeX styles, we studied the <a
href="http://www.cc.gatech.edu/classes/RWL/Projects/citation/Docs/UserManuals/Reference_Pages/bibtex_doc.html">BibTeX
format</a><a href="#Spen98">[Spen98]</a> for a bit and wrote <tt><a
href="xh2bib.xsl">xh2bib.xsl</a></tt>.</p>
<p>Together with traditional <tt>pdflatex</tt> and <tt>bibtex</tt>
tools<a href="#tetex">[tetex]</a> and an XSLT processor such as
xsltproc<a href="#XSLTPROC">[XSLTPROC]</a>, this transformation can
turn ordinary web pages with just a bit of special markup into
camera-ready PDF in specialized LaTeX styles.</p>
</div>
<div><h3>A Quick Example</h3>
<p>This article demonstrates the basic features. See:</p>
<ul>
<li><tt><a href="Overview.pdf">Overview.pdf</a></tt></li>
<li><tt><a href="Overview.tex">Overview.tex</a></tt></li>
<li><tt><a href="Overview.bib">Overview.bib</a></tt></li>
</ul>
<p>They are produced a la:</p>
<pre>
$ make Overview.pdf
xsltproc --novalid --stringparam DocClass llncs \
--stringparam Bib Overview --stringparam BibStyle splncs \
--stringparam Status prepub \
-o Overview.tex xh2latex.xsl Overview.html
TEXINPUTS=.:../../../2004/LLCS: pdflatex Overview.tex
This is pdfTeX, Version 3.14159-1.10b (Web2C 7.4.5)
<em>...</em>
Output written on Overview.pdf (3 pages, 62474 bytes).
Transcript written on Overview.log.
xsltproc --novalid -o Overview.bib xh2bib.xsl Overview.html
BSTINPUTS=.:../../../2004/LLCS: bibtex Overview
This is BibTeX, Version 0.99c (Web2C 7.4.5)
The top-level auxiliary file: Overview.aux
The style file: splncs.bst
Database file #1: Overview.bib
TEXINPUTS=.:../../../2004/LLCS: pdflatex Overview
This is pdfTeX, Version 3.14159-1.10b (Web2C 7.4.5)
<em>...</em>
Output written on Overview.pdf (3 pages, 67583 bytes).
Transcript written on Overview.log.
TEXINPUTS=.:../../../2004/LLCS: pdflatex Overview
This is pdfTeX, Version 3.14159-1.10b (Web2C 7.4.5)
<em>...</em>
Output written on Overview.pdf (3 pages, 67167 bytes).
Transcript written on Overview.log.
</pre>
</div>
<div>
<h2>Features</h2>
<p>The transformation <tt><a href="xh2latex.xsl">xh2latex.xsl</a></tt>
works in the obvious way for many idioms:</p>
<ul>
<li>section headings: <tt>h2</tt>, <tt>h3</tt>, <tt>h4</tt></li>
<li>paragraphs: <tt>p</tt></li>
<li>itemized lists: <tt>ul</tt>, <tt>dl</tt></li>
<li>enumerated (numbered) lists: <tt>ol</tt></li>
<li>tables: <tt>table border="1"</tt>, <tt>tr</tt>, <tt>td</tt></li>
<li>verbatim: <tt>pre</tt></li>
<li>phrase markup: <tt>em</tt>, <tt>code</tt>, <tt>tt</tt>,
<tt>i</tt>, <tt>b</tt></li>
</ul>
<p>Table support is limited to tables with <tt>border="1"</tt>
and where all rows have the same number of cells. For example:</p>
<table border="1">
<tr><th>Name</th><th>Address</th><th>Phone</th></tr>
<tr><td>John Doe</td><td>123 High St.</td><td>555-1212</td></tr>
<tr><td>Jane Smith</td><td>456 Low St.</td><td>555-1234</td></tr>
</table>
<p>Specialized markup is required for other idioms. An <a
href="article.css">article.css</a> stylesheet provides
visual feedback for this special markup.</p>
<p>To use a LaTeX package, add a link to the head of your document a la:</p>
<pre>
&lt;link rel="usepackage" title="url"
href="ftp://cam.ctan.org/tex-archive/macros/latex/contrib/misc/url.sty" /&gt;
</pre>
<p>The package name is taken from the title attribute. The href attribute is not used in the LaTeX conversion.</p>
<p>We recommend the <a
href="ftp://cam.ctan.org/tex-archive/macros/latex/contrib/misc/url.sty">url.sty</a>
package, per <a
href="http://www.tex.ac.uk/cgi-bin/texfaq2html?label=setURL">a TeX
FAQ</a>. For example: <tt
class="url">http://www.w3.org/People/Connolly/</tt>.</p>
<div><h3>Front Matter</h3>
<p>The following patterns are used to extract the
title page material:</p>
<ul>
<li><tt>div/@class="maketitle"</tt>
<ul>
<li>title: <tt>h1</tt></li>
<li>abstract: <tt>div/@class="abstract"</tt></li>
<li>author: <tt>address/a[@rel="author"]</tt></li>
</ul>
</li>
<li>keywords: <tt>div[@class="keywords"]</tt></li>
<li>terms: <tt>div[@class="terms"]</tt></li>
</ul>
<p><em>Support for WWW2006 style authors, following
<a href="http://www.acm.org/sigs/pubs/proceed/sigfaq.htm">ACM style</a>,
is in progress.</em></p>
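<p>As a rough sketch of how these patterns fit together, the front
matter of this article is marked up along the following lines. The
<tt>keywords</tt> block and its contents are hypothetical, added only
to illustrate the pattern; the exact nesting the stylesheet expects
is best checked against <tt>xh2latex.xsl</tt> itself.</p>
<pre>
&lt;div class="maketitle"&gt;
  &lt;h1&gt;Transforming XHTML to LaTeX and BibTeX&lt;/h1&gt;
  &lt;address&gt;&lt;a rel="author"
    href="http://www.w3.org/People/Connolly/"&gt;Dan Connolly&lt;/a&gt;&lt;/address&gt;
&lt;/div&gt;
&lt;div class="abstract"&gt;&lt;h4&gt;Abstract&lt;/h4&gt;
  &lt;p&gt;We transform XHTML to LaTeX and BibTeX to allow technical articles
  to be developed using familiar XHTML authoring tools and
  techniques.&lt;/p&gt;
&lt;/div&gt;
&lt;!-- hypothetical keywords block, for illustration only --&gt;
&lt;div class="keywords"&gt;XHTML, XSLT, LaTeX, BibTeX&lt;/div&gt;
</pre>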
</div>
<div><h3>Cross references and footnotes</h3>
<p>The <tt>a[@rel="ref"]</tt> pattern is transformed to the LaTeX
<tt>\ref{<var>label</var>}</tt> idiom, assuming the reference takes
the form <tt>href="#<var>label</var>"</tt>. <em>@@needs testing</em></p>
<p>The footnote pattern is <tt>*[@class="footnote"]</tt>.</p>
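<p>A minimal sketch of both patterns, assuming a figure elsewhere in
the document carries <tt>id="fig-pipeline"</tt>. That identifier and
the wording are made up for illustration, and <tt>span</tt> is just
one possible carrier for the footnote, since the pattern matches any
element:</p>
<pre>
&lt;p&gt;The processing steps are shown in Figure
&lt;a rel="ref" href="#fig-pipeline"&gt;1&lt;/a&gt;.&lt;span
class="footnote"&gt;Footnote text goes here.&lt;/span&gt;&lt;/p&gt;
</pre>
<p>Per the rule above, the link in this sketch would become
<tt>\ref{fig-pipeline}</tt>.</p>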
</div>
<div><h3>Figures</h3>
<p>The <tt>div[@class="figure"]</tt> pattern is transformed to a
figure environment; any <tt>div/@id</tt> is used as a figure
label. The file pattern is <tt>object/@data</tt>. <em>Figures are
currently assumed to be PDF; the <tt>object/@height</tt> attribute is
copied over.</em> The caption pattern is <tt>p[@class="caption"]</tt>.
<em>@@need to test this.</em>
Be sure to include the <tt>epsfig</tt> package a la:
</p>
<pre>
&lt;link rel="usepackage" title="epsfig" /&gt;
</pre>
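<p>Putting the figure patterns together, a figure might be marked up
as in the following sketch. The file name <tt>pipeline.pdf</tt>, the
<tt>id</tt>, and the caption are hypothetical. The height value
<tt>2in</tt> is also an assumption: since the attribute is copied
over, a length that LaTeX understands is probably what is wanted.</p>
<pre>
&lt;!-- hypothetical figure, for illustration only --&gt;
&lt;div class="figure" id="fig-pipeline"&gt;
  &lt;object data="pipeline.pdf" height="2in"&gt;processing pipeline&lt;/object&gt;
  &lt;p class="caption"&gt;From XHTML to camera-ready PDF&lt;/p&gt;
&lt;/div&gt;
</pre>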
</div>
<div><h3>Citations and Bibliography</h3>
<p>An <tt>a</tt> element starting with an open square bracket
<tt>[</tt> is interpreted as a citation reference. The <tt>href</tt>
is assumed to be a local link a la <tt>#<var>tag</var></tt>.</p>
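<p>For instance, the citation of the teTeX distribution in the
introduction of this article is written along these lines:</p>
<pre>
&lt;tt&gt;bibtex&lt;/tt&gt; tools&lt;a href="#tetex"&gt;[tetex]&lt;/a&gt;
</pre>
<p>The link text begins with <tt>[</tt> and the <tt>href</tt> names
the tag <tt>tetex</tt>; presumably that tag has to match the one used
for the corresponding bibliography entry.</p>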
<p>The pattern <tt>dl/@class="bib"</tt> is used to find the
bibliography.
Each item is marked up a la:</p>
<pre>
&lt;dt class="misc"&gt;[&lt;a name="tetex"&gt;tetex&lt;/a&gt;]&lt;/dt&gt;
&lt;dd&gt;
&lt;span class="author"&gt;Thomas Esser&lt;/span&gt;
&lt;cite&gt;&lt;a
href="http://www.tug.org/tex-archive/help/Catalogue/entries/tetex.html"
&gt;The TeX distribution for Unix/Linux&lt;/a&gt;&lt;/cite&gt;
February &lt;span class="year"&gt;2003&lt;/span&gt;
&lt;/dd&gt;
</pre>
<p>or</p>
<pre>
&lt;dt class="misc" id="tetex"&gt;[tetex]&lt;/dt&gt;
...
</pre>
<p>Note the placement of the bibtex item type <tt>misc</tt> and the
tag <tt>tetex</tt> and keep in mind that <tt>bibtex</tt> ignores
works in the bibliography that are not cited from the body.</p>
<p>The <tt><a href="xh2bib.xsl">xh2bib.xsl</a></tt> transformation
turns this markup into BibTeX format. <tt>xh2latex.xsl</tt> transforms
the entire bibliography <tt>dl</tt> to a <tt>\bibliography{...}</tt>
reference.</p>
<p><em>Capitalization of titles seems to get mangled. I'm not sure
whether that's a feature of certain bibliography styles or something else.</em></p>
</div>
<div><h3>Bugs/Caveats/Misfeatures</h3>
<ul>
<li>Composed characters and such in the bibliography are
handled with a sort of kludge, e.g.
<tt>K&lt;span title='\"o'&gt;ö&lt;/span&gt;bler</tt>
</li>
<li>The <tt>samp</tt> element is used to pass LaTeX
math markup through, e.g.
<tt>&lt;samp&gt;\Delta&lt;/samp&gt;</tt>
</li>
</ul>
</div>
</div>
<div><h2>Makefile support</h2>
<p>Formatting a LaTeX document is done in several passes. One <a
href=
"http://amath.colorado.edu/documentation/LaTeX/basics/steps/help_latex.html"
>typical manual</a> shows:</p>
<pre>
ucsub> latex MyDoc.tex
ucsub> bibtex MyDoc
ucsub> latex MyDoc.tex
ucsub> latex MyDoc.tex
</pre>
<p>The following excerpt from <tt><a
href="html2latex.mak">html2latex.mak</a></tt> shows
some rules to accomplish this using make:</p>
<pre>
.html.tex:
	$(XSLTPROC) --novalid $(HLPARAMS) \
		-o $@ xh2latex.xsl $&lt;
.html.bib:
	$(XSLTPROC) --novalid -o $@ xh2bib.xsl $&lt;
.tex.aux:
	TEXINPUTS=$(TEXINPUTS) $(PDFLATEX) $&lt;
.tex.bbl:
	BSTINPUTS=$(BSTINPUTS) $(BIBTEX) $*
.aux.pdf:
	TEXINPUTS=$(TEXINPUTS) $(PDFLATEX) $*
	TEXINPUTS=$(TEXINPUTS) $(PDFLATEX) $*
</pre>
<p>Sources:</p>
<ul>
<li><tt><a href="xh2latex.xsl">xh2latex.xsl</a></tt></li>
<li><tt><a href="xh2bib.xsl">xh2bib.xsl</a></tt></li>
<li><tt><a href="article.css">article.css</a></tt></li>
</ul>
</div>
<div>
<h2>References</h2>
<dl class="bib">
<dt class="misc">[<a name="tetex">tetex</a>]</dt>
<dd>
<span class="author">Thomas Esser</span>
<cite><a href="http://www.tug.org/tex-archive/help/Catalogue/entries/tetex.html">The TeX distribution for Unix/Linux</a></cite>
February <span class="year">2003</span>
</dd>
<dt class="misc">[<a name="Mann94">Mann94</a>]</dt>
<dd><span class="author">Shannon Mann</span>
<cite><a href="http://www.csclub.uwaterloo.ca/u/sjbmann/tutorial.html">Beginner's LaTeX Tutorial</a></cite>
<span class="year">1994</span>-06-16T15:32:27
</dd>
<dt class="misc">[<a name="Spen98">Spen98</a>]</dt>
<dd><span class="author">Spencer Rugaber</span>
<cite>
<a href="http://www.cc.gatech.edu/classes/RWL/Projects/citation/">The Citation project</a>
</cite>
Summer <span class="year">1998</span>.
</dd>
<dt class="misc">[<a name="Gur00" id="Gur00">Gur00</a>]</dt>
<dd><span class="author">Eitan M. Gurari</span>
<cite><a href="http://www.cse.ohio-state.edu/~gurari/docs/mml-00/xhm2latex.html">XSLT from XHTML+MathML to LATEX</a></cite>
<span class="month">July</span> 19, <span class="year">2000</span>
</dd>
<dt class="misc">[<a name="XSLTPROC" id="XSLTPROC">XSLTPROC</a>]</dt>
<dd><span class="author">Daniel Veillard</span>
<cite><a href="http://xmlsoft.org/XSLT/xsltproc2.html">The xsltproc tool</a></cite>
in <a href="http://xmlsoft.org/XSLT/">libxslt: The XSLT C library for Gnome</a>
1.1.2 <span class="month">Dec</span> 24 <span class="year">2003</span>
</dd>
</dl>
</div>
</body>
</html>