travel.html
15.2 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Semantic Web Application Integration: Travel Tools</title>
<link rel="stylesheet" href="../doc/style.css" />
<style type="text/css">
.mechanics { background-color: #b6b6b6; }
.figure { text-align: center }
</style>
</head>
<body>
<div class="noprint">
<a href="../../../../">W3C</a> | <a href="../../../01/sw/">SWAD</a> | <a
href="../">SWAP</a> | <a href="../doc/">Tutorial</a>
</div>
<h1>Semantic Web Application Integration: Travel Tools</h1>
<p><em>The bane of my existence is doing things I know the computer
could do for me.</em> When I got my proposed July 2001 travel
itinerary in email, I just couldn't bear the thought of manually
copying and pasting each field from the itinerary into my PDA calendar. I
started putting the Semantic Web approach to application integration
to work.</p>
<p>The Semantic Web approach to application integration emphasizes
data about real-world things like people, places, and events over
document structure. Documents are important real-world things too, of
course. And Semantic Web data formats benefit from the
internationalization support in XML and the growing infrastructure of
tools. But most XML schemas are too constrained, syntactically, and
not constrained enough, semantically, to accomplish these integration
tasks:</p>
<ul>
<li><a href="#map-viz">plot an itinerary on a map</a></li>
<li><a href="#ical-evo">import travel itineraries into my iCalendar-happy desktop PIM</a></li>
<li><a href="#pda-in">import travel itineraries into my PDA calendar</a></li>
<li><a href="#plain-text-sum">produce a brief summary of an itinerary for use in plain text email</a></li>
<li><a href="#ckcn">check proposed work travel itineraries against family constraints</a></li>
<li><a href="#fw">tell me when my travel schedule brings me unusually near a friend/colleague</a></li>
<li><a href="#fw">produce animated views of my travel schedule or past trips</a></li>
<li><a href="#fw">find conflicts between teleconferences and flights</a></li>
</ul>
<h2 id="grokLeg">Working with legacy data</h2>
<p>While more and more of the data in our lives is available
in the Semantic Web, there will always be a place for mechanisms
that extract the statements implicit in legacy data.</p>
<p>The data comes from the travel agency like this, probably
dumped from their database system:</p>
<pre>07 APR 03 - MONDAY
AIR AMERICAN AIRLINES FLT:3199 ECONOMY
OPERATED BY AMERICAN AIRLINES
LV KANSAS CITY INTL 940A EQP: MD-80
DEPART: TERMINAL BUILDING B 01HR 36MIN
AR DALLAS FT WORTH 1116A NON-STOP</pre>
<p>I hope that before too long they'll dump it from their
database directly into RDF
or perhaps in XML using some travel industry vocabulary, but</p>
<ul>
<li>before they'll do so, somebody will have to show them why it's
valuable</li>
<li>sometimes, for the short term, reverse-engineering the structure
of their data is cheaper than getting them to change their
processes</li>
</ul>
<p>So I wrote a perl script (<a
href="grokTravItin.pl">grokTravItin.pl</a>) to extract
statements from the data:</p>
<p><img src="grokLeg.png" alt="itin.txt -> itin.n3/rdf" /></p>
<p>The output of the perl script, <tt>itin.nt</tt>,
is in <a
href="http://www.w3.org/TR/rdf-testcases/#ntriples">n-triples</a>, a
line-oriented serialization developed in the RDF Core working group
for testing parsers. For visual inspection and debugging, we use cwm
to pretty-print it in N3. The results look like this:</p>
<pre>
:_gflt3199_3 a :_gECONOMY_5;
k:endingDate :_gdayMONDAY07_2;
k:fromLocation <http://www.daml.org/cgi-bin/airport?MCI>;
k:startingDate :_gdayMONDAY07_2;
k:toLocation <http://www.daml.org/cgi-bin/airport?DFW>;
t:arrivalTime "11:16";
t:carrier :_gAMERICANAIRLINES_4;
t:departureTime "09:40";
t:flightNumber "3199" .
:_gAMERICANAIRLINES_4 a k:AirlineCompany;
k:nameOfAgent "AMERICAN AIRLINES" .
:_gECONOMY_5 r:value "ECONOMY" .
:_gdayMONDAY07_2 a k:Monday;
dt:date "2003-04-07" .
</pre>
<h3>Choosing a Vocabulary: Build Or Buy?</h3>
<p>The import script not only bridges the syntactic
gap between the legacy data and RDF, but it also translates
the vocabulary of terms used in the data into URI space.
This raises the classic build-or-buy choice:</p>
<dl>
<dt>use (buy) an existing, general-purpose vocabulary</dt>
<dd>If we can accept the risk of putting word's into the source's
mouth, we can benefit from an economy of scale of shared vocabulary
such as the <a
href="http://www.cyc.com/cycdoc/vocab/transportation-vocab.html">cyc
ontology of common sense transportation terms</a>.</dd>
<dt>build a vocabulary just for this purpose</dt>
<dd>A special-purpose vocabulary isolates the data
from risks of version skew and such.</dd>
</dl>
<p>Early versions of the import script used a special-purpose
vocabulary; rules to relate this vocabulary to other
vocabularies were developed one at a time. But eventually
a pattern of using the general purpose cyc ontology
emerged, and the expected benefit of maintaining the
special-purpose ontology was dominated by the cost.
More recent versions convert directly to terms
in shared ontologies, except in the case where
custom terms were needed:</p>
<div class="figure"><img alt="travel terms" src="travelFig.png" />
<div class="noprint mechanics" >if your desktop is SVG-happy, <a href="travelFig.svg">travelFig.svg</a> is
nicer. I drew it with dia; you can play with <a
href="travelFig.xml">travelFig.xml</a>, the source. I'm working on <a
href="/2002/03owlt/umlp/dia2owl.xsl">dia2owl.xsl</a>, which converts it to
RDF/S: <a href="travelFig.rdf">travelFig.rdf</a>.
<p>TODO: more stuff in the diagram? socialParticipants, eventOccursAt, etc.</p>
</div>
</div>
<ul>
<li style="color: red">cyc terms (prefix <tt>k:</tt>) in red</li>
<li style="color: purple">DAML airport ontology terms in purple</li>
<li style="color: green">custom travelTerms (prefix <tt>t:</tt>) in green.
e.g. <code>departureTime</code>, <code>flightNumber</code>, ...; see <a
href="travelTerms">travelTerms</a>, in <a href="travelTerms.rdf">RDF/xml</a>,
<a href="travelTerms.rdf">RDF/n3</a>.
</li>
<li style="color: blue">RDF standard terms (prefix <tt>r:</tt>)in blue</li>
<li style="color: orange">XML Schema terms (prefix <tt>dt:</tt>)in orange</li>
</ul>
<p>Note that <strong>mixing vocabularies in RDF is easy</strong>; so
easy, compared with the general problem of mixing XML namespaces, that
I hardly notice it at all. Within the basic subject/predicate/object
abstract syntax, terms can be combined freely. Migrating to more
specialized or more generalized terms is cheap, using
<tt>rdfs:subPropertyOf</tt> and the like.</p>
<div class="noprint"> <em>(@@ok to just say that
without demonstrating it?)</em></div>
<h2 id="map-viz">Integration with mapping tools</h2>
<p>Let's exploit the effort we have put into going beyond formalized
document structure into formalized data about the real world.
Folks in the <a href="http://www.daml.org/">DAML project</a> have
imported airport lat/long data into the semantic web; we
can use log:semantics to reach out and get it with rules like
these, excerpted from <a href="airportLookup.n3">airportLookup.n3</a>:
</p>
<pre>
# well-known airports...
{ :X a :Y; #@@kludge...
log:uri [ str:startsWith "http://www.daml.org/cgi-bin/airport?" ] }
log:implies { :X a :AirportKnownToDAML }.
{ :X apt:iataCode :K.
:Y log:uri [ is str:concatenation of
("http://www.daml.org/cgi-bin/airport?" :K) ];
}
log:implies { :Y a apt:Airport; apt:iataCode :K; = :X }.
# we only want to look up certain airports...
{ [ k:toLocation :X ]. }
log:implies { :X a :InterestingPlace }.
{ [ k:fromLocation :X ]. }
log:implies { :X a :InterestingPlace }.
# believe what daml.org says about airport latitutde/longitudes...
:AirportProperty is rdf:type of
apt:latitude,
apt:name,
apt:iataCode,
apt:icaoCode,
apt:location,
apt:latitude,
apt:longitude,
apt:elevation.
{
:P a :AirportProperty.
[ a :AirportKnownToDAML, :InterestingPlace;
log:semantics [
log:includes {
:IT :P :V.
}
] ].
} log:implies {
:IT a apt:Airport; :P :V.
}.
</pre>
<p>For the convenience of consumers (including ourselves), we publish
in RDF/XML the results of reaching out with the rules; i.e. the
itinerary including the lat/long info. Then we use the (<em>little
documented</em>) <tt>cwm --strings</tt> output mode to generate two
files, <tt>itin-arcs</tt> and <tt>itin-markers</tt>, as input to <a
href="http://xplanet.sourceforge.net/">xplanet</a>:</p>
<div class="figure">
<img alt="map viz toolchain" src="mapVizFig.png" />
</div>
<p>The resulting map shows that we have given the
machine a fairly deep understanding of the itinerary:</p>
<div class="figure">
<p><img alt="MCI to YMX and back for Extreme 2002"
src="../../../../2003/04dc-mia/itin-mia.png" /></p>
<div class="noprint mechanics">more details are in the <a href="../../../../2003/04dc-mia/Makefile">Apr 2003 trip Makefile</a></div>
</div>
<h2 id="ical-evo">Integration with iCalendar Tools</h2>
<p>In fact, the published RDF/XML version of the itinerary is joined
not only with latitude/longitude data, but also timezone data, and
elaborated via <a href="itin2ical.n3">itin2ical.n3</a> rules into <a href="../../../../2002/12/cal/">an RDF
representation of the standard iCalendar syntax</a>.</p>
<pre>
{ :FLT
k:startingDate [ dt:date :YYMMDD];
k:endingDate [ dt:date :YYMMDD2];
t:departureTime :HH_MM;
k:fromLocation [ :timeZone [ cal:tzid :TZ] ];
t:arrivalTime :HH_MM2;
k:toLocation [ :timeZone [ cal:tzid :TZ2] ].
:DTSTART is str:concatenation of
(:YYMMDD "T" :HH_MM ":00"). #@@ extra punct in dates
:DTEND is str:concatenation of
(:YYMMDD2 "T" :HH_MM2 ":00").
( :FLT!log:rawUri "@uri-2-mid.w3.org") str:concatenation :UID. #@@hmm... kludge?
}
log:implies {
:FLT a cal:Vevent;
cal:uid :UID;
cal:dtstart [ cal:tzid :TZ; cal:dateTime :DTSTART ];
cal:dtend [ cal:tzid :TZ2; cal:dateTime :DTEND ].
}.
</pre>
<p> The final
syntactic export is more complex than the markers/arcs case, so we
wrote a python program, <tt><a
href="toIcal.py">toIcal.py</a></tt>, using the
cwm API, to generate iCalendar syntax.</p>
<div class="figure">
<img alt="calendar integration toolchain" src="calIntFig.png" />
</div>
<p>We can import the resulting iCalendar file into an of
a number of interoperable tools, such as Ximian Evolution:</p>
<div class="figure">
<img alt="evo screenshot" src="calIntShot.png" />
</div>
<div class="noprint mechanics">
FYI: python/evolution stuff: <a
href="http://heddley.com/edd/2002/03/05/evocal.py">evocal.py</a> from
Edd. relies on debian package: evolution-dev. also: <a
href="http://heddley.com/edd/2002/03/05/python-libversit-0.1.tar.gz">python-libversit</a>
</div>
<h2 id="pda-in">Conversion for PDA import</h2>
<p>
The <a href="itin2ical.n3">itin2ical.n3</a> rules
were actually developed after my Palm Pilot broke
and I switched to an iCalendar tool. The original
rules mapped to
an <a
href="../../../08/palm56/datebook">RDF vocabulary for the palmpilot
datebook</a>, developed for use with <a
href="http://dev.w3.org/cvsweb/2001/palmagent/">palmagent</a>, an
HTTP/RDF interface to my PDA:</p>
<div class="figure">
<img src="travelPdaRulesFig.png" alt="pda rules toolchain" />
<div class="noprint mechanics">see <a
href="../../../../2001/07dc-bos/Makefile">Makefile</a> for (some of
the?) details). The rules, <a
href="../../../../2001/07dc-bos/itin2datebook.n3">itin2datebook.n3</a>,
use some outdated vocabulary.
</div>
</div>
<div class="noprint"><em>[@@TODO: image of palm pilot showing flight
in the datebook]</em></div>
<h2 id="plain-text-sum">Plain Text Summaries</h2>
<p>All this rich integration is great when the tools are all working
and you have plenty of bandwidth and all that, but sometimes, plain
text is necessary and sufficient for the task at hand. For example, if
I get mail asking when I arrive at the meeting site, mailing back a
map is probably overkill, and I can't be 100% sure their desktop is
iCalendar-happy.</p>
<p>The <tt>cwm --strings</tt> output mode can
be really handy in these cases; we can use a few
<a href="../../../../2002/10dc-uk/itinBrief.n3">itinBrief.n3</a> rules
ala...</p>
<pre>python cwm.py itinBrief.n3 itin.nt --think --strings</pre>
<p>to get a summary ala...</p>
<pre>
2003-04-07 09:40 - 11:16 MCI->DFW Monday AMERICAN AIRLINES #3199
2003-04-07 12:03 - 15:49 DFW->MIA Monday AMERICAN AIRLINES #68
2003-04-10 19:12 - 21:32 MIA->ORD Thursday AMERICAN AIRLINES #1477
2003-04-10 22:33 - 23:54 ORD->MCI Thursday AMERICAN AIRLINES #1081
</pre>
<h2 id="ckcn">Checking Constraints</h2>
<p>Now that I have the proposed itinerary formalized, I can
automatically check it against various constraints before
I accept it and before I copy it to my PDA and to all the
other peers that need to know about it.</p>
<p>Rules like "itineraries that have me leaving
before 30 July are no good" are a bit tedious to
formalize, but my confidence in the results is
higher than my confidence in eyeballing it:</p>
<pre>{
?D a k:ItineraryDocument; k:containsInformationAbout-Focally ?TRIP.
?TRIP k:subEvents
[ k:startingDate [ dt:date ?D1 ];
k:fromLocation [ apt:iataCode "MCI" ];
t:departureTime ?T1;
].
?D1 str:lessThan "2001-07-30".
} => {
?TRIP <#leavesDaysTooSoon> ?D1;
<#at> ?T1.
}.</pre>
<p>These constraints can be checked with cwm ala:</p>
<pre>$ python cwm.py proposed-itinerary.nt --think=constraints.n3</pre>
<p>... and look for <tt><#leavesDaysTooSoon></tt> in the output.</p>
<div class="noprint mechanics">
<p>More details: <a href="../../../../2001/08swws67/">SWWS stuff</a>.</p>
</div>
<h2 id="fw">Conclusions and Future Work</h2>
<p>For the first few integration tasks, it might have been less work
to just manually copy the data, field by field. But the return on
investment increases with each trip I take, each system we integrate
with, and each collaborator who develops an interoperable tool.</p>
<p>The DAML project is a source of not just airport lat/long data, but
also tools such as <a
href="http://www.daml.org/2001/06/itinerary/">DAML itinerary</a> by
Mike Dean.</p>
<p>Ideas for future work include:</p>
<ul>
<li>animated views of my travel schedule or <a
href="../../../../People/Connolly/events/">past trips</a></li>
<li>peer-to-peer synchronization, ala "if it's in this schedule
scraped from an HTML page I maintain, but not in my evolution
calendar, print it out in .ics format for import".
<div class="noprint mechanics">
prototyped in <a href="/2002/08dc-ymx/Makefile">Aug 2002 trip Makefile</a>
</div>
</li>
<li>based on my addressbook and my travel schedule (and the travel
schedules of folks in my addressbook...) tell me when my travel
schedule brings me unusually near a friend/colleague</li>
<li>find conflicts between teleconferences and flights</li>
</ul>
<div class="noprint">
<hr />
<address>
<a href="../../../../People/Connolly/">Dan Connolly</a> and friends<br />
<small>started Jun 2002<br />
$Revision: 1.32 $ of $Date: 2004/09/21 15:13:59 $ by $Author: connolly $</small>
</address>
</div>
</body>
</html>