<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
  <title>Semantic Web Application Integration: Travel Tools</title>
  <link rel="stylesheet" href="../doc/style.css" />
<style type="text/css">
.mechanics { background-color: #b6b6b6; }
.figure { text-align: center }
</style>
</head>

<body>
<div class="noprint">
<a href="../../../../">W3C</a> | <a href="../../../01/sw/">SWAD</a> | <a
href="../">SWAP</a> | <a href="../doc/">Tutorial</a>
</div>

<h1>Semantic Web Application Integration: Travel Tools</h1>

<p><em>The bane of my existence is doing things I know the computer
could do for me.</em> When I got my proposed July 2001 travel
itinerary in email, I just couldn't bear the thought of manually
copying and pasting each field from the itinerary into my PDA calendar.  I
started putting the Semantic Web approach to application integration
to work.</p>

<p>The Semantic Web approach to application integration emphasizes
data about real-world things like people, places, and events over
document structure. Documents are important real-world things too, of
course. And Semantic Web data formats benefit from the
internationalization support in XML and the growing infrastructure of
tools. But most XML schemas are too constrained, syntactically, and
not constrained enough, semantically, to accomplish these integration
tasks:</p>

<ul>
<li><a href="#map-viz">plot an itinerary on a map</a></li>
<li><a href="#ical-evo">import travel itineraries into my iCalendar-happy desktop PIM</a></li>
<li><a href="#pda-in">import travel itineraries into my PDA calendar</a></li>
<li><a href="#plain-text-sum">produce a brief summary of an itinerary for use in plain text email</a></li>
<li><a href="#ckcn">check proposed work travel itineraries against family constraints</a></li>
<li><a href="#fw">tell me when my travel schedule brings me unusually near a friend/colleague</a></li>
<li><a href="#fw">produce animated views of my travel schedule or past trips</a></li>
<li><a href="#fw">find conflicts between teleconferences and flights</a></li>
</ul>


<h2 id="grokLeg">Working with legacy data</h2>

<p>While more and more of the data in our lives is available
in the Semantic Web, there will always be a place for mechanisms
that extract the statements implicit in legacy data.</p>

<p>The data comes from the travel agency like this, probably
dumped from their database system:</p>

<pre>07 APR 03 - MONDAY
AIR AMERICAN AIRLINES FLT:3199 ECONOMY
OPERATED BY AMERICAN AIRLINES
LV KANSAS CITY INTL 940A EQP: MD-80
DEPART: TERMINAL BUILDING B 01HR 36MIN
AR DALLAS FT WORTH 1116A NON-STOP</pre>

<p>I hope that before too long they'll dump it from their
database directly into RDF,
or perhaps into XML using some travel industry vocabulary, but:</p>

<ul>

<li>before they'll do so, somebody will have to show them why it's
valuable</li>

<li>sometimes, for the short term, reverse-engineering the structure
of their data is cheaper than getting them to change their
processes</li>

</ul>

<p>So I wrote a perl script (<a
href="grokTravItin.pl">grokTravItin.pl</a>) to extract
statements from the data:</p>

<p><img src="grokLeg.png" alt="itin.txt -&gt; itin.n3/rdf" /></p>

<p>The output of the perl script, <tt>itin.nt</tt>,
is in <a
href="http://www.w3.org/TR/rdf-testcases/#ntriples">n-triples</a>, a
line-oriented serialization developed in the RDF Core working group
for testing parsers. For visual inspection and debugging, we use cwm
to pretty-print it in N3. The results look like this:</p>

<pre>
    :_gflt3199_3     a :_gECONOMY_5;
         k:endingDate :_gdayMONDAY07_2;
         k:fromLocation &lt;http://www.daml.org/cgi-bin/airport?MCI&gt;;
         k:startingDate :_gdayMONDAY07_2;
         k:toLocation &lt;http://www.daml.org/cgi-bin/airport?DFW&gt;;
         t:arrivalTime "11:16";
         t:carrier :_gAMERICANAIRLINES_4;
         t:departureTime "09:40";
         t:flightNumber "3199" .

    :_gAMERICANAIRLINES_4     a k:AirlineCompany;
         k:nameOfAgent "AMERICAN AIRLINES" .
    
    :_gECONOMY_5     r:value "ECONOMY" .

    :_gdayMONDAY07_2     a k:Monday;
         dt:date "2003-04-07" .
    

</pre>
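<p>The extraction step can be sketched in Python. This is an illustration
only, not a port of <tt>grokTravItin.pl</tt>: the function names
<tt>to_24h</tt> and <tt>parse_leg</tt> and the dictionary keys are made up
for this example, and it assumes that times carry an <tt>A</tt> or
<tt>P</tt> suffix and that two-digit years fall in the 2000s. It does not
look up airport codes; it keeps the airport names as the agency wrote
them.</p>

```python
import re

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

SAMPLE = """\
07 APR 03 - MONDAY
AIR AMERICAN AIRLINES FLT:3199 ECONOMY
OPERATED BY AMERICAN AIRLINES
LV KANSAS CITY INTL 940A EQP: MD-80
DEPART: TERMINAL BUILDING B 01HR 36MIN
AR DALLAS FT WORTH 1116A NON-STOP
"""


def to_24h(text):
    """Convert a time such as '940A' or '1203P' to 'HH:MM'.

    Assumption: the suffix is always 'A' (am) or 'P' (pm).
    """
    m = re.fullmatch(r"(\d{1,2})(\d{2})([AP])", text)
    hour, minute, half = int(m.group(1)), m.group(2), m.group(3)
    hour = hour % 12 + (12 if half == "P" else 0)
    return "%02d:%s" % (hour, minute)


def parse_leg(text):
    """Extract the fields of one flight leg from the agency's text layout."""
    day = re.search(r"^(\d{2}) ([A-Z]{3}) (\d{2}) - ([A-Z]+)", text, re.M)
    air = re.search(r"^AIR (.+) FLT:(\d+) (\w+)", text, re.M)
    lv = re.search(r"^LV (.+?) (\d{3,4}[AP])\b", text, re.M)
    ar = re.search(r"^AR (.+?) (\d{3,4}[AP])\b", text, re.M)
    # Assumption: two-digit years belong to the 2000s.
    date = "20%s-%02d-%s" % (day.group(3), MONTHS[day.group(2)], day.group(1))
    return {
        "date": date,
        "weekday": day.group(4).capitalize(),
        "carrier": air.group(1),
        "flightNumber": air.group(2),
        "fareClass": air.group(3),
        "from": lv.group(1),
        "departureTime": to_24h(lv.group(2)),
        "to": ar.group(1),
        "arrivalTime": to_24h(ar.group(2)),
    }


leg = parse_leg(SAMPLE)
print(leg["date"], leg["departureTime"], leg["arrivalTime"])
```

<p>Each key in the resulting dictionary corresponds to one statement about
the flight, which is what makes the later serialization to n-triples
straightforward.</p>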

<h3>Choosing a Vocabulary: Build Or Buy?</h3>


<p>The import script not only bridges the syntactic
gap between the legacy data and RDF, but it also translates
the vocabulary of terms used in the data into URI space.
This raises the classic build-or-buy choice:</p>

<dl>
<dt>use (buy) an existing, general-purpose vocabulary</dt>

<dd>If we can accept the risk of putting words into the source's
mouth, we can benefit from an economy of scale of shared vocabulary
such as the <a
href="http://www.cyc.com/cycdoc/vocab/transportation-vocab.html">cyc
ontology of common sense transportation terms</a>.</dd>

<dt>build a vocabulary just for this purpose</dt>
<dd>A special-purpose vocabulary isolates the data
from risks of version skew and such.</dd>
</dl>

<p>Early versions of the import script used a special-purpose
vocabulary; rules to relate this vocabulary to other
vocabularies were developed one at a time. But eventually
a pattern of using the general-purpose cyc ontology
emerged, and the cost of maintaining the special-purpose
vocabulary came to outweigh its expected benefit.
More recent versions convert directly to terms
in shared ontologies, except where
custom terms were needed:</p>

<div class="figure"><img alt="travel terms" src="travelFig.png" />

<div class="noprint mechanics" >if your desktop is SVG-happy, <a href="travelFig.svg">travelFig.svg</a> is
nicer. I drew it with dia; you can play with <a
href="travelFig.xml">travelFig.xml</a>, the source. I'm working on <a
href="/2002/03owlt/umlp/dia2owl.xsl">dia2owl.xsl</a>, which converts it to
RDF/S: <a href="travelFig.rdf">travelFig.rdf</a>.

<p>TODO: more stuff in the diagram? socialParticipants, eventOccursAt, etc.</p>

</div>
</div>


<ul>
<li style="color: red">cyc terms (prefix <tt>k:</tt>) in red</li>
<li style="color: purple">DAML airport ontology terms in purple</li>
<li style="color: green">custom travelTerms (prefix <tt>t:</tt>) in green.
e.g. <code>departureTime</code>, <code>flightNumber</code>, ...; see <a
href="travelTerms">travelTerms</a>, in <a href="travelTerms.rdf">RDF/xml</a>,
<a href="travelTerms.rdf">RDF/n3</a>.
</li>
<li style="color: blue">RDF standard terms (prefix <tt>r:</tt>) in blue</li>
<li style="color: orange">XML Schema terms (prefix <tt>dt:</tt>) in orange</li>
</ul>

<p>Note that <strong>mixing vocabularies in RDF is easy</strong>; so
easy, compared with the general problem of mixing XML namespaces, that
I hardly notice it at all. Within the basic subject/predicate/object
abstract syntax, terms can be combined freely. Migrating to more
specialized or more generalized terms is cheap, using
<tt>rdfs:subPropertyOf</tt> and the like.</p>

<div class="noprint"> <em>(@@ok to just say that
without demonstrating it?)</em></div>
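<p>To make the <tt>rdfs:subPropertyOf</tt> point concrete, here is a small
Python sketch that treats statements as plain tuples and applies the RDFS
sub-property rule until nothing new follows. The term
<tt>ex:startTime</tt> and the subject <tt>:flt3199</tt> are hypothetical
names invented for the example; in practice cwm or another RDFS-aware tool
would do this work.</p>

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"


def with_super_properties(triples):
    """Return the triples plus everything entailed by rdfs:subPropertyOf.

    For each (s, p, o) and each (p, rdfs:subPropertyOf, q), add (s, q, o).
    The loop repeats so that chains of sub-properties are followed too.
    """
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        supers = [(s, o) for (s, p, o) in facts if p == SUB_PROPERTY_OF]
        for (s, p, o) in list(facts):
            for (sub, sup) in supers:
                if p == sub and (s, sup, o) not in facts:
                    facts.add((s, sup, o))
                    changed = True
    return facts


data = [
    # One statement in the custom travel vocabulary...
    (":flt3199", "t:departureTime", "09:40"),
    # ...and one statement relating it to a more general (hypothetical) term.
    ("t:departureTime", SUB_PROPERTY_OF, "ex:startTime"),
]

closure = with_super_properties(data)
print((":flt3199", "ex:startTime", "09:40") in closure)
```

<p>The original statement is kept alongside the derived one, so consumers
of either vocabulary can read the same data.</p>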

<h2 id="map-viz">Integration with mapping tools</h2>

<p>Let's exploit the effort we have put into going beyond formalized
document structure into formalized data about the real world.
Folks in the <a href="http://www.daml.org/">DAML project</a> have
imported airport lat/long data into the semantic web; we
can use log:semantics to reach out and get it with rules like
these, excerpted from <a href="airportLookup.n3">airportLookup.n3</a>:
</p>

<pre>
# well-known airports...
{ :X a :Y; #@@kludge...
    log:uri [ str:startsWith "http://www.daml.org/cgi-bin/airport?" ] }
 log:implies { :X a :AirportKnownToDAML }.

{ :X apt:iataCode :K.
  :Y log:uri [ is str:concatenation of
               ("http://www.daml.org/cgi-bin/airport?" :K) ];
}
  log:implies { :Y a apt:Airport; apt:iataCode :K; = :X }.

# we only want to look up certain airports...
{ [ k:toLocation :X ]. }
 log:implies { :X a :InterestingPlace }.
{ [ k:fromLocation :X ]. }
 log:implies { :X a :InterestingPlace }.


# believe what daml.org says about airport latitude/longitudes...
:AirportProperty is rdf:type of
  apt:latitude,
  apt:name,
  apt:iataCode,
  apt:icaoCode,
  apt:location,
  apt:latitude,
  apt:longitude,
  apt:elevation.

{
  :P a :AirportProperty.
  [ a :AirportKnownToDAML, :InterestingPlace;
    log:semantics [
      log:includes {
        :IT :P :V.
      }
    ] ].
} log:implies {
  :IT a apt:Airport; :P :V.
}.
</pre>


<p>For the convenience of consumers (including ourselves), we publish
in RDF/XML the results of reaching out with the rules; i.e. the
itinerary including the lat/long info. Then we use the (<em>little
documented</em>) <tt>cwm --strings</tt> output mode to generate two
files, <tt>itin-arcs</tt> and <tt>itin-markers</tt>, as input to <a
href="http://xplanet.sourceforge.net/">xplanet</a>:</p>

<div class="figure">

<img alt="map viz toolchain" src="mapVizFig.png" />
</div>
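<p>As a rough illustration of that last step, the following Python sketch
writes marker and arc lines directly. It is not how
<tt>cwm --strings</tt> works internally. It assumes the xplanet
conventions as I understand them (a marker line is latitude, longitude and
a quoted label; an arc line is two latitude/longitude pairs), and the
coordinates are approximate values typed in by hand rather than fetched
from daml.org.</p>

```python
# Approximate coordinates, entered by hand for illustration only.
AIRPORTS = {
    "MCI": (39.30, -94.71),
    "DFW": (32.90, -97.04),
}

# One leg of the itinerary as (origin, destination).
LEGS = [("MCI", "DFW")]


def marker_lines(airports):
    """One line per airport: latitude, longitude, quoted label."""
    return ['%.2f %.2f "%s"' % (lat, lon, code)
            for code, (lat, lon) in sorted(airports.items())]


def arc_lines(legs, airports):
    """One line per leg: the origin's lat/long, then the destination's."""
    lines = []
    for origin, dest in legs:
        lines.append("%.2f %.2f %.2f %.2f" % (airports[origin] + airports[dest]))
    return lines


markers = marker_lines(AIRPORTS)
arcs = arc_lines(LEGS, AIRPORTS)
print("\n".join(markers))
print("\n".join(arcs))
```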


<p>The resulting map shows that we have given the
machine a fairly deep understanding of the itinerary:</p>

<div class="figure">

<p><img alt="MCI to YMX and back for Extreme 2002"
src="../../../../2003/04dc-mia/itin-mia.png" /></p>

<div class="noprint mechanics">more details are in the <a href="../../../../2003/04dc-mia/Makefile">Apr 2003 trip Makefile</a></div>
</div>



<h2 id="ical-evo">Integration with iCalendar Tools</h2>

<p>In fact, the published RDF/XML version of the itinerary is joined
not only with latitude/longitude data, but also timezone data, and
elaborated via <a href="itin2ical.n3">itin2ical.n3</a> rules into <a href="../../../../2002/12/cal/">an RDF
representation of the standard iCalendar syntax</a>.</p>

<pre>

{ :FLT
    k:startingDate [ dt:date :YYMMDD];
    k:endingDate [ dt:date :YYMMDD2];
    t:departureTime :HH_MM;
    k:fromLocation [ :timeZone [ cal:tzid :TZ] ];
    t:arrivalTime :HH_MM2;
    k:toLocation [ :timeZone [ cal:tzid :TZ2] ].
  :DTSTART is str:concatenation of
    (:YYMMDD "T" :HH_MM ":00"). #@@ extra punct in dates
  :DTEND is str:concatenation of
    (:YYMMDD2 "T" :HH_MM2 ":00").

  ( :FLT!log:rawUri "@uri-2-mid.w3.org") str:concatenation :UID. #@@hmm... kludge?
}
 log:implies {
  :FLT a cal:Vevent;
    cal:uid :UID;
    cal:dtstart [ cal:tzid :TZ; cal:dateTime :DTSTART ];
    cal:dtend [ cal:tzid :TZ2; cal:dateTime :DTEND ].
}.
</pre>


<p> The final
syntactic export is more complex than the markers/arcs case, so we
wrote a python program, <tt><a
href="toIcal.py">toIcal.py</a></tt>, using the
cwm API, to generate iCalendar syntax.</p>

<div class="figure">
<img alt="calendar integration toolchain" src="calIntFig.png" />
</div>
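<p>The shape of the export can be sketched without the cwm API. The
Python below is not <tt>toIcal.py</tt>; it only shows how the values bound
by the rule above might be laid out as a <tt>VEVENT</tt>. The function
names, the summary text and the UID are hypothetical, and a complete
calendar file would also need the enclosing <tt>VCALENDAR</tt> object and
timezone definitions.</p>

```python
def ical_datetime(date, hh_mm):
    """'2003-04-07', '09:40' -> '20030407T094000'.

    iCalendar date-times carry no punctuation, so the dashes and the
    colon are dropped and zero seconds are appended.
    """
    return date.replace("-", "") + "T" + hh_mm.replace(":", "") + "00"


def vevent(uid, summary, start, start_tz, end, end_tz):
    """Format one event; start and end are (date, 'HH:MM') pairs."""
    lines = [
        "BEGIN:VEVENT",
        "UID:" + uid,
        "SUMMARY:" + summary,
        "DTSTART;TZID=%s:%s" % (start_tz, ical_datetime(*start)),
        "DTEND;TZID=%s:%s" % (end_tz, ical_datetime(*end)),
        "END:VEVENT",
    ]
    # iCalendar content lines end with CRLF.
    return "\r\n".join(lines) + "\r\n"


text = vevent(
    uid="flt3199@example.org",          # hypothetical UID
    summary="AMERICAN AIRLINES #3199",  # hypothetical summary text
    start=("2003-04-07", "09:40"), start_tz="America/Chicago",
    end=("2003-04-07", "11:16"), end_tz="America/Chicago",
)
print(text)
```

<p>Keeping the timezone on each end of the event matters for flights,
since departure and arrival are often in different zones.</p>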

<p>We can import the resulting iCalendar file into any of
a number of interoperable tools, such as Ximian Evolution:</p>

<div class="figure">
<img alt="evo screenshot" src="calIntShot.png" />
</div>

<div class="noprint mechanics">
FYI: python/evolution stuff: <a
    href="http://heddley.com/edd/2002/03/05/evocal.py">evocal.py</a> from
    Edd. relies on debian package: evolution-dev. also: <a
    href="http://heddley.com/edd/2002/03/05/python-libversit-0.1.tar.gz">python-libversit</a>
</div>

<h2 id="pda-in">Conversion for PDA import</h2>

<p>
The <a href="itin2ical.n3">itin2ical.n3</a> rules
were actually developed after my Palm Pilot broke
and I switched to an iCalendar tool. The original
rules mapped to 
an <a
href="../../../08/palm56/datebook">RDF vocabulary for the palmpilot
datebook</a>, developed for use with <a
href="http://dev.w3.org/cvsweb/2001/palmagent/">palmagent</a>, an
HTTP/RDF interface to my PDA:</p>

<div class="figure">
<img src="travelPdaRulesFig.png" alt="pda rules toolchain" />

<div class="noprint mechanics">see <a
href="../../../../2001/07dc-bos/Makefile">Makefile</a> for (some of
the?) details.  The rules, <a
href="../../../../2001/07dc-bos/itin2datebook.n3">itin2datebook.n3</a>,
use some outdated vocabulary.
</div>

</div>


<div class="noprint"><em>[@@TODO: image of palm pilot showing flight
in the datebook]</em></div>

<h2 id="plain-text-sum">Plain Text Summaries</h2>

<p>All this rich integration is great when the tools are all working
and you have plenty of bandwidth and all that, but sometimes, plain
text is necessary and sufficient for the task at hand. For example, if
I get mail asking when I arrive at the meeting site, mailing back a
map is probably overkill, and I can't be 100% sure their desktop is
iCalendar-happy.</p>

<p>The <tt>cwm --strings</tt> output mode can
be really handy in these cases; we can use a few
<a href="../../../../2002/10dc-uk/itinBrief.n3">itinBrief.n3</a> rules
ala...</p>

<pre>python cwm.py itinBrief.n3 itin.nt --think --strings</pre>

<p>to get a summary ala...</p>

<pre>
2003-04-07 09:40 - 11:16 MCI->DFW Monday AMERICAN AIRLINES #3199
2003-04-07 12:03 - 15:49 DFW->MIA Monday AMERICAN AIRLINES #68
2003-04-10 19:12 - 21:32 MIA->ORD Thursday AMERICAN AIRLINES #1477
2003-04-10 22:33 - 23:54 ORD->MCI Thursday AMERICAN AIRLINES #1081
</pre>
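<p>The same summary lines can be produced from already-extracted fields
with a few lines of Python. This sketch stands in for the
<tt>itinBrief.n3</tt> rules and does not reproduce them; the dictionary
keys are names chosen for the example.</p>

```python
LEGS = [
    {"date": "2003-04-07", "departureTime": "12:03", "arrivalTime": "15:49",
     "from": "DFW", "to": "MIA", "weekday": "Monday",
     "carrier": "AMERICAN AIRLINES", "flightNumber": "68"},
    {"date": "2003-04-07", "departureTime": "09:40", "arrivalTime": "11:16",
     "from": "MCI", "to": "DFW", "weekday": "Monday",
     "carrier": "AMERICAN AIRLINES", "flightNumber": "3199"},
]

TEMPLATE = ("%(date)s %(departureTime)s - %(arrivalTime)s "
            "%(from)s->%(to)s %(weekday)s %(carrier)s #%(flightNumber)s")


def brief(legs):
    """Return one summary line per leg, in chronological order.

    ISO dates and zero-padded HH:MM times sort correctly as strings.
    """
    ordered = sorted(legs, key=lambda leg: (leg["date"], leg["departureTime"]))
    return [TEMPLATE % leg for leg in ordered]


summary = brief(LEGS)
print("\n".join(summary))
```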


<h2 id="ckcn">Checking Constraints</h2>

<p>Now that I have the proposed itinerary formalized, I can
automatically check it against various constraints before
I accept it and before I copy it to my PDA and to all the
other peers that need to know about it.</p>

<p>Rules like "itineraries that have me leaving
before 30 July are no good" are a bit tedious to
formalize, but my confidence in the results is
higher than my confidence in eyeballing it:</p>

<pre>{
 ?D a k:ItineraryDocument; k:containsInformationAbout-Focally ?TRIP.
 ?TRIP k:subEvents
    [ k:startingDate [ dt:date ?D1 ];
      k:fromLocation [ apt:iataCode "MCI" ];
      t:departureTime ?T1;
    ].
  ?D1 str:lessThan "2001-07-30".
} => {
   ?TRIP &lt;#leavesDaysTooSoon&gt; ?D1;
          &lt;#at&gt; ?T1.
}.</pre>

<p>These constraints can be checked with cwm ala:</p>

<pre>$ python cwm.py proposed-itinerary.nt --think=constraints.n3</pre>

<p>... and look for <tt>&lt;#leavesDaysTooSoon&gt;</tt> in the output.</p>
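<p>For comparison, the same check reads like this in Python. It relies on
the fact the N3 rule also uses: ISO 8601 dates compare correctly as plain
strings. The proposed itinerary below is made-up sample data, and
<tt>leaves_too_soon</tt> is a name invented for the example.</p>

```python
def leaves_too_soon(legs, home="MCI", earliest="2001-07-30"):
    """Return (date, time) for each departure from home before `earliest`.

    Dates are ISO 8601 strings, so string comparison orders them correctly.
    """
    return [(leg["date"], leg["departureTime"])
            for leg in legs
            if leg["from"] == home and leg["date"] < earliest]


# Made-up proposed itinerary, for illustration only.
proposed = [
    {"date": "2001-07-29", "from": "MCI", "departureTime": "07:00"},
    {"date": "2001-08-02", "from": "BOS", "departureTime": "16:30"},
]

violations = leaves_too_soon(proposed)
print(violations)
```

<p>An empty list means the itinerary passes; anything else names the
departures to raise with the travel agent.</p>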

<div class="noprint mechanics">
<p>More details: <a href="../../../../2001/08swws67/">SWWS stuff</a>.</p>
</div>



<h2 id="fw">Conclusions and Future Work</h2>

<p>For the first few integration tasks, it might have been less work
to just manually copy the data, field by field.  But the return on
investment increases with each trip I take, each system we integrate
with, and each collaborator who develops an interoperable tool.</p>

<p>The DAML project is a source of not just airport lat/long data, but
also tools such as <a
href="http://www.daml.org/2001/06/itinerary/">DAML itinerary</a> by
Mike Dean.</p>

<p>Ideas for future work include:</p>

<ul>

<li>animated views of my travel schedule or <a
href="../../../../People/Connolly/events/">past trips</a></li>

<li>peer-to-peer synchronization, ala "if it's in this schedule
scraped from an HTML page I maintain, but not in my evolution
calendar, print it out in .ics format for import".

<div class="noprint mechanics">
prototyped in <a href="/2002/08dc-ymx/Makefile">Aug 2002 trip Makefile</a>
</div>

</li>

<li>based on my addressbook and my travel schedule (and the travel
schedules of folks in my addressbook...) tell me when my travel
schedule brings me unusually near a friend/colleague</li>

<li>find conflicts between teleconferences and flights</li>

</ul>
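<p>The last idea reduces to an interval-overlap test once flights and
teleconferences are in the same form. A minimal Python sketch, assuming
all times are already expressed in one timezone; the teleconference
entries are hypothetical, and none of this is existing code from the
project:</p>

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"


def span(start, end):
    """Parse a pair of 'YYYY-MM-DD HH:MM' strings into datetimes."""
    return (datetime.strptime(start, FMT), datetime.strptime(end, FMT))


def overlaps(a, b):
    """True when two (start, end) intervals share any time."""
    return a[0] < b[1] and b[0] < a[1]


def conflicts(flights, calls):
    """Return (flight label, call label) for every overlapping pair."""
    return [(f["label"], c["label"])
            for f in flights for c in calls
            if overlaps(f["span"], c["span"])]


flights = [
    {"label": "AMERICAN AIRLINES #3199",
     "span": span("2003-04-07 09:40", "2003-04-07 11:16")},
]
# Hypothetical teleconferences, for illustration only.
calls = [
    {"label": "telcon A", "span": span("2003-04-07 10:00", "2003-04-07 11:00")},
    {"label": "telcon B", "span": span("2003-04-07 14:00", "2003-04-07 15:00")},
]

found = conflicts(flights, calls)
print(found)
```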

<div class="noprint">
<hr />
<address>
  <a href="../../../../People/Connolly/">Dan Connolly</a> and friends<br />
  <small>started Jun 2002<br />
  $Revision: 1.32 $ of $Date: 2004/09/21 15:13:59 $ by $Author: connolly $</small>
</address>
</div>
</body>
</html>