<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

  <head>
    <meta name="generator" content="Emacs 22" />
    <meta name="RCS-Id" content="$Id: Overview.html,v 1.5
                                 7 2008/01/14 16:10:23 mmarshal Exp $" />
    <title>A Prototype Knowledge Base for the Life Sciences</title>

    <style type="text/css">

      /*<![CDATA[*/
pre	{  }
/* 
  a:link	{ color: green }
  a:visited	{ color: green }
  a:hover	{ color: green }
 */
pre a:link	{ text-decoration: none }

.schema th { text-align: left }
table, td, th	{ border-style: solid;
                  border-width: 1px;
                  border-color: black;
                  border-bottom-color: gray;
                  border-right-color: gray; }
table.dbsTable	{ border-collapse: collapse; border-color: #000000; }
table.dbsTable td:first-child { vertical-align: top; }
table.dbsTable td { padding: 2px 5px 2px 5px; }
table.triplesTable {
	margin-left: 2em;
	border: none;
}
table.triplesTable td {
	border: none;
	table-layout: fixed;
	font-size: 75%; 
	font-family: fixed, monospace; 
 }
table.triplesTable tr.p td {
	font-size: 100%; 
 }
table.triplesTable a:link	{ text-decoration: none }
table.triplesTable tr.p a:link	{ text-decoration: underline }

table.triplesTable th { text-align: left; }
table.triplesTable p { 
	margin-left: -2em;
}
.choice { border-style: dashed; }
.placeholder { visibility: hidden; }
.issue	{ background-color: #fcc; }

/* http://www.w3.org/Style/Examples/007/figures */
div.figure {
/*  float: right; */
/*  width: 25%; */
  border: thin silver solid;
  margin: 0.5em;
  padding: 0.5em;
}
div.figure p {
  text-align: center;
  font-style: italic;
  font-size: smaller;
  text-indent: 0;
}
tt {
  white-space: pre;
}
      /*]]>*/
    </style>
    <link rel="stylesheet" type="text/css" href="local.css" />
    <link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-IG-NOTE" />
  </head>

  <body>

    <div class="head">
      <p><a href="http://www.w3.org/">
      <img src="http://www.w3.org/Icons/w3c_home" alt="W3C" height="48" width="72" /></a></p>

      <h1 id="main">A Prototype Knowledge Base for the Life Sciences</h1>
      <h2  class="no-num no-toc" id="w3c-doctype">W3C Interest Group Note 4 June 2008</h2>
      <dl>
	<!-- dt>Editors working draft.</dt>
	<dd><span class="cvs-id">$Revision: 1.6 $ of $Date: 2008/06/06 00:17:45 $</span></dd>
	<dd>see also <a href="http://lists.w3.org/Archives/Public/public-semweb-lifesci/">public-semweb-lifesci@w3.org Mail Archives</a></dd>
	<dt>Published W3C Technical Report version:</dt -->
	<dt>This version:</dt>
	<dd><a href="http://www.w3.org/TR/2008/NOTE-hcls-kb-20080604/">http://www.w3.org/TR/2008/NOTE-hcls-kb-20080604/</a></dd>
	<dt>Latest version:</dt>
	<dd><a href="http://www.w3.org/TR/hcls-kb/">http://www.w3.org/TR/hcls-kb/</a></dd>
	<dt>Previous version:</dt>
	<dd><a href="http://www.w3.org/TR/2008/WD-hcls-kb-20080404/">http://www.w3.org/TR/2008/WD-hcls-kb-20080404/</a></dd>
	<dt>Editors:</dt>
	<dd>M. Scott Marshall, University of Amsterdam &lt;<a href="mailto:marshall@science.uva.nl">marshall@science.uva.nl</a>&gt;</dd>
        <dd>Eric Prud&#39;hommeaux, W3C &lt;<a href="mailto:eric@w3.org">eric@w3.org</a>&gt;</dd>

	<dt id="contributors">Contributors:</dt>
	<dd>Alan Ruttenberg, Science Commons &lt;<a href="mailto:alanruttenberg@gmail.com">alanruttenberg@gmail.com</a>&gt;</dd>
	<dd>Jonathan Rees, Science Commons &lt;<a href="mailto:jar@creativecommons.org">jar@creativecommons.org</a>&gt;</dd>
        <dd>Susie Stephens, Lilly &lt;<a href="mailto:Stephens_Susie_M@lilly.com">Stephens_Susie_M@lilly.com</a>&gt;</dd>
        <dd>Matthias Samwald, Yale Center for Medical Informatics; DERI Galway; Semantic Web Company &lt;<a href="mailto:samwald@gmx.at">samwald@gmx.at</a>&gt;</dd>
        <dd>Kei-Hoi Cheung, Yale Center for Medical Informatics &lt;<a href="mailto:kei.cheung@yale.edu">kei.cheung@yale.edu</a>&gt;</dd> 
      </dl>
      <p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> &#169; 2008 <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>&#174;</sup> (<a href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.</p>
    </div>

    <hr title="Separator for header" />
    <div>
      <h2 class="notoc" id="abstract">Abstract</h2>
      <p>The prototype we describe is a biomedical knowledge base, constructed for a demonstration at <a href="http://www2007.org/prog-W3CTrack.php#thursday">Banff WWW2007</a>, that integrates 15 distinct data sources using currently available Semantic Web technologies such as the W3C standard Web Ontology Language [<a href="#ref-OWL">OWL</a>] and Resource Description Framework [<a href="#ref-RDF">RDF</a>]. This report outlines which resources were integrated, how the knowledge base was constructed using free and open source triple store technology, how it can be queried using the W3C Recommended RDF query language SPARQL [<a href="#ref-SPARQL">SPARQL</a>], and what resources and inferences are involved in answering complex queries. While the utility of the knowledge base is illustrated by identifying a set of genes involved in Alzheimer's disease, the approach described here can be applied to any use case that integrates data from multiple domains.</p>
    </div>

    <div>
      <h2 id="status">Status of This Document</h2>

<p><em>This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the <a href="http://www.w3.org/TR/">W3C technical reports index</a>  at http://www.w3.org/TR/.</em></p>

      <p>
This W3C Interest Group Note describes how one can use the Semantic Web to express and integrate scientific data.
These techniques can be used for modeling any data, and the benefits of integration and model consistency apply to other diverse, distributed data domains.
It is hoped that this document will inspire further contributions to the ongoing work at Neurocommons and the Health Care and Life Sciences Interest Group, as well as inspire those in other domains to exploit the Semantic Web.
      </p>

      <p class="notetoeditor">This document describes the construction and use of the HCLS Knowledgebase used in the <a href="http://esw.w3.org/topic/HCLS/Banff2007Demo">WWW2007 Banff HCLS Demo</a>. It describes the process for creating a biological database on the Semantic Web. The companion document, <a href="http://www.w3.org/TR/2008/NOTE-hcls-senselab-20080604/">Experiences with the conversion of SenseLab databases to RDF/OWL</a>, describes the process for integrating new data into this Knowledgebase.</p>

<p>The document was produced by the <a href="http://www.w3.org/2001/sw/hcls/">Semantic Web in Health Care and Life Sciences Interest Group (HCLS)</a>, part of the <a href="http://www.w3.org/2001/sw/">W3C Semantic Web Activity</a> (<a href="http://www.w3.org/2001/sw/hcls/charter">see charter</a>). Comments may be sent to the <a href="http://lists.w3.org/Archives/Public/public-semweb-lifesci/">publicly archived</a> <a href="mailto:public-semweb-lifesci@w3.org">public-semweb-lifesci@w3.org</a> mailing list. Feedback is encouraged, as is participation in the recently <a href="http://www.w3.org/2008/05/HCLSIGCharter">re-chartered</a> HCLSIG. A <a href="WD2NOTE">list of changes since the last publication</a> is available.</p>

<p>Publication as 
an Interest Group Note
 does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.</p>

<p>This document was produced by a group operating under the disclosure
obligations of the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. The group does 
not expect this document to become a W3C Recommendation. An 
individual who has actual knowledge of a patent which the individual 
believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information to
<a href="mailto:public-semweb-lifesci@w3.org">public-semweb-lifesci@w3.org</a> [<a href="http://lists.w3.org/Archives/Public/public-semweb-lifesci/">public archive</a>] in accordance with
<a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>.</p>

    </div>
    <hr />

    <div class="toc">
      <h2 id="contents">Table of Contents</h2>
      
      <ul class="toc">
	<li class="tocline1"><a href="#introduction">1 Introduction</a> </li>
          <li class="tocline2"><a href="#docscope">1.2 Document Scope and Target Audience</a></li>
	  <li class="tocline2"><a href="#termStability">1.3 Stability of Terms</a></li>
	  <li class="tocline2"><a href="#docConventions">1.4 Document Conventions</a></li>
	  <li class="tocline2"><a href="#docOutline">1.5 Document Outline</a></li>
	<li class="tocline1"><a href="#usecase">2 Use Case</a></li>
	<li class="tocline1"><a href="#dbs">3 Data Sources</a></li>
	<li class="tocline1"><a href="#terms">4 Design Decisions</a></li>
	<li class="tocline1"><a href="#mechanics">5 Importing to RDF - Homologene Example</a></li>

	<li class="tocline1"><a href="#query">6 Query</a></li>
	<li class="tocline1"><a href="#triplemodel">7 Data Model</a><ul>
	    <li class="tocline2"><a href="#preproc">7.1 Precomputing Inferences</a></li>
	  </ul></li>
	<li class="tocline1"><a href="#newsource">8 Adding a New Data Source</a></li>
	<li class="tocline1"><a href="#graphs">9 Named Graphs</a></li>

	<li class="tocline1"><a href="#nextsteps">10 Opportunities for further development</a></li>
      </ul>
    </div>

    <h3 id="appendices">Appendices</h3>
    <ul class="toc">
      <li class="tocline2"><a href="#rdfbundles">A RDF Sources</a></li>
      <li class="tocline2"><a href="#references">B References</a></li>
      <li class="tocline2"><a href="#resources">C Additional Resources</a></li>
      <li class="tocline2"><a href="#acknowledgements">D Acknowledgements</a></li>
    </ul>
    <hr />


    <h2 id="introduction">1 Introduction</h2>
    <p>The life sciences have a rich history of making data available on the Web: researchers recognized the benefits of sharing data and made it available to one another for the greater benefit of science. However, because many of the data repositories were developed in relative isolation, they tend to use different identifier schemes, incompatible terminology, and dissimilar data formats. This makes it hard for researchers to find all data about an entity of interest and to assemble it into a useful block of knowledge. This prototype was built to demonstrate how Semantic Web technologies can integrate such heterogeneous data sets and thereby help scientists to more easily answer interesting scientific questions.</p>
    <p>The key to advancing scientific understanding is empowering scientists with the information that they need to make well-informed decisions. Scientists need to be able to easily gain access to all information about chemical compounds, biological systems, diseases, and the interactions between these entities, and this requires data to be effectively integrated in order to provide a <em>biological systems level</em> view to the user, i.e. a complete view of biological activity. However, achieving this goal has proven to be a formidable challenge in the life sciences, where data and models are found in a large variety of formats and scales that span from the molecular to the anatomical.</p>
    <p>In order to overcome the challenge of gaining insight directly from the Web, a number of laboratories, organizations, and companies have built internal data warehouses from the publicly available data sources. This certainly helps scientists to more easily query for all information related to entities of interest. However, these efforts generally integrate only a subset of publicly available data that is deemed to be of greatest interest, and it has proven difficult to add data sources to the warehouse at a later point. Further, advances in scientific knowledge require regular changes to be made to the underlying data models, and this is not straightforward with a relational model. Organizations that use this approach also typically face challenges with representing data that is at different levels of abstraction, and that includes data of very different quality.</p>
    <p>Many health care and life sciences organizations are interested in the data integration abilities promised by the Semantic Web. More specifically, the benefits include the aggregation of heterogeneous data using explicit semantics, and the expression of rich and well-defined models for data aggregation and search. Semantic Web technologies enable one to more flexibly add additional data sets into the data model, and more easily reuse data in unanticipated ways. Once data has been aggregated, a Semantic Web reasoner computes implied relationships among the aggregated data resulting in tighter integration and the possibility of additional insights.</p>
    <p>This prototype knowledge base imports data from data sources that span multiple domains in the life sciences to make cross-discipline queries. It therefore provides a working (and reproducible) example of the possibilities that become available via knowledge integration. The use of an RDF repository to store RDF and OWL makes it possible to query, manipulate, and reason about the data with standard tools, such as OWL reasoners, and languages, such as the SPARQL Query Language for RDF. Although this document addresses a specific use case, the approach described here can be applied to any use case that integrates data from multiple domains.</p>

    <h3 id="docscope">1.2 Document Scope and Target Audience</h3>

    <p>This document attempts to succinctly describe how this knowledge base was constructed so that interested parties can use the core techniques to create their own knowledge base. We have attempted to write a general description but, unavoidably, the knowledge base makes use of specialized resources, such as those found in the <a href="#dbs">Data Sources</a> section. Some, but not all, of the reasoning behind design decisions is explained. Several technologies such as the Semantic Web standards RDF, OWL, and SPARQL were used, but in order to keep this document to a manageable size, we will not explain all aspects in the depth that would be required for those new to the area. Those interested in a general introduction to the Semantic Web should see <a href="http://www.semanticwebprimer.org/">The Semantic Web Primer</a>. See also the <a href="http://www.co-ode.org/">CO-ODE web site</a> for a <a href="http://www.co-ode.org/resources/tutorials/protege-owl-tutorial.php">hands-on OWL tutorial with Prot&eacute;g&eacute;</a>. For materials introducing ontology see the <a href="http://www.bioontology.org/">National Center for Biomedical Ontology</a> (NCBO) <a href="http://www.bioontology.org/wiki/index.php/Introduction_to_Biomedical_Ontologies">Introduction to Biomedical Ontologies</a>. For materials related to reasoning see <a href="http://www.cs.man.ac.uk/~horrocks/Teaching/cs646/">The Semantic Web: Ontologies and OWL</a>.

</p>

    <h3 id="termStability">1.3 Stability of Terms</h3>

    <p>This document uses URLs to identify records about biological entities and processes. The identifiers used in this document are the same as those used in the prototype knowledge base and are not yet stable. Even so, knowledge base implementors should reuse these terms whenever possible.</p>

    <h3 id="docConventions">1.4 Document Conventions</h3>

    <p>RDF data in this document is expressed in Turtle [<a href="#ref-TURTLE">TURTLE</a>]. Queries on this data are expressed in SPARQL [<a href="#ref-SPARQL">SPARQL</a>]. The following namespace prefix bindings are assumed unless otherwise stated:</p>
    <div style="text-align: center;">
      <table style="border-collapse: collapse; border-color: #000000" border="1" cellpadding="5">
	<tr><th>Prefix</th>				<th>URI</th>	  <th>Description</th></tr>
	<tr><td><code>rdf:</code></td>			<td><code>http://www.w3.org/1999/02/22-rdf-syntax-ns#</code></td>	<td>The RDF Vocabulary</td></tr>
	<tr><td><code>rdfs:</code></td>			<td><code>http://www.w3.org/2000/01/rdf-schema#</code></td>	<td>The RDF Schema vocabulary</td></tr>
	<tr><td><code>xsd:</code></td>			<td><code>http://www.w3.org/2001/XMLSchema#</code></td>	<td>XML Schema</td></tr>
	<tr><td><code>sc:</code></td>			<td><code>http://purl.org/science/owl/sciencecommons/</code></td>	<td>The <i>ad hoc</i> Science Commons ontology</td></tr>
	<tr><td><code>pubmedRec:</code></td>		<td><code>http://purl.org/commons/record/pmid/</code></td>	<td>PubMed records (not the articles themselves)</td></tr>
	<tr><td><code>article:</code></td>		<td><code>http://purl.org/science/article/pmid/</code></td>	<td>PubMed articles</td></tr>
	<tr><td><code>ncbi_gene:</code></td>		<td><code>http://purl.org/commons/record/ncbi_gene/</code></td>	<td>Entrez Gene records (not the genes themselves)</td></tr>
	<tr><td><code>proteinsubclass:</code></td>		<td><code>http://purl.org/science/protein/subjects/</code></td>	<td>Proteins of a given gene participating in a given pathway</td></tr>
	<tr><td><code>go:</code></td>			<td><code>http://purl.org/obo/owl/GO#</code></td>	<td>Gene Ontology terms</td></tr>
	<tr><td><code>protein:</code></td>		<td><code>http://purl.org/science/protein/bysequence/</code></td>	<td>National Center for Biotechnology Information (NCBI) records for gene sequences</td></tr>
	<tr><td><code>ro:</code></td>			<td><code>http://www.obofoundry.org/ro/ro.owl#</code> (<a href="http://www.berkeleybop.org/ontologies/obo-all/ro_proposed/ro_proposed.owl">proposed update</a> may be more complete)</td>	<td>Relation Ontology (RO): Relationships between members of OBO classes</td></tr>
	<tr><td><code>obo:</code></td>			<td><code>http://purl.org/obo/owl/obo#</code></td>	<td>Open Biomedical Ontologies (OBO)</td></tr>

	<tr><td><code>senselab:</code></td>		<td><code>http://purl.org/ycmi/senselab/neuron_ontology.owl#</code></td>	<td>Neuroscience ontology derived from the SenseLab NeuronDB database</td></tr>

	<tr><td><code>dnaGeneProduct:</code></td>	<td><code>http://purl.org/science/owl/sciencecommons/is_protein_gene_product_of_dna_</code></td>	<td>Syntactic trick to shorten <em>sc:is_protein...described_by</em></td></tr>
      </table>
    </div>
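
    <p>As a convenience, the bindings in the table above can be written out as Turtle prefix declarations. This is only a sketch transcribed from the table, not an excerpt from the knowledge base itself, and it assumes that the <code>article:</code> namespace ends in a trailing slash. The <code>@prefix</code> form is Turtle syntax; a SPARQL query would declare the same bindings with <code>PREFIX</code>, without the leading <code>@</code> or the trailing dot.</p>
    <pre>
# Namespace bindings transcribed from the table above (illustrative only).
@prefix rdf:             &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix rdfs:            &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix xsd:             &lt;http://www.w3.org/2001/XMLSchema#&gt; .
@prefix sc:              &lt;http://purl.org/science/owl/sciencecommons/&gt; .
@prefix pubmedRec:       &lt;http://purl.org/commons/record/pmid/&gt; .
# Assumption: trailing slash is part of the article: namespace.
@prefix article:         &lt;http://purl.org/science/article/pmid/&gt; .
@prefix ncbi_gene:       &lt;http://purl.org/commons/record/ncbi_gene/&gt; .
@prefix proteinsubclass: &lt;http://purl.org/science/protein/subjects/&gt; .
@prefix go:              &lt;http://purl.org/obo/owl/GO#&gt; .
@prefix protein:         &lt;http://purl.org/science/protein/bysequence/&gt; .
@prefix ro:              &lt;http://www.obofoundry.org/ro/ro.owl#&gt; .
@prefix obo:             &lt;http://purl.org/obo/owl/obo#&gt; .
@prefix senselab:        &lt;http://purl.org/ycmi/senselab/neuron_ontology.owl#&gt; .
@prefix dnaGeneProduct:  &lt;http://purl.org/science/owl/sciencecommons/is_protein_gene_product_of_dna_&gt; .
</pre>
    <p>With these declarations in place, a prefixed name such as <code>rdfs:label</code> abbreviates the full URI <code>http://www.w3.org/2000/01/rdf-schema#label</code>.</p>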


    <h3 id="docOutline">1.5 Document Outline</h3>

    <p><a href="#introduction">1 <em>Introduction</em></a> motivates and explains this document.</p>

    <p><a href="#usecase">2 <em>Use Case</em></a> introduces an
    interesting scientific question that the knowledge base can be used
    to address.</p>

    <p><a href="#dbs">3 <em>Data Sources</em></a> describes the data sources that have been incorporated into the knowledge base.</p>

    <p><a href="#terms">4 <em>Design Decisions</em></a> explains the reasons for several design choices.</p>

    <p><a href="#mechanics">5 <em>Importing to RDF - Homologene Example</em></a> explains the process of translating data into RDF triples.</p>

    <p><a href="#query">6 <em>Query</em></a> explains the use case query that answers the scientific question.</p>

    <p><a href="#triplemodel">7 <em>Data Model</em></a> explains the basics of RDF triples.</p>

    <p><a href="#newsource">8 <em>Adding a New Data Source</em></a> explains how the SenseLab database was integrated.</p>
    <p><a href="#graphs">9 <em>Named Graphs</em></a> discusses the use of named graphs and query details.</p>
    <p><a href="#nextsteps">10 <em>Opportunities for further development</em></a> discusses problem areas and possible improvements.</p>

    
    <h2 id="usecase">2 Use Case</h2>

    <p>Alzheimer's is a debilitating neurodegenerative disease that affects approximately 27 million people worldwide. The cause of Alzheimer's is currently unknown and no therapy is able to halt its progression. However, insight into the mechanism and potential treatment of the disease may come from the integration of neurological, biomedical and biological resources. The knowledge base assembles several neurology-related resources alongside an array of clinical and biological resources. This makes it possible to integrate knowledge across several research domains and potentially provide insight into the mechanisms of the disease.</p>
    <p>The scientific question under scrutiny in our use case involves several elements of putative functional importance to Alzheimer's. CA1 Pyramidal Neurons (CA1PN) are known to be particularly damaged in Alzheimer's disease and play a key role in signal transduction. Signal transduction pathways are considered to be rich in proteins that might respond to chemical therapy. By integrating information about signal transduction, pyramidal neurons, their genes, and gene products, the query corresponding to our scientific question can provide information relevant to researchers who are looking for drug target candidates that are potentially effective against Alzheimer's disease.</p>

    <h2 id="dbs">3 Data Sources</h2>
     
    <p>In order to incorporate data from several information sources, it was necessary to convert several exported formats, each into its own <em>RDF bundle</em>. The largest RDF bundle of 200M triples resulted from MeSH associations with PubMed articles. In contrast, there were a number of smaller bundles ranging from 10K to 10M triples. This resulted in a total of approximately 350M triples occupying approximately 20GB when loaded into the RDF repository. In several cases, we extracted only a subset, for example, by selecting only human, rat, and mouse data. Click on <em>[Details]</em> in the table below to view details such as the date of the last extraction.</p>

    <p>At the time of publication, the following information sources have been (sometimes partially) incorporated into the knowledge base. This set will continue to be extended in depth (i.e., more complete inclusion of partially represented data sets) and in breadth (i.e., novel data sets):</p>
    <table class="dbsTable">
      <tr><td class="db"><a href="http://www.brain-map.org/">Allen Brain Atlas (ABA)</a></td>
      <td>Allen Brain Atlas is an interactive, genome-wide image database of gene expression in the mouse brain. A combination of RNA <i>in situ</i> hybridization data, detailed Reference Atlases and informatics analysis tools are integrated to provide a searchable digital atlas of gene expression. Together, these resources present a comprehensive online platform for exploration of the brain at the cellular and molecular level.</td>
      <td><a href="#aba">[Details]</a></td></tr>

      <tr><td class="db"><a href="http://www.addgene.org/">Addgene</a></td>
      <td>A catalog of plasmids from Addgene</td>
      <td><a href="#addgene">[Details]</a></td></tr>

      <tr><td class="db"><a href="http://brancusi.usc.edu/bkms/">BAMS</a></td>
      <td>The Brain Architecture Management System (BAMS) is designed to be a repository of information about brain structures from different species, and has a set of inference engines for processing the neurobiological data. BAMS contains five interrelated modules: Brain Parts (brain regions, major fiber tracts, and ventricles), Cell Types, Molecules, Relations (between structures from different neuroanatomical atlases), and Connections.</td>
      <td><a href="#bams">[Details]</a></td></tr>

      <tr><td class="db"><a href="http://www.opengalen.org/faq/faq4.html">GALEN</a></td>
      <td>GALEN is an advanced terminology of medical concepts for clinical information systems. We imported the <a href="http://www.co-ode.org/galen/">GALEN ontology</a> in OWL from <a href="http://www.co-ode.org/">CO-ODE</a>.</td>
      <td><a href="#galen">[Details]</a></td></tr>

      <tr><td class="db"><a href="ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/README">NCBI gene_info</a></td>

      <td>Information from the gene_info file distributed by NCBI, imported into OWL.</td>
      <td><a href="#geneinfo">[Details]</a></td></tr>

      <tr><td class="db"><a href="http://www.geneontology.org/">Gene Ontology (GO)</a></td>
      <td>The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism. GO terms are often used to annotate gene and protein records.</td>
      <td><a href="#goa">[Details]</a></td></tr>

      <tr><td class="db"><a href="http://www.ebi.ac.uk/GOA/">GOA</a></td>
      <td><a href="http://www.geneontology.org/">GO</a> annotations from National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI).</td>
      <td><a href="#goa">[Details]</a></td></tr>

      <tr><td class="db"><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=homologene">HomoloGene</a></td>
      <td>HomoloGene is a system for automated detection of homologs among the annotated genes of several completely sequenced <a href="http://en.wikipedia.org/wiki/Eukaryotic">eukaryotic</a> genomes.</td>

      <td><a href="#homologene">[Details]</a></td></tr>

      <tr><td class="db"><a href="http://pubmed.gov">MEDLINE/PubMed</a></td>
      <td>PubMed is a service of the <a href="http://www.nlm.nih.gov/">U.S. National Library of Medicine</a> that includes over 17 million citations from MEDLINE and other life science journals for biomedical articles back to the 1950s. PubMed includes links to full text articles and other related resources.</td>
      <td><a href="#pubmed">[Details]</a></td></tr>

      <tr><td class="db"><a href="http://www.nlm.nih.gov/mesh/introduction2008.html">MeSH</a></td>
      <td>Medical Subject Headings. 2008 MeSH includes the subject descriptors appearing in <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed">MEDLINE/PubMed</a>, the National Library of Medicine (NLM) catalog database, and other NLM databases.</td>
      <td><a href="#mesh">[Details]</a></td></tr>

      <tr><td class="db"><a href="http://sw.neurocommons.org/2007/text-mining.html">Neurocommons Text Mining Pilot</a></td>

      <td>Protein/gene associations and interactions extracted by <a href="http://www.temis.com/">Temis</a> software applied to 7% of MEDLINE records. Annotations were captured in RDF using the <a href="http://sw.neurocommons.org/2007/schema.html">Neurocommons Annotations Schema</a>.</td>
      <td><a href="#textmining">[Details]</a></td></tr>

      <tr><td class="db"><a href="http://www.berkeleybop.org/ontologies/">Open Biomedical Ontologies (OBO)</a></td>
      <td>All Open Biomedical Ontologies (<a href="http://obofoundry.org/">OBO</a>) available from <a href="http://www.berkeleybop.org/">Berkeley Bioinformatics Open-source Projects.</a></td>

      <td><a href="#obo">[Details]</a></td></tr>

<!--      <tr><td class="db">Selected Open Biomedical Ontologies</td> 
      <td>Selected OBO ontologies, downloaded ~21 April 2007, augmented with inferred relations</td></tr> -->
        
      <tr><td class="db"><a href="http://sw.neurocommons.org/2007/kb-sources/sciencecommons.owl">Science Commons Ontology</a></td>
      <td>A bridging ontology from <a href="http://sciencecommons.org">Science Commons</a>. It imports other ontologies used in the prototype and defines classes and relations used to represent gene records and their contents, as well as a few items that are referred to by imported data sources but are not available in a published ontology.</td>
      <td><a href="#sciencecommons">[Details]</a></td></tr>

      <tr id="ds_SenseLab"><td class="db"><a href="http://neuroweb.med.yale.edu/senselab/">SenseLab</a></td>
      <td>See <a href="http://www.w3.org/TR/hcls-senselab/">Experiences with the conversion of SenseLab databases to RDF/OWL</a>.</td>
      <td><a href="#senselab">[Details]</a></td></tr>

      <tr><td class="db"><a href="http://swan.mindinformatics.org/">SWAN</a></td>
      <td><a href="http://purl.org/swan/1.1/">Semantic Web Applications in Neuromedicine</a> [<a href="#ref-SWAN">SWAN</a>] is a knowledge base of hypotheses, claims, and evidence in Alzheimer Disease (AD) research, created through a community process to capture the collective scientific insights of the AD field.</td>
      <td><!-- a href="#swan">[Details]</a></td -->Not yet public</td></tr>

  
    <tr><td class="db"><a href="http://www.w3.org/2004/02/skos/">SKOS</a></td>
    <td>Simple Knowledge Organization System (SKOS): specifications and standards to support the use of knowledge organization systems (KOS) such as thesauri, classification schemes, subject heading systems and taxonomies within the framework of the Semantic Web.</td>

    <td><a href="#mesh-skos">[Details]</a></td>
<!--     <td><a href="http://www.w3.org/Consortium/Legal/copyright-software" -->
<!--       >W3C software licensing rules</a></td> -->
    </tr>

    </table>

    <h2 id="terms">4 Design Decisions</h2>
    <p>A number of design decisions were made during the construction
    of the prototype knowledge base. Many of the decisions were pragmatic in
    nature, as a consequence of the need to implement the solution on
    a commodity PC within a two-month period for a demonstration at
    WWW2007.</p>
   <ul>
     <li><em><b>URI Scheme</b></em>
     
     <br></br>HTTP URIs were adopted as the mechanism to identify
     biological entities. In particular, URIs with a Persistent URL
     (<a href="http://purl.org">PURL</a>) were used as they provide re-direction capabilities, which
     make the identifiers more robust against future change.</li>

    <li><em><b>Unifying terms</b></em>

    <br></br> While data in different information sources may talk
    about the same thing, one must provide a common set of identifiers
    in order to get the RDF graph to connect. For instance, the named
    graph PubMesh uses gene record
    identifiers to relate genes to PubMed articles. It uses terms like
    <span class="var gene">ncbi_gene:1812</span> to identify a gene
    record. The Gene Ontology database
    records use the same identifiers, which allows us to easily link
    information contained in the two corresponding named graphs. New
    databases are able to connect their data graphs to the existing
    store by re-using the same terms. We accomplished this by
    translating internal identifiers from the databases into URIs in
    our chosen scheme.</li>

     <li><em><b>Ontology Design</b></em>

    <br></br> An ontology was built with sufficient detail for the
    immediate needs of the demonstration and was limited by the date
    of the demo. Consequently, it contains more detail in the core
    areas of focus than in areas of more peripheral interest. The
    ontology was written in OWL-DL so that we could specify statements
    in an interoperable and computable way. We also wanted to verify
    small subsets for consistency during development, with the hope
    that in the future a more capable repository will be able to do
    appropriate inferences based on the class and property
    definitions. The ontology distinguishes between real world
    entities and documents about real world entities. We endeavored to
    follow the OBO foundry methodology, which espouses the principle
    that we first identify what instances are by identifying them with
    physical things, such as a molecule in some person's body. Classes
    are defined as sets of those instances. For example, the class of
    glutamate receptors can be defined as multimeric macromolecules
    that have high binding affinity for glutamate molecules. Expressed
    more formally, we can say <i>EVERY</i> glutamate receptor
    <i>IS_A</i> multimeric macromolecule <i>THAT</i> has high binding
    affinity for <i>SOME</i> glutamate molecule. In this way, the
    class of glutamate receptors can be defined in terms of the
    classes multimeric macromolecules and glutamate molecule,
    something which OWL expresses quite naturally. The knowledge base
    contains many such definitions of classes.</li>

     <li><em><b>Multiple Graphs</b></em>
     <br></br>

      Once the data was converted into RDF/OWL, it was loaded into the
      triple store as a number of separate graphs. This approach made
      it simpler to re-load and update data, which was required often
      as a consequence of iterative enhancements to the ontology. This
      fast upload capability proved critical as the data reached the
      scale of hundreds of millions of triples. This partitioning of
      data also helped queries to be performed rapidly.</li>
<!-- The core
     constructs used by the ontology include (has value?)... -->

     <li><em><b>Precomputed Inferences</b></em>

     <br></br>

     Our approach has been to choose a representation in valid OWL-DL,
     with the expectation that queries would be evaluated against all
     answers that could be inferred from our representation. However,
     our triple store has no native inferencing capabilities. To
     enable querying against inferred information, we added
     pre-computed inferences in the form of non-OWL-DL, direct
     class-class relations, to the <em>classrelations</em> graph (see
     <a href="#graphs">Named Graphs</a> section). These non-OWL-DL
     relations were added so that it would be easy to use SPARQL
     queries to access the inferences, which were in some cases
     represented in OWL as property restrictions, as in the case of
     partonomic relations. The direct class-class relations were more
     compact to represent in RDF and queries that took advantage of
     them were easier to write in SPARQL.</li>
   </ul> 
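    <p>As a rough illustration of the precomputation described in the last item above, the following Python sketch materializes direct class-class triples from OWL property restrictions. The rule shape (a class that is an <em>rdfs:subClassOf</em> an <em>owl:someValuesFrom</em> restriction on a property yields a direct triple with that property), the function name, and the blank node label in the sample data are assumptions made for this example. It is not the code that was used to build the <em>classrelations</em> graph.</p>

```python
def materialize_direct_relations(triples, prop):
    """Return direct (class, prop, filler) triples implied by restrictions.

    Assumed pattern (an illustration, not the rule used for the knowledge base):
        ?cls rdfs:subClassOf ?r .
        ?r   owl:onProperty  prop .
        ?r   owl:someValuesFrom ?filler .
    """
    # Restriction nodes that constrain the property of interest
    on_property = {s for (s, p, o) in triples
                   if p == "owl:onProperty" and o == prop}
    # Map each such restriction node to its filler class
    fillers = {s: o for (s, p, o) in triples
               if p == "owl:someValuesFrom" and s in on_property}
    # Every subclass of a restriction gets a direct class-class triple
    return {(s, prop, fillers[o]) for (s, p, o) in triples
            if p == "rdfs:subClassOf" and o in fillers}


# Sample data: the blank node label "_:r1" is invented for this sketch
triples = {
    ("go:GO_0007190", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "owl:onProperty", "obo:part_of"),
    ("_:r1", "owl:someValuesFrom", "go:GO_0007165"),
}
direct = materialize_direct_relations(triples, "obo:part_of")
print(direct)
```

    <p>A query can then match the single direct triple instead of the three-triple restriction pattern, which is the convenience the direct class-class relations are meant to provide.</p>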
   
    <h2 id="mechanics">5 Importing to RDF - Homologene Example</h2>

    <p>A number of different approaches were used for the conversion
    of data into RDF/OWL. The most commonly used approach was the use
    of <a href="http://svn.neurocommons.org/svn/trunk/convert/">Lisp code</a> to read text exports of the data and create OWL or
    RDF documents. We will focus on the example of importing data
    from Homologene.</p>

    <p>The general steps required to import from an existing data source into RDF are:</p>
    <ul>
     <li><i>Read the data into your program:</i> This can be accomplished by
     first exporting to a text format of choice (CSV, tab-delimited,
     XML, etc.) and reading that format in or accessing the database
     directly with a database connector.</li>
     <li><i>Write the data into the desired RDF format:</i> This can be in
     the form of an RDF/XML file that is then loaded into the
     repository. The Turtle format of RDF is also often supported and may be
     easier to produce and manipulate. Another approach is to use
     software libraries that allow you to add triples directly to your
     repository.</li>
    </ul>

<!--    <p><em>Note: Although some HCLSIG members have suggested a preference for exporting to XML and creating the RDF from XQuery, we didn't apply that technique here.</em></p> -->

    <p>In the case of Homologene, we start with a text file that contains the exported information. The original tab-delimited file is <a href="ftp://ftp.ncbi.nih.gov/pub/HomoloGene/build54/homologene.data">ftp://ftp.ncbi.nih.gov/pub/HomoloGene/build54/homologene.data</a>.</p>

<p>Here is a sample of the original file:</p>
<pre style="border: double;">
99949	9606	727759	LOC727759	113427825	XP_001125931.1
99949	10116	678753	LOC678753	109498373	XP_001053282.1
99949	5833	812783	GeneID:812783	16805082	NP_473111.1
99950	3702	820917	AT3G16650	18401203	NP_566557.1
</pre>

<p>We are interested in the first three fields. The first field identifies
the homologous cluster. The second field is the species taxon id. The
third field is the EntrezGene id. We are only interested in human
("9606"), mouse ("10090"), and rat ("10116") records.</p>
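<p>The reading step can be sketched in Python as follows. This is an illustrative analogue only, not the Lisp code used for the actual conversion; the function name, the constant name, and the choice of a dictionary as the table are assumptions of this sketch. The sample lines are the four lines shown above.</p>

```python
import csv
import io
from collections import defaultdict

# NCBI taxon ids for human, mouse and rat
TAXA_OF_INTEREST = {"9606", "10090", "10116"}


def read_homologene(lines, taxa=TAXA_OF_INTEREST):
    """Map cluster id -> list of (taxon id, Entrez Gene id) pairs."""
    clusters = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        # Only the first three tab-separated fields are used
        cluster_id, taxon_id, gene_id = row[:3]
        if taxon_id in taxa:
            clusters[cluster_id].append((taxon_id, gene_id))
    return dict(clusters)


SAMPLE = (
    "99949\t9606\t727759\tLOC727759\t113427825\tXP_001125931.1\n"
    "99949\t10116\t678753\tLOC678753\t109498373\tXP_001053282.1\n"
    "99949\t5833\t812783\tGeneID:812783\t16805082\tNP_473111.1\n"
    "99950\t3702\t820917\tAT3G16650\t18401203\tNP_566557.1\n"
)
homologene = read_homologene(io.StringIO(SAMPLE))
print(homologene)
```

<p>For the sample, only cluster 99949 survives the filter, with its human and rat members; cluster 99950 is dropped because its only member belongs to a taxon outside the three of interest.</p>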

<p>The <a href="http://svn.neurocommons.org/svn/trunk/convert/homologene.lisp">Lisp code</a> for the homologene conversion is also available. In the conversion process, we first iterate over the lines in the file, creating a table that maps each cluster id to the (taxon id, Entrez Gene id) pairs in that cluster. This is the variable <em>homologene</em>, created by the function <em>read-homologene</em>. For each of these clusters we create an individual to represent the cluster, e.g. for cluster 99949:</p>

<pre>
  &lt;sciencecommons:orthology_record rdf:about="http://purl.org/science/record/homologene/cluster_r54_99949"&gt;
    &lt;sciencecommons:has_homologous_gene_record rdf:resource="http://purl.org/commons/record/ncbi_gene/678753"/&gt;
    &lt;sciencecommons:has_homologous_gene_record rdf:resource="http://purl.org/commons/record/ncbi_gene/727759"/&gt;
    &lt;sciencecommons:has_supporting_evidence rdf:resource="http://purl.org/science/evidence/homologene/cluster_r54_99949"/&gt;
  &lt;/sciencecommons:orthology_record&gt;
</pre>

    <p>Above is the RDF/XML (see [<a href="#ref-RDF">RDF</a>]) expression of:</p>

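<pre>@prefix homologene: &lt;http://purl.org/science/record/homologene/&gt; .
@prefix evidence: &lt;http://purl.org/science/evidence/homologene/&gt; .
@prefix ncbi_gene: &lt;http://purl.org/commons/record/ncbi_gene/&gt; .
@prefix sciencecommons: &lt;http://purl.org/science/owl/sciencecommons/&gt; .
homologene:cluster_r54_99949 a sciencecommons:orthology_record .
homologene:cluster_r54_99949 sciencecommons:has_homologous_gene_record ncbi_gene:678753 .
homologene:cluster_r54_99949 sciencecommons:has_homologous_gene_record ncbi_gene:727759 .
homologene:cluster_r54_99949 sciencecommons:has_supporting_evidence evidence:cluster_r54_99949 .</pre>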
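<p>The writing step for one cluster can be sketched in Python as below. The record, evidence, and gene URI stems are taken from the RDF/XML shown above. The <code>sciencecommons</code> namespace URI is an assumption based on a property URI that appears elsewhere in this document, and the function name and the tuple representation are choices made for this sketch. An RDF library or a simple serializer would then write the tuples out as RDF/XML or Turtle.</p>

```python
RECORD = "http://purl.org/science/record/homologene/"
EVIDENCE = "http://purl.org/science/evidence/homologene/"
GENE = "http://purl.org/commons/record/ncbi_gene/"
# Assumed namespace for the sciencecommons prefix (not stated next to the example)
SC = "http://purl.org/science/owl/sciencecommons/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def cluster_to_triples(cluster_id, gene_ids, build="r54"):
    """Return (subject, predicate, object) URI tuples for one cluster."""
    name = "cluster_{}_{}".format(build, cluster_id)
    subject = RECORD + name
    # The cluster individual is typed as an orthology record
    result = [(subject, RDF_TYPE, SC + "orthology_record")]
    # One triple per homologous gene record, in a stable order
    for gene_id in sorted(gene_ids):
        result.append((subject, SC + "has_homologous_gene_record", GENE + gene_id))
    # Link to the (not yet elaborated) evidence individual
    result.append((subject, SC + "has_supporting_evidence", EVIDENCE + name))
    return result


cluster_triples = cluster_to_triples("99949", ["727759", "678753"])
for triple in cluster_triples:
    print(triple)
```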

<p>Note that we used HTTP URLs to identify the gene records referenced by Homologene, by prefixing the EntrezGene identifiers (e.g. <code>727759</code>) with a stem URL, <code>http://purl.org/commons/record/ncbi_gene/</code>. The <a href="http://purl.org/commons/record/ncbi_gene/727759">resulting URL</a> can be usefully resolved with a web browser. The domain purl.org serves Persistent URLs (PURLs), which currently redirect these requests for NCBI gene identifiers to a script at sw.neurocommons.org. If the community wishes to move the service to, for instance, an NCBI page about these genes, they can simply notify the custodians of purl.org. This extra level of indirection protects these identifiers from becoming orphaned as organizations cease to exist or change their priorities.</p>

<p>These URLs were also used to identify gene information imported from other data sources, automatically linking the Semantic Web representations of these records. For example, PubMesh statements about gene records use these same identifiers for genes, as do the statements from Gene Ontology and SenseLab. This allows for trivial data integration between different resources involving Entrez Gene records.</p>
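<p>The effect of shared identifiers can be illustrated with a small Python sketch. The stem URL is the one given above. The two toy "sources", the <code>ex:</code> predicate names, and the helper name are invented for this example and do not correspond to terms in the knowledge base.</p>

```python
GENE_STEM = "http://purl.org/commons/record/ncbi_gene/"


def gene_record_uri(entrez_gene_id):
    """Mint the shared URI for an Entrez Gene record from its identifier."""
    return GENE_STEM + str(entrez_gene_id)


# Two independently produced sets of statements (predicates are hypothetical)
literature = {(gene_record_uri(1812), "ex:mentioned_in", "ex:article_10698743")}
annotations = {(gene_record_uri("1812"), "ex:annotated_with", "go:GO_0007190")}

# Merging is a plain set union; statements join because the subject URI is identical
merged = literature | annotations
by_subject = {}
for s, p, o in merged:
    by_subject.setdefault(s, set()).add((p, o))

print(by_subject)
```

<p>Because both sources derive the subject from the same stem and identifier, the merged data contains a single node for the gene record, carrying the statements from both sources.</p>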

<!-- A placeholder has been put in the science commons ontology to
    allow information relating to the 'evidence' of an
    assertion. For example, this could include the BLAST scores to be
    used to establish the level of orthology. -->

    <p>Also, the individual http://purl.org/science/evidence/homologene/cluster_r54_99949 serves as a link to the "evidence", which is not elaborated in this translation. In future work it would include the BLAST scores and other evidence used to establish the orthology (see <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&amp;db=homologene&amp;dopt=AlignmentScores&amp;list_uids=99949">http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&amp;db=homologene&amp;dopt=AlignmentScores&amp;list_uids=99949</a>).</p>

    <h2 id="query">6 Query</h2>

    <p>Our scientific question can be summarized as "What genes are
    involved in signal transduction that are related to pyramidal
    neurons?". The scientific question can be answered with the
    following query, which searches for gene names and processes from
    four data sources within the knowledge base. The data sources
    include: MeSH (Pyramidal Neurons), PubMed (Journal Articles),
    Entrez Gene (Genes), Gene Ontology (Signal Transduction). The example <a
    href="pyrNeurSigTransduct.rq">query</a> selects the gene
    name of the genes involved in <em>signal transduction</em> that
    are related to <em>pyramidal neurons</em>. Some of the complexity
    in this query comes from the need to capture relevant anatomical
    and functional detail at the subcellular and molecular level. The
    portion probing the Gene Ontology queries
    a set of classes describing processes at the molecular level. Our
    query employs the SPARQL RDF query language to perform knowledge
    integration across the sources of the knowledge base. Details on
    SPARQL can be found in the <a
    href="#references">References</a>.</p>

    <p><em>[Note: The query below will not work verbatim at SPARQL endpoints. We have simplified the actual <a href="#qWithGraphs">Banff demonstration query</a> for explanatory purposes in our example below. The Banff demonstration query is discussed in more detail in the <a href="#graphs">Named Graphs</a> section. You can try running the query <a href="http://purl.org/hcls/hclskb_demo.html">here</a>.]</em></p>

    <p>Please note that the same color (and CSS class) is used to connect the descriptive text in the query with relevant portions of the following figures.</p>

    <table class="dbsTable" style="font-size: 75%">
      <thead>
	<tr>
	  <th>Source</th>
	  <th>(colored) CSS class</th>
	</tr>
      </thead>
      <tbody>
	<tr>
	  <td>PubMesh</td>
          <td><span class="mesh">mesh</span></td>
	</tr>
	<tr>
	  <td>Gene Ontology Annotation (GOA)</td>
          <td><span class="goa">goa</span></td>
	</tr>
	<tr>
	  <td>Entrez Gene</td>
          <td><span class="glbl">glbl</span></td>
	</tr>
	<tr>
	  <td>Gene Ontology</td>
          <td><span class="plbl">plbl</span></td>
	</tr>
      </tbody>
    </table>


    <pre class="query" id="geneProc">SELECT ?genename ?processname

WHERE {
<span class="mesh" id="pyrNeurSigTransduct_mesh" title="Medical Subject Headings">  <span class="comment"># PubMeSH includes <span class="var gene">?gene_record</span>s mentioned in <span class="var article">?article</span>s which are identified by pmid in <span class="var">?pubmed_record</span>s .</span>
  ?pubmed_record sc:has-as-minor-mesh <a href="http://www.slicksurface.com/medical-thesaurus/descriptor/D017966/pyramidal-cells.htm"><span class="identifier" title="pyramidal neurons">mesh:D017966</span></a> .
  ?article sc:identified_by_pmid ?pubmed_record .
  <span class="var gene">?gene_record</span> sc:describes_gene_or_gene_product_mentioned_by ?article .</span>

<span class="goa" id="pyrNeurSigTransduct_goa" title="Gene Ontology Database">  <span class="comment"># The Gene Ontology has a set of <span class="var protein">?protein</span>s such that foreach <span class="var protein">?protein</span>, <span class="var protein">?protein</span> ro:has_function [ ro:realized_as <span class="var process">?process</span> ].</span>
  ?protein rdfs:subClassOf ?restriction1 .
  ?restriction1 owl:onProperty ro:has_function .
  ?restriction1 owl:someValuesFrom ?restriction2 .
  ?restriction2 owl:onProperty ro:realized_as .
  ?restriction2 owl:someValuesFrom <span class="var process">?process</span> .
  <span class="comment"># Also, foreach ?protein, ?protein has a parent class which is linked by some predicate to <span class="var gene">?gene_record</span>.</span>
  ?protein rdfs:subClassOf ?protein_superclass .
  ?protein_superclass owl:equivalentClass ?restriction3 .
  ?restriction3 owl:onProperty <abbr title="http://purl.org/science/owl/sciencecommons/is_protein_gene_product_of_dna_described_by">dnaGeneProduct:described_by</abbr> .
  ?restriction3 owl:hasValue <span class="var gene">?gene_record</span> .
  <span class="comment"># Each <span class="var process">?process</span> (that we are interested in) is a part (sub-process) of the <em>signal transduction</em> process.</span>
  <span class="var process" id="query-part-of">?process</span> obo:part_of <a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007165"><span class="identifier" title="signal transduction">go:GO_0007165</span></a> .</span>

<span class="glbl" id="pyrNeurSigTransduct_glbl" title="Gene Labels">  <span class="var gene">?gene_record</span> rdfs:label ?genename .</span>

<span class="plbl" id="pyrNeurSigTransduct_plbl" title="Process Labels">  <span class="var process">?process</span> rdfs:label ?processname .</span>
}</pre>


    <p>The following shows a few of the results from the query:</p>

    <table>
      <thead>
	<tr>
	  <!-- th>pubmed_record</th -->
	  <th>genename</th>
	  <th>processname</th>
	  <!-- th>receptor_protein_name</th -->
	</tr>
      </thead>
      <tbody>
	<tr>
	  <!-- td>http://purl.org/commons/record/pmid/16640790</td -->
	  <td>Entrez Gene record for human DRD1, 1812</td>
	  <td>adenylate cyclase activation</td>
	  <!-- td>D1 receptor</td -->
	</tr>
	<tr>
	  <!-- td>http://purl.org/commons/record/pmid/16640790</td -->
	  <td>Entrez Gene record for human ADRB2, 154</td>
	  <td>adenylate cyclase activation</td>
	  <!-- td></td -->
	</tr>
	<tr><td colspan="2">...</td></tr>
      </tbody>
    </table>

    <p>The following section describes the RDF data model and how we employed it to make our query possible.</p>

    <h2 id="triplemodel">7 Data Model</h2>

    <p>The data in the knowledge base is modeled in OWL-DL, which has been expressed as RDF triples. Briefly, an RDF triple consists of a <em>subject</em>, <em>predicate</em>, and <em>object</em>. The predicate is also known as the <em>property</em> of the triple. Subjects and objects in the data unify to create an RDF Graph, with subjects and objects as nodes and predicates as edges. For more information about RDF and OWL, see the <a href="#references">References</a> section in the Appendix.</p>

    <p>Nodes labeled with a leading "_:", e.g. <em>_:activateAdenylCyclase</em>, are called <a href="http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-blank-nodes">RDF blank nodes</a> [<a href="#refCONCEPTS">CONCEPTS</a>]. These frequently have machine-generated identifiers and are therefore typically opaque to a human reader (e.g., the set of all nodes that represent protein entities linked to the GO term adenylate cyclase activation). Here, for the purposes of explanation, they have been named to convey meaning to the reader. A blank node label ending in "_1" in this document indicates that the blank node is one of many in its class, e.g. <em>_:signalingParticipants_1</em>.</p>

    <div class="figure">
      <p id="triplesPicture">
	<object data="triples.svg" type="image/svg+xml" height="490" width="850"><img src="triples.png" alt="Triples in Solution" /></object>
	<!-- img src="triples.png" alt="Triples in Solution" / -->
      </p>
      <p>Figure 1. Triples in Solution [<a href="triples.svg">SVG image</a> <a href="triples.png">PNG image</a>]</p>
    </div>

    <p>Figure 1, <em>Triples in Solution</em>, shows a graphical representation of the triples that compose <em>one</em> solution to the query posed in <a href="#query">section 6</a>. Following is a discussion of the origins and intents of those triples:</p>

    <p>The <a href="http://sw.neurocommons.org/2007/text-mining.html">application of a commercial text mining tool</a> to neuroscience-related PubMed abstracts results in a set of annotations that link MeSH terms to genes (for more details on MeSH, see the table in <a href="#dbs">Data Sources</a>). For example, an article with PubMed id 10698743 mentions <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;cmd=retrieve&amp;list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a>, and the corresponding PubMed record has the MeSH term <a href="http://www.slicksurface.com/medical-thesaurus/descriptor/D017966/pyramidal-cells.htm"><span class="identifier" title="pyramidal neurons">mesh:D017966</span></a>. The following three triples express this:</p>

    <table id="triplesTable" class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <thead>
	<tr>
	  <th>subject</th>
	  <th>predicate</th>
	  <th>object</th><th class="placeholder"></th><th class="placeholder"></th>
	</tr>
      </thead>
      <tbody>
	<tr class="mesh" id="pyrNeurSigTransduct_triples_mesh" title="Medical Subject Headings">  <td>pubmedRec:10698743</td>            <td>sc:has-as-minor-mesh</td> <td><a href="http://www.slicksurface.com/medical-thesaurus/descriptor/D017966/pyramidal-cells.htm"><span class="identifier" title="pyramidal neurons">mesh:D017966</span></a></td> <td>.</td></tr>

  <tr class="mesh"><td>article:10698743</td>           <td>sc:identified_by_pmid</td> <td>pubmedRec:10698743</td> <td>.</td></tr>
  <tr class="mesh"><td><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;cmd=retrieve&amp;list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a></td>            <td>sc:describes_gene_or_gene_product_mentioned_by</td> <td>article:10698743</td> <td>.</td></tr>
      </tbody>
    </table>

    <p>A set of genes or gene products in human bodies is described by <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;cmd=retrieve&amp;list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a>. Here, we call this set <em>_:equiv1812</em>.</p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
  <tr class="goa"><td>_:equiv1812</td>               <td>owl:onProperty</td>      <td title="http://purl.org/science/owl/sciencecommons/is_protein_gene_product_of_dna_described_by">dnaGeneProduct:described_by</td> <td>.</td></tr>

  <tr class="goa"><td>_:equiv1812</td>               <td>owl:hasValue</td>        <td><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;cmd=retrieve&amp;list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a></td> <td>.</td></tr>
      </tbody>
    </table>

    <p><em><span>protein:ncbi_gene.1812</span></em> has the same extension (members) as the OWL restriction <em>_:equiv1812</em>.</p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
  <tr class="goa"><td>protein:ncbi_gene.1812</td> <td>owl:equivalentClass</td> <td>_:equiv1812</td> <td>.</td></tr>
      </tbody>
    </table>

    <p>The expression</p>
<pre>NamedClass equivalentClass R .
R onProperty SomeProperty .
R hasValue SomeValue .</pre>
    <p> is the RDF representation of an OWL class axiom that says: for all X such that</p>
<pre>X SomeProperty SomeValue .</pre>
    <p>X is a member of the class <code>NamedClass</code> (and vice versa). See <a href="http://www.w3.org/TR/owl-semantics/mapping#transformation_hasValue">OWL Web Ontology Language Semantics and Abstract Syntax Section 4. Mapping to RDF Graphs</a> for a formal treatment of this.</p>
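    <p>To make the reading of this axiom concrete, the following Python sketch computes the members of such a class from a set of asserted triples. The individuals named <code>ex:protein_a</code>, <code>ex:protein_b</code>, and <code>ex:protein_c</code> are hypothetical, as is the function name. The sketch also considers asserted triples only, whereas OWL uses open-world semantics, so it is a simplification rather than a reasoner.</p>

```python
def has_value_members(assertions, prop, value):
    """Individuals X for which the triple (X, prop, value) is asserted."""
    return {s for (s, p, o) in assertions if p == prop and o == value}


# Hypothetical individuals; the property and gene record terms appear in this document
assertions = {
    ("ex:protein_a", "dnaGeneProduct:described_by", "ncbi_gene:1812"),
    ("ex:protein_b", "dnaGeneProduct:described_by", "ncbi_gene:154"),
    ("ex:protein_c", "dnaGeneProduct:described_by", "ncbi_gene:1812"),
}

# Members of the class that is equivalent to the hasValue restriction on ncbi_gene:1812
members_1812 = has_value_members(
    assertions, "dnaGeneProduct:described_by", "ncbi_gene:1812"
)
print(sorted(members_1812))
```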

	<p>Using our other supplied constant, we note that adenylate cyclase activation, <a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007190&amp;selected=GO:0007190&amp;viz=graph"><span class="var process" title="adenylate cyclase activation">go:GO_0007190</span></a>, is part of signal transduction, <a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007165&amp;selected=GO:0007165&amp;viz=graph"><span title="signal transduction" class="identifier input">go:GO_0007165</span></a>. <span id="subclass-part-of" class="note">Note: this simplified query matches only processes that are a sub-process of go:GO_0007165; the <a href="#qWithGraphs">actual query</a>, described in <a href="#graphs">§9 Named Graphs</a>, also looks for subclasses. The part_of relationships were inferred from the OWL class restrictions described in <a href="#preproc">§7.1 Precomputing Inferences</a>.</span> The class of functions that are <em>realized_as</em> adenylate cyclase activation is here labeled <em>_:activateAdenylCyclase</em>.</p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
  <tr class="goa"><td><a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007190&amp;selected=GO:0007190&amp;viz=graph"><span class="var process">go:GO_0007190</span></a></td>             <td>obo:part_of</td>     <td><a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007165&amp;selected=GO:0007165&amp;viz=graph"><span class="identifier" title="signal transduction">go:GO_0007165</span></a></td> <td>.</td></tr>

  <tr class="goa"><td>_:activateAdenylCyclase</td>   <td>owl:onProperty</td>       <td>ro:realized_as</td> <td>.</td></tr>
  <tr class="goa"><td>_:activateAdenylCyclase</td>   <td>owl:someValuesFrom</td>  <td><a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007190&amp;selected=GO:0007190&amp;viz=graph"><span class="var process">go:GO_0007190</span></a></td> <td>.</td></tr>
      </tbody>
    </table>

    <p>There are many possible classes of substance participating in molecular signaling, one of which (called here <em>_:signalingParticipants_1</em>) is defined by the ability to activate adenylate cyclase.</p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
  <tr class="goa"><td>_:signalingParticipants_1</td>    <td>owl:onProperty</td>      <td>ro:has_function</td> <td>.</td></tr>
  <tr class="goa"><td>_:signalingParticipants_1</td>    <td>owl:someValuesFrom</td>  <td>_:activateAdenylCyclase</td> <td>. <span class="comment"></span></td></tr>
      </tbody>
    </table>

	<p>The class of proteins in the intersection of <em>_:signalingParticipants_1</em> and <em><span>protein:ncbi_gene.1812</span></em> is here abbreviated <em>proteinsubclass:p1812_7190_1</em>, though the actual identifier is <em>proteinsubclass:product_of_ncbi_gene.1812_that_participates_in_GO_0007190_fbc49f20524727a24c7b7effa29bad4a</em>. <span id="empty-protein-set" class="note">Note: the Venn diagram reveals that this set is potentially empty, theoretically permitting the query to range over pairs of gene/process that aren't related through any known protein. However, OWL-DL reasoners will not infer new classes, so the proteins in the intersection of ncbi_gene:1812 and the substances participating in molecular signaling are restricted to those that have already been entered into the knowledge base, e.g. <em>proteinsubclass:p1812_7190_1</em>.</span></p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
<tr class="goa" id="pyrNeurSigTransduct_triples_goa" title="Gene Ontology Database">  <td title="proteinsubclass:product_of_ncbi_gene.1812_that_participates_in_GO_0007190_fbc49f20524727a24c7b7effa29bad4a">proteinsubclass:p1812_7190_1</td>               <td>rdfs:subClassOf</td>     <td>_:signalingParticipants_1</td> <td>.</td></tr>
  <tr class="goa"><td title="proteinsubclass:product_of_ncbi_gene.1812_that_participates_in_GO_0007190_fbc49f20524727a24c7b7effa29bad4a">proteinsubclass:p1812_7190_1</td>               <td>rdfs:subClassOf</td>     <td>protein:ncbi_gene.1812</td> <td>.</td></tr>
      </tbody>
    </table>

    <p><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;cmd=retrieve&amp;list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a> and <a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007190&amp;selected=GO:0007190&amp;viz=graph"><span class="var process">go:GO_0007190</span></a> have human-readable labels.</p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
  <tr class="glbl" id="pyrNeurSigTransduct_triples_glbl" title="Gene Labels">  <td><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;cmd=retrieve&amp;list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a></td>            <td>rdfs:label</td>          <td>"Entrez Gene record for human DRD1, 1812"</td> <td>.</td></tr>
  <tr class="plbl" id="pyrNeurSigTransduct_triples_plbl" title="Process Labels">  <td><a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007190&amp;selected=GO:0007190&amp;viz=graph"><span class="var process">go:GO_0007190</span></a></td>             <td>rdfs:label</td>          <td>"adenylate cyclase activation"</td> <td>.</td></tr>
      </tbody>
    </table>

  <p>Adding another PubMed record annotated with the same MeSH term gives us another solution:</p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
  <tr class="mesh" id="pyrNeurSigTransduct_triples_mesh2" title="Medical Subject Headings">  <td>pubmedRec:11441182</td>           <td>sc:has-as-minor-mesh</td> <td><a href="http://www.slicksurface.com/medical-thesaurus/descriptor/D017966/pyramidal-cells.htm"><span class="identifier" title="pyramidal neurons">mesh:D017966</span></a></td> <td>.</td></tr>
  <tr class="mesh" title="Medical Subject Headings"><td>article:11441182</td>          <td>sc:identified_by_pmid</td> <td>pubmedRec:11441182</td> <td>.</td></tr>
  <tr class="mesh" title="Medical Subject Headings"><td><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;cmd=retrieve&amp;list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a></td>            <td>sc:describes_gene_or_gene_product_mentioned_by</td> <td>article:11441182</td> <td>.</td></tr>
      </tbody>
    </table>

  <h3 id="preproc">7.1 Precomputing Inferences</h3>

    <div class="figure">
      <p id="rulePicture">
	<object data="rule.svg" type="image/svg+xml" height="490" width="850"><img src="rule.png" alt="obo:part_of Rule" /></object>
	<!-- img src="rule.png" alt="obo:part_of Rule" / -->
      </p>
      <p>Figure 2. obo:part_of Rule [<a href="rule.svg">SVG image</a> <a href="rule.png">PNG image</a>]</p>
    </div>

<p>The demonstration query depends on the existence of an <em>obo:part_of</em> (or <em>rdfs:subClassOf</em>) relationship between any part (i.e. any subclass of any step in the sequence) of molecular signaling, and the general identifier for molecular signaling, <em>go:GO_0007165</em>:</p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
  <tr style="background-color: #dddddd" class="goa" title="Gene Ontology Database"><td><a href="#query-part-of"><span class="var process">?process</span></a></td> 	<td>obo:part_of</td> 		<td><a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007165"><span class="identifier" title="signal transduction">go:GO_0007165</span></a></td> <td>.</td></tr>

      </tbody>
    </table>

    <p>Triples of this form were generated by a rule, graphically expressed in Figure 2, <em>obo:part_of Rule</em>. The shaded area on the right of the figure shows the OWL restriction which is the antecedent of the rule:</p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
  <tr class="goa" title="Gene Ontology Database"><td>_:subPart</td> 	<td>owl:onProperty</td> 		<td>obo:part_of</td> <td>.</td></tr>
  <tr class="goa" title="Gene Ontology Database"><td>_:subPart</td> 	<td>owl:allValuesFrom</td>  	<td>_:subClass</td> <td>.</td></tr>
  <tr class="goa" title="Gene Ontology Database"><td>_:subClass</td> 	<td>owl:onProperty</td> 		<td>rdfs:subClassOf</td> <td>.</td></tr>
  <tr class="goa" title="Gene Ontology Database"><td>_:subClass</td> 	<td>owl:hasValue</td>  		<td>_:parentClass</td> <td>.</td></tr>
      </tbody>
    </table>

	<p>The transitivity of <em>rdfs:subClassOf</em> need not be explicitly modeled because the <a>RDF Schema Specification</a> already defines subClassOf as transitive. Note that if <em>_:subClass</em> is a subClassOf <em>_:parentClass</em>, then all members of <em>_:subClass</em> are of type <em>_:parentClass</em> (as well as <em>_:subClass</em>):</p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
  <tr class="goa" title="Gene Ontology Database"><td>_:subClass</td>  	<td>owl:onProperty</td> 		<td>rdf:type</td> <td>.</td></tr>
  <tr class="goa" title="Gene Ontology Database"><td>_:subClass</td>  	<td>owl:hasValue</td>  		<td>_:parentClass</td> <td>.</td></tr>
      </tbody>
    </table>
<!--
  <tr class="goa" title="Gene Ontology Database"><td>_:subClass2</td>  	<td>owl:onProperty</td> 		<td>obo:part_of</td> <td>.</td></tr>
  <tr class="goa" title="Gene Ontology Database"><td>_:subClass2</td>  	<td>owl:allValuesFrom</td>  	<td>_:subClass</td> <td>.</td></tr>

  <p>This is a pre-compiled closure of <tt>All members of <span class="var process" title="adenylate cyclase activation">go:GO_0007190</span> are parts of <span title="signal transduction" class="identifier input">go:GO_0007165</span>.</tt> It is not <tt>{ ?process (rdfs:subClassOf*|obo:part_of*) go:GO_0007165 }</tt> as may intuited from <tt>rdfs:subClassOf</tt>'s semantics.</p>
-->
	<p>Because the triple store used does not perform inferencing, these triples were pre-computed (forward-chained) and inserted into the triple store, which also simplifies the query. If these triples were not pre-computed, the <tt>obo:part_of</tt> part of the query would be expressed:</p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
  <tr style="background-color: #dddddd" title="Gene Ontology Database"><td><span class="var process">?process</span></td>	<td>rdfs:subClassOf</td> <td><span class="identifier var">?what</span></td> <td>.</td></tr>
  <tr style="background-color: #dddddd" title="Gene Ontology Database"><td><span class="var">?what</span></td>	<td>owl:onProperty</td> <td><span class="identifier" title="signal transduction">obo:has_part</span></td> <td>.</td></tr>
  <tr style="background-color: #dddddd" title="Gene Ontology Database"><td><span class="var">?what</span></td>	<td>owl:someValuesFrom</td> <td><a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007165"><span class="identifier" title="signal transduction">go:GO_0007165</span></a></td> <td>.</td></tr>

      </tbody>
    </table>

	<p>and the query would need to range over the transitive closure of the union of the <tt>obo:part_of</tt> and <tt>rdfs:subClassOf</tt> relationships.</p>
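    <p>The forward-chaining step described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used to build the knowledge base: it assumes the direct <tt>obo:part_of</tt> and <tt>rdfs:subClassOf</tt> edges have already been extracted from the OWL restrictions into plain (child, relation, parent) tuples, and the class <tt>ex:intermediate_process</tt> is invented for the example. An inferred link is labeled <tt>obo:part_of</tt> when the chain crosses at least one <tt>obo:part_of</tt> edge, and <tt>rdfs:subClassOf</tt> otherwise.</p>

```python
from collections import defaultdict

# Hypothetical asserted edges, written as (child, relation, parent).
# go:GO_0007190 and go:GO_0007165 appear in this note; the class
# ex:intermediate_process is invented for illustration only.
ASSERTED = [
    ("go:GO_0007190", "obo:part_of", "ex:intermediate_process"),
    ("ex:intermediate_process", "rdfs:subClassOf", "go:GO_0007165"),
]


def forward_chain(asserted):
    """Return every link implied by chaining part_of and subClassOf edges."""
    edges = defaultdict(set)
    for child, relation, parent in asserted:
        edges[child].add((relation, parent))

    inferred = set()
    for start in list(edges):
        seen = set()
        # Each state records the class reached and whether the chain so far
        # has crossed at least one obo:part_of edge.
        stack = [(start, False)]
        while stack:
            node, crossed_part_of = stack.pop()
            for relation, parent in edges.get(node, ()):
                state = (parent, crossed_part_of or relation == "obo:part_of")
                if state not in seen:
                    seen.add(state)
                    stack.append(state)
        for ancestor, crossed_part_of in seen:
            relation = "obo:part_of" if crossed_part_of else "rdfs:subClassOf"
            inferred.add((start, relation, ancestor))
    return inferred


closure = forward_chain(ASSERTED)
```

    <p>With the triples in <tt>closure</tt> loaded into the store, a query can match <tt>?process obo:part_of go:GO_0007165</tt> directly, without asking the store to follow the chain at query time.</p>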


    <h2 id="newsource">8 Adding a New Data Source</h2>

    <p><a href="#ds_SenseLab">SenseLab</a> is a collection of relational (Oracle) databases for neuroscientific research that was independently added to the knowledge base after the other data sources. An accompanying document, <a href="../senselab/">Experiences with the conversion of SenseLab databases to RDF/OWL</a>, describes the details of adding it to this knowledge base. With this new data incorporated, the <a href="#geneProc">example query</a> can be extended to extract data from the <span class="senselab">new data source</span>, in this case to discover the names of receptor proteins associated with the genes found by the previous query. The results of an integrative query of this sort can serve as a starting point for more detailed queries of a particular repository, here SenseLab.</p>

    <pre class="query" id="geneProcReceptor">SELECT ?genename ?processname <span class="senselab">?receptor_protein_name</span>

WHERE {
<span class="mesh" id="pyrNeurSigTransduct_senselab_mesh" title="Medical Subject Headings">  <span class="comment"># PubMeSH includes <span class="var gene">?gene_record</span>s mentioned in <span class="var article">?article</span>s which are identified by pmid in <span class="var">?pubmed_record</span>s .</span>
  ?pubmed_record sc:has-as-minor-mesh <span class="identifier" title="pyramidal neurons">mesh:D017966</span> .
  ?article sc:identified_by_pmid ?pubmed_record .
  <span class="var gene">?gene_record</span> sc:describes_gene_or_gene_product_mentioned_by ?article .</span>

<span class="goa" id="pyrNeurSigTransduct_senselab_goa" title="Gene Ontology Database">  <span class="comment"># The Gene Ontology asserts that foreach <span class="var protein">?protein</span>, <span class="var protein">?protein</span> ro:has_function [ ro:realized_as <span class="var process">?process</span> ].</span>
  ?protein rdfs:subClassOf ?restriction1 .
  ?restriction1 owl:onProperty ro:has_function .
  ?restriction1 owl:someValuesFrom ?restriction2 .
  ?restriction2 owl:onProperty ro:realized_as .
  ?restriction2 owl:someValuesFrom <span class="var process">?process</span> .
  <span class="comment"># Also, foreach <span class="var process">?protein</span>, <span class="var process">?protein</span> has a parent class which is linked by some predicate to <span class="var gene">?gene_record</span>.</span>
  ?protein rdfs:subClassOf ?protein_superclass .
  ?protein_superclass owl:equivalentClass ?restriction3 .
  ?restriction3 owl:onProperty <abbr title="http://purl.org/science/owl/sciencecommons/is_protein_gene_product_of_dna_described_by">dnaGeneProduct:described_by</abbr> .
  ?restriction3 owl:hasValue <span class="var gene">?gene_record</span> .
  <span class="comment"># Each <span class="var process">?process</span> (that we are interested in) is a part of the <em>signal transduction</em> process.</span>
  <span class="var process">?process</span> obo:part_of <span class="identifier" title="signal transduction">go:GO_0007165</span> .</span>

<span class="glbl" id="pyrNeurSigTransduct_senselab_glbl" title="Gene Labels">  <span class="var gene">?gene_record</span> rdfs:label ?genename .</span>

<span class="plbl" id="pyrNeurSigTransduct_senselab_plbl" title="Process Labels">  <span class="var process">?process</span> rdfs:label ?processname .</span>

<span class="senselab">  OPTIONAL {
  <span class="comment"># Foreach <span class="var">?gene</span>, <span class="var">?gene</span> senselab:has_nucleotide_sequence_described_by <span class="var gene">?gene_record</span> .</span>
  ?gene owl:equivalentClass ?restriction4 .
  ?restriction4 owl:onProperty senselab:has_nucleotide_sequence_described_by .
  ?restriction4 owl:hasValue <span class="var gene">?gene_record</span> .

  <span class="comment"># Foreach <span class="var">?receptor_protein</span>, <span class="var">?receptor_protein</span> senselab:proteinGeneProductOf <span class="var">?gene</span> .</span>
  ?receptor_protein rdfs:subClassOf ?restriction5 .
  ?restriction5 owl:onProperty senselab:proteinGeneProductOf .
  ?restriction5 owl:someValuesFrom ?gene .

  <span class="comment"># Find the labels of all such <span class="var">?receptor_protein</span>s.</span>
  ?receptor_protein rdfs:label ?receptor_protein_name
  }</span>
}</pre>

    <p>yielding another variable in our results:</p>

    <table>
      <thead>
	<tr>
	  <!-- th>pubmed_record</th -->
	  <th>genename</th>
	  <th>processname</th>
	  <th>receptor_protein_name</th>
	</tr>
      </thead>
      <tbody>
	<tr>
	  <!-- td>http://purl.org/commons/record/pmid/16640790</td -->
	  <td>Entrez Gene record for human DRD1, 1812</td>
	  <td>adenylate cyclase activation</td>
	  <td>D1 receptor</td>
	</tr>
	<tr>
	  <!-- td>http://purl.org/commons/record/pmid/16640790</td -->
	  <td>Entrez Gene record for human ADRB2, 154</td>
	  <td>adenylate cyclase activation</td>
	  <td>NULL</td>
	</tr>
	<tr><td colspan="3">...</td>                                                                       	</tr>
      </tbody>
    </table>
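    <p>The NULL in the second row of the table above follows from the <tt>OPTIONAL</tt> block: a solution to the required part of the query is kept even when SenseLab has nothing to add to it. The sketch below mimics that behavior as a left join over hand-written bindings. It is only an illustration under stated assumptions: the two dictionaries stand in for the bindings a SPARQL engine would produce, the function name <tt>left_join</tt> is hypothetical, and the identifier <tt>ncbi_gene:154</tt> is written by analogy with <tt>ncbi_gene:1812</tt>.</p>

```python
# Hypothetical bindings from the required part of the query,
# keyed by ?gene_record: (genename, processname).
required = {
    "ncbi_gene:1812": ("Entrez Gene record for human DRD1, 1812",
                       "adenylate cyclase activation"),
    "ncbi_gene:154": ("Entrez Gene record for human ADRB2, 154",
                      "adenylate cyclase activation"),
}

# Hypothetical bindings from the OPTIONAL block,
# keyed by ?gene_record: list of receptor protein labels.
optional = {
    "ncbi_gene:1812": ["D1 receptor"],
}


def left_join(required_rows, optional_rows):
    """Keep every required row; attach optional matches, or None if there are none."""
    results = []
    for gene_record, (genename, processname) in required_rows.items():
        receptors = optional_rows.get(gene_record) or [None]
        for receptor in receptors:
            results.append((genename, processname, receptor))
    return results


results = left_join(required, optional)
```

    <p>Here <tt>None</tt> plays the role of the unbound <tt>?receptor_protein_name</tt>. Without <tt>OPTIONAL</tt>, the ADRB2 row would be dropped from the results instead.</p>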


    <p>The additional triples that this query matched in the <em>SenseLab</em> knowledge base connect to the existing data by referring to the same genes, e.g. <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;cmd=retrieve&amp;list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a>.</p>

    <div class="figure">
      <p id="senselabPicture">
	<object data="slTriples.svg" type="image/svg+xml" height="355" width="850"><img src="slTriples.png" alt="Additional Triples from SenseLab" /></object>
	<!-- img src="slTriples.png" alt="Additional Triples from SenseLab" / -->
      </p>
      <p>Figure 3. Additional Triples from SenseLab [<a href="slTriples.svg">SVG image</a> <a href="slTriples.png">PNG image</a>]</p>
    </div>

    <p>Figure 3, <em>Additional Triples from SenseLab</em>, shows a subset of the triples provided by SenseLab. Following is a discussion of the origins and intents of those triples:</p>

    <p>A nucleotide sequence is also described by <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;cmd=retrieve&amp;list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a>. Here, we call this <em>_:nucleo1812</em>.</p>

    <table id="senselabTable" class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <thead>
	<tr>
	  <th>subject</th>
	  <th>predicate</th>
	  <th>object</th><th class="placeholder"></th><th class="placeholder"></th>
	</tr>
      </thead>
      <tbody>
  <tr class="senselab"><td>_:nucleo1812</td>               <td>owl:onProperty</td>      <td title="http://purl.org/ycmi/senselab/neuron_ontology.owl#has_nucleotide_sequence_described_by">nucleotideSequence:described_by</td> <td>.</td></tr>
  <tr class="senselab"><td>_:nucleo1812</td>               <td>owl:hasValue</td>        <td><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;cmd=retrieve&amp;list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a></td> <td>.</td></tr>
      </tbody>
    </table>

    <p>The class <em><span>senselab:DRD1_Gene</span></em> has the same members as the OWL restriction <em>_:nucleo1812</em>.</p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
  <tr class="senselab"><td>senselab:DRD1_Gene</td> <td>owl:equivalentClass</td> <td>_:nucleo1812</td> <td>.</td></tr>
      </tbody>
    </table>

	<p>The OWL restriction <em>_:protGeneProd_1</em> is defined as the class of protein gene products of <em>senselab:DRD1_Gene</em>.</p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
  <tr class="senselab"><td>_:protGeneProd_1</td>    <td>owl:onProperty</td>      <td>senselab:proteinGeneProductOf</td> <td>.</td></tr>
  <tr class="senselab"><td>_:protGeneProd_1</td>    <td>owl:someValuesFrom</td>  <td>senselab:DRD1_Gene</td> <td>. <span class="comment"></span></td></tr>
      </tbody>
    </table>

    <p>Our solution is a subclass of <em>_:protGeneProd_1</em> called <em><span>senselab:D1</span></em>.</p>

    <table class="triplesTable">
      <col style="width: 10.5em;" />
      <col style="width: 17.5em;" />
      <col style="width: 10.2em;" />
      <col style="width: .1em;" />
      <col/>
      <tbody>
  <tr class="senselab" title="Gene Ontology Database">  <td>senselab:D1</td>               <td>rdfs:subClassOf</td>     <td>_:protGeneProd_1</td> <td>.</td></tr>
  <tr class="senselab"><td>senselab:D1</td>               <td>rdfs:label</td>     <td>"D1"</td> <td>.</td></tr>

      </tbody>
    </table>


    <h2 id="graphs">9 Named Graphs</h2>

    <p>In the Banff Demo, the assertions in the knowledge base were partitioned into groups called Named Graphs. Each named graph associates a distinct URI with a connected graph of triples, so that the graph can be referred to via that URI. At the time of publication, queries are expected to include SPARQL GRAPH constraints, e.g.:</p>

    <pre class="query" id="qWithGraphs">prefix go: &lt;http://purl.org/obo/owl/GO#&gt;
prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
prefix owl: &lt;http://www.w3.org/2002/07/owl#&gt;
prefix mesh: &lt;http://purl.org/commons/record/mesh/&gt;
prefix sc: &lt;http://purl.org/science/owl/sciencecommons/&gt;
prefix ro: &lt;http://www.obofoundry.org/ro/ro.owl#&gt;
prefix senselab: &lt;http://purl.org/ycmi/senselab/neuron_ontology.owl#&gt;
prefix obo: &lt;http://purl.org/obo/owl/obo#&gt;

SELECT ?genename ?processname <span class="senselab">?receptor_protein_name</span>

WHERE {
<span class="mesh" id="pyrNeurSigTransduct_named_mesh" title="Medical Subject Headings">  <span class="comment"># PubMeSH includes <span class="var gene">?gene_record</span>s mentioned in <span class="var article">?article</span>s which are identified by pmid in <span class="var">?pubmed_record</span>s .</span>
GRAPH &lt;http://purl.org/commons/hcls/pubmesh&gt; {
  ?pubmed_record sc:has-as-minor-mesh <span class="identifier" title="pyramidal neurons">mesh:D017966</span> .
  ?article sc:identified_by_pmid ?pubmed_record .
  <span class="var gene">?gene_record</span> sc:describes_gene_or_gene_product_mentioned_by ?article
}</span>

<span class="goa" id="pyrNeurSigTransduct_named_goa" title="Gene Ontology Database">  <span class="comment"># The Gene Ontology asserts that foreach <span class="var protein">?protein</span>, <span class="var protein">?protein</span> ro:has_function [ ro:realized_as <span class="var process">?process</span> ].</span>
GRAPH &lt;http://purl.org/commons/hcls/goa&gt; {
  ?protein rdfs:subClassOf ?restriction1 .
  ?restriction1 owl:onProperty ro:has_function .
  ?restriction1 owl:someValuesFrom ?restriction2 .
  ?restriction2 owl:onProperty ro:realized_as .
  ?restriction2 owl:someValuesFrom <span class="var process">?process</span> .
  <span class="comment"># Also, foreach <span class="var process">?protein</span>, <span class="var process">?protein</span> has a parent class which is linked by some predicate to <span class="var gene">?gene_record</span>.</span>
  ?protein rdfs:subClassOf ?protein_superclass .
  ?protein_superclass owl:equivalentClass ?restriction3 .
  ?restriction3 owl:onProperty sc:is_protein_gene_product_of_dna_described_by .
  ?restriction3 owl:hasValue <span class="var gene">?gene_record</span> .
  <span class="comment"># Each <span class="var process">?process</span> (that we are interested in) is a subclass or component of the <em>signal transduction</em> process.</span>

  GRAPH &lt;http://purl.org/commons/hcls/20070416/classrelations&gt; {
      { <span class="var process">?process</span> obo:part_of <span class="identifier" title="signal transduction">go:GO_0007165</span> }
    UNION
      { <span class="var process">?process</span> rdfs:subClassOf <span class="identifier" title="signal transduction">go:GO_0007165</span> }
  }
}</span>

<span class="glbl" id="pyrNeurSigTransduct_named_glbl" title="Gene Labels">GRAPH &lt;http://purl.org/commons/hcls/gene&gt; {
  <span class="var gene">?gene_record</span> rdfs:label ?genename
}</span>

<span class="plbl" id="pyrNeurSigTransduct_named_plbl" title="Process Labels">GRAPH &lt;http://purl.org/commons/hcls/20070416&gt; {
  <span class="var process">?process</span> rdfs:label ?processname
}</span>

<span class="senselab">GRAPH &lt;http://purl.org/ycmi/senselab/neuron_ontology.owl&gt; {
  <span class="comment"># Foreach <span class="var">?gene</span>, <span class="var">?gene</span> senselab:has_nucleotide_sequence_described_by <span class="var gene">?gene_record</span> .</span>
  ?gene owl:equivalentClass ?restriction4 .
  ?restriction4 owl:onProperty senselab:has_nucleotide_sequence_described_by .
  ?restriction4 owl:hasValue <span class="var gene">?gene_record</span> .

  <span class="comment"># Foreach <span class="var">?receptor_protein</span>, <span class="var">?receptor_protein</span> senselab:proteinGeneProductOf <span class="var">?gene</span> .</span>
  ?receptor_protein rdfs:subClassOf ?restriction5 .
  ?restriction5 owl:onProperty senselab:proteinGeneProductOf .
  ?restriction5 owl:someValuesFrom ?gene .

  <span class="comment"># Find the labels of all such <span class="var">?receptor_protein</span>s.</span>
  ?receptor_protein rdfs:label ?receptor_protein_name
}</span>
}</pre>

    <p>The named graphs help with both provenance and scaling. In the current approach, each RDF bundle is imported into its own named graph. This is useful for a number of reasons. First, we know the source of each named graph, so we can control and review which data sources are being accessed by our queries. Additionally, the association of a named graph with a data source serves as data provenance and can also be employed by schemes that exploit knowledge about the data source to assign confidence measures in a model of trust. For example, one of the knowledge base data sources resulted from text mining experiments to find protein <em>associations</em>. Users of the knowledge base can choose to view this evidence of association differently than the <em>associations</em> provided from a protein-protein interaction database. Also, named graphs support scaling by making it possible to update selected parts of the knowledge base, for example when the data source has new information or related ontologies are changed.</p>
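    <p>The provenance and update properties described above can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not the triple store used for the knowledge base: the class <tt>QuadStore</tt> and its methods are hypothetical, and only the two graph URIs and the two label triples are taken from this note.</p>

```python
from collections import defaultdict


class QuadStore:
    """A toy store that files each triple under the URI of its named graph."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def load(self, graph_uri, triples):
        """Replace the contents of one named graph, leaving the others untouched."""
        self._graphs[graph_uri] = set(triples)

    def match(self, predicate, graphs=None):
        """Yield (graph_uri, triple) pairs, optionally limited to some graphs."""
        for graph_uri in sorted(self._graphs):
            if graphs is not None and graph_uri not in graphs:
                continue
            for triple in sorted(self._graphs[graph_uri]):
                if triple[1] == predicate:
                    yield graph_uri, triple


GENE = "http://purl.org/commons/hcls/gene"
GO = "http://purl.org/commons/hcls/20070416"
GENE_LABEL = ("ncbi_gene:1812", "rdfs:label",
              "Entrez Gene record for human DRD1, 1812")
PROCESS_LABEL = ("go:GO_0007190", "rdfs:label",
                 "adenylate cyclase activation")

store = QuadStore()
store.load(GENE, [GENE_LABEL])
store.load(GO, [PROCESS_LABEL])

# Provenance: every match reports the graph it came from.
labels = list(store.match("rdfs:label"))
# Source control: restrict the lookup to a single data source.
gene_only = list(store.match("rdfs:label", graphs={GENE}))
# Selective update: reloading one graph does not disturb the other.
store.load(GENE, [])
after_update = list(store.match("rdfs:label"))
```

    <p>Because each source lives in its own graph, replacing the gene labels leaves the process labels in place, which is the property that makes partial updates of the knowledge base practical.</p>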

    <h2 id="nextsteps">10 Opportunities for further development</h2>
    <p>The knowledge base was initially designed for the purposes of a live demo. It also provided a basis for early work on the <a href="http://neurocommons.org">Neurocommons</a>, where its development continues. Some design choices were made to favor simplicity and maximal performance, including the use of a central triple store, and the design of the data and queries. Many of the choices were guided by the desire for transparency for a broader audience of biomedical informaticists. Several areas of possible improvement are noted here:</p>
    <ul>
      <li>Broaden the knowledge base to cover more of the related
      domains such as structural chemistry, cells, anatomy,
      physiology, behavior, protocols, and reagents.</li>

      <li>Spread the sources accessed by a query across repositories
      in separate locations, to demonstrate the ease of integrating
      distributed data sources with Semantic Web technologies.</li>
      <li>Create dynamic visual interfaces that provide the user with
      the means to create and refine a query without requiring
      prerequisite knowledge of the data or query language.</li>
    </ul>
    <p>There are also a number of open issues that should be addressed in future work:</p>
    <ul>
      <li>What relations should we use to connect a biological entity with artificial entities describing it, e.g. <em>protein records, sequence records, PubMed records</em>?</li>
      <li>What is the best way to model evidence so that it can be recorded in data provenance?</li>
      <li>How are information resources such as <em>database entry</em> or <em>XML document associated with a database entry</em> best represented in <a href="http://ifomis.org/bfo/">BFO</a>-friendly ontologies?</li>
      <li>Mapping across terminologies: MeSH, in particular, has terms that are synonymous with many terms in other ontologies, including genes, proteins, GO terms, etc. We made efforts to harmonize the representation in certain cases, such as between SenseLab and GO. In other cases, such as MeSH, we have done no harmonization, so this should be reviewed for eventual corrections.</li>

    </ul>

    <h1 id="appendix">Appendix</h1>

    <h2 id="rdfbundles"><b>A</b> RDF Sources</h2>

<p>The following table lists the RDF sources used to create the knowledge base:</p>

    <div style="text-align: center;">
      <table style="border-collapse: collapse; border-color: #000000" border="1" cellpadding="5">
 <thead>
  <tr>
    <th>RDF bundle name</th>
    <th>Last modified</th><th>Size</th>

    <th>Description</th>
    <th>RDF conversion by</th>
    <th>Terms</th></tr>
</thead>
<tbody>
  <tr><td id="aba"><a href="http://purl.org/hcls/2007/kb-sources/aba-2007-08-07.tgz">aba-2007-08-07.tgz</a></td> 
    <td>22-Sep-2007 &nbsp;</td> <td>51M</td>

    <td>SC's extract of <a href="http://www.brain-map.org/"
      >Allen Brain Atlas</a> metadata from their Web site.
      The Web site was read on or shortly before
      26 Feb 2007</td>
    <td>SC</td>
    <td><a href="http://www.brain-map.org/pdf/ABATermsOfUse.pdf"
	   >terms of use</a></td></tr>

  <tr><td id="addgene"><a href="http://purl.org/hcls/2007/kb-sources/addgene.ttl">addgene.ttl</a></td>                   
    <td>16-May-2007</td> <td>1.1M</td>

    <td><a href="http://www.addgene.org/">Addgene</a>
      catalog (tab-delimited file)</td>
    <td>SC</td>
    <td>provided to Science Commons by Addgene</td></tr>

  <tr><td id="bams"><a href="http://purl.org/hcls/2007/kb-sources/bams-from-swanson-98-4-23-07.owl">bams-from-swanson-98-4-23-07.owl</a></td>
    <td>23-Apr-2007</td> 
    <td>5.6M</td>

    <td>
      <a href="http://brancusi.usc.edu/bkms/bamsxml.html">BAMS</a>
      </td>
    <td>HCLSIG/NIST</td>
    <td>released without contract</td></tr>

  <tr><td id="galen"><a href="http://purl.org/hcls/2007/kb-sources/galen.tgz">galen.tgz</a></td>                     
    <td>22-Sep-2007</td> <td>1.9M</td>

    <td><a href="http://www.co-ode.org/galen/"
	>Galen from co-ode.org</a></td>
    <td>-</td>
    <td>released without contract </td></tr>

  <tr><td id="geneinfo"><a href="http://purl.org/hcls/2007/kb-sources/gene-owl.tgz">gene-owl.tgz</a></td>                   
    <td>08-May-2007</td> <td>7.7M</td>

    <td>Select fields from Entrez Gene records</td>
    <td>HCLSIG/SC</td> <!-- ? -->
    <td><a href="http://www.ncbi.nlm.nih.gov/About/disclaimer.html"
	>NCBI Copyright and Disclaimers</a></td></tr>

  <tr><td id="pubmed"><a href="http://purl.org/hcls/2007/kb-sources/gene-pubmed.ttl.tgz">gene-pubmed.ttl.tgz</a></td> 
    <td>08-May-2007</td> <td>1.5M</td>

    <td>Entrez Gene Extract from 
	<a href="ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz"
	 >ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz</a></td>
    <td>HCLSIG/HP/SC</td>
    <td><a href="http://www.ncbi.nlm.nih.gov/About/disclaimer.html"
	>NCBI Copyright and Disclaimers</a></td></tr>

  <tr><td id="goa"><a href="http://purl.org/hcls/2007/kb-sources/goa-in-owl.tgz">goa-in-owl.tgz</a></td>
    <td>16-May-2007</td> <td>73M</td>

    <td id="go">GO annotations from National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI)</td>
    <td>HCLSIG/SC</td>
    <td><a href="http://www.ncbi.nlm.nih.gov/About/disclaimer.html"
	>NCBI Copyright and Disclaimers</a>;
	<a href="http://www.ebi.ac.uk/Information/termsofuse.html"
	>EBI terms of use</a></td></tr>

  <tr><td id="homologene"><a href="http://purl.org/hcls/2007/kb-sources/homologene.tgz">homologene.tgz</a></td>
    <td>16-May-2007</td> <td>626K</td>

    <td>Homologene </td>
    <td>HCLSIG/SC</td>
    <td><a href="http://www.ncbi.nlm.nih.gov/About/disclaimer.html"
	>NCBI Copyright and Disclaimers</a></td></tr>

  <tr>
    <td id="mesh"><!-- a href="http://sw.neurocommons.org/2007/kb-sources/medline/medline-mesh.tgz" -->medline-mesh.tgz<!-- /a --><br />
(<a href="http://apps.nlm.nih.gov/medlineplus/contact/index.cfm">contact Medline</a> for use terms)</td>
    <td>16-May-2007</td> <td>758M</td>

    <td>List of all associations of MeSH headings to papers indexed by 
	Medline
	extracted from 2007 Medline baseline distribution</td>
    <td>HCLSIG/SC</td>
    <td><a href="http://www.nlm.nih.gov/databases/license/license_standard.html"
      >License Agreement to Lease NLM Databases in Machine-Readable
      Form</a> -
      see below</td></tr>

  <tr>
    <td id="medline"><!-- a href="http://sw.neurocommons.org/2007/kb-sources/medline/medline-titles.tgz" -->medline-titles.tgz<!-- /a --><br />
(<a href="http://apps.nlm.nih.gov/medlineplus/contact/index.cfm">contact Medline</a> for use terms)</td>

    <td>16-May-2007</td> <td>670M</td>
    <td>Extracted from 2007 Medline baseline distribution</td>
    <td>HCLSIG/SC</td>

    <td>see below </td></tr>

  <tr><td id="mqualhead"><a href="http://purl.org/hcls/2007/kb-sources/mesh-qualified-headings.ttl.gz"
           >mesh-qualified-headings.ttl.gz</a></td>

    <td>30-Apr-2007</td> <td>13M</td>
    <td><a href="http://www.nlm.nih.gov/mesh/"
      >NLM 2007 MeSH</a> descriptor/qualifier pairs
    </td>

    <td>HCLSIG/SC</td>
    <td><a href="http://www.nlm.nih.gov/mesh/termscon.html"
      >MeSH MOU</a> </td></tr>

  <tr><td id="mesh-skos"><a href="http://purl.org/hcls/2007/kb-sources/mesh-skos.tgz">mesh-skos.tgz</a></td>
    <td>16-May-2007</td> <td>13M</td>
    <td><a href="http://www.nlm.nih.gov/mesh/"
      >NLM 2007 MeSH</a>
    </td>
    <td> <a href="http://thesauri.cs.vu.nl/eswc06/"
         >van Assem et al/SC</a></td>

    <td><a href="http://www.nlm.nih.gov/mesh/termscon.html"
      >MeSH MOU</a> </td></tr>

  <tr><td id="mesh07-eswc06"><a href="http://purl.org/hcls/2007/kb-sources/mesh07-eswc06.rdfs">mesh07-eswc06.rdfs</a></td>
    <td>28-Jun-2007</td> <td>2.2K</td>
    <td><a href="http://thesauri.cs.vu.nl/eswc06/"
	>van Assem et al's ontology</a> 
      (used by output of MeSH to SKOS conversion)</td>

    <td> <a href="http://thesauri.cs.vu.nl/eswc06/"
         >van Assem et al</a></td>
    <td>released without contract</td></tr>

  <tr><td id="textmining"><a href="http://purl.org/hcls/2007/kb-sources/neurocommons-text-mining.tgz">neurocommons-text-mining.tgz</a></td>
    <td>05-May-2007</td> <td>24M</td>
    <td><a href="http://sw.neurocommons.org/2007/text-mining.html"
	>Neurocommons text mining pilot</a> - extracted from Temis
	software applied to 7% of Medline records</td>

    <td>SC</td>
    <td>released without contract </td></tr>

  <tr><td id="obo"><a href="http://purl.org/hcls/2007/kb-sources/obo-all.tgz">obo-all.tgz</a></td>
    <td>22-Sep-2007</td> <td>36M</td>
    <td><a href="http://www.berkeleybop.org/ontologies/"
	  >All OBO ontologies</a> </td>

    <td>BBOP</td>

    <td>released without contract</td></tr>

  <tr><td id="obo-in-owl"><a href="http://purl.org/hcls/2007/kb-sources/obo-in-owl.tgz">obo-in-owl.tgz</a></td>
    <td>16-May-2007</td> <td>2.6M</td>
    <td>selected OBO ontologies, downloaded ~21 April 2007, augmented with
      inferred relations</td>

    <td>HCLSIG/SC</td>

    <td>released without contract </td></tr>

  <tr><td id="sciencecommons"><a href="http://purl.org/hcls/2007/kb-sources/sciencecommons.owl">sciencecommons.owl</a></td>
    <td>28-Jun-2007</td> <td>19K</td>
    <td>A bridging ontology, from <a href="http://sciencecommons.org">Science Commons</a>, importing other ontologies used in the prototype, defining classes and relations used to represent gene records and their contents, as well as a few items referred to by imported data sources, but not available in a published ontology.</td>

    <td>HCLSIG/SC</td>
    <td>released without contract </td></tr>

  <tr><td id="senselab"><a href="http://purl.org/hcls/2007/kb-sources/senselab.tgz">senselab.tgz</a></td>
    <td>16-May-2007</td> <td>216K</td>
    <td>From <a href="http://neuroweb.med.yale.edu/senselab/"
	     >Yale Senselab</a> </td>

    <td>HCLSIG/Yale</td>
    <td>released without contract </td></tr>
</tbody>
</table>
</div> 

   <p><i>Attributions: <br />Science Commons (SC), Berkeley Bioinformatics Open-source Projects (BBOP), Health Care and Life Sciences Interest Group (HCLSIG), National Institute of Standards and Technology (NIST), Hewlett Packard (HP)</i></p>

    <h2 id="references"><b>B</b> References</h2>


    <dl>
      <dt><a name="ref-OWL" id="ref-OWL"></a>[OWL]</dt>
      <dd><cite><a href="http://www.w3.org/TR/2004/REC-owl-features-20040210/">OWL Web Ontology Language Overview</a></cite>, <br />
	Deborah L. McGuinness and Frank van Harmelen, Editors, <br />
	W3C Recommendation, 10 February 2004, <br />
	http://www.w3.org/TR/2004/REC-owl-features-20040210/ .<br />
	<a href="http://www.w3.org/TR/owl-features/">Latest version</a> available at http://www.w3.org/TR/owl-features/ .</dd>

      <dt><a id="ref-RDF" name="ref-RDF">[RDF]</a></dt>
      <dd><cite><a href="http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/">Resource Description Framework (RDF) Model and Syntax Specification</a></cite>, <br />
        Ora Lassila, Ralph R. Swick, Editors, <br />
        World Wide Web Consortium Recommendation, 1999, <br />
        http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/.<br />
        <a href="http://www.w3.org/TR/REC-rdf-syntax/">Latest version</a> available at http://www.w3.org/TR/REC-rdf-syntax/.</dd>

      <dt><a name="refCONCEPTS" id="refCONCEPTS">[RDF CONCEPTS]</a></dt>
      <dd><cite><a href="http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/">Resource Description Framework (RDF): Concepts and Abstract Syntax</a></cite>, <br />
	G. Klyne, J. J. Carroll, Editors, <br />
	W3C Recommendation, 10 February 2004, <br />
	http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/ .<br />
	<a href="http://www.w3.org/TR/rdf-concepts/" title="Latest version of Resource Description Framework (RDF): Concepts and Abstract Syntax">Latest version</a> available at http://www.w3.org/TR/rdf-concepts/ .</dd>

      <dt><a name="ref-SPARQL" id="ref-SPARQL">[SPARQL]</a></dt>
      <dd><cite><a href="http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/">SPARQL Query Language for RDF</a></cite>, 
	A. Seaborne, E. Prud'hommeaux,  Editors, 
	W3C Recommendation, 15 January 2008, 
	http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/ . 
	<a href="http://www.w3.org/TR/rdf-sparql-query/" title="Latest version of SPARQL Query Language for RDF">Latest version</a> available at http://www.w3.org/TR/rdf-sparql-query/ .</dd>

      <dt><a name="ref-TURTLE" id="ref-TURTLE"></a>[TURTLE]</dt>
      <dd><cite><a href="http://www.w3.org/TeamSubmission/turtle/">Turtle - Terse RDF Triple Language</a></cite>, <br />
	W3C Team Submission, 14 January 2008, <br />
	http://www.w3.org/TeamSubmission/turtle/ .</dd>

      <dt>[<a name="ref-SWAN" id="ref-SWAN">SWAN</a>]</dt>
      <dd><cite>Alzforum and SWAN: The Present and Future of Scientific Web Communities</cite>, <br />
	Clark T and Kinoshita J., <br />
	Briefings in Bioinformatics 2007;8:163-171 doi:10.1093/bib/bbm012.</dd>
    </dl>

    <h2 id="resources"><b>C</b> Additional Resources</h2>

    <p>The knowledge base has been installed at several locations. At the time of this writing, these locations provide SPARQL query access; however, it is not guaranteed that the endpoints at these addresses will persist or continue to serve the knowledge base described in this note:</p>
    <ul>
      <li><a href="http://sparql.neurocommons.org:8890/nsparql/">Science Commons</a></li>
      <li><a href="http://hcls.deri.ie/hcls_demo.html">DERI Ireland</a></li>
    </ul>
     <p>Below are a few visual interfaces that make it possible to browse the results of a search on the knowledge base:</p>
     <ul>
      <li><a href="http://purl.org/hcls/2007/prototypes/GoogleMapAllenBrainAtlas">Prototype of a Google-Maps interface</a> to the Allen Brain Atlas.</li>
      <li><a href="http://purl.org/hcls/2007/prototypes/ExhibitGeneExpression.html">Visualization</a> of gene expression data using <a href="http://simile.mit.edu/exhibit/">Exhibit</a></li>
     </ul>

    <p><a href="http://ycmi.med.yale.edu/entrez_neuron.html">Entrez Neuron</a> was developed by the SenseLab team as a graphical user interface for querying the SenseLab ontologies.</p>

    <p>We used the open source edition of the Openlink Virtuoso repository from <a href="http://sourceforge.net/projects/virtuoso/">http://sourceforge.net/projects/virtuoso/</a>.</p>

    <p>The actions and scripts that were used to create the knowledge base on a commodity PC have been documented by several HCLSIG members. The necessary instructions and scripts that were used will be listed here as completely as possible:</p>
    <ul>
      <li>Repository <a href="http://esw.w3.org/topic/HCLS/Banff2007Demo/HCLS/Banff2007Demo/HowToMakeOneForYourself">installation and steps</a> for creating a mirror repository have been documented by Donald Doherty.</li>
      <li>All <a href="http://svn.neurocommons.org/svn/trunk/convert/">conversion scripts</a> from Science Commons are available under a 
<a href="http://sw.neurocommons.org/2007/LICENSE.txt">BSD license</a>.</li>
    <li><a href="http://thesauri.cs.vu.nl/eswc06/">MeSH conversion to SKOS</a> was performed with an approach outlined in a 2006 European Semantic Web Conference paper from Mark van Assem et al.</li>
    </ul>

    <p>The following resources may be of interest for future work:</p>
    <ul>
     <li><a href="http://www.ontotext.com/owlim/">OWLIM</a> is a high-performance <a href="http://www.ontotext.com/inference/semantic_repository.html">semantic repository</a> 
developed in Java. It is packaged as a Storage and Inference Layer (SAIL) for 
the <a href="http://www.openrdf.org/">Sesame</a> RDF database.</li>
     <li><a href="http://clarkparsia.com/weblog/2007/10/26/towards-sparql-dl-evaluation-in-pellet/">SPARQL-DL</a></li>
    </ul>

    <h2 id="acknowledgements"><b>D</b> Acknowledgements</h2>
   
    <p>In memory of our friend and colleague William Bug, Ontological Engineer.</p>

    <p>Special thanks to Alan Ruttenberg (Science Commons), who coordinated the assembly, conversion, and deployment
    of the data sets and ontologies, and Susie Stephens (Eli Lilly), who coordinated the BioRDF task force. Together they presented the initial version of the
    knowledge base at a WWW2007 Banff workshop.</p>

<p><span style="text-decoration: underline;">Contributors:</span></p>

    <p>Many contributed to the development, documentation and validation of the knowledge base, as
    well as the thinking behind it.</p>

<p>
Mikail Bota (USC) kindly provided the BAMS database for our use, and John Barkley (NIST) converted it to RDF.

Huajun Chen (Zhejiang University), Matthias Samwald (Yale Center for Medical Informatics; DERI Galway; Semantic Web Company), Alan Ruttenberg, and Kei-Hoi Cheung (Yale Center for Medical Informatics) participated in the SenseLab RDF conversion.

Members of the SWAN team: Tim Clark, Paolo Ciccarese, June Kinoshita, Gwen Wong, and Elizabeth Wu contributed the SWAN data source. 

June Kinoshita, Gwen Wong, Elizabeth Wu, Don Doherty (Brainstage Research Inc.), William Bug (School of Medicine, UCSD), and Alan Ruttenberg worked on the neurodegenerative disease use cases.

Ray Hookway (HP) provided digests from Entrez Gene that were more easily converted to RDF.

Jonathan Rees (Science Commons) did the RDF conversions of Addgene,
Pubmed to Gene, Medline, and MeSH, the Neurocommons text mining pilot,
and compiled the data source and licensing information for this document.

Alan Ruttenberg did the RDF conversion of Entrez
Gene records, GO Annotations, Allen Brain Atlas, and Homologene, and wrote the
Science Commons ontology.

Alan Ruttenberg and Matthias Samwald wrote the SPARQL queries described in this document. 

Chris Mungall (NCBO) wrote the converter that produced the OWL versions of the OBO ontologies and consulted on matters of ontology.
</p><p>
Eric Neumann (Clinical Semantics Group) produced the Exhibit visualization. 
Alan Ruttenberg developed the Google Mouse prototype, with contributions
from Mike Travers (CollabRX), Brian Gilman (SciLink), and Tom
Stambaugh (Zeetix).

Don Doherty, Matthias Samwald, Holger Stenzorn (DERI), M. Scott Marshall, and Eric Prud'hommeaux have presented this work at conferences.</p>

<p>Barry Smith (State University of New York at Buffalo, USA) provided
advice on ontology work and led the development of the <a href="http://ifomis.org/bfo">Basic Formal
Ontology</a>, which inspired all ontology work related to the knowledge
base.</p>

<p>William Bug (School of Medicine, UCSD), Michel Dumontier (Carleton
University), and Holger Stenzorn (DERI) reviewed and gave detailed
comments on an initial draft of this note. Alan Ruttenberg, Jonathan
Rees, and Susie Stephens reviewed and contributed to several versions
of the document.

Susie Stephens coordinated the BioRDF task force, worked on
presentations of the work, and wrote the introduction to this
document.

M. Scott Marshall (University of Amsterdam) and Eric
Prud&#x2019;hommeaux (W3C) edited and coordinated the production of
this note. Eric Prud&#x2019;hommeaux created the figures. 
</p>
    <p>We would like to offer special thanks to the organizations that contributed equipment and services. Through Ray Hookway and Jeannine Crockford, <a href="http://hp.com/">Hewlett Packard</a> donated two machines for a period of six weeks during the demo. Science Commons hosted the prototype during development and continues to host and develop a knowledge base derived from the prototype as part of the <a href="http://neurocommons.org/page/Main_Page">Neurocommons</a>. <a href="http://www.csail.mit.edu/index.php">MIT CSAIL</a> hosts Science Commons and provided computer and networking infrastructure. </p><p> Kingsley Idehen, Orri Erling, Ivan Mikhailov, Mitko Iliev, Patrick van Kleef, and Anton Avramov from <a href="http://www.openlinksw.com/">Openlink Software</a> provided rapid technical support, including several custom builds of the <a href="http://www.openlinksw.com/virtuoso/">Virtuoso</a> triple store to address early performance issues, making it possible to develop the prototype on an aggressive schedule. Evren Sirin from <a href="http://clarkparsia.com/">Clark and Parsia</a> provided support for the <a href="http://pellet.owldl.com/">Pellet OWL reasoner</a>.

</p>

   <p>In addition to the data sources that were incorporated into the prototype, other data that were not included were provided by Judith Blake (MGD), Simon Twigger (RGD), and Colin Knep (Alzforum).</p>


    <div class="nav"><a href="http://validator.w3.org/check/referer">
      <img src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0!" height="31" width="88" /></a>
      </div><hr></hr>

  </body>

</html>