<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta name="generator" content="Emacs 22" />
<meta name="RCS-Id" content="$Id: Overview.html,v 1.57 2008/01/14 16:10:23 mmarshal Exp $" />
<title>A Prototype Knowledge Base for the Life Sciences</title>
<style type="text/css">
/*<![CDATA[*/
pre { }
/*
a:link { color: green }
a:visited { color: green }
a:hover { color: green }
*/
pre a:link { text-decoration: none }
.schema th { text-align: left }
table, td, th { border-style: solid;
border-width: 1px;
border-color: black;
border-bottom-color: gray;
border-right-color: gray; }
table.dbsTable { border-collapse: collapse; border-color: #000000; }
table.dbsTable td:first-child { vertical-align: top; }
table.dbsTable td { padding: 2px 5px 2px 5px; }
table.triplesTable {
margin-left: 2em;
border: none;
}
table.triplesTable td {
border: none;
table-layout: fixed;
font-size: 75%;
font-family: fixed, monospace;
}
table.triplesTable tr.p td {
font-size: 100%;
}
table.triplesTable a:link { text-decoration: none }
table.triplesTable tr.p a:link { text-decoration: underline }
table.triplesTable th { text-align: left; }
table.triplesTable p {
margin-left: -2em;
}
.choice { border-style: dashed; }
.placeholder { visibility: hidden; }
.issue { background-color: #fcc; }
/* http://www.w3.org/Style/Examples/007/figures */
div.figure {
/* float: right; */
/* width: 25%; */
border: thin silver solid;
margin: 0.5em;
padding: 0.5em;
}
div.figure p {
text-align: center;
font-style: italic;
font-size: smaller;
text-indent: 0;
}
tt {
white-space: pre;
}
/*]]>*/
</style>
<link rel="stylesheet" type="text/css" href="local.css" />
<link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-IG-NOTE" />
</head>
<body>
<div class="head">
<p><a href="http://www.w3.org/">
<img src="http://www.w3.org/Icons/w3c_home" alt="W3C" height="48" width="72" /></a></p>
<h1 id="main">A Prototype Knowledge Base for the Life Sciences</h1>
<h2 class="no-num no-toc" id="w3c-doctype">W3C Interest Group Note 4 June 2008</h2>
<dl>
<!-- dt>Editors working draft.</dt>
<dd><span class="cvs-id">$Revision: 1.6 $ of $Date: 2008/06/06 00:17:45 $</span></dd>
<dd>see also <a href="http://lists.w3.org/Archives/Public/public-semweb-lifesci/">public-semweb-lifesci@w3.org Mail Archives</a></dd>
<dt>Published W3C Technical Report version:</dt -->
<dt>This version:</dt>
<dd><a href="http://www.w3.org/TR/2008/NOTE-hcls-kb-20080604/">http://www.w3.org/TR/2008/NOTE-hcls-kb-20080604/</a></dd>
<dt>Latest version:</dt>
<dd><a href="http://www.w3.org/TR/hcls-kb/">http://www.w3.org/TR/hcls-kb/</a></dd>
<dt>Previous version:</dt>
<dd><a href="http://www.w3.org/TR/2008/WD-hcls-kb-20080404/">http://www.w3.org/TR/2008/WD-hcls-kb-20080404/</a></dd>
<dt>Editors:</dt>
<dd>M. Scott Marshall, University of Amsterdam &lt;<a href="mailto:marshall@science.uva.nl">marshall@science.uva.nl</a>&gt;</dd>
<dd>Eric Prud'hommeaux, W3C &lt;<a href="mailto:eric@w3.org">eric@w3.org</a>&gt;</dd>
<dt id="contributors">Contributors:</dt>
<dd>Alan Ruttenberg, Science Commons &lt;<a href="mailto:alanruttenberg@gmail.com">alanruttenberg@gmail.com</a>&gt;</dd>
<dd>Jonathan Rees, Science Commons &lt;<a href="mailto:jar@creativecommons.org">jar@creativecommons.org</a>&gt;</dd>
<dd>Susie Stephens, Lilly &lt;<a href="mailto:Stephens_Susie_M@lilly.com">Stephens_Susie_M@lilly.com</a>&gt;</dd>
<dd>Matthias Samwald, Yale Center for Medical Informatics; DERI Galway; Semantic Web Company &lt;<a href="mailto:samwald@gmx.at">samwald@gmx.at</a>&gt;</dd>
<dd>Kei-Hoi Cheung, Yale Center for Medical Informatics &lt;<a href="mailto:kei.cheung@yale.edu">kei.cheung@yale.edu</a>&gt;</dd>
</dl>
<p class="copyright"><a href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</a> © 2008 <a href="http://www.w3.org/"><acronym title="World Wide Web Consortium">W3C</acronym></a><sup>®</sup> (<a href="http://www.csail.mit.edu/"><acronym title="Massachusetts Institute of Technology">MIT</acronym></a>, <a href="http://www.ercim.org/"><acronym title="European Research Consortium for Informatics and Mathematics">ERCIM</acronym></a>, <a href="http://www.keio.ac.jp/">Keio</a>), All Rights Reserved. W3C <a href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</a>, <a href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</a> and <a href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</a> rules apply.</p>
</div>
<hr title="Separator for header" />
<div>
<h2 class="notoc" id="abstract">Abstract</h2>
<p>The prototype we describe is a biomedical knowledge base, constructed for a demonstration at <a href="http://www2007.org/prog-W3CTrack.php#thursday">Banff WWW2007</a>, that integrates 15 distinct data sources using currently available Semantic Web technologies such as the W3C standard Web Ontology Language [<a href="#ref-OWL">OWL</a>] and Resource Description Framework [<a href="#ref-RDF">RDF</a>]. This report outlines which resources were integrated, how the knowledge base was constructed using free and open source triple store technology, how it can be queried using the W3C Recommended RDF query language SPARQL [<a href="#ref-SPARQL">SPARQL</a>], and what resources and inferences are involved in answering complex queries. While the utility of the knowledge base is illustrated by identifying a set of genes involved in Alzheimer's Disease, the approach described here can be applied to any use case that integrates data from multiple domains.</p>
</div>
<div>
<h2 id="status">Status of This Document</h2>
<p><em>This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the <a href="http://www.w3.org/TR/">W3C technical reports index</a> at http://www.w3.org/TR/.</em></p>
<p>
This W3C Interest Group Note describes how one can use the Semantic Web to express and integrate scientific data.
These techniques can be used for modeling any data, and the benefits of integration and model consistency apply to other diverse, distributed data domains.
It is hoped that this document will inspire further contributions to the ongoing work at Neurocommons and the Health Care and Life Sciences Interest Group, as well as inspire those in other domains to exploit the Semantic Web.
</p>
<p class="notetoeditor">This document describes the construction and use of the HCLS Knowledgebase used in the <a href="http://esw.w3.org/topic/HCLS/Banff2007Demo">WWW2007 Banff HCLS Demo</a>. It describes the process for creating a biological database on the Semantic Web. The companion document, <a href="http://www.w3.org/TR/2008/NOTE-hcls-senselab-20080604/">Experiences with the conversion of SenseLab databases to RDF/OWL</a>, describes the process for integrating new data into this Knowledgebase.</p>
<p>The document was produced by the <a href="http://www.w3.org/2001/sw/hcls/">Semantic Web in Health Care and Life Sciences Interest Group (HCLS)</a>, part of the <a href="http://www.w3.org/2001/sw/">W3C Semantic Web Activity</a> (<a href="http://www.w3.org/2001/sw/hcls/charter">see charter</a>). Comments may be sent to the <a href="http://lists.w3.org/Archives/Public/public-semweb-lifesci/">publicly archived</a> <a href="mailto:public-semweb-lifesci@w3.org">public-semweb-lifesci@w3.org</a> mailing list. Feedback is encouraged, as is participation in the recently <a href="http://www.w3.org/2008/05/HCLSIGCharter">re-chartered</a> HCLSIG. A <a href="WD2NOTE">list of changes since the last publication</a> is available.</p>
<p>Publication as
an Interest Group Note
does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.</p>
<p>This document was produced by a group operating under the disclosure
obligations of the <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</a>. The group does
not expect this document to become a W3C Recommendation. An
individual who has actual knowledge of a patent which the individual
believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information to
<a href="mailto:public-semweb-lifesci@w3.org">public-semweb-lifesci@w3.org</a> [<a href="http://lists.w3.org/Archives/Public/public-semweb-lifesci/">public archive</a>] in accordance with
<a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>.</p>
</div>
<hr />
<div class="toc">
<h2 id="contents">Table of Contents</h2>
<ul class="toc">
<li class="tocline1"><a href="#introduction">1 Introduction</a> </li>
<li class="tocline2"><a href="#docscope">1.2 Document Scope and Target Audience</a></li>
<li class="tocline2"><a href="#termStability">1.3 Stability of Terms</a></li>
<li class="tocline2"><a href="#docConventions">1.4 Document Conventions</a></li>
<li class="tocline2"><a href="#docOutline">1.5 Document Outline</a></li>
<li class="tocline1"><a href="#usecase">2 Use Case</a></li>
<li class="tocline1"><a href="#dbs">3 Data Sources</a></li>
<li class="tocline1"><a href="#terms">4 Design Decisions</a></li>
<li class="tocline1"><a href="#mechanics">5 Importing to RDF - Homologene Example</a></li>
<li class="tocline1"><a href="#query">6 Query</a></li>
<li class="tocline1"><a href="#triplemodel">7 Data Model</a><ul>
<li class="tocline2"><a href="#preproc">7.1 Precomputing Inferences</a></li>
</ul></li>
<li class="tocline1"><a href="#newsource">8 Adding a New Data Source</a></li>
<li class="tocline1"><a href="#graphs">9 Named Graphs</a></li>
<li class="tocline1"><a href="#nextsteps">10 Opportunities for further development</a></li>
</ul>
</div>
<h3 id="appendices">Appendices</h3>
<ul class="toc">
<li class="tocline2"><a href="#rdfbundles">A RDF Sources</a></li>
<li class="tocline2"><a href="#references">B References</a></li>
<li class="tocline2"><a href="#resources">C Additional Resources</a></li>
<li class="tocline2"><a href="#acknowledgements">D Acknowledgements</a></li>
</ul>
<hr />
<h2 id="introduction">1 Introduction</h2>
<p>The life sciences have a rich history of making data available on the Web, because researchers recognized the benefits of sharing data and made it available to other researchers for the benefit of greater science. However, because many of the data repositories were developed in relative isolation, they tend to use different identifier schemes, incompatible terminology, and dissimilar data formats. This makes it hard for researchers to find all data about an entity of interest and to assemble it into a useful block of knowledge. This prototype was built to demonstrate how Semantic Web technologies can integrate such heterogeneous data sets and thereby help scientists to more easily answer interesting scientific questions.</p>
<p>The key to advancing scientific understanding is empowering scientists with the information that they need to make well-informed decisions. Scientists need to be able to easily gain access to all information about chemical compounds, biological systems, diseases, and the interactions between these entities, and this requires data to be effectively integrated in order to provide a <em>biological systems level</em> view to the user, i.e. a complete view of biological activity. However, achieving this goal has proven to be a formidable challenge in the life sciences, where data and models are found in a large variety of formats and scales that span from the molecular to the anatomical.</p>
<p>In order to overcome the challenge of gaining insight directly from the Web, a number of laboratories, organizations, and companies have built internal data warehouses from the publicly available data sources. This certainly helps scientists to more easily query for all information related to entities of interest. However, these efforts generally integrate only a subset of publicly available data that is deemed to be of greatest interest, and it has proven difficult to add data sources to the warehouse at a later point. Further, advances in scientific knowledge require regular changes to be made to the underlying data models, and this is not straightforward with a relational model. Organizations that use this approach also typically face challenges with representing data that is at different levels of abstraction, and that includes data of very different quality.</p>
<p>Many health care and life sciences organizations are interested in the data integration abilities promised by the Semantic Web. More specifically, the benefits include the aggregation of heterogeneous data using explicit semantics, and the expression of rich and well-defined models for data aggregation and search. Semantic Web technologies enable one to more flexibly add additional data sets into the data model, and more easily reuse data in unanticipated ways. Once data has been aggregated, a Semantic Web reasoner computes implied relationships among the aggregated data resulting in tighter integration and the possibility of additional insights.</p>
<p>This prototype knowledge base imports data from data sources that span multiple domains in the life sciences to make cross-discipline queries. It therefore provides a working (and reproducible) example of the possibilities that become available via knowledge integration. The use of an RDF repository to store RDF and OWL makes it possible to query, manipulate, and reason about the data with standard tools, such as OWL reasoners, and languages, such as the SPARQL Query Language for RDF. Although this document addresses a specific use case, the approach described here can be applied to any use case that integrates data from multiple domains.</p>
<h3 id="docscope">1.2 Document Scope and Target Audience</h3>
<p>This document attempts to succinctly describe how this knowledge base was constructed so that interested parties can use the core techniques to create their own knowledge base. We have attempted to write a general description but, unavoidably, the knowledge base makes use of specialized resources, such as those found in the <a href="#dbs">Data Sources</a> section. Some, but not all, of the reasoning behind design decisions is explained. Several technologies such as the Semantic Web standards RDF, OWL, and SPARQL were used, but in order to keep this document to a manageable size, we will not explain all aspects in the depth that would be required for those new to the area. Those interested in a general introduction to the Semantic Web should see <a href="http://www.semanticwebprimer.org/">The Semantic Web Primer</a>. See also the <a href="http://www.co-ode.org/">CO-ODE web site</a> for a <a href="http://www.co-ode.org/resources/tutorials/protege-owl-tutorial.php">hands-on OWL tutorial with Protégé</a>. For materials introducing ontologies see the <a href="http://www.bioontology.org/">National Center for Biomedical Ontology</a> (NCBO) <a href="http://www.bioontology.org/wiki/index.php/Introduction_to_Biomedical_Ontologies">Introduction to Biomedical Ontologies</a>. For materials related to reasoning see <a href="http://www.cs.man.ac.uk/~horrocks/Teaching/cs646/">The Semantic Web: Ontologies and OWL</a>.
</p>
<h3 id="termStability">1.3 Stability of Terms</h3>
<p>This document uses URLs to identify records about biological entities and processes. The identifiers used in this document are the same as those used in the prototype knowledge base and are not yet stable. Knowledge base implementors should use these terms whenever possible.</p>
<h3 id="docConventions">1.4 Document Conventions</h3>
<p>RDF data in this document is expressed in Turtle [<a href="#ref-TURTLE">TURTLE</a>]. Queries on this data are expressed in SPARQL [<a href="#ref-SPARQL">SPARQL</a>]. The following namespace prefix bindings are assumed unless otherwise stated:</p>
<div style="text-align: center;">
<table style="border-collapse: collapse; border-color: #000000" border="1" cellpadding="5">
<tr><th>Prefix</th> <th>URI</th> <th>Description</th></tr>
<tr><td><code>rdf:</code></td> <td><code>http://www.w3.org/1999/02/22-rdf-syntax-ns#</code></td> <td>The RDF Vocabulary</td></tr>
<tr><td><code>rdfs:</code></td> <td><code>http://www.w3.org/2000/01/rdf-schema#</code></td> <td>The RDF Schema vocabulary</td></tr>
<tr><td><code>xsd:</code></td> <td><code>http://www.w3.org/2001/XMLSchema#</code></td> <td>XML Schema</td></tr>
<tr><td><code>sc:</code></td> <td><code>http://purl.org/science/owl/sciencecommons/</code></td> <td>The <i>ad hoc</i> Science Commons ontology</td></tr>
<tr><td><code>pubmedRec:</code></td> <td><code>http://purl.org/commons/record/pmid/</code></td> <td>PubMed records (not the articles themselves)</td></tr>
<tr><td><code>article:</code></td> <td><code>http://purl.org/science/article/pmid/</code></td> <td>PubMed articles</td></tr>
<tr><td><code>ncbi_gene:</code></td> <td><code>http://purl.org/commons/record/ncbi_gene/</code></td> <td>Entrez Gene records (not the genes themselves)</td></tr>
<tr><td><code>proteinsubclass:</code></td> <td><code>http://purl.org/science/protein/subjects/</code></td> <td>Proteins of a given gene participating in a given pathway</td></tr>
<tr><td><code>go:</code></td> <td><code>http://purl.org/obo/owl/GO#</code></td> <td>Gene Ontology terms</td></tr>
<tr><td><code>protein:</code></td> <td><code>http://purl.org/science/protein/bysequence/</code></td> <td>National Center for Biotechnology Information (NCBI) records for Genes sequences</td></tr>
<tr><td><code>ro:</code></td> <td><code>http://www.obofoundry.org/ro/ro.owl#</code> (<a href="http://www.berkeleybop.org/ontologies/obo-all/ro_proposed/ro_proposed.owl">proposed update</a> may be more complete)</td> <td>Relation Ontology (RO): Relationships between members of OBO classes</td></tr>
<tr><td><code>obo:</code></td> <td><code>http://purl.org/obo/owl/obo#</code></td> <td>Open Biomedical Ontologies (OBO)</td></tr>
<tr><td><code>senselab:</code></td> <td><code>http://purl.org/ycmi/senselab/neuron_ontology.owl#</code></td> <td>Neuroscience ontology derived from the SenseLab NeuronDB database</td></tr>
<tr><td><code>dnaGeneProduct:</code></td> <td><code>http://purl.org/science/owl/sciencecommons/is_protein_gene_product_of_dna_</code></td> <td>Syntactic trick to shorten <em>sc:is_protein...described_by</em></td></tr>
</table>
</div>
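<p>As an illustrative sketch (not taken from the knowledge base distribution), the bindings in the table above can be written as Turtle <code>@prefix</code> declarations of the kind a Turtle file using these prefixes would carry at its head. A prefixed name then abbreviates the URI formed by appending its local part to the namespace URI. A SPARQL query would declare the same bindings with the <code>PREFIX</code> keyword instead.</p>
<pre>
# General-purpose vocabularies
@prefix rdf:             &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt; .
@prefix rdfs:            &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
@prefix xsd:             &lt;http://www.w3.org/2001/XMLSchema#&gt; .

# Ontologies
@prefix sc:              &lt;http://purl.org/science/owl/sciencecommons/&gt; .
@prefix go:              &lt;http://purl.org/obo/owl/GO#&gt; .
@prefix ro:              &lt;http://www.obofoundry.org/ro/ro.owl#&gt; .
@prefix obo:             &lt;http://purl.org/obo/owl/obo#&gt; .
@prefix senselab:        &lt;http://purl.org/ycmi/senselab/neuron_ontology.owl#&gt; .

# Records and the things they describe
@prefix pubmedRec:       &lt;http://purl.org/commons/record/pmid/&gt; .
@prefix article:         &lt;http://purl.org/science/article/pmid/&gt; .
@prefix ncbi_gene:       &lt;http://purl.org/commons/record/ncbi_gene/&gt; .
@prefix proteinsubclass: &lt;http://purl.org/science/protein/subjects/&gt; .
@prefix protein:         &lt;http://purl.org/science/protein/bysequence/&gt; .

# This namespace ends in the middle of a property name, so the local part
# of a dnaGeneProduct: name completes a longer name in the sc: namespace.
@prefix dnaGeneProduct:  &lt;http://purl.org/science/owl/sciencecommons/is_protein_gene_product_of_dna_&gt; .
</pre>
<p>The grouping comments are editorial and carry no meaning in Turtle; only the prefix names and namespace URIs come from the table.</p>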
<h3 id="docOutline">1.5 Document Outline</h3>
<p><a href="#introduction">1 <em>Introduction</em></a> motivates and explains this document.</p>
<p><a href="#usecase">2 <em>Use Case</em></a> introduces an
interesting scientific question that the knowledge base can be used
to address.</p>
<p><a href="#dbs">3 <em>Data Sources</em></a> describes the data sources that have been incorporated into the knowledge base.</p>
<p><a href="#terms">4 <em>Design Decisions</em></a> explains the reasons for several design choices.</p>
<p><a href="#mechanics">5 <em>Importing to RDF - Homologene Example</em></a> explains the process of translating data into RDF triples.</p>
<p><a href="#query">6 <em>Query</em></a> explains the use case query that answers the scientific question.</p>
<p><a href="#triplemodel">7 <em>Data Model</em></a> explains the basics of RDF triples.</p>
<p><a href="#newsource">8 <em>Adding a New Data Source</em></a> explains how the SenseLab database was integrated.</p>
<p><a href="#graphs">9 <em>Named Graphs</em></a> discusses the use of named graphs and query details.</p>
<p><a href="#nextsteps">10 <em>Opportunities for further development</em></a> discusses problem areas and possible improvements.</p>
<h2 id="usecase">2 Use Case</h2>
<p>Alzheimer's disease is a debilitating neurodegenerative disease that affects approximately 27 million people worldwide. Its cause is currently unknown and no therapy is able to halt its progression. However, insight into the mechanism and potential treatment of the disease may come from the integration of neurological, biomedical and biological resources. The knowledge base assembles several neurology-related resources alongside an array of clinical and biological resources. This makes it possible to integrate knowledge across several research domains and potentially provide insight into the mechanisms of the disease.</p>
<p>The scientific question under scrutiny in our use case involves several elements of putative functional importance to Alzheimer's disease. CA1 Pyramidal Neurons (CA1PN) are known to be particularly damaged in Alzheimer's disease and play a key role in signal transduction. Signal transduction pathways are considered to be rich in proteins that might respond to chemical therapy. By integrating information about signal transduction, pyramidal neurons, their genes, and gene products, the query corresponding to our scientific question can provide information relevant to researchers who are looking for drug target candidates that are potentially effective against Alzheimer's disease.</p>
<h2 id="dbs">3 Data Sources</h2>
<p>In order to incorporate data from several information sources, it was necessary to convert several exported formats, each into its own <em>RDF bundle</em>. The largest RDF bundle of 200M triples resulted from MeSH associations with PubMed articles. In contrast, there were a number of smaller bundles ranging from 10K to 10M triples. This resulted in a total of approximately 350M triples occupying approximately 20GB when loaded into the RDF repository. In several cases, we extracted only a subset, for example, by selecting only human, rat, and mouse data. Click on <em>[Details]</em> in the table below to view details such as the date of the last extraction.</p>
<p>At the time of publication, the following information sources have been (sometimes partially) incorporated into the knowledge base. This set will continue to be extended in depth (i.e., more complete inclusion of partially represented data sets) and in breadth (i.e., novel data sets):</p>
<table class="dbsTable">
<tr><td class="db"><a href="http://www.brain-map.org/">Allen Brain Atlas (ABA)</a></td>
<td>Allen Brain Atlas is an interactive, genome-wide image database of gene expression in the mouse brain. A combination of RNA <i>in situ</i> hybridization data, detailed Reference Atlases and informatics analysis tools are integrated to provide a searchable digital atlas of gene expression. Together, these resources present a comprehensive online platform for exploration of the brain at the cellular and molecular level.</td>
<td><a href="#aba">[Details]</a></td></tr>
<tr><td class="db"><a href="http://www.addgene.org/">Addgene</a></td>
<td>A catalog of plasmids from Addgene</td>
<td><a href="#addgene">[Details]</a></td></tr>
<tr><td class="db"><a href="http://brancusi.usc.edu/bkms/">BAMS</a></td>
<td>The Brain Architecture Management System (BAMS) is designed to be a repository of information about brain structures from different species, and has a set of inference engines for processing the neurobiological data. BAMS contains five interrelated modules: Brain Parts (brain regions, major fiber tracts, and ventricles), Cell Types, Molecules, Relations (between structures from different neuroanatomical atlases), and Connections.</td>
<td><a href="#bams">[Details]</a></td></tr>
<tr><td class="db"><a href="http://www.opengalen.org/faq/faq4.html">GALEN</a></td>
<td>GALEN is an advanced terminology of medical concepts for clinical information systems. We imported the <a href="http://www.co-ode.org/galen/">GALEN ontology</a> in OWL from <a href="http://www.co-ode.org/">CO-ODE</a>.</td>
<td><a href="#galen">[Details]</a></td></tr>
<tr><td class="db"><a href="ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/README">NCBI gene_info</a></td>
<td>Information from the gene_info file distributed by NCBI that was imported into OWL.</td>
<td><a href="#geneinfo">[Details]</a></td></tr>
<tr><td class="db"><a href="http://www.geneontology.org/">Gene Ontology (GO)</a></td>
<td>The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism. GO terms are often used to annotate gene and protein records.</td>
<td><a href="#goa">[Details]</a></td></tr>
<tr><td class="db"><a href="http://www.ebi.ac.uk/GOA/">GOA</a></td>
<td><a href="http://www.geneontology.org/">GO</a> annotations from National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI).</td>
<td><a href="#goa">[Details]</a></td></tr>
<tr><td class="db"><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=homologene">HomoloGene</a></td>
<td>HomoloGene is a system for automated detection of homologs among the annotated genes of several completely sequenced <a href="http://en.wikipedia.org/wiki/Eukaryotic">eukaryotic</a> genomes.</td>
<td><a href="#homologene">[Details]</a></td></tr>
<tr><td class="db"><a href="http://pubmed.gov">MEDLINE/PubMed</a></td>
<td>PubMed is a service of the <a href="http://www.nlm.nih.gov/">U.S. National Library of Medicine</a> that includes over 17 million citations from MEDLINE and other life science journals for biomedical articles back to the 1950s. PubMed includes links to full text articles and other related resources.</td>
<td><a href="#pubmed">[Details]</a></td></tr>
<tr><td class="db"><a href="http://www.nlm.nih.gov/mesh/introduction2008.html">MeSH</a></td>
<td>Medical Subject Headings. 2008 MeSH includes the subject descriptors appearing in <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed">MEDLINE/PubMed</a>, the National Library of Medicine (NLM) catalog database, and other NLM databases.</td>
<td><a href="#mesh">[Details]</a></td></tr>
<tr><td class="db"><a href="http://sw.neurocommons.org/2007/text-mining.html">Neurocommons Text Mining Pilot</a></td>
<td>Protein/gene associations/interactions extracted by <a href="http://www.temis.com/">Temis</a> software applied to 7% of MEDLINE records. Annotations were captured in RDF using the <a href="http://sw.neurocommons.org/2007/schema.html">Neurocommons Annotations Schema</a>.</td>
<td><a href="#textmining">[Details]</a></td></tr>
<tr><td class="db"><a href="http://www.berkeleybop.org/ontologies/">Open Biomedical Ontologies (OBO)</a></td>
<td>All Open Biomedical Ontologies (<a href="http://obofoundry.org/">OBO</a>) available from <a href="http://www.berkeleybop.org/">Berkeley Bioinformatics Open-source Projects.</a></td>
<td><a href="#obo">[Details]</a></td></tr>
<!-- <tr><td class="db">Selected Open Biomedical Ontologies</td>
<td>Selected OBO ontologies, downloaded ~21 April 2007, augmented with inferred relations</td></tr> -->
<tr><td class="db"><a href="http://sw.neurocommons.org/2007/kb-sources/sciencecommons.owl">Science Commons Ontology</a></td>
<td>A bridging ontology from <a href="http://sciencecommons.org">Science Commons</a>. It imports the other ontologies used in the prototype and defines the classes and relations used to represent gene records and their contents, as well as a few items that are referred to by imported data sources but are not available in a published ontology.</td>
<td><a href="#sciencecommons">[Details]</a></td></tr>
<tr id="ds_SenseLab"><td class="db"><a href="http://neuroweb.med.yale.edu/senselab/">SenseLab</a></td>
<td>See <a href="http://www.w3.org/TR/hcls-senselab/">Experiences with the conversion of SenseLab databases to RDF/OWL</a>.</td>
<td><a href="#senselab">[Details]</a></td></tr>
<tr><td class="db"><a href="http://swan.mindinformatics.org/">SWAN</a></td>
<td><a href="http://purl.org/swan/1.1/">Semantic Web Applications in Neuromedicine</a> [<a href="#ref-SWAN">SWAN</a>] is a knowledge base of hypotheses, claims, and evidence in Alzheimer Disease (AD) research, created through a community process to capture the collective scientific insights of the AD field.</td>
<td><!-- a href="#swan">[Details]</a></td -->Not yet public</td></tr>
<tr><td class="db"><a href="http://www.w3.org/2004/02/skos/">SKOS</a></td>
<td>Simple Knowledge Organization System (SKOS): specifications and standards to support the use of knowledge organization systems (KOS) such as thesauri, classification schemes, subject heading systems and taxonomies within the framework of the Semantic Web.</td>
<td><a href="#mesh-skos">[Details]</a></td>
<!-- <td><a href="http://www.w3.org/Consortium/Legal/copyright-software" -->
<!-- >W3C software licensing rules</a></td> -->
</tr>
</table>
<h2 id="terms">4 Design Decisions</h2>
<p>A number of design decisions were made during the construction
of the prototype knowledge base. Many of the decisions were pragmatic in
nature, as a consequence of the need to implement the solution on
a commodity PC within a two-month period for a demonstration at
WWW2007.</p>
<ul>
<li><em><b>URI Scheme</b></em>
<br></br>HTTP URIs were adopted as the mechanism to identify
biological entities. In particular, URIs with a Persistent URL
(<a href="http://purl.org">PURL</a>) were used as they provide re-direction capabilities, which
make the identifiers more robust against future change.</li>
<li><em><b>Unifying terms</b></em>
<br></br> While data in different information sources may talk
about the same thing, one must provide a common set of identifiers
in order to get the RDF graph to connect. For instance, the named
graph PubMesh uses gene record
identifiers to relate genes to PubMed articles. It uses terms like
<span class="var gene">ncbi_gene:1812</span> to identify a gene
record. The Gene Ontology database
records use the same identifiers, which allows us to easily link
information contained in the two corresponding named graphs. New
databases are able to connect their data graphs to the existing
store by re-using the same terms. We accomplished this by
translating internal identifiers from the databases into URIs in
our chosen scheme.</li>
<li><em><b>Ontology Design</b></em>
<br></br> An ontology was built with sufficient detail for the
immediate needs of the demonstration; its scope was limited by the
demonstration deadline. Consequently, it contains more detail in the core
areas of focus than in areas of more peripheral interest. The
ontology was written in OWL-DL so that we could specify statements
in an interoperable and computable way. We also wanted to verify
small subsets for consistency during development, with the hope
that in the future a more capable repository will be able to do
appropriate inferences based on the class and property
definitions. The ontology distinguishes between real world
entities and documents about real world entities. We endeavored to
follow the OBO foundry methodology, which espouses the principle
that we first identify what instances are by identifying them with
physical things, such as a molecule in some person's body. Classes
are defined as sets of those instances. For example, the class of
glutamate receptors can be defined as multimeric macromolecules
that have high binding affinity for glutamate molecules. Expressed
more formally, we can say <i>EVERY</i> glutamate receptor
<i>IS_A</i> multimeric macromolecule <i>THAT</i> has high binding
affinity for <i>SOME</i> glutamate molecule. In this way, the
class of glutamate receptors can be defined in terms of the
classes multimeric macromolecule and glutamate molecule,
something which OWL expresses quite naturally. The knowledge base
contains many such definitions of classes.</li>
<li><em><b>Multiple Graphs</b></em>
<br></br>
Once the data was converted into RDF/OWL, it was loaded into the
triple store as a number of separate graphs. This approach made
it simpler to re-load and update data, which was required often
as a consequence of iterative enhancements to the ontology. This
fast upload capability proved critical as the data reached the
scale of hundreds of millions of triples. This partitioning of
data also helped queries to be performed rapidly.</li>
<!-- The core
constructs used by the ontology include (has value?)... -->
<li><em><b>Precomputed Inferences</b></em>
<br></br>
Our approach has been to choose a representation in valid OWL-DL,
with the expectation that queries would be evaluated against all
answers that could be inferred from our representation. However,
our triple store has no native inferencing capabilities. To
enable querying against inferred information, we added
pre-computed inferences in the form of non-OWL-DL, direct
class-class relations, to the <em>classrelations</em> graph (see
<a href="#graphs">Named Graphs</a> section). These non-OWL-DL
relations were added so that it would be easy to use SPARQL
queries to access the inferences, which were in some cases
represented in OWL as property restrictions, as in the case of
partonomic relations. The direct class-class relations were more
compact to represent in RDF and queries that took advantage of
them were easier to write in SPARQL.</li>
</ul>
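<p>As a rough illustration of the precomputed inferences described above, a rule of this kind could be written as a SPARQL CONSTRUCT query. This is a sketch only, not the procedure actually used to populate the <em>classrelations</em> graph. The <code>obo:</code> namespace URI is a placeholder, and the sketch assumes that partonomic relations appear as <code>owl:someValuesFrom</code> restrictions on <code>obo:part_of</code>.</p>
<pre>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
# Placeholder namespace (assumption); substitute the URI used by your OBO conversion.
PREFIX obo:  <http://example.org/obo#>

# For every class whose definition says "part_of SOME ?whole",
# emit a direct class-class triple that a plain SPARQL pattern can match.
CONSTRUCT { ?part obo:part_of ?whole . }
WHERE {
  ?part rdfs:subClassOf ?restriction .
  ?restriction owl:onProperty obo:part_of .
  ?restriction owl:someValuesFrom ?whole .
}
</pre>
<p>The constructed triples would then be loaded into a graph of their own, so that they can be dropped and regenerated whenever the ontology changes. Note that this query yields only direct relations. A transitive closure would need repeated application or a separate rule.</p>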
<h2 id="mechanics">5 Importing to RDF - Homologene Example</h2>
<p>A number of different approaches were used for the conversion
of data into RDF/OWL. The most commonly used approach was the use
of <a href="http://svn.neurocommons.org/svn/trunk/convert/">Lisp code</a> to read text exports of the data and create OWL or
RDF documents. We will focus on the example of importing data
from Homologene.</p>
<p>The general steps required to import from an existing data source into RDF are:</p>
<ul>
<li><i>Read the data into your program:</i> This can be accomplished by
first exporting to a text format of choice (CSV, tab-delimited,
XML, etc.) and reading that format in or accessing the database
directly with a database connector.</li>
<li><i>Write the data into the desired RDF format:</i> This can be in
the form of an RDF/XML file that is then loaded into the
repository. The Turtle format of RDF is also often supported and may be
easier to produce and manipulate. Another approach is to use
software libraries that allow you to add triples directly to your
repository.</li>
</ul>
<!-- <p><em>Note: Although some HCLSIG members have suggested a preference for exporting to XML and creating the RDF from XQuery, we didn't apply that technique here.</em></p> -->
<p>In the case of Homologene, we start with a text file that contains the exported information. The original tab-delimited file is <a href="ftp://ftp.ncbi.nih.gov/pub/HomoloGene/build54/homologene.data">ftp://ftp.ncbi.nih.gov/pub/HomoloGene/build54/homologene.data</a>.</p>
<p>Here is a sample of the original file:</p>
<pre style="border: double;">
99949 9606 727759 LOC727759 113427825 XP_001125931.1
99949 10116 678753 LOC678753 109498373 XP_001053282.1
99949 5833 812783 GeneID:812783 16805082 NP_473111.1
99950 3702 820917 AT3G16650 18401203 NP_566557.1
</pre>
<p>We are interested in the first three fields. The first field identifies
the homologous cluster, the second is the species taxon id, and the
third is the EntrezGene id. We keep only human, mouse, and rat rows
(taxon ids 9606, 10090, and 10116, respectively), so rows for other taxa,
such as 5833 and 3702 in the sample above, are skipped.</p>
<p>The <a href="http://svn.neurocommons.org/svn/trunk/convert/homologene.lisp">Lisp code</a> for the Homologene conversion is available. In the conversion process, we first iterate over the lines in the file, creating a table that maps each cluster id to the (taxon id, Entrez Gene id) pairs in that cluster. This table is the variable <em>homologene</em>, created by the function <em>read-homologene</em>. For each of these clusters we then create an individual to represent the cluster, e.g., for cluster 99949:</p>
<pre>
<sciencecommons:orthology_record rdf:about="http://purl.org/science/record/homologene/cluster_r54_99949">
<sciencecommons:has_homologous_gene_record rdf:resource="http://purl.org/commons/record/ncbi_gene/678753"/>
<sciencecommons:has_homologous_gene_record rdf:resource="http://purl.org/commons/record/ncbi_gene/727759"/>
<sciencecommons:has_supporting_evidence rdf:resource="http://purl.org/science/evidence/homologene/cluster_r54_99949"/>
</sciencecommons:orthology_record>
</pre>
<p>Above is the RDF/XML (see [<a href="#ref-RDF">RDF</a>]) expression of:</p>
<pre>@prefix homologene: <http://purl.org/science/record/homologene/> .
@prefix evidence: <http://purl.org/science/evidence/homologene/> .
@prefix ncbi_gene: <http://purl.org/commons/record/ncbi_gene/> .
@prefix sciencecommons: <http://purl.org/science/owl/sciencecommons/> .
homologene:cluster_r54_99949 a sciencecommons:orthology_record ;
    sciencecommons:has_homologous_gene_record ncbi_gene:678753 , ncbi_gene:727759 ;
    sciencecommons:has_supporting_evidence evidence:cluster_r54_99949 .</pre>
<p>Note that we used HTTP URLs to identify gene records by prefixing the EntrezGene identifiers (e.g. <code>727759</code>) with a stem URL, <code>http://purl.org/commons/record/ncbi_gene/</code>. The <a href="http://purl.org/commons/record/ncbi_gene/727759">resulting URL</a> can be usefully resolved with a web browser. The domain purl.org serves Persistent URLs (PURLs), which currently redirect these requests for NCBI gene identifiers to a script at sw.neurocommons.org. If the community wishes to move the service to, for instance, an NCBI page about these genes, they can simply notify the custodians of purl.org. This extra level of indirection protects these identifiers from becoming orphaned when organizations cease to exist or change their priorities.
These URLs were also used to identify gene information imported from other data sources, automatically linking the Semantic Web representations of these records. For example, PubMesh statements about gene records use these same identifiers for genes, as do the statements from Gene Ontology and SenseLab. This allows for trivial data integration between different resources involving Entrez Gene records.</p>
<!-- A placeholder has been put in the science commons ontology to
allow information relating to the 'evidence' of an
assertion. For example, this could include the BLAST scores to be
used to establish the level of orthology. -->
<p>The individual http://purl.org/science/evidence/homologene/cluster_r54_99949 serves as a placeholder for the "evidence" behind the orthology assertion. It is not elaborated in this translation; in future work it would include the BLAST scores and other evidence used to establish the orthology (see <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=homologene&dopt=AlignmentScores&list_uids=99949">http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=homologene&dopt=AlignmentScores&list_uids=99949</a>).</p>
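<p>Once orthology records like the one above are loaded, a query can follow them from one gene record to its homologs. The following SPARQL is a minimal sketch under assumptions: the <code>sciencecommons:</code> namespace URI is inferred from property URIs that appear elsewhere in this document, and the label lookup is optional because we do not assume every gene record carries an <code>rdfs:label</code>.</p>
<pre>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ncbi_gene: <http://purl.org/commons/record/ncbi_gene/>
# Assumed namespace for the Science Commons ontology terms.
PREFIX sciencecommons: <http://purl.org/science/owl/sciencecommons/>

# Find the gene records that share an orthology cluster with gene 727759.
SELECT ?cluster ?homolog ?label
WHERE {
  ?cluster sciencecommons:has_homologous_gene_record ncbi_gene:727759 .
  ?cluster sciencecommons:has_homologous_gene_record ?homolog .
  OPTIONAL { ?homolog rdfs:label ?label . }
  # Exclude the starting gene record itself.
  FILTER (?homolog != ncbi_gene:727759)
}
</pre>
<p>Because the gene records use the same PURL-based identifiers in every data source, the <code>?homolog</code> bindings can be joined directly against statements from the other sources without any identifier mapping.</p>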
<h2 id="query">6 Query</h2>
<p>Our scientific question can be summarized as "What genes are
involved in signal transduction that are related to pyramidal
neurons?". The scientific question can be answered with the
following query, which searches for gene names and processes from
four data sources within the knowledge base. The data sources
include: MeSH (Pyramidal Neurons), PubMed (Journal Articles),
Entrez Gene (Genes), Gene Ontology (Signal Transduction). The example <a
href="pyrNeurSigTransduct.rq">query</a> selects the gene
name of the genes involved in <em>signal transduction</em> that
are related to <em>pyramidal neurons</em>. Some of the complexity
in this query comes from the need to capture relevant anatomical
and functional detail at the subcellular and molecular level. The
portion probing the Gene Ontology queries
a set of classes describing processes at the molecular level. Our
query employs the SPARQL RDF query language to perform knowledge
integration across the sources of the knowledge base. Details on
SPARQL can be found in the <a
href="#references">References</a>.</p>
<p><em>[Note: The query below will not work verbatim at SPARQL endpoints. We have simplified the actual <a href="#qWithGraphs">Banff demonstration query</a> for explanatory purposes in our example below. The Banff demonstration query is discussed in more detail in the <a href="#graphs">Named Graphs</a> section. You can try running the query <a href="http://purl.org/hcls/hclskb_demo.html">here</a>.]</em></p>
<p>Please note that the same color (and CSS class) is used to connect the descriptive text in the query with relevant portions of the following figures.</p>
<table class="dbsTable" style="font-size: 75%">
<thead>
<tr>
<th>Source</th>
<th>(colored) CSS class</th>
</tr>
</thead>
<tbody>
<tr>
<td>PubMesh</td>
<td><span class="mesh">mesh</span></td>
</tr>
<tr>
<td>Gene Ontology Annotation (GOA)</td>
<td><span class="goa">goa</span></td>
</tr>
<tr>
<td>Entrez Gene</td>
<td><span class="glbl">glbl</span></td>
</tr>
<tr>
<td>Gene Ontology</td>
<td><span class="plbl">plbl</span></td>
</tr>
</tbody>
</table>
<pre class="query" id="geneProc">SELECT ?genename ?processname
WHERE {
<span class="mesh" id="pyrNeurSigTransduct_mesh" title="Medical Subject Headings"> <span class="comment"># PubMesh includes <span class="var gene">?gene_record</span>s mentioned in <span class="var article">?article</span>s which are identified by pmid in <span class="var">?pubmed_record</span>s.</span>
?pubmed_record sc:has-as-minor-mesh <a href="http://www.slicksurface.com/medical-thesaurus/descriptor/D017966/pyramidal-cells.htm"><span class="identifier" title="pyramidal neurons">mesh:D017966</span></a> .
?article sc:identified_by_pmid ?pubmed_record .
<span class="var gene">?gene_record</span> sc:describes_gene_or_gene_product_mentioned_by ?article .</span>
<span class="goa" id="pyrNeurSigTransduct_goa" title="Gene Ontology Database"> <span class="comment"># The Gene Ontology has a set of <span class="var protein">?protein</span>s such that foreach <span class="var protein">?protein</span>, <span class="var protein">?protein</span> ro:has_function [ ro:realized_as <span class="var process">?process</span> ].</span>
?protein rdfs:subClassOf ?restriction1 .
?restriction1 owl:onProperty ro:has_function .
?restriction1 owl:someValuesFrom ?restriction2 .
?restriction2 owl:onProperty ro:realized_as .
?restriction2 owl:someValuesFrom <span class="var process">?process</span> .
<span class="comment"># Also, foreach ?protein, ?protein has a parent class which is linked by some predicate to <span class="var gene">?gene_record</span>.</span>
?protein rdfs:subClassOf ?protein_superclass .
?protein_superclass owl:equivalentClass ?restriction3 .
?restriction3 owl:onProperty <abbr title="http://purl.org/science/owl/sciencecommons/is_protein_gene_product_of_dna_described_by">dnaGeneProduct:described_by</abbr> .
?restriction3 owl:hasValue <span class="var gene">?gene_record</span> .
<span class="comment"># Each <span class="var process">?process</span> (that we are interested in) is a part of the <em>signal transduction</em> process.</span>
<span class="var process" id="query-part-of">?process</span> obo:part_of <a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007165"><span class="identifier" title="signal transduction">go:GO_0007165</span></a> .</span>
<span class="glbl" id="pyrNeurSigTransduct_glbl" title="Gene Labels"> <span class="var gene">?gene_record</span> rdfs:label ?genename .</span>
<span class="plbl" id="pyrNeurSigTransduct_plbl" title="Process Labels"> <span class="var process">?process</span> rdfs:label ?processname .</span>
}</pre>
<p>The following shows a few of the results from the query:</p>
<table>
<thead>
<tr>
<!-- th>pubmed_record</th -->
<th>genename</th>
<th>processname</th>
<!-- th>receptor_protein_name</th -->
</tr>
</thead>
<tbody>
<tr>
<!-- td>http://purl.org/commons/record/pmid/16640790</td -->
<td>Entrez Gene record for human DRD1, 1812</td>
<td>adenylate cyclase activation</td>
<!-- td>D1 receptor</td -->
</tr>
<tr>
<!-- td>http://purl.org/commons/record/pmid/16640790</td -->
<td>Entrez Gene record for human ADRB2, 154</td>
<td>adenylate cyclase activation</td>
<!-- td></td -->
</tr>
<tr><td colspan="2">...</td> </tr>
</tbody>
</table>
<p>The following section describes the RDF data model and how we employed it to make our query possible.</p>
<h2 id="triplemodel">7 Data Model</h2>
<p>The data in the knowledge base is modeled in OWL-DL, which has been expressed as RDF triples. Briefly, an RDF triple consists of a <em>subject</em>, <em>predicate</em>, and <em>object</em>. The predicate is also known as the <em>property</em> of the triple. Subjects and objects in the data unify to create an RDF Graph, with subjects and objects as nodes and predicates as edges. For more information about RDF and OWL, see the <a href="#references">References</a> section in the Appendix.</p>
<p>Nodes labeled with a leading "_:", e.g. <em>_:activateAdenylCyclase</em>, are called <a href="http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-blank-nodes">RDF blank nodes</a> [<a href="#refCONCEPTS">CONCEPTS</a>]. These frequently have machine-generated identifiers and are therefore typically opaque to a human reader (e.g., the set of all nodes that represent protein entities linked to the GO term adenylate cyclase activation). Here, for the purposes of explanation, they have been named to convey meaning to the reader. A blank node name ending in "_1" in this document indicates that the node is one of many of its kind, e.g. <em>_:signalingParticipants_1</em>.</p>
<div class="figure">
<p id="triplesPicture">
<object data="triples.svg" type="image/svg+xml" height="490" width="850"><img src="triples.png" alt="Triples in Solution" /></object>
<!-- img src="triples.png" alt="Triples in Solution" / -->
</p>
<p>Figure 1. Triples in Solution [<a href="triples.svg">SVG image</a> <a href="triples.png">PNG image</a>]</p>
</div>
<p>Figure 1, <em>Triples in Solution</em>, shows a graphical representation of the triples that compose <em>one</em> solution to the query posed in <a href="#query">section 6</a>. Following is a discussion of the origins and intents of those triples:</p>
<p>The <a href="http://sw.neurocommons.org/2007/text-mining.html">application of a commercial text mining tool</a> to neuroscience-related PubMed abstracts results in a set of annotations that link MeSH terms to genes (for more details on MeSH, see the table in <a href="#dbs">Data Sources</a>). For example, the article with PubMed id 10698743 mentions <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=retrieve&list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a>, and the corresponding PubMed record has the MeSH term <a href="http://www.slicksurface.com/medical-thesaurus/descriptor/D017966/pyramidal-cells.htm"><span class="identifier" title="pyramidal neurons">mesh:D017966</span></a>. The following three triples express this:</p>
<table id="triplesTable" class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<thead>
<tr>
<th>subject</th>
<th>predicate</th>
<th>object</th><th class="placeholder"></th><th class="placeholder"></th>
</tr>
</thead>
<tbody>
<tr class="mesh" id="pyrNeurSigTransduct_triples_mesh" title="Medical Subject Headings"> <td>pubmedRec:10698743</td> <td>sc:has-as-minor-mesh</td> <td><a href="http://www.slicksurface.com/medical-thesaurus/descriptor/D017966/pyramidal-cells.htm"><span class="identifier" title="pyramidal neurons">mesh:D017966</span></a></td> <td>.</td></tr>
<tr class="mesh"><td>article:10698743</td> <td>sc:identified_by_pmid</td> <td>pubmedRec:10698743</td> <td>.</td></tr>
<tr class="mesh"><td><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=retrieve&list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a></td> <td>sc:describes_gene_or_gene_product_mentioned_by</td> <td>article:10698743</td> <td>.</td></tr>
</tbody>
</table>
<p>A set of genes or gene products in human bodies is described by <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=retrieve&list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a>. Here, we call this set <em>_:equiv1812</em>.</p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr class="goa"><td>_:equiv1812</td> <td>owl:onProperty</td> <td title="http://purl.org/science/owl/sciencecommons/is_protein_gene_product_of_dna_described_by">dnaGeneProduct:described_by</td> <td>.</td></tr>
<tr class="goa"><td>_:equiv1812</td> <td>owl:hasValue</td> <td><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=retrieve&list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a></td> <td>.</td></tr>
</tbody>
</table>
<p><em><span>protein:ncbi_gene.1812</span></em> has the same extension (members) as the OWL restriction <em>_:equiv1812</em>.</p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr class="goa"><td>protein:ncbi_gene.1812</td> <td>owl:equivalentClass</td> <td>_:equiv1812</td> <td>.</td></tr>
</tbody>
</table>
<p>The expression</p>
<pre>NamedClass equivalentClass R .
R onProperty SomeProperty .
R hasValue SomeValue .</pre>
<p> is the RDF representation of an OWL class axiom that says: for all X such that</p>
<pre>X SomeProperty SomeValue .</pre>
<p>X is a member of the class <code>NamedClass</code> (and vice versa). Note that <code>owl:hasValue</code> takes an individual rather than a class as its value; in the example above the value is the gene record <code>ncbi_gene:1812</code>. See <a href="http://www.w3.org/TR/owl-semantics/mapping#transformation_hasValue">OWL Web Ontology Language Semantics and Abstract Syntax Section 4. Mapping to RDF Graphs</a> for a formal treatment of this.</p>
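<p>When no reasoner is available, the same pattern can be matched directly in SPARQL. The sketch below looks up the named class whose members are described by a given gene record. It is illustrative rather than part of the demonstration; the <code>sciencecommons:</code> namespace is taken from the full property URI given for <code>dnaGeneProduct:described_by</code> in this document.</p>
<pre>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ncbi_gene: <http://purl.org/commons/record/ncbi_gene/>
PREFIX sciencecommons: <http://purl.org/science/owl/sciencecommons/>

# Which named class is defined as "everything that is a protein gene
# product of DNA described by gene record 1812"?
SELECT ?namedClass
WHERE {
  ?namedClass owl:equivalentClass ?restriction .
  ?restriction owl:onProperty sciencecommons:is_protein_gene_product_of_dna_described_by .
  ?restriction owl:hasValue ncbi_gene:1812 .
}
</pre>
<p>Against the triples shown above, this pattern should bind <code>?namedClass</code> to <em>protein:ncbi_gene.1812</em>, with the blank node <em>_:equiv1812</em> matching <code>?restriction</code>.</p>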
<p>Using our other supplied constant, we note that adenylate cyclase activation, <a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007190&selected=GO:0007190&viz=graph"><span class="var process" title="adenylate cyclase activation">go:GO_0007190</span></a>, is part of signal transduction, <a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007165&selected=GO:0007165&viz=graph"><span title="signal transduction" class="identifier input">go:GO_0007165</span></a>. <span id="subclass-part-of" class="note">Note: this simplified query matches only processes that are a sub-process of go:GO_0007165; the <a href="#qWithGraphs">actual query</a>, described in <a href="#graphs">§9 Named Graphs</a>, looks also for subclasses. The part_of relationships were inferred from the OWL class restrictions described in <a href="#preproc">§7.1 Precomputing Inferences</a>.</span> The class of functions that are <em>realized_as</em> adenylate cyclase activation is here labeled <em>_:activateAdenylCyclase</em>.</p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr class="goa"><td><a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007190&selected=GO:0007190&viz=graph"><span class="var process">go:GO_0007190</span></a></td> <td>obo:part_of</td> <td><a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007165&selected=GO:0007165&viz=graph"><span class="identifier" title="signal transduction">go:GO_0007165</span></a></td> <td>.</td></tr>
<tr class="goa"><td>_:activateAdenylCyclase</td> <td>owl:onProperty</td> <td>ro:realized_as</td> <td>.</td></tr>
<tr class="goa"><td>_:activateAdenylCyclase</td> <td>owl:someValuesFrom</td> <td><a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007190&selected=GO:0007190&viz=graph"><span class="var process">go:GO_0007190</span></a></td> <td>.</td></tr>
</tbody>
</table>
<p>There are many possible classes of substance participating in molecular signaling, one of which (called here <em>_:signalingParticipants_1</em>) is defined by the ability to activate adenylate cyclase.</p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr class="goa"><td>_:signalingParticipants_1</td> <td>owl:onProperty</td> <td>ro:has_function</td> <td>.</td></tr>
<tr class="goa"><td>_:signalingParticipants_1</td> <td>owl:someValuesFrom</td> <td>_:activateAdenylCyclase</td> <td>. <span class="comment"></span></td></tr>
</tbody>
</table>
<p>The class of proteins in the intersection of <em>_:signalingParticipants_1</em> and <em><span>protein:ncbi_gene.1812</span></em> is here abbreviated <em>proteinsubclass:p1812_7190_1</em>, though the actual identifier is <em>proteinsubclass:product_of_ncbi_gene.1812_that_participates_in_GO_0007190_fbc49f20524727a24c7b7effa29bad4a</em>. <span id="empty-protein-set" class="note">Note: the Venn diagram reveals that this set is potentially empty, theoretically permitting the query to range over pairs of gene/process that aren't related through any known protein. However, OWL-DL reasoners will not infer new classes, so the classes of proteins in the intersection of ncbi_gene:1812 and the substances participating in molecular signaling are restricted to those that have already been entered into the knowledge base, such as <em>proteinsubclass:p1812_7190_1</em>.</span></p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr class="goa" id="pyrNeurSigTransduct_triples_goa" title="Gene Ontology Database"> <td title="proteinsubclass:product_of_ncbi_gene.1812_that_participates_in_GO_0007190_fbc49f20524727a24c7b7effa29bad4a">proteinsubclass:p1812_7190_1</td> <td>rdfs:subClassOf</td> <td>_:signalingParticipants_1</td> <td>.</td></tr>
<tr class="goa"><td title="proteinsubclass:product_of_ncbi_gene.1812_that_participates_in_GO_0007190_fbc49f20524727a24c7b7effa29bad4a">proteinsubclass:p1812_7190_1</td> <td>rdfs:subClassOf</td> <td>protein:ncbi_gene.1812</td> <td>.</td></tr>
</tbody>
</table>
<p><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=retrieve&list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a> and <a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007190&selected=GO:0007190&viz=graph"><span class="var process">go:GO_0007190</span></a> have human-readable labels.</p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr class="glbl" id="pyrNeurSigTransduct_triples_glbl" title="Gene Labels"> <td><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=retrieve&list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a></td> <td>rdfs:label</td> <td>"Entrez Gene record for human DRD1, 1812"</td> <td>.</td></tr>
<tr class="plbl" id="pyrNeurSigTransduct_triples_plbl" title="Process Labels"> <td><a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007190&selected=GO:0007190&viz=graph"><span class="var process">go:GO_0007190</span></a></td> <td>rdfs:label</td> <td>"adenylate cyclase activation"</td> <td>.</td></tr>
</tbody>
</table>
<p>The addition of another PubMed record annotated with the same MeSH term gives us another solution:</p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr class="mesh" id="pyrNeurSigTransduct_triples_mesh2" title="Medical Subject Headings"> <td>pubmedRec:11441182</td> <td>sc:has-as-minor-mesh</td> <td><a href="http://www.slicksurface.com/medical-thesaurus/descriptor/D017966/pyramidal-cells.htm"><span class="identifier" title="pyramidal neurons">mesh:D017966</span></a></td> <td>.</td></tr>
<tr class="mesh" title="Medical Subject Headings"><td>article:11441182</td> <td>sc:identified_by_pmid</td> <td>pubmedRec:11441182</td> <td>.</td></tr>
<tr class="mesh" title="Medical Subject Headings"><td><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=retrieve&list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a></td> <td>sc:describes_gene_or_gene_product_mentioned_by</td> <td>article:11441182</td> <td>.</td></tr>
</tbody>
</table>
<h3 id="preproc">7.1 Precomputing Inferences</h3>
<div class="figure">
<p id="rulePicture">
<object data="rule.svg" type="image/svg+xml" height="490" width="850"><img src="rule.png" alt="obo:part_of Rule" /></object>
<!-- img src="rule.png" alt="obo:part_of Rule" / -->
</p>
<p>Figure 2. obo:part_of Rule [<a href="rule.svg">SVG image</a> <a href="rule.png">PNG image</a>]</p>
</div>
<p>The demonstration query depends on the existence of an <em>obo:part_of</em> (or <em>rdfs:subClassOf</em>) relationship between any part (i.e. any subclass of any step in the sequence) of signal transduction and the general identifier for signal transduction, <em>go:GO_0007165</em>:</p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr style="background-color: #dddddd" class="goa" title="Gene Ontology Database"><td><a href="#query-part-of"><span class="var process">?process</span></a></td> <td>obo:part_of</td> <td><a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007165"><span class="identifier" title="signal transduction">go:GO_0007165</span></a></td> <td>.</td></tr>
</tbody>
</table>
<p>Triples of this form were generated by a rule, graphically expressed in Figure 2, <em>obo:part_of Rule</em>. The shaded area on the right of the figure shows the OWL restriction which is the antecedent of the rule:</p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr class="goa" title="Gene Ontology Database"><td>_:subPart</td> <td>owl:onProperty</td> <td>obo:part_of</td> <td>.</td></tr>
<tr class="goa" title="Gene Ontology Database"><td>_:subPart</td> <td>owl:allValuesFrom</td> <td>_:subClass</td> <td>.</td></tr>
<tr class="goa" title="Gene Ontology Database"><td>_:subClass</td> <td>owl:onProperty</td> <td>rdfs:subClassOf</td> <td>.</td></tr>
<tr class="goa" title="Gene Ontology Database"><td>_:subClass</td> <td>owl:hasValue</td> <td>_:parentClass</td> <td>.</td></tr>
</tbody>
</table>
<p>The transitivity of <em>rdfs:subClassOf</em> need not be explicitly modeled because the <a>RDF Schema Specification</a> already defines subClassOf as transitive. Note that if <em>_:subClass</em> is a subClassOf <em>_:parentClass</em>, then all members of <em>_:subClass</em> are of type <em>_:parentClass</em> (as well as <em>_:subClass</em>):</p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr class="goa" title="Gene Ontology Database"><td>_:subClass</td> <td>owl:onProperty</td> <td>rdf:type</td> <td>.</td></tr>
<tr class="goa" title="Gene Ontology Database"><td>_:subClass</td> <td>owl:hasValue</td> <td>_:parentClass</td> <td>.</td></tr>
</tbody>
</table>
<!--
<tr class="goa" title="Gene Ontology Database"><td>_:subClass2</td> <td>owl:onProperty</td> <td>obo:part_of</td> <td>.</td></tr>
<tr class="goa" title="Gene Ontology Database"><td>_:subClass2</td> <td>owl:allValuesFrom</td> <td>_:subClass</td> <td>.</td></tr>
<p>This is a pre-compiled closure of <tt>All members of <span class="var process" title="adenylate cyclase activation">go:GO_0007190</span> are parts of <span title="signal transduction" class="identifier input">go:GO_0007165</span>.</tt> It is not <tt>{ ?process (rdfs:subClassOf*|obo:part_of*) go:GO_0007165 }</tt> as may intuited from <tt>rdfs:subClassOf</tt>'s semantics.</p>
-->
<p>Because the triple store used does not perform inferencing, these triples have been pre-computed (forward-chained) and inserted into the triple store. This also simplifies the query. If these triples were not pre-computed, the <tt>obo:part_of</tt> part of the query would be expressed as:</p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr style="background-color: #dddddd" title="Gene Ontology Database"><td><span class="var process">?process</span></td> <td>rdfs:subClassOf</td> <td><span class="identifier var">?what</span></td> <td>.</td></tr>
<tr style="background-color: #dddddd" title="Gene Ontology Database"><td><span class="var">?what</span></td> <td>owl:onProperty</td> <td><span class="identifier" title="signal transduction">obo:has_part</span></td> <td>.</td></tr>
<tr style="background-color: #dddddd" title="Gene Ontology Database"><td><span class="var">?what</span></td> <td>owl:someValuesFrom</td> <td><a href="http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0007165"><span class="identifier" title="signal transduction">go:GO_0007165</span></a></td> <td>.</td></tr>
</tbody>
</table>
<p>and would need to query over a transitive closure of the union of the <tt>obo:part_of</tt> and <tt>rdfs:subClassOf</tt> rules.</p>
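<p>A single forward-chaining step of this kind could be sketched as a SPARQL CONSTRUCT query. The sketch below is an illustration only, not the rule implementation used to build the knowledge base: it generalizes the pattern in the table above by replacing the fixed class <em>go:GO_0007165</em> with a variable, <tt>?whole</tt> (a name introduced here), and it keeps the property name <tt>obo:has_part</tt> exactly as that table shows it.</p>
<pre class="query">prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
prefix owl: &lt;http://www.w3.org/2002/07/owl#&gt;
prefix obo: &lt;http://purl.org/obo/owl/obo#&gt;
<span class="comment"># Assumption: ?whole is a variable introduced for this sketch.</span>
CONSTRUCT { ?process obo:part_of ?whole }
WHERE {
?process rdfs:subClassOf ?what .
?what owl:onProperty obo:has_part .
?what owl:someValuesFrom ?whole
}</pre>
<p>One application yields only the direct relationships. The transitive closure over <tt>obo:part_of</tt> and <tt>rdfs:subClassOf</tt> would still have to be computed from them, for example by applying further rules repeatedly until no new triples appear.</p>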
<h2 id="newsource">8 Adding a New Data Source</h2>
<p><a href="#ds_SenseLab">SenseLab</a> is a collection of relational (Oracle) databases for neuroscientific research that was independently added to the knowledge base after the other data sources. An accompanying document, <a href="../senselab/">Experiences with the conversion of SenseLab databases to RDF/OWL</a>, describes the details of adding it to this knowledge base. With this new data incorporated, the <a href="#geneProc">example query</a> could be extended to extract data from the <span class="senselab">new data source</span>, in this case, discovering the names of receptor proteins associated with the genes discovered in the previous query. In an integrative query of this sort, we can use the results as a starting point for more detailed queries of a particular repository, such as in this case SenseLab.</p>
<pre class="query" id="geneProcReceptor">SELECT ?genename ?processname <span class="senselab">?receptor_protein_name</span>
WHERE {
<span class="mesh" id="pyrNeurSigTransduct_senselab_mesh" title="Medical Subject Headings"> <span class="comment"># PubMeSH includes <span class="var gene">?gene_record</span>s mentioned in <span class="var article">?article</span>s which are identified by pmid in <span class="var">?pubmed_record</span>s .</span>
?pubmed_record sc:has-as-minor-mesh <span class="identifier" title="pyramidal neurons">mesh:D017966</span> .
?article sc:identified_by_pmid ?pubmed_record .
<span class="var gene">?gene_record</span> sc:describes_gene_or_gene_product_mentioned_by ?article .</span>
<span class="goa" id="pyrNeurSigTransduct_senselab_goa" title="Gene Ontology Database"> <span class="comment"># The Gene Ontology asserts that foreach <span class="var protein">?protein</span>, <span class="var protein">?protein</span> ro:has_function [ ro:realized_as <span class="var process">?process</span> ].</span>
?protein rdfs:subClassOf ?restriction1 .
?restriction1 owl:onProperty ro:has_function .
?restriction1 owl:someValuesFrom ?restriction2 .
?restriction2 owl:onProperty ro:realized_as .
?restriction2 owl:someValuesFrom <span class="var process">?process</span> .
<span class="comment"># Also, foreach <span class="var process">?protein</span>, <span class="var process">?protein</span> has a parent class which is linked by some predicate to <span class="var gene">?gene_record</span>.</span>
?protein rdfs:subClassOf ?protein_superclass .
?protein_superclass owl:equivalentClass ?restriction3 .
?restriction3 owl:onProperty <abbr title="http://purl.org/science/owl/sciencecommons/is_protein_gene_product_of_dna_described_by">dnaGeneProduct:described_by</abbr> .
?restriction3 owl:hasValue <span class="var gene">?gene_record</span> .
<span class="comment"># Each <span class="var process">?process</span> (that we are interested in) is a subclass of the <em>signal transduction</em> process.</span>
<span class="var process">?process</span> obo:part_of <span class="identifier" title="signal transduction">go:GO_0007165</span> .</span>
<span class="glbl" id="pyrNeurSigTransduct_senselab_glbl" title="Gene Labels"> <span class="var gene">?gene_record</span> rdfs:label ?genename .</span>
<span class="plbl" id="pyrNeurSigTransduct_senselab_plbl" title="Process Labels"> <span class="var process">?process</span> rdfs:label ?processname .</span>
<span class="senselab"> OPTIONAL {
<span class="comment"># Foreach <span class="var">?gene</span>, <span class="var">?gene</span> senselab:has_nucleotide_sequence_described_by <span class="var gene">?gene_record</span> .</span>
?gene owl:equivalentClass ?restriction4 .
?restriction4 owl:onProperty senselab:has_nucleotide_sequence_described_by .
?restriction4 owl:hasValue <span class="var gene">?gene_record</span> .
<span class="comment"># Foreach <span class="var">?receptor_protein</span>, <span class="var">?receptor_protein</span> senselab:proteinGeneProductOf <span class="var">?gene</span> .</span>
?receptor_protein rdfs:subClassOf ?restriction5 .
?restriction5 owl:onProperty senselab:proteinGeneProductOf .
?restriction5 owl:someValuesFrom ?gene .
<span class="comment"># Find the labels of all such <span class="var">?receptor_protein</span>s.</span>
?receptor_protein rdfs:label ?receptor_protein_name
}</span>
}</pre>
<p>yielding another variable in our results:</p>
<table>
<thead>
<tr>
<!-- th>pubmed_record</th -->
<th>genename</th>
<th>processname</th>
<th>receptor_protein_name</th>
</tr>
</thead>
<tbody>
<tr>
<!-- td>http://purl.org/commons/record/pmid/16640790</td -->
<td>Entrez Gene record for human DRD1, 1812</td>
<td>adenylate cyclase activation</td>
<td>D1 receptor</td>
</tr>
<tr>
<!-- td>http://purl.org/commons/record/pmid/16640790</td -->
<td>Entrez Gene record for human ADRB2, 154</td>
<td>adenylate cyclase activation</td>
<td>NULL</td>
</tr>
<tr><td colspan="3">...</td> </tr>
</tbody>
</table>
<p>The additional triples that this query matched in the <em>SenseLab</em> knowledge base connect to the existing data by referring to the same genes, e.g. <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=retrieve&list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a>.</p>
<div class="figure">
<p id="senselabPicture">
<object data="slTriples.svg" type="image/svg+xml" height="355" width="850"><img src="slTriples.png" alt="Additional Triples from SenseLab" /></object>
<!-- img src="slTriples.png" alt="Additional Triples from SenseLab" / -->
</p>
<p>Figure 3. Additional Triples from SenseLab [<a href="slTriples.svg">SVG image</a> <a href="slTriples.png">PNG image</a>]</p>
</div>
<p>Figure 3, <em>Additional Triples from SenseLab</em>, shows a subset of the triples provided by SenseLab. Following is a discussion of the origins and intents of those triples:</p>
<p>A nucleotide sequence is also described by <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=retrieve&list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a>. Here, we call this <em>_:nucleo1812</em>.</p>
<table id="senselabTable" class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<thead>
<tr>
<th>subject</th>
<th>predicate</th>
<th>object</th><th class="placeholder"></th><th class="placeholder"></th>
</tr>
</thead>
<tbody>
<tr class="senselab"><td>_:nucleo1812</td> <td>owl:onProperty</td> <td title="http://purl.org/ycmi/senselab/neuron_ontology.owl#has_nucleotide_sequence_described_by">nucleotideSequence:described_by</td> <td>.</td></tr>
<tr class="senselab"><td>_:nucleo1812</td> <td>owl:hasValue</td> <td><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=retrieve&list_uids=1812"><span class="var gene">ncbi_gene:1812</span></a></td> <td>.</td></tr>
</tbody>
</table>
<p>The class <em><span>senselab:DRD1_Gene</span></em> has the same members as the OWL restriction <em>_:nucleo1812</em>.</p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr class="senselab"><td>senselab:DRD1_Gene</td> <td>owl:equivalentClass</td> <td>_:nucleo1812</td> <td>.</td></tr>
</tbody>
</table>
<p>The restriction <em>_:protGeneProd_1</em> is defined as the class of protein gene products of <em>DRD1_Gene</em>.</p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr class="senselab"><td>_:protGeneProd_1</td> <td>owl:onProperty</td> <td>senselab:proteinGeneProductOf</td> <td>.</td></tr>
<tr class="senselab"><td>_:protGeneProd_1</td> <td>owl:someValuesFrom</td> <td>senselab:DRD1_Gene</td> <td>. <span class="comment"></span></td></tr>
</tbody>
</table>
<p>Our solution is a subclass of <em>_:protGeneProd_1</em> called <em><span>senselab:D1</span></em>.</p>
<table class="triplesTable">
<col style="width: 10.5em;" />
<col style="width: 17.5em;" />
<col style="width: 10.2em;" />
<col style="width: .1em;" />
<col/>
<tbody>
<tr class="senselab" title="Gene Ontology Database"> <td>senselab:D1</td> <td>rdfs:subClassOf</td> <td>_:protGeneProd_1</td> <td>.</td></tr>
<tr class="senselab"><td>senselab:D1</td> <td>rdfs:label</td> <td>"D1"</td> <td>.</td></tr>
</tbody>
</table>
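<p>In the results of the extended query, the <em>NULL</em> for ADRB2 arises because the SenseLab patterns are wrapped in <tt>OPTIONAL</tt>: a gene record with no match in SenseLab still appears, with <tt>?receptor_protein_name</tt> unbound. As a minimal sketch of the stricter alternative, the same SenseLab patterns can be made mandatory, so that only gene records linked to a receptor protein are returned. The sketch reuses the variable names of the extended query and omits the MeSH and Gene Ontology patterns for brevity, so it is not a drop-in replacement for that query:</p>
<pre class="query">prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
prefix owl: &lt;http://www.w3.org/2002/07/owl#&gt;
prefix senselab: &lt;http://purl.org/ycmi/senselab/neuron_ontology.owl#&gt;
SELECT ?genename ?receptor_protein_name
WHERE {
?gene_record rdfs:label ?genename .
<span class="comment"># SenseLab patterns from the extended query, no longer OPTIONAL.</span>
?gene owl:equivalentClass ?restriction4 .
?restriction4 owl:onProperty senselab:has_nucleotide_sequence_described_by .
?restriction4 owl:hasValue ?gene_record .
?receptor_protein rdfs:subClassOf ?restriction5 .
?restriction5 owl:onProperty senselab:proteinGeneProductOf .
?restriction5 owl:someValuesFrom ?gene .
?receptor_protein rdfs:label ?receptor_protein_name
}</pre>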
<h2 id="graphs">9 Named Graphs</h2>
<p>In the Banff Demo, the resulting knowledge base partitioned the assertions into groups called Named Graphs. Each named graph associates a distinct URI with a graph of triples, so that the graph can be referred to via that URI. At the time of publication, any query would be expected to include SPARQL GRAPH constraints, e.g.:</p>
<pre class="query" id="qWithGraphs">prefix go: &lt;http://purl.org/obo/owl/GO#&gt;
prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
prefix owl: &lt;http://www.w3.org/2002/07/owl#&gt;
prefix mesh: &lt;http://purl.org/commons/record/mesh/&gt;
prefix sc: &lt;http://purl.org/science/owl/sciencecommons/&gt;
prefix ro: &lt;http://www.obofoundry.org/ro/ro.owl#&gt;
prefix senselab: &lt;http://purl.org/ycmi/senselab/neuron_ontology.owl#&gt;
prefix obo: &lt;http://purl.org/obo/owl/obo#&gt;
SELECT ?genename ?processname <span class="senselab">?receptor_protein_name</span>
WHERE {
<span class="mesh" id="pyrNeurSigTransduct_named_mesh" title="Medical Subject Headings"> <span class="comment"># PubMeSH includes <span class="var gene">?gene_record</span>s mentioned in <span class="var article">?article</span>s which are identified by pmid in <span class="var">?pubmed_record</span>s .</span>
GRAPH &lt;http://purl.org/commons/hcls/pubmesh&gt; {
?pubmed_record sc:has-as-minor-mesh <span class="identifier" title="pyramidal neurons">mesh:D017966</span> .
?article sc:identified_by_pmid ?pubmed_record .
<span class="var gene">?gene_record</span> sc:describes_gene_or_gene_product_mentioned_by ?article
}</span>
<span class="goa" id="pyrNeurSigTransduct_named_goa" title="Gene Ontology Database"> <span class="comment"># The Gene Ontology asserts that foreach <span class="var protein">?protein</span>, <span class="var protein">?protein</span> ro:has_function [ ro:realized_as <span class="var process">?process</span> ].</span>
GRAPH &lt;http://purl.org/commons/hcls/goa&gt; {
?protein rdfs:subClassOf ?restriction1 .
?restriction1 owl:onProperty ro:has_function .
?restriction1 owl:someValuesFrom ?restriction2 .
?restriction2 owl:onProperty ro:realized_as .
?restriction2 owl:someValuesFrom <span class="var process">?process</span> .
<span class="comment"># Also, foreach <span class="var process">?protein</span>, <span class="var process">?protein</span> has a parent class which is linked by some predicate to <span class="var gene">?gene_record</span>.</span>
?protein rdfs:subClassOf ?protein_superclass .
?protein_superclass owl:equivalentClass ?restriction3 .
?restriction3 owl:onProperty sc:is_protein_gene_product_of_dna_described_by .
?restriction3 owl:hasValue <span class="var gene">?gene_record</span> .
<span class="comment"># Each <span class="var process">?process</span> (that we are interested in) is a subclass or component of the <em>signal transduction</em> process.</span>
GRAPH &lt;http://purl.org/commons/hcls/20070416/classrelations&gt; {
{ <span class="var process">?process</span> obo:part_of <span class="identifier" title="signal transduction">go:GO_0007165</span> }
UNION
{ <span class="var process">?process</span> rdfs:subClassOf <span class="identifier" title="signal transduction">go:GO_0007165</span> }
}
}</span>
<span class="glbl" id="pyrNeurSigTransduct_named_glbl" title="Gene Labels">GRAPH &lt;http://purl.org/commons/hcls/gene&gt; {
<span class="var gene">?gene_record</span> rdfs:label ?genename
}</span>
<span class="plbl" id="pyrNeurSigTransduct_named_plbl" title="Process Labels">GRAPH &lt;http://purl.org/commons/hcls/20070416&gt; {
<span class="var process">?process</span> rdfs:label ?processname
}</span>
<span class="senselab">GRAPH &lt;http://purl.org/ycmi/senselab/neuron_ontology.owl&gt; {
<span class="comment"># Foreach <span class="var">?gene</span>, <span class="var">?gene</span> senselab:has_nucleotide_sequence_described_by <span class="var gene">?gene_record</span> .</span>
?gene owl:equivalentClass ?restriction4 .
?restriction4 owl:onProperty senselab:has_nucleotide_sequence_described_by .
?restriction4 owl:hasValue <span class="var gene">?gene_record</span> .
<span class="comment"># Foreach <span class="var">?receptor_protein</span>, <span class="var">?receptor_protein</span> senselab:proteinGeneProductOf <span class="var">?gene</span> .</span>
?receptor_protein rdfs:subClassOf ?restriction5 .
?restriction5 owl:onProperty senselab:proteinGeneProductOf .
?restriction5 owl:someValuesFrom ?gene .
<span class="comment"># Find the labels of all such <span class="var">?receptor_protein</span>s.</span>
?receptor_protein rdfs:label ?receptor_protein_name
}</span>
}</pre>
<p>The named graphs help with both provenance and scaling. In the current approach, each RDF bundle is imported into its own named graph. This is useful for a number of reasons. First, we know the source of each named graph, so we can control and review which data sources are being accessed by our queries. Additionally, the association of a named graph with a data source serves as data provenance and can also be employed by schemes that exploit knowledge about the data source to assign confidence measures in a model of trust. For example, one of the knowledge base data sources resulted from text mining experiments to find protein <em>associations</em>. Users of the knowledge base can choose to view this evidence of association differently than the <em>associations</em> provided from a protein-protein interaction database. Also, named graphs support scaling by making it possible to update selected parts of the knowledge base, for example when the data source has new information or related ontologies are changed.</p>
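<p>Because each RDF bundle is imported into its own named graph, a query can also ask which graph supplied a given triple. A minimal sketch binds the graph name to a variable (<tt>?source</tt>, a name introduced here) instead of fixing it; which graphs it returns depends on how the store was loaded:</p>
<pre class="query">prefix go: &lt;http://purl.org/obo/owl/GO#&gt;
prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
SELECT ?source ?processname
WHERE {
<span class="comment"># ?source is bound to the URI of each named graph that asserts a label for signal transduction.</span>
GRAPH ?source {
<span class="identifier" title="signal transduction">go:GO_0007165</span> rdfs:label ?processname
}
}</pre>
<p>Each row pairs a label with the URI of the graph that asserts it, which is the kind of information a review or trust scheme could start from.</p>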
<h2 id="nextsteps">10 Opportunities for further development</h2>
<p>The knowledge base was initially designed for the purposes of a live demo. It also provided a basis for early work on the <a href="http://neurocommons.org">Neurocommons</a>, where its development continues. Some design choices were made to favor simplicity and maximal performance, including the use of a central triple store, and the design of the data and queries. Many of the choices were guided by the desire for transparency for a broader audience of biomedical informaticists. Several areas of possible improvement are noted here:</p>
<ul>
<li>Broaden the knowledge base to cover more of the related
domains such as structural chemistry, cells, anatomy,
physiology, behavior, protocols, and reagents.</li>
<li>The sources accessed by a query could eventually be spread
across repositories in separate locations to demonstrate the
ease of integrating distributed data sources using Semantic
Web technologies.</li>
<li>Create dynamic visual interfaces that provide the user with
the means to create and refine a query without requiring
prerequisite knowledge of the data or query language.</li>
</ul>
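<p>The second item, spreading sources across repositories, could be sketched with the <tt>SERVICE</tt> keyword of SPARQL 1.1 Federated Query, which postdates the work described in this note. The endpoint address below is a placeholder rather than a real service, and the sketch assumes that the SenseLab data were served from it while the gene labels remained in the local store:</p>
<pre class="query">prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
prefix owl: &lt;http://www.w3.org/2002/07/owl#&gt;
prefix senselab: &lt;http://purl.org/ycmi/senselab/neuron_ontology.owl#&gt;
SELECT ?genename ?receptor_protein_name
WHERE {
?gene_record rdfs:label ?genename .
<span class="comment"># Hypothetical remote endpoint; replace with the address of an actual SPARQL service.</span>
SERVICE &lt;http://example.org/senselab/sparql&gt; {
?gene owl:equivalentClass ?restriction4 .
?restriction4 owl:onProperty senselab:has_nucleotide_sequence_described_by .
?restriction4 owl:hasValue ?gene_record .
?receptor_protein rdfs:subClassOf ?restriction5 .
?restriction5 owl:onProperty senselab:proteinGeneProductOf .
?restriction5 owl:someValuesFrom ?gene .
?receptor_protein rdfs:label ?receptor_protein_name
}
}</pre>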
<p>There are also a number of open issues that should be addressed in future work:</p>
<ul>
<li>What relations should we use to connect a biological entity with artificial entities describing it, e.g. <em>protein records, sequence records, PubMed records</em>?</li>
<li>What is the best way to model evidence so that it can be recorded in data provenance?</li>
<li>How are information resources such as <em>database entry</em> or <em>XML document associated with a database entry</em> best represented in <a href="http://ifomis.org/bfo/">BFO</a>-friendly ontologies?</li>
<li>Mapping across terminologies: MeSH, in particular, has terms that are synonymous with many terms in other ontologies, including genes, proteins, GO terms, etc. We made efforts to harmonize the representation in certain cases, such as between SenseLab and GO. In other cases, such as MeSH, we have done no harmonization, so this should be reviewed for eventual corrections.</li>
</ul>
<h1 id="appendix">Appendix</h1>
<h2 id="rdfbundles"><b>A</b> RDF Sources</h2>
<p>The following table lists the RDF sources used to create the knowledge base:</p>
<div style="text-align: center;">
<table style="border-collapse: collapse; border-color: #000000" border="1" cellpadding="5">
<thead>
<tr>
<th>RDF bundle name</th>
<th>Last modified</th><th>Size</th>
<th>Description</th>
<th>RDF conversion by</th>
<th>Terms</th></tr>
</thead>
<tbody>
<tr><td id="aba"><a href="http://purl.org/hcls/2007/kb-sources/aba-2007-08-07.tgz">aba-2007-08-07.tgz</a></td>
<td>22-Sep-2007 </td> <td>51M</td>
<td>SC's extract of <a href="http://www.brain-map.org/"
>Allen Brain Atlas</a> metadata from their Web site.
Web site was read on 26 Feb 2007 or
shortly before</td>
<td>SC</td>
<td><a href="http://www.brain-map.org/pdf/ABATermsOfUse.pdf"
>terms of use</a></td></tr>
<tr><td id="addgene"><a href="http://purl.org/hcls/2007/kb-sources/addgene.ttl">addgene.ttl</a></td>
<td>16-May-2007</td> <td>1.1M</td>
<td><a href="http://www.addgene.org/">Addgene</a>
catalog (tab-delimited file)</td>
<td>SC</td>
<td>provided to Science Commons by Addgene</td></tr>
<tr><td id="bams"><a href="http://purl.org/hcls/2007/kb-sources/bams-from-swanson-98-4-23-07.owl">bams-from-swanson-98-4-23-07.owl</a></td>
<td>23-Apr-2007</td>
<td>5.6M</td>
<td>
<a href="http://brancusi.usc.edu/bkms/bamsxml.html">BAMS</a>
</td>
<td>HCLSIG/NIST</td>
<td>released without contract</td></tr>
<tr><td id="galen"><a href="http://purl.org/hcls/2007/kb-sources/galen.tgz">galen.tgz</a></td>
<td>22-Sep-2007</td> <td>1.9M</td>
<td><a href="http://www.co-ode.org/galen/"
>Galen from co-ode.org</a></td>
<td>-</td>
<td>released without contract </td></tr>
<tr><td id="geneinfo"><a href="http://purl.org/hcls/2007/kb-sources/gene-owl.tgz">gene-owl.tgz</a></td>
<td>08-May-2007</td> <td>7.7M</td>
<td>Select fields from Entrez Gene records</td>
<td>HCLSIG/SC</td> <!-- ? -->
<td><a href="http://www.ncbi.nlm.nih.gov/About/disclaimer.html"
>NCBI Copyright and Disclaimers</a></td></tr>
<tr><td id="pubmed"><a href="http://purl.org/hcls/2007/kb-sources/gene-pubmed.ttl.tgz">gene-pubmed.ttl.tgz</a></td>
<td>08-May-2007</td> <td>1.5M</td>
<td>Entrez Gene Extract from
<a href="ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz"
>ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz</a></td>
<td>HCLSIG/HP/SC</td>
<td><a href="http://www.ncbi.nlm.nih.gov/About/disclaimer.html"
>NCBI Copyright and Disclaimers</a></td></tr>
<tr><td id="goa"><a href="http://purl.org/hcls/2007/kb-sources/goa-in-owl.tgz">goa-in-owl.tgz</a></td>
<td>16-May-2007</td> <td>73M</td>
<td id="go">GO annotations from National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI)</td>
<td>HCLSIG/SC</td>
<td><a href="http://www.ncbi.nlm.nih.gov/About/disclaimer.html"
>NCBI Copyright and Disclaimers</a>;
<a href="http://www.ebi.ac.uk/Information/termsofuse.html"
>EBI terms of use</a></td></tr>
<tr><td id="homologene"><a href="http://purl.org/hcls/2007/kb-sources/homologene.tgz">homologene.tgz</a></td>
<td>16-May-2007</td> <td>626K</td>
<td>Homologene </td>
<td>HCLSIG/SC</td>
<td><a href="http://www.ncbi.nlm.nih.gov/About/disclaimer.html"
>NCBI Copyright and Disclaimers</a></td></tr>
<tr>
<td id="mesh"><!-- a href="http://sw.neurocommons.org/2007/kb-sources/medline/medline-mesh.tgz" -->medline-mesh.tgz<!-- /a --><br />
(<a href="http://apps.nlm.nih.gov/medlineplus/contact/index.cfm">contact Medline</a> for use terms)</td>
<td>16-May-2007</td> <td>758M</td>
<td>List of all associations of MeSH headings to papers indexed by
Medline
extracted from 2007 Medline baseline distribution</td>
<td>HCLSIG/SC</td>
<td><a href="http://www.nlm.nih.gov/databases/license/license_standard.html"
>License Agreement to Lease NLM Databases in Machine-Readable
Form</a> -
see below</td></tr>
<tr>
<td id="medline"><!-- a href="http://sw.neurocommons.org/2007/kb-sources/medline/medline-titles.tgz" -->medline-titles.tgz<!-- /a --><br />
(<a href="http://apps.nlm.nih.gov/medlineplus/contact/index.cfm">contact Medline</a> for use terms)</td>
<td>16-May-2007</td> <td>670M</td>
<td>Extracted from 2007 Medline baseline distribution</td>
<td>HCLSIG/SC</td>
<td>see below </td></tr>
<tr><td id="mqualhead"><a href="http://purl.org/hcls/2007/kb-sources/mesh-qualified-headings.ttl.gz"
>mesh-qualified-headings.ttl.gz</a></td>
<td>30-Apr-2007</td> <td>13M</td>
<td><a href="http://www.nlm.nih.gov/mesh/"
>NLM 2007 MeSH</a> descriptor/qualifier pairs
</td>
<td>HCLSIG/SC</td>
<td><a href="http://www.nlm.nih.gov/mesh/termscon.html"
>MeSH MOU</a> </td></tr>
<tr><td id="mesh-skos"><a href="http://purl.org/hcls/2007/kb-sources/mesh-skos.tgz">mesh-skos.tgz</a></td>
<td>16-May-2007</td> <td>13M</td>
<td><a href="http://www.nlm.nih.gov/mesh/"
>NLM 2007 MeSH</a>
</td>
<td> <a href="http://thesauri.cs.vu.nl/eswc06/"
>van Assem et al/SC</a></td>
<td><a href="http://www.nlm.nih.gov/mesh/termscon.html"
>MeSH MOU</a> </td></tr>
<tr><td id="mesh07-eswc06"><a href="http://purl.org/hcls/2007/kb-sources/mesh07-eswc06.rdfs">mesh07-eswc06.rdfs</a></td>
<td>28-Jun-2007</td> <td>2.2K</td>
<td><a href="http://thesauri.cs.vu.nl/eswc06/"
>van Assem et al's ontology</a>
(used by output of MeSH to SKOS conversion)</td>
<td> <a href="http://thesauri.cs.vu.nl/eswc06/"
>van Assem et al</a></td>
<td>released without contract</td></tr>
<tr><td id="textmining"><a href="http://purl.org/hcls/2007/kb-sources/neurocommons-text-mining.tgz">neurocommons-text-mining.tgz</a></td>
<td>05-May-2007</td> <td>24M</td>
<td><a href="http://sw.neurocommons.org/2007/text-mining.html"
>Neurocommons text mining pilot</a> - extracted from Temis
software applied to 7% of Medline records</td>
<td>SC</td>
<td>released without contract </td></tr>
<tr><td id="obo"><a href="http://purl.org/hcls/2007/kb-sources/obo-all.tgz">obo-all.tgz</a></td>
<td>22-Sep-2007</td> <td>36M</td>
<td><a href="http://www.berkeleybop.org/ontologies/"
>All OBO ontologies</a> </td>
<td>BBOP</td>
<td>released without contract</td></tr>
<tr><td id="obo-in-owl"><a href="http://purl.org/hcls/2007/kb-sources/obo-in-owl.tgz">obo-in-owl.tgz</a></td>
<td>16-May-2007</td> <td>2.6M</td>
<td>selected OBO ontologies, downloaded ~21 April 2007, augmented with
inferred relations</td>
<td>HCLSIG/SC</td>
<td>released without contract </td></tr>
<tr><td id="sciencecommons"><a href="http://purl.org/hcls/2007/kb-sources/sciencecommons.owl">sciencecommons.owl</a></td>
<td>28-Jun-2007</td> <td>19K</td>
<td>A bridging ontology, from <a href="http://sciencecommons.org">Science Commons</a>, importing other ontologies used in the prototype, defining classes and relations used to represent gene records and their contents, as well as few items referred to by imported data sources, but not available in a published ontology.</td>
<td>HCLSIG/SC</td>
<td>released without contract </td></tr>
<tr><td id="senselab"><a href="http://purl.org/hcls/2007/kb-sources/senselab.tgz">senselab.tgz</a></td>
<td>16-May-2007</td> <td>216K</td>
<td>From <a href="http://neuroweb.med.yale.edu/senselab/"
>Yale Senselab</a> </td>
<td>HCLSIG/Yale</td>
<td>released without contract </td></tr>
</tbody>
</table>
</div>
<p><i>Attributions: <br />Science Commons (SC), Berkeley Bioinformatics Open-source Projects (BBOP), Health Care and Life Sciences Interest Group (HCLSIG), National Institute of Standards and Technology (NIST), Hewlett Packard (HP)</i></p>
<h2 id="references"><b>B</b> References</h2>
<dl>
<dt><a name="ref-OWL" id="ref-OWL"></a>[OWL]</dt>
<dd><cite><a href="http://www.w3.org/TR/2004/REC-owl-features-20040210/">OWL Web Ontology Language Overview</a></cite>, <br />
Deborah L. McGuinness and Frank van Harmelen, Editors, <br />
W3C Recommendation, 10 February 2004, <br />
http://www.w3.org/TR/2004/REC-owl-features-20040210/ .<br />
<a href="http://www.w3.org/TR/owl-features/">Latest version</a> available at http://www.w3.org/TR/owl-features/ .</dd>
<dt><a id="ref-RDF" name="ref-RDF">[RDF]</a></dt>
<dd><cite><a href="http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/">Resource Description Framework (RDF) Model and Syntax Specification</a></cite>, <br />
Ora Lassila, Ralph R. Swick, Editors, <br />
World Wide Web Consortium Recommendation, 1999, <br />
http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/.<br />
<a href="http://www.w3.org/TR/REC-rdf-syntax/">Latest version</a> available at http://www.w3.org/TR/REC-rdf-syntax/.</dd>
<dt><a name="refCONCEPTS" id="refCONCEPTS">[RDF CONCEPTS]</a></dt>
<dd><cite><a href="http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/">Resource Description Framework (RDF): Concepts and Abstract Syntax</a></cite>, <br />
G. Klyne, J. J. Carroll, Editors, <br />
W3C Recommendation, 10 February 2004, <br />
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/ .<br />
<a href="http://www.w3.org/TR/rdf-concepts/" title="Latest version of Resource Description Framework (RDF): Concepts and Abstract Syntax">Latest version</a> available at http://www.w3.org/TR/rdf-concepts/ .</dd>
<dt><a name="ref-SPARQL" id="ref-SPARQL">[SPARQL]</a></dt>
<dd><cite><a href="http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/">SPARQL Query Language for RDF</a></cite>, <br />
A. Seaborne, E. Prud'hommeaux, Editors, <br />
W3C Recommendation, 15 January 2008, <br />
http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/ .<br />
<a href="http://www.w3.org/TR/rdf-sparql-query/" title="Latest version of SPARQL Query Language for RDF">Latest version</a> available at http://www.w3.org/TR/rdf-sparql-query/ .</dd>
<dt><a name="ref-TURTLE" id="ref-TURTLE"></a>[TURTLE]</dt>
<dd><cite><a href="http://www.w3.org/TeamSubmission/turtle/">Turtle - Terse RDF Triple Language</a></cite>, <br />
W3C Team Submission, 14 January 2008, <br />
http://www.w3.org/TeamSubmission/turtle/ .</dd>
<dt>[<a name="ref-SWAN" id="ref-SWAN">SWAN</a>]</dt>
<dd><cite>Alzforum and SWAN: The Present and Future of Scientific Web Communities</cite>, <br />
Clark T and Kinoshita J., <br />
Briefings in Bioinformatics 2007;8:163-171 doi:10.1093/bib/bbm012.</dd>
</dl>
<h2 id="resources"><b>C</b> Additional Resources</h2>
<p>The knowledge base has been installed at several locations. At the time of this writing, these locations provide SPARQL query access. However, it is not guaranteed that the endpoints at these addresses will persist or continue to serve the knowledge base described in this note:</p>
<ul>
<li><a href="http://sparql.neurocommons.org:8890/nsparql/">Science Commons</a></li>
<li><a href="http://hcls.deri.ie/hcls_demo.html">DERI Ireland</a></li>
</ul>
<p>Below are a few visual interfaces that make it possible to browse the results of a search on the knowledge base:</p>
<ul>
<li><a href="http://purl.org/hcls/2007/prototypes/GoogleMapAllenBrainAtlas">Prototype of a Google-Maps interface</a> to the Allen Brain Atlas.</li>
<li><a href="http://purl.org/hcls/2007/prototypes/ExhibitGeneExpression.html">Visualization</a> of gene expression data using <a href="http://simile.mit.edu/exhibit/">Exhibit</a></li>
</ul>
<p><a href="http://ycmi.med.yale.edu/entrez_neuron.html">Entrez Neuron</a> was developed by the SenseLab team as a graphical user interface for querying the SenseLab ontologies.</p>
<p>We used the open source edition of the Openlink Virtuoso repository from <a href="http://sourceforge.net/projects/virtuoso/">http://sourceforge.net/projects/virtuoso/</a>.</p>
<p>The actions and scripts that were used to create the knowledge base on a commodity PC have been documented by several HCLSIG members. The necessary instructions and scripts that were used will be listed here as completely as possible:</p>
<ul>
<li>Repository <a href="http://esw.w3.org/topic/HCLS/Banff2007Demo/HCLS/Banff2007Demo/HowToMakeOneForYourself">installation and steps</a> for creating a mirror repository have been documented by Donald Doherty.</li>
<li>All <a href="http://svn.neurocommons.org/svn/trunk/convert/">conversion scripts</a> from Science Commons are available under a
<a href="http://sw.neurocommons.org/2007/LICENSE.txt">BSD license</a>.</li>
<li><a href="http://thesauri.cs.vu.nl/eswc06/">MeSH conversion to SKOS</a> was performed with an approach outlined in a 2006 European Semantic Web Conference paper from Mark van Assem et al.</li>
</ul>
<p>The following resources may be of interest for future work:</p>
<ul>
<li><a href="http://www.ontotext.com/owlim/">OWLIM</a> is a high-performance <a href="http://www.ontotext.com/inference/semantic_repository.html">semantic repository</a>
developed in Java. It is packaged as a Storage and Inference Layer (SAIL) for
the <a href="http://www.openrdf.org/">Sesame</a> RDF database.</li>
<li><a href="http://clarkparsia.com/weblog/2007/10/26/towards-sparql-dl-evaluation-in-pellet/">SPARQL-DL</a></li>
</ul>
<h2 id="acknowledgements"><b>D</b> Acknowledgements</h2>
<p>In memory of our friend and colleague William Bug, Ontological Engineer.</p>
<p>Special thanks to: Alan Ruttenberg (Science Commons) who coordinated the assembly, conversion, and deployment
of the data sets and ontologies and Susie Stephens (Eli Lilly) who coordinated the BioRDF task force. Together they presented the initial version of the
knowledge base at a WWW2007 Banff workshop.</p>
<p><span style="text-decoration: underline;">Contributors:</span></p>
<p>Many contributed to the development, documentation and validation of the knowledge base, as
well as the thinking behind it.</p>
<p>
Mikail Bota (USC) kindly provided the BAMS database for our use, and John Barkley (NIST) converted it to RDF.
Huajun Chen (Zhejiang University), Matthias Samwald (Yale Center for Medical Informatics; DERI Galway; Semantic Web Company), Alan Ruttenberg, and Kei-Hoi Cheung (Yale Center for Medical Informatics) participated in the SenseLab RDF Conversion.
Members of the SWAN team: Tim Clark, Paolo Ciccarese, June Kinoshita, Gwen Wong, and Elizabeth Wu contributed the SWAN data source.
June Kinoshita, Gwen Wong, Elizabeth Wu, Don Doherty (Brainstage Research Inc.), William Bug (School of Medicine, UCSD), and Alan Ruttenberg worked on the neurodegenerative disease use cases.
Ray Hookway (HP) provided digests from Entrez Gene that were more easily converted to RDF.
Jonathan Rees (Science Commons) did the RDF conversions of Addgene,
Pubmed to Gene, Medline, and MeSH, the Neurocommons text mining pilot,
and compiled the data source and licensing information for this document.
Alan Ruttenberg did the RDF conversion of Entrez
Gene records, GO Annotations, Allen Brain Atlas, and Homologene, and wrote the
Science Commons ontology.
Alan Ruttenberg and Matthias Samwald wrote the SPARQL queries described in this document.
Chris Mungall (NCBO) wrote the converter that produced the OWL versions of the OBO ontologies and consulted on matters of ontology.
</p><p>
Eric Neumann (Clinical Semantics Group) produced the Exhibit visualization.
Alan Ruttenberg developed the Google Mouse prototype, with contributions
from Mike Travers (CollabRX), Brian Gilman (SciLink), and Tom
Stambaugh (Zeetix).
Don Doherty, Matthias Samwald, Holger Stenzorn (DERI), M. Scott Marshall, and Eric Prud'hommeaux have presented this work at conferences.</p>
<p>Barry Smith (State University of New York at Buffalo, USA) provided
advice on ontology work and led the development of the <a href="http://ifomis.org/bfo">Basic Formal
Ontology</a>, which inspired all ontology work related to the knowledge
base.</p>
<p>William Bug (School of Medicine, UCSD), Michel Dumontier (Carleton
University), and Holger Stenzorn (DERI) reviewed and gave detailed
comments on an initial draft of this note. Alan Ruttenberg, Jonathan
Rees, and Susie Stephens reviewed and contributed to several versions
of the document.
Susie Stephens coordinated the BioRDF task force, worked on
presentations of the work, and wrote the introduction to this
document.
M. Scott Marshall (University of Amsterdam) and Eric
Prud’hommeaux (W3C) edited and coordinated the production of
this note. Eric Prud’hommeaux created the figures.
</p>
<p>We would like to offer special thanks to the organizations that contributed equipment and services. Through Ray Hookway and Jeannine Crockford, <a href="http://hp.com/">Hewlett Packard</a> donated two machines for a period of six weeks during the demo. Science Commons hosted the prototype during development and continues to host and develop a knowledge base derived from the prototype as part of the <a href="http://neurocommons.org/page/Main_Page">Neurocommons</a>. <a href="http://www.csail.mit.edu/index.php">MIT CSAIL</a> hosts Science Commons and provided computer and networking infrastructure. </p><p> Kingsley Idehen, Orri Erling, Ivan Mikhailov, Mitko Iliev, Patrick van Kleef, and Anton Avramov from <a href="http://www.openlinksw.com/">OpenLink Software</a> provided rapid technical support, including several custom builds of the <a href="http://www.openlinksw.com/virtuoso/">Virtuoso</a> triple store to address early performance issues, making it possible to develop the prototype on an aggressive schedule. Evren Sirin from <a href="http://clarkparsia.com/">Clark and Parsia</a> provided support for the <a href="http://pellet.owldl.com/">Pellet OWL reasoner</a>.
</p>
<p>In addition to the data sources that were incorporated into the prototype, other data that was not ultimately included was provided by Judith Blake (MGD), Simon Twigger (RGD), and Colin Knep (Alzforum).</p>
<div class="nav"><a href="http://validator.w3.org/check/referer">
<img src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0!" height="31" width="88" /></a>
</div><hr></hr>
</body>
</html>