As part of the e-LICO project, Simon Jupp and I, together with Julie Klein and Joost Schanstra, colleagues from INSERM in Toulouse, have made a kidney and urinary pathway knowledgebase (KUPKB). We have used a mixture of OWL- and RDF-based technologies to create a resource that KUP biologists can query across many levels, from gross KUP anatomy through cells to cell components, gene products and genes. It also includes diseases and experiments (metabolomic, transcriptomic, proteomic) on various aspects of the KUP field. Most importantly, we haven’t left our KUP biologists to exploit the KUPKB through SPARQL alone; instead, Simon Jupp has made the iKUP browser, a GWT Web application for browsing and querying the KUPKB. We’ve written up our KUPKB work in the Journal of Biomedical Semantics.
Our JBMS paper describes our process in much more detail, but the outline is:
- Make a KUP ontology (KUPO) by gluing together various extant ontologies that cover the domain: gross anatomy; cells; gene products; attributes of gene products; disease; descriptions of investigations and so on;
- Join the ontologies together so that our domain entities are described well enough to ask the questions we want to ask;
- Add some bits that are missing;
- Populate the schema formed by the KUPO with lots of “instance” data to form a KUP knowledgebase, the KUPKB.
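To make the four steps a little more concrete, here is a minimal sketch that treats each ontology as a set of (subject, predicate, object) triples in plain Python. It is not how the KUPO was actually built (that used OWL tooling and real OBO ontologies); every IRI, including the `ex:` prefix and the `ex:part_of`, `ex:located_in` and `ex:studies` predicates, is a hypothetical placeholder, and the data are toy values for illustration only.

```python
from typing import Set, Tuple

# A triple is (subject, predicate, object); all IRIs below are hypothetical.
Triple = Tuple[str, str, str]

# Fragments standing in for extant ontologies (not real OBO identifiers).
anatomy: Set[Triple] = {("ex:glomerulus", "ex:part_of", "ex:kidney")}
cells: Set[Triple] = {("ex:podocyte", "rdfs:subClassOf", "ex:epithelial_cell")}
gene_products: Set[Triple] = {("ex:proteinA", "rdfs:subClassOf", "ex:protein")}


def glue(*ontologies: Set[Triple]) -> Set[Triple]:
    """Step 1: union the source ontologies into a single schema graph."""
    merged: Set[Triple] = set()
    for onto in ontologies:
        merged |= onto
    return merged


kupo = glue(anatomy, cells, gene_products)

# Step 2: join the ontologies with relationships that cross between them.
kupo |= {
    ("ex:podocyte", "ex:part_of", "ex:glomerulus"),
    ("ex:proteinA", "ex:located_in", "ex:podocyte"),
}

# Step 3: add a class that none of the sources provided.
kupo |= {("ex:KidneyTranscriptomicExperiment", "rdfs:subClassOf", "ex:Experiment")}

# Step 4: populate the schema with instance data to form the knowledgebase.
instances: Set[Triple] = {
    ("ex:exp1", "rdf:type", "ex:KidneyTranscriptomicExperiment"),
    ("ex:exp1", "ex:studies", "ex:proteinA"),
}
kupkb = kupo | instances

print(len(kupo), len(kupkb))  # prints: 6 8
```

Keeping the schema (`kupo`) and the populated KB (`kupkb`) as separate values mirrors the distinction between the KUPO and the KUPKB.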
We used OBO ontologies, especially the Gene Ontology (GO), a mouse anatomy ontology, the Cell Ontology and so on. We took portions of UniProt and Bio2RDF. We also added in lots of experimental data to make the KUPKB.
We used lots of off-the-shelf ontologies to make the KUPKB, and this is good. In places, however, we made our own and sidestepped some ongoing efforts: there are ongoing efforts to standardise the description of experiments, such as microarray experiments, in RDF, but we couldn’t wait. We just wanted some very simple questions to be answered. Much of the time spent on standards is devoted to edge cases, and the time scales involved don’t match those of the project in question. Also, we are content to gloss over some of the niceties of the experiments, as our users want just enough information to go and read the paper properly. We don’t actually need everything about an experiment in the computational knowledge at the moment, but this isn’t to say we won’t in the future; of course, we’d be happy to have it eventually, as it will almost certainly be useful at some point. This is, in essence, the case for application ontologies: they do what is necessary for an application setting, rather than being a reference for all to use and “refer” to for a normative view. This is not to say, however, that a reference ontology cannot be used in an application setting, though one might not use all of such an ontology.
The KUPKB is small for an RDF store, only around 20 million triples, and the RDF store has worked OK for us. Having to upload the whole set of triples again in order to delete something is a pain, though. The advent of SPARQL, now including aggregation, has made querying easier. Ultimately, we’d like to keep the whole lot as OWL and use automated reasoners, but at the moment they are not really up to it in this setting. We use automated reasoners on the KUPO to add the inferred subsumptions and to “check” it; we’ve also reasoned over smaller portions of the KUPKB, such as the experiments, to support some reasonable queries. The iKUP application does use the OWL API to reason over part of the KUPO to help with some queries, but we can’t do it all.
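Inferred subsumptions matter for aggregate queries: a count of “all experiments” should include instances typed only with a more specific class. The stdlib-only sketch below materialises just the transitive part of `rdfs:subClassOf` (a real OWL reasoner, such as one driven through the OWL API, also infers subsumptions from class definitions, which this does not attempt) and then counts instances per class in the spirit of a SPARQL `GROUP BY` with `COUNT`. The experiment classes and instances are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
SUB = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialise_subsumptions(triples: Set[Triple]) -> Set[Triple]:
    """Return the triples plus every subClassOf edge implied by transitivity only."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == SUB:
            parents[s].add(o)
    inferred: Set[Triple] = set()
    for cls in list(parents):
        stack, seen = list(parents[cls]), set()
        while stack:  # depth-first walk up the hierarchy from cls
            sup = stack.pop()
            if sup in seen:
                continue
            seen.add(sup)
            inferred.add((cls, SUB, sup))
            stack.extend(parents.get(sup, ()))
    return triples | inferred


def count_instances(triples: Set[Triple]) -> Dict[str, int]:
    """Count instances per class, crediting every superclass of an asserted type.

    Assumes subsumptions were materialised first, so a class's subClassOf
    targets already include all of its ancestors.
    """
    ancestors: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == SUB:
            ancestors[s].add(o)
    counts: Dict[str, int] = defaultdict(int)
    for s, p, o in triples:
        if p == TYPE:
            for cls in {o} | ancestors[o]:
                counts[cls] += 1
    return dict(counts)


# Hypothetical schema and instance data.
schema = {
    ("ex:MicroarrayExperiment", SUB, "ex:TranscriptomicExperiment"),
    ("ex:TranscriptomicExperiment", SUB, "ex:Experiment"),
    ("ex:ProteomicExperiment", SUB, "ex:Experiment"),
}
data = {
    ("ex:exp1", TYPE, "ex:MicroarrayExperiment"),
    ("ex:exp2", TYPE, "ex:TranscriptomicExperiment"),
    ("ex:exp3", TYPE, "ex:ProteomicExperiment"),
}
kb = materialise_subsumptions(schema) | data
print(count_instances(kb)["ex:Experiment"])  # prints: 3
```

Without the materialisation step, `count_instances(schema | data)` credits `ex:Experiment` with only two instances, because `ex:exp1` is reachable only through an inferred subsumption.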
The KUPKB is really a bespoke RDF KB; it hasn’t gone for size, which seems to be an aim of a lot of work at the moment. Instead, we’ve pulled together a bespoke set of information, as RDF, to meet a particular application need. In doing so, we’ve used portions of the large RDF publications rather than taking them wholesale. Also, as well as publishing some RDF, we’ve been determined to have it consumed. This means providing some kind of interaction with the KUPKB that isn’t just a SPARQL endpoint.
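Pulling a bespoke portion out of a large RDF publication can be as simple as keeping the triples reachable from the resources you care about. The sketch below assumes the source has already been parsed into (subject, predicate, object) tuples and follows outgoing links for a fixed number of hops; the function name `extract_portion`, the hop limit and the toy triples are all hypothetical, and in practice one might instead issue SPARQL `CONSTRUCT` queries against the source.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def extract_portion(source: Iterable[Triple], seeds: Set[str], hops: int = 1) -> Set[Triple]:
    """Keep the triples reachable from the seed resources within `hops` outgoing steps."""
    triples = list(source)
    frontier, visited = set(seeds), set()
    kept: Set[Triple] = set()
    for _ in range(hops):
        reached: Set[str] = set()
        for s, p, o in triples:
            if s in frontier:
                kept.add((s, p, o))
                reached.add(o)
        visited |= frontier
        # Only expand resources that have not been expanded already.
        frontier = reached - visited
        if not frontier:
            break
    return kept


# A toy stand-in for a large publication such as a UniProt or Bio2RDF dump.
big_source = [
    ("ex:geneA", "ex:encodes", "ex:proteinA"),
    ("ex:proteinA", "ex:located_in", "ex:membrane"),
    ("ex:geneZ", "ex:encodes", "ex:proteinZ"),
]
print(len(extract_portion(big_source, {"ex:geneA"}, hops=2)))  # prints: 2
```

The hop limit is the knob that keeps the result bespoke: each extra hop pulls in more of the source, which is exactly the growth in size this approach tries to avoid.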
One of the really nice things that Simon has done is to create a front end to the KUPKB using GWT. This is the iKUP browser, and it forms the access point to the KUPKB. At the moment, it provides a gene-centric way to browse the KUPKB and retrieve experiments on certain genes. For instance, one can browse anatomy, down to cells, to isolate genes or gene products that are involved in a particular process in a particular disease, and then retrieve experiments that involve those genes or gene products. So far, the response from the KUP community to the iKUP browser has been very positive, and it will be expanded to offer other ways of browsing. Some hypotheses found via iKUP are being tested in the laboratory, which for me is very exciting.
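The gene-centric route described above (anatomy down to cells, then genes, then experiments) amounts to a chain of lookups over the KB. iKUP itself is a GWT application, so the Python below is only a hypothetical sketch of that chain over an in-memory triple set; the predicates `ex:part_of`, `ex:expressed_in`, `ex:involved_in`, `ex:associated_with` and `ex:studies`, and all the data, are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy KB: anatomy, gene annotations and experiments.
KB: Set[Triple] = {
    ("ex:glomerulus", "ex:part_of", "ex:kidney"),
    ("ex:podocyte", "ex:part_of", "ex:glomerulus"),
    ("ex:geneA", "ex:expressed_in", "ex:podocyte"),
    ("ex:geneA", "ex:involved_in", "ex:processX"),
    ("ex:geneA", "ex:associated_with", "ex:diseaseY"),
    ("ex:geneB", "ex:expressed_in", "ex:liver"),
    ("ex:geneB", "ex:involved_in", "ex:processX"),
    ("ex:geneB", "ex:associated_with", "ex:diseaseY"),
    ("ex:exp1", "ex:studies", "ex:geneA"),
    ("ex:exp2", "ex:studies", "ex:geneB"),
}


def subjects(kb: Set[Triple], predicate: str, objects: Set[str]) -> Set[str]:
    """All subjects linked by `predicate` to any of `objects`."""
    return {s for s, p, o in kb if p == predicate and o in objects}


def region_below(kb: Set[Triple], structure: str) -> Set[str]:
    """The structure plus everything transitively part_of it (e.g. down to cells)."""
    children: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in kb:
        if p == "ex:part_of":
            children[o].add(s)
    region: Set[str] = set()
    stack = [structure]
    while stack:
        node = stack.pop()
        if node not in region:
            region.add(node)
            stack.extend(children[node])
    return region


def experiments_for(kb: Set[Triple], structure: str, process: str, disease: str) -> Set[str]:
    """Experiments on genes expressed within `structure` that are involved in
    `process` and associated with `disease`."""
    genes = (
        subjects(kb, "ex:expressed_in", region_below(kb, structure))
        & subjects(kb, "ex:involved_in", {process})
        & subjects(kb, "ex:associated_with", {disease})
    )
    return subjects(kb, "ex:studies", genes)


print(experiments_for(KB, "ex:kidney", "ex:processX", "ex:diseaseY"))  # prints: {'ex:exp1'}
```

Here `ex:geneB` shares the process and disease with `ex:geneA` but is filtered out by the anatomy step, which is the point of starting the browse from gross anatomy.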
Whilst the bits and pieces of the iKUP browser are generic to almost any presentation of, and interaction with, an RDF KB, I don’t really believe in generic tools for presenting and interacting with such KBs. I believe in generic tool bits, but not in “one size fits all” for this kind of interaction. There may be a place for such generic tool kits, but tailoring to a specific situation will always give better results. What we need are high-level tool kits for making such front ends. What Simon has done with GWT was reasonably straightforward and didn’t take too much time. Simon has also used GWT, with similar ease, for the Logical Gene Ontology Annotations (GOAL) browser, a UI for querying Gene Ontology Annotations with OWL.