Archive for July, 2011

A Simple Knowledge Organisation tool (SKOT)

July 18, 2011

Protege is a complex tool. By default it offers a user everything that it is possible to do with OWL (relationships, quantifiers, class expressions, and so on) and there can be many choices at any point. Often Protege is too complex, especially at an early stage of authoring an ontology where one might simply wish to “sketch” something out, perhaps to migrate it to a more sophisticated form later. Andrew Gibson, when he worked in our group, wanted a simple tool for “sketching” an ontology. Such a tool would be based on a simple “blob and line” model, corresponding to classes, subclass axioms and existential restrictions. Matt Horridge developed a tool called Montage from Andrew’s specification; it was only ever a prototype and never saw the light of day outside that particular office.

Inspired by this, I offered a third year undergraduate project to develop a “simple knowledge organisation tool” — SKOT. A student called Mark Jordan took this project on and has done a good job, given the time restrictions of a University of Manchester Computer Science third year undergraduate project.

Mark developed a tool called SKOT – the Simple Knowledge Organisation Tool. It is an open source project under an LGPL licence and the SKOT code is available on SourceForge.

The picture shows SKOT at a point just before the user is about to export the sketch into an OWL file. There is a “term list”, where words or terms that might form the blobs or classes in the ontology are “stored”. The example is the traditional “university” modelling example, with the terms “University”, “Person”, “Student”, “Undergraduate”, “Mark”, “Postgraduate”, “Teaching Assistant”, “Lecturer”, “Lecture” and “TA Lecture”. These terms can be selected and dragged on to the canvas, where they become blobs that represent classes or, as for the term “Mark” in the list above, individuals. Relationships are created by selecting a blob, choosing to create a relationship, moving to the “target” and then finishing the interaction. Relationships are sub-/super-class or of other types as specified by the user. To do this, SKOT takes the following approach:

  1. There is a canvas on which blob and line pictures can be drawn; blobs are classes and lines are relationships.
  2. New blobs and lines can be created on the canvas.
  3. Words or terms can be dragged from a word list onto the canvas, where they form new blobs.
  4. The diagram can be exported as OWL through the OWL API (a sketch of this step follows the list).
  5. SKOT projects can be saved and re-loaded.
  6. It is possible to load in the existential graph portion of an existing OWL ontology, extend it in SKOT and re-save it in the original file.
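
To give an idea of what the export step produces, here is a minimal sketch of how a blob-and-line diagram might be turned into axioms with the OWL API; the class, property and individual names are taken from the university example above, but the code and the mapping of line types to axioms are my own illustration, not SKOT’s actual implementation.

    // Illustrative sketch only (not SKOT's actual code): blobs and lines to OWL axioms.
    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.model.*;

    import java.io.File;

    public class SketchToOwl {
        public static void main(String[] args) throws Exception {
            OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
            OWLDataFactory factory = manager.getOWLDataFactory();
            IRI base = IRI.create("http://example.org/skot/university.owl");
            OWLOntology ontology = manager.createOntology(base);

            // Each blob on the canvas becomes a class (or, for "Mark", an individual).
            OWLClass person = factory.getOWLClass(IRI.create(base + "#Person"));
            OWLClass student = factory.getOWLClass(IRI.create(base + "#Student"));
            OWLClass lecture = factory.getOWLClass(IRI.create(base + "#Lecture"));
            OWLNamedIndividual mark = factory.getOWLNamedIndividual(IRI.create(base + "#Mark"));

            // A plain line between two class blobs becomes a SubClassOf axiom...
            manager.addAxiom(ontology, factory.getOWLSubClassOfAxiom(student, person));

            // ...a line of another type becomes an existential restriction on that property...
            OWLObjectProperty attends = factory.getOWLObjectProperty(IRI.create(base + "#attends"));
            manager.addAxiom(ontology,
                    factory.getOWLSubClassOfAxiom(student,
                            factory.getOWLObjectSomeValuesFrom(attends, lecture)));

            // ...and an individual blob becomes a class assertion.
            manager.addAxiom(ontology, factory.getOWLClassAssertionAxiom(student, mark));

            manager.saveOntology(ontology, IRI.create(new File("university.owl").toURI()));
        }
    }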

There’s a lot of user interface work involved in SKOT. Groups of blobs can be selected so that each member of the group forms a subclass relationship to a selected superclass. The layout is in the hands of the user.

James Eales used SKOT to make a toy ontology of fish, starting by just typing in a load of words about fish. This is fine, but hooking SKOT up to an automatic term recognition tool, as well as allowing hand-typing, would be good. Once in the word list, the terms are ready to drag into the window where the “ontology” can be sketched.

SKOT with a list of words about fish.

Next, James moved terms from the list on to SKOT’s sketch canvas and made a basic hierarchy of terms. Note that classes and instances are differentiated. Relationship types other than subclass are possible, but are not used here.

The words now arranged in a simple tree.

This was then saved into an OWL file, imported into Protege and shown using OWLViz.

Note that the export from SKOT to OWL appears to have gone wrong – Cod is now a warm water fish, whereas in SKOT it was a cold water fish; I’m sure this is easily remedied.

Mark did a basic evaluation, getting some people to install and use SKOT to draft some ontologies. Two of these users made ontologies — one an ontology of guitars and one an ontology of fish. All the users were basically impressed, but also gave long lists of things to do — one user, for example, just found it difficult to work out what to do on start-up; however, once he got going, all was basically OK.

SKOT is currently a stand-alone application and it really should be a Protege plugin. There’s also a lot more to do on SKOT, both little things and big things. On the list of little things is fixing various labels in the UI so that they make better sense. On the larger side of things, we need:

  1. SKOT connected to an automatic term recognition tool, especially with a PDF-to-text converter;
  2. Regular expression searches over the term list and editing of the list;
  3. The ability to save the term list and to import into it from various sources;
  4. Better scalability of the drawing of the blobs and lines, which is one of the main issues with SKOT. Some zooming would probably be useful. Montage had a facility to “fold away” portions of the sketch that weren’t currently the focus of attention. Andrew Gibson had a nice design for how to deal with many of these issues; those designs are not in SKOT, but some are in the Montage prototype and may see the light of day eventually. There are lots of UI tricks to be played here, but I also suspect that the utility of such a tool lies in its small-scale aspect and that such things are inherently very difficult to scale.

Mark’s report on SKOT and how it was built is available.

A Kidney and Urinary Pathway Knowledgebase

July 17, 2011

As part of the e-LICO project, Simon Jupp and I, together with Julie Klein and Joost Schanstra, colleagues from INSERM in Toulouse, have made a kidney and urinary pathway knowledgebase (KUPKB). We have used a mixture of OWL- and RDF-based technologies to create a resource for KUP biologists to query across many levels, from gross KUP anatomy through cells to cell components, gene products and genes. It also includes diseases and experiments (metabolomic, transcriptomic, proteomic) on various aspects of the KUP field. Most importantly, we haven’t left our KUP biologists to use SPARQL in order to exploit the KUPKB; instead, Simon Jupp has made the iKUP browser, a GWT Web application for browsing and querying the KUPKB. We’ve written up our KUPKB work in the Journal of Biomedical Semantics.

Our JBMS paper describes our process in much more detail, but the outline is:

  1. Make a KUP ontology (KUPO) by gluing together various extant ontologies that cover the domain – gross anatomy, cells, gene products, attributes of gene products, disease, descriptions of investigations and so on;
  2. Join the ontologies together in order to sufficiently describe our domain entities to ask the questions we want to ask;
  3. Add some bits that are missing;
  4. Populate the schema formed by the KUPO with lots of “instance” data to form a KUP knowledgebase, the KUPKB.

We used OBO ontologies, especially the GO, a mouse anatomy ontology, the cell ontology and so on. We took portions of UniProt and Bio2RDF. We also added in lots of experimental data to make the KUPKB.
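
As a rough illustration of the gluing step, the sketch below uses the OWL API to import source ontologies and to add a bridging axiom between a cell type and an anatomical part; the IRIs, names and the part_of property here are placeholders of my own, not the actual KUPO identifiers.

    // Illustrative sketch only: gluing imported ontologies with a bridging axiom.
    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.model.*;

    public class GlueOntologies {
        public static void main(String[] args) throws Exception {
            OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
            OWLDataFactory factory = manager.getOWLDataFactory();
            OWLOntology kupo = manager.createOntology(IRI.create("http://example.org/kupo.owl"));

            // Pull in the source ontologies by owl:imports (placeholder IRIs).
            for (String source : new String[] {
                    "http://example.org/gene-ontology.owl",
                    "http://example.org/mouse-anatomy.owl" }) {
                OWLImportsDeclaration imports = factory.getOWLImportsDeclaration(IRI.create(source));
                manager.applyChange(new AddImport(kupo, imports));
            }

            // Bridging axiom (made-up IRIs): PodocyteCell SubClassOf (part_of some Kidney).
            OWLClass podocyte = factory.getOWLClass(IRI.create("http://example.org/kupo.owl#PodocyteCell"));
            OWLClass kidney = factory.getOWLClass(IRI.create("http://example.org/mouse-anatomy.owl#Kidney"));
            OWLObjectProperty partOf = factory.getOWLObjectProperty(IRI.create("http://example.org/kupo.owl#part_of"));
            manager.addAxiom(kupo,
                    factory.getOWLSubClassOfAxiom(podocyte,
                            factory.getOWLObjectSomeValuesFrom(partOf, kidney)));
        }
    }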

We used lots of off-the-shelf ontologies to make the KUPKB and this is good. In places, however, we made our own and avoided some on-going efforts; there are on-going efforts to standardise descriptions of experiments, such as microarray, in RDF, but we couldn’t wait. We just wanted very simple questions to be answered. Much time in standards is devoted to edge cases and the time scales involved don’t match the project in question. Also, we are content to gloss over some of the niceties of the experiment as our users want just enough information to go and read the paper properly. We don’t actually need everything about the experiment in the computational knowledgebase at the moment, but this isn’t to say we won’t in the future. Of course, we’d be happy to have it eventually, as it will almost certainly be useful at some point. This is, in essence, the case for application ontologies; they do what is necessary for an application setting, rather than being a reference for all to use and “refer” to for a normative view. This is not to say, however, that a reference ontology cannot be used in an application setting – though one might not use all of such an ontology.

The KUPKB is small for an RDF store, only around 20 million triples, and the RDF store has worked OK for us. Having to upload the whole set of triples in order to delete something is a pain. The advent of SPARQL 1.1 has made querying easier, with the inclusion of aggregation. Ultimately, we’d like to keep the whole lot as OWL and use automated reasoners, but at the moment they are not really up to it in this setting. We use automated reasoners on the KUPO, to put in the inferred subsumptions and to “check” it out; we’ve also reasoned over smaller portions of the KUPKB, like experiments, to provide some reasonable queries. The iKUP application does use the OWL API to reason over a bit of the KUPO to help us out with some queries, but we can’t do it all.
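
That reasoning step over the KUPO might look something like the sketch below, which classifies the ontology and asks for inferred subclasses; HermiT is used as an example reasoner and the ontology location and class IRI are placeholders, so this is illustrative rather than iKUP’s actual code.

    // Illustrative sketch only: classifying the KUPO and retrieving inferred subclasses.
    import org.semanticweb.HermiT.Reasoner;
    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.model.*;
    import org.semanticweb.owlapi.reasoner.*;

    import java.util.Set;

    public class ClassifyKupo {
        public static void main(String[] args) throws Exception {
            OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
            // Placeholder location for the KUPO.
            OWLOntology kupo = manager.loadOntologyFromOntologyDocument(
                    IRI.create("http://example.org/kupo.owl"));

            // Classify with an example reasoner (HermiT here) and check consistency.
            OWLReasonerFactory reasonerFactory = new Reasoner.ReasonerFactory();
            OWLReasoner reasoner = reasonerFactory.createReasoner(kupo);
            System.out.println("Consistent: " + reasoner.isConsistent());

            // Ask for the inferred direct subclasses of a placeholder class.
            OWLClass kidneyCell = manager.getOWLDataFactory()
                    .getOWLClass(IRI.create("http://example.org/kupo.owl#KidneyCell"));
            Set<OWLClass> subClasses = reasoner.getSubClasses(kidneyCell, true).getFlattened();
            for (OWLClass subClass : subClasses) {
                System.out.println(subClass);
            }
            reasoner.dispose();
        }
    }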

The KUPKB is really a bespoke RDF KB; it hasn’t gone for size, which seems to be an aim of a lot of work at the moment. Instead, we’ve pulled together a bespoke set of information, as RDF, in order to meet a particular application need. In doing so, we’ve used portions of the large published RDF datasets, but not used them wholesale. Also, as well as publishing some RDF, we’ve been determined to have it consumed as well. This means providing some kind of interaction with the KUPKB that isn’t just a SPARQL end point.

One of the really nice things that Simon has done is to create a front end to the KUPKB using GWT. This is the iKUP browser, and it forms the access point to the KUPKB. It provides, at the moment, a gene-centric way to browse the KUPKB and retrieve experiments on certain genes. For instance, one can browse anatomy, down to cells, to isolate genes/gene products that are involved in a particular process in a particular disease and then retrieve experiments that involve those genes or gene products. So far, the response from the KUP community to the iKUP browser has been very positive and it will expand to offer various ways of browsing. Some hypotheses found via iKUP are being tested in the laboratory, which for me is very exciting.
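
To give a flavour of the sort of query that sits behind that browsing, here is a sketch using Jena ARQ against a SPARQL endpoint; the endpoint URL, the predicates and the aggregation are all invented for illustration and are not the KUPKB’s actual vocabulary or schema.

    // Illustrative sketch only: the KUPKB vocabulary and endpoint below are made up.
    import com.hp.hpl.jena.query.*;

    public class KupkbQuery {
        public static void main(String[] args) {
            String sparql =
                "PREFIX kup: <http://example.org/kupkb/> \n" +
                "SELECT ?gene (COUNT(?experiment) AS ?n) WHERE { \n" +
                "  ?gene kup:expressedIn kup:Podocyte . \n" +
                "  ?gene kup:associatedWith kup:DiabeticNephropathy . \n" +
                "  ?experiment kup:measures ?gene . \n" +
                "} GROUP BY ?gene ORDER BY DESC(?n)";

            Query query = QueryFactory.create(sparql, Syntax.syntaxSPARQL_11);
            QueryExecution execution = QueryExecutionFactory.sparqlService(
                    "http://example.org/kupkb/sparql", query);  // placeholder endpoint
            try {
                ResultSet results = execution.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.next();
                    System.out.println(row.get("gene") + "\t" + row.get("n"));
                }
            } finally {
                execution.close();
            }
        }
    }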

Whilst the bits and pieces of the iKUP browser are generic to almost any presentation of and interaction with an RDF KB, I don’t really believe in generic tools for presenting and interacting with such KBs. I believe in generic tool bits, but not in “one size fits all” for this kind of interaction. There may be a place for such generic tool kits, but tailoring to a specific situation will always give better results. What we need are high-level tool kits for making such front ends. What Simon has done with GWT was reasonably straightforward and didn’t take too much time. Simon has also used GWT for the Logical Gene Ontology Annotations (GOAL) browser, a UI for querying Gene Ontology Annotations with OWL, again with similar straightforwardness.

A Travel Ontology

July 12, 2011

I have a plan for a series of OWL and ontology tutorials along the following lines:

  1. An introduction to the OWL language; this is the pizza tutorial;
  2. An advanced OWL language tutorial; this is the family history knowledge base tutorial.
  3. A tutorial that explores ontological modelling much more than the language- and reasoning-orientated pizza and FHKB tutorials do.

The trick with these tutorials is to find a modelling example that, first, covers the features one wishes to address and, second, is accessible to all the potential attendees. This second point is often tricky and one must avoid trivialising a subject area too much, something that happens too easily in a subject like biology, where there are few examples that are accessible to all biologists, let alone those outside that domain. Many years ago, when we were first talking about the ontological modelling version of the tutorial, Andrew Gibson, who then worked in our group, suggested the field of travel as a suitable tutorial subject area. It is possible to make a Travel Ontology that covers any modelling situation one would wish to cover in such a tutorial and the topic is something that everyone can appreciate; we all either do or know about undertaking the process of moving from one place to another for a variety of purposes.

With travel as our field of interest, we can cover the major ontological distinctions:

  • Processes and the things that participate in them: Holidays, journeys, people, geographical features and so on;
  • Physical and abstract things: lumps of land and the countries or administrative regions that occupy them;
  • Material and immaterial things: buildings and spaces in between them; vehicles and the plans for journeys made in those vehicles;
  • Qualities of things: Long journeys, fast vehicles, and so on;
  • Lots of nice things like rivers and what they flow into, boundaries between countries, notional midpoints on rivers and the like that form boundaries, seas and oceans, ports, docks, and so on;
  • Roles and/or functions of things involved in travel and so on.

In doing so, all of this and much more will touch on many of OWL 2’s features, from classes to individuals, property hierarchies, property chains, property characteristics, and so on. In such an ontology we can show off the language, modelling and the use of automated reasoning.
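
As one small illustration of those features, the sketch below uses the OWL API to declare a transitive part-of property and a property chain (located-in composed with part-of implies located-in); the property, class and individual names are placeholders of my own rather than the Travel Ontology’s actual terms.

    // Illustrative sketch only: property characteristics and a property chain for travel-style modelling.
    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.model.*;

    import java.util.Arrays;

    public class TravelChains {
        public static void main(String[] args) throws Exception {
            OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
            OWLDataFactory factory = manager.getOWLDataFactory();
            IRI base = IRI.create("http://example.org/travel.owl");
            OWLOntology travel = manager.createOntology(base);

            OWLObjectProperty partOf = factory.getOWLObjectProperty(IRI.create(base + "#partOf"));
            OWLObjectProperty locatedIn = factory.getOWLObjectProperty(IRI.create(base + "#locatedIn"));

            // partOf is transitive.
            manager.addAxiom(travel, factory.getOWLTransitiveObjectPropertyAxiom(partOf));

            // Property chain: locatedIn o partOf -> locatedIn.
            manager.addAxiom(travel,
                    factory.getOWLSubPropertyChainOfAxiom(
                            Arrays.asList(locatedIn, partOf), locatedIn));

            // With these axioms, something located in Rome, where Rome is part of Italy,
            // is inferred by a reasoner to be located in Italy.
            OWLNamedIndividual forum = factory.getOWLNamedIndividual(IRI.create(base + "#ForumRomanum"));
            OWLNamedIndividual rome = factory.getOWLNamedIndividual(IRI.create(base + "#Rome"));
            OWLNamedIndividual italy = factory.getOWLNamedIndividual(IRI.create(base + "#Italy"));
            manager.addAxiom(travel, factory.getOWLObjectPropertyAssertionAxiom(locatedIn, forum, rome));
            manager.addAxiom(travel, factory.getOWLObjectPropertyAssertionAxiom(partOf, rome, italy));
        }
    }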

By bringing in the kinds of ontological distinction outlined above and the goals of the tutorial, we immediately suggest upper level ontologies. My experience with such things has been to find the decision making about what goes where under an upper level all rather hard. I have little confidence that, with the current material we have at hand for this kind of modelling, independent groups or individuals, given an arbitrary set of classes, will come up with the same placement of classes in an ontology. I would like to try and fix this – for at least one upper level ontology – so that one doesn’t have to spend years teaching oneself bits of philosophy in order to use an upper level ontology. Of course, effort is inevitably involved, but it should be at the level of engineering (ideally at the level of choosing patterns), rather than consulting a guru.

Over the past several years, I have made a Travel Ontology and put it under an upper level ontology. I have used, up until now, a cut-down version of Alan Rector’s Simple Upper Bio, with the biomedical bits chopped out. So far, I’ve put the following things into the Travel Ontology:

  • Land masses; I’ve treated everything as an island. Africa-Eurasia is one big island surrounded by water. Everything else is an island too. I haven’t started thinking about the other end of the scale – does a rock pointing out of some water count as an island? To a large extent I don’t care.
  • I’ve done a brief sketch of continents and oceanic crust. I made this distinction so I could talk about continental and oceanic islands. I’m not convinced that I need to in the context of travel, but that’s often a useful kind of talking point in a tutorial. I’ll also want to be able to describe, for instance, volcanic islands.
  • I talk about bodies of water: oceans, seas, lakes, rivers, and so on. Seas are parts of oceans that are next to land (a sketch of how such axioms might look follows this list); here, I regard things like the Dead Sea as salt water lakes, not seas, taking the view that the indiscriminate and wayward naming of things by humans need not be dogmatically adhered to.
  • Countries, and other administrative regions.
  • Vehicles used in travel; services that offer travel between places and the instances of those services; that is, the Fareham to Portsmouth train service and the 8:43 am instance of that service.
  • Types of country – monarchies, dictatorships, republics, and so on.
  • Boundaries between the above;
  • I’ll want to be able to capture longitude and latitude so, for instance, I can describe tropical islands, countries and tropical holidays.
  • Buildings, settlements, roads, vehicles, people, their roles, and so on.
  • Measurements such as heights of mountains, lengths of roads, etc. At the moment I’ve done a shameful hack of lots of data properties with numbers at the end and no units.
  • The TBox is relatively small, but there are some 2,000 individuals – not many for what is, in effect, a model of the whole world. It is all a bit scruffy, but I invoke the “work in progress” excuse…
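
For a flavour of how a couple of the geographical distinctions above might be captured, here is a small sketch in OWL API terms; the class and property names are my own placeholders and not necessarily those used in the Travel Ontology.

    // Illustrative sketch only: islands and seas with placeholder names.
    import org.semanticweb.owlapi.apibinding.OWLManager;
    import org.semanticweb.owlapi.model.*;

    public class GeographySketch {
        public static void main(String[] args) throws Exception {
            OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
            OWLDataFactory factory = manager.getOWLDataFactory();
            IRI base = IRI.create("http://example.org/travel.owl");
            OWLOntology travel = manager.createOntology(base);

            OWLClass island = factory.getOWLClass(IRI.create(base + "#Island"));
            OWLClass landMass = factory.getOWLClass(IRI.create(base + "#LandMass"));
            OWLClass sea = factory.getOWLClass(IRI.create(base + "#Sea"));
            OWLClass ocean = factory.getOWLClass(IRI.create(base + "#Ocean"));
            OWLClass bodyOfWater = factory.getOWLClass(IRI.create(base + "#BodyOfWater"));
            OWLObjectProperty surroundedBy = factory.getOWLObjectProperty(IRI.create(base + "#surroundedBy"));
            OWLObjectProperty partOf = factory.getOWLObjectProperty(IRI.create(base + "#partOf"));
            OWLObjectProperty adjacentTo = factory.getOWLObjectProperty(IRI.create(base + "#adjacentTo"));

            // Island SubClassOf LandMass and (surroundedBy some BodyOfWater).
            manager.addAxiom(travel, factory.getOWLSubClassOfAxiom(island,
                    factory.getOWLObjectIntersectionOf(landMass,
                            factory.getOWLObjectSomeValuesFrom(surroundedBy, bodyOfWater))));

            // Sea SubClassOf (partOf some Ocean) and (adjacentTo some LandMass).
            manager.addAxiom(travel, factory.getOWLSubClassOfAxiom(sea,
                    factory.getOWLObjectIntersectionOf(
                            factory.getOWLObjectSomeValuesFrom(partOf, ocean),
                            factory.getOWLObjectSomeValuesFrom(adjacentTo, landMass))));
        }
    }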

I want to be able to describe things like a walking holiday in Italy, in the first two weeks of June, during which I visited several historical monuments and sent a postcard to my Mum of the Forum in Rome. I want to describe a holiday on a narrow boat on a series of canals in England during the summer, passing through Birmingham and ending up on the Regent’s Canal in London. I haven’t done time points and durations yet (time is always such a pain), but they will be a vital part of the Travel Ontology.

I haven’t done much with the Travel Ontology for a while, but I want to start again and finish something off to the level at which I can do a tutorial about this kind of modelling. Alan Rector’s upper level ontology is no longer maintained, as it has sort of been overtaken by Stefan Schulz’s BioTop. So, I really want a replacement and I’ve gone with the Semantic Science Integration Ontology (SSIO). The main reasons for this choice are: I like the words used (avoidance of words like “continuant” and “occurrent”); it is young and there is discussion on its formation; I know the people and know that sensible discussion is possible (there is an open world assumption involved in my making these statements). I also like the attribution policy put in place for SSIO.

In moving over to the SSIO, I want to do an experiment. As I place the major classes underneath the SSIO, I want to discuss the choice of where each one goes on the SSIO mailing list. From the discussion and the questions I’ve had to ask myself, I want to create a guide to using the SSIO; eventually I’d like to be able to use this in an ontology tutorial and refine it to the level where one can reasonably reliably get consistent placement of classes and choice of relationships. I want attendees of such a tutorial to do exercises in which protocols are followed to “guide” placement of classes and choices of relationship. This guidance will involve “how to think about an entity” – what questions to ask myself about an entity such that I make reasonably sensible decisions – and how to record those decisions.

The work will go something like this:

  1. Make a list of the main classes in the Travel Ontology;
  2. Take each in turn and discuss where it is placed in the SSIO;
  3. Record discussions and distill the questions I’ve had to ask to make a choice and render them in a form usable by others;
  4. Migrate the relationships to SSIO form;
  5. Fiddle around to put in place some of SSIO’s design patterns and perhaps create others.
  6. Visit various modelling choices of things in the Travel Ontology.

This will be hard work, but it can be done piecemeal and may not have to involve only me! The results I want are an ontology placed under an upper level ontology that is used by more than one group; a set of documentation on how it is done; and a set of nice modelling examples to use in another ontology tutorial.