The TAMBIS Ontology

The TAMBIS ontology (TaO) was the first ontology that I had any hand in building, though it was really only the smallest of hands — Pat Baker did the overwhelming amount of the work on TaO; I am, however, forever grateful that TAMBIS and Pat’s work gave me access to this field of research. The TaO was built for the Transparent Access to Multiple Bioinformatics Information Sources project. The TaO describes some basic molecular biology and its associated bioinformatics — so proteins and protein database records.

The TaO drove a query interface. The TaO was represented in GRAIL, a knowledge representation language that looks like a description logic and more or less corresponds to the EL+ profile of OWL 2. The TaO provided the building blocks of the domain, and the query interface allowed a user to build up a description of what they wished to retrieve. For example, a protein has parts that are motifs; motifs indicate a function, so a description can be built of proteins that have motifs indicating particular functions.

The contents of the TaO were mapped to values within the bioinformatics resources accessible within TAMBIS. A query rewriting engine took the conceptual query and re-wrote it in a form (essentially a little programme) that would retrieve the instances from the bioinformatics resources that "filled" the query class.
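
As a rough illustration of the idea, and not a reconstruction of the TAMBIS code, the following Python sketch builds the "proteins that have motifs indicating a function" description and rewrites it into an ordered list of retrieval steps. The `Concept` class, the `SOURCE_MAP` table and the wrapper names in it are all hypothetical, invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A conceptual query: a base class plus (role, filler) restrictions."""
    name: str
    restrictions: tuple = ()  # tuple of (role, Concept) pairs

    def that(self, role: str, filler: "Concept") -> "Concept":
        """Return a new concept narrowed by one more restriction."""
        return Concept(self.name, self.restrictions + ((role, filler),))


# Hypothetical mapping from (concept, role, filler) to a resource wrapper call.
# None of these names come from TAMBIS itself.
SOURCE_MAP = {
    ("Protein", "hasPart", "Motif"): "motif_db.proteins_with_motif",
    ("Motif", "indicates", "Function"): "motif_db.motifs_for_function",
}


def rewrite(concept: Concept, plan=None) -> list:
    """Rewrite a conceptual query into an ordered list of retrieval steps.

    Nested fillers are resolved first, so the innermost restriction
    becomes the first step of the plan.
    """
    if plan is None:
        plan = []
    for role, filler in concept.restrictions:
        rewrite(filler, plan)
        plan.append(SOURCE_MAP[(concept.name, role, filler.name)])
    return plan


query = Concept("Protein").that(
    "hasPart", Concept("Motif").that("indicates", Concept("Function"))
)
print(rewrite(query))
```

The plan comes out innermost-first: find the motifs for the function, then the proteins carrying those motifs. That is the sense in which the rewritten query is "a little programme".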

TAMBIS worked, but it was very much ahead of its time: the technology of the day could not really support its goals. It would work much better today than it did in the late 1990s. Some of my observations on this are:

  • Programmatic access to the bioinformatics resources was terrible. We used screen-scraping to retrieve data. This was, of course, very fragile. Today, many bioinformatics resources are available as Web services and this avoids some of these problems.
  • Querying of the resources was poor and not just because of the lack of programmatic access. Keyword searches over fields in records were possible, but a proper query language was lacking. SRS came the closest to providing such a facility and we did use SRS behind some of our resource wrappers.
  • In TAMBIS we had to try and work out the conceptualisation implicit within the resources we were trying to access. These conceptualisations skewed the TaO in many respects; the resources were too explicit in the TaO itself. Since the TaO was developed, the terminology within many resources has become much more accessible through the efforts of the Gene Ontology and related efforts in the OBO consortium. A common conceptualisation across resources makes the life of the mapper and querier much easier.
  • Related to this would be the building of the TaO itself. Today’s TaO would be ontologies such as the Gene Ontology, Sequence Ontology and so on, joined up in a compositional, description logic fashion such that a user could describe a protein with a given function, that took part in a particular process in a given cellular location and had a particular sequence feature. This would enable nice descriptions of instances to be made and, what is more important, the resources marked up with GO etc. could actually answer the question. TAMBIS could ask nice questions, but the underlying resources’ ability to answer them was lacking.
  • On top of all this is bioinformatics’ inability to have a common identification scheme for the entities that it describes. It is very difficult to know if a protein described in one record is the same as one described in another database’s record. (Indeed, what is it that a protein record describes: a single protein; a pool of proteins; a consensus over many variants of the same kind of protein; and so on? The same goes for other kinds of entity.) This makes it almost impossible to integrate resources with any confidence. Much of a bioinformatician’s time is taken in doing mappings between resources and, of course, no one trusts anyone else’s mappings, so they are continually re-done. It sometimes seems that there is a danger that if this fundamental problem were to be fixed, then much of a bioinformatician’s craft knowledge would disappear and some science might ensue. Of course, bioinformaticians squeeze out some marvellous answers from the resources we have, but it often seems that bioinformaticians are trying to run encumbered by a huge weight; the weight of issues of identity. It should be added, of course, that a solution is non-trivial, though I think it more of a social problem than anything else.
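
The compositional description mentioned in the list above (a protein with a given function, process, cellular location and sequence feature) can be sketched in a few lines of Python. This is only a toy under stated assumptions: the `AnnotatedProtein` class, the record identifiers and the term labels are made up for illustration and have not been checked against GO or the Sequence Ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedProtein:
    """A record marked up with ontology terms (all values are illustrative)."""
    identifier: str
    functions: frozenset
    processes: frozenset
    locations: frozenset
    sequence_features: frozenset


def matches(record, *, function, process, location, feature) -> bool:
    """True only if the record satisfies every part of the description."""
    return (
        function in record.functions
        and process in record.processes
        and location in record.locations
        and feature in record.sequence_features
    )


records = [
    AnnotatedProtein(
        "P1",
        frozenset({"kinase activity"}),
        frozenset({"signal transduction"}),
        frozenset({"plasma membrane"}),
        frozenset({"transmembrane region"}),
    ),
    AnnotatedProtein(
        "P2",
        frozenset({"kinase activity"}),
        frozenset({"signal transduction"}),
        frozenset({"nucleus"}),
        frozenset(),
    ),
]

hits = [
    r.identifier
    for r in records
    if matches(
        r,
        function="kinase activity",
        process="signal transduction",
        location="plasma membrane",
        feature="transmembrane region",
    )
]
print(hits)
```

The question can only be answered because every record carries terms from a shared vocabulary; with each resource using its own implicit conceptualisation, the `matches` test would have nothing reliable to compare against.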

When the TaO was conceptualised we conflated aspects of biology and bioinformatics. We described proteins as having molecular weights and accession numbers and so on. This is not a good representation of the world of information that we modelled. More properly we should have separately described proteins and their representation. This would have meant a cleaner separation of, for instance, the many representations of a protein and the actual protein. One has to ask, of course, whether this would have made any practical difference other than to have added complexity to query formulation. The myGrid ontology used in the annotation of Web services within Taverna etc. originally had such a separation, but this was for creation of the terminology, not for querying. This clean separation has now gone in favour of something much lighter weight, more akin to a SKOS representation. SKOS is meant for artefacts designed for indexing and information retrieval and this is, after all, what TAMBIS did, so such a representation might well have had advantages.
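
A minimal sketch of the separation being described, with the protein and its database records as distinct things, might look like this in Python. The class names, the `describes` link and the example values are assumptions made for illustration; they are not taken from the TaO or the myGrid ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """The biological entity: properties of the molecule itself."""
    name: str
    molecular_weight_da: float


@dataclass(frozen=True)
class ProteinRecord:
    """An information entity: a database entry that is about a protein."""
    database: str
    accession: str
    describes: Protein


example = Protein("example protein", 50000.0)
records = [
    ProteinRecord("db_a", "A0001", example),
    ProteinRecord("db_b", "X9", example),
]

# Asking for the accessions of a protein now has to go via its records.
accessions = sorted(r.accession for r in records if r.describes == example)
print(accessions)
```

The extra hop through `describes` is the added complexity in query formulation mentioned above: the model is cleaner, but every question about accession numbers becomes a question about records rather than about proteins.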

Towards the end of the TAMBIS project the Ontology Inference Layer (OIL, one of the forerunners of OWL) appeared. I did a little experiment in re-working the TaO in OIL. This was my first real attempt at ontology modelling and I did it without access to any ontology editing tools at all (at that point we had the editor OILEd, but Java was at that time inaccessible to my screenreader) or reasoning tools. My only way of checking what I’d done was to take my ontology along to OILEd’s developer (Sean Bechhofer) and ask him to classify it, something I only did a few times as I didn’t want to cause too much hassle. The consequence of this is that I’d add scores of axioms without any resort to a reasoner (a bad mistake). This new version was about two weeks’ effort to just play around with OIL and to experiment with re-conceptualising the TaO. This history is why this particular ontology has had a longevity and an embarrassment factor (for me) that was very much unintended.

In short, this new TaO was pretty terrible. I was only just learning the ins and outs of modelling with universal and existential quantification (GRAIL had only existential quantification), so there are all kinds of mistakes from that direction. There are also a fair number of simple howlers. As I didn’t have access to a reasoner, I had no idea (apart from inspection) of when my axioms made an unsatisfiable ontology; it turned out that there were all sorts of contradictions in the ontology. There are also some pretty horrible gaffes.
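
To give a flavour of how mixing the two quantifiers can go wrong, here is a small Python sketch of one classic clash: a class that must have some part of one kind while only being allowed parts of a disjoint kind. It is not necessarily a mistake that appeared in this ontology, and the check is a single hand-coded pattern, nowhere near what a real DL reasoner does. The class and role names and the `DISJOINT` table are invented for the example.

```python
# Pairs of classes assumed to be disjoint (an assumption of this example).
DISJOINT = {frozenset({"Motif", "Domain"})}


def is_disjoint(a: str, b: str) -> bool:
    """True if the two named classes are declared disjoint."""
    return frozenset({a, b}) in DISJOINT


def clashes(restrictions):
    """Find 'role some C' and 'role only D' pairs where C and D are disjoint.

    Each restriction is a (quantifier, role, filler) triple. Any hit means
    the class carrying these restrictions can have no instances.
    """
    found = []
    for q1, role1, c in restrictions:
        if q1 != "some":
            continue
        for q2, role2, d in restrictions:
            if q2 == "only" and role1 == role2 and is_disjoint(c, d):
                found.append((role1, c, d))
    return found


# "Has some part that is a Motif" and "has only parts that are Domains".
protein = [("some", "hasPart", "Motif"), ("only", "hasPart", "Domain")]
print(clashes(protein))
```

Each axiom looks reasonable on its own; it is only in combination that the class becomes unsatisfiable, which is why adding scores of axioms by inspection alone is so risky.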

The reason that this ontology has somewhat haunted me is that it was used as a test for the development of explanations for DL reasoners. The ability to provide explanations of unsatisfiabilities and inconsistencies in DL ontologies [1,2] is a great boon, but that it was this dreadful ontology of mine that was used is a little unfortunate, particularly as it was labelled the "TAMBIS Ontology"; the TAMBIS ontology is the one Pat Baker wrote. Anyway, the 147 unsatisfiable classes can be tracked down to only a few causes and this is some comfort [1,2]. This ontology of mine should be the "Not TAMBIS" Ontology.

All the various forms of the TaO and the "play" versions of the ontology are available:

  • The big version of the TaO in GRAIL.
  • A smaller version of the TaO in GRAIL that was actually fully supported in the TAMBIS software.
  • New TAMBIS in OIL.
  • New TAMBIS in OWL.

References

  • [1] Bijan Parsia, Evren Sirin, Aditya Kalyanpur: Debugging OWL ontologies. WWW 2005: 633-640
  • [2] Aditya Kalyanpur, Bijan Parsia, Evren Sirin, James A. Hendler: Debugging unsatisfiable classes in OWL ontologies. J. Web Sem. 3(4): 268-293 (2005)