Which is used most for biomedical ontologies: OBO Format or OWL?

I was reading Robert Hoehndorf et al.’s paper “Relations as patterns: bridging the gap between OBO and OWL” and was rather struck by the opening sentence:

“The OBO Flatfile Format [1] is used to represent most biomedical ontologies, among them the Gene Ontology (GO) [2] and most of the OBO Foundry ontologies [3].”

The bit “The OBO Flatfile Format [1] is used to represent most biomedical ontologies,…” struck me as unlikely (at least at face value). So, I had a look. Using the RESTful API to BioPortal, Nico Matentzoglu (one of our group’s Ph.D. students) downloaded all the publicly available ontologies (the API lets you get both public and private, but we didn’t use the private ones). We got a total of 347 ontologies that used the representations as follows:

OBO 114
OWL 161
OWL-DL 32
OWL-FULL 9
PROTEGE frames 2
RRF 26
UMLS-RELA 3

So, OBO Format has 114 and OWL (the different flavours of OWL are apparently different ontologies) has 202. I don’t need to do the stats – there are more OWL ontologies than OBO Format ontologies. I’m assuming that BioPortal is a representative sample of biomedical ontologies. With this assumption, Rob’s statement is wrong.
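
For anyone who wants to check the arithmetic, here is a minimal Python sketch that redoes the tally. The counts are copied from the list above; the names FORMAT_COUNTS and summarise are my own, and the rule that every label starting with “OWL” counts as an OWL ontology is an assumption that matches how I grouped them.

```python
# Format counts copied from the BioPortal tally in this post.
FORMAT_COUNTS = {
    "OBO": 114,
    "OWL": 161,
    "OWL-DL": 32,
    "OWL-FULL": 9,
    "PROTEGE frames": 2,
    "RRF": 26,
    "UMLS-RELA": 3,
}


def summarise(counts):
    """Return totals and shares for OBO Format versus all OWL flavours."""
    total = sum(counts.values())
    # Assumption: every label starting with "OWL" counts as an OWL ontology.
    owl = sum(n for label, n in counts.items() if label.startswith("OWL"))
    obo = counts["OBO"]
    return {
        "total": total,
        "obo": obo,
        "owl": owl,
        "obo_share": obo / total,
        "owl_share": owl / total,
    }


summary = summarise(FORMAT_COUNTS)
print(f"total={summary['total']} OBO={summary['obo']} OWL={summary['owl']}")
print(f"OBO share={summary['obo_share']:.1%} OWL share={summary['owl_share']:.1%}")
```

On these numbers OWL accounts for roughly 58% of the BioPortal ontologies and OBO Format for roughly 33%.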

Can we change the statement to “the OBO Format is the representation of the most widely used biomedical ontologies”? The Gene Ontology (and other OBO Format ontologies) have a large corpus of annotations; I have no numbers across the board, but GO has 3,898,904 annotations (number of filtered annotations from the Gene Ontology Annotations page on 15 May 2013) and is also widely used in gene over-expression analysis etc. This is a big number – and other OBO Format ontologies are used for annotations too, though to what extent I don’t yet know.

If we look at some OWL ontologies like SNOMED and NCIT (we can probably argue about whether SNOMED is natively OWL, but we’ll go with it for now), we also probably have some big numbers. The nature of SNOMED annotations of health records means it may be difficult to get the numbers and, even though the “mandate” for use and actual use may be different, I suspect the numbers will still be quite big. Anyway, let’s make something up – UK health records are annotated (I think with Read codes, which are now part of SNOMED) and there are 60 million UK people and, assuming 1 code per person’s record, we’ve got 60 million annotations – quite big. The Experimental Factor Ontology (EFO) is more bio and is used for some 636k annotations in the Gene Expression Atlas (thanks to James Malone for the numbers) – not GO sized, but getting on for a biggish number.

So, in terms of numbers OWL ontologies are widely used.

What happens if we take the medical ones out? Then the numbers will start to look much less healthy for OWL ontologies. Nevertheless, we’ve got a lot of OWL ontologies and fewer OBO ontologies and, even if we have fewer OWL ontologies actually used, we’ve got a lot of use of biomedical OWL ontologies. Taking the medical ones out of the OWL set, I suspect we’ve still got more OWL bio-ontologies, but the OBO ones are used more widely in bio (and the “important” ones are OBO). Taking a look at BioPortal’s OWL ontologies, one gets the suspicion that a lot of them are “toy” ontologies (I’m sure some OBO Format ontologies come into this category too). This will reduce the number of OWL ontologies, but I don’t want to do the categorisation.

Despite this blog having deteriorated from firm numbers to speculation, I think we could go with an opening sentence for Rob’s paper of

“At the time of writing, most of the widely used bio-ontologies use the OBO Format….”

or

“At the time of writing, most of the important bio-ontologies that are extensively used for description of data use the OBO Format representation….”

3 Responses to “Which is used most for biomedical ontologies: OBO Format or OWL?”

  1. David OS Says:

    But if OBO is just an eccentric dialect of OWL with reduced expressiveness, is this distinction meaningful?

    I consider the Drosophila anatomy [1], development [1] and phenotype [2] ontologies to be OWL ontologies whose master version lives in OBO for a mixture of pragmatic and legacy reasons. Conversion is under continuous integration and they are constantly tested using OWL reasoners and browsed and queried in OWL during the dev process. One of the major uses of the anatomy ontology is for live DL queries via ELK on VFB.

    Much of this also applies to GO.

    [1] https://sourceforge.net/p/fbbtdv
    [2] https://sourceforge.net/p/fbcv

  2. Robert Stevens Says:

    I toyed with the idea of putting in this point about hybrid modes of work – and, yes, it does sort of make the distinction in the original sentence irrelevant these days. I don’t know how much the data would be different in 2010, when the paper was written, but I suspect we’d not see a tremendous difference (I may take a look to see if the data is available).

    The next sentence I may look at is along the lines of “recent years have seen an increase in the numbers of biomedical ontologies”.

  3. Chris Mungall Says:

    I agree with David’s points.

    There are maybe 3 different questions to be answered for an ontology

    1. How is the ontology developed?
    2. Which version of the ontology is arbitrarily considered to be the “primary” version?
    3. How is the ontology consumed in applications?

    Q1 is of interest to us ontology geeks. As David says, some ontologies are shifting to some hybrid mode of development. E.g. GO has supplementary axioms developed entirely in OWL, most of the internal GO infrastructure consumes OWL (sometimes translating from OBO). Unfortunately there are still many obstacles to switching entirely to OWL for development.

    Q2 is really quite arbitrary and uninteresting. BioPortal doesn’t allow the choice of obo or owl downloads. But in fact all ontologies developed in obo are available in OWL from a standard obolibrary purl, produced by an Oort job run either centrally or by the ontology developers.

    Q3 is of interest to me. For GO I believe the vast majority of applications used by end-users consume obo-format (there are even a few that use the awful dag format that preceded obo). This can be quite limiting. But we are seeing more applications and web interfaces – like VFB – that consume OWL and are able to leverage standard OWL tools and reasoners.
