Archive for August, 2010

Automatic Annotation of Qualities

August 24, 2010

This is a small example ontology that I made to demonstrate (using trivial examples) how an OWL ontology and an automated reasoner can be used to automatically annotate phenotype data for mice with qualities. It uses a straightforward bit of OWL semantics to do this annotation. The ontology for annotation of qualities is available to use.

In this ontology I describe:

  1. A mouse;
  2. A mouse’s parts;
  3. Measurements;
  4. Qualities associated with a mouse or its parts;
  5. Units for the measurements (not yet done; it’s just an example).

A mouse has parts such as whiskers and a tail. Measurements have a value (a data property) and a unit. We can then have measurements for mice and their parts. Finally, we have qualities such as "short" or "long" for mice and their parts.

I then create an individual mouse. One particular individual looks like:

Individual: mouse01

    Types: [in phenotype.owl]
        hasPart some 
             and (hasMeasurement some 
                 and (hasValue value 25.0))))

This describes a mouse that has a tail. The tail has a measurement with a value of 25.0 mm. I’ve not put the units in, but this is really just a trivial example.
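To make the nesting concrete, the same description can be unrolled into a small graph. The sketch below is a plain-Python illustration, not OWL: it stores mouse01 as subject-predicate-object triples, using hypothetical placeholder nodes `_:tail` and `_:measurement` (and hypothetical class names `Mouse`, `Tail` and `Measurement`) for the anonymous individuals that the nested `some` restrictions imply, and then walks hasPart, hasMeasurement and hasValue to read the value back.

```python
# Plain-Python illustration of mouse01 as triples (not an OWL API).
# "_:tail", "_:measurement", "Tail", "Measurement" are hypothetical placeholders
# for the anonymous individuals implied by the nested "some" restrictions.
Triple = tuple[str, str, object]

MOUSE01: list[Triple] = [
    ("mouse01", "type", "Mouse"),
    ("mouse01", "hasPart", "_:tail"),
    ("_:tail", "type", "Tail"),
    ("_:tail", "hasMeasurement", "_:measurement"),
    ("_:measurement", "type", "Measurement"),
    ("_:measurement", "hasValue", 25.0),
]


def objects(triples: list[Triple], subject: object, predicate: str) -> list[object]:
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def tail_values(triples: list[Triple], mouse: str) -> list[object]:
    """Follow hasPart -> hasMeasurement -> hasValue from a mouse to its tail values."""
    values: list[object] = []
    for part in objects(triples, mouse, "hasPart"):
        if "Tail" not in objects(triples, part, "type"):
            continue  # only tails are of interest here
        for measurement in objects(triples, part, "hasMeasurement"):
            values.extend(objects(triples, measurement, "hasValue"))
    return values


print(tail_values(MOUSE01, "mouse01"))  # [25.0]
```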

We can then write a defined class for a "short tailed mouse" like this:

Class: ShortTailedMouse

    EquivalentTo: [in phenotype.owl]
         and (hasPart some 
             and (hasMeasurement some 
                 and (hasValue some double[<= 30.0])))))
    SubClassOf: [in phenotype.owl]
        hasPart some 
             and (hasLength some Short))

This says that a short tailed mouse is any mouse that has a tail with a measurement of 30.0 (millimetres) or less. This is enough to recognise an individual mouse as a member of this class. The trick lies in the axiom:

    SubClassOf: [in phenotype.owl]
        hasPart some 
             and (hasLength some Short))

that says that such a mouse necessarily has a tail that hasLength some Short. Shortness is something that must be true of the tail of any mouse in this class. We don't need to assert that a mouse's tail hasLength some Short to make the mouse a member of this class; it is simply something we have said is true of all members of this class, so once we know a mouse is in this class we also know that its tail is short. We've already said enough to recognise the mouse as a short tailed mouse, and the necessary conditions now "come for free". Thus we recognise that this individual mouse is a short tailed mouse based on its measurement, and the quality "short" is automatically a feature of being a member of that class.

If we created a pipeline that took phenotyping data and created OWL individuals for those data, we could then automatically assign phenotype qualities to those individuals and put the annotations back into the database.

There are also defined classes for long- and normal-tailed mice, and some to do with whiskeriness, each of which assigns the appropriate (in this ontology) quality. These are obviously trivial examples, and all of this needs working up with much better ones. I also want to explore several things:

  1. Proper phenotyping SOPs such as those in EMPReSS;
  2. Look at the limits of what can be done this way and what needs to be left to a human phenotyper;
  3. Where and how to deal with the statistical part of phenotyping;
  4. Inserting an ontology such as the Phenotype And Trait Ontology (PATO);
  5. Connecting it all up to genes;
  6. Doing units sensibly.

An Ontology of Amino Acids

August 9, 2010

My friend Phil Lord and I made this ontology of amino acids many years ago as a demonstration of various widgets in the OWL plugin for Protégé 3. It took us about twenty minutes to make the whole ontology. It served as a demonstration of a move towards a sort of mass production of axiomatically rich ontologies. This was its only real purpose, but it does have some quite nice features and raises a few interesting issues.

Biochemists classify amino acids in various ways. Taylor provides a much-used classification of amino acids that visualises the types of amino acids in a Venn diagram showing the intersecting sets of amino acid types. One use of this kind of classification is to capture which substitutions of one amino acid for another within a protein, as the protein changes, will preserve (to a greater or lesser degree) its structure and function. As can be seen from Taylor's Venn diagram, such a description inherently suggests some form of multiple inheritance within an ontology of amino acids. We made an ontology that represents this sort of knowledge of amino acids (though not a replica of Taylor's diagram) in the following way:

  1. Made a class "AminoAcid";
  2. Made twenty disjoint subclasses, one for each of the amino acids;
  3. Made a series of value partitions and little "quality" hierarchies for "Charge", "Polarity", "Size" and "Hydrophobicity", using the Entity Property Quality (EPQ) pattern;
  4. Gave each of the amino acids a series of restrictions on "hasSize", "hasPolarity", "hasCharge" and "hasHydrophobicity" as appropriate;
  5. Added whatever defined classes we wanted, such as "LargeAminoAcid", "LargePositiveAminoAcid", etc., in the expected way.

This is a straightforward example of Alan Rector's normalisation pattern, in which a tree of asserted subsumption relationships is made for the primary axis of classification (in this case, simply being an amino acid) and all other aspects are separated out into restrictions upon the classes of amino acid. The defined classes then re-create the multiple inheritance that one sees in the typical description of amino acids. This example of normalisation is typical except for the very flat asserted tree for the amino acids themselves; we didn't choose one of the characteristics used in the classification of amino acids as the primary axis. The most obvious axis would be the chemical structure of the amino acid, that is, an alpha carbon with an amino group and a carboxyl substituent. Simply having a root of amino acid captures this notion adequately, even though it has not been explicitly axiomatised.
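The effect of normalisation can be mimicked outside OWL to show how defined classes rebuild a polyhierarchy from a flat asserted tree. In this plain-Python sketch (closed-world, and not a reasoner), each amino acid is asserted only as an amino acid and carries one value from each of two partitions, Size and Charge; a defined class is a set of required values, and an amino acid's inferred parents are every defined class whose requirements it meets. The property values cover only five amino acids and follow common textbook conventions rather than the ontology's actual axioms; the entry names and every class name other than LargeAminoAcid and LargePositiveAminoAcid are hypothetical.

```python
from enum import Enum


class Size(Enum):  # value partition: exactly one of these per amino acid
    TINY = "Tiny"
    SMALL = "Small"
    MEDIUM = "Medium"
    LARGE = "Large"


class Charge(Enum):
    POSITIVE = "Positive"
    NEGATIVE = "Negative"
    NEUTRAL = "Neutral"


# Flat asserted tree: every entry is simply an AminoAcid with one value per partition.
# Illustrative values for a handful of residues only.
AMINO_ACIDS = {
    "Glycine": {"size": Size.TINY, "charge": Charge.NEUTRAL},
    "Valine": {"size": Size.SMALL, "charge": Charge.NEUTRAL},
    "AsparticAcid": {"size": Size.SMALL, "charge": Charge.NEGATIVE},
    "Lysine": {"size": Size.LARGE, "charge": Charge.POSITIVE},
    "Arginine": {"size": Size.LARGE, "charge": Charge.POSITIVE},
}

# Defined classes, read as "AminoAcid and hasSize some X and hasCharge some Y".
DEFINED_CLASSES = {
    "LargeAminoAcid": {"size": Size.LARGE},
    "LargePositiveAminoAcid": {"size": Size.LARGE, "charge": Charge.POSITIVE},
    "SmallAminoAcid": {"size": Size.SMALL},
    "NegativeAminoAcid": {"charge": Charge.NEGATIVE},
    "NeutralAminoAcid": {"charge": Charge.NEUTRAL},
}


def inferred_parents(amino_acid: str) -> set[str]:
    """Return every defined class whose required values the amino acid has."""
    props = AMINO_ACIDS[amino_acid]
    return {
        name
        for name, required in DEFINED_CLASSES.items()
        if all(props.get(prop) == value for prop, value in required.items())
    }


print(sorted(inferred_parents("AsparticAcid")))
# ['NegativeAminoAcid', 'SmallAminoAcid']
```

AsparticAcid ends up under two defined classes at once: the multiple inheritance is computed from the restrictions rather than asserted. A reasoner would additionally infer that LargePositiveAminoAcid is a subclass of LargeAminoAcid, which this sketch leaves implicit.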

There are a few interesting points about this ontology:

  • Proline is not an amino acid; it is an imino acid (it has an NH rather than an NH₂ group on the alpha carbon). Biochemists almost invariably talk of proline as an amino acid even though they are perfectly aware of this anomaly; it is pure convenience.
  • The charges on the amino acids are pH dependent. Depending on the context within the cell and the protein itself, the charge can change. Again, it is just convention that assigns these charges to amino acids.
  • The value partition for "Size" is a division of sizes into an arbitrary but convenient partition. It is easier and more useful to make a few partitions like this than to give exact sizes; we avoid saying "valine is valine sized". (Tiny, small, medium and large are just partitions along a range of size that biochemists have found it convenient to use.)
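The point about charge being pH dependent can be made concrete with the Henderson-Hasselbalch relation, which gives the fraction of an ionisable side chain that carries its charge at a given pH. The sketch below assumes approximate textbook side-chain pKa values for free amino acids (about 3.9 for aspartic acid, 6.0 for histidine and 10.5 for lysine); real values shift with the protein environment, which is why a single fixed charge is a convention.

```python
# Approximate side-chain pKa values for free amino acids (assumed textbook
# figures; they shift inside proteins). The sign is the charge of the ionised form.
SIDE_CHAINS = {
    "AsparticAcid": (3.9, -1),  # carboxyl: negative when deprotonated
    "Histidine": (6.0, +1),     # imidazole: positive when protonated
    "Lysine": (10.5, +1),       # amine: positive when protonated
}


def mean_side_chain_charge(amino_acid: str, ph: float) -> float:
    """Average side-chain charge at `ph` from the Henderson-Hasselbalch relation."""
    pka, sign = SIDE_CHAINS[amino_acid]
    if sign > 0:
        # Basic group: fraction protonated (charged) = 1 / (1 + 10**(pH - pKa))
        fraction_charged = 1 / (1 + 10 ** (ph - pka))
    else:
        # Acidic group: fraction deprotonated (charged) = 1 / (1 + 10**(pKa - pH))
        fraction_charged = 1 / (1 + 10 ** (pka - ph))
    return sign * fraction_charged


for ph in (5.0, 7.4):
    print(ph, {aa: round(mean_side_chain_charge(aa, ph), 2) for aa in SIDE_CHAINS})
```

Under these assumed pKa values, histidine goes from mostly charged at pH 5.0 (about +0.91) to mostly neutral at pH 7.4 (about +0.04), whereas lysine and aspartic acid stay essentially fully charged over that range.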

This amino acid ontology captures how biochemists talk about amino acids; that is, a conceptualisation of amino acids. Obviously amino acids exist, but the way in which they are described in this ontology smooths out some of the chemical reality that would tend to obscure rather than reveal the important features of amino acids. That is not to say, of course, that the chemical reality that has been smoothed out is unimportant, for instance in explanations of reaction mechanisms. When building an ontology, however, it is important to choose the features that matter for the representation's needs.