My friend Phil Lord and I made this ontology of amino acids many years ago as a demonstration of various widgets in the OWL plugin for Protege 3. It took us about twenty minutes to make the whole ontology. It served as a demonstration of a move towards a sort of mass production of axiomatically rich ontologies. This was its only real purpose, but it does have some quite nice features and raises a few interesting issues.
Biochemists classify amino acids in various ways. Taylor provides a much used classification that visualises the types of amino acids as intersecting sets in a Venn diagram. One use of this kind of classification is to capture the substitutions between amino acids within proteins during change that will preserve (to a greater or lesser degree) the structure and function of the protein. As can be seen from Taylor’s Venn diagram, such a description inherently suggests some form of multiple inheritance within an ontology of amino acids. We made an ontology that represents this sort of knowledge of amino acids (though not a replica of Taylor’s diagram) in the following way:
- Made a class "AminoAcid";
- Made twenty disjoint subclasses, one for each of the amino acids;
- Made a series of value partitions and little "quality" hierarchies for "Charge", "Polarity", "Size" and "Hydrophobicity", using the Entity Property Quality (EPQ) pattern;
- Gave each of the amino acids a series of restrictions for "hasSize", "hasPolarity", "hasCharge" and "hasHydrophobicity" as appropriate;
- Added any defined classes we wanted, such as "LargeAminoAcid", "LargePositiveAminoAcid", etc., in the expected way.
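The steps above can be sketched in plain Python. This is a minimal illustration, not the OWL ontology itself: it shows only two of the four value partitions, the three amino acids and their property values are illustrative assumptions rather than values copied from the ontology, and the names `AMINO_ACIDS`, `DEFINED_CLASSES` and `classify` are hypothetical.

```python
from enum import Enum


class Size(Enum):
    """Value partition for size (one value per amino acid)."""
    TINY = "tiny"
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


class Charge(Enum):
    """Value partition for charge."""
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    POSITIVE = "positive"


# Asserted restrictions on each amino acid class.
# The values here are illustrative assumptions.
AMINO_ACIDS = {
    "Lysine": {"hasSize": Size.LARGE, "hasCharge": Charge.POSITIVE},
    "Glycine": {"hasSize": Size.TINY, "hasCharge": Charge.NEUTRAL},
    "Aspartate": {"hasSize": Size.SMALL, "hasCharge": Charge.NEGATIVE},
}

# Defined classes: an amino acid belongs to one if, and only if,
# it satisfies every listed condition.
DEFINED_CLASSES = {
    "LargeAminoAcid": {"hasSize": Size.LARGE},
    "PositiveAminoAcid": {"hasCharge": Charge.POSITIVE},
    "LargePositiveAminoAcid": {
        "hasSize": Size.LARGE,
        "hasCharge": Charge.POSITIVE,
    },
}


def classify(amino_acid: str) -> list[str]:
    """Return the defined classes whose conditions the amino acid meets."""
    restrictions = AMINO_ACIDS[amino_acid]
    return sorted(
        name
        for name, conditions in DEFINED_CLASSES.items()
        if all(restrictions.get(prop) == value
               for prop, value in conditions.items())
    )
```

Calling `classify("Lysine")` places lysine under all three defined classes, while `classify("Glycine")` returns an empty list. A description logic reasoner does far more than this equality check, but the division of labour is the same: the restrictions are asserted and the placement under defined classes is computed.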
This is a straightforward example of Alan Rector’s normalisation pattern, in which a tree of asserted subsumption relationships is made for the primary axis of classification (in this case simply being an amino acid) and all other aspects are separated out into restrictions upon the classes of amino acid. The defined classes then re-create the multiple inheritance that one sees in the typical description of amino acids. This example of normalisation is typical except for the very flat asserted tree for the amino acids themselves; we didn’t choose one of the characteristics of amino acids used in the classification as the primary axis. The most obvious axis would be the chemical structure of the amino acid, that is, an alpha carbon with an amino group and a carboxyl substituent. Simply having a root of amino acid captures this notion adequately, even though this has not been explicitly axiomatised.
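As a rough sketch of how defined classes re-create multiple inheritance, the snippet below treats each defined class as a set of (property, value) conditions and infers that one class sits under another when its conditions are a proper superset of the other's. This is an assumption-laden simplification: it handles only conjunctions of simple value restrictions, whereas a real reasoner handles far richer axioms. The encoding and the names `DEFINED` and `inferred_superclasses` are hypothetical.

```python
# Each defined class is a conjunction of (property, value) conditions.
# This encoding is a hypothetical simplification of OWL class definitions.
DEFINED = {
    "LargeAminoAcid": frozenset({("hasSize", "Large")}),
    "PositiveAminoAcid": frozenset({("hasCharge", "Positive")}),
    "LargePositiveAminoAcid": frozenset({
        ("hasSize", "Large"),
        ("hasCharge", "Positive"),
    }),
}


def inferred_superclasses(name: str) -> list[str]:
    """Return the defined classes that subsume the named class.

    A class with more conditions is more specific, so any other class
    whose conditions form a proper subset of this one's is a superclass.
    """
    conditions = DEFINED[name]
    return sorted(
        other
        for other, other_conditions in DEFINED.items()
        if other_conditions < conditions  # proper-subset test
    )
```

Here `inferred_superclasses("LargePositiveAminoAcid")` yields both `LargeAminoAcid` and `PositiveAminoAcid`. Nobody asserted either parent; the two superclasses fall out of the definitions, while the asserted tree stays flat.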
There are a few interesting points about this ontology:
- Proline is not an amino acid; it is an imino acid (it has an NH rather than an NH_2 group on the alpha carbon). Biochemists almost invariably talk of proline as an amino acid even though they are perfectly aware of this anomaly; it is a pure convenience.
- The charges on the amino acids are pH dependent. Depending on the context within the cell and the protein itself, the charge can change. Again, it is just convention that assigns these charges to amino acids.
- The value partition for "Size" is a division of the sizes into an arbitrary, but convenient, partition. It is easier and more useful to make a few partitions like this than to give exact sizes; we avoid saying "valine is valine sized". (Tiny, small, medium and large are just partitions along a range of size that biochemists have found it convenient to use.)
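The idea of a value partition over a range of sizes can be sketched as a simple binning function. The cut-off values and units below are entirely hypothetical, chosen only to show the mechanism; the ontology itself asserts a size value for each amino acid rather than computing it from a measurement. The names `BOUNDARIES`, `LABELS` and `size_partition` are likewise assumptions.

```python
from bisect import bisect_right

# Hypothetical cut-offs in arbitrary units; not taken from the ontology.
BOUNDARIES = [10.0, 20.0, 30.0]
LABELS = ["Tiny", "Small", "Medium", "Large"]


def size_partition(size: float) -> str:
    """Map a numeric size onto exactly one of the four partitions.

    The partitions cover the whole range and do not overlap, so every
    size gets one label and only one.
    """
    return LABELS[bisect_right(BOUNDARIES, size)]
```

With these made-up boundaries, `size_partition(5.0)` gives `"Tiny"` and `size_partition(25.0)` gives `"Medium"`. A value that sits exactly on a boundary falls into the upper partition, which is one arbitrary choice among several.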
This amino acid ontology captures how biochemists talk about amino acids; that is, a conceptualisation of amino acids. Obviously amino acids exist, but the way in which they are described in this ontology smooths out some of the chemical reality that would tend to obscure rather than reveal the important features of amino acids. That is not to say, of course, that the chemical reality that has been smoothed out is not important, for instance, in explanations of reaction mechanisms and so on. When building an ontology, however, it is important to choose the features that matter for the representation needs.