Archive for December, 2010

An Ontology for Inferring the Periodic Table of the Elements

December 21, 2010

The Periodic Table is one of the most beautiful works of science. By the mid-1800s chemists had found some sixty elements and had characterised many of their physicochemical properties. A goal of chemists was to organise the elements by increasing atomic weight and by their physicochemical properties. Chemists, including Newlands, had noticed periodicity in physicochemical properties, but it was Mendeleev, in the 1860s, who produced his Periodic Table of the elements. His table organised elements in periods of increasing atomic weight, with groups of elements that had similar physicochemical properties. Mendeleev’s table was predictive: it had gaps where he proposed putative elements and predicted their properties. In the 1920s physicists elucidated the electronic structure of the atom; the shells of electrons, and the numbers of electrons in those shells, matched the structure of Mendeleev’s table.

I’ve been trying an experiment along the lines of “Could Mendeleev have dreamt in OWL?”. That is, can we deduce, just from the physicochemical properties known to chemists of the mid-1800s, the groups of the Periodic Table? To this end I built a Periodic Table Ontology to try out this idea. I used modern notions of what was known in the mid-1800s. An ontology of the Periodic Table is easy if one uses the electronic structure of atoms, but this wasn’t known until the 1920s. Victorian chemists did, however, know about moles, ions and most of the physicochemical properties that we use today, even if under other names. So, my ontology has lots of values for these physicochemical properties and gives data properties in OWL a good exercise.

The Periodic Table Ontology (PTO) has the following major distinctions:

  1. Atoms;
  2. Moles of atom (lumps of substance);
  3. Ions;
  4. Moles of ion;
  5. Moles of chemical compound (lumps of more than one type of atom).

One of the main characteristics of a chemical group is the ratios in which its elements form salts. A salt is a compound made of at least one kind of metal ion with at least one kind of anion. To capture this, we need the notion of a metal, and defining “metal” actually proved rather difficult.

From the list above, it can be seen that I’ve made a distinction between “atoms” as discrete physical objects and “moles of atom” as lumps of stuff. So, we have sodium atom and mole of sodium atom (which is made of just sodium atoms). The sodium atom has properties such as number of protons, atomic radius, ionisation energies, and so on. The mole of sodium has properties such as boiling point, melting point, heat of enthalpy, and so on. A mole of something also has electrical conductance and this is what I used to define metals. Metals appear to be defined by their ability to form metal bonds, and the formation of metal bonds depends on, amongst other things, pressure. Many elements will form metals at high enough pressure; indeed, many more than those we call metals. In a metal bond the nuclei sit in a sea of electrons and this is what enables metals to conduct electricity. So, I used electrical conductance as a proxy for metalness. The definition:

Class: MoleOfMetalAtom
    EquivalentTo: [in pto.owl]
        MoleOfPureAtom
         and (hasMeasurement some
            (Measurement
             and (hasQuantity some
                (ElectricalConductivity
                 and (hasValue some double[>= 5.405])))))

picks up most things as metals. A non-metal is defined as something with a conductivity of zero. Metalloids are troublesome; a semi-conductor is not the same as a metalloid, though many of the metalloids are semi-conductors. However, carbon can be a semi-conductor, but isn’t a metalloid. Being a metalloid is defined by a variety of properties, including the type of compounds formed.

The definition above has a not very good description of measurements and units. The Measurement class captures observations. Each measurement measures some Quantity and can have other attributes such as conditions (standard temperature and pressure, for instance) and even some kind of provenance. A Quantity has a value (a simple data property) and a unit. For the quantity of ElectricalConductivity we have a unit of SiemensPerMetre. My units ontology is a two-hour lash-up made from the Wikipedia page on SI units. Siemens per metre is a derived unit using the base units of length and conductance. Again, not a beautiful ontology, but it does its job. Overall, I’d much rather use Bijan Parsia’s OWL extension for units that does all the work for me (hierarchy plus some inter-unit conversion); for my purposes here I need no ontological explanation of units; units simply allow me to interpret some numbers.
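As a rough illustration outside OWL, the Measurement/Quantity pattern and the conductivity threshold can be sketched in Python. This is only a sketch: the class and function names are my own, the sample values are made up for illustration, and only the 5.405 threshold and the SiemensPerMetre unit name come from the definition above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Threshold taken from the MoleOfMetalAtom definition above.
METAL_CONDUCTIVITY_THRESHOLD = 5.405


@dataclass(frozen=True)
class Quantity:
    kind: str    # e.g. "ElectricalConductivity"
    value: float
    unit: str    # e.g. "SiemensPerMetre"


@dataclass(frozen=True)
class Measurement:
    quantity: Quantity
    condition: Optional[str] = None  # e.g. "StandardPressure"


def is_mole_of_metal(measurements: List[Measurement]) -> bool:
    """True if some measurement records a conductivity at or above the threshold."""
    return any(
        m.quantity.kind == "ElectricalConductivity"
        and m.quantity.value >= METAL_CONDUCTIVITY_THRESHOLD
        for m in measurements
    )


# Hypothetical samples; the numbers are illustrative, not real data.
conductive_sample = [Measurement(Quantity("ElectricalConductivity", 20.0, "SiemensPerMetre"))]
insulating_sample = [Measurement(Quantity("ElectricalConductivity", 0.0, "SiemensPerMetre"))]
unmeasured_sample = [Measurement(Quantity("Density", 2.0, "unspecified"))]

print(is_mole_of_metal(conductive_sample))  # conductivity above the threshold
print(is_mole_of_metal(insulating_sample))  # conductivity of zero
```

One difference from OWL is worth noting: the Python check treats a missing conductivity measurement as "not a metal", whereas under OWL's open world assumption a reasoner would simply not classify such a class either way.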

Being a metal depends on conditions and a variety of behaviours, and the spectrum is rather broad. Writing a definition in OWL (with the artificial constraint of not being very modern) was hard. An extensional definition, where all the metals are simply listed, might be the best route; an element is a metal because chemists say it is a metal, based on a collection of broadish criteria with which any given substance doesn’t have to comply universally.

I have a definition of mole of metal atom. I then wanted to define “metal atom”. I did this by adding that atoms form moles of atom (as the opposite of moles of atom being made of atoms). To be more ontologically proper I would say that atoms have a disposition to form moles of atom, as not all atoms form moles of atom. However, this brings me nothing, so I haven’t done it.

As well as atoms, I have classes of ion. Atoms form ions and thus I can have metal ions as well. I added universal constraints on what ions each atom forms so that I could define, if I wished, atoms that form only mono-valent cations and so on.

Having formed metal ions, I could define moles of metal ion, along with anion, cation, mono-, bi-, tri- and so on, valent cations and anions.

A mole of salt was then defined as:

Class: MoleOfSalt
    EquivalentTo: [in pto.owl]
        MoleOfCompoundChemical
         and (isMadeOfMoleOfIon some MoleOfAnion)
         and (isMadeOfMoleOfIon some MoleOfMetalIon)
         and (isMadeOfMoleOfIon only
            (MoleOfAnion
             or MoleOfMetalIon))

and a mole of sodium chloride looks like:

Class: MoleOfSodiumChloride
    SubClassOf: [in pto.owl]
        MoleOfCompoundChemical,
        hasColour some WhiteColour,
        hasState some SolidState,
        hasMeasurement some
            (Measurement
             and (hasQuantity some
                (Density
                 and (hasValue value 2.16)))),
        hasMeasurement some
            (Measurement
             and (hasQuantity some
                (KelvinBoilingPoint
                 and (hasValue value 1465.0)))
             and (inCondition some StandardPressure)),
        hasMeasurement some
            (Measurement
             and (hasQuantity some
                (KelvinMeltingPoint
                 and (hasValue value 801.0)))
             and (inCondition some StandardPressure)),
        hasMeasurement some
            (Measurement
             and (hasQuantity some
                (MolarMass
                 and (hasValue value 58.442)))),
        hasMeasurement some
            (Measurement
             and (hasQuantity some
                (SolubilityInWater
                 and (hasValue value 35.9)))),
        isMadeOfMoleOfIon only
            (MoleOfChlorideIon
             or MoleOfSodiumIon),
        isMadeOfMoleOfIon exactly 1 MoleOfChlorideIon,
        isMadeOfMoleOfIon exactly 1 MoleOfSodiumIon

Hidden in here is a shameful description of colour. All I’ve done is write down the colours as described on Web pages for chemicals. I even have a colour of “colourless colour” (yuk). Anyway, this does let me capture coloured salts, coloured compounds, white salts and so on.

With many other salts described I had the ability to start defining classes such as “salts that are made of exactly one mole of metal ion and one mole of chloride ion”, as in the alkali metal chlorides.

This highlights how verbose the modelling of measurement and quantity makes the ontology; it would be much better to have some simple mechanism for describing quantities. Note also that the units for each quantity are on the named quantity, so need not be put at this level.

I can define classes that pick up the alkali metal chlorides, the alkaline earths and the halogens. For example:

Class: MoleOfAlkaliMetalChloride
    Annotations: [in pto.owl]
        label "Mole Of Alkali Metal Chloride"@
    EquivalentTo: [in pto.owl]
        MoleOfCompoundChemical
         and (hasMeasurement some
            (Measurement
             and (hasQuantity some
                (Density
                 and (hasValue some double[< 4.0])))))
         and (isMadeOfMoleOfIon only
            (MoleOfChlorideIon
             or MoleOfMetalIon))
         and (isMadeOfMoleOfIon exactly 1 MoleOfChlorideIon)
         and (isMadeOfMoleOfIon exactly 1 MoleOfMetalIon)

Like much of the PTO, this relies on qualified cardinality constraints (QCRs). Much of this kind of chemistry is defined by precise descriptions of exactly how much of this combines with that and so on. FaCT++ (and other reasoners) get upset with QCRs much greater than 3 or so. Initially I had each atom defined by its protons; this made the reasoners die. (Of course, this is modern chemistry, but has no impact on my particular constraints.) I’ve also skirted around describing moles as having Avogadro’s number of entities via QCR!
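To make the combination of “only” and “exactly 1” in the alkali metal chloride definition concrete, here is a small Python sketch of the same test applied to an explicit list of ions. It is a closed-world approximation, not what an OWL reasoner does; the function name, the ion names and the set of metal ions are assumptions for illustration, while the density bound of 4.0 and the sodium chloride density of 2.16 come from the listings above.

```python
from collections import Counter
from typing import Iterable, Set


def looks_like_alkali_metal_chloride(
    ion_moles: Iterable[str], metal_ions: Set[str], density: float
) -> bool:
    """Closed-world version of the MoleOfAlkaliMetalChloride conditions."""
    counts = Counter(ion_moles)
    # "isMadeOfMoleOfIon only (MoleOfChlorideIon or MoleOfMetalIon)"
    only_allowed = all(ion == "ChlorideIon" or ion in metal_ions for ion in counts)
    # "isMadeOfMoleOfIon exactly 1 MoleOfChlorideIon"
    one_chloride = counts["ChlorideIon"] == 1
    # "isMadeOfMoleOfIon exactly 1 MoleOfMetalIon"
    one_metal = sum(n for ion, n in counts.items() if ion in metal_ions) == 1
    # "Density ... hasValue some double[< 4.0]"
    return only_allowed and one_chloride and one_metal and density < 4.0


# Hypothetical set of metal ions for the example.
METAL_IONS = {"SodiumIon", "MagnesiumIon"}

sodium_chloride = ["SodiumIon", "ChlorideIon"]
# One metal ion to two chloride ions; the density here is illustrative only.
magnesium_chloride = ["MagnesiumIon", "ChlorideIon", "ChlorideIon"]

print(looks_like_alkali_metal_chloride(sodium_chloride, METAL_IONS, 2.16))
print(looks_like_alkali_metal_chloride(magnesium_chloride, METAL_IONS, 2.3))
```

The one-to-one ratio is what separates the two examples; the second fails on the “exactly 1 chloride” condition alone.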

The PTO takes about 15 minutes to classify on my little laptop with FaCT++. Other reasoners (Pellet and HermiT) are defeated. The QCRs probably slow things down and I know there are redundant axioms (the hasPart exactly n Proton restriction makes the disjointness between the atoms redundant). The disjointness was put in as part of a standard normalisation pattern. As I tend to do now, I made no tree, but just a list of atoms (and the other major classes) and used defined classes to build all the hierarchy; this is probably rather expensive. However, building the hierarchy of groups is entirely the purpose of this exercise, so putting in my own would be cheating.

Several things have defeated me on forming definitions for other groups:

  1. My ignorance of transition metal chemistry.
  2. The inability to easily define trends. Alkali metals get softer, more reactive and so on as atomic weight (mass) increases.
  3. The noble gases are an interesting case. Their other name, the “inert gases”, gives the clue; they are defined by the fact that, under most conditions, they do nothing. Without explicitly saying this, OWL’s open world assumption makes this hard.
  4. The groups at the start of the P-block move over metals and non-metals and, again, I’m just not good enough at the chemistry. I need to model how reactions take place and I’ve not done this.

I can nearly do what I want; I can end up with classes that describe various groups in the Periodic Table. The layout is not the job of the ontology; this needs to be done by a programme using the PTO. The predictive quality of the Periodic Table was helped along by layout (the “gaps” in the table) and by chemists’ interpolation of values for physicochemical properties; this isn’t really the job of the ontology, but it is a job in which the ontology can participate. There will be more on this later.

Making a group-based ontology of the elements is easy if one uses electronic structure. Alkali metals are defined by having one electron in the valence S-shell; alkaline earths have two electrons in the valence S-shell; halogens have five electrons in their valence P-shell. There’s a little fiddling about in the transition or D-block, but it’s all quite straightforward. A PTO using electronic structure is available.

My attempt to “dream the Periodic Table in OWL” has, at best, been a partial success. It could be done, but I need much cleverer ways of expressing relationships between classes: as atomic number (or atomic radius) increases, hardness decreases and reactivity increases; this, together with the formation of chlorides only in a ratio of one to one and oxides only in the ratio two to one, is sufficient to recognise an atom as being an alkali metal atom. I shall use the PTO as a way of highlighting many more issues in using OWL to model a domain’s semantics; some good and some bad.

An Ontology of Sub-Atomic Particles

December 18, 2010

As another modelling exercise, I built a little ontology of sub-atomic particles (http://www.cs.man.ac.uk/stevensr/ontology/particle.owl). This was motivated by work on a description of basic chemistry that was an attempt to infer the groups of the Periodic Table from physicochemical properties using only OWL’s automated reasoners (the Periodic Table Ontology, http://www.cs.man.ac.uk/stevensr/ontology/periodic.zip).

Atoms can, of course, be defined by their protons; any atom that has 3 protons can be recognised to be a lithium atom and can only be a lithium atom; were it to have 2 protons, it would be a helium atom. Chemistry largely comes from the behaviour of electrons, the basic number of which is determined by the number of protons, and the ease with which they are gained or lost by a type of atom. We also have neutrons, variations in the number of which give us the isotopes of atoms.

When describing chemistry, where does our interest end? Presumably, we could just build an ontology of sub-atomic particles and by eventual and successive composition create an ontology of everything; this is silly. For chemistry we only need to go as far as protons and electrons (and in some cases the neutrons affect chemistry as well). For most chemistry I’m not interested in things below proton/electron.

There is often a question of “when do I stop modelling?”. We do not need to go any further than protons and electrons (down to quarks and so on) as far as chemistry is concerned. So, for my Periodic Table ontology, I just drew the line at having protons, electrons and neutrons; I can capture most of my chemistry without appealing to sub-atomic physics (though, of course, it does get important, just not for what I’m doing and what many people in biology wish to do).

As a little side-line, however, I knocked together an ontology of sub-atomic particles. I have an ignorance of the area. I used Wikipedia pages to gather my knowledge. As usual, as with any ontology, just doing it raises some interesting points.

I used a flat list of sub-atomic particles and then used defined classes to build the hierarchy. I have a quality of “spin”, with values of half and full (integer). A particle has either half or full spin (so the property is functional). Thus Fermions and Bosons are built.

Class: Fermion
    Annotations: [in particle.owl]
        label "Fermion"
    EquivalentTo: [in particle.owl]
        FundamentalSubAtomicParticle
         and (hasSpin some HalfSpin)

and

Class: Boson
    Annotations: [in particle.owl]
        label "Boson"
    EquivalentTo: [in particle.owl]
        FundamentalSubAtomicParticle
         and (hasSpin some IntegerSpin)

Leptons are fundamental particles with half spin and carriers of various types of force:

Class: Lepton
    Annotations: [in particle.owl]
        label "Lepton"
    EquivalentTo: [in particle.owl]
        FundamentalSubAtomicParticle
         and (carriesForce some
            (ElectroMagneticForce
             or GravitationalForce
             or WeakNuclearForce))
         and (hasSpin some HalfSpin)
         and (carriesForce only
            (ElectroMagneticForce
             or GravitationalForce
             or WeakNuclearForce))

A Nucleon is simply EquivalentTo: (Proton or Neutron). There’s a whole lot more of such defined classes.

I’ve used hasPart, which is transitive, and hasDirectPart as a sub-property that is not transitive. In OWL one cannot count with transitive properties, so a design pattern of counting with the intransitive sub-property that implies the transitive super-property yields the desired effects.
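The reason for splitting the two properties can be pictured with a toy Python structure: counting is done over the direct parts only, while the transitive relation is derived by following direct parts all the way down. This is a sketch under my own naming; the deuterium nucleus entry is an extra example that is not in the ontology description above.

```python
from typing import Dict, List

# hasDirectPart: each whole lists only its immediate parts (toy data).
DIRECT_PARTS: Dict[str, List[str]] = {
    "DeuteriumNucleus": ["Proton", "Neutron"],
    "Proton": ["UpQuark", "UpQuark", "DownQuark"],
    "Neutron": ["UpQuark", "DownQuark", "DownQuark"],
}


def count_direct_parts(whole: str, kind: str) -> int:
    """Count with the intransitive property, as a cardinality restriction would."""
    return DIRECT_PARTS.get(whole, []).count(kind)


def all_parts(whole: str) -> List[str]:
    """Follow direct parts transitively, giving everything reachable via hasPart."""
    found: List[str] = []
    stack = list(DIRECT_PARTS.get(whole, []))
    while stack:
        part = stack.pop()
        found.append(part)
        stack.extend(DIRECT_PARTS.get(part, []))
    return found


print(count_direct_parts("Neutron", "DownQuark"))           # direct count on the neutron
print(count_direct_parts("DeuteriumNucleus", "UpQuark"))    # no quarks are direct parts here
print("UpQuark" in all_parts("DeuteriumNucleus"))           # but quarks are parts transitively
```

The nucleus has no quarks as direct parts, yet has quarks as parts; keeping the count on the direct property is what stops the numbers at different levels from being mixed together.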

Similarly, using qualified cardinality restrictions gives the ability to describe Proton and Neutron by the number of different types of Quark; a neutron has one up quark and 2 down quarks (plus some gluons). A Baryon is not a fundamental particle, but a composite particle made of exactly 3 quarks. A Pion has a quark and an anti-quark. A composite sub-atomic particle is any particle with min 2 fundamental sub-atomic particles (those sub-atomic particles that cannot be further divided). I modelled composite particle as having min 2 fundamental or anti-fundamental sub-atomic particles.

Charge on Quarks is interesting. Protons have a charge of one and neutrons are, well, neutral. They are made up from quarks that have charges that “add up” to the charge of the resulting particle. So, we end up with one third and two thirds positive and negative charges. This sounds silly, but what we’re really talking about is “amount” of charge. Charge is given an integer value because some particles (atoms) have twice as much charge as another and so on. As we found out about sub-atomic particles that have charges that are fractions of this conventional amount, we started to get fractional charges. This would suggest that charge should be an “amount” that has a quality of being positive or negative. What I did was multiply the one third and two thirds charges by three, giving me integers; if this permeated through, I’d have to multiply my “traditional” charges on atoms by 3 as well.
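The multiply-by-three trick is simple arithmetic, sketched here in Python. The dictionary and function names are my own; the quark charges of +2/3 (up) and -1/3 (down) and the quark content of the proton and neutron are standard.

```python
from fractions import Fraction
from typing import List

# Charges stored as integers in units of one third of the conventional charge.
CHARGE_IN_THIRDS = {"UpQuark": 2, "DownQuark": -1}


def charge_in_thirds(quarks: List[str]) -> int:
    """Sum the integer (times three) charges of the constituent quarks."""
    return sum(CHARGE_IN_THIRDS[q] for q in quarks)


def conventional_charge(quarks: List[str]) -> Fraction:
    """Convert back to the conventional scale by dividing by three."""
    return Fraction(charge_in_thirds(quarks), 3)


proton = ["UpQuark", "UpQuark", "DownQuark"]
neutron = ["UpQuark", "DownQuark", "DownQuark"]

print(charge_in_thirds(proton), conventional_charge(proton))    # 3 on the new scale, 1 conventionally
print(charge_in_thirds(neutron), conventional_charge(neutron))  # 0 on both scales
```

On this scale a proton carries a charge of 3 rather than 1, which is exactly the knock-on effect on “traditional” charges mentioned above.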

I have included “unicorn” entities such as the Higgs boson and the graviton. These particles have been postulated, but (as far as I know), we’ve as yet no evidence for their existence — apart from a theoretical postulation.

There are particles and anti-particles. What is the relationship between the two? I don’t want to use the standard restriction upon a class:

Class: Electron
    SubClassOf: SubAtomicParticle
        that isAntiParticleOf some Positron

as this means “each and every electron is an anti-particle of at least one positron”; we don’t want to say this at all. We need a higher order statement that the class “Electron” has a relationship with the class “Positron”. This relationship is also symmetric. OWL 2’s punning is a candidate for saying this, or we could use simple annotation properties. Also, one goal is to have the hierarchy of the anti-particles built by the hierarchy of the particles. That is, as we infer the one so we infer the hierarchy of the other.
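To show what a class-level, symmetric relationship and a mirrored hierarchy might look like, here is a Python sketch that treats class names as plain values. It is not OWL punning itself, only an illustration of the intent; the function names and the “AntiLepton” class name are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def symmetric_closure(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Record each class-to-class pairing in both directions."""
    relation: Dict[str, str] = {}
    for left, right in pairs:
        relation[left] = right
        relation[right] = left
    return relation


def mirror_hierarchy(parent_of: Dict[str, str], anti: Dict[str, str]) -> Dict[str, str]:
    """Build the anti-particle hierarchy from the particle hierarchy."""
    mirrored: Dict[str, str] = {}
    for child, parent in parent_of.items():
        if child in anti and parent in anti:
            mirrored[anti[child]] = anti[parent]
    return mirrored


# Class-level statements: these relate the classes, not their instances.
ANTI = symmetric_closure([("Electron", "Positron"), ("Lepton", "AntiLepton")])

# Suppose the reasoner has inferred this much of the particle hierarchy.
PARTICLE_HIERARCHY = {"Electron": "Lepton"}
ANTI_HIERARCHY = mirror_hierarchy(PARTICLE_HIERARCHY, ANTI)

print(ANTI["Positron"])
print(ANTI_HIERARCHY)
```

The pairing says nothing about any individual electron; it only relates the two classes, which is the distinction the existential restriction fails to make.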

Here are a few things that I’d like to be able to do (but lack the knowledge of the physics):

  1. I need mass in the ontology;
  2. I need energy in the ontology — perhaps even inferring the equivalence of mass and energy.
  3. The other things I’ve forgotten or don’t know about.

An Update to the Amino Acids Ontology

December 18, 2010

I’ve been cleaning up the amino acids ontology by removing unwanted axioms, rationalising the annotations and, importantly for this exercise, adding lots of new defined classes. The latest amino acids ontology is available. In short, the amino acids ontology takes each of the 20 amino acids used in biology and describes them according to their polarity, hydrophobicity, charge, size and the aliphatic/aromatic nature of the side chain.

I’ve put in lots of defined classes such as aliphatic amino acid, positively charged amino acid, polar amino acid and hydrophilic amino acid, all being defined as equivalent to the appropriate restriction. These are straightforward and have the expected effect. The class:

Class: SmallAminoAcid
        EquivalentTo: AminoAcid
                that hasSize some Small

After reasoning (hasSize is functional), this has the amino acids Cysteine, Aspartate, Glutamate, Asparagine, Proline and Valine as subclasses. A similar class for Tiny amino acids has Alanine, Glycine, Serine and Threonine.

If we then create a class such as:

Class: SmallPolarAminoAcid
        EquivalentTo:
                SmallAminoAcid
                 and PolarAminoAcid

after reasoning we have the amino acids Cysteine and Asparagine, underneath, and, of course, this is a kind of SmallAminoAcid and a kind of PolarAminoAcid.

This is all straightforward. Some more interesting things happen because of the covering axiom in the ontology. For AminoAcid, we have the following axiom:

Class: AminoAcid
        EquivalentTo: [in amino-acid.owl]
        Alanine
         or Cysteine
         or Aspartate
         or Glutamate
         or Phenylalanine
         or glycine
         or Histidine
         or Isoleucine
         or Lysine
         or Leucine
         or Methionine
         or Asparagine
         or Proline
         or Glutamine
         or Arginine
         or Serine
         or Threonine
         or Valine
         or Tryptophan
         or Tyrosine

This is a covering axiom that says that if an individual is an amino acid, then it has to be one of these amino acids. It also says that if an individual is a member of one of these classes, then it must be an amino acid. The useful semantics here is that we’re saying the only amino acids that can exist are these 20 amino acids that are used in biology. Chemically this covering axiom is not justified, but it is biologically true (as long as we ignore modified amino acids). As we now “know” all the amino acids, we start inferring more things. For instance:

LargeAromaticAminoAcid and AromaticAminoAcid both have Phenylalanine, Histidine, Tryptophan and Tyrosine as subclasses. In our locally closed world, we can now infer that these two classes are equivalent (they have the same extents). This reveals a small biochemical truth: being aromatic means being large and aromatic; all the aromatic side chains in biological amino acids make the side chains large.

Now, if we make a class such as:

Class: SmallPositiveAminoAcid
        EquivalentTo: SmallAminoAcid and PositiveChargedAminoAcid

it becomes unsatisfiable. With the covering axiom we “know” all the amino acids. We know this intersection will be empty in this version of the world and so it is unsatisfiable, as there are no amino acids that are both small and positively charged, and the automated reasoner has told us so.

We have lots of examples of inferred equivalence in this ontology. LargeChargedAminoAcid and LargePositiveChargedAminoAcid are the same thing; if an amino acid is large and charged, then it is positively charged. Similarly, SmallPolarAminoAcid and SmallPolarAliphaticAminoAcid (Cysteine and Asparagine) are equivalent: an amino acid cannot be small and aromatic, so has to be aliphatic. Finally, we see that the classes SmallChargedPolarAminoAcid, SmallChargedAminoAcid and NegativeChargedAminoAcid are all equivalent and subsume Aspartate, Glutamate and Tyrosine. Again, some small biochemical truth is here: to be small an amino acid has to be aliphatic. If an amino acid is small and charged then it is negatively charged, and so on.

All these inferences about equivalence and unsatisfiability are driven by the covering axiom. Ontologically this is not right; this is not an exhaustive list of all amino acids, but just the biologically “interesting” ones. Even if the covering axiom were undesirable in the final ontology, it is useful in development to see which classes are redundant and unsatisfiable. The unsatisfiable classes can have no instances, but it may be useful to show this for teaching reasons (a trivial example, but it makes the point well enough), as one can show why the classes are unsatisfiable: this “closed” world has no instances of the types described, and this reflects some simple biochemistry. It may be that, biochemically, one may wish to highlight the equivalence of many forms and how the chemistry drives such a view. The covering axiom reveals this; the equivalences and unsatisfiability would not be revealed without it.
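The effect of the covering axiom can be imitated in Python by treating a small table as the whole world and comparing the extents of class descriptions. This is a sketch, not a reasoner: the table covers only some of the amino acids, the sizes and aromatic members follow the lists given above, and the sizes and positive charges given to Lysine, Arginine and Histidine are my assumption from standard biochemistry rather than something read from the ontology.

```python
from typing import Dict, FrozenSet, Set

# A deliberately partial, illustrative "closed world" of amino acids.
AMINO_ACIDS: Dict[str, Set[str]] = {
    "Phenylalanine": {"Large", "Aromatic"},
    "Tryptophan": {"Large", "Aromatic"},
    "Tyrosine": {"Large", "Aromatic"},
    "Histidine": {"Large", "Aromatic", "Positive"},  # charge is an assumption
    "Lysine": {"Large", "Positive"},                 # size and charge are assumptions
    "Arginine": {"Large", "Positive"},               # size and charge are assumptions
    "Cysteine": {"Small"},
    "Asparagine": {"Small"},
    "Valine": {"Small"},
    "Glycine": {"Tiny"},
}


def extent(required: Set[str]) -> FrozenSet[str]:
    """All amino acids in the closed world that have every required feature."""
    return frozenset(name for name, feats in AMINO_ACIDS.items() if required <= feats)


def equivalent(a: Set[str], b: Set[str]) -> bool:
    """Two descriptions are equivalent here when their extents coincide."""
    return extent(a) == extent(b)


def unsatisfiable(required: Set[str]) -> bool:
    """A description is unsatisfiable here when nothing in the world matches it."""
    return not extent(required)


print(equivalent({"Aromatic"}, {"Large", "Aromatic"}))  # same members either way
print(unsatisfiable({"Small", "Positive"}))             # nothing is small and positive
```

Without the closed table, neither result would follow: an unlisted small, positively charged amino acid could always exist, which is why the OWL inferences depend on the covering axiom.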

The TAMBIS Ontology

December 18, 2010

The TAMBIS ontology (TaO) was the first ontology that I had any hand in building, though it was really only the smallest of hands — Pat Baker did the overwhelming amount of the work on TaO; I am, however, forever grateful that TAMBIS and Pat’s work gave me access to this field of research. The TaO was built for the Transparent Access to Multiple Bioinformatics Information Sources project. The TaO describes some basic molecular biology and its associated bioinformatics — so proteins and protein database records.

The TaO drove a query interface. The TaO was represented in Grail, a knowledge representation language that looks like a description logic and more or less corresponds to the EL+ profile of OWL 2. The TaO provided the building blocks of the domain and the query interface allowed a user to build up a description of that which they wished to retrieve. For example, a protein has parts that are motifs; motifs indicate a function, so a description can be built of proteins that have motifs indicating particular functions.

The contents of the TaO were mapped to values within the bioinformatics resources accessible within TAMBIS. A query rewriting engine took the conceptual query and re-wrote it in a form (essentially a little programme) that would retrieve the instances from the bioinformatics resources that "filled" the query class.

TAMBIS worked, but was very much ahead of its time in terms of the technology available and its ability to support the goals of TAMBIS. It would work much better today than it did in the late 1990s. Some of my observations on this are:

  • Programmatic access to the bioinformatics resources was terrible. We used screen-scraping to retrieve data. This was, of course, very fragile. Today, many bioinformatics resources are available as Web services and this avoids some of these problems.
  • Querying of the resources was poor and not just because of the lack of programmatic access. Keyword searches over fields in records were possible, but a proper query language was lacking. SRS came the closest to providing such a facility and we did use SRS behind some of our resource wrappers.
  • In TAMBIS we had to try and work out the conceptualisation implicit within the resources we were trying to access. These conceptualisations skewed the TaO in many respects; the resources were too explicit in the TaO itself. Since the TaO was developed the terminology within many resources has become much more accessible through the efforts of the Gene Ontology and related efforts in the OBO consortium. A common conceptualisation across resources makes the life of the mapper and querier much easier.
  • Related to this would be the building of the TaO itself. Today’s TaO would be ontologies such as the Gene Ontology, Sequence Ontology and so on, joined up in a description logic, compositional fashion such that a user could describe a protein with a given function, that took part in a particular process in a given cellular location and had a particular sequence feature. This would enable nice descriptions of instances to be made and, what is more important, the resources marked up with GO etc. could actually answer the question. TAMBIS could ask nice questions, but the underlying resource’s ability to answer them was lacking.
  • On top of all this is bioinformatics’ inability to have a common identification scheme for the entities that it describes. It is very difficult to know if a protein described in one record is the same as one described in another database’s record. (Indeed, what is it that a protein record describes: a single protein; a pool of proteins; a consensus over many variants of the same kind of protein; and so on? The same goes for other kinds of entity.) This makes it almost impossible to integrate resources with any confidence. Much of a bioinformatician’s time is taken in doing mappings between resources and, of course, no one trusts anyone else’s mappings, so they are continually re-done. It sometimes seems that there is a danger that if this fundamental problem were to be fixed, then much of a bioinformatician’s craft knowledge would disappear and some science might ensue. Of course, bioinformaticians squeeze out some marvellous answers from the resources we have, but it often seems that bioinformaticians are trying to run encumbered by a huge weight; the weight of issues of identity. It should be added, of course, that a solution is non-trivial, though I think it more of a social problem than anything else.

When the TaO was conceptualised we conflated aspects of biology and bioinformatics. We described proteins as having molecular weights and accession numbers and so on. This is not a good representation of the world of information that we modelled. More properly we should have separately described proteins and their representation. This would have meant a cleaner separation of, for instance, the many representations of a protein and the actual protein. One has to ask, of course, whether this would have made any practical difference other than to have added complexity to query formulation. The myGrid ontology used in the annotation of Web services within Taverna etc. originally had such a separation, but this was for creation of the terminology, not for querying. This clean separation has now gone in favour of something much lighter weight, more akin to a SKOS representation. SKOS is meant for artefacts designed for indexing and information retrieval and this is, after all, what TAMBIS did, so such a representation might well have had advantages.

Towards the end of the TAMBIS project the Ontology Inference Layer (OIL, one of the forerunners of OWL) appeared. I did a little experiment in re-working the TaO in OIL. This was my first real attempt at ontology modelling and I did it without access to any ontology editing or reasoning tools at all (at that point we had the editor OILEd, but Java was at that time inaccessible to my screenreader). My only way of checking what I’d done was to take my ontology along to OILEd’s developer (Sean Bechhofer) and ask him to classify it, something I only did a few times as I didn’t want to cause too much hassle. The consequence of this is that I’d add scores of axioms without any resort to a reasoner (a bad mistake). This new version was about two weeks’ effort, just to play around with OIL and to experiment with re-conceptualising the TaO. This history is why this particular ontology has had a longevity and an embarrassment factor (for me) that was very much unintended.

In short, this new TaO was pretty terrible. I was only just learning the ins and outs of modelling with universal and existential quantification (GRAIL had only existential quantification), so there are all kinds of mistakes from that direction. There are also a fair number of simple howlers. As I didn’t have access to a reasoner, I had no idea (apart from inspection) of when my axioms made classes unsatisfiable; it turned out that there were all sorts of contradictions in the ontology. There are also some pretty horrible gaffes.

The reason that this ontology has somewhat haunted me is that it was used as a test for the development of explanations for DL reasoners. The ability to provide explanations of unsatisfiabilities and inconsistencies in DL ontologies [1,2] is a great boon, but that it was this dreadful ontology of mine that was used is a little unfortunate (it was labelled as the "TAMBIS Ontology", whereas the TAMBIS ontology is the one Pat Baker wrote). Anyway, the 147 unsatisfiable classes can be tracked down to only a few causes and this is some comfort [1,2]. This ontology of mine should be the "Not TAMBIS" Ontology.

All the various forms of the TaO and the "play" versions of the ontology are available:

  • The big version of the TaO in GRAIL.
  • A smaller version of the TaO in GRAIL that was actually fully supported in the TAMBIS software.
  • New TAMBIS in OIL.
  • New TAMBIS in OWL.

References

  • [1] Bijan Parsia, Evren Sirin, Aditya Kalyanpur: Debugging OWL ontologies. WWW 2005: 633-640
  • [2] Aditya Kalyanpur, Bijan Parsia, Evren Sirin, James A. Hendler: Debugging unsatisfiable classes in OWL ontologies. J. Web Sem. 3(4): 268-293 (2005)