Archive for May, 2011

Unicorns in my Ontology

May 26, 2011

A realist ontology only represents portions of reality; that is, classes of things that really have instances out there in this one world in which we live. I’m told that if I’m not a realist, then I’m prepared to have unicorns in my ontology; that is, to just fill the ontology up with nonsense. I’m not a realist, and I’ve been criticised for therefore being willing to have unicorns in my ontologies. Whilst I try to describe entities in biology, I have to deal with things that seem problematic from the realist perspective (I was involved with Phil Lord in writing a paper about some of these points). Colour models, numbers and mathematics all appear to create more heat than light from a realist perspective. I also remain to be convinced that I need a true account of numbers, units and so on just to be able to describe biological phenomena. The critical thing is a common way of doing it, and this is not the same as saying that the only way to achieve that common approach is to capture the “reality” of numbers, units and so on.

Here are several unicorns that I’m happy to have in one of my ontologies, and which apparently make my ontologies poor:

  1. Newtonian mechanics works well enough for virtually everything I need to model. However, modern physics tells us that Newtonian mechanics, with its separation of time and space, is not correct; we should model only with space-time. This makes life too hard for no apparent benefit: except at the very large and very small scales, it doesn’t matter. So, I’m happy to have the unicorn of separate classes for time and space in my ontology. It is worth noting that BFO takes this approach as well; so BFO has unicorns and is thus only “a bit realist”.
  2. My second unicorn is the canonical anatomy. Typically, anatomies describe a canonical organism: some idealised version of the organism. Of course, the ideal human being, for example, does not exist. So, canonical anatomies describe an entity that doesn’t exist; another unicorn. Like space and time, this non-existent entity is also modelled in the realist ontologies of OBO. I’m happy to have the unicorn of the canonical anatomy.
  3. My third unicorn is qualities of processes, or occurrents. BFO tells us that processes do not have qualities. This means, for instance, that a reaction cannot have a rate or velocity. Examples of this in biology and beyond are too numerous even to begin to enumerate properly. Again, we’re told that, according to BFO, occurrents cannot have qualities, so anything described like this doesn’t exist; another unicorn.
  4. The Higgs boson and the graviton are conjectured to exist. I’m happy to place a class of Higgs boson in my ontology underneath sub-atomic particle, if that is the prevailing view of physicists. We have no evidence for the particle other than the theoretical conjecture, but it fits into the current models of sub-atomic physics. One should annotate the class with information about its status, but it is a useful class: I will wish to describe experiments at the Large Hadron Collider in terms of what they are about, or what they are looking for, namely the Higgs boson. I don’t need the overhead of putting Higgs boson under some hypothesis in some information ontology, though I might be willing to have a hypothesis that my Higgs boson is a kind of sub-atomic particle. I do need to ask questions such as “describe the experiments that are about sub-atomic particles”; I don’t think I need the overhead of an ontological account of the reality of the Higgs boson being a hypothesised class. I can just label the class as hypothesised, putative or unproven.

Our amino acid ontology is full of simplifying assumptions that make it non-realist. It is full of arbitrarily defined classes combining the attributes of charge, polarity, aliphatic/aromatic side chain, size (a simple value partition that will not be “real”), hydrophobicity, and so on. How we talk about charge, for instance, is a simplifying assumption. Amino acids have different charges in different conditions, but it is convenient to talk of lysine having a positive charge; it helps biochemists explain its role in proteins.
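To make the point about charge concrete, here is a small sketch using the Henderson–Hasselbalch relation to estimate what fraction of lysine side chains carry their positive charge at a given pH. The side-chain pKa of 10.5 is a textbook approximation and an assumption of this sketch; real values shift with the local protein environment.

```python
# Minimal sketch: fraction of lysine side chains that are protonated
# (positively charged) at a given pH, via the Henderson-Hasselbalch relation.
# ASSUMPTION: a side-chain pKa of 10.5, a textbook approximation.
LYSINE_SIDE_CHAIN_PKA = 10.5


def fraction_protonated(ph: float, pka: float = LYSINE_SIDE_CHAIN_PKA) -> float:
    """Fraction of a basic group in its protonated (charged) form at this pH."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


for ph in (7.0, 10.5, 12.0):
    print(f"pH {ph:4.1f}: {fraction_protonated(ph):.3f} protonated")
```

Near neutral pH almost every lysine side chain is charged, which is why “lysine is positive” is such a serviceable simplification; at pH 12, most are not.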

As others have observed, a realist approach to modelling a domain is fraught with issues. Even ontologies that claim a realist stance cannot really claim to be properly realist when they contain non-existent anatomies and useful, but untrue, simplifying assumptions about physics. Any non-real entity makes an ontology not a realist ontology, just as one cannot be a bit pregnant.

Such useful untruths may be practical, but the need to compromise on reality in order to have practicality is all too telling for a reality-only approach; practicality will make an ontology non-realist. We need practicality in capturing what we need to say to accomplish what we need to do in describing our data. We don’t describe real entities very much; we describe data about real entities, collections of those entities, and approximations and smears of probability about entities. Precise, simple rules are all very well, but when a complex world has to be changed to fit the simplistic model, things have gone wrong.

Simple messages are, however, easier than complex messages; compromise is always a bit mealy-mouthed. The criteria for pragmatism and the rules that govern it are woolly, but we do need to move away from “just ask the guru” or “I feel it in my bones”, something that is all too common in bio-ontology building at the moment. I will develop this pragmatic agenda further.

An Ontology of the Periodic Table Using the Electronic Structure of the Atom

May 5, 2011

I’ve been doing an experiment on inferring the groups of the periodic table from the physicochemical properties of their elements. It is hard and only sort of works.

Mendeleev’s Periodic Table also reflects the electronic structure of the atom, which was revealed decades after the first version of the Periodic Table was published. The light metals of the alkali metals and alkaline earths correspond to atoms with valence electrons in the S-shell; the transition metals are those with valence electrons in the D-shell; and the non-metals have valence electrons in the P-shell.

It is much easier, of course, to define classes representing the groups of the Periodic Table using electronic structure. For example, a definition of an alkali metal atom is:

Class: AlkaliMetalAtom

    EquivalentTo:
        Atom
         and (hasValenceElectronShell some
            (SShell
             and (contains exactly 1 Electron)
             and (hasOrder some integer[>= 2])))
         and (hasValenceElectronShell only
            (SShell
             and (contains exactly 1 Electron)
             and (hasOrder some integer[>= 2])))

Here I’ve said that any atom whose valence shell is an S-shell containing exactly one electron is to be recognised as an alkali metal atom. Note that I’ve put in a grubby fix to deal with the first S-shell. We number electron shells; S1, S2, S3, S4, and so on are the S-shells that occur each time we start a new period. The first S-shell gives us hydrogen and helium, neither of which counts as an alkali metal or an alkaline earth. So, I’ve put an index number on the shell, even though individual S-shells don’t really carry an index. Also note that my modelling of valence shell is bad; I’ve done the usual thing of conflating a class into a property. I should really have something on the shell indicating that its role is valence, or that it is outer rather than inner. However, I did it the quick and grubby way.
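The rule in the AlkaliMetalAtom definition can be mirrored outside OWL as a plain predicate, which is a handy way of checking the intent of the axiom against a few atoms. This is a sketch under simplifying assumptions: the `Shell` record and its `kind`, `order` and `electrons` fields are hypothetical stand-ins for the ontology’s `SShell`, `hasOrder` and `contains`, not part of the published ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shell:
    """Hypothetical stand-in for a valence electron shell in the ontology."""
    kind: str       # "s", "p", "d" or "f", mirroring SShell, PShell, ...
    order: int      # mirrors hasOrder, e.g. 2 for the second S-shell
    electrons: int  # mirrors "contains exactly n Electron"


def is_alkali_metal(valence_shells: list[Shell]) -> bool:
    """Mirror of AlkaliMetalAtom: at least one valence shell, and every one
    is an S-shell of order >= 2 holding exactly one electron."""
    def matches(shell: Shell) -> bool:
        return shell.kind == "s" and shell.electrons == 1 and shell.order >= 2
    # "some" -> at least one shell; "only" -> no non-matching shells
    return bool(valence_shells) and all(matches(s) for s in valence_shells)


hydrogen = [Shell("s", 1, 1)]   # excluded by the order index
sodium = [Shell("s", 3, 1)]
magnesium = [Shell("s", 3, 2)]  # an alkaline earth, not an alkali metal
print(is_alkali_metal(hydrogen), is_alkali_metal(sodium), is_alkali_metal(magnesium))
```

Unlike an OWL reasoner, the function works under a closed world: the list it is given is taken to be all of the atom’s valence shells, which is what the `only` restriction is trying to say.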

I’ve also put an atomic number on each atom class, and this is, of course, not “real”. The atomic number may correspond to the number of protons, but atomic numbers don’t exist. If anything, they are second-order things; they belong to the class and not to the individuals themselves.

Anyway, that axiomisation all basically works. Here are a few more examples:

Class: HalogenAtom

    EquivalentTo:
        Atom
         and (hasValenceElectronShell some
            (PShell
             and (contains exactly 5 Electron)))

For the noble, or inert, gases below, note the use of the index on the shell to incorporate helium into the group:

Class: NobleGasAtom

    EquivalentTo:
        Atom
         and ((hasValenceElectronShell some
            (PShell
             and (contains exactly 6 Electron)))
         or (hasValenceElectronShell some
            (SShell
             and (contains exactly 2 Electron)
             and (hasOrder value 1))))
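The halogen and noble gas rules can be sketched the same way, again with a hypothetical `Shell` record standing in for the ontology’s shell classes. The sketch shows how the order index lets helium (a full first S-shell) into the noble gases while keeping a full second S-shell, as in beryllium, out. The electron counts are the standard ground-state valence configurations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shell:
    """Hypothetical stand-in for a valence electron shell in the ontology."""
    kind: str       # "s", "p", "d" or "f"
    order: int      # mirrors hasOrder
    electrons: int  # number of electrons the shell contains


def is_halogen(shells: list[Shell]) -> bool:
    # HalogenAtom: some valence P-shell holding exactly 5 electrons
    return any(s.kind == "p" and s.electrons == 5 for s in shells)


def is_noble_gas(shells: list[Shell]) -> bool:
    # NobleGasAtom: a full P-shell (6 electrons), or a full first S-shell (helium)
    return any(
        (s.kind == "p" and s.electrons == 6)
        or (s.kind == "s" and s.electrons == 2 and s.order == 1)
        for s in shells
    )


helium = [Shell("s", 1, 2)]
beryllium = [Shell("s", 2, 2)]  # full S-shell, but order 2, so not a noble gas
chlorine = [Shell("s", 3, 2), Shell("p", 3, 5)]
argon = [Shell("s", 3, 2), Shell("p", 3, 6)]
for name, shells in [("He", helium), ("Be", beryllium), ("Cl", chlorine), ("Ar", argon)]:
    print(name, is_halogen(shells), is_noble_gas(shells))
```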

For some of the groups or families within the transition elements, the story becomes slightly more complex. Take the nickel family as an example:

Class: NickelFamilyAtom

    EquivalentTo:
        (Atom
         and (hasValenceElectronShell some
            (DShell
             and (contains exactly 8 Electron)))
         and (hasValenceElectronShell some
            (SShell
             and (contains exactly 2 Electron)))
         and (hasValenceElectronShell only
            (DShell
             or FShell
             or SShell)))
         or (Atom
         and (hasValenceElectronShell some
            (DShell
             and (contains exactly 9 Electron)))
         and (hasValenceElectronShell some
            (SShell
             and (contains exactly 1 Electron)))
         and (hasValenceElectronShell only
            (DShell
             or FShell
             or SShell)))
         or (Atom
         and (hasValenceElectronShell some
            (DShell
             and (contains exactly 10 Electron)))
         and (hasValenceElectronShell only
            (DShell
             and (contains exactly 10 Electron))))

Nickel atoms have 8 electrons in the D-shell and 2 electrons in their outermost S-shell. Platinum atoms, however, have 9 electrons in their outermost D-shell and 1 in their outermost S-shell, and palladium atoms have a full D-shell of 10 electrons and no outer S-shell electron at all; hence the three disjuncts. An alternative modelling style would allow me to “add up” electrons and avoid this clumsiness.
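The “add up” alternative mentioned above can be sketched as a simple count: in all three ground-state configurations of the nickel family, the outermost D-shell and S-shell together hold ten electrons. This only illustrates the idea and is not the later ontology; the configuration table is standard chemistry data and the function name is hypothetical.

```python
# Outermost d and s electron counts in ground-state configurations:
# Ni [Ar] 3d8 4s2, Pd [Kr] 4d10, Pt [Xe] 4f14 5d9 6s1.
OUTER_D_AND_S = {
    "Ni": {"d": 8, "s": 2},
    "Pd": {"d": 10, "s": 0},
    "Pt": {"d": 9, "s": 1},
    "Fe": {"d": 6, "s": 2},  # iron family, for contrast
}


def in_nickel_family(outer: dict[str, int]) -> bool:
    """Hypothetical 'add up' rule: d + s electrons together total ten."""
    return outer["d"] + outer["s"] == 10


for symbol, outer in OUTER_D_AND_S.items():
    print(symbol, in_nickel_family(outer))
```

For the d-block elements the outer d plus s count equals the group number, so a single total of ten picks out group 10 without enumerating each configuration.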

Class: PlatinumAtom

    SubClassOf:
        NickelFamilyAtom,
        Atom
         and (hasValenceElectronShell some
            (DShell
             and (contains exactly 9 Electron)
             and (hasOrder value 5)))
         and (hasValenceElectronShell some
            (FShell
             and (contains exactly 14 Electron)
             and (hasOrder value 4)))
         and (hasValenceElectronShell some
            (SShell
             and (contains exactly 1 Electron)
             and (hasOrder value 6)))
         and (hasValenceElectronShell only
            ((DShell
             and (contains exactly 9 Electron)
             and (hasOrder value 5))
             or (FShell
             and (contains exactly 14 Electron)
             and (hasOrder value 4))
             or (SShell
             and (contains exactly 1 Electron)
             and (hasOrder value 6))))
         and (hasAtomicNumber value 78)
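As a sanity check on the `hasOrder` values and the atomic number in the PlatinumAtom axiom, the sketch below adds up the electrons of a neutral platinum atom: a xenon core of 54 electrons plus 4f14, 5d9 and 6s1. Treating the xenon core as a single count is a simplification made for this illustration.

```python
# Electron count of a neutral platinum atom, [Xe] 4f14 5d9 6s1.
XENON_CORE = 54  # electrons in the [Xe] core

# (shell kind, order, electrons), matching the hasOrder values in the axiom
PLATINUM_VALENCE = [("f", 4, 14), ("d", 5, 9), ("s", 6, 1)]

total = XENON_CORE + sum(electrons for _, _, electrons in PLATINUM_VALENCE)
print(total)  # 78, the atomic number given in the axiom
```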

Not the most exciting ontology I’ve ever built, but it does touch on quite a lot of nice modelling points. There is another way to build the same ontology that allows less clumsy modelling: simply state the number of electrons in the valence shell and “distribute” them around different shells to achieve the same effect. I’ll build it that way and show it at a later date.

This version of the Periodic Table ontology is available.

Using the DateTime Datatype to Describe Birthdays

May 5, 2011

I recently used a fancy datatype in OWL for the first time. So far, for things like the Family History Knowledgebase (FHKB), I’ve been modelling dates as years only, using integers. This has no real consequences, apart from integers having a zero and the Gregorian calendar not doing so. Obviously, the dateTime datatype offers some added extras, and I finally got around to trying them out.
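The zero problem shows up in simple arithmetic. In the Gregorian scheme 1 BC is followed directly by AD 1, so naive integer subtraction over-counts any span that crosses the era boundary by one year. The helpers below are a hypothetical sketch that maps era-labelled years onto astronomical year numbering, in which 1 BC is 0, 2 BC is -1, and so on.

```python
def to_astronomical(year: int, era: str) -> int:
    """Map a Gregorian year ('BC' or 'AD') onto astronomical numbering,
    which does have a year 0 (1 BC == 0, 2 BC == -1, ...)."""
    if year < 1:
        raise ValueError("Gregorian years start at 1; there is no year 0")
    return year if era == "AD" else 1 - year


def years_between(start: tuple[int, str], end: tuple[int, str]) -> int:
    """Elapsed years between two era-labelled years."""
    return to_astronomical(*end) - to_astronomical(*start)


print(years_between((1, "BC"), (1, "AD")))  # 1
print(1 - (-1))  # 2: naive integers (-1 for 1 BC) over-count by one
```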

OWL uses the XML Schema datatype dateTime, which has the format yyyy-mm-ddThh:mm:ss. So, an individual’s birthday might be represented as:

Individual: p001

    Types:
        Person

    Facts:
        bornOn "1934-05-06T01:20:03"^^xsd:dateTime

This means that this person was born on the 6th of May 1934 at 20 minutes and 3 seconds past one o’clock in the morning. This is all sort of fine, except that it would be an unusual degree of precision in anyone’s knowledge of their birth time. Some medical notes record the times of appearance of the head, delivery of the placenta and so on, but most people don’t know this time. Plus, we’d have to make some definition of birth (one that also accommodated caesarean birth).
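For anyone who wants to poke at the lexical form, Python’s standard `datetime` module parses this yyyy-mm-ddThh:mm:ss layout directly. This only illustrates the structure of the string; Python’s parser is not an OWL reasoner and is more lenient than xsd:dateTime in some respects (it accepts a bare date, for instance).

```python
from datetime import datetime

# Parse the same literal used for p001 above
born = datetime.fromisoformat("1934-05-06T01:20:03")
print(born.year, born.month, born.day)      # 1934 5 6
print(born.hour, born.minute, born.second)  # 1 20 3
print(born.tzinfo)                          # None: no timezone was given
```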

So, we have to use facets to put a degree of vagueness into a birthday:

Individual: p008

    Types:
        bornOn some xsd:dateTime[>= "1943-05-22T00:00:00"^^xsd:dateTime , <= "1943-05-22T23:59:59"^^xsd:dateTime]

This means that this person was born at some point between midnight on 22nd May 1943 and 23:59:59 on the same day. From an open-world point of view, I’d like to be able simply to leave the time of day open when recording the date on which a person was born. However, leaving the time portion out of a dateTime value is syntactically invalid.
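The facet restriction amounts to an interval test, sketched below with hypothetical helper names and only the Python standard library. One design point is worth noting: xsd:dateTime permits fractional seconds, so an inclusive upper bound of 23:59:59 leaves a sliver such as 23:59:59.5 outside the day. Bounding the day with an exclusive limit at the next midnight (the maxExclusive facet, written `<` in Manchester syntax) closes that gap.

```python
from datetime import date, datetime, time, timedelta


def day_bounds(day: date) -> tuple[datetime, datetime]:
    """Half-open interval [start, end) covering one calendar day."""
    start = datetime.combine(day, time.min)  # 00:00:00 on the day
    return start, start + timedelta(days=1)  # midnight starting the next day


def born_on_day(moment: datetime, day: date) -> bool:
    start, end = day_bounds(day)
    return start <= moment < end


day = date(1943, 5, 22)
late = datetime(1943, 5, 22, 23, 59, 59, 500000)  # 23:59:59.5
print(born_on_day(late, day))                     # True
print(late <= datetime(1943, 5, 22, 23, 59, 59))  # False: an inclusive 23:59:59 bound misses it
```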

Classes are made in the TBox in the usual way, also using the datatypes as I have done here. A 1960s birth is defined as:

Class: SixtiesChild

    EquivalentTo:
        Person
         and (bornOn some xsd:dateTime[>= "1960-01-01T00:00:00"^^xsd:dateTime , <= "1969-12-31T23:59:59"^^xsd:dateTime])

A bit more interesting is modelling baby boomer:

Class: BabyBoomer

    Annotations:
        rdfs:comment "\"A baby boomer is a person who was born during the demographic post-world war two baby boom.\" - Taken from Wikipedia http://en.wikipedia.org/wiki/Baby_boomer. The definition here is from the US Census Bureau as stated on the Wikipedia page."

    EquivalentTo:
        Person
         and (bornOn some xsd:dateTime[>= "1946-01-01T00:00:00"^^xsd:dateTime , <= "1964-12-31T23:59:59"^^xsd:dateTime])

The representation is in the same style, but more issues come into play. The notion of baby boomer seems altogether vague, though it seeks to capture the post-Second World War increase in births per head of population. First, definitions seem to vary. Second, it seems strange to have precise boundaries: one second is the difference between being a baby boomer and not. Finally, there seems to be a geographical and citizenship element to the definition. Different countries have different definitions; the Canadian baby boom is different from the US baby boom. Then, if I’m born outside the UK to UK parents, in a country that didn’t have a baby boom, am I still a baby boomer? If I’m counted in the UK’s census and I have a UK birth certificate, then I am. If I had birth certificates in two different countries (perfectly possible) and one country had a baby boom and the other didn’t, then I could be a member of one country’s baby boom cohort and not a member of the other’s.
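The one-second cliff is easy to demonstrate. The sketch below encodes the 1946 to 1964 range from the BabyBoomer class as a half-open interval (equivalent to the inclusive bounds for whole-second times) and tests two births one second apart. Country, citizenship and competing national definitions are deliberately left out, which is exactly the simplification questioned above.

```python
from datetime import datetime

# The 1946-1964 range from the BabyBoomer class, as a half-open interval.
BOOM_START = datetime(1946, 1, 1)
BOOM_END = datetime(1965, 1, 1)  # exclusive upper bound


def is_baby_boomer(born: datetime) -> bool:
    # Ignores country, citizenship and competing definitions entirely.
    return BOOM_START <= born < BOOM_END


last_second = datetime(1964, 12, 31, 23, 59, 59)
next_second = datetime(1965, 1, 1, 0, 0, 0)
print(is_baby_boomer(last_second), is_baby_boomer(next_second))  # True False
```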

The final definition I tried to write was for babies born in the early hours of the morning. So, I might write something like:

Class: earlyHoursBirth

        EquivalentTo: Person
                that bornAt some xsd:time[>= "00:00:00"^^xsd:time, <= "02:59:59"^^xsd:time]

Here I’d wish to use just the time portion of the dateTime datatype. This doesn’t work: xsd:time is not among the datatypes that OWL 2 supports. My birthdays ontology, very much a little toy, is available.
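Outside OWL the intended check is straightforward: take the time-of-day part of the birth dateTime and compare it with the window. The sketch below does that with Python’s `datetime.time()`; it illustrates what the class is meant to capture and is not a workaround inside OWL. It uses an exclusive bound at 03:00:00, which matches the 02:59:59 window for whole seconds while also covering fractional ones.

```python
from datetime import datetime, time

EARLY_START = time(0, 0, 0)
EARLY_END = time(3, 0, 0)  # exclusive upper bound


def is_early_hours_birth(born: datetime) -> bool:
    """True if the time-of-day portion falls in the early-hours window."""
    return EARLY_START <= born.time() < EARLY_END


print(is_early_hours_birth(datetime.fromisoformat("1934-05-06T01:20:03")))  # True
print(is_early_hours_birth(datetime.fromisoformat("1943-05-22T14:00:00")))  # False
```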

One final note is that the syntax for these kinds of definition is vile.