Archive for the ‘Ontologies’ Category

Getting emotional about ontologies

October 22, 2014

It’s taken a long time, but we’re finally publishing our paper about evaluating the Emotion Ontology (EM). Evaluating ontologies always seems to be hard. All too often there is no evaluation, or it ends up being something like “I thought about it really hard and so it’s OK”, or “I followed this method, so it’s OK”, which really amounts to not evaluating the ontology. Of course there are attempts made to evaluate ontologies, with several appealing to the notion of “large usage indicates a good ontology”. High usage is to be applauded, but it can simply indicate high need and a least bad option. High usage should imply good feedback about the ontology; so we may hope that high usage would be coupled with high input to, for instance, issue trackers (ignoring the overheads of actually issuing an issue request and the general “Oh, I can’t be bothered” kind of response) – though here I’d like to see typologies of requests issued and how responses were made.


Our evaluation of the Emotion Ontology (EM) fits into the “fitness for purpose” type of evaluation – if the EM does its job, then it is to some extent a good ontology. A thorough evaluation really needs to do more than this, but our evaluation of the EM is in the realm of fitness for purpose.


An ontology is all about making distinctions in a field of interest, and it should make the distinctions appropriate to that field. If it does this well, an ontology delivers the vocabulary terms for the distinctions that need to be made in that field of interest – if we can measure how well people can use an ontology to make the distinctions they feel necessary, then the ontology is fit for purpose. In our paper we attempted to see if the EM makes the distinctions necessary (and thus provides the appropriate vocabulary) for a conference audience to be able to articulate their emotional response to the talks – in this case the talks at ICBO 2012. That is, the EM should provide the vocabulary distinctions that enable the audience to articulate their emotional response to talks. The nub of our null hypothesis was thus that the EM would not let the audience members articulate their emotions such that we could cluster the audience by their response.


The paper about our evaluation of the EM is:


Janna Hastings, Andy Brass, Colin Caine, Caroline Jay and Robert Stevens. Evaluating the Emotion Ontology through use in the self-reporting of emotional responses at an academic conference. Journal of Biomedical Semantics, 5(1):38, 2014.


The title of the paper says what we did. As back-story, I was talking to Janna Hastings at ISMB in Vienna in 2011 and we were discussing the Emotion Ontology that she’d been working on for a few years, and this discussion ended up at evaluation. We know that the ontology world is full of sects and that people can get quite worked up about work presented and assertions made about ontologies. Thus I thought that it would be fun to collect the emotional responses of people attending the forthcoming International Conference on Biomedical Ontology 2012 (ICBO), where I was a programme co-chair. If the EM works, then we should be able to capture self-reported emotional responses to presentations at ICBO. We, of course, also had a chuckle about what those emotions may be in light of the well-known community factions. We thought we should try and do it properly as an experiment, thus the hypothesis, method and analysis of the collected data.





Colin Caine worked as a vacation student for me in Manchester developing the EmOntoTag tool (see Figure 1 above), which has the following features:

  • It presented the ICBO talk schedule and some user interface to “tag” each talk with terms from the EM. The tags made sentences like “I think I’m bored”, “I feel interested” and “I think this is familiar”.
  • We also added the means by which users could say how well the EM term articulated their emotion. This, we felt, would give us enough to support or refute our hypothesis – testing whether the EM gives the vocabulary for self-reporting emotional response to a talk and how well that term worked as an articulation of an emotion. We also added the ability for the user to say how strongly they felt the emotion – “I was a bit bored”, “I was very bored” sort of thing.
  • We also used a text-entry field to record what the audience members wanted to say, but couldn’t – as a means of expanding the EM’s vocabulary.


We only enabled tagging for the talks whose speakers gave us permission to do so. Also, users logged in via meaningless numbers that were just added to conference packs in such a way that we couldn’t realistically find out whose responses were whose. We also undertook not to release the responses for any individual talk, though we sought permission from one speaker to put his or her talk’s emotional responses into the paper.


The “how well was the EM able to articulate the emotion you felt?” score was significantly higher than the neutral “neither easy nor difficult” point. So, the part of the ICBO 2012 audience that participated felt the EM offered the appropriate distinctions for articulating their emotional response to a talk. The part of EmOntoTag that recorded terms the responders wanted, but which weren’t in the EM, included:

  • Curious
  • Indifferent
  • Dubious
  • Concerned
  • Confused
  • Worried
  • Schadenfreude
  • Distracted
  • Indifferent or emotionally neutral


There are more reported in the paper. Requesting missing terms is not novel. The only observation is that doing the request at the point of perceived need is a good thing; having to change UI mode decreases the motivation to make the request. The notion of indifference or emotionally neutral is interesting. It’s not really a term for the EM, but something I’d do at annotation time, that is, “not has emotion” sort of thing. The cherry-picked terms I’ve put above are some of those one may expect to be needed at an academic conference; I especially like the need to say “Schadenfreude”. All the requested terms, except the emotionally neutral one, are now in the EM.
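The headline significance result above (articulation scores sitting above the neutral midpoint) can be illustrated with a one-sample t-test. This is a minimal sketch with invented scores, not the study’s data, and the paper’s actual statistical test may have differed:

```python
# Were "how well did the EM term articulate your emotion?" ratings
# significantly above the neutral midpoint? Scores below are made up.
from statistics import mean, stdev
from math import sqrt

NEUTRAL = 3                               # midpoint of a hypothetical 1-5 scale
scores = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4]  # invented participant ratings

# One-sample t statistic against the neutral point.
n = len(scores)
t = (mean(scores) - NEUTRAL) / (stdev(scores) / sqrt(n))

# One-tailed critical value for alpha = 0.05 with df = 9 is about 1.833.
print(f"t = {t:.2f}; significantly above neutral: {t > 1.833}")
```

With these invented scores the statistic comfortably exceeds the critical value, which is the shape of the result reported in the paper.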


There’s lots of data in the paper, largely recording the terms used and how often. A PCA did separate audience members and talks by their tags. Overall, the terms used were of positive valence: “interested” as opposed to “bored”; these were two of the most used terms. Other frequent terms were “amused”, “happy”, “this is familiar” and “this is to be expected”.
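The PCA step could be sketched as follows; the tag-count matrix, the tag names and the respondent rows are all invented for illustration, and this is not the analysis code used in the paper:

```python
# Project respondents, represented as tag-count vectors, onto two
# principal components via SVD of the centred matrix.
import numpy as np

# rows = respondents, columns = counts of tags such as
# ["interested", "bored", "amused", "happy", "familiar"]
X = np.array([
    [5, 0, 3, 2, 4],
    [4, 1, 2, 2, 3],
    [0, 6, 0, 1, 1],
    [1, 5, 1, 0, 2],
    [5, 0, 4, 3, 3],
], dtype=float)

Xc = X - X.mean(axis=0)          # centre each tag column
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs = Xc @ Vt[:2].T              # coordinates on the first two components

print(pcs.shape)                  # one 2-D point per respondent
```

Plotting those 2-D points is what lets “interested” respondents and “bored” respondents fall into visibly separate groups.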


The picture below shows the timeline for the sample talk for which we had permission to show the emotional responses. Tags used were put into time-slot bins and the size of the tags indicates the number of times that tag was used. The EM appraisals are blue, the EM’s emotions are red and the EM’s feelings are green. We can see that, of the respondents, there was an overwhelming interest, with one respondent showing more negative emotions: “bored”, “bored”, “tired”, “restless” and “angry”. Of course, we’re only getting the headlines; we’re not seeing the reason or motivation for the responses. However, we can suspect that the negative responses mean that person didn’t like the presentation, but that there was a lot of interest, amusement and some pleasure derived from understanding the domain (“mastery pleasure”).


We think that this evaluation shows that the EM’s vocabulary works for the self-reporting of emotional response in an ontology conference setting. I’m prepared to say that I’d expect this to generalise to other settings for the EM. We have, however, only evaluated the ontology’s vocabulary; in this evaluation we’ve not evaluated its structure, its axiomatisation, or its adherence to any guidelines (such as how class labels are structured). There is not one evaluation that will do all the jobs we need of evaluation; many aspects of an ontology should be evaluated. However, fitness for purpose is, I think, a strong evaluation and when coupled with testing against competency questions, some technical evaluation against coding guidelines, and use of some standard axiom patterns, then an evaluation will look pretty good. I suspect that there will be possible contradictions in some evaluations – some axiomatisations may end up being ontologically “pure”, but militate against usability and fitness for purpose. Here one must make one’s choice. All in all, one of the main things we’ve done is to do a formal, experimental evaluation of one aspect of an ontology and that is a rare enough thing in ontology world.


Returning to the EM at ICBO 2012, we see what we’d expect to see. Most people who partook in the evaluation were interested and content, with some corresponding negative versions of these emotions. The ontology community has enough factions and, like most disciplines, enough less than good work, to cause dissatisfaction. I don’t think the ICBO audience will be very unusual in its responses to a conference’s talks; I suspect the emotional responses we’ve seen would be in line with what would be seen in a Twitter feed for a conference. Being introduced to the notion of cognitive dissonance by my colleague Caroline Jay was enlightening. People strive to reduce cognitive dissonance; if, in attending a conference, one decided, for example, that it was all rubbish, one would realise one had made a profound mistake in attending. Plus, it’s a self-selecting audience of those who, on the whole, like ontologies, so overall the audience will be happy. It’s a pity, but absolutely necessary, that we haven’t discussed (or even analysed) the talks about which people showed particularly positive or negative responses, but that’s the ethical deal we made with participants. Finally, one has to be disturbed by the two participants that used the “sexual pleasure” tags in two of the presentations – one shudders to think.

Patterns of bioinformatics software and database usage

September 27, 2014


I published a blog on the rise and rise of the Gene Ontology. This described my Ph.D. student Geraint Duck’s work on bioNerDS, a named entity recogniser for bioinformatics databases and software. In a survey of Genome Biology and BMC Bioinformatics full text articles we saw that the Gene Ontology is in the top ten of mentioned resources (a fact reflected in our survey of the whole of 2013’s PMC). This interesting survey was, however, a bit of a side-show to our goal of trying to extract descriptions of bioinformatics and computational biology method from text. Geraint has just presented a paper at ECCB 2014 called:


Geraint Duck, Goran Nenadic, Andy Brass, David L. Robertson, and Robert Stevens. Extracting patterns of database and software usage from the bioinformatics literature. Bioinformatics, 30(17):i601-i608, 2014.


That has edged towards our ultimate goal of extracting bioinformatics and computational method from text. Ideally this would be in a form that people wishing to use bioinformatics tools and data to analyse their data could consult a resource of methods and see what was commonly done, how and with what it was done, what’s the latest method for data, who’s done each method and so on and so on.


Geraint’s paper presents some networks of interacting bioinformatics software and databases that show patterns of commonly occurring pairs of resources appearing in 22,418 papers from the 2013 PMC corpus that had the MeSH term “Bioinformatics” as a tag. When assembled into a network, there are things that look remarkably like methods, though they are not methods that necessarily appear in any one individual paper. What Geraint did in the ECCB paper was:


  1. Take the results of his bioNerDS survey of the articles in PMC 2013 labelled with the MeSH term “Bioinformatics”.
  2. Remove all resources that were mentioned only once (as they probably don’t really reflect “common” method).
  3. Filter the papers down to their method sections.
  4. Get all the pairs of adjacent resources.
  5. Use a binomial test on each pair’s two possible orderings (“Software A takes data from Database B” or “Data from Database B is put into Software A”) to find the dominant ordering, and assume that the dominant ordering is the correct one (our manually sampled and tested pairs suggest this is the case).
  6. Label each resource as software or a database and construct a network by joining the remaining pairs together.
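Step 5, the ordering test, might look something like this in outline. The resource names and counts are made up, and this is a sketch of the idea rather than the code used in the paper:

```python
# For an unordered pair of resources, test whether one ordering of mentions
# (A before B within a methods section) dominates the other.
from math import comb

def binom_two_sided(k, n):
    """Two-sided exact binomial p-value for k successes out of n at p = 0.5.
    Doubling the upper tail is valid here because we always test k >= n/2."""
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# invented counts of the two orderings observed in methods sections
pair_counts = {
    ("UniProt", "BLAST"): 18,   # "UniProt ... BLAST" seen 18 times
    ("BLAST", "UniProt"): 2,    # the reverse ordering seen twice
}

k = max(pair_counts.values())
n = sum(pair_counts.values())
p = binom_two_sided(k, n)
dominant = max(pair_counts, key=pair_counts.get)
print(f"dominant ordering {dominant}, p = {p:.4f}")
```

A pair whose dominant ordering passes the test becomes a directed edge in the network; pairs with no significant direction can be discarded or left undirected.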

The paper gives the details of our method for constructing patterns of usage and describes the evaluations of each part of the method’s outputs.


Some pictures taken from the paper of these networks created from assembling these ordered pairs of bioinformatics resources are:


Figure 1 A network formed from the software recovered by bioNerDS at the 95% confidence level


This shows the network with only bioinformatics software. In Figure 1 we can see a central set of sequence alignment tools (split into homologue search, multiple sequence alignment and pairwise sequence alignment tools), which reflects the status of these core, basic techniques in bioinformatics-based analyses. Feeding into this are sequence assembly, gene locator and mass spectroscopy tools. Out of the sequence analysis tools come proteomic tools, phylogeny tools and then some manual alignment tools. Together these look like a pipeline of core bioinformatics tasks, orientated around what we may call “bioinformatics 101” – it’s the core, vital tasks that many biologists and bioinformaticians undertake to analyse their data.


The next picture shows a network created from both bioinformatics software and databases. Putting in both software and databases in Figure 2, we can see what the datasets are “doing” in the pipelines above: UniProt and GEO are putting things into BLAST; GenBank links into multiple sequence alignment tools; PDB links into various sequence prediction and evaluation tools.


Figure 2 A network formed from the bioinformatics software and databases recovered by bioNerDS at the 95% confidence level


Finally, we have the same network of bioinformatics software and databases, but with the Gene Ontology node (which we count as a database) highlighted.


Figure 3 The same network of bioinformatics software and databases, but with the Gene Ontology and its associates highlighted.


In another blog I spoke about the significance of the Gene Ontology, as recorded by bioNerDS, and this work also highlights this point. In this network we’re seeing GO as a “data sink”: it’s where data goes, not where it comes from – presumably as it is playing its role in annotation tasks. However, its role in annotation tasks, as well as a way of retrieving data, fits sensibly with what we’ve seen in this work. It may well be that we need a more detailed analysis of the language to pick up and distinguish where GO is used as a means of getting a selection of sequences one wants for an analysis – or to find out if people do report this activity. Again we see GO with a central role in bioinformatics – a sort of confirmation of its appearance in the top flight of bioinformatics resource mentions in the whole PMC corpus.


What are we seeing here? We are not extracting methods from the text (and certainly not individual methods from individual papers). What we’ve extracted are patterns of usage as generalised over a corpus of text. What we can see, however, are things that look remarkably like bioinformatics and computational biology method. In particular, we see what we might call “bioinformatics 101” coming through very strongly. It’s the central dogma of bioinformatics… protein or nucleic acid sequences are taken from a database and then aligned. Geraint’s paper also looks at patterns over time – and we can see change. Assuming that this corpus of papers from PMC is a proxy for biology and bioinformatics as a whole and that, despite the well-known inadequacy of method reporting, the methods are a reasonable proxy for what is actually done, bioNerDS is offering a tool for looking at resources and their patterns of usage.

Learning about a domain from an ontology

June 20, 2014

One of the things that I (and, I think, we collectively) have forgotten about or neglected is ontology as “tutorial”. We used to talk about this way back in TAMBIS days and others did so as well. The idea is that by looking at an ontology I can learn about a field of interest. Our idea in TAMBIS was that one should be able to look at the TAMBIS ontology and learn about the basics of molecular biology and an operational aspect of bioinformatics (though this exact idea was never explored or evaluated). Ontologies are often described as the “background” knowledge of a discipline; they contain the entities in a domain, their definitions, descriptions and inter-relatedness. From this, a “reader” of an ontology should be able to get some kind of understanding of a domain.

With an ontology, there are two ways I can learn about a field of interest: First, I can look at an ontology for that field, explore it and from that derive an understanding of how the entities of that field “work”; Second, I can write an ontology about that field and, in doing so, do the learning. This latter one only works for small topics or learning at a fairly superficial level. I’ve done this for heraldry; cloud nomenclature; anatomy of flowers; plate armour; galenic medicine; and a few others. This isn’t scalable; we can’t all write ontologies for a field of interest, just to learn about it. I have, however, found it a useful way to help myself structure my understanding, even if the resulting ontologies rarely, if ever, amount to very much at all (these have also largely been for fun and not an endeavour to drive some research).


Is this tutorial aspect of ontology going to give a full understanding? For most ontologies of which I’m aware, looking at that ontology will not act like a college course in that subject area. Looking at an ontology is more like looking at an encyclopaedia; it is a list of things and descriptions of those things, which is all an ontology is really trying to do. A so-called reference ontology can fit into this encyclopaedic role well; an application ontology should do so, but just for that application area. However, I should be able to look at an ontology or a collection of ontologies and get a decent overview of a domain.


Having said this, however, we can make quite a good encyclopaedia from an ontology or set of ontologies, especially if there are an adequate number of semantic relationships between entities, as well as good editorial and other metadata around those entities. I say “ontologies” as just having an encyclopaedia or ontology of molecular function (as an example) tells me what molecular functions there are and how they’re organised, but it doesn’t give me, as a learner, much of a biological context. This isn’t the fault of the ontology; I just need to look at a broader picture of biology to really learn anything. If I could ask questions such as “what molecular functions exist in the mitochondria of mammals and in what processes do they participate”, then I have something to work with (I suspect). There then, of course, remains the question of how all this information and knowledge should be presented. I feel there’s mileage in a standard sort of encyclopaedic form, using the label (term), synonyms and natural language definitions, together with the structure of the ontology, to present something useful.


I’m still sort of taken with the idea of ontology as tutorial; I should be able to look at the ontologies from a field of interest and learn about that field of interest. It probably won’t be an in-depth learning; shallower even than that offered by the excellent resource Wikipedia, which can readily be used as an introduction to a subject area. However, I should be able to get a decent enough view of a field of interest from its ontologies that I can structure my learning from other resources.

The Software Ontology (SWO)

June 19, 2014

Our paper on the Software Ontology (SWO) has just been published in the Journal of Biomedical Semantics (JBMS) thematic issue on ontologies. The paper is:


James Malone, Andy Brown, Allyson Lister, Jon Ison, Duncan Hull, Helen Parkinson, and Robert Stevens. The software ontology (swo): a resource for reproducibility in biomedical data analysis, curation and digital preservation. Journal of Biomedical Semantics, 5(1):25, 2014.


There’s also a lot of information about how we went about making the SWO at the SWO blog.


We now have a range of bio-ontologies covering everything from sequences, gene products, their functions, the processes in which they participate, and cellular and gross anatomy, to diseases and phenotypes. These are primarily used to describe the entities in the masses of data biology now produces. More recently, there’s been work on describing the investigations by which these data were produced and analysed; the SWO fits into the ontology landscape at this location. The data is just a load of stuff; we detect things in these datasets with some software, and the provenance trail of how these entities were detected needs to include the software that was used.


The SWO describes software, the software suites of which it is a part, its inputs and outputs, the tasks it supports, its versions, its licensing, its interface, and its developers. It doesn’t capture the hardware upon which the software runs, the software’s dependencies, cost of ownership (not the price in lucre, but whether it needs a lot of sys admin, that kind of thing), software architecture… (see the paper and blog for more)


The scope of the SWO is thus wide and we could have included a whole lot more than we did; much of the stuff not included is important and useful, but resources are scarce and some of the features, like the hardware, are very hard to represent. One of the major problems in writing an ontology is scope and mission creep – how do we stop modelling the world and spending inordinate amounts of time on pathological edge cases? To help us in this we used some Agile techniques in producing the SWO. Perhaps the most useful were the “planning poker” and “buy a feature” games we played. In the SWO project we used a bunch of stakeholders to help us out and the use of these techniques in the SWO went something like this:


  1. We did the usual thing of asking for competency questions (which play the role of user stories); clustering them and drawing out a set of features that needed to be modelled.
  2. For the planning poker, we asked people to estimate the effort needed to represent the feature on a numeric scale. Here the trick is that everyone has cards with notional costs written upon them. All cards are held up simultaneously to prevent bias from the first to reveal his or her card. Discussion ensues and a consensus effort for the ontological feature is decided upon.
  3. We then did the same thing for choosing features. Depending on the values for effort, an amount of “money” is calculated and distributed evenly amongst the stakeholders; there is not enough money to buy everything. Each feature has a cost and each stakeholder can spend his or her money on the features he or she thinks most important. Negotiating and so on takes place and features to be modelled are either bought or not bought.
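The buy-a-feature step above can be mimicked with a toy calculation; all the feature names, costs and spends below are invented, and real sessions involve negotiation rather than a fixed spreadsheet:

```python
# A feature is modelled only if the stakeholders' pooled spending on it
# covers its (planning-poker-derived) cost.
feature_costs = {"versions": 8, "licences": 5, "hardware": 13, "interfaces": 3}

# total "money" is deliberately less than the cost of buying everything
spending = {
    "stakeholder_1": {"versions": 5, "licences": 3},
    "stakeholder_2": {"versions": 4, "interfaces": 3},
    "stakeholder_3": {"licences": 2, "hardware": 6},
}

pooled = {feature: 0 for feature in feature_costs}
for spends in spending.values():
    for feature, amount in spends.items():
        pooled[feature] += amount

bought = sorted(f for f, total in pooled.items() if total >= feature_costs[f])
print(bought)  # the features that made it into the modelling plan
```

In this toy run the expensive “hardware” feature goes unbought, which mirrors what happened in the SWO itself.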

This actually worked well and produced a list of prioritised SWO features. We didn’t do it often enough, as priorities and cost estimations change, but features to be modelled could be seen to change from one iteration of the planning to the next. In the SWO we think this technique struck a good balance between what was needed and what was achievable.


We also needed to add content for these features to the SWO. In the first round this was driven by what our customers needed – this was largely, but not exclusively, the EBI’s Gene Expression Atlas. Later on, we’ve been a bit more systematic about what to put into the SWO. Using a named entity recogniser for bioinformatics software and databases (bioNerDS) we’ve done a survey of all PMC for mentions of said bioinformatics databases and software. We pulled out the top 50 of these software mentions and we’re slowly ploughing our way through those (I’ve put this list at the end of this blog).


The paper itself is one in the JBMS thematic series on ontologies; it does for ontologies what the NAR annual database issue does for databases – describes an ontology, its state of play and what updates have happened. The paper has the motivation – we need to know how our data were produced and analysed, and software plays a crucial role in this analysis. It describes what features were bought by our stakeholders, how we axiomatised descriptions of these software features and outlines some of the more tricky modelling issues. My two favourite tricky bits were:


  1. Versions of software. The vast variety of versioning schemes is horrid to represent; we did it with individuals of the class “version name” representing a version for a given bit of software. These versions are linked to preceding and succeeding versions to support the obvious queries. It’s not beautiful, but works well enough.
  2. Licences for software. Again, this has to support the variety of the multitude of licences, but the interesting thing here is to be able to infer that, for instance, a bit of software is open source – the paper describes the axiom pattern to do this trick.
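The version pattern in point 1 can be sketched outside OWL with plain linked records: each version individual points at its predecessor, and the inverse link plus a chain walk give the “obvious queries”. The software names and version numbers here are illustrative only:

```python
# "version name" individuals linked to their preceding versions.
preceded_by = {
    "BLAST 2.2.26": "BLAST 2.2.25",
    "BLAST 2.2.25": "BLAST 2.2.24",
}

# derive the inverse "succeeded by" relation, as a reasoner would
succeeded_by = {older: newer for newer, older in preceded_by.items()}

def version_history(version, links):
    """Walk a chain of version links back (or forward) from a version."""
    chain = [version]
    while chain[-1] in links:
        chain.append(links[chain[-1]])
    return chain

print(version_history("BLAST 2.2.26", preceded_by))
```

In the SWO itself these are OWL individuals and object properties, so the same queries fall out of the axioms rather than hand-written code.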



The paper also describes the SWO’s merger with EDAM, which has brought a lot of content into the SWO. The SWO is being used, and not just by the EBI (the paper has some examples) and will continue to grow. The SWO represents a complex field of human developed artefacts. In doing so the SWO team has very much taken a pragmatic approach in its representation. The SWO is already quite complex, but we have tried to avoid being too baroque.


Here’s the top 50 as produced by bioNerDS (it’s actually 49 and there’s a couple of glitches in this data, but it’s good enough):

[Table: the top 50 bioinformatics software and database mentions found by bioNerDS. Most of the table did not survive extraction; the recoverable entries include Tree View, UCSC Genome Browser and Microarray Suite.]

Exploring what authors actually do when creating an ontology

May 12, 2014


Following our qualitative investigation into the issues people have in using ontologies, we’ve been delving further into what a group of experienced ontology authors actually do when they’re creating an ontology. This is all part of a wider goal of exploring the HCI of ontology authoring so that we can gain a better understanding of how people go about authoring, their patterns of activity and when they need support in understanding the consequences of their actions in a cognitively and perceptually challenging situation. This is all part of the EPSRC-funded “What if…?” project, where we’re looking at “what if…?” questions – that is, “what happens if I add this axiom…?”. The work reported here has been done by Markel Vigo, Caroline Jay and myself. In brief, what we’ve done is to:

  • Instrument a version of Protégé 4 so that all actions (button presses, edits, mouse movements etc.) are recorded and time-stamped in a log file – this is Protégé for user studies (Protégé4US);
  • Use an eye-tracker to record what authors were looking at while they were authoring their ontology;
  • Capture full screen and audio recordings;
  • Ask our participants to undertake a series of ontology authoring tasks.
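The kind of time-stamped event log Protégé4US produces could be sketched as JSON lines, one UI action per line. The event names and fields here are invented, not Protégé4US’s actual schema:

```python
# A minimal in-memory event logger: each UI action becomes one
# time-stamped record, serialisable as a JSON-lines log file.
import json
import time

log = []

def record(event, **details):
    """Append a time-stamped UI event to the in-memory log."""
    log.append({"t": time.time(), "event": event, **details})

record("expand_hierarchy", cls="PotatoVariety")
record("select_entity", cls="MarisPiper")
record("invoke_reasoner")

# one JSON object per line, as a log file would hold them
log_file = "\n".join(json.dumps(entry) for entry in log)
print(log_file.count("\n") + 1)  # number of logged events
```

A log in this shape is what makes the replay, timing and transition analyses later in the post straightforward.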

The tasks we asked the authors to undertake were based on creating an ontology of potato varieties that described their cropping time, yield and culinary role. One of the tricks about this kind of experiment is to find examples that are reasonably accessible to a wide variety of authors; this example arrived because I was planting this year’s potatoes. In three stages, we asked authors to add the classes and restrictions necessary for each aspect of potato varieties, increasingly complex defined classes, and then to reason and look at the ontology.

The paper entitled “Protégé4US: Harvesting Ontology Authoring Data with Protégé” at the Human Semantic-Web Interaction (HSWI) workshop at ESWC 2014 is available and gives more details on the work; below I pick out a few highlights. The pictures are generated from the log files, which enable re-construction of what the author has done, from button presses, mouse clicks, etc. to the OWL constructs used in the tasks.

For instance, the picture below reconstructs the authoring events visualised as a web diagram, where the thickness of the arrow indicates a higher frequency of transitions between events (circles indicate reflexive transitions). For this particular user, we can observe how some interesting transitions stand out:

  • The expansion of a class hierarchy is followed by another expansion; similarly, the selection of an entity is followed by another selection. This suggests that users are drilling down the hierarchy; also, they click on classes to view their description.
  • The reasoner is invoked when entities have been modified. For instance, adding a property to a class or making a class defined is often followed by saving the file and invoking the reasoner.
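The transition analysis behind a web diagram like this amounts to counting bigrams of consecutive events in the log; thick arrows are frequent bigrams. A minimal sketch with invented event names:

```python
# Count transitions between consecutive UI events from a logged session.
from collections import Counter

events = ["expand", "expand", "expand", "select", "select",
          "add_property", "save", "reason", "select", "select"]

# each (event, next_event) pair is one transition;
# repeated pairs become the thick (or reflexive) arrows in the diagram
transitions = Counter(zip(events, events[1:]))
print(transitions.most_common(2))
```

Here the reflexive “expand followed by expand” and “select followed by select” transitions dominate, which is exactly the drilling-down pattern described above.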

The time diagram below shows a complementary visualisation of the same participant, where the Y-axis indicates the event and the X-axis is the time elapsed in minutes. The blue blocks denote the time in between events and the red dots are mouse events such as mouse hovers or mouse clicks. These have been plotted as well so that we know when there is user activity.


In the strategies described above we said that users click on classes to view their descriptions. This is a hypothesis supported by our preliminary data analysis and by our observations. Eye-tracking data will shed light on what users do during periods of interaction inactivity, especially in situations in which users are looking at the consequences of their actions in Protégé.

As well as the log-files, we also collected self-reported OWL and Protégé expertise. With this information we were able to explore, for example, correlations between expertise and task completion, time and task completion, number of actions and task completion time, the ngrams of UI actions, the Protégé tabs used, and, as described above, patterns of activity in what authors are doing as they create an ontology. More things are reported in the paper, but this rich recording of what authors do enables us to explore many aspects of authoring and suggests hypotheses for further investigation.

The HSWI paper shows the kinds of analysis it is feasible to do with a tool such as Protégé4US and the things we pull out in the paper are:

  • We identified two types of users based on how they use the tabs of Protégé;
  • We found correlates between interaction events and performance metrics that corroborate our initial insights: a higher number of times the reasoner is invoked and the class hierarchy is expanded indicates trouble and thus longer completion times;
  • We visualised emerging activity patterns: e.g. an ontology is saved before invoking the reasoner and after modifying an entity.

This suggests that Protégé4US has potential to deliver data whose analysis will expand our knowledge about the ontology authoring process, identify its pitfalls, propose design guidelines and develop intelligent authoring tools that anticipate user actions in order to support ontology authoring in the future. Next comes more analysis and doing more interesting ontology authoring tasks and eventually looking at authors actually doing their ontology day jobs on the ontologies they actually create. There’s not much HCI around in the ontology and OWL field, especially in the evaluation of ontology tools and looking at what users actually do in fine detail. This work and Protégé4US is a first step in this direction.


Competence questions, user stories and testing

February 2, 2014

The notion of competency questions as a means of gathering requirements for, and of evaluating, an ontology comes from a paper by Gruninger and Fox in 1994: “These requirements, which we call competency questions, are the basis for a rigorous characterization of the problems that the enterprise model is able to solve, providing a new approach to benchmarking as applied to enterprise modelling and business process engineering. Competency questions are the benchmarks in the sense that the enterprise model is necessary and sufficient to represent the tasks specified by the competency questions and their solution. They are also those tasks for which the enterprise model finds all and only the correct solutions. Tasks such as these can serve to drive the development of new theories and representations and also to justify and characterize the capabilities of existing theories for enterprise modelling.” And “We use a set of problems, which we call competency questions that serve to characterize the various ontologies and microtheories in our enterprise model. The microtheories must contain a necessary and sufficient set of axioms to represent and solve these questions, thus providing a declarative semantics for the system.” Here we read “enterprise model” as ontology (or, more correctly, that an enterprise model may have as a part an ontology, as a KR can have other things than an ontology…).


Below you can see examples of what we gathered as competency questions during some Pizza tutorials. They mostly take the form of example questions:


  • Find me pizza with hot spicy peppers
  • What sorts of pizza base are there?
  • What vegetarian pizzas are there?
  • What pizzas are there with more than one type of cheese?
  • What kinds of pizza contain anchovy?



What we usually do is to cluster these in several ways to find the major categories we need in the ontology; we also extract example class labels and so on. The questions also feed into the abstractions: gathering together vegetable, fish and meat as types of ingredient, for example. The CQs can also pull out qualities of these ingredients – spiciness and so on. Usually there are many versions of the same kind of question. A few examples are:


  • Find pizza with ingredient x
  • Find pizza with ingredient x, but not y
  • Find pizza without ingredient z
  • Find pizza with ingredient that has some quality or other
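As a sketch of how these abstract patterns become DL queries (using Pizza-tutorial-style names; hasTopping and hasSpiciness are assumptions about the ontology’s properties, and the “without” query relies on pizza descriptions being closed down with only restrictions):

  • Find pizza with ingredient x: Pizza and (hasTopping some AnchovyTopping)
  • Find pizza with ingredient x, but not y: Pizza and (hasTopping some AnchovyTopping) and (hasTopping only (not CaperTopping))
  • Find pizza without ingredient z: Pizza and (hasTopping only (not CaperTopping))
  • Find pizza with ingredient that has some quality or other: Pizza and (hasTopping some (hasSpiciness some Hot))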


We can view the informal, natural language competency questions as user stories from agile software engineering techniques. We use a typical template for a user story:


As a role I want to do task for a benefit


Usually, “benefit” boils down to money. We can adapt the “five whys” technique for problem solving: ask the role holder of the user story why they want the task (the return on investment) and, when applied with some skill, one can get to a root justification for the user story. Often it is money, but sometimes users ask for edge cases – this is especially true of ontology types – and some fun, intricate or complex modelling or logic can ensue for no real return. I’ve done this kind of thing a bit and found it rather useful at weeding out spurious user stories, but also at getting better justifications and thus higher priorities for a user story.


I’ll carry on in this blog with the CQ


“Find pizza with anchovy but not capers”


We could take our example CQ and do the following (in the context of the Intelligent Pizza Finder):


“As a customer I wish to be able to find pizzas that have anchovy but no capers, because I like anchovy and don’t like capers”


And abstract to


As a customer I want to find pizzas with and without certain ingredients to make it easier to choose the pizza I want.


The benefit here bottoms out in money (spending money on something that is actually desired), but goes through customer satisfaction in finding what pizza to buy with more ease. Such a user story tells me that my ontology must describe pizzas in terms of their ingredients, and therefore have a description (hierarchy) of ingredients, as well as needing to close down descriptions of pizzas (a pizza has this and that, and only this and that; that is, no others). Other CQ user stories give me other requirements:


As a vegetarian customer I want to be able to ask for vegetarian pizzas, otherwise I won’t be able to eat anything.


This suggests I need abstractions over my ingredients. User stories can imply other stories; an epic user story can be broken down into smaller (in terms of effort) user stories, and this would seem like a sensible thing to do. If CQs are thought of in terms of user stories, then one can bring in techniques of effort estimation and do some planning poker. We did this quite successfully in the Software Ontology.


In engineering, and especially agile software engineering, these CQs or user stories also give me some acceptance tests – those things by which we can test whether the product is acceptable. A competence question obviously fits neatly into this – my ontology should be competent to answer this question. Acceptance tests are run against the software, with inputs and expected outputs; a user story is not complete until its acceptance test(s) pass. For competence questions as acceptance tests, input data doesn’t really make sense, though the results of the competence question do make sense as “output” data.


If we take a natural language CQ such as


Find me pizza with anchovy, but no capers


We may get a DL query like


Pizza and (hasTopping some AnchovyTopping) and (hasTopping only not CaperTopping) 


Which I can use as a test. I was stumped for a while: without any ontology yet, and without knowing the answer, it is hard to run the test “before” and to know whether it has passed or failed. However, it may all fall out easily enough (and may already have been done in some environments); here’s the scenario:


  1. I have my query: Pizza and (hasTopping some AnchovyTopping) and (hasTopping only not CaperTopping) and no ontology; I’m setting up the ontology and a “test before” testing style.
  2. The test fails; I can pass the test by adding Pizza, hasTopping AnchovyTopping and CaperTopping to my currently empty ontology; the test passes in that the query is valid
  3. I also add pizzas to my test that I expect to be in the answer – NapolitanaPizza; again, the test fails
  4. I add NapolitanaPizza and the test is valid in that the entities are there in the ontology, but I need to add NapolitanaPizza as a subclass of Pizza for there to be any chance of a pass.
  5. I do the addition, but still the test fails; I need to re-factor to add the restrictions from NapolitanaPizza to its ingredients (TomatoTopping, CheeseTopping, OliveTopping, AnchovyTopping and CaperTopping)
  6. My test passes
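The scenario can be caricatured in code. The sketch below is a toy of my own, not any real tool: it models only the two levels of “passing” (the query’s entities existing, and the expected answer being entailed by asserted subclass links), and ignores the topping restrictions of step 5 entirely; no real DL reasoning happens.

```python
# A toy model of the "test before" scenario: an ontology is just a set of
# entity names plus asserted SubClassOf edges.

class ToyOntology:
    def __init__(self):
        self.entities = set()
        self.superclasses = {}  # class name -> set of direct superclasses

    def add_entities(self, *names):
        self.entities.update(names)

    def add_subclass(self, sub, sup):
        self.add_entities(sub, sup)
        self.superclasses.setdefault(sub, set()).add(sup)

    def is_subclass(self, sub, sup):
        """Transitive subclass check (the 'answer' to the test query)."""
        seen, todo = set(), [sub]
        while todo:
            c = todo.pop()
            if c == sup:
                return True
            if c not in seen:
                seen.add(c)
                todo.extend(self.superclasses.get(c, ()))
        return False

def query_is_valid(onto, signature):
    """A query 'parses' once all the entities it mentions exist (step 2)."""
    return signature <= onto.entities

onto = ToyOntology()
signature = {"Pizza", "hasTopping", "AnchovyTopping", "CaperTopping"}

assert not query_is_valid(onto, signature)      # step 1: empty ontology, test fails
onto.add_entities(*signature)
assert query_is_valid(onto, signature)          # step 2: the query is now valid
assert not onto.is_subclass("NapolitanaPizza", "Pizza")  # step 3: answer absent
onto.add_subclass("NapolitanaPizza", "Pizza")   # steps 4-5: add the class and the link
assert onto.is_subclass("NapolitanaPizza", "Pizza")      # step 6: the test passes
```

Even this caricature shows the two distinct things being tested at each step: the validity of the query itself and the ontology’s answer to it.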



I’m bouncing between the test itself passing a validity test and the ontology passing the test. It’s easier to see these as tests all working in a test-after scenario; it can work in a test-before scenario too, but it seems a bit clunky. This could perhaps be sorted out in a sensible environment. I could even (given the right environment) mock up parts of the ontology and supply the query with some test data.


My example query does imply other test queries, as it implies other bits of pizza ontology infrastructure. There’s an implication of a pizza hierarchy and an ingredients hierarchy. We’d want tests for these. Also, not all tests need be DL queries – we have SPARQL too (see, as an example, tests for annotations on entities below).


There are other kinds of test too:

  1. Non-logical tests – checks that all classes and properties have labels, that there is provenance information, and so on.
  2. Tests that patterns are complied with – normalisation, for instance, could include tests for trees of classes with pairwise disjoint siblings.
  3. Tests to check that classes can be traced back to some kind of top-level ontology class.
  4. Tests of up-to-dateness with imported portions of ontology (Chris Mungall and co describe continuous integration testing in GO).
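The first of these is easy to express as a query. A sketch in SPARQL (assuming labels are plain rdfs:label annotations) that reports classes without labels – so the test passes when the query returns no rows:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class WHERE {
  ?class a owl:Class .
  FILTER NOT EXISTS { ?class rdfs:label ?label }
}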


Some or all of these can probably be done, and are being done, in some set-ups. However, as I pointed out in a recent blog, the new wave of ontology environments needs to make these types of testing as easy, automatable and reproducible as they are in many software development environments.

Issues in authoring ontologies

January 31, 2014

We’ve recently had a short paper accepted to Computer Human Interaction (CHI) where we describe the outcomes from a qualitative study on what ontologists do when they author an ontology. The paper is called “Design Insights for the Next Wave Ontology Authoring Tools” and the motivation is a desire to understand how people currently go about authoring ontologies (the work was carried out by Markel Vigo in collaboration with Caroline Jay and me). Ultimately we want to understand how people judge the consequences of adding axioms, so that we can support the answering of “what if…?” questions (and we’ll do this by creating models of ontology authoring): if I add some axiom to a large system of axioms, what will be the consequences? As we work up to this, we’re doing studies where we record what people are doing as they author an ontology, logging all activities in the Protégé 4 environment, as well as capturing screen recordings and eye-tracking data.


This first study was a qualitative study where we asked 15 experienced ontology builders a series of open questions:


  • Can you describe the authoring tasks you perform?
  • How do you use tools to support you in these tasks?
  • What sort of problems do you encounter?


You can read the details of the method and analysis in the paper. We chose to do this study with experienced ontology authors as this will, in the fullness of time, inform us about how authoring takes place without any confounding factors such as not fully understanding ontologies, OWL or tools being used. Understanding issues faced by novices also needs to be done, but that’s for another time.


The 15 participants partition into three groups of five: ontology researchers, ontology developers and ontology curators. Ontology researchers are CS types who do research on ontologies and associated tools and techniques; ontology developers are CS types who work closely with domain experts to create ontologies; curators are those with deep domain knowledge who maintain what are often large ontologies.


The tools participants use are (listing those with more than one user): Protégé (14 users), the OWL API (6), OBO-Edit (4) and BioPortal (3). We didn’t choose participants by the tools they used; these are simply the tools that the people we talked to happened to use.


The analysis of the interviews revealed themes based on the major tasks undertaken by ontologists; the problems they encounter and the strategies they use to deal with these problems.


  1. Sense-making, exploration and searching: Knowing the state of the ontology, finding stuff, understanding how it’s all put together – “making sense” of an ontology.
  2. Ontology building: Efficiently adding axioms to an ontology en masse and effectively supporting what we called “definition orientated” ontology building.
  3. Reasoning: The size and complexity of ontologies hampering the use of reasoners.
  4. Debugging: Finding faults and testing ontologies.
  5. Evaluation: is it a good thing?


The paper describes in more detail the strategies people use in these five themes. For instance: speeding up reasoning by restricting the ontology to a profile like OWL EL and using a fast reasoner like ELK; chopping up the ontology to make reasoning faster; relying on user feedback for evaluation; using repositories and search tools to find ontologies and re-use parts of them; using the OWL API to programmatically add axioms; and so on (the paper gives more of the strategies people use).


There will be other issues; there will be ones we may not have encountered through our participants and there will be important issues that were in our interviews, but may not have been common enough to appear in our analysis.


There may well be tools and techniques around that address many of the issues raised here (we’ve done some of them here in Manchester). However, this sample of ontology authors don’t use them. Even if tools that address these problems exist, are known about and work, they don’t work together in a way that ontology authors either can use or want to use. So, whilst we may have many useful tools and techniques, we don’t have the delivery of these techniques right. What we really need to build the new wave of ontology authoring environments are models of the authoring process. These will inform us about how the interactions between author and computational environment will work. This qualitative study is our first step on our way to elucidating such a model. The next study is looking at how experienced ontology authors undertake some basic ontology authoring tasks.

Manchester Advanced OWL tutorial: Family History

January 28, 2014

Manchester Family History Advanced OWL Tutorial

Dates: 27th/28th February 2014

Time: 10am – 5pm

Location: Room G306a Jean McFarlane Building, University of Manchester.

The Bio-Health Informatics Group at The University of Manchester invites you to participate in a newly developed OWL tutorial that covers the more advanced language features of OWL.

The overall goal for this tutorial is to introduce the more advanced language concepts for OWL. This new tutorial builds on the Manchester Pizza Tutorial, by exploring OWL concepts in greater depth, concentrating on properties, property hierarchies, property features and individuals.


The topic of family history is used to take the tutee through various modelling issues and, in doing so, using many features of OWL 2 to build a Family History Knowledgebase (FHKB). The exercises involving the FHKB are designed to maximise inference about family history through use of an automated reasoner on an OWL knowledgebase (KB) containing many members of the Stevens family. The aim, therefore, is to enable people to learn advanced features of OWL 2 in a setting that involves both classes and individuals, while attempting to maximise the use of inference within the FHKB.
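To give a taste of the approach, a subproperty chain lets the reasoner do the family-history work for us. In the sketch below (in Manchester Syntax; the property and individual names are illustrative, not necessarily those used in the FHKB), asserting only parents and brothers lets the reasoner infer that Robert has John as an uncle, without that fact ever being asserted:

ObjectProperty: hasUncle
    SubPropertyChain: hasParent o hasBrother

Individual: Robert
    Facts: hasParent David

Individual: David
    Facts: hasBrother John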


By the end of the tutorial you will be able to:

  1. Know about the separation of entities into TBox and ABox;
  2. Use classes and individuals in modelling;
  3. Write detailed class expressions;
  4. Assert facts about individuals;
  5. Use the effects of property hierarchies, property characteristics, domain/range constraints to drive inference;
  6. Use property characteristics and subproperty chains to drive inferences about individuals;
  7. Understand and manage the consequences of the open world assumption in the TBox and ABox;
  8. Use nominals in class expressions;
  9. Appreciate some of the limits of OWL 2;
  10. Discover how many people in the Stevens family are called “James”.


The tutorial is led by Professor Robert Stevens and run by a team of experienced OWL users and researchers from Manchester.


Supplementary material for the tutorial can be found at:

The cost of the course is £250 per day.


Registration and Further Information

To register, please email Kieran O’Malley

prior to February 21st 2014. Payment options will be returned to you following reservation. For further information please visit the website at:

Generating natural language from OWL and the uncanny valley

January 28, 2014

There is this phenomenon called the uncanny valley where, in CGI, robotics and the like, as the human-like thing gets closer and closer to being human, but is not quite human, the human observer is sort of weirded out or creeped out. If the human-like thing obviously isn’t human, say a simple cartoon, all is OK, but if it is almost human then people really don’t like it.


In our recent work on natural language generation (NLG) from OWL I’ve noticed a related phenomenon: readers aren’t weirded out by the generated and somewhat clunky English, but are irritated or piqued by it in a way they wouldn’t be by, for instance, some Manchester Syntax for the same axioms, or by some hand-crafted but not perfect text. The Manchester Syntax is further away from “naturalness”, but apparently less irritating – perhaps because the expectations are lower. It is easy enough to make Manchester Syntax for some axioms a “correct” instance of what it is; it’s not so easy to make generated language “natural”, and if we only get close-ish we hit a “valley”, perhaps of irritation. So it’s not really an uncanny valley we’ve seen in our work with NLG from OWL ontologies: when we generate sentences and paragraphs from OWL, readers like the NLG form, but are irritated by English that is almost, but not quite, “natural” natural language. As we’ll see, this may be down to the nature of the errors; they’re basic errors in, for instance, the use of articles and plurals – it’s not grammar fascism.
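To see why even the “basic errors” are hard to avoid generically, consider the choice of “a” versus “an”: it depends on sound, not spelling, so the obvious letter-based heuristic (a sketch of my own, not OntoVerbal’s actual rule) has predictable failure cases:

```python
def indefinite_article(noun_phrase):
    """Naive 'a'/'an' chooser that looks only at the first letter.
    Fails on sound-based cases like 'hour' or 'university'."""
    return "an" if noun_phrase[0].lower() in "aeiou" else "a"

print(indefinite_article("heart disease"))           # a
print(indefinite_article("intracranial procedure"))  # an
# The cases a generic verbaliser has to special-case:
print(indefinite_article("hour"))        # says "a", but English wants "an hour"
print(indefinite_article("university"))  # says "an", but English wants "a university"
```

Multiply this by plurals and by the “unpacking” of complex class labels, and the source of the small, reading-disrupting errors becomes clear.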


Doing NLG from an OWL axiom is sort of obvious; an axiom is the correlate of a sentence, and we have nouns (and adjectives) in the form of classes and individuals, while properties (relationships) often do verb-like things. A class or concept is the correlate of a paragraph; it’s what we want to say on a topic. So, we can take a set of axioms for classes from SNOMED CT like


Class: Heart Disease

SubClassOf: (Disorder of Cardiovascular System) and (is-located-in some Heart Structure)


and similarly for hypertensive heart disease:


Class: Hypertensive heart disease

SubClassOf: (Heart Disease) and (is-associated-with some Hypertensive disorder)


And produce paragraphs like


A heart disease is a disorder of the cardiovascular system that is found in a heart structure.




A hypertensive heart disease is a heart disease that is associated with a hypertensive disorder.


These paragraphs (produced by OntoVerbal) are OK, but they are not “beautiful” English prose. In these cases we’ve got the articles right and so on, but it all seems a little plodding. There is some clunkiness that is a little irritating, but overall I think they’re pretty good and give a decent view on a set of axioms that can be fairly hard work to read. It is possible to produce better English, but at the cost of making a bespoke verbaliser for each ontology, especially for the “unpacking” of complex class labels to get articles and plurals correct; OntoVerbal is generic (though we did a little local fixing to help out with articles for SNOMED CT classes). However, what we did do in OntoVerbal is to try to generate coherent, structured paragraphs of text for a class’s axioms. To get this coherence (rather than a set of sentences from unordered axioms for a class) we used rhetorical structure theory (RST) and mapped various types of OWL axiom to roles within RST. Example RST roles are evidence, motivation, contrast, elaboration, result, cause, condition, antithesis, alternative, list, concession and justification. These may be implicit within a text, but are often signalled by “discourse markers”: “because” for evidence, “in order to” for enablement, “although” for antithesis, “but” for concession, “and” for list, “or” for alternative, and so on. You can see how we put all of this together in our IJACSA paper.
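The role-to-marker part of this is essentially a lookup; a minimal sketch in Python (using the marker pairs listed above; the function is my own illustration, not OntoVerbal’s code):

```python
# Discourse markers that signal RST roles, as listed above.
MARKERS = {
    "evidence": "because",
    "enablement": "in order to",
    "antithesis": "although",
    "concession": "but",
    "list": "and",
    "alternative": "or",
}

def signal(role, clause):
    """Prefix a clause with the discourse marker for its rhetorical role,
    or leave the clause unmarked if the role has no marker."""
    marker = MARKERS.get(role)
    return f"{marker} {clause}" if marker else clause

print(signal("list", "has a finding site in a head structure"))
# -> and has a finding site in a head structure
```

The real work, of course, is in deciding which RST role each OWL axiom plays and in ordering the resulting sentences into a coherent paragraph.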



In the IJACSA paper we did an evaluation to look at the acceptability of these types of rendering and whether they were faithful enough to the OWL to allow “round-tripping” – that is, whether people experienced in OWL could take the verbalised OWL and re-produce the OWL axioms in Manchester Syntax. We also looked at quality by comparing the machine verbalisations to human-generated verbalisations. The argument is that a human produces good-quality text (under the constraints given), so if the OntoVerbal text is similar to the human-written text, then it should be of reasonable quality. Below are the OWL axioms from SNOMED CT for 10 classes, together with natural language verbalisations generated by a human (not in the project) and by OntoVerbal.




OWL input (each block below gives the OWL axioms, followed by the two verbalisations – one by OntoVerbal and one by the human)




pelvic structure SubClassOf: lower trunk structure

lower trunk structure SubClassOf: structure of subregion of the trunk

pelvic structure SubClassOf: the pelvis and lower extremities and the abdomen and the pelvis and lower trunk structure

A lower trunk structure is a kind of structure of subregion of the trunk. A more specialised kind of lower trunk structure is pelvic structure. Another relevant aspect of lower trunk structure is that a pelvic structure is defined as the pelvis and lower extremities, the abdomen and the pelvis and a lower trunk structure.

A lower trunk structure is a structure of the subregion of the trunk. A pelvic structure is a subtype of a lower trunk structure.


procedure on the brain SubClassOf: intracranial procedure

intracranial procedure SubClassOf: procedure on the central nervous system

intracranial procedure SubClassOf: procedure on the head

intracranial procedure EquivalentClass procedure by site and has a procedure site some intracranial structure

An intracranial procedure is a kind of procedure on the central nervous system and procedure on the head. A more specialised kind of intracranial procedure is procedure on the brain. Additionally, an intracranial procedure is defined as a procedure by site that has a procedure site in an intracranial structure.

An intracranial procedure is a procedure on the central nervous system and a procedure on the head. Any procedure by site in which the procedure site is an intracranial structure is also an intracranial procedure. A procedure on the brain is a subtype of intracranial procedure.


abdominal vascular structure SubClassOf: abdominal and pelvic vascular structure

abdominal and pelvic vascular structure SubClassOf: vascular structure of the trunk

abdominal vascular structure SubClassOf: abdominal structure and abdominal and pelvic vascular structure

An abdominal and pelvic vascular structure is a kind of vascular structure of the trunk. A more specialised kind of abdominal and pelvic vascular structure is abdominal vascular structure. Another relevant aspect of abdominal and pelvic vascular structure is that an abdominal vascular structure is defined as an abdominal structure and an abdominal and pelvic vascular structure.

An abdominal and pelvic vascular structure is a vascular structure of the trunk. An abdominal vascular structure is a subtype of an abdominal and pelvic vascular structure.


chronic disease of the genitourinary system SubClassOf: chronic disease

chronic disease of the genitourinary system SubClassOf: disorder of the genitourinary system

chronic hypertensive uraemia SubClassOf: chronic disease of the genitourinary system

chronic disease of the genitourinary system EquivalentClass: chronic disease and disorder of the genitourinary system and has a finding site some structure of the genitourinary system

Chronic disease of the genitourinary system is a kind of chronic disease and disorder of the genitourinary system. A more specialised kind of chronic disease of the genitourinary system is chronic hypertensive uraemia. Additionally, chronic disease of the genitourinary system is defined as chronic disease that is a disorder of the genitourinary system, and has a finding site in a structure of the genitourinary system.

A chronic disease of the genitourinary system is a chronic disease and a disorder of the genitourinary system. Any chronic disease which is also a disorder of the genitourinary system and is found in the structure of the genitourinary system is also a chronic disease of the genitourinary system. A chronic hypertensive uraemia is a subtype of a chronic disease of the genitourinary system.


finding of the head and the neck region SubClassOf: finding of the body region

head finding SubClassOf: finding of the head and the neck region

finding of the head and the neck region EquivalentClass: finding of the body region and has a finding site some head and neck structure

head finding EquivalentClass: finding of the head and the neck region and has a finding site some head structure

A finding of the head and the neck region is a kind of finding of the body region. A more specialised kind of finding of the head and the neck region is head finding. Additionally, A finding of the head and the neck region is defined as a finding of the body region that has a finding site in a head and neck structure. Another relevant aspect of finding of the head and the neck region is that a head finding is defined as a finding of the head and the neck region that has a finding site in a head structure.

A finding of the head and the neck region is a finding of the body region. Any finding of the body which is found in a head and neck structure is also a finding of the head and neck region. A head finding is a subtype of the finding of the head and the neck region.


nephrosclerosis SubClassOf: degenerative disorder

degenerative disorder SubClassOf: disease

arteriosclerotic vascular disease SubClassOf: degenerative disorder

degenerative disorder EquivalentClass: disease and has an associated morphology some degenerative abnormality

Degenerative disorder is a kind of disease. More specialised kinds of degenerative disorder are nephrosclerosis and arteriosclerotic vascular disease. Additionally, degenerative disorder is defined as disease that has an associated morphology in a degenerative abnormality.

A degenerative disorder is a disease. Any disease which has an associated morphology of degenerative abnormality is also a degenerative disease. Nephrosclerosis and arteriosclerotic vascular disease are subtypes of degenerative disease.


kidney graft material SubClassOf: urinary tract material

kidney graft material SubClassOf: solid organ graft material

kidney graft material SubClassOf: urinary tract material and solid organ graft material

transplant of the kidney EquivalentClass: kidney operation and solid organ transplant and renal replacement and has a method some surgical transplantation action and has a direct substance some kidney graft material and has an indirect procedure site some kidney structure

A kidney graft material is a kind of urinary tract material and solid organ graft material. Another relevant aspect of kidney graft material is that a transplant of the kidney is defined as a kidney operation that is a solid organ transplant, and is a renal replacement, and has a method in a surgical transplantation action, and has a direct substance in a kidney graft material, and has an indirect procedure site in a kidney structure.

Kidney graft material is a urinary tract material and a solid organ graft material. A kidney operation, solid organ transplant and renal replacement which has a method of surgical transplantation action, a direct substance of kidney graft material and an indirect procedure site of kidney structure is a type of transplant of the kidney.


graft SubClassOf: biological surgical material

tissue graft material SubClassOf: graft

tissue graft material SubClassOf: graft and body tissue surgical material

A graft is a kind of biological surgical material. A more specialised kind of graft is tissue graft material. Another relevant aspect of graft is that a tissue graft material is defined as a graft and a body tissue surgical material.

A graft is a biological surgical material. Tissue graft material is a subtype of graft as well as a body tissue surgical material.


benign essential hypertension complicating and/or reason for care during pregnancy SubClassOf: essential hypertension complicating and/or reason for care during pregnancy

essential hypertension complicating and/or reason for care during pregnancy SubClassOf: essential hypertension in the obstetric context

essential hypertension complicating and/or reason for care during pregnancy SubClassOf: pre-existing hypertension in the obstetric context

essential hypertension complicating and/or reason for care during pregnancy SubClassOf: essential hypertension in the obstetric context and pre-existing hypertension in the obstetric context

benign essential hypertension complicating and/or reason for care during pregnancy SubClassOf: benign essential hypertension in the obstetric context and essential hypertension complicating and/or reason for care during pregnancy

Essential hypertension complicating and/or reason for care during pregnancy is a kind of essential hypertension in the obstetric context and pre-existing hypertension in the obstetric context. A more specialised kind of essential hypertension complicating and/or reason for care during pregnancy is benign essential hypertension complicating and/or reason for care during pregnancy. Another relevant aspect of essential hypertension complicating and/or reason for care during pregnancy is that benign essential hypertension complicating and/or reason for care during pregnancy is defined as benign essential hypertension in the obstetric context and essential hypertension complicating and/or reason for care during pregnancy.

An essential hypertension complicating and/or reason for care during pregnancy is an essential hypertension in the obstetric context and a pre-existing hypertension in the obstetric context. A benign essential hypertension complicating and/or reason for care during pregnancy is a subtype of essential hypertension complicating and/or reason for during pregnancy.


procedure on artery of the abdomen SubClassOf: procedure on the abdomen

procedure on artery of the abdomen SubClassOf: procedure on artery of the thorax and the abdomen

abdominal artery implantation SubClassOf: procedure on artery of the abdomen

procedure on artery of the abdomen EquivalentClass: procedure on artery and has a procedure site some structure of artery of the abdomen

A procedure on artery of the abdomen is a kind of procedure on the abdomen and procedure on artery of the thorax and the abdomen. A more specialised kind of procedure on artery of the abdomen is abdominal artery implantation. Additionally, a procedure on artery of the abdomen is defined as a procedure on artery that has a procedure site in a structure of artery of the abdomen.

A procedure on artery of the abdomen is a procedure of the abdomen and a procedure on artery of the thorax and the abdomen. Any procedure on artery which has a procedure site of structure of artery of the abdomen is also a procedure on artery of the abdomen. An abdominal artery implantation is a subtype of procedure on artery of the abdomen.




You can see that the verbalisations are fairly similar. Given the task of being faithful to the OWL and enabling “round-tripping”, very similar texts are produced by the human and by OntoVerbal; the machine and human verbalisations are of a very similar quality. Evaluators could use both to round-trip to the OWL axioms, but did better with the OntoVerbal-generated verbalisations. This is, we think, at least in part due to OntoVerbal being more “complete” in its verbalisation. The human verbalisation is smoother, but presumably not as smooth as a description written by a human domain expert could be (though do look at James Malone’s blog on this topic). However, I suspect that such smooth, natural language texts would be much harder to match to a set of OWL axioms.


Where does this leave my uncanny valley, or valley of irritation, for generated natural language? Domain experts writing definitions without the constraint of being faithful to the ontology’s axioms will probably avoid the valley of irritation; if there’s too much “ontologising” in the verbalisation there will be irritation (this came up in another paper on an earlier verbalisation); if there’s clunky English there is irritation. In a generic tool like OntoVerbal this is probably inevitable, and I suspect it’s irritating because these are minor English errors, which always irritate as they disrupt reading. However, the use of RST does seem to give OntoVerbal’s verbalisations a good level of coherence and fluency, even if they’re not perfectly fluent. They are also cheap to produce. As they are close to the OWL, they give an alternative view on those axioms – one thing I’d like to find out is whether a verbalised view is any better (or worse) at allowing error spotting, and whether it is the verbalisation or just the alternative view that does the job. One could also provide a variety of verbalisations – hand-crafted, luxury ones; ones close to the OWL; and ones with and without the often impenetrable words used in ontologies (especially for relationships).

My first publication discovered

December 24, 2013

I’ve been poking around in the long-tail of my publications as gathered by Google Scholar. Within this I found the following little publication:


Five glycyl tRNA genes within the noc gene complex of Drosophila melanogaster.

YB Meng, RD Stevens, W Chia, S McGill, M Ashburner

Nucleic acids research 16 (14), 7189-7189 1988


And this must be me. I did my undergraduate biochemistry project with Bill Chia and I sequenced, by hand, some tRNA genes in Drosophila. This is the first I’ve known of this publication and it has made me happy – that my sausage-like fingers clumsily squirting stuff around willy-nilly in Bill’s lab actually earnt me a name on the paper; it is a lovely thing to find.


This should be my opportunity to drone on about pouring polyacrylamide gels, doing dideoxy reactions, running gels, exposing autoradiograms, reading gels, etc. etc., but that’s enough of that. I should also perhaps say that using the lab’s BBC microcomputer to run a programme overnight to find tRNA genes was the start of my interest in bioinformatics – but it wasn’t. It was, however, a continuation of an interest in what was then known as molecular biology; ultimately bioinformatics has been a way of carrying on that interest.