Archive for January, 2014

Issues in authoring ontologies

January 31, 2014

We’ve recently had a short paper accepted to Computer Human Interaction (CHI) where we describe the outcomes from a qualitative study on what ontologists do when they author an ontology. The paper is called “Design Insights for the Next Wave Ontology Authoring Tools” and the motivation is a desire to understand how people currently go about authoring ontologies (the work was carried out by Markel Vigo in collaboration with Caroline Jay and me). Ultimately we want to understand how people judge the consequences of adding axioms so that we can support the answering of “what if…?” questions (and we’ll do this by creating models of ontology authoring). That is, if I add some axiom to a large system of axioms, what will the consequences be? As we work up to this, we’re doing studies where we record what people are doing as they author an ontology, logging all activities in the Protégé 4 environment, as well as capturing screen recordings and eye-tracking data.
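
To make the “what if…?” question concrete, here is a minimal sketch in Python. It assumes a toy ontology made only of SubClassOf axioms between named classes, and computes which subsumptions become entailed once a single axiom is added. The function names and the class names are hypothetical and chosen purely for illustration; a real ontology would need a proper OWL reasoner rather than a transitive closure.

```python
from itertools import product


def subsumption_closure(axioms):
    """Return every (sub, super) pair entailed by transitivity of SubClassOf."""
    closure = set(axioms)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def what_if(axioms, new_axiom):
    """Entailments that hold only once new_axiom has been added."""
    before = subsumption_closure(axioms)
    after = subsumption_closure(set(axioms) | {new_axiom})
    return after - before


# Hypothetical class names, used purely for illustration.
ontology = {
    ("Nephrosclerosis", "DegenerativeDisorder"),
    ("Disease", "ClinicalFinding"),
}
consequences = what_if(ontology, ("DegenerativeDisorder", "Disease"))
for sub, sup in sorted(consequences):
    print(f"{sub} SubClassOf {sup}")
```

Even in this tiny case, one added axiom links two previously unconnected chains and so yields several new subsumptions besides the axiom itself, which is the kind of consequence an author has to anticipate.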

 

This first study was a qualitative study where we asked 15 experienced ontology builders a series of open questions:

 

  • Can you describe the authoring tasks you perform?
  • How do you use tools to support you in these tasks?
  • What sort of problems do you encounter?

 

You can read the details of the method and analysis in the paper. We chose to do this study with experienced ontology authors as this will, in the fullness of time, inform us about how authoring takes place without any confounding factors such as not fully understanding ontologies, OWL or tools being used. Understanding issues faced by novices also needs to be done, but that’s for another time.

 

The 15 participants partition into three groups of five: ontology researchers, ontology developers and ontology curators. Researchers are CS types who do research on ontologies and associated tools and techniques; developers are CS types who work closely with domain experts to create ontologies; curators are those who have deep domain knowledge and maintain what are often large ontologies.

 

The tools participants use are (here I list those with numbers of users above one): Protégé (14 users), OWL API (6), OBO-Edit (4) and Bioportal (3). We didn’t choose participants by the tools they used; these are the tools that the people we talked to happened to use.

 

The analysis of the interviews revealed themes based on the major tasks undertaken by ontologists, the problems they encounter and the strategies they use to deal with these problems.

 

  1. Sense-making, exploration and searching: Knowing the state of the ontology, finding stuff, understanding how it’s all put together – “making sense” of an ontology.
  2. Ontology building: Efficiently adding axioms to an ontology en masse and effectively supporting what we called “definition orientated” ontology building.
  3. Reasoning: Size and complexity of ontologies hampering use of ontologies.
  4. Debugging: Finding faults and testing ontologies.
  5. Evaluation: is it a good thing?

 

The paper describes in more detail the strategies people use in these five themes. For instance, speeding up reasoning by restricting the ontology to a profile like OWL EL and using a fast reasoner like ELK; chopping up the ontology to make reasoning faster; relying on user feedback for evaluation; using repositories and search tools to find ontologies and re-use parts of them; using the OWL API to programmatically add axioms; and so on.
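
As an illustration of adding axioms programmatically rather than one at a time through an editor, the following is a minimal Python sketch. It does not use the OWL API (a Java library); it simply renders rows of data as Manchester Syntax-style class frames. The function `class_frame`, the row layout and the class and property names are all assumptions made for the example, and a complete Manchester Syntax document would also need prefix and entity declarations.

```python
def class_frame(name, parents, restrictions=()):
    """Render one class as a Manchester Syntax-style frame.

    restrictions is a sequence of (property, filler) pairs, each rendered
    as an existential restriction: (property some filler).
    """
    parts = list(parents)
    parts += [f"({prop} some {filler})" for prop, filler in restrictions]
    lines = [f"Class: {name}"]
    if parts:
        lines.append("    SubClassOf: " + " and ".join(parts))
    return "\n".join(lines)


# Illustrative rows: (class name, named superclasses, existential restrictions).
rows = [
    ("HeartDisease", ["DisorderOfCardiovascularSystem"],
     [("isLocatedIn", "HeartStructure")]),
    ("HypertensiveHeartDisease", ["HeartDisease"],
     [("isAssociatedWith", "HypertensiveDisorder")]),
]
frames = [class_frame(*row) for row in rows]
print("\n\n".join(frames))
```

The same pattern scales to hundreds of rows read from a spreadsheet, which is where authoring en masse pays off.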

 

There will be other issues; there will be ones we may not have encountered through our participants and there will be important issues that were in our interviews, but may not have been common enough to appear in our analysis.

 

There may well be tools and techniques around that address many of the issues raised here (we’ve done some of them here in Manchester). However, this sample of ontology authors don’t use them. Even if tools that address these problems exist, are known about and work, they don’t work together in a way that ontology authors either can use or want to use. So, whilst we may have many useful tools and techniques, we don’t have the delivery of these techniques right. What we really need to build the new wave of ontology authoring environments are models of the authoring process. These will inform us about how the interactions between author and computational environment will work. This qualitative study is our first step on our way to elucidating such a model. The next study is looking at how experienced ontology authors undertake some basic ontology authoring tasks.

Manchester Advanced OWL tutorial: Family History

January 28, 2014

Manchester Family History Advanced OWL Tutorial

Dates: 27th/28th February 2014

Time: 10am – 5pm

Location: Room G306a Jean McFarlane Building, University of Manchester.

The Bio-Health Informatics Group at The University of Manchester invites you to participate in a newly developed OWL ontology tutorial that covers more advanced language concepts for OWL.

The overall goal for this tutorial is to introduce the more advanced language concepts for OWL. This new tutorial builds on the Manchester Pizza Tutorial, by exploring OWL concepts in greater depth, concentrating on properties, property hierarchies, property features and individuals.

 

The topic of family history is used to take the tutee through various modelling issues and, in doing so, uses many features of OWL 2 to build a Family History Knowledgebase (FHKB). The exercises involving the FHKB are designed to maximise inference about family history through use of an automated reasoner on an OWL knowledgebase (KB) containing many members of the Stevens family. The aim, therefore, is to enable people to learn advanced features of OWL 2 in a setting that involves both classes and individuals, while attempting to maximise the use of inference within the FHKB.

 

By the end of the tutorial you will be able to:

  1. Know about the separation of entities into TBox and ABox;
  2. Use classes and individuals in modelling;
  3. Write detailed class expressions;
  4. Assert facts about individuals;
  5. Use the effects of property hierarchies, property characteristics, domain/range constraints to drive inference;
  6. Use property characteristics and subproperty chains on inferences about individuals;
  7. Understand and manage the consequences of the open world assumption in the TBox and ABox;
  8. Use nominals in class expressions;
  9. Appreciate some of the limits of OWL 2;
  10. Discover how many people in the Stevens family are called “James”.
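
To give a flavour of outcome 6, here is a minimal Python sketch of what a subproperty chain licenses: wherever one property links x to y and a second links y to z, a third property is inferred between x and z. The property names `hasParent` and `hasGrandparent` and the individuals are hypothetical and are not taken from the FHKB; in the tutorial itself this inference is done by an OWL reasoner, not by hand-written code.

```python
def apply_chain(facts, first, second, implied):
    """Infer implied(x, z) wherever first(x, y) and second(y, z) both hold."""
    inferred = set()
    for prop1, x, y in facts:
        if prop1 != first:
            continue
        for prop2, y2, z in facts:
            if prop2 == second and y2 == y:
                inferred.add((implied, x, z))
    return inferred


# Hypothetical individuals; not taken from the FHKB.
facts = {
    ("hasParent", "Child_A", "Parent_B"),
    ("hasParent", "Parent_B", "Grandparent_C"),
}
grandparents = apply_chain(facts, "hasParent", "hasParent", "hasGrandparent")
print(grandparents)
```

Only the two parent facts are asserted; the grandparent fact is inferred, which is the sense in which the tutorial aims to maximise inference.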

 

The tutorial is led by Professor Robert Stevens and run by a team of experienced OWL users and researchers from Manchester.

 

Supplementary material for the tutorial can be found at: http://owl.cs.manchester.ac.uk/publications/talks-and-tutorials/fhkbtutorial/

The cost of the course is £250 per day.

 

Registration and Further Information

To register, please email Kieran O’Malley

(kieran.omalley@manchester.ac.uk) prior to February 21st 2014. Payment options will be returned to you following reservation. For further information please visit the website at:

http://owl.cs.manchester.ac.uk/

Generating natural language from OWL and the uncanny valley

January 28, 2014

There is this phenomenon called the uncanny valley where, in situations like CGI, robotics etc., as the human-like thing gets closer and closer to being human, but not quite human, the human observer is sort of weirded or creeped out. If the human-like thing obviously isn’t human, say a simple cartoon, all is OK, but if it is almost human then people really don’t like it.

 

In our recent work on natural language generation (NLG) from OWL I’ve noticed a related phenomenon; readers aren’t weirded out by the generated and somewhat clunky English, but are irritated or piqued by it in a way they wouldn’t be by, for instance, some Manchester Syntax for the same axioms or some hand-crafted, but not perfect, text. The Manchester Syntax is further away from “naturalness”, but apparently less irritating – perhaps because expectations are lower. It is easy enough to make Manchester Syntax for some axioms a “correct” instance of what it is (Manchester Syntax); it’s not so easy to make natural language “natural”, and if we only get close-ish, we’ve met a “valley”, perhaps of irritation. It’s not really an uncanny valley that we’ve seen in our work with natural language generation from OWL ontologies: when we generate sentences and paragraphs from OWL, readers like the NLG form, but are irritated by English that is almost English, but not quite “natural” natural language. As we’ll see, this may be down to the nature of the errors; they’re basic errors in, for instance, the use of articles and plurals – so the complaints are not grammar fascism.

 

Doing NLG from an OWL axiom is sort of obvious; an axiom is a correlate of a sentence and we have nouns (and adjectives) in the form of classes and individuals, and properties (relationships) often do verb-like things. A class or concept is the correlate of a paragraph; it’s what we want to say on a topic. So, we can take a set of axioms for classes from SNOMED CT like

 

Class: Heart Disease

SubClassOf: (Disorder of Cardiovascular System) and (is-located-in some Heart Structure)

 

and similarly for hypertensive heart disease:

 

Class: Hypertensive heart disease

SubClassOf: (Heart Disease) and (is-associated-with some Hypertensive disorder)

 

And produce paragraphs like

 

A heart disease is a disorder of the cardiovascular system that is found in a heart structure.

 

and

 

A hypertensive heart disease is a heart disease that is associated with a hypertensive disorder.
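
The mapping from axiom to sentence can be sketched with a simple template in Python. This is only an illustration under stated assumptions and is not how OntoVerbal is implemented: the function names are hypothetical, the class labels are passed in already lower-cased and with “the” inserted by hand, the article choice is a crude first-letter heuristic, and the phrase for each property is supplied in a hand-written lookup table.

```python
def with_article(noun_phrase):
    """Prefix 'a' or 'an', chosen from the first letter (a crude heuristic)."""
    article = "an" if noun_phrase[0].lower() in "aeiou" else "a"
    return f"{article} {noun_phrase}"


def verbalise_subclass(name, parent, restrictions, property_phrases):
    """Verbalise 'name SubClassOf: parent and (prop some filler) ...'."""
    subject = with_article(name)
    sentence = f"{subject[0].upper()}{subject[1:]} is {with_article(parent)}"
    clauses = [
        f"{property_phrases[prop]} {with_article(filler)}"
        for prop, filler in restrictions
    ]
    if clauses:
        sentence += " that " + " and ".join(clauses)
    return sentence + "."


# Hand-written phrases for the two properties used in the axioms above.
property_phrases = {
    "is-located-in": "is found in",
    "is-associated-with": "is associated with",
}
heart = verbalise_subclass(
    "heart disease",
    "disorder of the cardiovascular system",
    [("is-located-in", "heart structure")],
    property_phrases,
)
hypertensive = verbalise_subclass(
    "hypertensive heart disease",
    "heart disease",
    [("is-associated-with", "hypertensive disorder")],
    property_phrases,
)
print(heart)
print(hypertensive)
```

The fragile parts are exactly the ones that cause irritation in practice: choosing articles and plurals for complex class labels, which a first-letter rule will often get wrong.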

 

The two example paragraphs (produced by OntoVerbal) are OK, but are not “beautiful” English prose. In these cases, we’ve got the articles right etc., but it all seems a little plodding. There is some clunkiness that is a little irritating, but overall I think they’re pretty good and give a decent view on a set of axioms that can be fairly hard work to read. It is possible to produce better English, but at the cost of making a bespoke verbaliser for each ontology, especially for the “unpacking” of complex class labels to get articles and plurals correct; OntoVerbal is generic (though we did a little local fixing to help out with articles for SNOMED classes). However, what we did do in OntoVerbal is to try and generate coherent, structured paragraphs of text for a class’s axioms. To get this coherence (rather than a set of sentences from unordered axioms for a class) we used rhetorical structure theory (RST) and mapped various types of OWL axioms to roles within RST. Example RST roles are evidence, motivation, contrast, elaboration, result, cause, condition, antithesis, alternative, list, concession and justification. These may be implicit within a text, but are often signalled by “discourse markers”, such as “because” for evidence, “in order to” for enablement, “although” for antithesis, “but” for concession, “and” for list, “or” for alternative, etc. You can see how we put all of this together in our IJACSA paper.
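
The role of discourse markers can be sketched in a few lines of Python. The table of markers is taken from the list above; everything else (the function names, and the choice to group a class’s superclasses under the “list” relation) is an assumption for illustration and is not the mapping described in the IJACSA paper.

```python
# Discourse markers for RST relations, as listed in the text above.
DISCOURSE_MARKERS = {
    "evidence": "because",
    "enablement": "in order to",
    "antithesis": "although",
    "concession": "but",
    "list": "and",
    "alternative": "or",
}


def realise(relation, nucleus, satellite):
    """Join two clauses with the marker that signals their RST relation."""
    return f"{nucleus} {DISCOURSE_MARKERS[relation]} {satellite}"


def list_phrase(items):
    """Realise a 'list' relation over one or more noun phrases."""
    if not items:
        raise ValueError("list_phrase needs at least one item")
    if len(items) == 1:
        return items[0]
    marker = DISCOURSE_MARKERS["list"]
    return ", ".join(items[:-1]) + f" {marker} " + items[-1]


# Superclasses of one class, grouped into a single sentence rather than
# emitted as one sentence per SubClassOf axiom.
parents = ["procedure on the central nervous system", "procedure on the head"]
sentence = f"An intracranial procedure is a kind of {list_phrase(parents)}."
print(sentence)
```

Grouping related axioms under one relation, instead of verbalising each axiom separately, is what turns a bag of sentences into something closer to a paragraph.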

 

 

In the IJACSA paper we did an evaluation to look at the acceptability of these types of rendering and whether they were faithful enough to the OWL to allow “round-tripping” – that people experienced in OWL could take the verbalised OWL and reproduce the OWL axioms in Manchester Syntax. We also looked at the quality by comparing the machine verbalisations to human-generated verbalisations. The argument is that a human produces good quality text (under the constraints given), so if the OntoVerbal text is similar to the human-written text, then it should be of reasonable quality. Below are the OWL axioms from SNOMED for 10 classes. For each class there are natural language verbalisations generated by OntoVerbal and by a human ontologist (not in the project).

 

 

Number 1

OWL input:
pelvic structure SubClassOf: lower trunk structure
lower trunk structure SubClassOf: structure of subregion of the trunk
pelvic structure SubClassOf: the pelvis and lower extremities and the abdomen and the pelvis and lower trunk structure

OntoVerbal: A lower trunk structure is a kind of structure of subregion of the trunk. A more specialised kind of lower trunk structure is pelvic structure. Another relevant aspect of lower trunk structure is that a pelvic structure is defined as the pelvis and lower extremities, the abdomen and the pelvis and a lower trunk structure.

Ontologist: A lower trunk structure is a structure of the subregion of the trunk. A pelvic structure is a subtype of a lower trunk structure.

Number 2

OWL input:
procedure on the brain SubClassOf: intracranial procedure
intracranial procedure SubClassOf: procedure on the central nervous system
intracranial procedure SubClassOf: procedure on the head
intracranial procedure EquivalentClass procedure by site and has a procedure site some intracranial structure

OntoVerbal: An intracranial procedure is a kind of procedure on the central nervous system and procedure on the head. A more specialised kind of intracranial procedure is procedure on the brain. Additionally, an intracranial procedure is defined as a procedure by site that has a procedure site in an intracranial structure.

Ontologist: An intracranial procedure is a procedure on the central nervous system and a procedure on the head. Any procedure by site in which the procedure site is an intracranial structure is also an intracranial procedure. A procedure on the brain is a subtype of intracranial procedure.

Number 3

OWL input:
abdominal vascular structure SubClassOf: abdominal and pelvic vascular structure
abdominal and pelvic vascular structure SubClassOf: vascular structure of the trunk
abdominal vascular structure SubClassOf: abdominal structure and abdominal and pelvic vascular structure

OntoVerbal: An abdominal and pelvic vascular structure is a kind of vascular structure of the trunk. A more specialised kind of abdominal and pelvic vascular structure is abdominal vascular structure. Another relevant aspect of abdominal and pelvic vascular structure is that an abdominal vascular structure is defined as an abdominal structure and an abdominal and pelvic vascular structure.

Ontologist: An abdominal and pelvic vascular structure is a vascular structure of the trunk. An abdominal vascular structure is a subtype of an abdominal and pelvic vascular structure.

Number 4

OWL input:
chronic disease of the genitourinary system SubClassOf: chronic disease
chronic disease of the genitourinary system SubClassOf: disorder of the genitourinary system
chronic hypertensive uraemia SubClassOf: chronic disease of the genitourinary system
chronic disease of the genitourinary system EquivalentClass: chronic disease and disorder of the genitourinary system and has a finding site some structure of the genitourinary system

OntoVerbal: Chronic disease of the genitourinary system is a kind of chronic disease and disorder of the genitourinary system. A more specialised kind of chronic disease of the genitourinary system is chronic hypertensive uraemia. Additionally, chronic disease of the genitourinary system is defined as chronic disease that is a disorder of the genitourinary system, and has a finding site in a structure of the genitourinary system.

Ontologist: A chronic disease of the genitourinary system is a chronic disease and a disorder of the genitourinary system. Any chronic disease which is also a disorder of the genitourinary system and is found in the structure of the genitourinary system is also a chronic disease of the genitourinary system. A chronic hypertensive uraemia is a subtype of a chronic disease of the genitourinary system.

Number 5

OWL input:
finding of the head and the neck region SubClassOf: finding of the body region
head finding SubClassOf: finding of the head and the neck region
finding of the head and the neck region EquivalentClass: finding of the body region and has a finding site some head and neck structure
head finding EquivalentClass: finding of the head and the neck region and has a finding site some head structure

OntoVerbal: A finding of the head and the neck region is a kind of finding of the body region. A more specialised kind of finding of the head and the neck region is head finding. Additionally, A finding of the head and the neck region is defined as a finding of the body region that has a finding site in a head and neck structure. Another relevant aspect of finding of the head and the neck region is that a head finding is defined as a finding of the head and the neck region that has a finding site in a head structure.

Ontologist: A finding of the head and the neck region is a finding of the body region. Any finding of the body which is found in a head and neck structure is also a finding of the head and neck region. A head finding is a subtype of the finding of the head and the neck region.

Number 6

OWL input:
nephrosclerosis SubClassOf: degenerative disorder
degenerative disorder SubClassOf: disease
arteriosclerotic vascular disease SubClassOf: degenerative disorder
degenerative disorder EquivalentClass: disease and has an associated morphology some degenerative abnormality

OntoVerbal: Degenerative disorder is a kind of disease. More specialised kinds of degenerative disorder are nephrosclerosis and arteriosclerotic vascular disease. Additionally, degenerative disorder is defined as disease that has an associated morphology in a degenerative abnormality.

Ontologist: A degenerative disorder is a disease. Any disease which has an associated morphology of degenerative abnormality is also a degenerative disease. Nephrosclerosis and arteriosclerotic vascular disease are subtypes of degenerative disease.

Number 7

OWL input:
kidney graft material SubClassOf: urinary tract material
kidney graft material SubClassOf: solid organ graft material
kidney graft material SubClassOf: urinary tract material and solid organ graft material
transplant of the kidney EquivalentClass: kidney operation and solid organ transplant and renal replacement and has a method some surgical transplantation action and has a direct substance some kidney graft material and has an indirect procedure site some kidney structure

OntoVerbal: A kidney graft material is a kind of urinary tract material and solid organ graft material. Another relevant aspect of kidney graft material is that a transplant of the kidney is defined as a kidney operation that is a solid organ transplant, and is a renal replacement, and has a method in a surgical transplantation action, and has a direct substance in a kidney graft material, and has an indirect procedure site in a kidney structure.

Ontologist: Kidney graft material is a urinary tract material and a solid organ graft material. A kidney operation, solid organ transplant and renal replacement which has a method of surgical transplantation action, a direct substance of kidney graft material and an indirect procedure site of kidney structure is a type of transplant of the kidney.

Number 8

OWL input:
graft SubClassOf: biological surgical material
tissue graft material SubClassOf: graft
tissue graft material SubClassOf: graft and body tissue surgical material

OntoVerbal: A graft is a kind of biological surgical material. A more specialised kind of graft is tissue graft material. Another relevant aspect of graft is that a tissue graft material is defined as a graft and a body tissue surgical material.

Ontologist: A graft is a biological surgical material. Tissue graft material is a subtype of graft as well as a body tissue surgical material.

Number 9

OWL input:
benign essential hypertension complicating and/or reason for care during pregnancy SubClassOf: essential hypertension complicating and/or reason for care during pregnancy
essential hypertension complicating and/or reason for care during pregnancy SubClassOf: essential hypertension in the obstetric context
essential hypertension complicating and/or reason for care during pregnancy SubClassOf: pre-existing hypertension in the obstetric context
essential hypertension complicating and/or reason for care during pregnancy SubClassOf: essential hypertension in the obstetric context and pre-existing hypertension in the obstetric context
benign essential hypertension complicating and/or reason for care during pregnancy SubClassOf: benign essential hypertension in the obstetric context and essential hypertension complicating and/or reason for care during pregnancy

OntoVerbal: Essential hypertension complicating and/or reason for care during pregnancy is a kind of essential hypertension in the obstetric context and pre-existing hypertension in the obstetric context. A more specialised kind of essential hypertension complicating and/or reason for care during pregnancy is benign essential hypertension complicating and/or reason for care during pregnancy. Another relevant aspect of essential hypertension complicating and/or reason for care during pregnancy is that benign essential hypertension complicating and/or reason for care during pregnancy is defined as benign essential hypertension in the obstetric context and essential hypertension complicating and/or reason for care during pregnancy.

Ontologist: An essential hypertension complicating and/or reason for care during pregnancy is an essential hypertension in the obstetric context and a pre-existing hypertension in the obstetric context. A benign essential hypertension complicating and/or reason for care during pregnancy is a subtype of essential hypertension complicating and/or reason for during pregnancy.

Number 10

OWL input:
procedure on artery of the abdomen SubClassOf: procedure on the abdomen
procedure on artery of the abdomen SubClassOf: procedure on artery of the thorax and the abdomen
abdominal artery implantation SubClassOf: procedure on artery of the abdomen
procedure on artery of the abdomen EquivalentClass: procedure on artery and has a procedure site some structure of artery of the abdomen

OntoVerbal: A procedure on artery of the abdomen is a kind of procedure on the abdomen and procedure on artery of the thorax and the abdomen. A more specialised kind of procedure on artery of the abdomen is abdominal artery implantation. Additionally, a procedure on artery of the abdomen is defined as a procedure on artery that has a procedure site in a structure of artery of the abdomen.

Ontologist: A procedure on artery of the abdomen is a procedure of the abdomen and a procedure on artery of the thorax and the abdomen. Any procedure on artery which has a procedure site of structure of artery of the abdomen is also a procedure on artery of the abdomen. An abdominal artery implantation is a subtype of procedure on artery of the abdomen.

 

 

 

You can see that the verbalisations are fairly similar. Given the task of being faithful to the OWL and enabling “round-tripping”, very similar texts are produced by a human and OntoVerbal; the machine and human verbalisations are of a very similar quality. Evaluators could use both to round-trip to the OWL axioms, but did better with the OntoVerbal-generated verbalisations. This is, we think, at least in part due to OntoVerbal being more “complete” in its verbalisation. The human verbalisation is smoother, but presumably not as smooth as a description written by a human domain expert could be (though do look at James Malone’s blog on this topic). However, I suspect that such smooth, natural language texts would be much harder to match to a set of OWL axioms.
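
One rough way to put a number on “fairly similar” is a word-level sequence comparison, sketched below in Python with the standard library’s `difflib`. This is an assumption made for illustration and is not the measure used in the IJACSA evaluation; the two texts are the verbalisations for class number 8 above, and the function name `word_similarity` is hypothetical.

```python
from difflib import SequenceMatcher


def word_similarity(text_a, text_b):
    """Ratio in [0, 1] based on matching runs of words in the two texts."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


onto_verbal = (
    "A graft is a kind of biological surgical material. "
    "A more specialised kind of graft is tissue graft material. "
    "Another relevant aspect of graft is that a tissue graft material "
    "is defined as a graft and a body tissue surgical material."
)
ontologist = (
    "A graft is a biological surgical material. "
    "Tissue graft material is a subtype of graft as well as "
    "a body tissue surgical material."
)
score = word_similarity(onto_verbal, ontologist)
print(f"word-level similarity: {score:.2f}")
```

A score like this says nothing about faithfulness to the axioms, which is why the round-tripping task matters as a separate check.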

 

Where does this leave my uncanny valley or valley of irritation for generated natural language? Domain expert humans writing definitions without the constraint of being faithful to the ontology’s axioms will probably avoid the valley of irritation; if there’s too much “ontologising” in the verbalisation there will be irritation (this came up in another paper on an earlier verbalisation); if there’s clunky English there is irritation. In a generic tool like OntoVerbal this is probably inevitable, and I suspect it irritates because these are minor English errors of the kind that always disrupt reading. However, the use of RST does seem to give OntoVerbal’s NLG verbalisations a good level of coherence and fluency, even if they’re not perfectly fluent. They are also cheap to produce. As they are close to the OWL they give an alternative view on those axioms; one thing I’d like to find out is whether a verbalised view is any better (or worse) at allowing error spotting (and whether it is the verbalisation or just having an alternative view that does the job). One could also provide a variety of verbalisations: hand-crafted, luxury ones; ones close to the OWL; and ones with and without the often impenetrable words used in ontologies (especially for relationships).