We’ve continued our work investigating the human-computer interaction of authoring an ontology. We had a couple of papers last year looking at some qualitative aspects of ontology authoring through interviews with experienced ontologists. We wanted to follow this up with quantitative work looking at the activities during the addition of axioms when authoring an ontology. I’m pleased to say we’ve just had a long paper accepted for CHI 2015 with the following details:
Markel Vigo, Caroline Jay and Robert Stevens. Constructing Conceptual Knowledge Artefacts: Activity Patterns in the Ontology Authoring Process. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems: CHI 2015; 18 Apr 2015-24 Apr 2015; Seoul, Korea.
I reported some early work in this quantitative study. In this latest work we’ve taken the following approach:
- We’ve instrumented a version of Protégé 4.3 (P4) to record every keystroke, mouse click and so on in a time-stamped log file (it’s called Protégé4US; the “US” is for “user studies”). We divided the events into interaction events (interacting with the ontology and its axioms via the class and property hierarchies and the axiom description window), authoring events (typing an axiom, class declaration, etc.) and environment events (invoking the reasoner, getting an explanation, etc.).
- We had experienced ontology authors perform a series of tasks to build an ontology of potatoes. There were three tasks of increasing difficulty, each involving making various defined classes over descriptions of some 15 potato varieties; creating those descriptions was also part of the tasks.
- Whilst this happened we recorded what was happening on the screen.
- Finally, we recorded eye-tracking data as to where the author’s gaze fell during the ontology authoring.
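To make the three-way split of logged events concrete, here is a minimal sketch of how a time-stamped log could be sorted into interaction, authoring and environment events. The tab-separated log format and the event names are assumptions made for illustration; they are not the actual Protégé4US log vocabulary.

```python
from collections import Counter

# Hypothetical event names; the real Protégé4US log vocabulary may differ.
EVENT_CATEGORY = {
    "entity_selected": "interaction",
    "description_selected": "interaction",
    "hierarchy_expanded": "interaction",
    "edit_entity_start": "authoring",
    "class_added": "authoring",
    "reasoner_invoked": "environment",
    "explanation_requested": "environment",
}


def parse_line(line):
    """Split an assumed 'timestamp<TAB>event' log line into (float, str)."""
    timestamp, event = line.rstrip("\n").split("\t")
    return float(timestamp), event


def category_counts(lines):
    """Count events per category; unknown events are counted as 'other'."""
    counts = Counter()
    for line in lines:
        _, event = parse_line(line)
        counts[EVENT_CATEGORY.get(event, "other")] += 1
    return counts


log = [
    "0.00\thierarchy_expanded",
    "1.20\tentity_selected",
    "2.75\tedit_entity_start",
    "9.40\treasoner_invoked",
    "12.10\texplanation_requested",
]
print(category_counts(log))
```

Dividing each count by the total number of events gives the kind of per-category shares reported further down.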
In capturing eye-tracking data, the screen of Protégé4US is divided up into areas of interest as shown below. This picture shows the main view as an area of interest; other views involve classes, properties and individuals and these have their own areas of interest defined. These AOI are used to determine the dwell time of eye gaze during the tasks.
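As a rough illustration of how dwell time per area of interest can be derived, the sketch below assigns each fixation to a rectangular AOI and sums the fixation durations. The AOI coordinates, the names and the fixation tuple format are all invented for the example; the real AOIs follow the Protégé4US screen layout.

```python
from collections import defaultdict

# Hypothetical AOI rectangles in screen pixels: (left, top, right, bottom).
AOIS = {
    "class_hierarchy": (0, 0, 400, 600),
    "description_area": (400, 300, 1200, 600),
}


def aoi_of(x, y):
    """Return the name of the first AOI containing the point, else None."""
    for name, (left, top, right, bottom) in AOIS.items():
        if left <= x < right and top <= y < bottom:
            return name
    return None


def dwell_times(fixations):
    """Sum fixation durations (ms) per AOI; fixations are (x, y, duration_ms)."""
    totals = defaultdict(int)
    for x, y, duration in fixations:
        name = aoi_of(x, y)
        if name is not None:  # fixations outside every AOI are ignored
            totals[name] += duration
    return dict(totals)


fixations = [(100, 100, 250), (120, 140, 300), (800, 400, 200), (1500, 50, 180)]
print(dwell_times(fixations))
```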
The patterns of ontology authoring activity we found were:
- An exploration cycle. The asserted class hierarchy is expanded after ontology loading; over 31% of the time an expansion is followed by another expansion, as users appear to familiarise themselves with the structure of the ontology. Eventually, this behaviour appears to become directed as an author chooses a class to edit. In contrast, the expansion of the inferred class hierarchy appears to be more exploratory as the authors check what has happened post reasoning, perhaps answering the question “have I found all the changes?”.
- An editing cycle. Here an entity is selected, followed by selection of another entity 37% of the time or selection of the description area 29% of the time. Once selected, a description will be modified 63% of the time and followed by selection of another entity 59% of the time. This looks like selecting an entity, inspecting its description and then either editing it or moving on to another entity, each decision based on the content of the description.
- A reasoning cycle. Just prior to the reasoner being invoked, the ontology is saved 40% of the time or a defined class is created 17% of the time. After the reasoner is run, 41% of the time participants observe the change on the asserted class hierarchy and then look at a description where the effects of reasoning can be seen. The inferred class hierarchy is inspected post-reasoning 30% of the time, which is again followed by the expansion of the hierarchy 43% of the time.
These activity patterns are shown in the following pictures.
Overall, we can see the following flow of events:
- Initial exploration of the ontology.
- A burst of exploration coupled with editing.
- Reasoning followed by exploration.
An activity pattern is a common sequence of events. The details of our analysis that led to these activity patterns are in the paper, but some of the pretty pictures and the basic analysis steps that gave us these patterns are below.
This is a simple log plot of the number of each type of event recorded across all participants. The top three events (entity selected, description selected and edit entity:start) account for 54% of events. Interaction events account for 65% of events, authoring events for 30% and environment events for 5%. A few event types make up most of the log, and interaction with P4 accounts for most of the activity.
This picture shows the N-grams of consecutive events. We can see lots of events like expanding the class hierarchy (either asserted or inferred) occurring many times one after the other, indicating people moving down through the hierarchy – the class hierarchy seems to be a centre of interaction – looking for classes to edit and checking for the effects of reasoning.
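For readers who have not met N-grams before, this small sketch counts runs of consecutive events in a sequence. The event names are made up; the point is only that repeated hierarchy expansions show up as a frequent bigram.

```python
from collections import Counter


def ngrams(events, n):
    """Return all runs of n consecutive events as tuples."""
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]


# Hypothetical event sequence for one participant.
events = ["expand", "expand", "expand", "select", "edit", "expand", "expand"]
counts = Counter(ngrams(events, 2))
print(counts.most_common(1))
```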
Those are the events themselves, but what happens after each event? Below there is a plot of transitions from event to event (note the loops from an event back to the same event, and that the thickness of a line indicates the likelihood of the transition occurring). A matrix of the number of transitions from event to event gives a fingerprint for each user. We see that the fingerprints revealed by these transitions from state to state are the same within individuals for each task; that is, each task is operationalised in P4 in the same way.
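The “X is followed by Y some percentage of the time” figures quoted earlier are transition likelihoods of this kind. A minimal sketch of how such a fingerprint could be built from one event sequence is below; the event names are hypothetical and this is not the analysis code used for the paper.

```python
from collections import Counter, defaultdict


def transition_counts(events):
    """Count how often each event is immediately followed by each other event."""
    counts = defaultdict(Counter)
    for current, following in zip(events, events[1:]):
        counts[current][following] += 1
    return counts


def transition_probabilities(counts):
    """Normalise each row so that it sums to 1."""
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: c / total for nxt, c in followers.items()}
    return probs


# Hypothetical event sequence for one participant on one task.
events = ["select", "select", "describe", "edit", "select", "describe"]
probs = transition_probabilities(transition_counts(events))
print(probs["select"])
```

The raw counts are the fingerprint; the row-normalised version gives the likelihoods drawn as line thickness in a transition plot.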
The inter-user similarity is also high, suggesting patterns of events (though there is also evidence of some different styles here too). What is below is a 16×16 matrix showing the correlation of the fingerprints (i.e. transition matrices of all participants).
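One way to compare two fingerprints is to flatten each transition matrix into a vector over a shared event vocabulary and correlate the vectors. The sketch below uses Pearson correlation, which is an assumption on my part for the example, and the counts are invented.

```python
from math import sqrt


def flatten(matrix, vocabulary):
    """Flatten a nested dict of transition counts into a fixed-order vector."""
    return [matrix.get(a, {}).get(b, 0) for a in vocabulary for b in vocabulary]


def pearson(xs, ys):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical transition counts for two participants.
vocabulary = ["select", "describe", "edit"]
user_a = {"select": {"select": 4, "describe": 6}, "describe": {"edit": 5}}
user_b = {"select": {"select": 2, "describe": 3}, "describe": {"edit": 3}}
r = pearson(flatten(user_a, vocabulary), flatten(user_b, vocabulary))
print(round(r, 2))
```

Doing this for every pair of the 16 participants would fill a 16×16 matrix like the one described above.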
The eye-tracking data showed that the class hierarchy received by far the most fixations (43%) and held authors’ attention 45% of the time. The edit entity dialogue has 26% of the fixations and the same for attention, and the description area 17% of the fixations and 15% attention. If we look at events over time we begin to see patterns, but with gaps. Some of these gaps can be filled by looking at where the eye gaze dwells; e.g., a user is looking at the description area and not interacting via events. The picture below shows the distribution of dwell times on each area of interest on the P4 user interface. Note that these numbers tell the same sort of story as the P4US event logging.
Each cell of the following matrix conveys the number of fixation transitions between areas of interest. In other words, it indicates where users will look at time t based on where they looked at t-1 (the x-axis indicates the origin while the y-axis is the destination). The darker the cell, the more transitions there are between areas. We find that, given a fixation on a given area, the most likely next fixation is on the same area.
We also see other transitions:
- From class hierarchy to description area (and vice versa).
- From the class addition pop-up to class hierarchy.
- From the edit entity dialogue to the class hierarchy.
- From the edit entity dialogue to the description area.
Again, we see the class hierarchy being central to the interactions.
To find the activity patterns themselves, we next merged the eye-tracking and N-gram analyses. First we collapsed consecutive events and fixations of the same type. We then took the resulting N-grams of size > 3 and extended them one at a time until this only yielded repeated and smaller N-grams of the merged data. This analysis resulted in the editing, reasoning and exploration activities outlined at the top.
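The sketch below is one loose reading of those two steps, not the procedure from the paper: it collapses consecutive repeats, then grows N from 4 for as long as some N-gram still occurs more than once. The merged sequence and its labels are invented.

```python
from collections import Counter
from itertools import groupby


def collapse(sequence):
    """Drop consecutive repeats: a a b b a -> a b a."""
    return [item for item, _ in groupby(sequence)]


def repeated_ngrams(sequence, n):
    """Return the n-grams that occur more than once in the sequence."""
    grams = Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))
    return {gram: count for gram, count in grams.items() if count > 1}


def longest_repeated(sequence, start=4):
    """Grow n from `start` while some n-gram still repeats; return the last hits."""
    best, n = {}, start
    while True:
        found = repeated_ngrams(sequence, n)
        if not found:
            return best
        best, n = found, n + 1


# Hypothetical merged stream of events and fixations.
merged = ["expand", "expand", "select", "look_description", "edit", "save", "reason",
          "expand", "select", "select", "look_description", "edit", "save", "reason"]
collapsed = collapse(merged)
print(longest_repeated(collapsed))
```

Here the longest repeated pattern is the six-step run from expanding the hierarchy through to reasoning, which is the sort of thing one would then inspect by hand as a candidate activity pattern.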
So what do we now know?
It appears that the class hierarchy is the centre of activity in Protégé. Authors look at the asserted class hierarchy to find the entity they wish to edit and then edit it in the description window. The inferred class hierarchy is used to check that what is expected to have happened as a result of reasoning has indeed happened. While activity in each of these windows involves a certain amount of “poking about”, the activity in the asserted class hierarchy looks more directed than that in the inferred class hierarchy. Design work that can ease navigation and orientation within the ontology and checking on the results of reasoning (all results, not just those the author expects to happen) would be a good thing.
Adding lots of axioms by hand is hard work and presumably error prone. Bulk addition of axioms is desirable, along with the means of checking that it’s worked OK.
There is a lot of eye-gaze transition between the class hierarchy and the editing area. In P4 these are, by default, top-left and bottom-right. Defaulting to adjacency of these areas could make authoring a little more efficient.
We see experienced authors tend to save the ontology before reasoning. An autosave feature would seem like a good thing – even if the reasoners/Protégé never fell over.
Finally, rather than letting the author hunt for changes in the inferred class hierarchy, changes should be made explicit in the display, in effect showing what has happened. This would be a role for semantic diff. Authors look for changes in the inferred class hierarchy, presumably the ones they are expecting to have happened, but there may be other, unforeseen consequences of changing axioms. Knowing what has changed between the ontology before and after a round of editing and reasoning, and showing these semantic differences to users, could be a boon. To do this we’ll be looking to exploit the work at Manchester on the Ecco semantic diff tool.
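As a toy illustration of the idea only, and emphatically not how Ecco works, one could compare the set of inferred subclass relationships before and after a round of editing and reasoning. The class names and pairs below are made up.

```python
def hierarchy_diff(before, after):
    """Compare two sets of (subclass, superclass) pairs from the inferred hierarchy."""
    return {"added": after - before, "removed": before - after}


# Hypothetical inferred subclass pairs before and after an edit.
before = {("KingEdward", "Potato"), ("Desiree", "Potato")}
after = {
    ("KingEdward", "Potato"),
    ("Desiree", "Potato"),
    ("Desiree", "RedSkinnedPotato"),
}
diff = hierarchy_diff(before, after)
print(diff["added"])
```

A display driven by something like this would show the author every new or lost inference, including the ones they were not expecting.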
The paper has more detailed design recommendations. However, what this work does show is that we can gain insight into how ontologists operationalise their work by extracting patterns from log data. The task we used here (everyone built the same ontology to do the same tasks) is not so ecologically valid. It does provide a baseline and allowed us to develop the analysis methods. One pleasing aspect is that the findings in this quantitative work largely supported those of our qualitative work. The next thing is to use Protégé4US while people are doing their everyday authoring jobs (do contact me if you’d be willing to take part and do a couple of hours’ work on Protégé4US and receive our gratitude and an Amazon voucher). I expect to see the same patterns, but perhaps with variations, even if it’s just in timings, frequency and regularity.