
An Expedition in Semantic Publishing

May 19, 2012

Overview

To explore what “semantic publishing” means, I pushed at the boundaries by submitting an ontology of amino acids in the RDF/XML representation of OWL to the Sepublica semantic publishing workshop. The ontology captures the semantics of a domain, it is represented in a Semantic Web language, and the ontology is published on the Web. So, is it a semantic publication? Can a workshop on semantic publication deal with a semantic publication? The upshot is that my provocative submission does seem to count as a semantic publication, but we do need some words around the published semantics to help us out – that is, some narrative. Ultimately, we want semantic literature and literature of semantics.


The Sepublica Narrative

This blog reports on an expedition I made into semantic publishing with my friend Phil Lord. This was all done in the context of the Sepublica 2012 semantic publishing workshop at ESWC in Crete. It all started as a bit of fun testing the boundaries of what I could get away with, but, by provoking discussion, it’s also had some very interesting effects and reactions. All in all it’s led to something rather good and fun.

What I do on this blog is to report on the motivation, what I actually did, the responses to it and what’s come out in the end. First of all, however, I thank the reviewers and the Sepublica organisers for joining in and letting me publish the reviews for the “semantic publication” and the email dialogues I had with the Sepublica people (this has made the blog rather long, but I think it supports this narrative).

So, the back story: Phil and I did a “proper” article for Sepublica about light-weight semantic publishing in the knowledge blog platform. On submission, I noticed the following on the Sepublica instructions to authors web page:

…We also invite submissions in XHTML+RDFa or in the format or YOUR semantic publishing tool. However, to ensure a fair review procedure, authors must additionally export them to PDF.

— Sepublica Organisers

My first reaction was “I wonder what will happen if I submit just an RDF document?”; that is, an OWL ontology in its RDF/XML syntax. This is where the “trying it on” bit comes in; can I take it literally and “publish” a document in RDF as a contribution to this workshop? My reasoning went like this:

  • An OWL ontology captures the semantics of a field of interest;
  • It is a document;
  • it has an RDF serialisation;
  • It has a URI, so it can be on the Web and found.

So, an OWL ontology is a semantic document, in RDF and published on the Web – anything on the Web is published… So, that is indeed what I did. The longest bit of the process was choosing an ontology that I had lying around that could work for the “expedition” into semantic publishing; this must be one of the cheapest publications I’ve ever done. I chose the Amino Acids Ontology, which is a small ontology that captures the basic biochemistry of amino acids and does so in a way that exploits automated reasoning.

Here’s the next bit of the story:

  • I chose the amino acids ontology. Phil and I originally made this to show off the wizards in the OWL plugin for Protege 3 and how we could use them to very rapidly create this ontology of amino acids;
  • I added a Dublin Core “title” annotation to give my document a title;
  • I added myself and Phil as authors (though other people have contributed to the ontology over time, as the annotations on the ontology describe);
  • I made my own “abstract” annotation property to give the document a “narrative abstract”;
  • And that was my semantic publication finished.

Here is a fragment of the ontology’s “title page”:

Annotations:
    title "Semantic Publishing of Knowledge about Amino Acids"@en,
    author "Robert Stevens and Phillip Lord",
    abstract "We semantically publish knowledge about the amino acids commonly
    described within biochemistry. The classification of amino acids is based
    on Taylor's article (PMID:3461222) from 1986 published in the Journal of
    Theoretical Biology. The ontology goes further than the static paper
    version; it combines many aspects of the physicochemical properties Taylor
    uses to classify amino acids to give a rich, multi axial classification of
    amino acids. Taylor's original description of the amino acid's
    physicochemical properties are captured with value partitions and
    restrictions on the amino acid classes themselves. A series of defined
    classes then establishes the multi-axial classification. By publishing
    this knowledge about amino acids as a semantic document in the form of an
    ontology we persue an agenda of disruptive technology in publishing. Blogs
    about the published semantics of amino acids may be found at
    https://robertdavidstevens.wordpress.com/2010/12/18/an-update-to-the-amino-acids-ontology/
    and links following."@en,

So, it has some minimal trappings of a traditional publication. This also gives an outline of our attitude to the ontology as a semantic publication; the ontology is a semantic artefact, but we do link out to some blogs that give some narrative on the ontology…
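As a rough illustration of how little is needed for such a “title page”, here is a minimal sketch in Python that writes an RDF/XML ontology header carrying a title, authors and an abstract. It is not how the ontology was actually built (that was done in an ontology editor). The ontology IRI, the `ex:` namespace for the home-made “abstract” property and the use of `dc:creator` for the authors are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
DC = "http://purl.org/dc/elements/1.1/"
XML = "http://www.w3.org/XML/1998/namespace"
# Hypothetical namespace standing in for the home-made "abstract" property.
EX = "http://example.org/publishing#"

for prefix, uri in (("rdf", RDF), ("owl", OWL), ("dc", DC), ("ex", EX)):
    ET.register_namespace(prefix, uri)


def title_page(ontology_iri, title, authors, abstract):
    """Return an RDF/XML string holding only the ontology's 'title page'."""
    root = ET.Element(f"{{{RDF}}}RDF")

    # Declare the custom annotation property so OWL tools recognise it.
    ET.SubElement(root, f"{{{OWL}}}AnnotationProperty",
                  {f"{{{RDF}}}about": EX + "abstract"})

    onto = ET.SubElement(root, f"{{{OWL}}}Ontology",
                         {f"{{{RDF}}}about": ontology_iri})

    title_el = ET.SubElement(onto, f"{{{DC}}}title", {f"{{{XML}}}lang": "en"})
    title_el.text = title

    # One dc:creator per author (an assumption; the post only says "author").
    for name in authors:
        ET.SubElement(onto, f"{{{DC}}}creator").text = name

    abstract_el = ET.SubElement(onto, f"{{{EX}}}abstract",
                                {f"{{{XML}}}lang": "en"})
    abstract_el.text = abstract

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(title_page(
        "http://example.org/amino-acid.owl",  # placeholder IRI
        "Semantic Publishing of Knowledge about Amino Acids",
        ["Robert Stevens", "Phillip Lord"],
        "We semantically publish knowledge about the amino acids...",
    ))
```

The classes and axioms of the ontology would follow this header in the same file; the point of the sketch is only that the “publication” trappings amount to a handful of annotations on the ontology itself.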


Submitting a Semantic Publication to Sepublica

Next came submitting the publication to EasyChair. The instructions above said we had to give a PDF version of the submission to ease the review process. As pointed out elsewhere, this is a sad irony of fora on alternative or next generation publishing – they use “lumpen PDF”… So, I saved my ontology as Manchester OWL Syntax and turned it into PDF. This had two motivations – one was “that will show them….” (is there anything as useless as a Manchester syntax dump of an ontology converted to PDF?) and the second was to submit this ludicrous document alongside the more sensible RDF version of the ontology. Unfortunately, the EasyChair Sepublica site wasn’t set up to take anything other than PDF; the organisers changed it for me, but the only way to get RDF in was to zip it up and submit one file. So, I zipped up the RDF and submitted the ontology, but without the silly PDF version.

This is where the dialogue started.

Dear both, I was trying to take a look at the paper you submitted “Semantic Publishing of Knowledge about Amino Acids” the problem I had was that the uncompressed zip file generates an OWL file (nothing wrong with the OWL file, I opened it protege) but there is not an actual paper -as in a PDF file. Could u resubmit and make sure to include the actual paper.

— Sepublica Organisers

and

The ontology is our submission. the workshop pages said that RDF submissions were acceptable and that’s what we submitted. The ontology is a document that captures, in a computational form, the semantics of amino acids. The URI resolves to a web address from which the semantic publication can be read, so I think this counts. As the instructions for authors said, I did produce a PDF version of our publication, but the RDF one works much better. I think we’ve fulfilled the instructions to authors — is there anything else we need to do?

— Robert Stevens

and the reply was:

you are right, no problem

— Sepublica Organisers

Which was the right answer – good for them. At this point the submission was sent to the reviewers.


The Reviews

The instructions sent to the reviewers were:

please note: This is a true semantic publication.

It does not quite stick to the rules (as the authors didn’t submit a PDF export limited to 12 pages), but nevertheless we (= Alex and me) decided not to reject it before reviews.

We recommend that you treat this submission as if it were a paper describing an ontology.

You can read the ontology with your favorite ontology editor (e.g. Protégé), but we also recommend that you open it in a text editor to see the publication-style “header”. The blog post linked from the “abstract” should also be considered part of this submission.

— Sepublica organisers

Below is what we got back.

Note: This submission has been evaluated as an ontology rather than as a paper. The blog has also been read in order to better evaluate this work.

What is the target research problem? The ontology represents the 20 amino acids used in biology as well as their characteristics such as polarity, size, etc.

What are the strong points and weak points of the paper? The ontology proposed is well documented and highly relevant for bioinformatics and related domains.

Does the paper evaluate its contribution? Is it aware of related work?

Further comments (if applicable) There should be a formal submission of the corresponding paper. I would like to see an evaluation against competency questions. I would also like to know how this ontology has been used. Do the authors have a particular project in mind? How can it be used in conjunction with protein ontologies or others? Why it is better to represent this information in an ontology rather than other formats? I think the ontology itself is interesting but even more would be its use.

Minor issues (if applicable) d be nice to know how that domain can benefit from this ontology.

— Reviewer One

The research problem that the authors are somewhat sarcastically addressing is the question of how to publish machine readable (“semantic”) documents. Though they don’t really spell this out in text, they are proposing that the important knowledge in a publication should be be represented and distributed as an OWL ontology that is completely distinct from a traditional body of text that would be distributed as a PDF for human readers… As they state: “publishing this knowledge about amino acids as a semantic document in the form of an ontology we persue an agenda of disruptive technology in publishing”

One interesting effect of their disruptive submission is that, as a reviewer, I am forced to attempt to examine the knowledge content directly without recourse to complain about grammar, document structure or image quality – which I think is a positive. This raises the problem that, not being a biochemist, I need some way to tell whether their ontology is correct. Sadly, the only real way that I, as a lowly human reader, can evaluate the knowledge content of the ontology is to go back to read the papers from which this knowledge was extracted in the first place…

I like the authors main point if I may guess it as something like: “We have a great knowledge representation language ready for use in publishing called OWL and we should use it directly in the publishing process”. But I don’t think that we can escape from also publishing knowledge in a form that human readers can easily consume.

So, what to do with this submission? I think it would be most useful for the meeting if the authors would make their proposal for OWL-based semantic publishing explicit by writing an editorial style article (in English) that states their case. They should, of course, include an OWL version of this editorial so that we can verify that their reasoning is sound.

— Reviewer Two

This is all very interesting, but before we unpack the reviews, I should come clean about the disappointment of having the “paper” accepted for presentation at Sepublica; Phil and I really wanted it to be rejected on the grounds that there was no publication. This would have enabled us to say that the whole thing is completely ridiculous… However, our bluff was called and the reviewers and the Sepublica organisers have gone with it and good things look like they’re coming out of the whole expedition.

Reviewer One says he/she is reviewing it as an ontology not a paper… even though the ontology (in our view) is a semantic publication. Reviewer One just plays it with a straight bat – let’s just treat it as an ontology. It may be that the two should be indistinguishable – should an ontology be treated any differently from a paper? The axioms of the ontology are a theory about the domain; it doesn’t have the form of the traditional scientific paper, but can it be treated as such – is this comment about it not being a paper eventually going to be “old fashioned”? The interesting point is that he/she wants descriptions of the ontology’s use; something that is not in the ontology, but in the narrative surrounding the ontology (or would be if we had a use other than teaching for this ontology). Phil Lord has talked about literate ontology as an analogy to literate programming; we should be able to have narrative for the ontology surrounding that ontology. There is, however, something to distinguish between what the ontology says about its field of interest and what we want to say about the ontology as an artefact.

Reviewer One also says he/she read the blogs to get the narrative, which sort of plays to this need for a narrative. However, I still think that the basic point that the ontology is a semantic publication holds; it may need more narrative, but I remain to be convinced that “There should be a formal submission of the corresponding paper.” Finally, the comment “Why it is better to represent this information in an ontology rather than other formats?” is fun for a workshop on semantic publishing – is this (our ontology) a good way to publish semantics for a field? I claim that this ontology captures a lot of basic biochemistry of amino acids; the background chemistry belongs elsewhere, but this ontology captures an early lecture in biochemistry. The biological and chemical implications of the amino acids’ characteristics are beyond what we’ve done, but I’m happy to argue that the ontology as it stands is a good way of publishing basic knowledge about the semantics of amino acids. If it isn’t, then we’ve been wasting an awful lot of time on ontologies.

The point about narrative comes out even more in Reviewer Two’s review. I’ve never had a paper of mine described as having an element of sarcasm – I’m very proud of this achievement. Reviewer Two said:

One interesting effect of their disruptive submission is that, as a reviewer, I am forced to attempt to examine the knowledge content directly without recourse to complain about grammar, document structure or image quality – which I think is a positive.

— Reviewer Two

which I do understand, but a good part of this tedious element of reviewing is to make sure the publication can be understood as a publication. Can we do this with an ontology? An ontology should have a tutorial or reference aspect, but we don’t really know how best to present ontologies for many applications. I’m prepared to state, however, that OWL isn’t the way to present an ontology to users (except, perhaps, the authors) and various graph visualisations are only part of the solution, but all of this is another story.

This raises the problem that, not being a biochemist, I need some way to tell whether their ontology is correct. Sadly, the only real way that I, as a lowly human reader, can evaluate the knowledge content of the ontology is to go back to read the papers from which this knowledge was extracted in the first place…

— Reviewer Two

Perhaps one day “papers” will be assessed against the ontologies that capture background knowledge. However, this reviewer is right to point out that it is difficult to review an ontology (too much ontology evaluation/review is based on “would I have done it this way…”). A wider question is: how does one evaluate/review any semantic publication?

Finally, we have:

“We have a great knowledge representation language ready for use in publishing called OWL and we should use it directly in the publishing process”. But I don’t think that we can escape from also publishing knowledge in a form that human readers can easily consume.

— Reviewer Two

which is and isn’t what we’re saying. I couldn’t write my whole publication in OWL or even all of FOL – nor would I wish to. This all gets to the heart of it; we want semantic publishing, but we also want narrative. What counts as a semantic publication – a lump of RDF; a trad paper with a bit of RDF or OWL or FOL in it; or a trad paper with some typed links? Whatever the nature of a semantic publication or a semantic scientific publication, we do need narrative.

Reviewer Two ended up saying “…make their proposal for OWL-based semantic publishing explicit by writing an editorial style article (in English) that states their case. They should, of course, include an OWL version of this editorial so that we can verify that their reasoning is sound.”, which is exactly in the right vein; I take my hat off to him/her.

What we did was write a little position paper outlining what we did – this makes the point that semantic publishing needs narrative. This goes for semantic scientific publishing, but also for data. I like the comment about representing the position paper as an OWL ontology to check reasoning. I actually thought about this and it would be a good exercise, but not one I felt I could turn around in the few days available for making our proceedings version – perhaps it’s worth pointing out that writing a trad paper is actually easy compared to writing the ontology version – especially if you take away all the poncing around one has to do when publishing in a trad forum. Getting narrative structure into a semantic document would be fun, or do we want a proper hypertext document where, whatever route you take through the structure, you get the same message?


The final bit

The general instructions to authors for Sepublica’s final, camera-ready version were:

Dear Robert,

You have already received the comments by the reviewers in a previous email. Please take them carefully into account when preparing your camera-ready paper for the proceedings.

The final paper and the signed copyright form are due on

FRIDAY APRIL 13 23:59 (Hawaii time)

This is a firm deadline for the production of the proceedings.

  1. FINAL PAPER: Please submit the files belonging to your camera-ready paper using your EasyChair author account. Follow the instructions after the login for uploading two files:
    (a) either a zipped file containing all your LaTeX sources
        or a Word file in the RTF format, and
    (b) PDF version of your camera-ready paper.

The final submission must be in LNCS format (instructions: http://www.springer.de/comp/lncs/authors.html). Research papers are strictly limited to 12 pages, position papers to 5 pages, and system/demo descriptions must be between 2 and 5 pages.

  2. COPYRIGHT: The copyright form can be found below. It is sufficient for one of the authors to sign the copyright form. You can scan the form into PDF or any other standard image format, but even a text file with your name entered is sufficient.

— Sepublica Organisers

and this was my supplementary message from Sepublica:

Dear authors,

of course the “must be LNCS, N pages” etc. restriction is not applicable in your case.

However, I need something for the old-fashioned PDF version of the proceedings. Please find some suggestions below. If you should just upload the ontology file, I’m going to print it to a PDF from a text editor – but there should be nicer ways.

Maybe a title page in LNCS style, up to and including the abstract.

Then, a pretty-print of your ontology might follow. We are not going to print the proceedings on paper, so we do not really have physical page limits.

Please be innovative! 🙂

— Sepublica Organisers

One thing this strongly suggests is that we don’t really know what to do with a semantic publication. I don’t think my PDF of the Manchester Syntax for the ontology counts as a pretty print, nor is it really the way to do a semantic publication anyway. This was, however, when the Sepublica organisers turned the tables on me, effectively saying ‘OK, so you’re publishing semantically – let’s get on with it’. Even though I think I’ve published semantically, I sort of gave in at this point and did the aforementioned position paper. I do, however, believe that my ontology is a semantic publication; we just don’t yet know how to handle semantic publications.

We have a feeling that semantic publishing must be a good thing, but it’s all rather uncharted territory at the moment. We want material in our scientific publications to be more computationally accessible. We want semantically described data. But what is a semantic publication? How much semantic content does a publication have to have to be a semantic publication? Perhaps the goal for the next Sepublica is to not have PDF as the output, but to challenge the community to do some semantic form of publication to test some boundaries.
