The rise and rise of the Gene Ontology

Geraint Duck, one of our Ph.D. students, has just published a paper on bioNerDS, a named entity recogniser for databases and software used in bioinformatics and computational biology. This is part of a wider project looking at extracting computational biology methods from text. As part of the paper, we did a survey of the databases and software reported in the full texts of Genome Biology and BMC Bioinformatics articles in PubMed Central (PMC). More recently we've done a survey of the whole of PMC, but the paper reports only on the two journals. The paper's full reference is:

Geraint Duck, Goran Nenadic, Andy Brass, David Robertson, and Robert Stevens. bioNerDS: exploring bioinformatics' database and software use through literature mining. BMC Bioinformatics, 14(1):194, 2013. (DOI: 10.1186/1471-2105-14-194).

Here I want to report on the survey and, in particular, on what it says about the reported usage of the Gene Ontology (GO). We surveyed BMC Bioinformatics and Genome Biology. The former has a remit to report the development of bioinformatics methods, tools and databases; the latter reports more on the use of those resources to actually "do biology" (though, of course, there is overlap). The table below shows the top ten resources in each journal over its lifetime.

Resource (BMC Bioinformatics) | Count | Resource (Genome Biology) | Count
R | 1922 | R | 574
GO | 1102 | GO | 516
BLAST | 870 | BLAST | 430
analysis | 696 | GenBank | 414
PDB | 631 | GEO | 287
Network | 553 | Ensembl | 266
Q | 494 | S4 | 229
GenBank | 468 | tRNA | 195
KEGG | 463 | analysis | 193
GEO | 416 | RefSeq | 175

These numbers are counts of the documents in which each resource was mentioned, not counts of individual mentions. A few resources are over-reported: "network" and "analysis" are both real bioinformatics resources, but they have highly inconvenient names for text mining. "Analysis" is hardly an unusual word to find in reports of science, so calling a tool "analysis" is, we think, something of an infelicity. However, text miners have dealt with this kind of problem for gene and protein names, so I'm sure we will manage here too.
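To make the counting concrete, here is a deliberately naive sketch of dictionary matching that counts each resource at most once per document. It is not how bioNerDS works; the resource list, the `count_documents` function and the example sentences are all hypothetical. It simply shows why a tool called "analysis" gets over-counted: ordinary prose matches it.

```python
import re

# Hypothetical, tiny dictionary of resource names (illustration only).
RESOURCES = ["R", "GO", "BLAST", "analysis", "GenBank"]

def count_documents(documents, resources=RESOURCES):
    """For each resource, count the documents that mention it at least once.

    Matching is naive: case-sensitive whole-word regular expressions with no
    context rules, so common words that double as tool names are over-counted.
    """
    patterns = {name: re.compile(r"\b" + re.escape(name) + r"\b") for name in resources}
    counts = {name: 0 for name in resources}
    for text in documents:
        for name, pattern in patterns.items():
            if pattern.search(text):  # one hit is enough for this document
                counts[name] += 1
    return counts

docs = [
    "Sequences were aligned with BLAST and annotated with GO terms.",
    "Statistical analysis was carried out in R.",
    "Further analysis of the GenBank records is left for future work.",
]
print(count_documents(docs))
# {'R': 1, 'GO': 1, 'BLAST': 1, 'analysis': 2, 'GenBank': 1}
```

Neither hit for "analysis" refers to the tool, which is exactly the infelicity described above; a real recogniser needs context rules to tell the two apart.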

 

In both journals the Gene Ontology is among the top resources reported in the literature, up there with the usual suspects: R is now top dog, alongside BLAST, GO, Ensembl, KEGG, GEO and GenBank. I'm reasonably happy to conclude that the Gene Ontology is one of the central resources reported in these journals.

 

We also looked at GO's usage over time. For each journal and year, we calculated the relative use of GO by dividing the number of documents mentioning GO by the total number of documents published in that journal in that year.
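The normalisation described above can be sketched in a few lines. This assumes a list of (document id, year) pairs for one journal and a set of the ids in which GO was found; the `relative_usage` function and the documents are hypothetical, made up for illustration rather than taken from the survey.

```python
from collections import Counter

def relative_usage(documents, mentions):
    """Fraction of each year's documents that mention a resource.

    `documents` is an iterable of (doc_id, year) pairs covering every article in
    one journal; `mentions` is the set of doc_ids in which the resource was found.
    Returns {year: mentioning documents / all documents}, ordered by year.
    """
    totals = Counter()  # all documents published per year
    hits = Counter()    # documents mentioning the resource per year
    for doc_id, year in documents:
        totals[year] += 1
        if doc_id in mentions:
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Made-up documents for illustration only, not data from the survey.
docs = [("a", 2004), ("b", 2004), ("c", 2005), ("d", 2005), ("e", 2005), ("f", 2005)]
print(relative_usage(docs, mentions={"a", "c", "d", "e"}))  # {2004: 0.5, 2005: 0.75}
```

Dividing by the year's total keeps a journal that simply publishes more papers each year from looking like it uses GO more.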

 

Mentions of GO in BMC Bioinformatics increase fairly rapidly until 2005, then increase more slowly, even tailing off a bit, thereafter. The paper has more details on these trends (normalisation, statistical testing and so on), but the trends appear to hold up. The picture in Genome Biology is a little less clear, but GO becomes an established resource. My suspicion is that the numbers tail off (as they do for other resources) because a resource becomes part of the fabric and is no longer explicitly mentioned; also, there are more resources to use and cite, so competition is fierce. I've no evidence for these thoughts; they are just my conjecture.
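One simple way to put numbers on "increasing rapidly, then more slowly" is to take the year-on-year change in relative usage (a first difference) and then the change in that change (a second difference, a crude acceleration). This is a minimal sketch assuming a {year: relative usage} mapping with no missing years; the `differences` function and the values are hypothetical, not the paper's results, and the paper's normalisation and statistical testing are not reproduced here.

```python
def differences(series):
    """Year-on-year change of a {year: value} series, keyed by the later year.

    Assumes consecutive years with no gaps. Applied once it gives a rate of
    change; applied twice it gives a crude acceleration.
    """
    years = sorted(series)
    return {later: series[later] - series[earlier]
            for earlier, later in zip(years, years[1:])}

# Made-up relative-usage values (fraction of documents per year), not the paper's results.
usage = {2001: 0.25, 2002: 0.5, 2003: 0.625, 2004: 0.6875}
velocity = differences(usage)
acceleration = differences(velocity)
print(velocity)      # {2002: 0.25, 2003: 0.125, 2004: 0.0625}
print(acceleration)  # {2003: -0.125, 2004: -0.0625}
```

Positive first differences with negative second differences are what "still growing, but more slowly" looks like; a tail-off would show up as first differences turning negative.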

 

In these two journals GO is a "top" resource: we have an ontology that is a key resource for bioinformatics and computational biology. Something happens to GO's usage in 2005/2006 (the paper also has some plots of the acceleration of usage): some kind of saturation, establishment as "a top resource", or something else. A similar picture is seen across the whole of PMC, where GO is in the top ten; I'll report on that, and on how other ontologies fare, in another post. The take-home message, though, is that an ontology is one of the central resources in bioinformatics and computational biology. That it is the GO is no surprise.


