Archive for the ‘Reproducibility’ Category

Reporting the age and sex of mice in research papers

August 2, 2016

Oscar Florez-Vargas has just published a paper in the journal eLife as part of his Ph.D. work here in Manchester. The paper is:

Oscar Flórez-Vargas, Andy Brass, George Karystianis, Michael Bramhall, Robert Stevens, Sheena Cruickshank, and Goran Nenadic. Bias in the reporting of sex and age in biomedical research on mouse models. eLife, 5:e13615, 2016.

Earlier in his Ph.D., Oscar made an in-depth study of the quality of methods reporting in parasitology experiments in research papers. I reported on this study in a blog post, “Being a credible virtual witness”; the gist is that for a research paper to act as a credible witness for a scientific experiment, enough of that experiment must be reported for it to be reproduced. Oscar found that the overwhelming majority of the papers in his study failed to report the minimal features required for reproducibility. We did another study, led by Michael Bramhall during his Ph.D., with similar findings: “Quality of methods reporting in animal models of colitis”, published in Inflammatory Bowel Diseases.

In both studies, the reporting of experimental method was found to be wanting.

Two of the important factors to report in the experiments in these and other studies are the age and the sex of mice; both factors have a significant impact on many aspects of an organism’s biology and, hence, influence the outcome of experiments. Oscar’s original studies looked in depth at many factors in a relatively small area of biology, using a smallish number of papers captured by following a systematic review; this time, we wanted to do a broad survey across just these two factors.

We used text analytics on all papers in the PMC full-text collection that had mouse as the focus of their study; this amounted to 15,311 papers published between 1994 and 2014. I won’t report the details of the study here, but we got good recovery of these two factors and were able to report the following observations:

  • The reporting of both sex and age of mice has increased over time, but by 2014 only 50% of papers using mice as the focus of their study reported both the age and sex of those mice.
  • There is a distinct bias towards using female mice in studies.
  • There are sex biases between six pre-clinical research areas; there’s a strong bias to male mice in cardiovascular disease studies and a bias towards female mice in studies of infectious disease.
  • The reporting of age and sex has steadily increased; this change started before the US Institute of Medicine report in 2001 or the ARRIVE guidelines that called for better reporting of method.
  • There were differences in the reporting of sex in the research areas we tested (cardiovascular diseases; cancer; diabetes mellitus; lung diseases; infectious diseases; and neurological disorders). Diabetes had the best reporting of sex and cancer the worst. Age was also reported the least well in cancer studies. Taking both sex and age into account, neurological disorders had the best reporting.
  • We also looked at reporting of sex in four sub-groups of study type (genetics, immunology, physiopathology and therapy): male mice were preferred in genetics studies and female mice preferred in immunological studies.

The age and sex of the mice used matter in experiments because both are important factors in the biology being studied. It is difficult to understand exactly why these factors are not better reported. Reporting sex and age takes only about 40 characters of text, so it’s not a space issue.

Previous studies in both human and animal models concluded that males were studied more than females; our study contradicts these studies. This bias towards female mice may be because of practical factors: they are smaller (and therefore need less drug, inoculum, etc. to be administered), are less aggressive to each other and to experimenters, and are cheaper to house. Our study had a large sample size and focused on only one model (mouse), and this may be a factor in why its outcomes differ from those of others. Nevertheless, there appear to be biases in the choice of mouse sex to be used in experiments.

The profound effects of sex on an organism’s biology have influenced the creation of the journal Biology of Sex Differences. As sex influences so many aspects of biology, one would suppose that balance in the sex of mice used would be a good thing. In this regard, the NIH is engaging the scientific community to improve the sex balance in research.

We used some straightforward text analytics to undertake this study; it has enabled some very interesting questions to be asked and has highlighted some very interesting issues that may affect how certain we can be in interpreting results reported in papers, and how broadly those results apply. It should be entirely possible to use text analytics in a similar way for other experimental factors, both pre- and post-publication.
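
To give a flavour of what spotting these two factors in text can look like, here is a minimal sketch in Python. It is not the method used in the eLife paper (the details are in the paper itself); the patterns, the function name report_status and the example sentence are all assumptions made up for illustration, and a real system would need to cope with far more variation (for example, "2 days post-infection" would wrongly be picked up as an age here).

```python
import re

# Hypothetical patterns for illustration only; these are NOT the rules
# used in the published study.
SEX_PATTERN = re.compile(r"\b(male|female)s?\b", re.IGNORECASE)
AGE_PATTERN = re.compile(
    r"\b(\d+(?:\s*(?:-|to)\s*\d+)?)[\s-]*(day|week|month)s?(?:[\s-]*old)?\b",
    re.IGNORECASE,
)


def report_status(methods_text):
    """Return which of the two factors a methods passage mentions."""
    # Collect the distinct sexes mentioned, normalised to lower case.
    sexes = {m.group(1).lower() for m in SEX_PATTERN.finditer(methods_text)}
    # Collect (number or range, unit) for each age-like expression.
    ages = [
        (m.group(1), m.group(2).lower())
        for m in AGE_PATTERN.finditer(methods_text)
    ]
    return {"sex": sorted(sexes), "age": ages, "both": bool(sexes) and bool(ages)}


status = report_status("Female C57BL/6 mice, 8 weeks old, were used.")
```

For the example sentence, status records one sex ("female") and one age (8, in weeks), so the passage counts as reporting both factors.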

Patterns of bioinformatics software and database usage

September 27, 2014


I published a blog on the rise and rise of the Gene Ontology. This described my Ph.D. student Geraint Duck’s work on bioNerDS, a named entity recogniser for bioinformatics databases and software. In a survey of Genome Biology and BMC Bioinformatics full text articles we saw that the Gene Ontology is in the top ten of mentioned resources (a fact reflected in our survey of the whole of 2013’s PMC). This interesting survey was, however, a bit of a side-show to our goal of trying to extract descriptions of bioinformatics and computational biology method from text. Geraint has just presented a paper at ECCB 2014 called:


Geraint Duck, Goran Nenadic, Andy Brass, David L. Robertson, and Robert Stevens. Extracting patterns of database and software usage from the bioinformatics literature. Bioinformatics, 30(17):i601-i608, 2014.


This paper edges towards our ultimate goal of extracting bioinformatics and computational method from text. Ideally, the result would be a resource of methods that people wishing to use bioinformatics tools and data to analyse their data could consult to see what is commonly done, how and with what it is done, what the latest method for a given kind of data is, who has used each method, and so on.


Geraint’s paper presents some networks of interacting bioinformatics software and databases that show patterns of commonly occurring pairs of resources appearing in 22,418 papers from the 2013 PMC corpus that had the MeSH term “Bioinformatics” as a tag. When assembled into a network, there are things that look remarkably like methods, though they are not methods that necessarily appear in any one individual paper. What Geraint did in the ECCB paper was:


  1. Take the results of his bioNerDS survey of the articles in PMC 2013 labelled with the MeSH term “Bioinformatics”.
  2. Remove all resources that were only mentioned once (as they probably don’t really reflect “common” method).
  3. Filter the papers down to method sections.
  4. Get all the pairs of adjacent resources.
  5. Use a binomial test to find the dominant ordering of each pair (“Software A takes data from Database B” or “Data from Database B is put into Software A”) and assume that the dominant ordering is the correct one (our manually sampled and tested pairs suggest this is the case).
  6. Label each resource as either software or a database, and construct a network by joining the remaining pairs together.
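
The pairing and ordering steps above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the code behind the paper: I assume each paper's method section has already been reduced to a list of resource names in reading order, I treat "adjacent" as "next to each other in that list", and I use an exact two-sided binomial test against an even split with a 0.05 threshold. All function names are hypothetical, and the single-mention filter and the software/database labelling are left out.

```python
from collections import Counter
from math import comb


def adjacent_pairs(mentions):
    """Yield (earlier, later) for each pair of neighbouring, distinct mentions."""
    for first, second in zip(mentions, mentions[1:]):
        if first != second:
            yield first, second


def two_sided_binomial_p(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    # With p = 0.5 the distribution is symmetric, so the two-sided p-value is
    # the total probability of outcomes at least as far from n/2 as k is.
    distance = abs(k - n / 2)
    extreme = sum(comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= distance)
    return extreme / 2 ** n


def dominant_orderings(papers, alpha=0.05):
    """Return {(source, target): count} for pairs with a significant dominant order."""
    ordered = Counter()
    for mentions in papers:
        ordered.update(adjacent_pairs(mentions))
    edges = {}
    for (a, b), forward in ordered.items():
        backward = ordered.get((b, a), 0)
        if forward <= backward:
            continue  # the reverse pair covers this case; ties never pass the test
        if two_sided_binomial_p(forward, forward + backward) < alpha:
            edges[(a, b)] = forward
    return edges


# Toy corpus (made up): ten papers mention UniProt then BLAST, one the reverse.
toy_papers = [["UniProt", "BLAST"]] * 10 + [["BLAST", "UniProt"]]
toy_edges = dominant_orderings(toy_papers)
```

On the toy corpus the only surviving edge runs from UniProt to BLAST; a pair seen about equally often in both directions would be dropped.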

The paper gives the details of our method for constructing patterns of usage and describes the evaluations of each part of the method’s outputs.


Some pictures from the paper of the networks created by assembling these ordered pairs of bioinformatics resources are:


Figure 1 A network formed from the software recovered by bioNerDS at the 95% confidence level


This shows the network with only bioinformatics software. In Figure 1 we can see a central set of sequence alignment tools (split into homologue search, multiple sequence alignment and pairwise sequence alignment tools), which reflects the status of these core, basic techniques in bioinformatics-based analyses. Feeding into this are sequence assembly, gene locator and mass spectroscopy tools. Out of the sequence analysis tools come proteomic tools, phylogeny tools and then some manual alignment tools. Together these look like a pipeline of core bioinformatics tasks, orientated around what we may call “bioinformatics 101”: the core, vital tasks that many biologists and bioinformaticians undertake to analyse their data.


The next picture shows a network created from both bioinformatics software and databases. Putting in both software and databases in Figure 2, we can see what the datasets are “doing” in the pipelines above: UniProt and GEO are putting things into BLAST; GenBank links into multiple sequence alignment tools; PDB links into various sequence prediction and evaluation tools.


Figure 2 A network formed from the bioinformatics software and databases recovered by bioNerDS at the 95% confidence level


Finally, we have the same network of bioinformatics software and databases, but with the Gene Ontology node (which we count as a database) highlighted.


Figure 3 The same network of bioinformatics software and databases, but with the Gene Ontology and its associates highlighted.


In another blog I spoke about the significance of the Gene Ontology, as recorded by bioNerDS, and this work also highlights this point. In this network we’re seeing GO as a “data sink”: it’s where data goes, not where it comes from, presumably because it is playing its role in annotation tasks. That role in annotation fits sensibly with what we’ve seen in this work, but GO is also used as a way of retrieving data. It may well be that we need a more detailed analysis of the language to pick up and distinguish where GO is used as a means of getting a selection of sequences one wants for an analysis, or to find out whether people report this activity at all. Again we see GO with a central role in bioinformatics, a sort of confirmation of its appearance in the top flight of bioinformatics resource mentions in the whole PMC corpus.
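
The "data sink" idea is easy to state in code: in a directed network, a sink is a node with incoming edges but no outgoing ones. Here is a small Python sketch; the function name is hypothetical, and the edge list is a toy one of my own (the edge into the Gene Ontology in particular is invented for the example, not read off the figure).

```python
def sinks_and_sources(edges):
    """Classify nodes of a directed network by whether data flows in, out, or both."""
    has_incoming, has_outgoing = set(), set()
    for source, target in edges:
        has_outgoing.add(source)
        has_incoming.add(target)
    return {
        # Sinks only ever receive data; sources only ever supply it.
        "sinks": sorted(has_incoming - has_outgoing),
        "sources": sorted(has_outgoing - has_incoming),
    }


# Toy edges for illustration only.
toy_network = [
    ("UniProt", "BLAST"),
    ("GEO", "BLAST"),
    ("BLAST", "Gene Ontology"),
]
roles = sinks_and_sources(toy_network)
```

In this toy network the Gene Ontology comes out as the only sink, UniProt and GEO as sources, and BLAST as neither, since data flows both into and out of it.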


What are we seeing here? We are not extracting methods from the text (and certainly not individual methods from individual papers). What we’ve extracted are patterns of usage as generalised over a corpus of text. What we can see, however, are things that look remarkably like bioinformatics and computational biology method. In particular, we see what we might call “bioinformatics 101” coming through very strongly. It’s the central dogma of bioinformatics: protein or nucleic acid sequences are taken from a database and then aligned. Geraint’s paper also looks at patterns over time, and we can see change. Assuming that this corpus of papers from PMC is a proxy for biology and bioinformatics as a whole and that, despite the well-known inadequacy of method reporting, the methods are a reasonable proxy for what is actually done, bioNerDS is offering a tool for looking at resources and their patterns of usage.

Being a credible virtual witness

September 19, 2014

Tim Clark introduced me to the notion of a scientific paper acting as a virtual witness upon a scientific investigation. We, the readers, weren’t there to see the experiment being done, but the scientific article acts as a “witness statement” upon the work for us to judge that work. There’s been a good deal of work in recent years on how poorly methods are described in scientific papers; method is key to being able to judge the findings in a paper and then to repeat and reproduce the work. Method is thus central to a scientific paper being a “credible virtual witness”. One of the quotes on Wikipedia’s description of credible witness is “Generally, a witness is deemed to be credible if they are recognized (or can be recognized) as a source of reliable information about someone, an event, or a phenomenon”. We need papers to be credible witnesses on the phenomena they report.


We’ve recently added to this body of work on reproducibility with a systematic review of method reporting for ‘omic experiments on a set of parasite host investigations. This work was done by Oscar Florez-Vargas, a Ph.D. student supervised by Andy Brass and me; the work was also done with collaborators in Manchester researching into parasite biology. The paper is:


Oscar Flórez-Vargas, Michael Bramhall, Harry Noyes, Sheena Cruickshank, Robert Stevens, and Andy Brass. The quality of methods reporting in parasitology experiments. PLoS ONE, 9(7):e101131, July 2014.


Oscar has worked for 10 years on the immunogenetics of Chagas disease, which is caused by one of the Trypanosoma parasites. Oscar wished to do some meta-analyses by collecting together various results from ‘omics experiments. He came to us with one issue of apparently contradictory results: some papers say that the Th17 immune response, T regulatory cells and nitric oxide may be critical to infection, and others say that they are not. Our first instinct is to go to the methods used in apparently similar experiments to see if differences in the methods could explain the apparent contradiction; the methods should tell us whether these results can be reasonably compared. Unfortunately the methods of the papers involved don’t give enough information for us to know what’s going on (details of the papers are in Oscar’s article). If we are to compare results from different experiments, we have to base that comparison on the methods by which the data were produced. In a broader context, method lets us judge the validity of the results presented and should enable the results in a paper to be reproduced by other scientists.


This need to do meta-analyses of Trypanosoma experiment data caused us to look systematically at a collection of ‘omic experiments from a series of parasite host experiments (Trypanosoma, Leishmania, Toxoplasma, Plasmodium, Trichuris and Schistosoma, as well as the non-parasitic Mycobacterium). Oscar worked with our collaborating parasitologists to develop a checklist of the essential parameters that should be reported in methods sections. This included parameters in three domains: the parasite, the host and the experimental infection. Oscar then used the appropriate PRISMA guidelines in a systematic review of 23 Trypanosoma spp. papers and 10 from each of the other organisms from the literature on these experiments (all the details are in the paper; we aimed to have our method well reported…).


We looked for effects on the level of reporting from organism and publication venue (various bibliometric features such as impact factor, the journal’s h-index and citations for the article).


Perhaps unsurprisingly, the reporting of methods was not as complete as one may wish. The mean of scores achieved by Trypanosoma articles through the checklist was 65.5% (range 32–90%). The method reporting in the other organisms was similarly poor, except in Trichuriasis experiments, which achieved the highest scores and included the only paper to score 100% in all criteria. We saw no effect of publication venue (some negative correlation with Google Scholar citation levels, though this is confounded by the tendency of older publications to have more citations). There has been no apparent improvement in reporting over time.


Some highlights of what we found were:

  • Species were described, but strains were not and it’s known that this can have a large effect on outcome;
  • The host’s sex has an influence on immunological response and it was sometimes not described;
  • The passage treatment of the parasite influences its infectivity and this treatment was often not reported;
  • Housing and treatment of hosts (food, temperature, etc.) affect infectivity and response to infection and these were frequently not reported.
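
To make the idea of scoring a paper against a checklist concrete, here is a minimal Python sketch. The three domains come from the description above, but the individual items are hypothetical stand-ins (the real checklist is in the paper), and scoring each item as simply present or absent with equal weight is my assumption.

```python
# Hypothetical checklist items, grouped by the three domains named above.
# The real checklist is in the PLoS ONE paper; these are stand-ins.
CHECKLIST = {
    "parasite": ["species", "strain", "passage_treatment"],
    "host": ["species", "strain", "sex", "age"],
    "experimental_infection": ["dose", "route", "housing"],
}


def checklist_score(reported):
    """Percentage of checklist items that a paper reports.

    `reported` maps a domain name to the set of items found in the paper.
    """
    total = sum(len(items) for items in CHECKLIST.values())
    found = sum(
        1
        for domain, items in CHECKLIST.items()
        for item in items
        if item in reported.get(domain, set())
    )
    return 100 * found / total


# A made-up paper that gives species but no strains, and omits housing.
example_paper = {
    "parasite": {"species"},
    "host": {"species", "sex"},
    "experimental_infection": {"dose", "route"},
}
score = checklist_score(example_paper)
```

The made-up paper reports 5 of the 10 stand-in items, so it scores 50%. Keeping the checklist as plain data means the same structure could, in principle, be handed to authors as a prompt while they write.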


We know method reporting tends to be poor. It is unlikely that any discipline is immune from this phenomenon. Human frailty is probably at the root: as authors, we’d like to think all the parameters we describe are taken into account in the experimental design. The fault is presumably in the reporting rather than the execution. Can we get both authors and reviewers to use checklists? The trick is, I suspect, to make such checklists help scientists do their work: not a stick, but some form of carrot. This is similar to the notion that Phil Lord has used in discussing semantic publishing: the semantics have to help the author do their work, not be just another hindrance. We need checklists in a form that helps scientists write their methods sections. Methods need to become a first-class citizen in scientific writing, rather than a bit of a chore. Method is vital to being a credible virtual witness and we need to enable us all to be credible in our witness statements.