The job of scientists is to find stuff out, and the more useful that stuff is, the better. But science is a continuous process, and few answers are ever final. We are constantly learning more about the world around us and readjusting our theories and hypotheses. Thus, the result of scientific exploration is not a static “fact,” but a knowledge claim that is reported in a scientific article, and which the reader is asked to believe. About a decade ago it became apparent, first in psychology and then elsewhere, that many of these knowledge claims, even those we had elevated to the status of canonical beliefs, turned out to be false. The findings could not be replicated, and the “replication crisis” was born.

Since then, many of us have been worrying away at the issue, trying to work out what’s been going wrong in science. I first got involved in this area as a trainee neurologist, interested in conducting clinical trials of new treatments for stroke. Judging by the data coming out of the lab, there was no shortage of promising drugs we could test. But when we looked in detail, all of the drugs that had been tested in clinical trials were ineffective, with only one exception (a clot-busting treatment more reminiscent of a plumbing solution than true neuroprotection). It turns out the same has been true across neurological diseases, including Alzheimer’s disease, Huntington’s disease, Parkinson’s disease, Amyotrophic Lateral Sclerosis, and brain tumors. There was a fundamental mismatch between the research being done in the lab and what we were finding in the clinic, and it came at a real human cost. So how do we better align the two?

Critically, we need to be much more strategic in the way we use published research findings (from the lab) to inform what we do next (e.g. clinical trials). We need to combine and integrate — systematically — the full granularity of research claims and their provenance to provide a richness of detail to our understanding. This process of integrating information at scale is well established in the “-omics,” where oceans of data about genes (genomics) or proteins (proteomics), for example, are combined in a single knowledge system. I propose we do the same thing in combining the oceans of data represented in published research claims, in a new approach I’m calling “publomics.” 

But first, how did we get here? 

At least part of the reason for the replication crisis was the weaknesses in the provenance of the original knowledge claims. When we look at how those original studies were conducted, we find a number of common shortcomings. These include:

No blinding. Experiments must have a test arm and a control arm. “Blinding” means the researcher doesn’t know whether the subject they are analyzing (be it proteins, cells, mice, or human subjects) belongs to the test arm or the control arm. Results are more reliable when the researcher is blinded. Blinding is a standard part of clinical trials, but it is often not practiced, or goes unreported, in preclinical experiments, leading to biased results. We also want the experimental subjects to be allocated to the test and control arms at random, but again, this is infrequently practiced and reported in the lab.
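To make this concrete, here is a minimal sketch of how randomization and blinding can be built into an analysis workflow: subjects are allocated to arms at random, and the analyst sees only coded labels. The function name and subject IDs are invented for illustration, not taken from any real protocol:

```python
import random

def randomize_and_blind(subject_ids, seed=None):
    """Randomly allocate subjects to test/control arms and return
    (blinded_labels, key). The analyst works only from the blinded
    labels; the unblinding key is held back until analysis is done."""
    rng = random.Random(seed)
    shuffled = subject_ids[:]
    rng.shuffle(shuffled)              # random allocation
    half = len(shuffled) // 2
    key = {}                           # coded label -> (subject, arm)
    labels = []
    for i, subject in enumerate(shuffled):
        arm = "test" if i < half else "control"
        code = f"S{i:03d}"             # coded label hides the arm
        key[code] = (subject, arm)
        labels.append(code)
    return labels, key

labels, key = randomize_and_blind([f"mouse{n}" for n in range(8)], seed=42)
```

Nothing in `labels` reveals which arm a subject is in, so the analyst's expectations cannot bias the measurements.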

Small sample sizes. We also need to know that the study was big enough to justify the claims made. For example, it’s pretty obvious that men are on average taller than women. But to have a good chance of detecting even this large difference in an experiment, you’d need around 20 people to find a statistically significant result. Unfortunately, effects that large are rare in biology, and larger study populations are needed to make well-supported claims. Results based on small sample sizes can skew the literature with knowledge claims based on artifacts.
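The arithmetic behind “big enough” is a power calculation. Here is a minimal sketch using the standard normal-approximation formula for a two-sample comparison; the effect sizes in the example are illustrative assumptions, not figures from any particular study:

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sample test, using
    the normal-approximation formula
    n ≈ 2 * (z_{1-alpha/2} + z_{1-beta})**2 / d**2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# A very large effect (on the order of the sex difference in height,
# standardized effect size d ≈ 2) needs only a handful per arm...
print(n_per_group(2.0))   # → 4
# ...but a more typical biological effect (d ≈ 0.5) needs far more.
print(n_per_group(0.5))   # → 63
```

The point the formula makes: halving the effect size quadruples the sample size needed, which is why studies of realistic biological effects need many more subjects than intuition suggests.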

Publication bias. Perhaps unsurprisingly, the scientific literature is full of positive results. Null or negative results have traditionally been hard to publish and are often relegated to abandoned hard drives, despite the fact that they represent valuable knowledge claims. Research users looking for evidence get a distorted view, one that, like Johnny Cabot in the 1944 film Here Come the Waves, accentuates the positive, eliminates the negative, and latches on to the affirmative. While that might work in the movies, it’s not a good basis for drug development or true scientific progress.

These and other common design flaws combine to mislead the clinical trialist, the research funder, and the investor as to what is sound, rigorous science. More importantly, this lets down patients — the people, and their families, who suffer the consequences of these diseases. It is heartbreaking to see people go through cycle after cycle of excitement and promise and hope, only to see another promising therapy fail in clinical trials. We owe it to them to do better. 

There has to be a better way

So what would “better” look like? A more realistic appraisal of the prospects for scientific success might not create as much buzz, but it could be a lot more efficient. That would involve scrutinizing all the available evidence, not just cherry-picking the best bits. It would involve doing due diligence on the research claims made: Were the studies randomized and blinded, with preregistered designs, sufficiently large sample sizes, and power calculations, or are the authors mute on these important details? Just because these things weren’t done doesn’t mean that the findings aren’t true, but it makes it more likely that they aren’t.

Fortunately, there’s a well-established strategy for finding this out, through approaches called systematic review and meta-analysis. In a systematic review, you use a defined literature search strategy to identify relevant research. It doesn’t guarantee that you’ll find everything, but it’s substantially better than starting at the first page of PubMed and choosing the stuff that looks interesting. A systematic review is often followed by a meta-analysis, which allows you to combine, quantitatively, the results from different publications. These analyses can reveal whether the findings from different papers on the same subject are broadly consistent, or all over the place. For example, does the treatment work in both sexes, at all ages, in different species? If so, this would be strong support for the hypothesis. If not, you might be looking at an artifact or an edge case. And if the literature doesn’t give you that complete picture, it might make more sense to fill in the gaps with further preclinical experimental work before embarking on a clinical trial.
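To make “combining quantitatively” concrete, here is a minimal sketch of one common meta-analytic method, fixed-effect inverse-variance pooling. The per-study effect sizes and standard errors are made-up illustrative numbers:

```python
import math

def fixed_effect_meta(effects, ses):
    """Pool study effect sizes by inverse-variance weighting.
    effects: per-study effect estimates; ses: their standard errors.
    Returns the pooled estimate and its 95% confidence interval."""
    weights = [1 / se ** 2 for se in ses]            # precise studies count more
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)

# Three hypothetical studies of the same treatment effect
effects = [0.8, 0.5, 0.65]
ses = [0.30, 0.20, 0.25]
pooled, ci = fixed_effect_meta(effects, ses)
```

In practice, meta-analysts also quantify how much the studies disagree (heterogeneity) and often use random-effects models instead, but the weighting idea is the same.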

To fully reap the benefits of our collective advances in science and technology for health gain, a careful and systematic evaluation of the available information, with due diligence on the strength and relevance of the research claims made, is essential. If this were done at critical stages, such as the initiation of human clinical trials, it would reduce the risk that those trials were based on premises that were not well founded. That would benefit the trial participants (who wants to be in a trial where the chances of success are low?), the clinical trialists (who could then focus on trials of agents with a better prospect of success), and those who pay for the trials (de-risking their investment). This isn’t just true for the transition to clinical trials, but for any transition that incurs a major commitment of time, people, or money.

There is of course a catch. If it is that simple, why has it not become standard practice?

Firstly, nobody wants to see their beautiful ideas cast aside; the last thing we want is for someone to show us we are wrong. So while many scientists welcome, in the abstract, the principle that our theories should be tested to destruction, in practice we would very much prefer that it did not happen.

But even if you were absolutely committed to doing a systematic review, there are challenges. This research-on-research, or “meta-research,” is not well funded (generally, tools designed to test the theories of scientists to destruction are not well received by funding panels made up of those scientists). That means we haven’t developed the research capacity, enough people with the right skills, to do all the systematic reviews and meta-analyses that need to be done. And the process itself can be onerous. A search for research into Alzheimer’s disease, for instance, identifies over 400,000 publications. Working out what is and isn’t relevant, and judging the quality of the research, is a Herculean task.

This is where technology comes in

If you search PubMed for research on Alzheimer’s disease, one of our most pressing clinical needs, it lists 170,934 results. At half an hour per article, 40 hours a week, that’s over 41 years of reading. It is simply not possible for one person to integrate these claims into an organized knowledge system; our understanding is instead based on abstraction and simplification, and lacks granularity. What if we could take these claims, and their context, and describe phenomena which are always observed, which are never observed, and which are observed in some contexts but not in others? What if we could map these claims from animal research onto knowledge claims from human studies and from cell culture? What if we could do it all at the touch of a button?
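As a quick sanity check on that reading-time arithmetic:

```python
# Reading the Alzheimer's literature end to end, at the rates above
articles = 170_934
hours = articles * 0.5      # half an hour per article
weeks = hours / 40          # a 40-hour week of reading
years = weeks / 52
print(round(years, 1))      # → 41.1
```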

Fortunately, this is where the emerging tools of big data can help. Advances in text mining and natural language processing allow us to automate many of the steps in systematic review, so that specific research questions can be addressed at much lower cost and effort.
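As a toy illustration of the screening step those tools automate, here is a keyword-based filter over title/abstract records. The records and keyword list are invented, and real pipelines use trained machine-learning classifiers rather than fixed keywords, but the shape of the task is the same:

```python
def screen(records, include_terms, require_all=False):
    """Flag records whose title or abstract mentions the search terms.
    A crude stand-in for the ML classifiers used in automated screening."""
    hits = []
    for rec in records:
        text = (rec["title"] + " " + rec["abstract"]).lower()
        matches = [t for t in include_terms if t.lower() in text]
        keep = len(matches) == len(include_terms) if require_all else bool(matches)
        if keep:
            hits.append((rec["id"], matches))
    return hits

records = [
    {"id": 1, "title": "Drug X in a mouse model of stroke",
     "abstract": "Randomized, blinded study of infarct volume."},
    {"id": 2, "title": "A survey of hospital staffing",
     "abstract": "No animal or intervention data."},
]
hits = screen(records, ["stroke", "randomized"], require_all=True)
```

Run over 170,000 abstracts, even a crude filter like this turns years of reading into seconds of computation; the hard research problem is making the filter as accurate as a human reviewer.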

This prospect is what I mean by “Publomics”: taking the automation approach to its logical extreme, to sit alongside genomics and proteomics and metabolomics and all the other “-omics” that have revolutionized our understanding of biology (and provided a richer understanding of human health as well). Every research artifact (publication, preprint, or dataset) would be richly and automatically indexed and curated in a central repository. In addition to the details of the experiments (the cells or animals or people studied, the diseases modeled, the interventions tested, the outcomes measured) and the outcome data (the numbers, not the interpretation), this would also include details of how the researchers sought to reduce risks of bias in their work, for instance through randomization, blinding, or more robust sampling.
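One way to picture such a repository entry: a structured record per experiment, with risk-of-bias flags sitting alongside the outcome numbers. The fields and values below are a hypothetical sketch, not an existing schema:

```python
from dataclasses import dataclass

@dataclass
class ExperimentRecord:
    """Hypothetical 'publomics' index entry for one experiment."""
    source_id: str               # DOI or preprint identifier
    model: str                   # cells, animal model, or human cohort
    disease: str
    intervention: str
    outcome_measure: str
    effect_size: float           # the numbers, not the interpretation
    standard_error: float
    sample_size: int
    randomized: bool             # risk-of-bias flags
    blinded: bool
    preregistered: bool

record = ExperimentRecord(
    source_id="10.1234/example",
    model="transgenic mouse",
    disease="Alzheimer's disease",
    intervention="drug X",
    outcome_measure="memory task score",
    effect_size=0.4,
    standard_error=0.15,
    sample_size=24,
    randomized=True,
    blinded=False,
    preregistered=False,
)
```

With records in this shape, a meta-analysis becomes a query: select the records matching a disease and outcome, then pool their effect sizes, weighting or filtering by the bias flags.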

Then, a systematic search, with all the tricky and tedious work involved today, would no longer be required. All that would be needed for a meta-analysis is the selection of the outcomes of interest, and the pressing of a button.

With apologies to my colleagues in the field, we would make systematic review redundant. 

The benefits of this “Publomics” approach extend beyond the replication crisis, because it also enables higher-order analyses across literatures: for example, the concordance between different outcome measures, or the construction of evidence-based networks describing the different components of mechanistic pathways and how the functioning of those pathways differs across variables like species, sex, and age.

To show how this might work, let’s take multiple sclerosis as an example. It is a devastating condition in need of better therapies. When testing new drugs in animal studies, researchers measure and report a number of different metrics, including the effects of drug candidates on inflammation, on damage to nerve fibers (axon loss), on damage to the surrounding myelin (demyelination), and on animal behavior. By combining data from almost 300 animal studies that measured drug effects on at least two of these outcomes, our meta-analysis showed that early treatment targeting inflammation was highly effective, but that as the disease progressed, drugs which did not reduce axon loss had no effect on behavior. This nuance matters: Most patients present with established disease, suggesting that targeting inflammation will not help many of them, and that therapies protecting against axon loss are needed, a critical insight for drug development.

If that’s what you can find when you combine data from 300 experiments, what riches are there in a literature of 30,000 publications, and how can we exploit that richness? Text mining and machine learning approaches can get us some of the way there, but without high-quality input information (risks of bias in the primary research, detailed annotation of the experimental setup), they can only give the broadest overview, like taking an ultra-high-resolution picture with an unfocused camera. With Publomics, we could solve that problem by making sure our focus, our presentation of detail, was razor sharp.

There is already a great example of this approach in the clinical space. The Trialstreamer platform (https://trialstreamer.robotreviewer.net/) uses a high-sensitivity automated screen to identify reports of human clinical trials, then uses automation tools to extract information on the study population, the drug being tested, the outcomes measured, and risks of bias in the research design.

But laboratory research is more complicated, because papers usually describe several experiments at once, and separating these out can be challenging (for humans as well as for machines). It is becoming possible, however. In our prototype Systematic Online Living Evidence Summary of animal research testing interventions for Alzheimer’s disease (https://camarades.shinyapps.io/LivingEvidence_AD/), for instance, we have summarized the animal model, intervention, outcomes measured, and risks of bias in almost 20,000 publications using transgenic animal models of Alzheimer’s disease. The next step is to find a way to extract the outcome data, to feed into meta-analytical or machine learning platforms. These are not yet ready for prime time, but they’re not far away.

There will be pushback, of course. The work required to get us there may strike some as prohibitive, and the fidelity of our tools is less than perfect. There may also be a concern that rendering such understanding at the click of a button might undermine the cachet and price of expert opinion. But we don’t have to choose between experts and machines: Expert opinion that draws from and builds on a Publomics approach is likely to be of much greater value than either approach on its own.

Having tools like this will transform the way we use, exploit, and deploy research findings from thousands of scientists around the world. It will not only provide an antidote to the replication crisis and give us greater confidence in our understanding of the literature; it might also help us find cures faster and more effectively, transforming the lives of the many people whose existence is blighted by disease.

  • Malcolm Macleod is a Professor at the University of Edinburgh, and co-founder of the Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Studies (CAMARADES).
