Why the stakes are so high in the open data debate

It is hard to overstate just how much of a currency data has become in medicine. Whether we are talking about evidence-based medicine, precision medicine, or genomics, the ability to collect and distill data into information, transform it into knowledge, and use that knowledge to drive effective action is at the heart of what modern medicine seeks to accomplish. The centrality of data to this process has created well-entrenched stakeholders, so it comes as no surprise that the conversation around open sharing of research data following publication has shifted into controversial territory.

A spark was ignited last January when NEJM editors Dan Longo and Jeff Drazen published an editorial voicing concerns, expressed within the trialist research community, about the open sharing of research data. What received the most attention was a concern

that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends…that the system will be taken over by what some researchers have characterized as “research parasites.”

Since then, open data advocates have fired back, explaining how such “research parasites” serve an important role in discovery and information liquidity, helping us arrive at critical medical truths faster in the service of better patient care. The debate has also swirled around the important issues of patient privacy, the costs of data sharing, and the appropriate interpretation of statistics gleaned from repeated looks at the data. While those represent clear challenges, I believe they are quite surmountable. The greatest challenge, it seems, is appropriately aligning incentives for trialists to perform randomized controlled trials (RCTs) in a world where anyone can explore the trial data and earn credit of their own built on the original investigators’ hard work.

Indeed, this objection forms the core of the somewhat limited data sharing proposal from the International Consortium of Investigators for Fairness in Trial Data Sharing. As they explain, “adequate incentives for researchers to invest the substantial time and effort required to conduct RCTs and to publish the results in a timely fashion are important.” They argue that opening data up for scrutiny and secondary analysis too soon will reduce investigators’ incentives for designing and running clinical trials.

I admit that securing grant funding, contributing in the best way we know how to the collective body of medical knowledge, and using those findings to improve patients’ lives ought to be incentive enough to continue running and publishing RCTs. Still, I am sympathetic to the trialists. They pour more than money into generating a robust data set designed to answer a specific question, and I’m certain they do it for the reasons above. At the same time, though, there is an extremely powerful and pragmatic motive to own the engines of productivity in academia: the data. Academic promotion and consideration for tenure are driven, for better or worse, by the ability to publish, as much in quantity as in quality. Absent any change in how promotion in academia is considered, shortening the time investigators have to reap the benefits of their data collection efforts would absolutely threaten their prospects for promotion and job security. Indeed, the keys to promotion represent, in a narrow but very real way, what is valued in academia. The open research data movement, then, is extremely disruptive not so much to how research is performed as to a more fundamental feature of the medical academic enterprise: that publications serve as the coin of the realm.

It seems to me there are two ways to resolve this source of conflict in the debate over open data sharing.

  1. Change the basis by which professors are evaluated for promotion (i.e., change what is valued in academic medicine).
  2. Provide the trialists some mechanism for receiving credit, up to and including authorship, for secondary analyses that use the data they generated.

I am by no means the first to suggest these. The correspondence in response to Drazen’s original article speaks to both of these solutions. In my mind, option 2 seems like the only viable approach. In any case, the right path forward will only be uncovered through multi-stakeholder engagement including, as one editorialist suggests, “sponsors of clinical trials, the IOM, the ICMJE, governmental bodies, and regulatory authorities.”

In addition to those mentioned, I would suggest that one more group be given a voice in this conversation: the patients. They are the ones who put themselves at the greatest personal risk to generate this data, and they are the first beneficiaries of clinical research. It is sometimes instructive to reflect on which of our activities people will look back on 50 or 100 years from now and marvel at the barbarism of their ancestors. Indeed, medical history is rife with stories like Tuskegee and the HeLa cell line, where we neglected the disenfranchised patient-contributors to research. As its multi-faceted challenges are sorted out, open data sharing seems like an important arena where we can get this right the first time.

