Why scientists must share their research code

From Nature:

George Dyson

Many scientists worry over the reproducibility of wet-lab experiments, but data scientist Victoria Stodden’s focus is on how to validate computational research: analyses that can involve thousands of lines of code and complex data sets.

Beginning this month, Stodden — who works at the University of Illinois at Urbana-Champaign — becomes one of three ‘reproducibility editors’ appointed to look over code and data sets submitted by authors to the Applications and Case Studies (ACS) section of the Journal of the American Statistical Association (JASA). Other journals, including Nature, have established guidelines for accommodating data requests after publication, but they rarely consider the availability of code and data during the review of a manuscript. JASA ACS will now insist that — with a few exceptions for privacy — authors submit this information as a condition of publication.

Nature spoke to Stodden about computational reproducibility and the emerging norms of sharing data and code.

This is really about what it means to do science. We have publication processes to root out error in research that is done without a computer. Once you introduce a computer, the materials section in a typical scientific paper doesn’t come close to providing the information you need to verify the results. Analysing complicated data by computer requires instructions in the form of scripts and code. Hence we need the code, and we need the data. The reproducibility editors will gather the code, the data and the workflow information, and we’ll enforce the requirement that the data and code supporting the claims in an article are made available.

It means that all details of computation — code and data — are made routinely available to others. If I can run your code on your data, then I can understand …
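To make the idea concrete, here is a minimal sketch (not from the interview, and purely illustrative) of what "runnable code on shared data" can mean in practice: the raw data, the analysis, and the claimed statistic live in one self-contained script, so a reviewer can rerun it verbatim and compare the printed result against the paper.

```python
# Hypothetical example of a self-contained, rerunnable analysis.
# The data values and the mean() helper are illustrative, not from
# any real study.

DATA = [2.1, 2.4, 1.9, 2.2, 2.0]  # hypothetical raw measurements


def mean(values):
    """Recompute the summary statistic directly from the raw data."""
    return sum(values) / len(values)


if __name__ == "__main__":
    result = mean(DATA)
    # A reviewer reruns this and checks the output against the
    # value claimed in the manuscript.
    print(f"mean = {result:.2f}")
```

The point is not the arithmetic but the packaging: because the inputs and the computation travel together, anyone can reproduce the reported number without guessing at undocumented steps.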
