An alarming number of scientific papers contain Excel errors

From The Washington Post:


A surprisingly high number of scientific papers in the field of genetics contain errors introduced by Microsoft Excel, according to an analysis recently published in the journal Genome Biology.

A team of Australian researchers analyzed nearly 3,600 genetics papers published in a number of leading scientific journals — like Nature, Science and PLoS One. As is common practice in the field, these papers all came with supplementary files containing lists of genes used in the research.

The Australian researchers found that roughly one in five of these papers included errors in their gene lists caused by Excel automatically converting gene names to things like calendar dates or random numbers.

You see, genes are often referred to in scientific literature by symbols — essentially shortened versions of full gene names. The gene “Septin 2” is typically shortened as SEPT2. “Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase” gets mercifully shortened to MARCH1.

But when you type these shortened gene names into Excel, the program automatically assumes they refer to dates — September 2nd and March 1st, respectively. If you type SEPT2 into a default Excel cell, it magically becomes “2-Sep.” It’s stored by the program as the date 9/2/2016.
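As a rough illustration of how one might screen a gene list for this kind of damage, here is a minimal Python sketch. It assumes the converted cells show up as English-locale strings such as "2-Sep" or "Sep-02"; the pattern, the function name `find_date_like`, and the sample list are all hypothetical and are not taken from the Genome Biology analysis.

```python
import re

# English month abbreviations as Excel displays them in short dates (assumption).
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Matches day-month ("2-Sep") and month-day ("Sep-02") style cell contents.
DATE_LIKE = re.compile(
    rf"^(?:\d{{1,2}}-(?:{MONTHS})|(?:{MONTHS})-\d{{1,2}})$",
    re.IGNORECASE,
)

def find_date_like(symbols):
    """Return the entries that look like Excel-formatted dates, not gene symbols."""
    return [s for s in symbols if DATE_LIKE.match(s.strip())]

# Hypothetical gene list in which SEPT2 and MARCH1 were already converted.
genes = ["TP53", "2-Sep", "BRCA1", "1-Mar", "SEPT2"]
suspicious = find_date_like(genes)
print(suspicious)
```

A check like this only flags suspicious entries. It cannot say with certainty which gene symbol was originally typed.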

Even worse, there’s no easy way to undo this automatic formatting once it has happened. Edit -> Undo simply deletes everything in the cell. You can try to convert the formatting from “General,” the default, to “Text,” which you might expect to change it back to the original characters you entered. But instead, changing the formatting to “Text” makes the cell contents appear as 42615, Excel’s internal numeric code for the date 9/2/2016.
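To see where 42615 comes from, the serial number can be decoded by hand. The sketch below assumes Excel's default 1900 date system, in which serial numbers count days and, because Excel treats 1900 as a leap year, the effective day zero for modern dates is December 30, 1899. The function name `excel_serial_to_date` is made up for this example.

```python
from datetime import date, timedelta

# Effective day zero of Excel's 1900 date system for serials of 61 and above.
# (Excel counts a nonexistent Feb 29, 1900, which shifts the origin by one day.)
EXCEL_EPOCH = date(1899, 12, 30)

def excel_serial_to_date(serial):
    """Convert a whole-number Excel serial (1900 date system) to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)

print(excel_serial_to_date(42615))  # 2016-09-02
```

This recovers the date, but not the gene name: once the cell holds only 42615, nothing in it records that the original entry was SEPT2.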

Even more troubling, the researchers note that there’s no way to permanently disable automatic …

