Knowledge from millions of biological studies encoded into one network — that is Daniel Himmelstein’s alluring description of Hetionet, a free online resource that melds data from 28 public sources on links between drugs, genes and diseases. But for a product built on public information, obtaining legal permissions has been surprisingly tough.
When Himmelstein, a data scientist at the University of Pennsylvania in Philadelphia, contacted researchers for permission to reproduce their work openly, several said they were surprised that he had to ask. “It never really crossed my mind that licensing is an issue here,” says Jörg Menche, a bioinformatician at the Research Center for Molecular Medicine of the Austrian Academy of Sciences in Vienna.
Menche rapidly gave consent — but not everyone was so helpful. One research group never replied to Himmelstein, and three replied without clearing up the legal confusion. Ultimately, Himmelstein published the final version of Hetionet in July — minus one data set whose licence forbids redistribution, but including the three that he still lacks clear permission to republish. The tangle shows that many researchers don’t understand that simply posting a data set publicly doesn’t mean others can legally republish it, says Himmelstein.
The confusion has the power to slow down science, he says, because researchers will be discouraged from combining data sets into more useful resources. It will also become increasingly problematic as scientists publish more information online. “Science is becoming more and more dependent on reusing data,” Himmelstein says.
Because a piece of data, a fact, cannot be copyrighted, many scientists assume that a data set posted publicly without explicit terms and conditions on access can simply be republished without legal problems. But that’s not necessarily correct, …