Enough with trickle-down reproducibility: Scientists, open this gate! Scientists, tear down this wall!

How does reproducible research actually work in practice?

Karthik Ram

What exactly is reproducibility anyway?

4 kinds of reproducibility

Computational reproducibility and transparency

Scientific reproducibility & transparency

Computational correctness

Statistical reproducibility

Millman et al., 2016

What tools are scientists using?

Huff, 2016

What are some obstacles around making research reproducible?

1

Leveling up skills

The biggest bottleneck to the adoption of reproducible research practices was related to the diversity of skills

More homogeneity in tool familiarity = better reproducibility

2

Dependencies, build systems, and packaging

Scientific software is often built on numerous dependencies

Improved build systems for software, data & workflows

3

Testing

Projects whose code went beyond simple scripts reported testing it systematically

However, many scientists were discouraged by the perceived effort of unit testing
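To illustrate how small a unit test can be, here is a minimal sketch in Python. The helper `standardize` is a hypothetical analysis function invented for this example, not something taken from the case studies; the test is a plain function with `assert` statements, so it runs with or without a test runner.

```python
import math


def standardize(values):
    """Hypothetical analysis helper: rescale values to mean 0, variance 1."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divides by n, not n - 1).
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("cannot standardize constant data")
    return [(v - mean) / sd for v in values]


def test_standardize():
    z = standardize([2.0, 4.0, 6.0])
    # The rescaled values should have mean 0 and population variance 1.
    assert math.isclose(sum(z) / len(z), 0.0, abs_tol=1e-12)
    assert math.isclose(sum(v * v for v in z) / len(z), 1.0, rel_tol=1e-12)


if __name__ == "__main__":
    test_standardize()
    print("ok")
```

A test like this lives next to the analysis code and can be rerun whenever the code or its dependencies change.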

4

Publishing

There is still a need for publication formats that allow for effortless collaboration.

5

Data sharing & versioning

Versioning data is hard, as is finding reliable places to archive them
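One lightweight way to approach data versioning, sketched here under assumptions, is to record a checksum for every data file so that any later change is detectable. The function names and the directory layout are hypothetical and chosen for illustration; this is not a substitute for a proper archive or a dedicated data versioning tool.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir, manifest_path):
    """Write a JSON manifest mapping each file under data_dir to its checksum.

    Assumption: manifest_path lives outside data_dir, so the manifest
    does not end up listing itself on the next run.
    """
    data_dir = Path(data_dir)
    manifest = {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
    Path(manifest_path).write_text(
        json.dumps(manifest, indent=2, sort_keys=True)
    )
    return manifest
```

Committing the manifest alongside the code ties each version of the analysis to the exact data it was run on, even when the data files themselves are stored elsewhere.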

 

 

65+ tools

R, C++, Node

large contributor community

1. Data retrieval (APIs, data storage services, journals)
2. Data visualization (e.g. plot.ly)
3. Data sharing (figshare, Zenodo, dat)
4. Reproducibility

6

Time and incentives

"time and efforts spent on creating reproducible research are not very well rewarded"

Ram & Marwick, 2016

10.7554/eLife.16800.001

Journal of Open Source Software

joss.theoj.org

Will reproducibility always be this hard?

Practices you can adopt now

Version your code
Open your data
Automate everywhere
Document your processes
Test everything
Avoid excessive dependencies
DOIs everywhere
Avoid spreadsheets *

Workflow and provenance frameworks are hard to adopt

Partial reproducibility is better than nothing

C. Titus Brown

What we need right now is scientists actually using stuff that already exists, not engineers building new stuff that no one will ever use

Start small: provide raw data, post any scripts, and list the versions of programs you used

Karl Broman

See previous talk by Karl Broman

kbroman.org/steps2rr/
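The advice to report the versions of programs you used can be partly automated. The following is a minimal sketch, assuming a Python-based analysis; the function name `environment_report` and the package names in the usage example are illustrative choices, not part of the talk.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Return interpreter, OS, and installed package versions as a dict."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            report["packages"][name] = None
    return report


if __name__ == "__main__":
    import json

    # Example package names; replace with the ones your analysis imports.
    print(json.dumps(environment_report(["numpy", "pandas"]), indent=2))
```

Saving this output next to the results gives readers a starting point for rebuilding the environment.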

The Practice of Reproducible Research

A collection of case studies to be published in spring 2017

Ben Marwick

Justin Kitzes

Katy Huff

Scott Chamberlain
Jeroen Ooms & rOpenSci contributors

inundata.org/talks/jsm2016
http://dx.doi.org/10.5281/zenodo.59737