Nick Tierney and I have been working on a paper about data sharing from the perspective of computational reproducibility and the practicalities of reproducing published research or other analyses. Our original paper was substantially longer and took more of a best practices approach but fell short of the goal as our recommendations were a bit too broad and didn’t apply to any specific domain. Our intention was to capture the most situations that mostly apply to small to medium sized datasets. Despite the widespread availability of data repositories and access to best practices and training, it surprises me to see how little published data is reusable.
In revising the paper to a shorter commentary, it all made sense when I looked at the problem through the lens of Brian Nosek’s culture change pyramid The COS Strategy for Culture Change
. With data, especially FAIR data principles, we made it a requirement to share data upon publication without demonstrating the incentives or personal rewards for doing so. That explains why much of the data shared merely complies with mandated requirements. In contrast, although code isn’t shared as often and there are no requirements to do so, communities of practice around code have laid the necessary ground work and in the right order for us to make this a meaningful requirement in the future.