Like my cousin's t-shirt says...

...freedom isn't free. Though I doubt his shirt referred to the statistical computing wars that occasionally bubble up among academics.

Over at the wonderful blog Data Colada is a new piece about reproducibility in R. R is a fantastic statistical computing environment. It is free, it is flexible, and it has a following among academics that makes Elon Musk fans look sleepy.

Simonsohn highlights a critical problem with R: reproducibility. Both R itself and the many user packages that academic research depends upon are frequently updated, giving posted code a shelf life closer to that of milk than to the academic timeline needed for reproducing results. This is a critical issue that any serious researcher will need to build into their workflow.

Simonsohn discusses this issue in terms of reproducing already published work. I suspect it also causes issues for folks in between rounds of review prior to publication. If the review process can take anywhere from three months to multiple years, then package updates can torpedo one's results during this early phase of a paper as well. Definitely not something graduate students or early career researchers should hope for.

Now, Simonsohn is incredible and so has developed a package to anchor package updates: groundhog. You load your packages through groundhog, and that anchors your program to the package versions available on a particular date.
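
For the curious, here is roughly what that looks like in practice. Treat it as a minimal sketch rather than Simonsohn's own example: the package (dplyr) and the date are placeholders I picked for illustration, and it assumes groundhog is already installed and that you have an internet connection the first time you run it.

```r
# Sketch only: "dplyr" and the date below are illustrative placeholders.
# install.packages("groundhog")  # one-time setup, if not already installed
library(groundhog)

# Load dplyr as it was available on CRAN on this date,
# rather than whatever version happens to be current today.
groundhog.library("dplyr", "2023-06-01")
```

The point is that the date in the script, not the state of your machine's library, decides which version gets loaded.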

I have no strong feelings about statistical software environments, and I generally feel embarrassment when I see the occasional snarky fight on academic social media about the rank ordering of statistical program supremacy. Use R. Use Python. Use SAS. Use Stata. Use Matlab. Whatever. Do good research and invest more of your thought into research design and underlying statistical and methodological principles. I suspect such an investment will yield much greater returns in the long run.

If you use R, then it seems like you should immediately begin to incorporate groundhog into your default workflow. Easy. But I am left feeling ever so slightly uneasy. I'd just say that this feels a bit like opening up a new credit card to pay off your massive credit card debt. Problem solved! That initial debt is paid off and I'm not behind on my payment! I have a whole 30 days to deal with this new big debt! The fundamental problem with R seems to be that it is dependent on the whims of package updates? Let's solve it with a package that may or may not be stable across time!

Don't read too strong a sentiment into my argument. R is great, I occasionally use it in my own projects, and I assume groundhog will go far in fixing issues of reproducibility. I'm thinking of making a more permanent transition to R someday. But Simonsohn's post highlights a serious cost to the program that cuts to fundamental principles of academic research. Don't discount this issue because R feels like a cooler or harder or more legitimate software environment.