By Lucas | August 20, 2014
The fifth course in the Johns Hopkins Data Science Specialization on Coursera is Reproducible Research. This is the third and final course in the sequence taught by Roger Peng.
Of the first five courses in the specialization (The Data Scientist’s Toolbox aside), Reproducible Research is the one where I spent the least time learning new R code. Instead, the emphasis of this course was more philosophical in nature: writing up your research findings so that they can be shared with others and considered reproducible, though not necessarily replicable. For more on the definition of reproducible research, check out this post from Dr. Peng.
That’s not to say there is little R coding in Reproducible Research, or even less than in the other courses. Just like the other classes in the sequence, I still spent a fair amount of time cleaning data and programming R for data analysis. It’s just that the emphasis of the class was on communicating those results so that anyone well versed in R could follow my analysis from the very first step to the very last and reproduce those results.
One of the niftiest features of RStudio that we explored in this class was how easy it makes using knitr. Using knitr, we created single documents that combined markdown and R code into one simple, easy-to-read document. The output of the code is contained right in the document, and the code itself can be revealed or hidden. The document can be output as, say, a PDF or HTML file. It’s a really handy tool.
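To give a rough idea of what such a document looks like, here is a minimal sketch of an R Markdown file. It is not taken from the course materials; the title, the chunk labels, and the use of R's built-in `cars` dataset are my own placeholders for illustration. The `echo` option on each chunk controls whether the code is shown or hidden in the finished document.

````markdown
---
title: "Example analysis"
output: html_document
---

This paragraph is ordinary markdown text describing the analysis.

```{r summary-stats, echo=TRUE}
# echo=TRUE: this code appears in the finished document along with its output
summary(cars)
```

```{r speed-plot, echo=FALSE}
# echo=FALSE: the code is hidden, but the plot it draws still appears
plot(cars$speed, cars$dist)
```
````

Assuming the file is saved as something like `example.Rmd` (a made-up name), clicking the Knit button in RStudio or running `rmarkdown::render("example.Rmd")` should produce the HTML file. Changing the `output` line to `pdf_document` asks for a PDF instead, which also requires a LaTeX installation.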
Throughout the course, Dr. Peng emphasized the importance of making your research reproducible. It reminded me a bit of being back in high school and being told I needed to “show my work.” He shared some very compelling examples of why reproducible research matters. Without a doubt, the most compelling was the case of the fraudulent cancer research at Duke University, which eventually made its way onto 60 Minutes.
While I do hope the Data Science Specialization leads me to a new career opportunity, I don’t suppose it’s very likely that I’ll end up as a cancer researcher. Will reproducible research be as important to me as it is to those cutting-edge medical researchers? Perhaps not, but I can certainly understand why this course was included in the sequence. Even if I only end up sharing my code with a few coworkers down the road, I’ve learned a thing or two about the proper way to share my results with them.
Click here to register for the Johns Hopkins Data Science Specialization on Coursera. (Affiliate link, thanks for your support!)