By Lucas | August 22, 2014
The sixth course in the Johns Hopkins Data Science Specialization on Coursera is Statistical Inference. This is the first course in the specialization taught by Brian Caffo. In my review of the R Programming course, I mentioned that there were two places in the sequence that seemed (based solely on my observations of forum comments) to be bogging students down. R Programming was obviously the first. Statistical Inference is the second.
In R Programming, the members of the class without significant programming experience had to fight and scrap to keep up. In Statistical Inference, it seemed to be the members of the class who had been away from mathematics too long that struggled.
I should preface the rest of my comments by admitting that as an AP Statistics teacher, even of just one year, I had a significant advantage in this class. Probably 75% of the material in Statistical Inference is covered in the AP Statistics curriculum, and while Dr. Caffo pushed a little deeper than the average high school senior would go, many quiz questions could have come straight from an AP Stats exam review book. Obviously, that inspired a lot of confidence, and for me, this was the easiest course in the sequence other than The Data Scientist’s Toolbox.
Dr. Caffo is at his best when he encourages his students to think about the effects of potential changes to a data set. He does this in a couple of different ways. First, he occasionally uses visual diagrams of data sets that he’s plotted ahead of time. Second, and this is my favorite method, he uses the “manipulate” package in R. This package allows the teacher or student to use slider bars to change various parameters and have the graph in R react in real time. It almost lets me pretend I’m working with the TI-Nspire again. In case you miss him announcing it the first time, all of the code for the manipulate demonstrations is available on the course GitHub repo, so you can copy and paste right into RStudio and do the demos along with him.
In Statistical Inference, you will find a lot of the basic concepts of inference such as confidence intervals, p-values, and hypothesis tests. There’s also some basic probability covered. Topics that I had less familiarity with included Poisson distributions (hadn’t used them since an actuary test years ago), resampling techniques (the jackknife and the bootstrap), and multiple testing.
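If the bootstrap is new to you, the idea is easier to see in code than in a lecture slide. Below is a minimal sketch in Python, not code from the course (which uses R). The sample values are made up, and the function name `bootstrap_ci` and its parameters are hypothetical choices for illustration. It resamples the data with replacement many times, recomputes the median each time, and reads a rough 95% interval off the percentiles of those medians.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(data) (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(data)
    # Draw n values with replacement, recompute the statistic, repeat, then sort.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Cut off alpha/2 of the resampled statistics in each tail.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up sample, purely for illustration.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]
low, high = bootstrap_ci(sample)
print(f"sample median: {statistics.median(sample):.2f}")
print(f"95% bootstrap interval for the median: ({low:.2f}, {high:.2f})")
```

The percentile method shown here is only one of several ways to turn bootstrap resamples into an interval, and with a sample this small the result should be read as a rough guide rather than a precise answer.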
At the time I took Statistical Inference, which was the June session, the grading was entirely made up of 4 quizzes. There were also optional homework assignments, which I found to be very helpful. If you don’t have a deep statistics background, be prepared to spend some time supplementing with outside resources for this class. It is simply too much to expect to pick up everything you need in a short series of lectures. This class covers almost as much material as would be covered in a semester at a university, which could be a problem if it is all brand new to you, as it was to some students in the class.
There are a few resources I would suggest. The first is DataCamp, which offers R training in your browser. Take the Data Analysis and Statistical Inference track, which overlaps a lot with this course. Two very high quality free eBooks that are popular with people in the Data Science Specialization are An Introduction to Statistical Learning and Open Intro. Open Intro is actually so cheap on Amazon that I picked up a physical copy there.
To summarize, Statistical Inference is a very challenging course for those who do not have a statistics background. Expect to spend time studying and Googling. You will need to supplement the lectures. If, on the other hand, you already have a firm grasp of introductory level statistics, you should only expect to pick up a few new concepts along the way.