By Lucas | August 8, 2014
For nearly six months, this blog has gone quiet. Considering I’ve had 3+ years of posting an average of two times a week, that’s quite a stretch without writing. There have been a variety of reasons for my silence, including increasing work and family commitments, but as I alluded to last fall a couple of times on this blog, I’ve also been investigating new career options. The biggest reason for my silence is the significant amount of time that I’ve devoted to career questions over the first half of this year.
While I still enjoy mathematics, technology, and students, I have felt my work at school getting a bit stale the last couple of years, and I would really like an entirely new challenge. An area that’s really caught my interest is data analysis. It incorporates my love of mathematical thinking and technology, and over the last couple of years, it’s been impossible to ignore the wealth of fascinating stories in the news, in blogs, and on podcasts about how data collection and analysis are increasingly influencing our daily lives.
Professional Development Online
Obviously, making such a significant change in career path requires a new set of skills and some study. While I’ve written a lot about online learning tools on this blog, I’ve been even more amazed to discover the wealth of options available for adults looking to advance their education. At the beginning, I simply wanted to learn how to write SQL queries, and one of several resources I used was the Microsoft Virtual Academy. It wasn’t long, though, before following discussions on LinkedIn led me to a more ambitious program: the Data Science Specialization from Johns Hopkins on Coursera.
The Data Science Specialization
The Data Science Specialization is taught by three professors from the Department of Biostatistics at Johns Hopkins University: Roger Peng, Jeff Leek, and Brian Caffo. The program consists of nine one-month courses plus a capstone course. Those who pay Coursera $50 per course for the opportunity to earn an identity-verified certificate are eligible to take the capstone after completing the previous nine courses. The capstone, which is offered three times a year, and for the first time in October 2014, is billed as a sort of miniature internship project with an industry or government partner. After completing the capstone, a student earns the specialization.
The courses are all taught using the open source R language and the RStudio IDE. The recommended prerequisites for attempting the sequence are competence in one programming language and mathematics up through high school algebra. In my experience, some people attempt the sequence without any programming experience, and that has not turned out well for most of them, though some particularly determined souls are successful. I would also note that the statistics courses in the sequence move through their material very rapidly, so it would help anyone attempting the sequence to have at least some statistics training under their belt.
I’m nearing the end of the sequence now. I have completed the first seven courses already, and I’m taking courses eight and nine in August. I plan on posting a brief review/summary of each over the next couple of weeks. I am so glad that I found this specialization. Doctors Caffo, Peng, and Leek have taught me so much this spring and summer. I can’t honestly say that I have mastered every technique, command, and concept presented. With such a wide-ranging curriculum compressed into such a short time, I doubt that many people do.
However, when I think about where I am today compared to where I was when I started the program a few months ago, it’s pretty incredible. I have a friend who recently completed “Dev Bootcamp” for web developers, and I feel like I’ve been on my own sort of boot camp this summer. I now have a firm grasp on the basics (actually, probably well beyond the basics) of coding in R. I’ve spent significant time learning about graphics packages, and how to import and clean data (cleaning data seems to be part of almost every class). I’ve learned a number of ways to communicate with my code, from knitr to Slidify to RStudio Presenter and RPubs. I’m currently learning machine learning and how to develop data products for the web with Shiny. It seems like just about every class has added a tool or two to my data toolbox.
That’s why it’s funny to think that just a few months ago, I was trying to figure out how to move from “for loops” to the “apply” functions.
Here are links to reviews of all nine classes in the Data Science Specialization on Coursera:
- Course 1: The Data Scientist’s Toolbox
- Course 2: R Programming
- Course 3: Getting and Cleaning Data
- Course 4: Exploratory Data Analysis
- Course 5: Reproducible Research
- Course 6: Statistical Inference
- Course 7: Regression Models
- Course 8: Practical Machine Learning
- Course 9: Developing Data Products
- Thoughts on Completing the 9 Johns Hopkins Data Science Courses