Blogs

Look Ma, I'm on Hugo!

For the last couple of months, I’ve been playing around with migrating this blog from WordPress to Hugo, the static site generator written in Go. Due to “reasons,” many of which probably don’t apply to the average WordPress blogger and some of which involved me being finicky about certain expectations, this turned out to be more of a hassle than I anticipated. The new version of the site has been up and running for over a month now, but I’ve been slow to post because I’ve kept spending my time tweaking and improving it. Still, I can feel the old itch to write.

Continue reading

Keep on Movin’ On

This blog has gone pretty quiet over the last six months or so, which usually signals I’m up to something new. This time, it’s a cross-country move for a new career opportunity, one that takes me to a part of the country that is a big change for a guy who’s never lived outside rural Central Illinois. Heading south, I’m now living in the Queen City: Charlotte, NC.

Continue reading

Retro Game Retrieval Engine Design

I’ve got a new Shiny web app embedded on another site where I’m trying some experimental things, and I wanted to talk generally about how I created it. The web app, which can be found at the following link, lets the user interactively search for similar classic home-console games from what are generally known as the third generation (NES, Sega Master System) through the sixth generation (GameCube, PS2, Xbox).

Continue reading

Back2School with Vectors, Cosine Similarity, and Word2Vec

Tomorrow, I’ll be making a return visit to the high school where I spent a decade teaching in the mathematics department. I’ll have the chance to speak to ten classes over the course of six class periods and tell them a little about what I do as a data scientist. Since many of the students will be familiar with concepts like vectors and trigonometry, I’ve decided to do an activity involving the Python gensim package and Word2Vec.

Continue reading

A New Introduction to Spark 2.1 Dataframes with Python and MLlib

A couple of years ago, in the midst of my rookie year as a data scientist, I wrote a blog post and tutorial about using the Python Spark API to build a simple model from housing data with Spark DataFrames. Despite the simplicity of the model (multiple linear regression evaluated with a straight train-test split), it was one of the more challenging tutorials I’ve ever written for this blog.

Continue reading

Machine Learning Specialization Cut Short by Coursera

After an extremely long wait, today was the day the fifth course in Coursera’s Machine Learning Specialization was set to begin. I’ve been with this specialization since it launched in the fall of 2015. Students were initially promised an ambitious slate of six courses, including a capstone that would wrap up by early summer of 2016. With the noted husband-and-wife team of Carlos Guestrin and Emily Fox, previously of Carnegie Mellon and now of the University of Washington, as instructors, this sounded like a great option.

Continue reading

Minivan Price Comparison With R

With my family growing once again and my 13-year-old Mazda Protégé on the fritz, I recently decided it was time to go minivan shopping. Being a frugal shopper (some might say cheap), I quickly focused on the used domestic market and found only two competitors there: the Dodge Grand Caravan and the Chrysler Town and Country. Two questions immediately came to mind. First, since these two minivans are, for all practical purposes, identical (manufactured at the same facility, with the same internals and just different branding), does one name carry a price premium over the other when comparing vehicles with a similar set of features?

Continue reading

University of Washington Machine Learning Classification Review

I’ve spent the last couple of months working through course three in the University of Washington’s Machine Learning Specialization on Coursera. Course two covered regression (see my review); the third course covers classification. As with the previous courses, this specialization continues to be taught by Carlos Guestrin and Emily Fox, with Dr. Guestrin taking the lead for the classification course. The time requirement increased a bit with this third course, though not excessively; it felt like an extra hour or so of work per week.

Continue reading

Coursera Review–Machine Learning: Regression

I’ve recently completed “Machine Learning: Regression,” the second course in the University of Washington Machine Learning Specialization on Coursera. This comes on the heels of completing course 1, “Machine Learning Foundations: A Case Study Approach.” This course debuted right at the end of November and wrapped up six weeks later (my impression is that these courses are slipping a bit behind the originally announced timeline). I’d encourage you to read my review of the first course: I was satisfied with the learning experience in that class but left wondering whether some of the concerns students raised would be addressed.

Continue reading

Constructing a Social Graph With Twitter and Plotly

In a couple of earlier posts, I showed an example of a social graph built from Twitter data with Plotly: a graph of the relationships between educational technology enthusiasts on Twitter. Those posts were aimed more at the educator audience I usually write for, but I’m increasingly getting feedback on my posts from other data scientists, so I’ve decided to share my code, both here on this blog and on my GitHub account.

Continue reading