Exploratory Analysis Of My College Football Pickems
It’s week 3 for College Football, which means another ego-crushing round of College Football Pickems. The first couple of weeks have not been too kind to me. Week 1 wasn’t too bad, but due to procrastination in week 2 I missed several picks for the earlier games. Now I enter week 3 tied for last in the standings. With two weeks’ worth of data at my disposal and bragging rights on the line, I think it is time for a little analysis to see if I can improve my performance.
Building a simple decision framework
In the long run I would like to build a predictive model to improve my pick selection. For the sake of time this week, though, I have simply downloaded my data into a spreadsheet to run some basic exploratory analysis and look for patterns in the two-week dataset that I can apply to my picks.
Getting the data right
Since Yahoo! does not offer an easy way to export the data, I am forced to copy the HTML tables and paste them into my spreadsheet. This gets the data into the spreadsheet, but it’s very messy and requires some cleaning. Using some spreadsheet functions, I extract the useful pieces and create the following variables:
- Favorite Cover (did the favorite cover?)
- Final spread
- Spread group (spread divided by 3)
- Favorite Home (was the favorite the home team?)
- Favorite BCS (was the favorite a BCS team?)
- Underdog BCS (was the underdog a BCS team?)
With this handful of variables I was able to summarize the data in a pivot table and look for any insights that may be useful.
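For readers who would rather script this step than build it in a spreadsheet, here is a minimal sketch of how the variables above might be derived from one cleaned-up game record. Everything in it is an assumption for illustration: the `Game` fields, the `derive_variables` helper, and the placeholder team names are hypothetical, a push (margin equal to the spread) is counted as a non-cover, and the spread group is taken as the spread divided by 3 and rounded down.

```python
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical record layout; the real Yahoo! tables need cleaning first.
    favorite: str
    underdog: str
    home_team: str
    spread: float          # points the favorite is favored by (positive)
    favorite_score: int
    underdog_score: int


def derive_variables(game: Game, bcs_teams: set) -> dict:
    """Build the six analysis variables for a single game."""
    margin = game.favorite_score - game.underdog_score
    return {
        # Assumption: a push (margin == spread) counts as "did not cover".
        "favorite_cover": margin > game.spread,
        "final_spread": game.spread,
        # Assumption: divide by 3 (a field goal) and round down.
        "spread_group": int(game.spread // 3),
        "favorite_home": game.home_team == game.favorite,
        "favorite_bcs": game.favorite in bcs_teams,
        "underdog_bcs": game.underdog in bcs_teams,
    }


# Made-up example row, not real results.
example = Game("Team A", "Team B", "Team A", 6.5, 31, 21)
row = derive_variables(example, bcs_teams={"Team A"})
```

Each game becomes one row of yes/no flags plus the spread, which is the shape a pivot table (or any group-by) needs.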
What do the data show
The first question I wanted to answer was: how frequently did the favorites cover? Expecting something close to 50/50, I was surprised by the results…
Wow! The handicappers must be having a rough year like I am. So this would imply that if I had selected the underdog in every match I would have won 58% of my games thus far, which would easily place me at the top of the standings. Oh well. But maybe I can use this simple trend for my week 3 selections. My gut says there is more to it than this so let’s keep digging.
My next question: do favorites cover better at home or away? Survey says…
It looks like it doesn’t matter if the favorite is the home or the away team, so we can scratch that off as an indicator for now. What if the game is between BCS and non-BCS teams?
Now we may be on to something. The data indicate that underdogs are much more successful than average in BCS vs non-BCS matchups. What about BCS vs BCS games?
Hmm…although results still skew towards the underdog they are very close to the expected 50/50 outcome. Now what happens when I factor in the point spread?
Another useful pattern. The ‘Spread Group’ is the actual point spread divided by 3 (representing a field goal), which rolls the individual spreads up into higher-level groups. It looks like the higher the point spread, the less likely the favorites are to cover. Let’s get visual with this:
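The numbers behind a chart like this are just a cover rate per spread group. As a rough sketch of that roll-up outside the spreadsheet: the function names and the sample rows below are made up for illustration, and rounding the divided spread down to a whole number is my assumption about how the groups are formed.

```python
from collections import defaultdict


def spread_group(spread: float, width: float = 3.0) -> int:
    """Bucket a spread into field-goal-sized groups (0-2.99 -> 0, 3-5.99 -> 1, ...)."""
    return int(spread // width)


def cover_rate_by_group(games):
    """games: iterable of (spread, favorite_covered) pairs.

    Returns {spread_group: share of games where the favorite covered}.
    """
    tally = defaultdict(lambda: [0, 0])  # group -> [covers, games played]
    for spread, covered in games:
        group = spread_group(spread)
        tally[group][0] += int(covered)
        tally[group][1] += 1
    return {g: covers / total for g, (covers, total) in sorted(tally.items())}


# Illustrative rows only, not my actual league data.
sample = [(2.5, True), (3.0, True), (4.5, False),
          (10.0, False), (11.5, False), (28.0, False)]
rates = cover_rate_by_group(sample)
```

Plotting `rates` as a bar chart gives the same view as the pivot table, one bar per spread group.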
I feel like I am really close. I continue this process of mixing and matching the variables and end up with the following. Given that point spread and underdog BCS affiliation seem to be the best indicators of success, I combine them as follows:
I’d like to continue with this exploration, but I’m running short on time. Still, I think I have enough insights to create a basic College Football decision model.
The College Football Pickem Decision Model
Here is my decision framework based on the exploratory analysis outlined above.
|Matchup|Point spread|Pick|
|BCS versus non-BCS|1-9|Favorite|
|BCS versus non-BCS|9-15|Underdog|
|BCS versus non-BCS|18-27|Favorite|
|BCS versus non-BCS|>27|Underdog|
|BCS versus BCS|1-3|Favorite|
|BCS versus BCS|3-12|Underdog|
|BCS versus BCS|12-24|Favorite|
|BCS versus BCS|>24|Underdog|
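The table above can also be written as a small lookup, which makes it easy to run against a full slate of games. This is only a sketch: the `BANDS` layout and the `pick` function are my own names, and two behaviours are assumptions rather than part of the framework. First, rows are checked top to bottom and the first match wins, so a spread sitting on a shared edge (such as 9) takes the earlier row. Second, spreads the table does not list, and matchups it does not cover, return `None` instead of a guess.

```python
import math
from typing import Optional

# (low, high, pick) bands copied from the table. Spreads the table does not
# list (for example 15-18 in BCS versus non-BCS) are deliberately left out.
BANDS = {
    "BCS versus non-BCS": [
        (1, 9, "Favorite"),
        (9, 15, "Underdog"),
        (18, 27, "Favorite"),
        (27, math.inf, "Underdog"),
    ],
    "BCS versus BCS": [
        (1, 3, "Favorite"),
        (3, 12, "Underdog"),
        (12, 24, "Favorite"),
        (24, math.inf, "Underdog"),
    ],
}


def pick(matchup: str, spread: float) -> Optional[str]:
    """Return "Favorite", "Underdog", or None when the table has no rule."""
    for low, high, choice in BANDS.get(matchup, []):
        # Assumption: first matching row wins, so edge values go to the
        # earlier (smaller-spread) band.
        if low <= spread <= high:
            return choice
    return None


print(pick("BCS versus non-BCS", 10.5))  # falls in the 9-15 band
print(pick("BCS versus non-BCS", 16))    # no rule for this spread
```

Games that come back as `None` are the ones I would still have to pick by hand.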
If I apply this decision framework to the week 3 matchups, then I expect to win 84% of the qualifying games (some of the games in my league this week are off the board). This still has me selecting the favorites 52% of the time, which is several points higher than in the first 2 weeks, but the dynamics of the matchups are now changing such that the majority of games are BCS vs BCS matchups. I sincerely doubt I will come close to that 84% expected win rate. The sample size is small, and there are a number of other variables that should be added to make the model more robust. Additionally, a statistical method such as k-means clustering or principal components analysis would perhaps be a better way to identify segments, and a logistic regression model could be used to predict the outcome of each game. But that will be for another day. So for week 3 the picks are in, and we'll review the results next week.