Democrats to retain control of the Senate


Final for now - 11/09/14. Having more or less established that my current model was skewed toward the Democrats, I didn't attempt another update before the election. As I followed it, less closely, the elections took a more Republican swing toward the end, and the result did in fact end up being Republican control of the Senate. I took copies of the final polling results from the RealClearPolitics link as well as the FiveThirtyEight predictions from a few days before the election. I will at some point update the model with that data, hopefully improving things for the next time around.

Since to me, and I think to a few others, a lot of the election seemed to hinge on the low voter turnout, I may try to incorporate that into the model somehow. For an election's test data, I may try a second model that predicts turnout, whose output could then be used in the test data. I would also try to track down past turnout data to use for training.

That's 'it' for this for now, though there could still be a new Election Games including something along these lines for 2016.


Update 10/07. I decided to do a little quick checking related to the 'pretty strange' probabilities from Weka I mentioned in the last update.
Related 'R' script: hyptest.R.
This tests the differences between my Weka model probabilities and the FiveThirtyEight probabilities. The two sets are treated as 'paired' samples, not independent ones, since each pair refers to the same race.

You can run that with R, from a directory with the predictions_1004.csv and senate_test_2014_1004.csv files, with: source("hyptest.R",print.eval=TRUE)
Which should result in...


   Paired t-test

data: pred$probability and test$fivethirtyeight
t = 3.5711, df = 35, p-value = 0.001058
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  3.556429 12.926904
sample estimates:
mean of the differences
           8.241667

The null hypothesis that the difference in means is equal to zero is rejected (p-value = 0.001058). You can also see this from the confidence interval not including 0. Given my convention of representing Democratic numbers as positive and Republican as negative, the positive confidence interval and mean seem to confirm the intuition that, for whatever reason, my model favors Democrats more than FiveThirtyEight does. In elections I think an average of 8.24 percentage points in favor of the Democrats could be huge. Again, at this time I would consider this to be due to insufficient training data for my model and would consider FiveThirtyEight to be more accurate.
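For anyone who would rather see the arithmetic than run R, here is a minimal Python sketch of the paired t statistic. It is not a port of hyptest.R: the function name `paired_t` and the sample numbers are made up for illustration, and it stops at the t statistic rather than computing a p-value. As a sanity check on the output above, the reported mean and t imply a standard error of roughly 8.24 / 3.57, or about 2.31.

```python
import math
import statistics


def paired_t(a, b):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    se = statistics.stdev(diffs) / math.sqrt(n)
    if se == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff, mean_diff / se, n - 1


# Hypothetical signed probabilities (+ Democrat, - Republican), NOT the real data.
model = [80.0, -60.0, 55.0, -90.0, 30.0]
reference = [70.0, -75.0, 40.0, -95.0, 10.0]

mean_diff, t_stat, df = paired_t(model, reference)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, df = {df}")
```

A positive mean difference here means the first list leans more Democratic than the second, which is the same reading as in the R output.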

Update 10/04. I meant to pretty much just update 'numbers': add current polling, FiveThirtyEight, and Cook Report ratings. I decided to throw in probabilities as well this time from the Weka Random Forest predictions, which was easy enough to do. Looking at those, though, shows some pretty strange ones compared to, say, FiveThirtyEight. My current thought is that this is due to the lack of training data, so there is over-emphasis on some attribute like incumbent. No amount of improving the 'test' data is probably going to get me better results. Given that, I probably won't put much more effort into improving things but will rather continue efforts on more data collection, past and present, to improve not just the 'test' dataset but maybe current and hopefully future 'training' datasets.

Update 09/21. Not really much time again. I decided to update my current 'test' data for the Senate election and then started thinking about how to maybe improve it. For now I figured I'd keep some sort of average based on current polls provided by Real Clear Politics, resulting so far in...
Latest
updtest_1004.R
senate_test_2014_1004.arff
senate_test_2014_1004.csv
senate_predictions_1004.csv
Previous
updtest.R
senate_test_2014_0921.arff
senate_test_2014_0921.csv
senate_predictions_0921.csv

09/24. More or less completed the update. Predictions as above. All changes should be included in updtest.R. Besides polling, this also includes current FiveThirtyEight numbers as well as changes to the Cook Report ratings. I think the Cook ratings didn't change, but I had somehow missed the 'likely Democrat' category. After that, my prognostication still comes out to a two seat Republican gain. You can, if you like, compare and contrast with other results at Who Will Win The Senate? -- The Upshot Senate Forecasts

I am using the poll means and some other ideas from Wikipedia's Political Forecasting article. I myself consider Wikipedia a usually reliable source.

Poll damping sounds a bit like weighted averaging with older polls counting less, which is, if I remember right, what I do in Election Games. But I really haven't had a chance to read up on it much. I looked a little bit at the Iowa prediction market, but this seemed to have only three categories: Senate loss/hold/gain. The numbers could very well still be updated, but the model may or may not see more improvement before the election besides simple poll averaging.
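To make the 'older polls count less' idea concrete, here is a minimal Python sketch of a poll average with exponentially decaying weights. The 14 day half-life, the function name `damped_poll_average`, and the sample polls are all assumptions for illustration. This is not the weighting used in Election Games or by any forecaster, just one simple way such an average could work.

```python
def damped_poll_average(polls, half_life_days=14.0):
    """Average poll margins, halving a poll's weight every `half_life_days`.

    polls: list of (margin, age_days) tuples, where margin is positive for a
    Democratic lead and negative for a Republican lead.
    """
    if not polls:
        raise ValueError("need at least one poll")
    weights = [0.5 ** (age / half_life_days) for _, age in polls]
    weighted_sum = sum(w * margin for w, (margin, _) in zip(weights, polls))
    return weighted_sum / sum(weights)


# Hypothetical polls for one state: (margin, age in days). Not real data.
polls = [(-4.0, 0), (2.0, 14), (6.0, 28)]

simple = sum(margin for margin, _ in polls) / len(polls)
damped = damped_poll_average(polls)
print(f"simple mean = {simple:+.2f}, damped mean = {damped:+.2f}")
```

With these made-up numbers the simple mean is a Democratic lead, while the damped mean tips Republican because the newest poll carries the most weight.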

Note: You might have heard about the just released NBC/Marist polls that include results for Arkansas, Kentucky and Colorado. It may occur to you to wonder, does that mean any changes for the polling data that I used? It didn't seem to, since the only real difference I saw was to make Kentucky more Republican, which I already predict it to be. This could possibly be an advantage of the FiveThirtyEight methodology as I understand it. I believe that uses a poll-based Bayesian model, which is probably easily updatable with new poll results right up to election day. My understanding, anyhow.

You heard it here first

There are many who are predicting that Republicans will assume majority control of the US Senate in this year's midterm elections. This would be a significant event in American politics for some time to come. Being able to predict things like this is one of the things that interested me about machine learning. With the elections coming up soon, I have put together a quick model to predict the outcome of the Senate races. Finding the data was more difficult than I thought it would be. The References below include most of the links that I ended up using. You would think that for something like US Senate races there would be redundant historical data that could be easily found. This is the coming age of big data, right? Not so. I had thought to include multiple poll results but couldn't even find one poll provider that could be used for all results in the 2010 and 2012 elections. FiveThirtyEight indicates using 949 polls; I don't know where they're finding them. I had intended to include more past years, but again it didn't seem likely I could find the past data. State by state presidential approval ratings were also not easy to find.

This could possibly be handled by allowing for more missing values in the data or by some way of weighting states that have different numbers of polls. This could improve accuracy for the current years and allow for inclusion of sparser past years. Much of what is there now is highly correlated. If a state polls Republican, it is probably favored Republican in the Cook Race Ratings as well as the FiveThirtyEight predictions, and it probably has a low approval rating for President Obama. This should probably be considered a model limitation as well.

What I could add, but won't have time for this weekend, is something related to the Hotelling distance code I added to the last version of my Election Games application. In the case of missing data for candidates here, you could just use averages for the candidate's party instead. Something similar to what I'm doing here could end up in the next version of Election Games.
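The party-average fallback for missing candidate data could look something like the following Python sketch. The row layout, the `score` field, and the function name `fill_with_party_mean` are hypothetical and chosen only for the example; nothing here is taken from Election Games.

```python
from statistics import mean


def fill_with_party_mean(rows, field):
    """Return copies of `rows` with missing (None) `field` values replaced
    by the mean of that field over candidates of the same party."""
    by_party = {}
    for row in rows:
        if row[field] is not None:
            by_party.setdefault(row["party"], []).append(row[field])
    party_mean = {party: mean(values) for party, values in by_party.items()}

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[field] is None:
            # Stays None if no candidate of that party has a known value.
            new_row[field] = party_mean.get(new_row["party"])
        filled.append(new_row)
    return filled


# Hypothetical candidates with a made-up numeric attribute called "score".
candidates = [
    {"name": "A", "party": "D", "score": 0.2},
    {"name": "B", "party": "D", "score": 0.4},
    {"name": "C", "party": "D", "score": None},
    {"name": "E", "party": "R", "score": -0.5},
    {"name": "F", "party": "R", "score": None},
]

for row in fill_with_party_mean(candidates, "score"):
    print(row["name"], row["party"], row["score"])
```

The original rows are left untouched, so the imputed values can be compared against the raw data afterwards.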

Once I had the data, I used Weka and my usual first try of Bagging with Random Forest. This got me 94% accuracy on the training data. Also, as per the heading, the results against the test set would, I think, mean the Democrats keeping a majority in the Senate. Most of the conventional wisdom seems to be with the Republicans gaining control. Looking at the data, it seemed, for one thing, like Arkansas should probably have been picked as Republican. It is given a Republican edge both in the polls and by FiveThirtyEight, and it also has a lower Obama approval rating. The only thing I can figure out is that the model has learned, even with the limited data, that incumbents tend to have an edge. Out of curiosity I checked and found this, CNN Poll: Key Arkansas Senate race a dead heat, so it actually could be close. Maybe the prediction isn't that far off.

As limited an attempt as this currently might be, I would be interested in any constructive feedback. Maybe you get better results with the training data, have an idea for an attribute that should be included, or want to send in your predictions for the test set. If I get any of these, maybe I'll put up a leaderboard after the elections showing who did best. For anything of that sort, feel free to email me at EMail: contact

Attributes

A brief description of the data attributes...

year: year of the election.

midterm: 1 if a midterm election, 0 otherwise.

state: two character state name abbreviation.

pres: political party of the current president.

incumbent_party: political party of the current office holder.

cook_rating: Cook political race rating. Current or as close to the election as I found.

polling: Number indicating the lead in the poll. A positive number is a Democratic lead, a negative number a Republican lead. This started out as Rasmussen, but I started filling in with whatever I found that seemed a close representative poll.

fivethirtyeight: Probability of a win indicated by FiveThirtyEight. Again, a positive number for Democrat or a negative number for Republican.

pres_approval: Presidential approval rating.

class: D or R, the actual class.
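The sign convention used for polling and fivethirtyeight can be sketched in a few lines of Python. The helper names `signed_value` and `leader` are hypothetical and are not part of updtest.R or any of the files listed below; they only show how a party plus a number maps to a single signed attribute and back.

```python
def signed_value(party, value):
    """Encode a lead or win probability as one number:
    positive for Democrat, negative for Republican."""
    party = party.strip().upper()
    if party == "D":
        return float(value)
    if party == "R":
        return -float(value)
    raise ValueError(f"unexpected party: {party!r}")


def leader(signed):
    """Recover which party a signed value favors."""
    if signed > 0:
        return "D"
    if signed < 0:
        return "R"
    return "tie"


# Hypothetical examples: a 70% Republican win probability and a 3 point Democratic lead.
print(signed_value("R", 70), leader(signed_value("R", 70)))
print(signed_value("D", 3), leader(signed_value("D", 3)))
```

Keeping both parties on one signed axis is what lets a paired comparison of two forecasts read a positive difference as 'more Democratic'.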

Files

senate_test_2014.arff: Weka preferred arff test data.

senate_train_2014.arff: Weka preferred arff training data.

senate_test_2014.csv: The csv test data version.

senate_train_2014.csv: The csv training data version.

predictions.csv: My predictions against the test data.

References

2010 approval ratings
Polling for 2014
FiveThirtyEight predictions for 2012
Some polling from 2012 Rasmussen polling
2012 rest from Wiki 2012 polling
This gives Final and Rasmussen poll results for 2010
The FiveThirtyEight 2010 predictions
Used for the Cook Ratings
FiveThirtyEight blog considering presidential approval ratings.
Used for Gallup presidential approval ratings 2010.
Gallup presidential approval ratings 2013.