Gas Prices

Return to Gas and Oil - 01/10/16

Oil hasn't sold below $30 a barrel since 2003.

Update - 01/31/16

Still working on this a little bit. I have also been looking a little at forecasting as it relates to oil price prediction.

As far as data goes, there is a little more formal arff file type documentation at the front of train_blend.arff. I am setting up a separate data page that can be referenced from here, from the gas price model page, or from wherever, at oil data.

Improvement to the gas price model - 01/16/16

Gas prices go up in the summer, right? Maybe this is partly because of increased demand, with more miles being driven, but it is also due to a seasonal change in the blend of the gas. If I add a feature for these blends to the existing model I think I get close to the accuracy for gas price prediction that I originally claimed. This is described in more detail at blends. The dates I used for the model came from this gasbuddy blog. I am assuming for now that the categories SUMMER, FALL and WINTER will include all the seasonal information that I need.
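A minimal sketch of how such a blend feature could be derived from a date. The cutoff dates below are placeholders I made up for illustration, not the dates from the gasbuddy blog, and the function name is hypothetical.

```python
from datetime import date

# Hypothetical cutoffs for illustration only; the real switch-over dates
# should come from whatever source the model actually uses.
SUMMER_START = (5, 1)    # assumed: May 1
FALL_START = (9, 16)     # assumed: September 16
WINTER_START = (11, 1)   # assumed: November 1


def blend_category(day: date) -> str:
    """Return SUMMER, FALL or WINTER for a calendar date."""
    key = (day.month, day.day)
    if SUMMER_START <= key < FALL_START:
        return "SUMMER"
    if FALL_START <= key < WINTER_START:
        return "FALL"
    # Everything from the winter cutoff through the end of April.
    return "WINTER"


for d in (date(2015, 1, 15), date(2015, 7, 4), date(2015, 10, 1)):
    print(d.isoformat(), blend_category(d))
```

The category can then be written out as a nominal attribute next to the price and oil columns.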

Quick retraction - 01/12/16

The testing results I gave were erroneous. I'd better say so before anyone offers me a job. The results were more accurate than they should have been. I had messed up in setting up the training and test datasets by including some overlapping data. When they say don't train on the same data you test on, they are correct. I set up another pair of datasets where I trained on 2000-2013 and tested on 2014 and 2015. I think the predicted and actual price means were off by about 17 cents and the maximum error was seventy-some cents. The mean was probably more accurate than it deserved to be. Only about half the records had an error of 25 cents or less. Definitely not as good.
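One way to guard against that overlap mistake is to split strictly by year and then check that no date appears in both sets. This is only a sketch: the function names and the sample records are made up, and the real datasets were built as arff files rather than Python lists.

```python
from datetime import date


def split_by_year(records, last_train_year):
    """Split (date, price) records into training and test sets by year."""
    train = [r for r in records if r[0].year <= last_train_year]
    test = [r for r in records if r[0].year > last_train_year]
    return train, test


def overlapping_dates(train, test):
    """Return the dates that appear in both datasets (should be empty)."""
    return sorted({r[0] for r in train} & {r[0] for r in test})


# Made-up records for illustration: (date, price).
records = [
    (date(2013, 12, 30), 3.33),
    (date(2014, 1, 6), 3.31),
    (date(2015, 1, 5), 2.21),
]

train, test = split_by_year(records, last_train_year=2013)
print(len(train), "training rows,", len(test), "test rows")
print("overlap:", overlapping_dates(train, test))
```

The same overlap check can be run on any pair of datasets, however they were put together.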

However, I did set up yet another pair of datasets tonight. This included the production/consumption data so it could only include the years from 2010 on that are available from the short term energy outlook page. So I set the training data for the years 2010-2014 and tested on 2015...
> mean(test$Price)
[1] 2.388521
> mean(resbal$predicted)
[1] 2.311187
So, off on average less than 8 cents again.
> max(abs(resbal$error))
[1] 0.584
The maximum prediction error is still a rather sizable 58 cents. Not so good.
> table(resbal$error < .10)

FALSE TRUE
24 72
But most of the predictions were off by less than 10 cents. So, including production/consumption information and using later data with fewer years gets some of our better results back. Although again, this was sort of just thrown together and I may have made errors in doing that tonight.
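The checks from the R session above can be gathered into one small function. This sketch assumes the error is predicted minus actual, and it uses the absolute error when counting predictions within 10 cents, which is my assumption about what was intended. The prices are made up, not the 2015 test data.

```python
def summarize(actual, predicted, tolerance=0.10):
    """Summarize prediction quality for paired actual/predicted prices."""
    errors = [p - a for a, p in zip(actual, predicted)]
    abs_errors = [abs(e) for e in errors]
    return {
        "mean_actual": sum(actual) / len(actual),
        "mean_predicted": sum(predicted) / len(predicted),
        "max_abs_error": max(abs_errors),
        # Count on the absolute error so large negative errors are not
        # mistaken for good predictions.
        "within_tolerance": sum(1 for e in abs_errors if e < tolerance),
        "outside_tolerance": sum(1 for e in abs_errors if e >= tolerance),
    }


# Made-up prices for illustration only.
actual = [2.40, 2.30, 2.50]
predicted = [2.35, 2.45, 2.48]
print(summarize(actual, predicted))
```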

If anyone wants to check me on that this time...
train_bal.arff
test_bal.arff

This still might be worth some effort to improve the prediction accuracy, although I would like to put time into seeing how well I can predict oil prices. This probably is not easy; there are oil futures where professionals put money on how well these prices are forecast. I would also like to get into time series a little yet.

But my apologies. My prior claims for accuracy were not valid.


There will be no updated download for this yet this weekend. Hopefully soon, but if you're looking for that it won't be here yet.

With gas prices now at lows we haven't seen since the start of the 'great recession', the subject has again become sort of interesting. I noticed at one point that oil prices were a trending Google search right around New Year's. The low oil prices are being blamed for a large part of the problems stocks are having. Although, of course, cheaper gas prices are a benefit to many consumers, myself probably more than most. There has also been even more severe economic turmoil in China. That would incidentally figure into this in ways such as reduced global demand for oil, if the Chinese economy falters. Since oil is traded in dollars, currency concerns could figure in. A favorite Chinese approach to problems is devaluing the Yuan. The US Dow Jones and, I believe, Standard & Poor's indices have had their worst start to a year in decades. Dramatic things relating to oil are going on.

Although these days the main reason for the extremely low gas prices is that much more oil than usual is being produced relative to what is consumed. Does the data confirm this? [The data I found for this, production/consumption, only goes back to 2010 in the Short-Term Energy Outlook, and appears to have no API interface.] There will hopefully be more on all of this before I move on to something else again.

Before, for currencies, I was considering data for individual currencies from Federal Reserve data. This time I thought an index might be better. There is a dollar index. Better suited, and easier to obtain historical data for, seemed to be the Trade Weighted Index the Federal Reserve came up with. Some of the APIs used by the EIA (Energy Information Administration) have changed, so my code had to change. I have decided to get away from having the 'ant' build tool drive the whole process. I thought it worked surprisingly well for the purpose, but it seems even more unlikely now than before to catch on as a popular usage of the tool. This meant, for one thing, eliminating the use of the ant 'get' task to obtain some of the data; I tracked down some code and came up with my own. So changes in how the data is handled or obtained have been involved in much of what I've done with this so far this time around.
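For anyone wanting to do something similar without ant, a stand-in for the 'get' task can be quite short. This is a Python sketch using only the standard library, not the code from this project; the function name and the URL in the usage comment are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch(url, dest):
    """Download url to the file dest, creating parent directories as needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Stream the response to disk instead of reading it all into memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Hypothetical usage (placeholder URL, not a real data source):
# fetch("https://example.com/oil_prices.csv", "data/oil_prices.csv")
```

A real version would also want a timeout and some handling for failed requests.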

Getting predictive, with some of the additional machine learning knowledge I've acquired over the last couple of years, would be another reason to return to this project. I did get some of the data loaded into 'R' and saved back into csv files that were then converted to arff for Weka. Considering first the question of whether you can predict gas prices, the answer is pretty easy. Yes, if you have oil price data. If you don't have oil prices, probably not. Oil prices are still the biggest part of gas prices and probably the most variable.
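The csv to arff step mentioned above is mostly a matter of adding a header. Here is a rough sketch that assumes every column is numeric, which would not hold for a date or a nominal column; the function name and sample data are made up.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text with a header row into ARFF text.

    Assumes every column is numeric; empty cells become '?', which is
    how ARFF marks a missing value.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in header]
    lines += ["", "@data"]
    lines += [",".join(cell if cell != "" else "?" for cell in row) for row in data]
    return "\n".join(lines) + "\n"


# Made-up sample data for illustration.
sample = "oil,price\n30.5,1.40\n,1.45\n"
print(csv_to_arff(sample, "gas"))
```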

For example, for Weka linear regression I had put together a file with a number of attributes: the Standard & Poor's index values, the trade weighted currency index, and oil prices. Also date, although I assumed that would be ignored as irrelevant. It would be less irrelevant if I try to figure out time series. Weka reduced this to...
Linear Regression Model

price =

-0.0002 * sp +
0.0344 * oil +
0.7213

With a minimal negative influence from the stock market and the only other attribute contribution coming from oil prices. This gives very accurate results really. I took out a training set of all data from the year 2000. I then made the month of January for the next year the test data. I figured I would do this for every year to make sure the results are consistent but so far only did 2000.
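To make the model above concrete, it can be written as a one-line function. The coefficients are the ones Weka reported; the input values are hypothetical round numbers, not rows from the dataset.

```python
def predict_price(sp, oil):
    """Gas price from the Weka linear regression model shown above."""
    return -0.0002 * sp + 0.0344 * oil + 0.7213


# Hypothetical inputs: an S&P value of 1300 and oil at $30 a barrel.
print(round(predict_price(sp=1300, oil=30.0), 4))

# Effect of a $1 rise in oil with the index held fixed: about 3.44 cents.
print(round(predict_price(1300, 31.0) - predict_price(1300, 30.0), 4))
```

With those inputs the model gives about $1.49 a gallon, and nearly all of the movement comes from the oil term.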

I think this was for RandomForest. I output the Weka results as csv and read them into 'R'...
> mean(res2001$predicted)
[1] 1.397137
So my average predicted price was just under $1.40.
> mean(test2001.csv$Price, na.rm=T)
[1] 1.40102
The average actual price was $1.40
What was the biggest error I actually had?
> max(abs(res2001$error))
[1] 0.077
So the farthest off I ever was on predicting gas prices was less than 8 cents. Considering gas prices these days can sometimes move 15 or 20 cents a day, that isn't bad. Anyhow, gas prices, when given oil prices, are easy and not very interesting. How about predicting the oil prices? There might be more interesting things considered as this goes along. At least I will try to get actual content uploaded at some point.

Introduction

I have for a while had an interest in gas prices. I pump gas pretty much every weekday. This in turn led to an interest in oil prices. I heard at one time that the price of oil accounts for 75% of the cost of gas. So I follow the price of oil pretty much daily as well as pumping gas. Every once in a while, despite watching the price of oil, a gasoline price change up or down will catch me by surprise. Say oil seems to be reasonably steady, trading in a narrow range, when all of a sudden gas makes a 20 cent move up. What accounts for this?

Gas price changes lag Oil price changes

The importance of oil to gas prices, and how gas price changes follow oil price changes with a lag, is considered at lag.
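One simple way to look for such a lag is to correlate the oil series with the gas series shifted by 0, 1, 2, ... periods and see which shift correlates best. This is a sketch of that idea, not necessarily the method used on the lag page; the function names are made up and the series are synthetic, built so that gas follows oil by two periods.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def lagged_correlation(oil, gas, lag):
    """Correlate oil[t] with gas[t + lag]."""
    if lag == 0:
        return pearson(oil, gas)
    return pearson(oil[:-lag], gas[lag:])


def best_lag(oil, gas, max_lag):
    """Return the lag in 0..max_lag with the highest correlation."""
    return max(range(max_lag + 1), key=lambda k: lagged_correlation(oil, gas, k))


# Synthetic data: gas is a linear function of oil from two periods earlier.
oil = [30, 32, 31, 35, 34, 38, 36, 40]
gas = [1.90, 1.90] + [round(0.03 * o + 1.0, 2) for o in oil[:-2]]

for k in range(4):
    print(k, round(lagged_correlation(oil, gas, k), 3))
print("best lag:", best_lag(oil, gas, 3))
```

With real data the peak is much less clean, and a short series leaves few points at the larger lags.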

What is moving the price of oil?

OK, so you've looked at the above, or you already appreciated the importance of the price of a barrel of oil to the cost of a gallon of gas. The next question then seems to be: so what is currently driving the price of oil? The normal economics answer of supply and demand seems a little too simplistic. Below is a list of things I have heard can influence this price.

CSV data and POI

In accessing the Yahoo finance historical data for the Dow Jones [1] and Standard & Poor's [2] all I could find was CSV data, not xls (Excel spreadsheet type) files. To shorten this a little bit for the time being, I wrote some code to batch convert CSV files to xls using POI. It allows a parameter that indicates a template for column sort handling and column data type. See the us.hall.poi.CSV2Xls class. Hopefully the code is sort of self documenting for now. Anyhow, I thought someone might just find this useful and decided to upload something at this point.

Working with xls spreadsheets on OS X without Microsoft Excel

A brief digression for a couple of shameless plugs. For working with Microsoft Office format documents I have recently used both NeoOffice [3] and what NeoOffice is based on, OpenOffice [4]. NeoOffice was my first choice because it was the one offering the PPC support that I still required at the time. Since then, the upgrade path for NeoOffice appears to require a donation to go forward. Now I have an Intel machine, so I switched to OpenOffice. Both worked fine in meeting my requirements for working with Excel or Word format documents.

Stock Indices and Oil Prices

OK, it's time to start figuring out what we can look at to see what is driving the price of oil. The data in the following spreadsheet seems to indicate that I was pretty far off in my thinking about what was currently going on with indices and oil prices. I thought the difficulties with our own economy coupled with the current debt crisis in Europe were creating a tidal effect [5]. In this case the tide, unfortunately, would be going out. I believe correlations can show this sort of trend and that the spreadsheet actually does for the "great recession". Notice how the correlations spike in 2008 and remain high through 2010. This was probably a full cycle of a down, a bottom, and a sort of recovery tidal effect.

However, also notice that the correlation is considerably down for 2011 and that over the last four weeks it actually went negative. This data is a little stale; the indices did end 'up' this last week. The reason I heard on CNBC for the Friday gain wasn't real heartening though. Paraphrased, "A lack of bad news from Europe". Hurrah? When this is complete I plan on trying to have something that automatically brings the data current each week. When that happens I will probably eliminate most of this excess accompanying verbiage. The reason for the negative correlation at this time is that the indices tended slightly down while oil prices, also slightly, rose in this period.

NOTE: This correlation behavior apparently is not the case within the indices themselves where the tidal effect seems very evident. Sector ETF Correlations Highest Since Financial Crisis - Seeking Alpha

I thought negative correlations would be indicators for a 'safe haven' trend. This could indicate that oil was being used as an alternative to troubled equity markets. At some point I will probably try to add something for trading volume, which I'd assume would also pick up if more money than usual was moving into oil trading. Even just unusual 'speculation' activity with correlations remaining positive. If anyone has any other ideas for how to pick unusual trading activity that is affecting pricing out of the data, I would be glad to hear them. In the current case I do not think these trends are present. Oil prices haven't been showing the steady climb I'd assume if you had bubble/boom type money moving into it, like gold recently seemed to be showing. I believe it just took a couple of knee-jerk drops along with the equity markets but then has shown a little more stability in resisting that volatility and is trying to restore itself to some level above those drops. I could of course be as wrong as I was in thinking it would prove to be closely correlating with the equity market surges.

You may have noticed that the DJIA correlates better with, or more closely tracks, the price of oil than does the S&P. Doing a little quick adding with a calculator gave me an average correlation of .6267 for the Dow and .5760 for Standard & Poor's. I may throw in something else at some point that should more closely track the oil price. Another Seeking Alpha article was of some interest...
How to track the Price of Crude Oil in 2011 - Seeking Alpha

indices.xls download

The long delayed currency spreadsheet. You should see the information for the Fed large currencies index vs. oil prices. Scroll right to see the data for the 'selected currencies'. Right now those are the Euro and the Yuan. The correlations mostly surprised me in how high they are. I might need to double check I'm figuring that right and not somehow doing something wrong to generate high values. Other than that for now I'll let you interpret the data yourself. You might look for signs of impending global economic collapse in the Euro and the notorious Chinese currency manipulation in the Yuan data.

currency.xls download

Update - 03/13/12

Well, this includes an updated oilPrice.zip again, with source if of interest. Also now including links for the spreadsheets above, if simply those are of any interest. The main change since the last update is still the yearly data for the indices spreadsheet. It might be noted for this that I continue to include the average span correlation. I think as a weighted average favoring more recent years it might be some sort of indicator as to whether in recent years the indices are correlating more or less than the normal yearly average. Given the difference in the two averages it is fairly obvious that oil prices have recently been correlating much more with the indices than usual.

I still think the process itself might be more interesting to most people than the actual contents. Being able to access financial data from a variety of online sources, process it and upload something publicly available related to that data could be of more general interest? Also having a process in place that can manage that periodically, more or less automated, if still not exactly real time, is sort of interesting to me.

Still mostly of interest to me is actually the content and how well you can come up with data that indicates what is and might continue to influence the price of oil and in turn gas prices at the pump. But then I pump gas pretty much every weekday.

The currency spreadsheet, as previously mentioned, should include the same sort of yearly data as indices now does. Otherwise, still next will probably be an attempt at something on oil inventories vs. expectations. Maybe a little dull compared to currencies.

Download

oilPrice.zip

Possible correlations to oil prices

[1] ^DJI Historical Prices | Dow Jones Industrial Average Stock - Yahoo! Finance
[2] ^GSPC Historical Prices | S&P 500 INDEX,RTH Stock - Yahoo! Finance
[3] NeoOffice Home
[4] OpenOffice.org - The Free and Open Productivity Suite
[5] A rising tide lifts all boats - Wikipedia, the free encyclopedia
[6] douglascrockford/JSON-java - GitHub
[7] http://senior.ceng.metu.edu.tr/2009/praeda/wp-content/uploads/2009/01/restclient.java
[8] google-docs-upload - A tool for batch uploading documents to a Google Docs account with recursive directory traversing. - Google Project Hosting