We plan to explore the relationship between food & gas prices, using a machine learning model to examine what might have been. We'll look at what our model predicts food and gas prices would have been had Covid-19 not disrupted the world in 2020.
We'll then compare the prediction to the actual data to see how far off the results were.
We picked this topic because world-wide events have changed our lives in an unprecedented way. This change is hard to quantify or to conceptualize. Exploring the link, if there is one, between food and gas prices is a tangible way for us to try to quantify the effect that Covid-19 and the war between Ukraine and Russia have had, and continue to have, on our daily lives.
We will use two sets of data from Kaggle.com:
"Food Prices in US Cities" documents the Consumer Price Index (CPI) for food monthly from 1952 to July 2022.
"US Gasoline and Diesel Retail Prices" documents the prices for various types of gas weekly from 1995 to January 2021.
We hope to answer the question: what would gas prices and food prices be like if the years 2020-2022 had been "normal" and rather uneventful on a global scale?
We also hope to explore the relationship between gas and food prices.
- Joe: Square
- Suchi & Priyanka: Triangle
- Torrey: Circle
- Melissa: X
Torrey set up a group channel for us to keep up to date between class sessions. We also decided to share emails and phone numbers as a team to be sure everyone was reachable. We held some Saturday sessions to collaborate, and otherwise spent mostly independent project focus time.
"Food Prices in US Cities" documents the CPI for food monthly from 1952 to July 2022.
"US Gasoline and Diesel Retail Prices" documents the prices for various types of gas weekly from 1995 to January 2021.
Standardizing the data and cleaning it up was a crucial part of this process. Torrey spent hours prepping the information. The gasoline data originally had a weekly set of dates & values, whereas the commodity (CPI) prices are listed monthly. We considered several ways to standardize the weekly data: we could take the average of each month's prices, or use either the first or last week's number of each month. Even after dropping the extra values to get a monthly rather than weekly data set, we still needed to key the cleaned-up data on month & year for our Postgres database, and to examine the data to learn more about it.
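The three options for collapsing weekly prices into one monthly value can be sketched in plain Python. This is only an illustration, not the cleaning code the team actually ran: the `to_monthly` helper, the `how` parameter, and the sample prices are all made up for the example.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical weekly observations (date, price in USD per gallon); not real data.
weekly_prices = [
    (date(2020, 1, 6), 2.50),
    (date(2020, 1, 13), 2.60),
    (date(2020, 1, 20), 2.70),
    (date(2020, 2, 3), 2.40),
    (date(2020, 2, 10), 2.20),
]


def to_monthly(rows, how="mean"):
    """Collapse weekly (date, price) rows into one value per (year, month)."""
    # Group prices by (year, month), keeping them in chronological order.
    buckets = defaultdict(list)
    for day, price in sorted(rows):
        buckets[(day.year, day.month)].append(price)

    # The three strategies discussed above: monthly average, first week, last week.
    strategies = {
        "mean": mean,
        "first": lambda values: values[0],
        "last": lambda values: values[-1],
    }
    pick = strategies[how]
    return {month: pick(values) for month, values in buckets.items()}


monthly_mean = to_monthly(weekly_prices, "mean")
monthly_first = to_monthly(weekly_prices, "first")
monthly_last = to_monthly(weekly_prices, "last")
```

Keying the result on `(year, month)` mirrors the month & year focus described above, so the gas values line up with a monthly CPI series.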
Shuchi set up a chart to look at the relationship between the average gas price variance within the different categories. When overlaid with a time-based series of CPI & gas prices, we see a fairly consistent rhythm in the CPI pricing and much more volatility in the gas pricing. In general, food pricing has increased at a more rapid pace than gas prices over time.
Since the various types of gas display a close relationship, we can also look at a single category for a simpler visualization. This is much clearer.
Shuchi & Priyanka tested 4 different models with gusto: Random Forest Regressor, Linear Regression, Brown Simple Exponential Smoothing, and Triple Exponential Smoothing. The data ranges from January 1995 through January 2021. We limited the gas prices to one monthly figure to match the monthly CPI updates. We held out the last 22 months (roughly March 2019 through the start of 2021), trained the models on the remaining data, and then compared each model's predictions against the actual results. The resulting graphs are really interesting. In the following images, the training data is blue, predictions are green, and actual results are yellow.
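The hold-out setup described above can be sketched as follows. This is a minimal illustration under stated assumptions: the series is synthetic, `split_holdout` and `mean_absolute_error` are hypothetical helper names, and the "carry the last value forward" forecast is only a stand-in baseline, not one of the four models the team tested.

```python
def split_holdout(series, holdout=22):
    """Split a chronological series into (train, test), holding out the last months."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    return series[:-holdout], series[-holdout:]


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual values and predictions."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic monthly series (assumption): 313 points, one per month from
# January 1995 through January 2021, rising by 0.5 each month.
monthly_values = [100 + 0.5 * i for i in range(313)]

train, test = split_holdout(monthly_values, holdout=22)

# Stand-in baseline (assumption): repeat the last training value for every held-out month.
naive_predictions = [train[-1]] * len(test)
mae = mean_absolute_error(test, naive_predictions)
```

Scoring every model against the same held-out months, with the same error measure, is what makes the predictions comparable across models.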
The predictions on CPI pricing from Random Forest were more conservative than the actual data. Instead of a slight dip and huge jump, we see the Random Forest Regressor actually predicted a slight decline in price followed by a general leveling off of prices.
The predictions from the Linear Regression model are much more volatile & erratic. We see even bigger jumps in the data and more frequent changes than reality showed us. To my eyes it almost looks like the linear regression model predicted the CPI with a pattern more similar to the gas prices, where we see much more fluid prices and instant market adjustments, whereas CPI changes are generally gradual and more noticeable yearly than monthly.
The Brown Simple Exponential Smoothing method had the most accurate results. Note the strong spike shortly after Covid struck. The predictions are nearly spot on and show a similar trajectory of growth in CPI prices. The dramatic spike in costs is still being absorbed, but it's fascinating to see that computer models can only go so far in predicting the future. Sometimes we just can't see what's coming.
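For readers unfamiliar with the technique, the core of simple exponential smoothing fits in a few lines: each smoothed value blends the newest observation with the previous smoothed value. The sketch below is a from-scratch illustration, not the team's implementation; the function names, the `alpha` value, and the sample numbers are assumptions chosen for the example.

```python
def simple_exponential_smoothing(values, alpha):
    """Return the smoothed series s, where s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the range (0, 1]")

    level = values[0]  # Start the smoothed series at the first observation.
    smoothed = [level]
    for x in values[1:]:
        # Higher alpha weights recent observations more heavily.
        level = alpha * x + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


def forecast(values, alpha, horizon):
    """Forecast `horizon` future steps by repeating the last smoothed level."""
    last_level = simple_exponential_smoothing(values, alpha)[-1]
    return [last_level] * horizon


# Hypothetical CPI-like values, for illustration only.
cpi_like = [100.0, 102.0, 104.0, 103.0]
smoothed = simple_exponential_smoothing(cpi_like, alpha=0.5)
predictions = forecast(cpi_like, alpha=0.5, horizon=3)
```

In this basic form every future step receives the same forecast, the last smoothed level. Variants such as double or triple exponential smoothing add trend and seasonal terms on top of this update.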
The Triple Exponential Smoothing method produced results that nearly match those of the Brown Simple Exponential Smoothing method.
Though gas prices run in near parallel, it's interesting to note that some categories are always more expensive than others, making me wonder about the costs associated with preparing the raw materials for market. This chart has a fun animation so you can watch the prices populate on the graph.
Note the inverse relationship between gas & food prices at the onset of the pandemic.
Seeing the numbers is one thing, but having context and being able to look at the data through the lens of historical moments in time gives the numbers more tangible meaning & helps us understand the time scale more clearly.
Melissa & Torrey created Tableau visuals with several graphs, stories, and dashboards to show our data. Click the link to view all these visuals & more!
The presentation we'll be using for class is above.
Luckily, all of us have grown used to working remotely with a team. I think we developed a good cadence, and meeting twice a week as a group helped us divide tasks so we weren't all working on the same piece. Open communication helped us work together and negotiate our different roles so we weren't overlapping on the project too much, though we all helped with various aspects. One of my biggest lessons as the database manager was learning how to merge correctly. We were also able to work in different branches, which made it easier to change different files without conflict. That way we weren't editing the same file by mistake and adding conflicting code. Where we did make mistakes, we were able to edit and update our files to resolve the conflicts. We now understand working with a group repository much better.
Though we learned that the prices of gas & food continue to rise, gas pricing proved to be much more volatile. There are so many more fluctuations than we anticipated. We thought gas pricing would have an immediate effect on food pricing, but we learned that food pricing is more seasonal and relatively more stable than gas. We learned much during this project. It was enlightening to look at the history of these commodity prices to learn more about their relationship with historical events. The more we understand how these prices are affected, the better we can plan for the future.