Florida’s Housing Market: A Journey Through Time

Nick K.
4 min read · Jan 4, 2021
Photo by Kimo

As the sun begins to set on my Flatiron journey, one of the remaining challenges to tackle is working with time series data. An extremely useful tool on the data science workbench, the ability to analyze time series data has many applications in finance and business.

Using Zillow housing data collected between 1996 and 2018, the objective was to build a model to forecast the median sale price for each zip code.

Melting The Dataframe

One of the unique challenges in working with this data was converting it from its original format into a more workable one. The data was recorded in a “wide format,” with one column per time period, and required some reshaping to turn it into the more usable “long format,” with one row per zip code and time period.

Luckily, pandas has a built-in function for “melting” a dataframe. Converting the wide columns into long form takes only a small function that calls pd.melt, passing the identifier columns as id_vars and naming the new variable column with var_name.
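As a rough illustration of that step, here is a minimal sketch of a melt helper. The function name melt_data, the column names, and the prices in the toy dataframe are assumptions made up for this example, not the actual Zillow data or the original function.

```python
import pandas as pd

# Toy wide-format frame: one row per zip code, one column per month.
# Column names and prices are illustrative assumptions, not real Zillow data.
wide = pd.DataFrame({
    "RegionName": ["33101", "32801"],
    "State": ["FL", "FL"],
    "1996-04": [100000.0, 90000.0],
    "1996-05": [101000.0, 90500.0],
    "1996-06": [102000.0, 91000.0],
})


def melt_data(df, id_cols):
    """Reshape wide monthly columns into one row per (zip code, month)."""
    # id_vars stay as identifier columns; every other column is unpivoted.
    melted = pd.melt(df, id_vars=id_cols, var_name="time", value_name="value")
    # The former column headers are strings, so parse them into datetimes.
    melted["time"] = pd.to_datetime(melted["time"], format="%Y-%m")
    # Drop months with no recorded price and order rows by zip code, then time.
    melted = melted.dropna(subset=["value"])
    return melted.sort_values(id_cols + ["time"]).reset_index(drop=True)


long_df = melt_data(wide, ["RegionName", "State"])
print(long_df)
```

Each zip code now contributes one row per month, which is the shape most time series tools expect.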

The result is a much more usable long-format dataframe!

Building an ARIMA Model

The next step is building a model to produce out-of-sample forecasts. Using the pmdarima library (the equivalent of R’s auto.arima functionality), we were able to employ a…
