Forecasting Temperature
🚩 Time Series Forecasting 🚩
The main objective of this project was to develop a methodology for predicting the temperature in Jena, Germany.
Two approaches have been explored for this purpose.
Statistical Approach:
🔸 Statistical properties of the time series, mainly its decomposition into trend and seasonality, as well as its simple and partial autocorrelations, are analysed to specify a set of SARIMA models suitable for such characteristics.
🔸 The specified models are compared in the inner evaluation phase using time series cross-validation.
🔸 The best one is selected and its performance in predicting the temperature in the next 15 days is estimated.
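Time series cross-validation differs from ordinary k-fold in that every fold must train only on observations that precede its test window. The sketch below shows one common variant (expanding window) using only the standard library. The function name, the number of folds and the 15-day test size are illustrative assumptions, not the exact configuration used in this project.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each fold trains on everything before its test window, so no future
    observation leaks into training.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train = list(range(0, test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


# Example: 100 daily observations, 3 folds, 15-day test windows.
folds = list(expanding_window_splits(n_samples=100, n_splits=3, test_size=15))
for train, test in folds:
    print(f"train: 0..{train[-1]}  test: {test[0]}..{test[-1]}")
```

Each candidate model would be fitted on `train` and scored on `test` for every fold, and the fold scores averaged to rank the candidates.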
Machine Learning Approach:
🔸 Three forecasting approaches are followed: univariate (recursive and direct), multivariate and multiple.
🔸 For each approach, a range of models is considered, optimised and compared in the inner evaluation phase using time series cross-validation.
🔸 The best model is selected and its performance in predicting the temperature in the next 15 days is estimated.
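The difference between the recursive and direct univariate strategies can be illustrated with a deliberately tiny model. In this sketch the "model" is a least-squares line on a single lag, which is an assumption made only to keep the example self-contained. It stands in for whatever regressor is actually used, and the multivariate and multiple approaches are not covered here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (one lag as the only feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b


def recursive_forecast(series, horizon):
    """One model for step t+1; each prediction is fed back as the next input."""
    a, b = fit_line(series[:-1], series[1:])
    preds, last = [], series[-1]
    for _ in range(horizon):
        last = a + b * last
        preds.append(last)
    return preds


def direct_forecast(series, horizon):
    """One separate model per step h, each mapping y[t] straight to y[t+h]."""
    preds = []
    for h in range(1, horizon + 1):
        a, b = fit_line(series[:-h], series[h:])
        preds.append(a + b * series[-1])
    return preds


# Toy series (made-up values, not the Jena data).
toy = [10.0, 11.0, 13.0, 12.0, 14.0, 15.0, 17.0, 16.0, 18.0, 19.0]
rec = recursive_forecast(toy, horizon=3)
dir_ = direct_forecast(toy, horizon=3)
print(rec)
print(dir_)
```

Both strategies agree on the first step, since they use the same one-step model there. They diverge afterwards: the recursive forecast accumulates its own prediction errors, while the direct forecast trains each horizon on fewer usable samples.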
In addition, outlier detection and imputation is considered as a preprocessing step that could improve the predictive performance of the models, since it may reduce the distortion that outliers introduce during training.
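As a rough illustration of this kind of preprocessing, the sketch below flags outliers with the interquartile-range rule and replaces them by linear interpolation. Both choices are assumptions made for the example: the text above does not say which detection or imputation method the project uses, and a global IQR rule is a crude fit for a strongly seasonal series.

```python
import statistics


def flag_outliers_iqr(values, k=1.5):
    """Return a list of booleans: True where a value falls outside the IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]


def impute_linear(values, flags):
    """Replace flagged points by interpolating between the nearest clean neighbours."""
    clean = [i for i, bad in enumerate(flags) if not bad]
    out = list(values)
    for i, bad in enumerate(flags):
        if not bad:
            continue
        left = max((j for j in clean if j < i), default=None)
        right = min((j for j in clean if j > i), default=None)
        if left is None:
            out[i] = values[right]  # leading outlier: copy the next clean value
        elif right is None:
            out[i] = values[left]  # trailing outlier: copy the previous clean value
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out


temps = [4.0, 5.0, 4.5, 30.0, 5.5, 6.0, 5.0, 4.0]  # made-up values with one spike
flags = flag_outliers_iqr(temps)
cleaned = impute_linear(temps, flags)
print(cleaned)
```

To avoid leaking information, the fences would be computed on the training portion only and the cleaned series used for fitting, while evaluation stays on the original observations.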
⚠️ The following task is left as a possible future improvement:
Search for the best model for the short, medium and long term separately, instead of searching for the best model for a single period. The idea is that the model that best predicts the next 15 days is not necessarily the one that best predicts the next 3 or the next 5. One could look for the best model to predict the next 3 days (short term), the best one to predict days 4 to 7 (medium term) and the best one to predict days 8 to 15 (long term). The final 15-day forecast would then be the concatenation of the predictions given by these three models for their respective periods.
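The combination step of this idea is simple to express in code. In the following sketch the helper name `stitch_by_horizon` and the constant-valued forecasters are hypothetical placeholders. In practice each forecaster would be the model selected for its own range of days.

```python
def stitch_by_horizon(forecasters, horizon=15):
    """Build one forecast from several models, each responsible for a range of days.

    `forecasters` maps (first_day, last_day) to a function that returns a
    list of `horizon` predictions; only the slice for that range is used.
    """
    combined = [None] * horizon
    for (first, last), forecast in forecasters.items():
        preds = forecast(horizon)
        for day in range(first, last + 1):
            combined[day - 1] = preds[day - 1]
    if any(p is None for p in combined):
        raise ValueError("some days are not covered by any model")
    return combined


# Placeholder forecasters returning constants, standing in for fitted models.
def short_term(h):
    return [1.0] * h


def medium_term(h):
    return [2.0] * h


def long_term(h):
    return [3.0] * h


final = stitch_by_horizon(
    {(1, 3): short_term, (4, 7): medium_term, (8, 15): long_term}
)
print(final)
```

Days 1 to 3 come from the short-term model, days 4 to 7 from the medium-term model and days 8 to 15 from the long-term model.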
🛠 Throughout this project, three frameworks have been mainly used:
🔸 Statsmodels: to apply statistical time series models.
🔸 Skforecast: to apply machine learning models to time series, with the three forecasting approaches mentioned above.
🔸 PyTS: a framework I developed as part of this project that allows applying machine learning techniques and models to time series problems.