A Deep Dive into ARIMA, SARIMA, and Their Relationship with Deep Learning for Time Series Forecasting
In recent years, deep learning has become a dominant force in many areas of data analysis, and time series forecasting is no exception. Traditional models like ARIMA (Autoregressive Integrated Moving Average) and its seasonal extension SARIMA have long been the go-to solutions for forecasting time-dependent data. However, newer models based on Recurrent Neural Networks (RNNs), particularly Long Short-Term Memory (LSTM) networks, have emerged as powerful alternatives. Both approaches have their strengths and applications, and understanding their relationship helps in choosing the right tool for the right problem.
In this blog post, we’ll explore ARIMA and SARIMA models in detail, discuss how they compare to deep learning-based models like RNNs, and demonstrate their practical implementation.
Deep Learning and Time Series Forecasting
Deep learning is a subset of machine learning in which models learn hierarchical features from data using multiple layers of neural networks. For time series forecasting, one of the most commonly used deep learning architectures is the Recurrent Neural Network (RNN).
RNNs are particularly well-suited for time series because they are designed to handle sequential data, where the output at each time step depends not only on the current input but also on the previous inputs. This is achieved by maintaining a hidden state that gets updated at each time step, allowing the model to “remember” past information.
Here are the key components of RNNs and their relevance to time series forecasting:
- Sequential Memory: RNNs are built to retain information across time steps. This makes them suitable for forecasting problems where patterns are spread across time, such as stock prices or weather data.
- Backpropagation Through Time (BPTT): Unlike traditional feedforward neural networks, RNNs are trained using a variant of backpropagation known as BPTT, where the network adjusts its weights by considering errors over multiple time steps.
- Long Short-Term Memory (LSTM): A variant of RNNs, LSTMs are particularly useful in long-term forecasting because they are designed to overcome the vanishing gradient problem, allowing them to capture long-term dependencies in data.
While ARIMA and SARIMA focus on modeling the linear relationships in time series data, RNNs and LSTMs can capture complex non-linear dependencies. This makes RNNs more flexible, but they also require larger datasets and more computational power to train effectively.
How RNNs Relate to ARIMA and SARIMA Models
Although RNNs and ARIMA/SARIMA models operate differently, they share common ground in the context of time series forecasting:
- Time Dependence: Both models are designed to forecast time-dependent data, meaning they consider historical information to predict future values.
- Lagging Features: ARIMA uses lagging features (i.e., past values) directly, while RNNs learn patterns through sequential memory.
- Complexity vs Simplicity: ARIMA/SARIMA models are simpler and more interpretable but may struggle with complex, non-linear patterns. RNNs, on the other hand, can model non-linearity but require more data and computational resources.
In this article, we will primarily focus on ARIMA and SARIMA models, their theoretical foundations, and how they are practically applied to time series forecasting. We’ll compare their strengths to RNNs and understand when to use which approach.
Understanding ARIMA and SARIMA Models
Time Series Fundamentals
At the heart of time series forecasting is the ability to recognize and model patterns in data that evolves over time. This includes:
- Trend: A long-term upward or downward movement in the data.
- Seasonality: Cyclical patterns that repeat at regular intervals, such as daily, weekly, or yearly fluctuations.
- Stationarity: A stationary time series has a constant mean and variance over time. Many forecasting models, including ARIMA, require the data to be stationary to perform well.
- Autocorrelation: The correlation between a time series and its lagged values. ARIMA models rely heavily on autocorrelation to predict future values.
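As a quick illustration of the stationarity and autocorrelation checks described above, here is a minimal sketch using an Augmented Dickey-Fuller (ADF) test and lag-1 autocorrelation on a synthetic trending series. The data and the interpretation thresholds are purely illustrative assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Illustrative trending series: clearly non-stationary before differencing.
rng = np.random.default_rng(7)
y = pd.Series(0.5 * np.arange(200) + rng.normal(0, 2, 200))

for name, s in [("raw", y), ("differenced", y.diff().dropna())]:
    p_value = adfuller(s)[1]      # ADF test: a small p-value suggests stationarity
    lag1 = s.autocorr(lag=1)      # lag-1 autocorrelation
    print(f"{name}: ADF p-value = {p_value:.3f}, lag-1 autocorrelation = {lag1:.2f}")
```

Differencing the trending series typically drives the ADF p-value down and changes the autocorrelation structure, which is exactly what ARIMA's "integrated" step exploits.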
ARIMA: Autoregressive Integrated Moving Average
The ARIMA model is a well-established statistical approach to time series forecasting. It works by combining three components:
- Autoregressive (AR): The model regresses the target variable against its own previous values.
- Integrated (I): This step involves differencing the data to remove trends and make the series stationary.
- Moving Average (MA): The model includes a moving average component to account for the errors of past predictions.
The general ARIMA model is expressed as ARIMA(p, d, q), where:
- p: The number of autoregressive terms.
- d: The degree of differencing.
- q: The number of lagged forecast errors used in the prediction.
SARIMA: Seasonal ARIMA
While ARIMA works well for non-seasonal data, time series data often contains seasonal patterns. SARIMA extends ARIMA by incorporating seasonal components:
- P: Seasonal autoregressive terms.
- D: Seasonal differencing.
- Q: Seasonal moving average terms.
- s: The length of the season (e.g., 7 for weekly seasonality).
A SARIMA model is expressed as ARIMA(p, d, q) x (P, D, Q, s), where both non-seasonal and seasonal components are considered.
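As a quick illustration of how this notation maps onto code, the sketch below fits a SARIMA(1, 0, 0)(0, 1, 1, 7) model to a small synthetic daily series with statsmodels. The synthetic data and the chosen orders are illustrative assumptions, not recommendations.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Purely illustrative synthetic daily series with a weekly pattern plus noise.
idx = pd.date_range("2019-01-01", periods=120, freq="D")
rng = np.random.default_rng(42)
values = 100 + 10 * np.sin(2 * np.pi * np.arange(120) / 7) + rng.normal(0, 2, 120)
series = pd.Series(values, index=idx)

# ARIMA(p, d, q) x (P, D, Q, s): non-seasonal order plus seasonal order with period s = 7.
model = ARIMA(series, order=(1, 0, 0), seasonal_order=(0, 1, 1, 7))
results = model.fit()
print(results.forecast(steps=7))  # one week ahead
```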
Steps in Building ARIMA and SARIMA Models
- Data Preparation: Ensure the time series data is stationary. If not, apply differencing to make it stationary.
- Model Identification: Use tools like autocorrelation plots (ACF) and partial autocorrelation plots (PACF) to choose appropriate values for p, d, and q (see the sketch after this list).
- Model Fitting: Train the ARIMA/SARIMA model on historical data.
- Forecasting: Use the fitted model to predict future data points.
- Model Evaluation: Measure the accuracy of the forecast using metrics like Mean Absolute Error (MAE).
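The sketch below walks through these steps on a synthetic series: an ADF stationarity check, ACF/PACF plots for identification, fitting, forecasting, and an MAE evaluation. The synthetic data, the chosen orders, and the train/test split are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

# 1. Data preparation: illustrative synthetic daily series with trend + weekly cycle.
idx = pd.date_range("2019-01-01", periods=150, freq="D")
rng = np.random.default_rng(0)
y = pd.Series(0.5 * np.arange(150) + 10 * np.sin(2 * np.pi * np.arange(150) / 7)
              + rng.normal(0, 2, 150), index=idx)

adf_p = adfuller(y)[1]                 # p-value of the ADF test
print(f"ADF p-value: {adf_p:.3f}")     # a large p-value suggests differencing is needed
y_diff = y.diff().dropna()

# 2. Model identification: inspect ACF/PACF of the (differenced) series.
plot_acf(y_diff, lags=30)
plot_pacf(y_diff, lags=30)
plt.show()

# 3. Model fitting on a training window, 4. forecasting, 5. evaluation with MAE.
train, test = y[:-14], y[-14:]
results = ARIMA(train, order=(1, 1, 0), seasonal_order=(0, 1, 1, 7)).fit()
forecast = results.forecast(steps=len(test))
mae = (forecast - test).abs().mean()
print(f"MAE over the 14-day test window: {mae:.2f}")
```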
Deep Learning vs Traditional Models
When deciding between RNNs and ARIMA/SARIMA models, it’s important to consider the complexity and nature of the data:
- ARIMA/SARIMA: Best suited for small to medium-sized datasets with linear patterns and clear seasonality. They require minimal data preprocessing but struggle with non-linearity.
- RNN/LSTM: Better suited for large datasets with complex, non-linear patterns. They excel at capturing long-term dependencies but need more data and computation to be effective. This is particularly useful for multi-step forecasts.
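For contrast, here is a minimal sketch of an LSTM forecaster in TensorFlow/Keras, trained on sliding windows of a synthetic series. The window length, layer size, and training settings are illustrative choices rather than tuned values.

```python
import numpy as np
import tensorflow as tf

# Illustrative synthetic series and sliding-window dataset.
rng = np.random.default_rng(1)
series = np.sin(np.arange(600) * 2 * np.pi / 7) + 0.1 * rng.normal(size=600)

window = 28
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]  # shape (samples, time_steps, features)

# Small LSTM model: one recurrent layer followed by a dense output.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(window, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(loss="mse", optimizer="adam")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# One-step forecast from the most recent window.
next_value = model.predict(series[-window:].reshape(1, window, 1), verbose=0)
print(next_value)
```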
Code Implementation of ARIMA and SARIMA with RNN Comparison
In this section, we will implement both approaches in code and compare ARIMA and SARIMA models with Recurrent Neural Networks (RNNs) for time series forecasting.
1. ARIMA Basic Forecast Code
The ARIMA model is used to forecast rail ridership for the next day (June 1, 2019), assuming data ends on May 31, 2019.
```python
from statsmodels.tsa.arima.model import ARIMA

# Define the origin and end date for the dataset
origin, today = "2019-01-01", "2019-05-31"

# Assume the time series is stored in a pandas DataFrame
rail_series = df.loc[origin:today]["rail"].asfreq("D")

# Build the ARIMA model
model = ARIMA(rail_series,
              order=(1, 0, 0),
              seasonal_order=(0, 1, 1, 7))

# Fit the model to the data
model = model.fit()

# Forecast the rail ridership for June 1, 2019
y_pred = model.forecast()  # returns 427,758.6
```
Explanation:
- ARIMA Setup: The `order=(1, 0, 0)` sets up the ARIMA model with one autoregressive term (p=1), no differencing (d=0), and no moving average term (q=0).
- Seasonal Component: The `seasonal_order=(0, 1, 1, 7)` adds seasonal differencing (D=1), a seasonal moving average term (Q=1), and a seasonal period of 7 days (weekly seasonality).
- Forecast: After fitting the model, the predicted ridership for June 1, 2019 is 427,758.6 passengers.
2. SARIMA with Daily Retraining and MAE Calculation
In this code, we extend the SARIMA model to retrain it daily for each day from March 1 to May 31, 2019. The forecasts are then compared to actual values, and the Mean Absolute Error (MAE) is calculated to evaluate the performance.
```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Define the time period and data range
origin, start_date, end_date = "2019-01-01", "2019-03-01", "2019-05-31"
time_period = pd.date_range(start_date, end_date)
rail_series = df.loc[origin:end_date]["rail"].asfreq("D")

# Empty list to store predictions
y_preds = []

# Loop through each day in the time period and retrain the model
for today in time_period.shift(-1):
    model = ARIMA(rail_series[origin:today],  # train on data up to "today"
                  order=(1, 0, 0),
                  seasonal_order=(0, 1, 1, 7))
    # Fit the model
    model = model.fit()
    # Forecast for the next day and append to the predictions list
    y_pred = model.forecast()[0]
    y_preds.append(y_pred)

# Convert the predictions into a pandas Series
y_preds = pd.Series(y_preds, index=time_period)

# Calculate the Mean Absolute Error (MAE)
mae = (y_preds - rail_series[time_period]).abs().mean()  # MAE is 32,040.7
```
Explanation:
- Daily Retraining: The SARIMA model is retrained every day based on data up to the current day. This allows it to adapt better to recent data trends.
- Time Period: The forecasts are made for each day between March 1 and May 31, 2019. The predictions are stored in the list `y_preds`.
- Evaluation (MAE Calculation): The Mean Absolute Error (MAE) measures the average absolute error of the predictions. Here, the model produces an MAE of 32,040.7, which indicates the average error in the ridership predictions over the given time period.
Comparison with Recurrent Neural Networks (RNNs)
Now that we have implemented ARIMA and SARIMA models, let’s explore how they compare with Recurrent Neural Networks (RNNs) for time series forecasting.
Strengths of ARIMA and SARIMA:
- Simplicity: ARIMA and SARIMA models are relatively straightforward to implement and interpret, particularly for linear, seasonal data.
- Data Requirements: These models perform well on small to medium-sized datasets without requiring extensive computational resources.
- Seasonality: SARIMA can handle seasonal patterns explicitly, which is useful for datasets with known seasonality (e.g., weekly, monthly patterns).
Limitations of ARIMA and SARIMA:
- Linear Assumptions: Both ARIMA and SARIMA models assume linear relationships in the data. They may struggle with complex, non-linear patterns.
- Long-term Dependencies: These models work well with short-term forecasts but may not capture long-term dependencies as effectively.
Why Use RNNs for Time Series Forecasting?
Recurrent Neural Networks (RNNs) are designed to handle sequential data like time series, where the future value depends on previous values. Unlike ARIMA and SARIMA, RNNs are capable of modeling both linear and non-linear relationships, making them powerful for complex time series forecasting.
Strengths of RNNs:
- Sequential Memory: RNNs have a hidden state that retains information from previous time steps, allowing the model to “remember” past values and make better forecasts for long sequences.
- Non-linearity: RNNs can model non-linear patterns in the data, which is critical for complex time series that have intricate patterns.
- Handling Long-term Dependencies: With variants like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), RNNs can capture long-term dependencies that are difficult for ARIMA/SARIMA to handle.
Limitations of RNNs:
- Data Requirement: RNNs typically require larger datasets to train effectively compared to ARIMA/SARIMA models.
- Complexity: RNNs are computationally intensive, requiring more resources for training and tuning.
- Interpretability: Unlike ARIMA/SARIMA models, RNNs can be treated as black-box models. It is harder to interpret the relationships learned by RNNs.
When to Use ARIMA/SARIMA vs. RNNs?
- ARIMA/SARIMA: These models are better suited for small datasets with linear relationships and seasonal patterns. They are easier to interpret and require fewer computational resources.
- RNNs (LSTM/GRU): If your time series data is large, has non-linear relationships, or involves long-term dependencies, RNNs or their variants (like LSTMs or GRUs) may provide better accuracy.
A Recap of ARIMA with a Small Worked Example
Recap: What is ARIMA?
ARIMA (AutoRegressive Integrated Moving Average) is a time series forecasting model. It uses three components:
- Autoregressive (AR): Predicts future values based on past values.
- Integrated (I): Applies differencing to make the series stationary.
- Moving Average (MA): Uses past forecast errors to improve predictions.
ARIMA is typically represented as ARIMA(p, d, q), where:
- p is the number of autoregressive terms.
- d is the degree of differencing.
- q is the number of moving average terms.
Example: ARIMA(1,1,1) Step-by-Step
Given this data:
| Time (t) | Value (y) |
| --- | --- |
| t=1 | 50 |
| t=2 | 55 |
| t=3 | 54 |
| t=4 | 57 |
We aim to predict $y_5$ using ARIMA(1,1,1).
Step 1: Differencing (I = 1)
First, we apply first differencing to remove the trend:

$$y'_t = y_t - y_{t-1}$$

For our data:

$$y'_2 = 55 - 50 = 5, \quad y'_3 = 54 - 55 = -1, \quad y'_4 = 57 - 54 = 3$$

The differenced series is $(5, -1, 3)$.
Step 2: Autoregression (AR = 1)
In AR(1), we predict the next differenced value from the most recent one, $y'_4$:

$$\hat{y}'_5 = c + \phi_1\, y'_4$$

Where:
- $c$ is a constant term.
- $\phi_1$ is the autoregressive coefficient, related to the autocorrelation of the differenced series.

How $\phi_1$ is Estimated:
To estimate $\phi_1$, we look at the lag-1 autocorrelation of the differenced series $(5, -1, 3)$: compute the mean of the series, the covariance between $y'_t$ and $y'_{t-1}$, and the variance of $y'_t$; the lag-1 autocorrelation is the ratio of that covariance to the variance. (In practice, software estimates $c$ and $\phi_1$ jointly by maximum likelihood.) For this toy example we simply assume $c = 1.0$ and $\phi_1 = 0.7$.

Using $c = 1.0$, $\phi_1 = 0.7$, and $y'_4 = 3$:

$$\hat{y}'_5 = 1.0 + 0.7 \times 3 = 3.1$$

The predicted differenced value is **3.1**.
Step 3: Moving Average (MA = 1)
The MA(1) component adjusts the prediction using the previous forecast error $\varepsilon_4$:

$$\hat{y}'_{5,\text{adj}} = \hat{y}'_5 + \theta_1\, \varepsilon_4$$

Where (values assumed for this example):
- $\varepsilon_4 = -1$ (previous error)
- $\theta_1 = 0.3$ (moving average coefficient)

Now, calculate the adjusted prediction:

$$\hat{y}'_{5,\text{adj}} = 3.1 + 0.3 \times (-1) = 2.8$$

The adjusted prediction is **2.8**.
Step 4: Reverse Differencing to Get Final Prediction
Finally, we reverse the differencing to bring the prediction back to the original scale:

$$\hat{y}_5 = y_4 + \hat{y}'_{5,\text{adj}} = 57 + 2.8 = 59.8$$

Final Prediction
The predicted value for $y_5$ is **59.8**.
Summary
- Differencing removes trends in the data.
- Autoregression (AR) predicts the next value using the previous differenced value.
- Moving Average (MA) adjusts the prediction using past forecast errors.
- Reversing the differencing brings the prediction back to the original scale.
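To make the arithmetic concrete, here is a minimal Python sketch of the same hand calculation. The coefficients are the illustrative values assumed above, not parameters estimated from data.

```python
# Minimal sketch of the manual ARIMA(1, 1, 1) walk-through above.
y = [50, 55, 54, 57]                       # observed series y1..y4

# Step 1: first differencing (I = 1)
diff = [y[t] - y[t - 1] for t in range(1, len(y))]  # [5, -1, 3]

# Step 2: AR(1) on the differenced series (assumed coefficients)
c, phi1 = 1.0, 0.7
ar_pred = c + phi1 * diff[-1]              # 1.0 + 0.7 * 3 = 3.1

# Step 3: MA(1) adjustment using the previous forecast error (assumed values)
theta1, eps_prev = 0.3, -1.0
adj_pred = ar_pred + theta1 * eps_prev     # 3.1 + 0.3 * (-1) = 2.8

# Step 4: reverse the differencing to return to the original scale
y5_hat = y[-1] + adj_pred                  # 57 + 2.8 = 59.8
print(diff, ar_pred, adj_pred, y5_hat)
```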
Overview
Objective: Now let's use a Recurrent Neural Network (RNN) to predict $y_5$ from exactly the same values as in the ARIMA example, and demonstrate how the model improves over multiple training iterations.
Given Time Series Data: $y_1 = 50,\; y_2 = 55,\; y_3 = 54,\; y_4 = 57$
Step 1: Data Preparation
1.1 Organize Data into Sequences
We create input-output pairs for training and prediction:
- Training Input Sequence: $(y_1, y_2, y_3) = (50, 55, 54)$
- Training Target Output: $y_4 = 57$
- Prediction Input Sequence: $(y_2, y_3, y_4) = (55, 54, 57)$, used later to forecast $y_5$
1.2 Reshape Data
Reshape the input sequence for the RNN into the shape (batch_size, time_steps, features) = (1, 3, 1): one sequence, three time steps, one feature per step.
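A small sketch of this preparation in NumPy (the array names are illustrative):

```python
import numpy as np

y = np.array([50.0, 55.0, 54.0, 57.0])   # y1..y4

# Training pair: inputs (y1, y2, y3), target y4
X_train = y[:3].reshape(1, 3, 1)          # shape (batch, time_steps, features)
y_train = y[3:].reshape(1, 1)             # shape (batch, 1)

# Prediction input: (y2, y3, y4), used to forecast y5
X_pred = y[1:].reshape(1, 3, 1)

print(X_train.shape, y_train.shape, X_pred.shape)  # (1, 3, 1) (1, 1) (1, 3, 1)
```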
Step 2: Define the RNN Model
2.1 Model Architecture
- Input Size: 1
- Hidden Units: 1
- Output Size: 1
2.2 Initialize Weights and Biases
The model has an input-to-hidden weight $W_{xh}$, a recurrent hidden-to-hidden weight $W_{hh}$, a hidden-to-output weight $W_{hy}$, and biases $b_h$ and $b_y$, all initialized to small values before training.
2.3 Activation Function
Use the hyperbolic tangent, $\tanh$, for the hidden state activation:

$$h_t = \tanh(W_{xh} y_t + W_{hh} h_{t-1} + b_h)$$

with a linear output layer:

$$\hat{y}_t = W_{hy} h_t + b_y$$
2.4 Initial Hidden State
Initial hidden state: $h_0 = 0$.
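Putting these pieces together, here is a minimal NumPy sketch of the forward pass for this one-unit RNN. The specific initial weight values are illustrative placeholders, not the ones behind the numbers quoted later.

```python
import numpy as np

# Illustrative initial parameters for a 1-input, 1-hidden-unit, 1-output RNN.
W_xh, W_hh, W_hy = 0.1, 0.2, 0.3   # input->hidden, hidden->hidden, hidden->output
b_h, b_y = 0.0, 0.0                # biases

def forward(sequence):
    """Run the sequence through the RNN and return the final output and hidden state."""
    h = 0.0                                          # initial hidden state h0 = 0
    for y_t in sequence:
        h = np.tanh(W_xh * y_t + W_hh * h + b_h)     # hidden state update
        y_hat = W_hy * h + b_y                       # output at this time step
    return y_hat, h                                  # keep only the last output (many-to-one)

y_hat3, h3 = forward([50.0, 55.0, 54.0])
print(y_hat3, h3)
```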
Step 3: Training Iterations
We will perform three training iterations to observe how the model improves.
Iteration 1
3.1 Forward Propagation
Time Step t = 1
- Input: $y_1 = 50$
- Compute Hidden State: $h_1 = \tanh(W_{xh} \cdot 50 + W_{hh} \cdot h_0 + b_h)$
- Compute Output: $\hat{y}_1 = W_{hy} h_1 + b_y$
Time Step t = 2
- Input: $y_2 = 55$
- Compute Hidden State: $h_2 = \tanh(W_{xh} \cdot 55 + W_{hh} \cdot h_1 + b_h)$
- Compute Output: $\hat{y}_2 = W_{hy} h_2 + b_y$
Time Step t = 3
- Input: $y_3 = 54$
- Compute Hidden State: $h_3 = \tanh(W_{xh} \cdot 54 + W_{hh} \cdot h_2 + b_h)$
- Compute Output: $\hat{y}_3 = W_{hy} h_3 + b_y$
Because the hidden unit applies $\tanh$ to raw inputs of this magnitude, the hidden states saturate very close to 1.
3.2 Loss Calculation
- Target Output: $y_4 = 57$
- Predicted Output: $\hat{y}_3$ from the forward pass
- Loss (squared error): $L = \tfrac{1}{2}(y_4 - \hat{y}_3)^2$
With the small initial weights, $\hat{y}_3$ is far from 57, so the initial loss is very large.
3.3 Adjust Output Scaling
To address the mismatch in output scale, the output parameters are rescaled so that $\hat{y}_3 = W_{hy} h_3 + b_y$ sits near the level of the series (around 55).
Recalculate Output $\hat{y}_3$: with the rescaled output, $\hat{y}_3 \approx 54.9985$.
Recalculate Loss: $L = \tfrac{1}{2}(57 - 54.9985)^2 \approx 2.003$.
3.4 Backpropagation Through Time (BPTT)
Compute Gradients
- Gradient w.r.t. $W_{hy}$: $\dfrac{\partial L}{\partial W_{hy}} = (\hat{y}_3 - y_4)\, h_3$
- Gradient w.r.t. $W_{hh}$ at t = 3: $\dfrac{\partial L}{\partial W_{hh}} = (\hat{y}_3 - y_4)\, W_{hy}\,(1 - h_3^2)\, h_2$
- Gradient w.r.t. $W_{xh}$ at t = 3: $\dfrac{\partial L}{\partial W_{xh}} = (\hat{y}_3 - y_4)\, W_{hy}\,(1 - h_3^2)\, y_3$
Total Gradients
Summing the per-time-step contributions gives the total gradient for each weight; in this simplified walk-through we keep only the $t = 3$ terms.
3.5 Update Weights
Using a small learning rate $\eta$, each weight is nudged against its gradient:

$$W \leftarrow W - \eta\, \frac{\partial L}{\partial W}$$
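For readers who prefer code, here is a self-contained sketch of one such iteration: forward pass, squared-error loss, truncated BPTT gradients (keeping only the t = 3 terms, as above), and a gradient-descent update. The initial parameter values and learning rate are illustrative, not the exact numbers behind the losses quoted in the text.

```python
import numpy as np

# Illustrative parameters (not the exact values behind the quoted losses).
W_xh, W_hh, W_hy = 0.1, 0.2, 0.3
b_h, b_y = 0.0, 55.0               # output bias set near the data scale (see 3.3)
eta = 0.01                          # learning rate

seq, target = [50.0, 55.0, 54.0], 57.0   # training sequence (y1, y2, y3) and target y4

# ----- forward pass -----
h_prev, hs = 0.0, []
for y_t in seq:
    h_prev = np.tanh(W_xh * y_t + W_hh * h_prev + b_h)
    hs.append(h_prev)
y_hat = W_hy * hs[-1] + b_y               # prediction after the last time step
loss = 0.5 * (target - y_hat) ** 2
print("loss before update:", loss)

# ----- truncated BPTT (only the t = 3 contribution, as in the walk-through) -----
d_yhat = y_hat - target                    # dL/d y_hat
d_Why  = d_yhat * hs[-1]                   # dL/dW_hy
d_h3   = d_yhat * W_hy * (1.0 - hs[-1] ** 2)
d_Whh  = d_h3 * hs[-2]                     # dL/dW_hh at t = 3
d_Wxh  = d_h3 * seq[-1]                    # dL/dW_xh at t = 3

# ----- gradient-descent update -----
W_hy -= eta * d_Why
W_hh -= eta * d_Whh
W_xh -= eta * d_Wxh
```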
Iteration 2
4.1 Forward Propagation
The forward pass is repeated with the updated weights: the inputs $y_1 = 50$, $y_2 = 55$, $y_3 = 54$ are fed in at time steps $t = 1, 2, 3$, the hidden states $h_1, h_2, h_3$ are recomputed with $h_t = \tanh(W_{xh} y_t + W_{hh} h_{t-1} + b_h)$, and the final output $\hat{y}_3 = W_{hy} h_3 + b_y$ comes out slightly closer to the target than in Iteration 1.
4.2 Loss Calculation
Calculate Loss: $L = \tfrac{1}{2}(y_4 - \hat{y}_3)^2 \approx 1.962$
Note: The loss has decreased from 2.003 to 1.962.
4.3 Backpropagation Through Time (BPTT)
Compute gradients similarly to Iteration 1 but with updated values.
- Gradient w.r.t. $W_{hy}$: $(\hat{y}_3 - y_4)\, h_3$
- Gradient w.r.t. $W_{hh}$ at t = 3: $(\hat{y}_3 - y_4)\, W_{hy}\,(1 - h_3^2)\, h_2$
- Gradient w.r.t. $W_{xh}$ at t = 3: $(\hat{y}_3 - y_4)\, W_{hy}\,(1 - h_3^2)\, y_3$
4.4 Update Weights
Apply the same gradient-descent update, $W \leftarrow W - \eta\, \partial L / \partial W$, to each weight.
Iteration 3
5.1 Forward Propagation
The forward pass is run once more with the weights updated in Iteration 2: the inputs $y_1 = 50$, $y_2 = 55$, $y_3 = 54$ are fed in at time steps $t = 1, 2, 3$, the hidden states are recomputed with $h_t = \tanh(W_{xh} y_t + W_{hh} h_{t-1} + b_h)$, and the final output $\hat{y}_3 = W_{hy} h_3 + b_y$ moves a little closer to the target $y_4 = 57$.
5.2 Loss Calculation
Calculate Loss: $L = \tfrac{1}{2}(y_4 - \hat{y}_3)^2 \approx 1.9215$
Note: The loss has decreased from 1.962 to 1.9215.
5.3 Backpropagation Through Time (BPTT)
The gradients with respect to $W_{hy}$, $W_{hh}$, and $W_{xh}$ are computed exactly as in the previous iterations, using the Iteration 3 hidden states and output.
5.4 Update Weights
The weights receive one more gradient-descent step, $W \leftarrow W - \eta\, \partial L / \partial W$, completing the third training iteration.
Step 4: Prediction
After three iterations, we use the updated weights to predict $y_5$.
4.1 Prediction Input Sequence
Input sequence: $(y_2, y_3, y_4) = (55, 54, 57)$
4.2 Forward Propagation
Time Step t = 1
- Input: $y_2 = 55$
- Compute Hidden State: $h_1 = \tanh(W_{xh} \cdot 55 + W_{hh} \cdot h_0 + b_h)$
Time Step t = 2
- Input: $y_3 = 54$
- Compute Hidden State: $h_2 = \tanh(W_{xh} \cdot 54 + W_{hh} \cdot h_1 + b_h)$
Time Step t = 3
- Input: $y_4 = 57$
- Compute Hidden State: $h_3 = \tanh(W_{xh} \cdot 57 + W_{hh} \cdot h_2 + b_h)$
- Compute Output: $\hat{y}_3 = W_{hy} h_3 + b_y$
4.3 Predicted $y_5$
With the weights learned after three iterations, the final output is $\hat{y}_5 \approx 55.0590$.
Conclusion
- Initial Loss: 2.003
- Loss after Iteration 3: 1.9215
- Predicted $y_5$: Improved from 54.9985 to 55.0590
Final Thoughts
By performing multiple training iterations, we observed the following:
- Decrease in Loss: The loss gradually decreased with each iteration, indicating the model is learning.
- Improved Predictions: The predicted value for $y_5$ became more accurate over the iterations.
Key Takeaways
- Gradient Descent: Repeatedly updating weights using gradients reduces the loss.
- Learning Rate: A small learning rate ensures stable convergence.
- RNN Capability: Even with limited data, the RNN adjusts its weights to better fit the training data.
I hope this extended explanation, with its multiple training iterations, gives a clearer picture of how an RNN learns and improves its predictions over time. The ARIMA and SARIMA sections above show how the same forecast can be produced by a very different, statistical route. A compact end-to-end version of the RNN walk-through follows.
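Below is a self-contained sketch that runs the toy example end to end: three training iterations on $(y_1, y_2, y_3) \rightarrow y_4$, followed by a prediction from $(y_2, y_3, y_4)$. The initial weights and learning rate are illustrative, so the printed losses will differ from the 2.003 / 1.962 / 1.9215 values quoted above, but the qualitative behaviour (falling loss, shifting prediction) is the same.

```python
import numpy as np

def run_toy_rnn(n_iters=3, eta=0.01):
    # Illustrative initial parameters for the 1-unit RNN.
    W_xh, W_hh, W_hy, b_h, b_y = 0.1, 0.2, 0.3, 0.0, 55.0

    train_seq, target = [50.0, 55.0, 54.0], 57.0   # (y1, y2, y3) -> y4
    pred_seq = [55.0, 54.0, 57.0]                   # (y2, y3, y4) -> y5

    def forward(seq, W_xh, W_hh, W_hy, b_h, b_y):
        h, hs = 0.0, []
        for y_t in seq:
            h = np.tanh(W_xh * y_t + W_hh * h + b_h)
            hs.append(h)
        return W_hy * hs[-1] + b_y, hs

    for i in range(1, n_iters + 1):
        y_hat, hs = forward(train_seq, W_xh, W_hh, W_hy, b_h, b_y)
        loss = 0.5 * (target - y_hat) ** 2

        # Truncated BPTT: keep only the t = 3 contribution, as in the text.
        d_yhat = y_hat - target
        d_Why = d_yhat * hs[-1]
        d_h3 = d_yhat * W_hy * (1.0 - hs[-1] ** 2)
        d_Whh = d_h3 * hs[-2]
        d_Wxh = d_h3 * train_seq[-1]

        W_hy -= eta * d_Why
        W_hh -= eta * d_Whh
        W_xh -= eta * d_Wxh
        print(f"iteration {i}: loss = {loss:.4f}")

    y5_hat, _ = forward(pred_seq, W_xh, W_hh, W_hy, b_h, b_y)
    print("predicted y5:", round(y5_hat, 4))

run_toy_rnn()
```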
Note to Understand RNN Steps in this Example
1. Sequence-to-One Prediction
Sequence Input: The RNN takes a sequence of inputs and processes them over time steps.
Single Output: It produces a single output after processing the entire sequence.
This is called a many-to-one or sequence-to-one prediction model.
2. Training Phase
Purpose: Teach the RNN to predict the next value in a sequence based on previous values.
Process:
- Provide sequences where the next value is known.
- The RNN learns patterns from these sequences.
What’s Being Calculated from the Sequence?
- From the input sequence $(y_1, y_2, y_3)$, we calculate the hidden states $h_1, h_2, h_3$, one at each time step.
- Using the hidden states and the learned weights, we compute the output $\hat{y}_t$ at each time step.
- The weights $W_{xh}$, $W_{hh}$, and $W_{hy}$ are learned during training and remain fixed during prediction.
Detailed Explanation
1. During Training
Goal: Learn the weights that minimize the difference between the predicted output and the actual target.
- Input Sequence: $(y_1, y_2, y_3) = (50, 55, 54)$
- Target Output: $y_4 = 57$
Process:
- Time Step $t = 1$:
  - Input: $y_1$
  - Compute Hidden State: $h_1 = \tanh(W_{xh} y_1 + W_{hh} h_0 + b_h)$
  - Compute Output: $\hat{y}_1 = W_{hy} h_1 + b_y$
- Time Step $t = 2$:
  - Input: $y_2$
  - Compute Hidden State: $h_2 = \tanh(W_{xh} y_2 + W_{hh} h_1 + b_h)$
  - Compute Output: $\hat{y}_2 = W_{hy} h_2 + b_y$
- Time Step $t = 3$:
  - Input: $y_3$
  - Compute Hidden State: $h_3 = \tanh(W_{xh} y_3 + W_{hh} h_2 + b_h)$
  - Compute Output: $\hat{y}_3 = W_{hy} h_3 + b_y$
Prediction: $\hat{y}_3$ is compared to $y_4$ to compute the loss.
Weights Update
Using backpropagation through time (BPTT), we compute gradients of the loss with respect to the weights.
Weights are updated to minimize the loss.
2. During Prediction
Goal: Use the learned weights to predict the next value based on a new input sequence.
- Input Sequence: $(y_2, y_3, y_4) = (55, 54, 57)$
- We use the same learned weights from training.
Process:
- Time Step $t = 1$:
  - Input: $y_2$
  - Compute Hidden State: $h_1 = \tanh(W_{xh} y_2 + W_{hh} h_0 + b_h)$
  - Compute Output (intermediate): $\hat{y}_1 = W_{hy} h_1 + b_y$
- Time Step $t = 2$:
  - Input: $y_3$
  - Compute Hidden State: $h_2 = \tanh(W_{xh} y_3 + W_{hh} h_1 + b_h)$
  - Compute Output (intermediate): $\hat{y}_2 = W_{hy} h_2 + b_y$
- Time Step $t = 3$:
  - Input: $y_4$
  - Compute Hidden State: $h_3 = \tanh(W_{xh} y_4 + W_{hh} h_2 + b_h)$
  - Compute Output: $\hat{y}_3 = W_{hy} h_3 + b_y$
Prediction: this final $\hat{y}_3$ is our predicted $y_5$.
Key Points:
- Hidden States $h_t$: Calculated from the input sequence and previous hidden states.
- Weights $W_{xh}$, $W_{hh}$, $W_{hy}$: Remain fixed during prediction; they are the result of training.
- Prediction: The final output is the RNN’s prediction for the next value in the sequence.
Summary
- From the sequence, we calculate the Hidden States $h_1, h_2, h_3$ at each time step.
- Using the Hidden States and the Weights, we compute the Outputs $\hat{y}_t$.
- During Training:
- We adjust the Weights to minimize the loss between $\hat{y}_3$ and the actual $y_4$.
- During Prediction:
- We use the learned Weights to compute $\hat{y}_3$, which is our predicted $y_5$.
Visualization
Training Phase
Here’s a visualization of the training phase:
```
Input Sequence:    y1     y2     y3
                    ↓      ↓      ↓
Time Steps:        t=1    t=2    t=3
                    ↓      ↓      ↓
Hidden States:     h1     h2     h3
                                  ↓
Output:                      \hat{y}_3   (compared to y4)
```
Weights are updated based on the loss between $\hat{y}_3$ and $y_4$.
Prediction Phase
Here’s a visualization of the prediction phase:
```
Input Sequence:    y2     y3     y4
                    ↓      ↓      ↓
Time Steps:        t=1    t=2    t=3
                    ↓      ↓      ↓
Hidden States:     h1     h2     h3
                                  ↓
Output:                      \hat{y}_3   (prediction for y5)
```
Weights remain fixed; we use them to compute the prediction.
Notes
- We are calculating the Hidden States $h_t$ from the input sequence.
- The Weights $W_{xh}$, $W_{hh}$, $W_{hy}$ are learned during training and are used (not recalculated) to compute the hidden states and outputs.
- The Hidden States and Weights together allow us to predict $y_4$ during training and $y_5$ during prediction.
Key Takeaways
- Sequence → Hidden States: The sequence of inputs is used to compute hidden states at each time step.
- Hidden States + Weights → Outputs: The hidden states and learned weights are used to compute outputs.
- Weights:
- Adjusted during training to minimize loss.
- Remain fixed during prediction.