This R session illustrates the different models for univariate and multivariate financial time series, in particular for the conditional mean and the conditional covariance matrix or volatility.

Mean models

This section explores conditional mean models.

Packages

The following R packages are widely used by the R community for mean modeling:

  • rugarch: [see intro to rugarch] probably the most comprehensive package, with a never-ending list of different ARMA and GARCH models, including different distributions for the residuals, as well as utilities for simulating, forecasting, and plotting;
  • forecast: also a popular package for ARMA modeling (see the book https://otexts.com/fpp2);
  • MTS: multivariate time series modeling based on Tsay’s book.

There are many other packages worth exploring such as prophet, tsDyn, vars, rucm, etc. For a full list of packages for time series modeling, check the CRAN Task View: Time Series Analysis.

i.i.d. model

We will start with the simple i.i.d. model as a warm-up. The i.i.d. model assumes an \(N\)-dimensional Gaussian time series for the log-returns \(\mathbf{x}_t\): \[ \mathbf{x}_t \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}). \] The sample estimators for the mean and covariance matrix are, respectively, the sample mean \[ \hat{\boldsymbol{\mu}} = \frac{1}{T}\sum_{t=1}^T \mathbf{x}_t \] and the sample covariance matrix \[ \hat{\boldsymbol{\Sigma}} = \frac{1}{T-1}\sum_{t=1}^T (\mathbf{x}_t - \hat{\boldsymbol{\mu}})(\mathbf{x}_t - \hat{\boldsymbol{\mu}})^T. \]

We start by generating synthetic data to become familiar with the procedure and to make sure the estimation process gives the correct result (i.e., as a sanity check). Only once everything is under control will we move to real market data and fit the different models.

Let’s generate synthetic i.i.d. data and estimate the mean and covariance matrix:
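A minimal sketch (the dimension N, sample size T, and the randomly generated mu and Sigma are illustrative choices, not values from the original session):

library(MASS)  # for mvrnorm()

# generate Gaussian i.i.d. data
set.seed(42)
N <- 5     # number of assets
T <- 1000  # number of observations
mu <- runif(N, min = -0.01, max = 0.01)      # true mean vector
U <- matrix(rnorm(N^2), N, N)
Sigma <- 0.01 * crossprod(U)                 # true covariance matrix (PSD by construction)
X <- mvrnorm(n = T, mu = mu, Sigma = Sigma)  # T x N matrix of synthetic log-returns

# sample estimators and their errors
mu_hat <- colMeans(X)
Sigma_hat <- cov(X)
sqrt(sum((mu_hat - mu)^2))    # error in the estimation of mu
norm(Sigma_hat - Sigma, "F")  # error in the estimation of Sigma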

Now, let’s do it again for different numbers of observations \(T\):
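Continuing the sketch above, we can sweep the sample size and watch both estimation errors decrease:

# estimation error as a function of the number of observations
T_sweep <- seq(100, 1000, by = 100)
error_mu <- error_Sigma <- NULL
for (T_ in T_sweep) {
  X_ <- X[1:T_, ]
  error_mu    <- c(error_mu,    sqrt(sum((colMeans(X_) - mu)^2)))
  error_Sigma <- c(error_Sigma, norm(cov(X_) - Sigma, "F"))
}
plot(T_sweep, error_mu,    type = "b", xlab = "T", main = "Error in estimation of mu")
plot(T_sweep, error_Sigma, type = "b", xlab = "T", main = "Error in estimation of Sigma")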

Univariate ARMA model

An ARMA(p,q) model on the log-returns \(x_t\) is \[ x_t = \phi_0 + \sum_{i=1}^p \phi_i x_{t-i} + w_t - \sum_{j=1}^q \theta_j w_{t-j} \] where \(w_t\) is a white noise series with zero mean and variance \(\sigma^2\). The parameters of the model are the coefficients \(\phi_i\), \(\theta_j\), and the noise variance \(\sigma^2\).

Note that an ARIMA(p,d,q) model is simply an ARMA(p,q) model on a time series that has been differenced \(d\) times. So if \(x_t\) denotes instead the log-prices, then the previous model on the log-returns is actually an ARIMA(p,1,q) model on the log-prices, because differencing the log-prices once gives the log-returns.

Synthetic data generation with package rugarch

We will use the package rugarch to generate synthetic univariate ARMA data, estimate the parameters, and forecast.

First, we need to define the model order:
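A possible specification with rugarch (the fixed parameter values are illustrative choices):

library(rugarch)

# specify an AR(1) model with fixed (known) parameters
arma_fixed_spec <- arfimaspec(mean.model = list(armaOrder = c(1,0), include.mean = TRUE),
                              fixed.pars = list(mu = 0.005, ar1 = -0.9, sigma = 0.1))
true_params <- unlist(arma_fixed_spec@model$fixed.pars)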

Then, we can generate one realization of the time series path:
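For instance, with arfimapath(), a sketch assuming T = 1000 (the simulated series lives in the path slot of the returned object):

# generate one realization of length T
T <- 1000
set.seed(42)
path_arma <- arfimapath(arma_fixed_spec, n.sim = T)
synth_log_returns <- as.numeric(path_arma@path$seriesSim)
plot(synth_log_returns, type = "l", main = "Synthetic ARMA log-returns")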

ARMA fitting with package rugarch

Now, we can estimate the parameters (which we already know):
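A sketch of the fit (same model order, but now with the parameters left free):

# specify the same model without fixing any parameter, then fit
arma_spec <- arfimaspec(mean.model = list(armaOrder = c(1,0), include.mean = TRUE))
arma_fit <- arfimafit(spec = arma_spec, data = synth_log_returns)
coef(arma_fit)  # compare with true_params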

We can also study the effect of the number of samples \(T\) on the error in the estimation of the parameters:
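One possible way, refitting on growing subsets of the synthetic path:

# relative estimation error as a function of the sample size
T_sweep <- seq(200, 1000, by = 200)
error_coef <- NULL
for (T_ in T_sweep) {
  fit_ <- arfimafit(spec = arma_spec, data = synth_log_returns[1:T_])
  error_coef <- rbind(error_coef, abs((coef(fit_) - true_params)/true_params))
}
rownames(error_coef) <- T_sweep
matplot(T_sweep, 100*error_coef, type = "b", pch = 20,
        xlab = "T", ylab = "relative error (%)")
legend("topright", legend = colnames(error_coef),
       col = 1:ncol(error_coef), lty = 1:ncol(error_coef))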

First, note that the true \(\mu\) is almost zero so the relative error may appear overly erratic. In any case, it is well-known that the estimation of \(\mu\) is very noisy. The other coefficients seem to be well estimated after \(T=800\) samples.

ARMA model selection with package rugarch

In the previous experiments, we implicitly assumed we knew the order of the ARMA model, i.e., \(p=1\) and \(q=0\). In practice, the order is unknown and one has to try different combinations of orders. The higher the order, the better the fit, but this will inevitably produce overfitting. Many methods have been developed to penalize the increase in model complexity and avoid overfitting, e.g., AIC, BIC, SIC, HQIC, etc.
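rugarch provides autoarfima() for this exhaustive search; a sketch using the synthetic series from before (the returned object should contain the best fit and a ranking of the candidate orders):

# exhaustive search over ARMA orders, ranked by an information criterion
sel <- autoarfima(data = synth_log_returns, ar.max = 3, ma.max = 3,
                  criterion = "BIC", method = "full")
head(sel$rank.matrix)  # candidate orders, best first
coef(sel$fit)          # the winning model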

In this case, the order was properly detected because the number of observations \(T=1000\) is large enough. If instead, one tries with \(T=200\), then the detected order is \(p=1, q=3\).

ARMA forecasting with package rugarch

Once the ARMA model parameters have been estimated, \(\hat{\phi}_i\) and \(\hat{\theta}_j\), one can use the model to forecast the values ahead. For example, the forecast of \(x_t\) based on the past information is \[ \hat{x}_t = \hat{\phi}_0 + \sum_{i=1}^p \hat{\phi}_i x_{t-i} - \sum_{j=1}^q \hat{\theta}_j w_{t-j} \] and the forecast error will be \(x_t - \hat{x}_t = w_t\) (assuming the parameters have been perfectly estimated), which has a variance of \(\sigma^2\). The package rugarch makes the forecast of the out-of-sample data straightforward:
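A sketch, refitting with the last T_tst observations kept out of sample and then producing rolling one-step-ahead forecasts:

# keep T_tst observations out of sample and forecast them one step at a time
T_tst <- 100
arma_fit <- arfimafit(spec = arma_spec, data = synth_log_returns, out.sample = T_tst)
arma_fore <- arfimaforecast(arma_fit, n.ahead = 1, n.roll = T_tst - 1)
forecast_log_returns <- as.numeric(fitted(arma_fore))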

Multivariate VARMA model

A VARMA(p,q) model on the log-returns \(\mathbf{x}_t\) is \[ \mathbf{x}_t = \boldsymbol{\phi}_0 + \sum_{i=1}^p \boldsymbol{\Phi}_i \mathbf{x}_{t-i} + \mathbf{w}_t - \sum_{j=1}^q \boldsymbol{\Theta}_j \mathbf{w}_{t-j} \] where \(\mathbf{w}_t\) is a white noise series with zero mean and covariance matrix \(\mathbf{\Sigma}_w\). The parameters of the model are the vector/matrix coefficients \(\boldsymbol{\phi}_0\), \(\boldsymbol{\Phi}_i\), \(\boldsymbol{\Theta}_j\), and the noise covariance matrix \(\mathbf{\Sigma}_w\).

The package MTS is probably a good choice for VARMA modeling.
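A minimal sketch of simulating and refitting a bivariate VAR(1) with MTS, using its VARMAsim() and VARMA() routines (the coefficient values are illustrative choices):

library(MTS)

# simulate a bivariate VAR(1) and fit it back
set.seed(42)
Phi <- matrix(c(0.5, 0.1, 0.2, 0.4), 2, 2)  # AR coefficient matrix
Sigma_w <- diag(c(0.01, 0.02))              # noise covariance matrix
sim <- VARMAsim(nobs = 500, arlags = 1, phi = Phi, sigma = Sigma_w)
fit <- VARMA(sim$series, p = 1, q = 0)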

Static comparison

Let’s start by loading the S&P500:
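A sketch using quantmod (the ticker, date range, and the 70/30 train/test split are illustrative choices):

library(xts)
library(quantmod)

# download S&P 500 prices and compute log-returns
SP500_prices <- Ad(getSymbols("^GSPC", from = "2010-01-01", to = "2015-12-31", auto.assign = FALSE))
logreturns <- diff(log(SP500_prices))[-1]

# train/test split
T <- nrow(logreturns)
T_trn <- round(0.7*T)
T_tst <- T - T_trn
logreturns_trn <- logreturns[1:T_trn]
logreturns_tst <- logreturns[-(1:T_trn)]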

Now, we use the training data (i.e., for \(t=1,\dots,T_\textsf{trn}\)) to fit different models (note that out-of-sample data is excluded by indicating out.sample = T_tst). In particular, we will consider the i.i.d. model, AR model, ARMA model, and also some ARCH and GARCH models (to be studied later in more detail for the variance modeling).

library(rugarch)

# fit i.i.d. model
iid_spec <- arfimaspec(mean.model = list(armaOrder = c(0,0), include.mean = TRUE))
iid_fit <- arfimafit(spec = iid_spec, data = logreturns, out.sample = T_tst)
coef(iid_fit)
#>           mu        sigma 
#> 0.0005712982 0.0073516993
mean(logreturns_trn)
#> [1] 0.0005681388
sd(logreturns_trn)
#> [1] 0.007360208

# fit AR(1) model
ar_spec <- arfimaspec(mean.model = list(armaOrder = c(1,0), include.mean = TRUE))
ar_fit <- arfimafit(spec = ar_spec, data = logreturns, out.sample = T_tst)
coef(ar_fit)
#>            mu           ar1         sigma 
#>  0.0005678014 -0.0220185181  0.0073532716

# fit ARMA(2,2) model
arma_spec <- arfimaspec(mean.model = list(armaOrder = c(2,2), include.mean = TRUE))
arma_fit <- arfimafit(spec = arma_spec, data = logreturns, out.sample = T_tst)
coef(arma_fit)
#>            mu           ar1           ar2           ma1           ma2         sigma 
#>  0.0007223304  0.0268612636  0.9095552008 -0.0832923604 -0.9328475211  0.0072573570

# fit ARMA(1,1) + ARCH(1) model
arch_spec <- ugarchspec(mean.model = list(armaOrder = c(1,1), include.mean = TRUE), 
                        variance.model = list(model = "sGARCH", garchOrder = c(1,0)))
arch_fit <- ugarchfit(spec = arch_spec, data = logreturns, out.sample = T_tst)
coef(arch_fit)
#>            mu           ar1           ma1         omega        alpha1 
#>  6.321441e-04  8.720929e-02 -9.391019e-02  4.898885e-05  9.986975e-02

# fit ARMA(0,0) + ARCH(10) model
long_arch_spec <- ugarchspec(mean.model = list(armaOrder = c(0,0), include.mean = TRUE), 
                             variance.model = list(model = "sGARCH", garchOrder = c(10,0)))
long_arch_fit <- ugarchfit(spec = long_arch_spec, data = logreturns, out.sample = T_tst)
coef(long_arch_fit)
#>           mu        omega       alpha1       alpha2       alpha3       alpha4       alpha5 
#> 7.490786e-04 2.452099e-05 6.888561e-02 7.207551e-02 1.419938e-01 1.909541e-02 3.082806e-02 
#>       alpha6       alpha7       alpha8       alpha9      alpha10 
#> 4.026539e-02 3.050040e-07 9.260183e-02 1.150128e-01 1.068426e-06

# fit ARMA(1,1) + GARCH(1,1) model
garch_spec <- ugarchspec(mean.model = list(armaOrder = c(1,1), include.mean = TRUE), 
                         variance.model = list(model = "sGARCH", garchOrder = c(1,1)))
garch_fit <- ugarchfit(spec = garch_spec, data = logreturns, out.sample = T_tst)
coef(garch_fit)
#>            mu           ar1           ma1         omega        alpha1         beta1 
#>  6.660346e-04  9.664597e-01 -1.000000e+00  7.066506e-06  1.257786e-01  7.470725e-01

We are ready to use the different models to forecast the log-returns:
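A sketch of the rolling one-step-ahead forecasts over the test period (no refitting yet; fitted() extracts the mean forecasts):

iid_fore      <- as.numeric(fitted(arfimaforecast(iid_fit,       n.ahead = 1, n.roll = T_tst - 1)))
ar_fore       <- as.numeric(fitted(arfimaforecast(ar_fit,        n.ahead = 1, n.roll = T_tst - 1)))
arma_fore     <- as.numeric(fitted(arfimaforecast(arma_fit,      n.ahead = 1, n.roll = T_tst - 1)))
arch_fore     <- as.numeric(fitted(ugarchforecast(arch_fit,      n.ahead = 1, n.roll = T_tst - 1)))
longarch_fore <- as.numeric(fitted(ugarchforecast(long_arch_fit, n.ahead = 1, n.roll = T_tst - 1)))
garch_fore    <- as.numeric(fitted(ugarchforecast(garch_fit,     n.ahead = 1, n.roll = T_tst - 1)))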

We can compute the forecast errors (in-sample and out-of-sample) of the different models:
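For example, the out-of-sample mean squared error (the in-sample errors would use fitted(fit) against the training data instead):

x_tst <- as.numeric(logreturns_tst)
MSE_oos <- c("i.i.d."               = mean((x_tst - iid_fore)^2),
             "AR(1)"                = mean((x_tst - ar_fore)^2),
             "ARMA(2,2)"            = mean((x_tst - arma_fore)^2),
             "ARMA(1,1)+ARCH(1)"    = mean((x_tst - arch_fore)^2),
             "ARCH(10)"             = mean((x_tst - longarch_fore)^2),
             "ARMA(1,1)+GARCH(1,1)" = mean((x_tst - garch_fore)^2))
print(MSE_oos)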

We can observe that, as the complexity of the models increases, the in-sample error tends to become smaller, as expected (due to the additional degrees of freedom to fit the data), albeit the differences are negligible. The important quantity is really the out-of-sample error: we can see that increasing the model complexity may give disappointing results. It seems that the simplest i.i.d. model is good enough in terms of the error of the forecast returns.

Finally, let’s show some plots of the out-of-sample error:

Observe that, since we are not refitting the models, the error tends to become worse as time advances (this is particularly pronounced for the ARCH model).

Rolling-window comparison

Let’s first compare the concepts of static forecast vs rolling forecast with a simple example:
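A sketch using the AR(1) fit from before: the static forecast predicts all T_tst steps ahead at once, whereas the rolling forecast predicts one step ahead at a time as new observations arrive:

# static: multi-step ahead from the end of the training period
static_fore  <- as.numeric(fitted(arfimaforecast(ar_fit, n.ahead = T_tst)))
# rolling: one-step ahead, sliding over the test period
rolling_fore <- as.numeric(fitted(arfimaforecast(ar_fit, n.ahead = 1, n.roll = T_tst - 1)))

plot(x_tst, type = "l", main = "Static vs rolling forecast")
lines(static_fore,  col = "blue")
lines(rolling_fore, col = "red")
legend("topright", legend = c("data", "static", "rolling"),
       col = c("black", "blue", "red"), lty = 1)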

We can clearly observe the effect of the rolling-window process in tracking the time series.

Now we can redo all the forecasts for all the models on a rolling-window basis:
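For the GARCH-type models, rugarch automates this via ugarchroll(); a sketch (refit.every = 50 is an illustrative choice, and arfimaroll() plays the same role for the pure ARMA specs):

# rolling 1-step forecasts with periodic refitting over a moving window
garch_roll <- ugarchroll(garch_spec, data = logreturns, n.ahead = 1,
                         forecast.length = T_tst, refit.every = 50,
                         refit.window = "moving")
garch_fore_roll <- as.data.frame(garch_roll)$Mu  # rolling mean forecasts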

Let’s see the forecast errors in the rolling-basis case:

and some plots:

We see that now all the models track the time series. Also, we don’t observe any significant difference among the models.

We can finally compare the static vs rolling basis errors:

We can see that rolling refitting not only does not hurt but in some cases is a must. Thus, in practice, we need to refit the models on a regular basis.

Variance models

Packages

The following R packages are widely used by the R community for volatility clustering modeling:

  • fGarch: this is a popular package for GARCH models;
  • rugarch: [see intro to rugarch] probably the most comprehensive package, with a never-ending list of different ARMA and GARCH models, including different distributions for the residuals, namely, the Gaussian/normal distribution, Student t distribution, generalized error distributions, skewed distributions, generalized hyperbolic distributions, generalized hyperbolic skew-Student distribution, and Johnson’s reparameterized SU distribution, as well as utilities for simulating, forecasting, and plotting. It includes the following GARCH models:
    • standard GARCH model (’sGARCH’)
    • integrated GARCH model (’iGARCH’)
    • exponential GARCH model (’eGARCH’)
    • GJR-GARCH model (’gjrGARCH’)
    • asymmetric power ARCH model (’apARCH’)
    • family GARCH model (’fGARCH’)
      • Absolute Value GARCH (AVGARCH) model (submodel = ’AVGARCH’)
      • GJR GARCH (GJRGARCH) model (submodel = ’GJRGARCH’)
      • Threshold GARCH (TGARCH) model (submodel = ’TGARCH’)
      • Nonlinear ARCH model (submodel = ’NGARCH’)
      • Nonlinear Asymmetric GARCH model (submodel = ’NAGARCH’)
      • Asymmetric Power ARCH model (submodel = ’APARCH’)
      • Full fGARCH model (submodel = ’ALLGARCH’)
    • Component sGARCH model (’csGARCH’)
    • Multiplicative Component sGARCH model (’mcsGARCH’)
    • realized GARCH model (’realGARCH’)
    • fractionally integrated GARCH model (’fiGARCH’);
  • rmgarch: extension of rugarch to the multivariate case. It includes the models: CCC, DCC, GARCH-Copula, GO-GARCH (vignette);
  • MTS: multivariate time series modeling based on Tsay’s book;
  • stochvol: stochastic volatility modeling based on MCMC (computationally intensive).

There are many other packages worth exploring. For a full list of packages for time series modeling, check the CRAN Task View: Time Series Analysis.

ARCH and GARCH models

An ARCH(\(m\)) model on the log-returns residuals \(w_t\) is \[ w_t = \sigma_t z_t \] where \(z_t\) is a white noise series with zero mean and constant unit variance, and the conditional variance \(\sigma_t^2\) is modeled as \[ \sigma_t^2 = \omega + \sum_{i=1}^m \alpha_i w_{t-i}^2 \] where \(m\) is the model order, and \(\omega>0,\alpha_i\ge0\) are the parameters.

A GARCH(\(m\),\(s\)) model extends the ARCH model with a recursive term on \(\sigma_t^2\): \[ \sigma_t^2 = \omega + \sum_{i=1}^m \alpha_i w_{t-i}^2 + \sum_{j=1}^s \beta_j \sigma_{t-j}^2 \] where the parameters \(\omega>0,\alpha_i\ge0,\beta_j\ge0\) need to satisfy \(\sum_{i=1}^m \alpha_i + \sum_{j=1}^s \beta_j < 1\) for stability.

Synthetic data generation with package rugarch

First, we need to define the model order:

library(rugarch)

# specify a GARCH model with given fixed parameters
garch_fixed_spec <- ugarchspec(mean.model = list(armaOrder = c(1,0), include.mean = TRUE), 
                               variance.model = list(model = "sGARCH", garchOrder = c(1,1)),
                               fixed.pars = list(mu = 0.005, ar1 = -0.9, 
                                                 omega = 0.001, alpha1 = 0.3, beta1 = 0.65))

garch_fixed_spec
#> 
#> *---------------------------------*
#> *       GARCH Model Spec          *
#> *---------------------------------*
#> 
#> Conditional Variance Dynamics    
#> ------------------------------------
#> GARCH Model      : sGARCH(1,1)
#> Variance Targeting   : FALSE 
#> 
#> Conditional Mean Dynamics
#> ------------------------------------
#> Mean Model       : ARFIMA(1,0,0)
#> Include Mean     : TRUE 
#> GARCH-in-Mean        : FALSE 
#> 
#> Conditional Distribution
#> ------------------------------------
#> Distribution :  norm 
#> Includes Skew    :  FALSE 
#> Includes Shape   :  FALSE 
#> Includes Lambda  :  FALSE

garch_fixed_spec@model$pars
#>           Level Fixed Include Estimate LB UB
#> mu        0.005     1       1        0 NA NA
#> ar1      -0.900     1       1        0 NA NA
#> ma        0.000     0       0        0 NA NA
#> arfima    0.000     0       0        0 NA NA
#> archm     0.000     0       0        0 NA NA
#> mxreg     0.000     0       0        0 NA NA
#> omega     0.001     1       1        0 NA NA
#> alpha1    0.300     1       1        0 NA NA
#> beta1     0.650     1       1        0 NA NA
#> gamma     0.000     0       0        0 NA NA
#> eta1      0.000     0       0        0 NA NA
#> eta2      0.000     0       0        0 NA NA
#> delta     0.000     0       0        0 NA NA
#> lambda    0.000     0       0        0 NA NA
#> vxreg     0.000     0       0        0 NA NA
#> skew      0.000     0       0        0 NA NA
#> shape     0.000     0       0        0 NA NA
#> ghlambda  0.000     0       0        0 NA NA
#> xi        0.000     0       0        0 NA NA

garch_fixed_spec@model$fixed.pars
#> $mu
#> [1] 0.005
#> 
#> $ar1
#> [1] -0.9
#> 
#> $omega
#> [1] 0.001
#> 
#> $alpha1
#> [1] 0.3
#> 
#> $beta1
#> [1] 0.65

true_params <- unlist(garch_fixed_spec@model$fixed.pars)
true_params
#>     mu    ar1  omega alpha1  beta1 
#>  0.005 -0.900  0.001  0.300  0.650

Then, we can generate one realization of the returns time series path:
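For instance, with ugarchpath(), where fitted() and sigma() extract the simulated returns and volatility (T = 1000 is an illustrative choice):

# generate one realization of length T
T <- 1000
set.seed(42)
garch_path <- ugarchpath(garch_fixed_spec, n.sim = T)
synth_log_returns <- as.numeric(fitted(garch_path))  # simulated returns
synth_volatility  <- as.numeric(sigma(garch_path))   # simulated sigma_t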

GARCH fitting with package rugarch

Now, we can estimate the parameters (which we already know):
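A sketch of the fit (same model order, but now with the parameters left free):

# specify the same model without fixing any parameter, then fit
garch_spec <- ugarchspec(mean.model = list(armaOrder = c(1,0), include.mean = TRUE),
                         variance.model = list(model = "sGARCH", garchOrder = c(1,1)))
garch_fit <- ugarchfit(spec = garch_spec, data = synth_log_returns)
coef(garch_fit)  # compare with true_params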

We can also study the effect of the number of samples \(T\) on the error in the estimation of the parameters:

First, note that the true \(\omega\) is almost zero, so the relative error is very erratic and should be dismissed. As for the other coefficients, like in the ARMA case, the estimation of \(\mu\) is really bad (above 50% relative error), whereas the other coefficients seem to be well estimated after \(T=800\) samples.

GARCH forecasting with package rugarch

Once the parameters of the GARCH model have been estimated, one can use the model to forecast the values ahead. For example, the one-step forecast of the conditional variance \(\sigma_t^2\) based on the past information is \[ \hat{\sigma}_t^2 = \hat{\omega} + \sum_{i=1}^m \hat{\alpha}_i w_{t-i}^2 + \sum_{j=1}^s \hat{\beta}_j \hat{\sigma}_{t-j}^2 \] with unconditional variance given by \(\hat{\omega}/(1 - \sum_{i=1}^m \hat{\alpha}_i - \sum_{j=1}^s \hat{\beta}_j)\). The package rugarch makes the forecast of the out-of-sample data straightforward:
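A sketch, analogous to the ARMA case (refit with out.sample > 0, then forecast; sigma() extracts the volatility forecasts):

T_tst <- 100
garch_fit <- ugarchfit(spec = garch_spec, data = synth_log_returns, out.sample = T_tst)
garch_fore <- ugarchforecast(garch_fit, n.ahead = 1, n.roll = T_tst - 1)
forecast_log_returns <- as.numeric(fitted(garch_fore))  # mean forecasts
forecast_volatility  <- as.numeric(sigma(garch_fore))   # volatility forecasts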

Envelope from different methods

Let’s start by loading the S&P500:
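As before, a sketch with quantmod; this time we keep the log-returns as a plain numeric vector x with T observations, which is what the rolling-window code further below assumes (the date range is an illustrative choice):

library(xts)
library(quantmod)

SP500_prices <- Ad(getSymbols("^GSPC", from = "2010-01-01", to = "2015-12-31", auto.assign = FALSE))
x <- as.numeric(diff(log(SP500_prices))[-1])  # log-returns as a numeric vector
T <- length(x)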

MA

Now, let’s use a simple rolling mean (aka moving average) of the squared returns: \[ \hat{y}_{t}=\frac{1}{m}\sum_{i=1}^{m}y_{t-i} \] with \(y_t=x_t^2\) (we could have used \(y_t=(x_t - \mu_t)^2\) but the difference is negligible).
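A sketch with RcppRoll (the window length m = 20 is an illustrative choice):

library(RcppRoll)

# rolling mean of the squared returns as a crude variance estimate
var_t_MA <- roll_meanr(x^2, n = 20, fill = NA)
plot(sqrt(var_t_MA), type = "l", main = "MA envelope")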

EWMA

A more adaptive version is the exponentially weighted moving average (EWMA): \[ \hat{y}_{t}=\alpha y_{t-1}+(1-\alpha)\hat{y}_{t-1} \] with \(y_t=x_t^2\). Note that this can also be modeled in component form as an ETS(A,N,N) innovations state space model: \[ \begin{aligned} y_{t} &= \ell_{t-1} + \varepsilon_t\\ \ell_{t} &= \ell_{t-1} + \alpha\varepsilon_t. \end{aligned} \]
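A sketch using ets() from the forecast package, which estimates \(\alpha\) by maximum likelihood:

library(forecast)

# EWMA of the squared returns via an ETS(A,N,N) fit
fit_ets <- ets(x^2, model = "ANN")
# clip any tiny negative fitted values before taking the square root
std_t_EWMA <- sqrt(pmax(as.numeric(fit_ets$fitted), 0))
plot(std_t_EWMA, type = "l", main = "EWMA envelope")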

Multiplicative ETS

We can also try different variations of the ETS models. For example, the multiplicative noise version ETS(M,N,N), with innovations state space model: \[ \begin{aligned} y_{t} &= \ell_{t-1}(1 + \varepsilon_t)\\ \ell_{t} &= \ell_{t-1}(1 + \alpha\varepsilon_t). \end{aligned} \]

ARCH

We can now use the more sophisticated ARCH modeling: \[ \begin{aligned} w_t &= \sigma_t z_t\\ \sigma_t^2 &= \omega + \sum_{i=1}^m \alpha_i w_{t-i}^2. \end{aligned} \]

GARCH

We can step up our model to a GARCH: \[ \begin{aligned} w_t &= \sigma_t z_t\\ \sigma_t^2 &= \omega + \sum_{i=1}^m \alpha_i w_{t-i}^2 + \sum_{j=1}^s \beta_j \sigma_{t-j}^2. \end{aligned} \]
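A sketch with fGarch (the ARCH envelope of the previous subsection is obtained analogously with ~ garch(5,0)):

library(fGarch)

# GARCH(1,1) envelope; the slot @sigma.t holds the fitted conditional std dev
garch_fit <- garchFit(formula = ~ garch(1,1), data = x, trace = FALSE)
std_t_GARCH <- as.numeric(garch_fit@sigma.t)
plot(std_t_GARCH, type = "l", main = "GARCH(1,1) envelope")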

SV

Finally, we can use the stochastic volatility modeling: \[ \begin{aligned} w_{t} &= \exp\left(h_{t}/2\right)z_{t}\\ h_{t}-\bar{h} &= \phi\left(h_{t-1}-\bar{h}\right)+u_{t}. \end{aligned} \] or, equivalently, \[ \begin{aligned} w_{t} &= \sigma_{t}z_{t}\\ \log\left(\sigma_{t}^{2}\right) &= \bar{h}+\phi\left(\log\left(\sigma_{t-1}^{2}\right)-\bar{h}\right)+u_{t}. \end{aligned} \]
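A sketch with stochvol (MCMC-based, so noticeably slower; the prior choice mirrors the rolling-window code below):

library(stochvol)

# stochastic volatility envelope on the demeaned returns
res <- svsample(x - mean(x), priormu = c(0, 100))
std_t_SV <- res$summary$sd[, 1]  # posterior mean of sigma_t
plot(std_t_SV, type = "l", main = "SV envelope")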

Rolling-window comparison

Rolling-window comparison of six methods: MA, EWMA, ETS(M,N,N), ARCH(5), GARCH(1,1), and SV.

library(RcppRoll)   # for roll_meanr()
library(forecast)   # for ets()
library(fGarch)     # for garchFit()
library(stochvol)   # for svsample()

error_sv <- error_garch <- error_arch <- error_ets_mnn <- error_ewma <- error_ma <- NULL
  
# rolling window
lookback <- 200
len_tst <- 40
for (i in seq(lookback, T-len_tst, by = len_tst)) {
  x_trn <- x[(i-lookback+1):i]
  x_tst <- x[(i+1):(i+len_tst)]
  var_tst <- var(x_tst)
  
  # MA
  var_t <- roll_meanr(x_trn^2, n = 20, fill = NA)
  var_fore <- var(x_trn/sqrt(var_t), na.rm = TRUE) * tail(var_t, 1)
  error_ma <- c(error_ma, abs(var_fore - var_tst))
  
  # EWMA
  fit_ets <- ets(x_trn^2, model = "ANN")
  std_t <- as.numeric(sqrt(fit_ets$fitted))
  var_fore <- var(x_trn/std_t, na.rm = TRUE) * tail(std_t, 1)^2
  error_ewma <- c(error_ewma, abs(var_fore - var_tst))
  
  # ETS(M,N,N)
  fit_ets <- ets(1e-6 + x_trn^2, model = "MNN")
  std_t <- as.numeric(sqrt(fit_ets$fitted))
  var_fore <- var(x_trn/std_t, na.rm = TRUE) * tail(std_t, 1)^2
  error_ets_mnn <- c(error_ets_mnn, abs(var_fore - var_tst))
  
  # ARCH
  arch_fit <- fGarch::garchFit(formula = ~ garch(5,0), x_trn, trace = FALSE)
  std_t <- as.numeric(arch_fit@sigma.t)
  var_fore <- var(x_trn/std_t, na.rm = TRUE) * tail(std_t, 1)^2
  error_arch <- c(error_arch, abs(var_fore - var_tst))
  
  # GARCH
  garch_fit <- fGarch::garchFit(formula = ~ garch(1,1), x_trn, trace = FALSE)
  std_t <- as.numeric(garch_fit@sigma.t)
  var_fore <- var(x_trn/std_t, na.rm = TRUE) * tail(std_t, 1)^2
  error_garch <- c(error_garch, abs(var_fore - var_tst))
  
  # SV
  res <- svsample(x_trn - mean(x_trn), priormu = c(0, 100), priornu = c(4, 100))
  std_t <- res$summary$sd[, 1]
  var_fore <- var(x_trn/std_t, na.rm = TRUE) * tail(std_t, 1)^2
  error_sv <- c(error_sv, abs(var_fore - var_tst))
}

error_all <- c("MA"         = mean(error_ma),
               "EWMA"       = mean(error_ewma),
               "ETS(M,N,N)" = mean(error_ets_mnn),
               "ARCH(5)"    = mean(error_arch),
               "GARCH(1,1)" = mean(error_garch),
               "SV"         = mean(error_sv))
print(error_all)
#>           MA         EWMA   ETS(M,N,N)      ARCH(5)   GARCH(1,1)           SV 
#> 1.786851e-04 2.218130e-04 2.081554e-04 1.690483e-04 1.642066e-04 9.588908e-05
barplot(error_all, main = "Error in estimation of variance", col = rainbow(6))

Multivariate GARCH models

We will consider only the constant conditional correlation (CCC) and dynamic conditional correlation (DCC) models for illustration purposes since they are the most popular ones. The log-returns residuals \(\mathbf{w}_t\) are modeled as \[ \mathbf{w}_t = \boldsymbol{\Sigma}_t^{1/2} \mathbf{z}_t \] where \(\mathbf{z}_t\) is an i.i.d. white noise series with zero mean and constant covariance matrix \(\mathbf{I}\). The conditional covariance matrix \(\boldsymbol{\Sigma}_t\) is modeled as \[ \boldsymbol{\Sigma}_t = \mathbf{D}_t\mathbf{C}\mathbf{D}_t \] where \(\mathbf{D}_t = \textsf{Diag}(\sigma_{1,t},\dots,\sigma_{N,t})\) and \(\mathbf{C}\) is the constant conditional correlation matrix of the standardized noise vector \(\boldsymbol{\eta}_t = \mathbf{D}_t^{-1}\mathbf{w}_t\) (i.e., its diagonal elements are equal to 1).

Basically, with this model, the diagonal matrix \(\mathbf{D}_t\) contains a set of univariate GARCH models and then the matrix \(\mathbf{C}\) incorporates some correlation among the assets. The main drawback of this model is that the matrix \(\mathbf{C}\) is constant. To overcome this, the DCC was proposed as \[ \boldsymbol{\Sigma}_t = \mathbf{D}_t\mathbf{C}_t\mathbf{D}_t \] where \(\mathbf{C}_t\) contains diagonal elements equal to 1. To enforce diagonal elements equal to 1, Engle modeled it as \[ \mathbf{C}_t = \textsf{Diag}^{-1/2}(\mathbf{Q}_t) \mathbf{Q}_t \textsf{Diag}^{-1/2}(\mathbf{Q}_t) \] where \(\mathbf{Q}_t\) has arbitrary diagonal elements and follows the model \[ \mathbf{Q}_t = \alpha \boldsymbol{\eta}_{t-1}\boldsymbol{\eta}_{t-1}^T + (1-\alpha)\mathbf{Q}_{t-1}. \]

We will use the package rmgarch, which is written by the same author as rugarch and is used in the same way, to generate synthetic data, estimate the parameters, and forecast.

Let’s start by loading some multivariate ETF data:

  • SPY: SPDR S&P 500 ETF Trust
  • TLT: iShares 20+ Year Treasury Bond ETF
  • IEF: iShares 7-10 Year Treasury Bond ETF
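A sketch of the data loading (the date range is an illustrative choice):

library(xts)
library(quantmod)

tickers <- c("SPY", "TLT", "IEF")
prices <- xts()
for (tk in tickers)
  prices <- cbind(prices, Ad(getSymbols(tk, from = "2013-01-01", to = "2016-12-31", auto.assign = FALSE)))
colnames(prices) <- tickers
logreturns <- diff(log(prices))[-1]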

First, we define the model:
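A sketch of a DCC(1,1) specification with one ARMA(0,0) + GARCH(1,1) model per asset (the univariate spec and the multivariate normal distribution are illustrative choices):

library(rmgarch)

# one univariate GARCH spec per asset, tied together by a DCC(1,1)
uspec <- ugarchspec(mean.model = list(armaOrder = c(0,0), include.mean = TRUE),
                    variance.model = list(model = "sGARCH", garchOrder = c(1,1)))
dcc_spec <- dccspec(uspec = multispec(replicate(3, uspec)),
                    dccOrder = c(1,1), distribution = "mvnorm")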

Next, we fit the model:
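Fitting follows the same pattern as in rugarch:

dcc_fit <- dccfit(dcc_spec, data = logreturns)
dcc_fit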

We can plot the time-varying correlations:
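A sketch: rcor() returns an \(N\times N\times T\) array of conditional correlation matrices, from which we can extract the three pairwise series:

R_t <- rcor(dcc_fit)
corr_t <- xts(cbind(R_t[1, 2, ], R_t[1, 3, ], R_t[2, 3, ]), order.by = index(logreturns))
colnames(corr_t) <- c("SPY--TLT", "SPY--IEF", "TLT--IEF")
plot(corr_t, main = "Time-varying correlations", legend.loc = "left")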

We see that the correlation between the two fixed-income ETFs (TLT and IEF) is extremely high and quite stable. Their correlation with SPY is less so, shifting between zero and quite negative values.