MAFS6010R - Portfolio Optimization with R
MSc in Financial Mathematics
Fall 2018-19, HKUST, Hong Kong

This R session will illustrate sparse index tracking, i.e., the tracking of an index with a sparse portfolio.

The R package **sparseIndexTracking**, available in GitHub and CRAN, is recommended.

(Useful R links: Cookbook R, Quick-R, R documentation, CRAN, METACRAN.)

Fund managers follow two types of investment strategies: active and passive. Active investments assume that the market is not perfectly efficient and try to beat the market, whereas passive investments assume that the market cannot be beaten in the long run.

Index tracking is a popular form of passive portfolio management strategy. The goal is to construct a portfolio that replicates the performance of a financial index. The following are comprehensive references on which the R package **sparseIndexTracking** is based:

K. Benidis, Y. Feng, and D. P. Palomar,

Optimization Methods for Financial Index Tracking: From Theory to Practice. Foundations and Trends in Optimization, Now Publishers, 2018.

K. Benidis, Y. Feng, and D. P. Palomar, “Sparse portfolios for high-dimensional financial index tracking,”

IEEE Trans. Signal Process., vol. 66, no. 1, pp. 155–170, 2018.

In words, the goal is to minimize some error measure of the index tracking using a reduced basket of representative assets as opposed to using all the assets (which would be problematic due to transaction costs and execution logistics).

To define the problem mathematically, let us introduce some definitions. Supposed that an index is composed of \(N\) assets. We denote by \(\mathbf{r}^b=[r_1^b,\dots,r_T^b]^\top\in\mathbb{R}^T\) and \(\mathbf{X}=[\mathbf{r}_1,\dots,\mathbf{r}_T]^\top\in\mathbb{R}^{T\times N}\) the (arithmetic) net returns of the index and the \(N\) assets in the past \(T\) days, respectively, with \(\mathbf{r}_t\in\mathbb{R}^N\) denoting the net returns of the \(N\) assets at the \(t\)-th day.

The goal in an optimization form is the design of a (sparse) portfolio \(\mathbf{w}\in\mathbb{R}_+^N\), with \(\mathbf{w}^\top\mathbf{1} = 1\), that tracks closely the index, i.e., \(\mathbf{X}\mathbf{w} \approx \mathbf{r}^b\). A typical measure of tracking error is the empirical tracking error: \[\textsf{ETE}(\mathbf{w}) = \frac{1}{T}\big\|\mathbf{r}^b - \mathbf{X}\mathbf{w}\big\|_2.\] The underlying optimization problem is \[\begin{array}{ll} \underset{\mathbf{w}}{\text{minimize}} & \frac{1}{T}\big\|\mathbf{r}^b - \mathbf{X}\mathbf{w}\big\|_2^2 + \lambda\|\mathbf{w}\|_0\\ \textsf{subject to} & \mathbf{w}^\top\mathbf{1}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}, \end{array}\]

where \(\lambda\) is a regularization parameter that controls the sparsity of the portfolio, and \(u\) is an upper bound on the weights of the portfolio.

To warm up, we start with some simple benchmarks.

A simple design for sparse index tracking is to decompose the task into two steps: asset selection (choosing the active weights) and capital allocation (optimization of weights). We will assume we know the portfolio implemented by the actual index, which will be a dense portfolio (as opposed to sparse) \(\mathbf{b}\) such that \(\mathbf{X}\mathbf{b} = \mathbf{r}^b\). In practice, it is expensive to get the information on \(\mathbf{b}\), so we will just regress it.

**Step 1: Asset selection**

We can define the selection of \(K\) assets through the vector \(\mathbf{s}\), defined as \[s_i=\begin{cases}
1\quad & \text{if the }j\text{-th asset is selected}\\
0 & \text{otherwise},
\end{cases}\] with \(\mathbf{1}^T\mathbf{s}=K\). For example, we can simply choose the \(K\) largest assets of \(\mathbf{b}\).

**Step 2: Capital allocation**

Once the active assets are known, then the portfolio design is simple. One option is to define the portfolio in the same proportion as the index composition (but only using the active assets): \[\mathbf{w}=\frac{\mathbf{b}\odot\mathbf{s}}{\mathbf{1}^T(\mathbf{b}\odot\mathbf{s})}\] where \(\odot\) denotes Hadamard (elementwise) product. Another option is to fit the index returns subject to the sparsity pattern given by the asset selection: \[\begin{array}{ll}
\underset{\mathbf{w}}{\text{minimize}} & \frac{1}{T}\big\|\mathbf{r}^b - \mathbf{X}(\mathbf{w}\odot\mathbf{s})\big\|_2^2\\
\textsf{subject to}
& \mathbf{1}^\top\mathbf{w}=1\\
& \mathbf{w}\ge\mathbf{0}.
\end{array}\]

MIP finds the best combination by trying all possible combinations of \(K\) active assets out of \(N\), i.e., \(\binom{N}{K}\) combinations. However, this approach is only practical for small dimensions, e.g., even choosing \(K=20\) assets out of \(N=100\) is infeasible since \(\binom{100}{20}>10^{20}\).

Genetic algorithms is a family of meta-optimization methods that encode a candidate solution in a genome and then they let a whole population of candidate solutions to evolve from generation to generation imitating the natural selection process of life itself. They are also computationally too intensive.

We will illustrate the two-step approach. We start by loading the market data of the index S&P 500 and its underlying assets during 2010 (from the package **sparseIndexTracking**):

```
library(xts)
library(sparseIndexTracking) #install.packages("sparseIndexTracking")
data(INDEX_2010)
T <- nrow(INDEX_2010$X)
N <- ncol(INDEX_2010$X)
X <- INDEX_2010$X
r <- INDEX_2010$SP500
# training and test sets
T_trn <- round(0.7*T) # define the training set
T_tst <- T - T_trn
X_trn <- X[1:T_trn, ]
r_trn <- r[1:T_trn, ]
X_tst <- X[-(1:T_trn), ]
r_tst <- r[-(1:T_trn), ]
```

The data `INDEX_2010`

contains a list with two xts objects:

**1.** `X`

: A \(T\times N\) `xts`

with the daily linear returns of the \(N\) assets that were in the index during the year 2010 (total \(T\) trading days)

**2.** `SP500`

: A \(T\times 1\) `xts`

with the daily linear returns of the index S&P 500 during the same period.

Since we don’t know the index composition \(\mathbf{b}\), we will get an estimate by solving the constrained regression \[\begin{array}{ll} \underset{\mathbf{b}}{\t