MAFS6010R - Portfolio Optimization with R
MSc in Financial Mathematics
Hong Kong University of Science and Technology (HKUST)
Fall 2019-20

Outline

  • Introduction

  • Sparse Index Tracking

    • Problem formulation
    • Interlude: Majorization-Minimization (MM) algorithm
    • Resolution via MM
  • Holding Constraints and Extensions

    • Problem formulation
    • Holding constraints via MM
    • Extensions
  • Numerical Experiments

  • Conclusions

Introduction

Investment strategies

Fund managers follow two basic investment strategies:


Active

  • Assumption: markets are not perfectly efficient.
  • Through expertise, add value by choosing high-performing assets.

Passive

  • Assumption: the market cannot be beaten in the long run.
  • Conform to a defined set of criteria (e.g. achieve same return as an index).

Passive investment

The stock markets have historically risen, e.g. S&P 500:

  • Partly misleading: e.g. inflation.
  • Still, reasonable returns can be obtained without the risks of active management.
  • Makes passive investment more attractive.

Index tracking

  • Index tracking is a popular passive portfolio management strategy.
  • Goal: construct a portfolio that replicates the performance of a financial index.

Index tracking

  • Index tracking or benchmark replication is an investment strategy aimed at mimicking the risk/return profile of a financial instrument.

  • For practical reasons, the strategy focuses on a reduced basket of representative assets.

  • The problem is also regarded as portfolio compression and it is intimately related to compressed sensing and \(\ell_{1}\)-norm minimization techniques (Benidis et al. 2018a), (Benidis et al. 2018b).

  • One example is the replication of an index, e.g., Hang Seng Index, based on a reduced basket of assets.

Definitions

  • Price and return of an asset or an index: \(p_{t}\) and \(r_{t}=\frac{p_{t}-p_{t-1}}{p_{t-1}}\)
  • Returns of an index in \(T\) days: \(\boldsymbol{\mathbf{r}}^{b}=[r_{1}^{b},\dots,r_{T}^{b}]^{\top}\in\mathbb{R}^{T}\)
  • Returns of \(N\) assets in \(T\) days: \(\mathbf{X}=[\mathbf{r}_{1},\dots,\mathbf{r}_{T}]^{\top}\in\mathbb{R}^{T\times N}\) with \(\mathbf{r}_{t}\in\mathbb{R}^{N}\)
  • Assume that an index is composed of a weighted collection of \(N\) assets with normalized index weights \(\mathbf{b}\) satisfying
    • \(\mathbf{b}>\mathbf{0}\)
    • \(\mathbf{b}^{\top}\mathbf{1}=1\)
    • \(\mathbf{X}\mathbf{b}=\mathbf{r}^{b}\)
  • We want to design a (sparse) tracking portfolio \(\mathbf{w}\) satisfying
    • \(\mathbf{w}\geq\mathbf{0}\)
    • \(\mathbf{w}^{\top}\mathbf{1}=1\)
    • \(\mathbf{X}\mathbf{w}\approx\mathbf{r}^{b}\)
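
As a minimal sketch of these definitions (assuming hypothetical objects prices, a \((T+1)\times N\) matrix of daily closing prices, and index_prices, the corresponding index prices), the returns would be computed as:

X   <- diff(prices) / head(prices, -1)              # r_t = (p_t - p_{t-1})/p_{t-1}
r_b <- diff(index_prices) / head(index_prices, -1)  # index returns r^b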

Full replication

  • How should we select \(\mathbf{w}\)?
  • Straightforward solution: full replication \(\mathbf{w}=\mathbf{b}\)
    • Buy appropriate quantities of all the assets
    • Perfect tracking
  • But it has drawbacks:
    • We may be trying to hedge some given portfolio with just a few names (to simplify the operations)
    • We may want to deal properly with illiquid assets in the universe
    • We may want to control the transaction costs for small portfolios (AUM)

Sparse index tracking

  • How can we overcome these drawbacks?

    👉 Sparse index tracking.

  • Use a small number of assets: \(\mathsf{card}(\mathbf{w})<N\)
    • can allow hedging with just a few names
    • can avoid illiquid assets
    • can reduce transaction costs for small portfolios
  • Challenges:
    • Which assets should we select?
    • What should their relative weight be?

Sparse Index Tracking

Problem formulation

Sparse regression

Sparse regression: \[\begin{array}{ll} \underset{\mathbf{w}}{\mathsf{minimize}} & \left\Vert \mathbf{r}-\mathbf{X}\mathbf{w}\right\Vert _{2}+\lambda\left\Vert \mathbf{w}\right\Vert _{0} \end{array}\] tries to fit the observations by minimizing the error with a sparse solution.

Tracking error

  • Recall that \(\mathbf{b}\in\mathbb{R}^{N}\) represents the actual benchmark weight vector and \(\mathbf{w}\in\mathbb{R}^{N}\) denotes the replicating portfolio.

  • Investment managers seek to minimize the following tracking error (TE) performance measure: \[\textsf{TE}\left(\mathbf{w}\right) =\left(\mathbf{w}-\mathbf{b}\right)^{\top}\boldsymbol{\Sigma}\left(\mathbf{w}-\mathbf{b}\right)\] where \(\boldsymbol{\Sigma}\) is the covariance matrix of the asset returns.

  • In practice, however, the benchmark weight vector \(\mathbf{b}\) may be unknown and the error measure is defined in terms of market observations.

  • A common tracking measure is the empirical tracking error (ETE): \[\textsf{ETE}(\mathbf{w})=\frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|^2_{2}\]
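
As a small sketch (our own helper functions, assuming an \(N\times N\) asset covariance matrix Sigma, the return matrix X, and index returns r_b), the two measures can be computed as:

te  <- function(w, b, Sigma) as.numeric(t(w - b) %*% Sigma %*% (w - b))  # TE
ete <- function(w, X, r_b) (1/nrow(X)) * sum((X %*% w - r_b)^2)          # ETE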

Formulation for sparse index tracking

  • Problem formulation for sparse index tracking (Maringer and Oyewumi 2007): \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda\|\mathbf{w}\|_{0}\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W} \end{array} \]
    • \(\|\mathbf{w}\|_{0}\) is the \(\ell_{0}\)-“norm” and denotes \(\mathsf{card}(\mathbf{w})\)
    • \(\mathcal{W}\) is a set of convex constraints (e.g., \(\mathcal{W}=\{\mathbf{w}|\mathbf{w}\geq\mathbf{0},\mathbf{w}^{\top}\mathbf{1}=1\}\))
    • we will treat any nonconvex constraint separately


  • This problem is too difficult to deal with directly:

    👉 Discontinuous, non-differentiable, non-convex objective function.

Existing methods

  • Two-step approach:
    1. stock selection:
      • largest market capital
      • most correlated to the index
      • a combination cointegrated well with the index
    2. capital allocation:
      • naive allocation: proportional to the original weights
      • optimized allocation: usually a convex problem
  • Mixed Integer Programming (MIP)
    • practical only for small dimensions, e.g. \(\binom{100}{20}>10^{20}\) (see the quick check after this list).
  • Genetic algorithms
    • find approximate MIP solutions in reasonable time
    • but performance is worse and optimality cannot be certified.
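
A quick check in R shows why exhaustive subset enumeration is hopeless:

choose(100, 20)  # number of ways to pick 20 assets out of 100
R>> [1] 5.359834e+20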

Existing methods

The two-step approach is much worse than joint optimization:

R session: Index tracking with two-step approach

A simple design for sparse index tracking is to decompose the task into two steps: asset selection (choosing which assets are active) and capital allocation (optimizing their weights). We will assume we know the portfolio implemented by the actual index, which will be a dense (as opposed to sparse) portfolio \(\mathbf{b}\) such that \(\mathbf{X}\mathbf{b} = \mathbf{r}^b\). In practice, it is expensive to get the information on \(\mathbf{b}\), so we will estimate it by regression.

  • Step 1: Asset selection We can define the selection of \(K\) assets through the vector \(\mathbf{s}\), defined as \[s_i=\begin{cases} 1\quad & \text{if the }i\text{-th asset is selected}\\ 0 & \text{otherwise}, \end{cases}\] with \(\mathbf{1}^\top\mathbf{s}=K\). For example, we can simply choose the \(K\) assets with the largest weights in \(\mathbf{b}\).

  • Step 2: Capital allocation Once the active assets are known, then the portfolio design is simple. One option is to define the portfolio in the same proportion as the index composition (but only using the active assets): \[\mathbf{w}=\frac{\mathbf{b}\odot\mathbf{s}}{\mathbf{1}^T(\mathbf{b}\odot\mathbf{s})}\] where \(\odot\) denotes Hadamard (elementwise) product. Another option is to fit the index returns subject to the sparsity pattern given by the asset selection: \[\begin{array}{ll} \underset{\mathbf{w}}{\text{minimize}} & \frac{1}{T}\big\|\mathbf{r}^b - \mathbf{X}(\mathbf{w}\odot\mathbf{s})\big\|_2^2\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{w}\ge\mathbf{0}. \end{array}\]


Loading data
We start by loading the market data of the index S&P 500 and its underlying assets during 2010 (from the package sparseIndexTracking):

library(xts)
library(sparseIndexTracking)  #install.packages("sparseIndexTracking")
data(INDEX_2010)
T <- nrow(INDEX_2010$X)
N <- ncol(INDEX_2010$X)
X <- INDEX_2010$X
r <- INDEX_2010$SP500

# training and test sets
T_trn <- round(0.7*T)  # define the training set
T_tst <- T - T_trn
X_trn <- X[1:T_trn, ]
r_trn <- r[1:T_trn, ]
X_tst <- X[-(1:T_trn), ]
r_tst <- r[-(1:T_trn), ]

The data INDEX_2010 contains a list with two xts objects:
1. X: A \(T\times N\) xts with the daily linear returns of the \(N\) assets that were in the index during the year 2010 (total \(T\) trading days)
2. SP500: A \(T\times 1\) xts with the daily linear returns of the index S&P 500 during the same period.


Regressing the dense index portfolio
Since we don’t know the index composition \(\mathbf{b}\), we will get an estimate by solving the constrained regression \[\begin{array}{ll} \underset{\mathbf{b}}{\text{minimize}} & \frac{1}{T}\big\|\mathbf{r}^b - \mathbf{X}\mathbf{b}\big\|_2^2\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{b}=1\\ & \mathbf{b}\ge\mathbf{0}. \end{array}\]

library(CVXR)
b <- Variable(N)
prob <- Problem(Minimize(sum((as.matrix(r) - as.matrix(X) %*% b)^2)), 
                constraints = list(b >= 0, sum(b)==1))
result <- solve(prob)
b_SP500 <- as.vector(result$getValue(b))
barplot(b_SP500, main = "SP500 composition", xlab = "asset index")


Two-stage approach: weights proportional to \(\mathbf{b}\)
To obtain \(\mathbf{s}\), we can simply choose the largest \(K\) elements of \(\mathbf{b}\):

K <- 20
idx_Klargest <- sort(b_SP500, decreasing = TRUE, index.return = TRUE)$ix[1:K]
s <- rep(0, N)
s[idx_Klargest] <- 1

Then we can proceed to define the portfolio in the same proportion as the index composition:

w_2step_a <- rep(0, N)
w_2step_a[idx_Klargest] <- b_SP500[idx_Klargest]
w_2step_a <- w_2step_a/sum(w_2step_a)
barplot(w_2step_a, main = "Two-step design with proportional weights", xlab = "asset index")


Two-stage approach: fitted weights
We can also obtain the least squares design given the previous selection:

w <- Variable(N)
prob <- Problem(Minimize(sum((as.matrix(r_trn) - as.matrix(X_trn) %*% (w*s))^2)), 
                constraints = list(w >= 0, sum(w)==1))
result <- solve(prob)
w_2step_b <- as.vector(result$getValue(w))
barplot(w_2step_b, main = "Two-step design with fitted weights", xlab = "asset index")


Comparison
We can now compute the tracking errors of the two sparse designs and compare them with the dense index portfolio:

# tracking error
ETE <- c("dense b"                = (1/T)*norm(r - X %*% b_SP500, "2")^2,
         "two-step sparse prop b" = (1/T)*norm(r - X %*% w_2step_a, "2")^2,
         "two-step sparse LS"     = (1/T)*norm(r - X %*% w_2step_b, "2")^2)
ETE <- as.matrix(ETE)
colnames(ETE) <- "ETE"
print(ETE)
R>>                                 ETE
R>> dense b                8.753776e-09
R>> two-step sparse prop b 5.949458e-06
R>> two-step sparse LS     4.335993e-06

We can also visualize the tracking errors with a bar plot:

barplot(c(ETE), main = "Tracking error", ylab = "ETE", col = rainbow(4))

We can also plot the Cum-PnL:

library(PerformanceAnalytics)

# plot of wealth
ret_all <- cbind("SP500"                  = r,
                 "dense b"                = X %*% b_SP500,
                 "two-step sparse prop b" = X %*% w_2step_a,
                 "two-step sparse LS"     = X %*% w_2step_b)
{ chart.CumReturns(ret_all, main = sprintf("Cumulative P&L (N=%d, K=%d)", N, K), 
                   wealth.index = TRUE, legend.loc = "topleft")
  addEventLines(xts("training", index(X[T_trn])), srt=90, pos=2, lwd = 2, col = "black") }

Interlude: Majorization-Minimization (MM) algorithm

Interlude: Majorization-Minimization (MM)

  • Consider the following presumably difficult optimization problem: \[\begin{array}{ll} \underset{{\mathbf x}}{\textsf{minimize}} & f\left({\bf x}\right)\\ \textsf{subject to} & {\bf x}\in\mathcal{X}, \end{array}\] with \(\mathcal{X}\) being the feasible set and \(f\left({\bf x}\right)\) being continuous.

  • Idea: successively minimize a more manageable surrogate function \(u\left({\bf x},{\bf x}^{(k)}\right)\): \[{\bf x}^{(k+1)}=\arg\min_{{\bf x}\in\mathcal{X}}u\left({\bf x},{\bf x}^{(k)}\right),\] hoping that the sequence of minimizers \(\left\{ {\bf x}^{(k)}\right\}\) will converge to the optimal \({\bf x}^{\star}\).

  • Question: how to construct \(u\left({\bf x},{\bf x}^{(k)}\right)\)?

  • Answer: that’s more like an art (Sun et al. 2017).

Interlude on MM: surrogate/majorizer

  • Construction rule: \[\begin{array}{ll} u\left({\bf y},{\bf y}\right) &= f\left({\bf y}\right),\ \forall{\bf y}\in\mathcal{X}\\ u\left({\bf x},{\bf y}\right) &\geq f\left({\bf x}\right),\ \forall{\bf x},{\bf y}\in\mathcal{X}\\ \left.u'\left({\bf x},{\bf y};{\bf d}\right)\right|_{{\bf x}={\bf y}} &= f\;'\left({\bf y};{\bf d}\right),\ \forall{\bf d}\text{ with }{\bf y}+{\bf d}\in\mathcal{X}\\ u\left({\bf x},{\bf y}\right) &\text{is continuous in }{\bf x} \text{ and } {\bf y} \end{array}\]

Interlude on MM: Algorithm

Algorithm MM

Set \(k=0\) and initialize with a feasible point \({\bf x}^0\in \mathcal X\).
repeat

  • \({\bf x}^{(k+1)}=\arg\min_{{\bf x}\in {\mathcal X}} u\left({\bf x},{\bf x}^{(k)}\right)\)

  • \(k \gets k+1\)

until convergence
return \(\mathbf{x}^{(k)}\)
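
To make the MM idea concrete, here is a toy example of our own (not from the index-tracking problem): minimize \(f(x)=(x-1)^{2}+\lambda|x|\) by majorizing \(|x|\) at \(x^{(k)}\neq0\) with the quadratic \(\frac{x^{2}}{2|x^{(k)}|}+\frac{|x^{(k)}|}{2}\), which touches \(|x|\) at \(x^{(k)}\) and lies above it everywhere else. Each surrogate is then minimized in closed form:

lambda <- 1
x <- 1  # feasible starting point
for (k in 1:20)
  x <- 2*abs(x) / (2*abs(x) + lambda)  # closed-form minimizer of the surrogate
x
R>> [1] 0.5000002

The iterates converge to \(x^{\star}=0.5\), the soft-thresholding solution of the original problem.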

Interlude on MM: Convergence

  • Under some technical assumptions, every limit point of the sequence \(\left\{ {\bf x}^{(k)}\right\}\) is a stationary point of the original problem.


  • If we further assume that the level set \(\mathcal{X}^{0}=\left\{ {\bf x}|f\left({\bf x}\right)\leq f\left({\bf x}^{0}\right)\right\}\) is compact, then \[\lim_{k\to\infty}d\left({\bf x}^{(k)},\mathcal{X}^{\star}\right)=0,\] where \(\mathcal{X}^{\star}\) is the set of stationary points.

Resolution via MM

Sparse index tracking via MM

Approximation of the \(\ell_{0}\)-norm (indicator function): \[\rho_{p,{\color{red}\gamma}}(w)=\frac{\log(1+|w|/p)}{\log(1+{\color{red}\gamma}/p)}.\]

  • Good approximation in the interval \([-\gamma,\gamma]\).
  • Concave for \(w\ge0\).
  • So-called folded-concave for \(w\in\mathbb{R}\).
  • For our problem we set \(\gamma=u\), where \(u\leq1\) is an upper bound on the weights (we can always choose \(u=1\)).
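
As a minimal sketch of our own, we can visualize how decreasing \(p\) makes \(\rho_{p,\gamma}\) approach the \(\ell_0\) indicator:

gamma <- 1
w <- seq(-gamma, gamma, length.out = 1001)
plot(w, as.numeric(w != 0), type = "l", lty = 2, ylab = "approx. of indicator")
for (p in c(1e-1, 1e-2, 1e-3))
  lines(w, log(1 + abs(w)/p)/log(1 + gamma/p), col = "blue")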

Approximate formulation

  • Continuous and differentiable approximate formulation: \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda\mathbf{1}^{\top}\boldsymbol{\rho}_{p,u}(\mathbf{w})\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W} \end{array} \] where \(\boldsymbol{\rho}_{p,u}(\mathbf{w})=[\rho_{p,u}(w_{1}),\dots,\rho_{p,u}(w_{N})]^{\top}\).


  • This problem is still non-convex: \(\boldsymbol{\rho}_{p,u}(\mathbf{w})\) is concave for \(\mathbf{w}\geq\mathbf{0}\).


  • We will use MM to deal with the non-convex part.

Majorization of \(\boldsymbol{\rho}_{p,\gamma}\)

Lemma 1

The function \(\rho_{p,\gamma}(w)\), with \(w\geq0\), is upperbounded at \(w^{(k)}\) by the surrogate function \[h_{p,\gamma}(w,w^{(k)})=d_{p,\gamma}(w^{(k)})w+c_{p,\gamma}(w^{(k)}),\] where \[\begin{aligned} d_{p,\gamma}(w^{(k)}) &= \frac{1}{\log(1+\gamma/p)(p+w^{(k)})},\\ c_{p,\gamma}(w^{(k)}) &= \frac{\log\left(1+w^{(k)}/p\right)}{\log(1+\gamma/p)}-\frac{w^{(k)}}{\log(1+\gamma/p)(p+w^{(k)})} \end{aligned}\] are constants.

Proof of Lemma 1

  • The function \(\rho_{p,\gamma}(w)\) is concave for \(w\geq0\).
  • An upper bound is its first-order Taylor approximation at any point \(w_{0}\in\mathbb{R}_{+}\): \[\begin{aligned} \rho_{p,\gamma}(w) &= \frac{\log(1+w/p)}{\log(1+\gamma/p)}\\ & \leq \frac{1}{\log(1+\gamma/p)}\left[\log\left(1+w_{0}/p\right)+\frac{1}{p+w_{0}}(w-w_{0})\right] \\ & = \color{red}{\underbrace{\color{black}{\frac{1}{\log(1+\gamma/p)(p+w_{0})}}}_{d_{p,\gamma}}}\color{black}{w} \\ & \qquad + \color{red}{\underbrace{\color{black}{\frac{\log\left(1+w_{0}/p\right)}{\log(1+\gamma/p)}-\frac{w_0}{\log(1+\gamma/p)(p+w_{0})}}}_{c_{p,\gamma}}} \end{aligned}\]
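
A quick numerical check of Lemma 1 (our own sketch): the linear surrogate stays above \(\rho_{p,\gamma}\) on \(w\geq0\) and is tight at \(w^{(k)}\):

p <- 1e-3; gamma <- 1; w_k <- 0.05
rho <- function(w) log(1 + w/p) / log(1 + gamma/p)
d_k <- 1 / (log(1 + gamma/p) * (p + w_k))  # slope d_{p,gamma}(w_k)
c_k <- rho(w_k) - d_k * w_k                # intercept c_{p,gamma}(w_k)
w <- seq(0, 1, length.out = 1000)
all(d_k*w + c_k >= rho(w) - 1e-12)  # TRUE: the surrogate majorizes rho
d_k*w_k + c_k - rho(w_k)            # 0: equality at w_k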

Majorization of \(\boldsymbol{\rho}_{p,\gamma}\)

Iterative formulation via MM

  • Now in every iteration we need to solve the following problem: \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda{\mathbf{d}_{p,u}^{(k)}}^{\top}\mathbf{w}\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W} \end{array} \] where \(\mathbf{d}_{p,u}^{(k)}=\left[d_{p,u}(w_{1}^{(k)}),\dots,d_{p,u}(w_{N}^{(k)})\right]^{\top}\).


  • This problem is convex (actually a QP).


  • Requires a solver in each iteration.

Algorithm LAIT

Algorithm 1: Linear Approximation for the Index Tracking problem (LAIT)

Set \(k=0\) and choose \(\mathbf{w}^{(0)}\in\mathcal{W}\).

repeat

  • Compute \(\mathbf{d}_{p,u}^{(k)}\)

  • \({\bf w}^{(k+1)}=\arg\min_{{\bf w}\in\mathcal{W}} \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda{\mathbf{d}_{p,u}^{(k)}}^{\top}\mathbf{w}\)

  • \(k \gets k+1\)

until convergence
return \(\mathbf{w}^{(k)}\)

The big picture

R session: Sparse index tracking via MM (single majorization)

Recall our nonconvex and nondifferentiable optimization problem: \[\begin{array}{ll} \underset{\mathbf{w}}{\text{minimize}} & \frac{1}{T}\big\|\mathbf{r}^b - \mathbf{X}\mathbf{w}\big\|_2^2 + \lambda\|\mathbf{w}\|_0\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}. \end{array}\]

The idea now is to employ the Majorization-Minimization (MM) algorithm to iteratively solve a sequence of simpler surrogate problems.

The first step is to get rid of the nondifferentiability by replacing the cardinality operator \(\|\mathbf{w}\|_0\) with \(\mathbf{1}^{\top}\boldsymbol{\rho}_{p,\gamma}(\mathbf{w})\), where the \(\ell_{0}\)-norm (indicator function) is approximated by \[\rho_{p,\gamma}(w)=\frac{\log(1+|w|/p)}{\log(1+\gamma/p)}.\]

Now we can derive a majorizer of the nonconvex term \(\mathbf{1}^{\top}\boldsymbol{\rho}_{p,\gamma}(\mathbf{w})\) (i.e., derive a tangent upper bound) simply by linearizing the log function. At iteration \(k\), the resulting problem is the convex QP: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda{\mathbf{d}_{p,u}^{(k)}}^\top\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}, \end{array}\] where \(d_{p,\gamma}(w^{(k)})=\frac{1}{\log(1+\gamma/p)(p+w^{(k)})}\).

We are ready to implement the iterative MM approach:

# parameters
gamma <- 0.5; p <- 1e-3  # parameters of the log-based approximation of the l0-norm
lambda <- 6e-7  # this achieves a sparsity level of K=20
# MM loop
w <- rep(1/N, N)
obj_value <- (1/T_trn)*norm(r_trn - X_trn %*% w, "2")^2 + lambda * sum(log(1+abs(w)/p)/log(1+gamma/p))
while(1) {
  d <- 1/(log(1+gamma/p)*(p+w))
  w_prev <- w
  w <- Variable(N)
  prob <- Problem(Minimize((1/T_trn)*sum((as.matrix(r_trn) - as.matrix(X_trn) %*% w)^2) 
                           + lambda * t(d) %*% w), 
                  constraints = list(w >= 0, w <= 1, sum(w)==1))
  result <- solve(prob)
  w <- as.vector(result$getValue(w))
  obj_value <- c(obj_value, 
                 (1/T_trn)*norm(r_trn - X_trn %*% w, "2")^2 
                 + lambda * sum(log(1+abs(w)/p)/log(1+gamma/p)))
  if(norm(w-w_prev, "2")/norm(w_prev, "2") < 1e-3) break
}
w_MM_single <- w
plot(obj_value, type = "b", pch = 19, cex = .8, 
     main = "Convergence", xlab = "iteration", ylab = "objective value")

cat("Sparsity level K =", sum(w_MM_single > 1e-2))
R>> Sparsity level K = 20
barplot(w_MM_single, xlab = "asset index", 
        main = sprintf("Sparse portfolio via MM (sparsity level K=%d)", sum(w_MM_single > 1e-2)))

We can now compute the tracking error:

# tracking error
ETE <- rbind(ETE, 
             "MM-single" = (1/T)*norm(r - X %*% w_MM_single, "2")^2)
print(ETE)
R>>                                 ETE
R>> dense b                8.753776e-09
R>> two-step sparse prop b 5.949458e-06
R>> two-step sparse LS     4.335993e-06
R>> MM-single              1.895406e-06
barplot(c(ETE), main = "Tracking error", ylab = "ETE", col = rainbow(4))

As expected, the tracking error is much lower than the two-step approaches.

We can also plot the Cum-PnL:

# plot of wealth
ret_all <- cbind(ret_all, 
                 "MM-single" = X %*% w_MM_single)
{ chart.CumReturns(ret_all, main = sprintf("Cumulative P&L (N=%d, K=%d)", N, K), 
                   wealth.index = TRUE, legend.loc = "topleft")
  addEventLines(xts("training", index(X[T_trn])), srt=90, pos=2, lwd = 2, col = "black") }

Should we stop here?

  • Advantages:

    ✅ The problem is convex.
    ✅ Can be solved efficiently by an off-the-shelf solver.


  • Disadvantages:

    ❌ Needs to be solved many times (one for each iteration).
    ❌ Calling a solver many times increases significantly the running time.


  • Can we do something better?

    ✅ For specific constraint sets we can derive closed-form update algorithms!

Let’s rewrite the objective function

  • Expand the objective: \(\frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda{\mathbf{d}_{p,u}^{(k)}}^{\top}\mathbf{w}=\frac{1}{T}\mathbf{w}^{\top}\mathbf{X}^{\top}\mathbf{X}\mathbf{w}+\left(\lambda\mathbf{d}_{p,u}^{(k)}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}\right)^{\!\!\top}\!\!\mathbf{w}+\mathit{const.}\)


  • Further upper-bound it:

Lemma 2

Let \(\mathbf{L}\) and \(\mathbf{M}\) be real symmetric matrices such that \(\mathbf{M}\succeq\mathbf{L}\). Then, for any point \(\mathbf{w}^{(k)}\in\mathbb{R}^{N}\) the following inequality holds: \[\mathbf{w}^{\top}\mathbf{L}\mathbf{w}\leq\mathbf{w}^{\top}\mathbf{M}\mathbf{w}+2{\mathbf{w}^{(k)}}^{\!\top}\!\!\left(\mathbf{L}-\mathbf{M}\right)\mathbf{w}-{\mathbf{w}^{(k)}}^{\top}\!\!\left(\mathbf{L}-\mathbf{M}\right)\mathbf{w}^{(k)}.\] Equality is achieved when \(\mathbf{w}=\mathbf{w}^{(k)}\).
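
A numerical sanity check of Lemma 2 (our own sketch), with \(\mathbf{M}=\lambda_{\max}(\mathbf{L})\,\mathbf{I}\) so that \(\mathbf{M}\succeq\mathbf{L}\) holds by construction:

set.seed(42)
N <- 5
L <- crossprod(matrix(rnorm(N*N), N, N))/N   # random PSD matrix
M <- max(eigen(L)$values) * diag(N)          # M - L is PSD
w_k <- rnorm(N)
ub <- function(w)
  t(w) %*% M %*% w + 2*t(w_k) %*% (L - M) %*% w - t(w_k) %*% (L - M) %*% w_k
w <- rnorm(N)
c(lhs = t(w) %*% L %*% w, rhs = ub(w))        # lhs <= rhs
c(lhs = t(w_k) %*% L %*% w_k, rhs = ub(w_k))  # equal at w = w_k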

Let’s majorize the objective function

  • Based on Lemma 2:
    • Majorize the quadratic term \(\frac{1}{T}\mathbf{w}^{\top}\mathbf{X}^{\top}\mathbf{X}\mathbf{w}\).
    • In our case \(\mathbf{L}_{1}=\frac{1}{T}\mathbf{X}^{\top}\mathbf{X}\).
    • We set \(\mathbf{M}_{1}=\lambda_{\text{max}}^{(\mathbf{L}_{1})}\mathbf{I}\) so that \(\mathbf{M}_{1}\succeq\mathbf{L}_{1}\) holds.
  • The objective becomes: \[ \begin{aligned} & \mathbf{w}^{\top}\mathbf{L}_{1}\mathbf{w}+\left(\lambda{\mathbf{d}_{p,u}^{(k)}}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}\right)^{\top}\!\mathbf{w} \\ & \leq \mathbf{w}^{\top}\mathbf{M}_{1}\mathbf{w}+2{\mathbf{w}^{(k)}}^{\!\top}\!\left(\mathbf{L}_{1}-\mathbf{M}_{1}\right)\mathbf{w}-{\mathbf{w}^{(k)}}^{\!\top}\!\left(\mathbf{L}_{1}-\mathbf{M}_{1}\right)\mathbf{w}^{(k)} \\ & \qquad +\left(\lambda{\mathbf{d}_{p,u}^{(k)}}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}\right)^{\!\top}\!\mathbf{w} \\ & = \lambda_{\text{max}}^{(\mathbf{L}_{1})}\mathbf{w}^{\top}\mathbf{w}+\left(2\left(\mathbf{L}_{1}-\lambda_{\text{max}}^{(\mathbf{L}_{1})}\mathbf{I}\right)\mathbf{w}^{(k)}+\lambda{\mathbf{d}_{p,u}^{(k)}}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}\right)^{\!\top}\mathbf{w}+\mathit{const.} \end{aligned} \]

Specialized iterative formulation

The new optimization problem at the \((k+1)\)-th iteration becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{\top}\mathbf{w}+{\mathbf{q}_{1}^{(k)}}^{\top}\mathbf{w}\\ \textsf{subject to} & \begin{array}{l} \mathbf{w}^{\top}\mathbf{1}=1,\\ \mathbf{0}\leq\mathbf{w}\leq\mathbf{1}, \end{array} \biggl\}\mathcal{W} \end{array} \] where \[\mathbf{q}_{1}^{(k)}=\frac{1}{\lambda_{\text{max}}^{(\mathbf{L}_{1})}}\left(2\left(\mathbf{L}_{1}-\lambda_{\text{max}}^{(\mathbf{L}_{1})}\mathbf{I}\right)\mathbf{w}^{(k)}+\lambda{\mathbf{d}_{p,u}^{(k)}}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}\right).\]


  • This problem can be solved with a closed-form update algorithm.

Solution

Proposition 1

The optimal solution to the previous problem with \(u=1\) is: \[w_{i}^{\star}=\begin{cases} -\frac{\mu+q_{i}}{2},\quad & i\in\mathcal{A},\\ 0,\quad & i\notin\mathcal{A}, \end{cases}\] with \[\mu=-\frac{\sum_{i\in\mathcal{A}}q_{i}+2}{\mathit{card}(\mathcal{A})},\] and \[\mathcal{A}=\big\{ i\big|\mu+q_{i}<0\big\},\] where \(\mathcal{A}\) can be determined in \(O(\log(N))\) steps.
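
Here is a sketch of this closed-form update (our own implementation; for simplicity it finds \(\mathcal{A}\) with a linear scan over the active-set size rather than the \(O(\log(N))\) search):

solve_qp_simplex <- function(q) {
  # minimize w'w + q'w  s.t.  1'w = 1, 0 <= w <= 1  (Proposition 1, u = 1)
  N <- length(q); qs <- sort(q); csum <- cumsum(qs)
  for (j in 1:N) {
    mu <- -(csum[j] + 2)/j
    # consistent active set: the j smallest q's active, the rest inactive
    if (mu + qs[j] < 0 && (j == N || mu + qs[j+1] >= 0)) break
  }
  pmax(0, -(mu + q)/2)
}
w <- solve_qp_simplex(rnorm(10))
c(sum(w), min(w))  # sums to one and is nonnegative

Note that with \(\mathbf{w}\geq\mathbf{0}\) and \(\mathbf{w}^{\top}\mathbf{1}=1\), the upper bound \(w_i\leq1\) is automatically satisfied.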

Algorithm SLAIT

Algorithm 2: Specialized Linear Approximation for the Index Tracking problem (SLAIT)

Set \(k=0\) and choose \(\mathbf{w}^{(0)}\in\mathcal{W}\).

repeat

  • Compute \(\mathbf{q}_{1}^{(k)}\)

  • \({\bf w}^{(k+1)}=\arg\min_{{\bf w}\in\mathcal{W}} \mathbf{w}^{\top}\mathbf{w}+{\mathbf{q}_{1}^{(k)}}^{\top}\mathbf{w}\)

  • \(k \gets k+1\)

until convergence
return \(\mathbf{w}^{(k)}\)

The big picture


R session: Sparse index tracking via MM (double majorization)

Recall that after the application of the first majorization, we obtained the following convex optimization problem: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda{\mathbf{d}_{p,u}^{(k)}}^\top\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}, \end{array}\] which required the use of a solver at each iteration.

However, it is desirable to be able to solve the majorized problem in a simpler way. In our case, we can obtain a closed-form solution (we will focus on the simpler case \(u=1\)).

We first expand the objective: \[\frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda{\mathbf{d}_{p,u}^{(k)}}^{\top}\mathbf{w}=\frac{1}{T}\mathbf{w}^{\top}\mathbf{X}^{\top}\mathbf{X}\mathbf{w}+\left(\lambda\mathbf{d}_{p,u}^{(k)}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}\right)^{\!\top}\!\mathbf{w}+\mathit{const.}\] and then we majorize a second time: \[\begin{aligned} \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2} &+ \lambda{\mathbf{d}_{p,u}^{(k)}}^{\top}\mathbf{w} \\ & \le \lambda_{\text{max}}^{(\frac{1}{T}\mathbf{X}^{\top}\mathbf{X})}\mathbf{w}^{\top}\mathbf{w}+\left(2\left(\frac{1}{T}\mathbf{X}^{\top}\mathbf{X}-\lambda_{\text{max}}^{(\frac{1}{T}\mathbf{X}^{\top}\mathbf{X})}\mathbf{I}\right)\mathbf{w}^{(k)}+\lambda{\mathbf{d}_{p,u}^{(k)}}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}\right)^{\!\top}\mathbf{w}+\mathit{const.} \end{aligned}\]

Finally, the double-majorized problem can be written as \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{\top}\mathbf{w}+{\mathbf{q}_{1}^{(k)}}^{\top}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{0}\leq\mathbf{w}\leq \mathbf{1}, \end{array}\] where \[\mathbf{q}_{1}^{(k)}=\frac{1}{\lambda_{\text{max}}^{(\frac{1}{T}\mathbf{X}^{\top}\mathbf{X})}}\left(2\left(\frac{1}{T}\mathbf{X}^{\top}\mathbf{X}-\lambda_{\text{max}}^{(\frac{1}{T}\mathbf{X}^{\top}\mathbf{X})}\mathbf{I}\right)\mathbf{w}^{(k)}+\lambda{\mathbf{d}_{p,u}^{(k)}}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}\right).\] This problem can be solved again with a solver but also directly with a simple algorithm in a finite number of iterations.

We can now implement this double iterative MM approach (see https://cvxr.rbind.io/cvxr_functions/ for the list of functions in CVXR):

# parameters
gamma <- 0.5; p <- 1e-3  # parameters of the log-based approximation of the l0-norm
lambda <- 5.7e-7  # this achieves a sparsity level of K=20  (5.5e-7 gives K=22, 5.8e-7 gives K=19)
# MM loop
w <- rep(1/N, N)
obj_value <- (1/T_trn)*norm(r_trn - X_trn %*% w, "2")^2 + lambda * sum(log(1+abs(w)/p)/log(1+gamma/p))
XX <- t(X_trn) %*% X_trn  # constant across iterations, so compute it once
lmd_max <- max(eigen(XX/T_trn)$values)
while(1) {
  d <- 1/(log(1+gamma/p)*(p+w))
  q1 <- 1/lmd_max * (2*(XX/T_trn - lmd_max*diag(N)) %*% w + lambda*d - (2/T_trn)*t(X_trn)%*%r_trn)
  w_prev <- w
  w <- Variable(N)
  prob <- Problem(Minimize(sum_squares(w) + t(q1) %*% w), 
                  constraints = list(w >= 0, w <= 1, sum(w)==1))
  result <- solve(prob)
  w <- as.vector(result$getValue(w))
  obj_value <- c(obj_value, 
                 (1/T_trn)*norm(r_trn - X_trn %*% w, "2")^2 
                 + lambda * sum(log(1+abs(w)/p)/log(1+gamma/p)))
  if(norm(w-w_prev, "2")/norm(w_prev, "2") < 1e-5) break
}
w_MM_double <- w
plot(obj_value, type = "b", pch = 19, cex = .5, 
     main = "Convergence", xlab = "iteration", ylab = "objective value")

cat("Sparsity level K =", sum(w_MM_double > 1e-2))
R>> Sparsity level K = 21
barplot(w_MM_double, xlab = "asset index", 
        main = sprintf("Sparse portfolio via MM (sparsity level K=%d)", sum(w_MM_double > 1e-2)))

We can now compute the tracking error:

# tracking error
ETE <- rbind(ETE, 
             "MM-double" = (1/T)*norm(r - X %*% w_MM_double, "2")^2)
print(ETE)
R>>                                 ETE
R>> dense b                8.753776e-09
R>> two-step sparse prop b 5.949458e-06
R>> two-step sparse LS     4.335993e-06
R>> MM-single              1.895406e-06
R>> MM-double              2.076981e-06
barplot(c(ETE), main = "Tracking error", ylab = "ETE", col = rainbow(5))

We can also plot the Cum-PnL:

# plot of wealth
ret_all <- cbind(ret_all, 
                 "MM-double" = X %*% w_MM_double)
{ chart.CumReturns(ret_all, main = sprintf("Cumulative P&L (N=%d, K=%d)", N, K), 
                   wealth.index = TRUE, legend.loc = "topleft")
  addEventLines(xts("training", index(X[T_trn])), srt=90, pos=2, lwd = 2, col = "black") }

R session: Sparse index tracking via package sparseIndexTracking

Now we will use the package sparseIndexTracking:

library(sparseIndexTracking)
w_pkg <- spIndexTrack(X_trn, r_trn, lambda = 8.3e-7, u = gamma)

Portfolio:

cat("Sparsity level K =", sum(w_pkg > 1e-2))
R>> Sparsity level K = 20
barplot(w_pkg, xlab = "asset index", names.arg = rep("", N),
        main = sprintf("Sparse portfolio via MM (sparsity level K=%d)", sum(w_pkg > 1e-2)))

Tracking error:

# tracking error
ETE <- rbind(ETE, 
             "MM-pkg" = (1/T)*norm(r - X %*% w_pkg, "2")^2)
print(ETE)
R>>                                 ETE
R>> dense b                8.753776e-09
R>> two-step sparse prop b 5.949458e-06
R>> two-step sparse LS     4.335993e-06
R>> MM-single              1.895406e-06
R>> MM-double              2.076981e-06
R>> MM-pkg                 1.993646e-06
barplot(c(ETE), main = "Tracking error", ylab = "ETE", col = rainbow(6))

We can observe that the single MM performs best here, while the double MM is slightly worse because it did not have enough iterations to converge (convergence is slow in this implementation because we still call a solver at each iteration rather than using the efficient closed-form update; given more iterations it would converge). The package sparseIndexTracking implements the double MM with the efficient closed-form update, so it converges quickly and its final solution is essentially that of the single MM.
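
As a sketch, reusing the hypothetical solve_qp_simplex() helper sketched after Proposition 1, the CVXR call inside the double-MM loop could be replaced by the closed-form update:

# inside the double-MM loop, instead of building and solving a CVXR Problem:
w <- solve_qp_simplex(as.vector(q1))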

Holding Constraints and Extensions

Problem formulation

Holding constraints

  • In practice, the constraints that are usually considered in the index tracking problem can be written in a convex form.


  • Exception: holding constraints to avoid extreme positions or brokerage fees for very small orders \[\mathbf{l}\odot\mathcal{I}_{\{\mathbf{w}>\mathbf{0}\}}\leq\mathbf{w}\leq\mathbf{u}\odot{\mathcal{I}}_{\{\mathbf{w}>\mathbf{0}\}}\]

  • Active constraints only for the selected assets (\(w_{i}>0\)).


  • Upper bound is easy: \(\mathbf{w}\leq\mathbf{u}\odot{\mathcal{I}}_{\{\mathbf{w}>\mathbf{0}\}}\Longleftrightarrow\mathbf{w}\leq\mathbf{u}\) (convex and can be included in \(\mathcal{W}\)).
  • Lower bound is nasty. 😢

Problem formulation

The problem formulation with holding constraints becomes (after the \(\ell_{0}\)-“norm” approximation): \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda\mathbf{1}^{\top}\boldsymbol{\rho}_{p,u}(\mathbf{w})\\ {\textsf{subject to}} & \mathbf{w}\in\mathcal{W},\\ & \mathbf{l}\odot{\mathcal{I}}_{\{\mathbf{w}>\mathbf{0}\}}\leq\mathbf{w}. \end{array} \]


  • How should we deal with the non-convex constraint?

Penalization of violations

  • Hard constraint \(\Longrightarrow\) Soft constraint.

  • Penalize violations in the objective.

  • A suitable penalty function for a general entry \(w\) is (since the constraints are separable): \[f_{l}(w)=\left(\mathcal{I}_{\{0<w<l\}}\cdot l-w\right)^{+}.\]

  • Approximate the indicator function with \(\rho_{p,\gamma}(w)\). Since we are interested in the interval \([0,l]\) we select \(\gamma=l\): \[\tilde{f}_{p,l}(w)=\left(\rho_{p,l}(w)\cdot l-w\right)^{+}.\]

Penalization of violations

  • Penalty functions \(f_{l}(w)\) and \(\tilde{f}_{p,l}(w)\) for \(l=0.01, p=10^{-4}\):
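
A minimal sketch of our own to reproduce such a comparison:

l <- 0.01; p <- 1e-4
rho_pl  <- function(w) log(1 + w/p) / log(1 + l/p)
f_l     <- function(w) pmax((w > 0 & w < l)*l - w, 0)  # exact penalty
f_tilde <- function(w) pmax(rho_pl(w)*l - w, 0)        # indicator replaced by rho
w <- seq(0, 2*l, length.out = 1000)
plot(w, f_l(w), type = "l", lty = 2, ylab = "penalty")
lines(w, f_tilde(w), col = "blue")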

Problem Formulation with Penalty

The penalized optimization problem becomes: \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda\mathbf{1}^{\top}\boldsymbol{\rho}_{p,u}(\mathbf{w})+\boldsymbol{\nu}^{\top}\tilde{\mathbf{f}}_{p,l}(\mathbf{w})\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W} \end{array} \] where \(\boldsymbol{\nu}\) is a parameter vector that controls the penalization and \(\tilde{\mathbf{f}}_{p,l}(\mathbf{w})=[\tilde{f}_{p,l}(w_{1}),\dots,\tilde{f}_{p,l}(w_{N})]^{\top}\).

  • This problem is not convex:
    • \(\rho_{p,u}(w)\) is concave \(\Longrightarrow\) Linear upperbound with Lemma 1.
    • \(\tilde{f}_{p,l}(w)\) is neither convex nor concave. 😢

Holding constraints via MM

Majorization of \(\tilde{f}_{p,l}(w)\)

Lemma 3

The function \(\tilde{f}_{p,l}(w)=\left(\rho_{p,l}(w)\cdot l-w\right)^{+}\) is majorized at \(w^{(k)}\in[0,u]\) by the convex function \[h_{p,l}(w,w^{(k)})=\left(\left(d_{p,l}(w^{(k)})\cdot l-1\right)w+c_{p,l}(w^{(k)})\cdot l\right)^{+},\] where \(d_{p,l}(w^{(k)})\) and \(c_{p,l}(w^{(k)})\) are given in Lemma 1.


Proof: \(\rho_{p,l}(w)\leq d_{p,l}(w^{(k)})w+c_{p,l}(w^{(k)})\) for \(w\geq0\) [Lemma 1]. \[\begin{aligned} \tilde{f}_{p,l}(w) & = \max\left(\rho_{p,l}(w)\cdot l-w,0\right)\\ & \leq \max\left((d_{p,l}(w^{(k)})w+c_{p,l}(w^{(k)}))\cdot l-w,0\right)\\ & = \max\left(\left(d_{p,l}(w^{(k)})\cdot l-1\right)w+c_{p,l}(w^{(k)})\cdot l,0\right). \end{aligned}\] \(h_{p,l}(w,w^{(k)})\) is convex as the maximum of two convex functions.

Majorization of \(\tilde{f}_{p,l}(w)\)

  • Observe \(\tilde{f}_{p,l}(w)\) and its piecewise linear majorizer \(h_{p,l}(w,w^{(k)})\):

Convex formulation of the majorization

  • Recall our problem: \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda\mathbf{1}^{\top}\boldsymbol{\rho}_{p,u}(\mathbf{w})+\boldsymbol{\nu}^{\top}\tilde{\mathbf{f}}_{p,l}(\mathbf{w})\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W}. \end{array} \]

  • From Lemma 1: \(\color{red}{\boldsymbol{\rho}_{p,u}(\mathbf{w})\leq{\mathbf{d}_{p,u}^{(k)}}^{\top}\mathbf{w}+const.}\)

  • From Lemma 3: \[\begin{aligned} \color{blue}{\tilde{\mathbf{f}}_{p,l}(\mathbf{w}) = \left(\boldsymbol{\rho}_{p,l}(\mathbf{w})\cdot\mathbf{l}-\mathbf{w}\right)^{+}} & \color{blue}{\leq \left(\mathsf{Diag}\left(\mathbf{d}_{p,l}^{(k)}\odot\mathbf{l}-\mathbf{1}\right)\mathbf{w}+\mathbf{c}_{p,l}^{(k)}\odot\mathbf{l}\right)^{+}}\\ & \color{blue}{=\mathbf{h}_{p,l}(\mathbf{w},\mathbf{w}^{(k)})} \end{aligned}\]

  • The majorized problem at the \((k+1)\)-th iteration becomes: \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2} + \lambda \color{red}{\mathbf{d}_{p,u}^{(k)\top}\mathbf{w}} + \boldsymbol{\nu}^\top\color{blue}{\mathbf{h}_{p,l}(\mathbf{w},\mathbf{w}^{(k)})}\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W} \end{array} \] which is convex.

Algorithm LAITH

Algorithm 3: Linear Approximation for the Index Tracking problem with Holding constraints (LAITH)

Set \(k=0\) and choose \(\mathbf{w}^{(0)}\in\mathcal{W}\).

repeat

  • Compute \(\mathbf{d}_{p,l}^{(k)}\), \(\mathbf{d}_{p,u}^{(k)}\)

  • Compute \(\mathbf{c}_{p,l}^{(k)}\)

  • \({\bf w}^{(k+1)}=\arg\min_{{\bf w}\in\mathcal{W}} \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2} + \lambda \color{red}{\mathbf{d}_{p,u}^{(k)\top}\mathbf{w}} + \boldsymbol{\nu}^\top\color{blue}{\mathbf{h}_{p,l}(\mathbf{w},\mathbf{w}^{(k)})}\)

  • \(k \gets k+1\)

until convergence
return \(\mathbf{w}^{(k)}\)

The big picture

Should we stop here?



✅ Again, for specific constraint sets we can derive closed-form update algorithms!

Smooth approximation of the \((\cdot)^{+}\) operator

  • To get a closed-form update algorithm we need to majorize again the objective.

  • Let us begin with the majorization of the third term, i.e., \[\mathbf{h}_{p,l}(\mathbf{w},\mathbf{w}^{(k)})=\left(\mathsf{Diag}\left(\mathbf{d}_{p,l}^{(k)}\odot\mathbf{l}-\mathbf{1}\right)\!\mathbf{w}+\mathbf{c}_{p,l}^{(k)}\odot\mathbf{l}\right)^{\!+}.\]

    ✅ Separable: focus only on the univariate case, i.e., \(h_{p,l}(w,w^{(k)})\).

    ❌ Not smooth: cannot define majorization function at the non-differentiable point.

Smooth approximation of the \((\cdot)^{+}\) operator

  • Use a smooth approximation of the \((\cdot)^{+}\) operator: \[(x)^{+}\approx\frac{x+\sqrt{x^{2}+\epsilon^{2}}}{2},\] where \(0<\epsilon\ll1\) controls the approximation.


  • Apply this to \(h_{p,l}(w,w^{(k)})=\left(\left(d_{p,l}(w^{(k)})\cdot l-1\right)w+c_{p,l}(w^{(k)})\cdot l\right)^{+}\): \[\tilde{h}_{p,\epsilon,l}(w,w^{(k)})=\frac{\alpha^{(k)}w+\beta^{(k)}+\sqrt{(\alpha^{(k)}w+\beta^{(k)})^{2}+\epsilon^{2}}}{2},\] where \(\alpha^{(k)}=d_{p,l}(w^{(k)})\cdot l-1\) and \(\beta^{(k)}=c_{p,l}(w^{(k)})\cdot l\).
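
A small sketch of our own comparing \((x)^{+}\) with the smooth approximation for two values of \(\epsilon\):

x <- seq(-0.05, 0.05, length.out = 1000)
plot(x, pmax(x, 0), type = "l", lty = 2, ylab = "(x)^+ and approximation")
for (eps in c(0.02, 0.005))
  lines(x, (x + sqrt(x^2 + eps^2))/2, col = "blue")  # smaller eps, tighter fit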

Smooth majorization of \(\tilde{f}_{p,l}(w)\)

  • Penalty function \(\tilde{f}_{p,l}(w)\), its piecewise linear majorizer \(h_{p,l}(w,w^{(k)})\), and its smooth approximation \(\tilde{h}_{p,\epsilon,l}(w,w^{(k)})\):

Quadratic majorization of \(\tilde{h}_{p,\epsilon,l}(w,w^{(k)})\)

Lemma 4

The function \(\tilde{h}_{p,\epsilon,l}(w,w^{(k)})\) is majorized at \(w^{(k)}\) by the quadratic convex function \[q_{p,\epsilon,l}(w,w^{(k)})=a_{p,\epsilon,l}(w^{(k)})w^{2}+b_{p,\epsilon,l}(w^{(k)})w+c_{p,\epsilon,l}(w^{(k)}),\] where \(a_{p,\epsilon,l}(w^{(k)})= \frac{(\alpha^{(k)})^{2}}{2\kappa}\), \(b_{p,\epsilon,l}(w^{(k)})= \frac{\alpha^{(k)}\beta^{(k)}}{\kappa}+\frac{\alpha^{(k)}}{2}\), and \(c_{p,\epsilon,l}(w^{(k)})=\frac{(\alpha^{(k)}w^{(k)})(\alpha^{(k)}w^{(k)}+2\beta^{(k)})+2({\beta^{(k)}}^{2}+{\epsilon}^{2})}{2\kappa}+\frac{\beta^{(k)}}{2}\) is an optimization-irrelevant constant, with \(\kappa=2\sqrt{(\alpha^{(k)}w^{(k)}+\beta^{(k)})^{2}+\epsilon^{2}}\).


Proof: Majorize the square root term of \(\tilde{h}_{p,\epsilon,l}(w,w^{(k)})\) (concave) with its first-order Taylor approximation.

Quadratic majorization of \(\tilde{f}_{p,l}(w)\)

  • Penalty function \(\tilde{f}_{p,l}(w)\), its piecewise linear majorizer \(h_{p,l}(w,w^{(k)})\), its smooth majorizer \(\tilde{h}_{p,\epsilon,l}(w,w^{(k)})\), and its quadratic majorizer \(q_{p,\epsilon,l}(w,w^{(k)})\):

Quadratic formulation of the majorization

  • Recall our problem: \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2} + \lambda \color{red}{\mathbf{d}_{p,u}^{(k)\top}\mathbf{w}} + \boldsymbol{\nu}^\top\color{blue}{\tilde{\mathbf{h}}_{p,\epsilon,l}(\mathbf{w},\mathbf{w}^{(k)})}\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W} \end{array} \]

  • From Lemma 4: \[\color{blue}{\boldsymbol{\nu}^{\top}\tilde{\mathbf{h}}_{p,\epsilon,l}(\mathbf{w},\mathbf{w}^{(k)})\leq\mathbf{w}^{\top}\mathsf{Diag}\left(\mathbf{a}_{p,\epsilon,l}^{(k)}\odot\boldsymbol{\nu}\right)\mathbf{w}+\left(\mathbf{b}_{p,\epsilon,l}^{(k)}\odot\boldsymbol{\nu}\right)^{\top}\mathbf{w}+const.}\]

  • The majorized problem at the \((k+1)\)-th iteration becomes: \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{\top}\left(\frac{1}{T}\mathbf{X}^{\top}\mathbf{X}+\color{blue}{\mathsf{Diag}\left(\mathbf{a}_{p,\epsilon,l}^{(k)}\odot\boldsymbol{\nu}\right)}\right)\mathbf{w}\\ & \qquad + \left(\lambda\mathbf{d}_{p,u}^{(k)}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}+\color{blue}{\mathbf{b}_{p,\epsilon,l}^{(k)}\odot\boldsymbol{\nu}}\right)^{\top}\mathbf{w}\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W}\\ \end{array} \]

  • This problem is a QP that can be solved with a solver, but we can do better.

Quadratic formulation of the majorization

  • Use Lemma 2 to majorize the quadratic part:

    • \(\mathbf{L}_{2}=\frac{1}{T}\mathbf{X}^{\top}\mathbf{X}+\mathsf{Diag}\left(\mathbf{a}_{p,\epsilon,l}^{(k)}\odot\boldsymbol{\nu}\right)\)
    • \(\mathbf{M}_{2}=\lambda_{\text{max}}^{(\mathbf{L}_{2})}\mathbf{I}.\)
  • And the final optimization problem at the \((k+1)\)-th iteration becomes: \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{\top}\mathbf{w}+{\mathbf{q}_{2}^{(k)}}^{\top}\mathbf{w}\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W}, \end{array} \] where \[\mathbf{q}_{2}^{(k)}=\frac{1}{\lambda_{\text{max}}^{(\mathbf{L}_{2})}}\left(2\left(\mathbf{L}_{2}-\lambda_{\text{max}}^{(\mathbf{L}_{2})}\mathbf{I}\right)\mathbf{w}^{(k)}+\lambda\mathbf{d}_{p,u}^{(k)}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}+\mathbf{b}_{p,\epsilon,l}^{(k)}\odot\boldsymbol{\nu}\right).\]

  • This problem can be solved in closed form!

Algorithm SLAITH

Algorithm 4: Specialized Linear Approximation for the Index Tracking problem with Holding constraints (SLAITH)

Set \(k=0\) and choose \(\mathbf{w}^{(0)}\in\mathcal{W}\).

repeat

  • Compute \(\mathbf{q}_{2}^{(k)}\)

  • Solve \({\bf w}^{(k+1)}=\arg\min_{{\bf w}\in\mathcal{W}} \mathbf{w}^{\top}\mathbf{w}+{\mathbf{q}_{2}^{(k)}}^{\top}\mathbf{w}\), using Proposition 1.

  • \(k \gets k+1\)

until convergence
return \(\mathbf{w}^{(k)}\)

The big picture

Extensions

Extension to other tracking error measures

In all the previous formulations we used the empirical tracking error (ETE): \[\textsf{ETE}(\mathbf{w})=\frac{1}{T}\big\|\mathbf{r}^{b}-\mathbf{X}\mathbf{w}\big\|_{2}^{2}.\]


However, we can use other tracking error measures such as (Benidis et al. 2018b):

  • Downside risk: \[\textsf{DR}(\mathbf{w})=\frac{1}{T}\big\|(\mathbf{r}^{b}-\mathbf{X}\mathbf{w})^{+}\big\|_{2}^{2},\] where \((x)^+=\text{max}(0,x)\).
  • Value-at-Risk (VaR) relative to an index.
  • Conditional VaR (CVaR) relative to an index.

Extension to downside risk

  • The downside risk as a function of the portfolio \(\textsf{DR}(\mathbf{w})\) is convex: can be used directly without any manipulation.
  • Nevertheless, specialized algorithms can be derived for \(\textsf{DR}\) too by properly majorizing it.


Lemma 5

The function \(\textsf{DR}(\mathbf{w})=\frac{1}{T}\big\|(\mathbf{r}^{b}-\mathbf{X}\mathbf{w})^{+}\big\|_{2}^{2}\) is majorized at \(\mathbf{w}^{(k)}\) by the quadratic convex function \[\frac{1}{T}\big\|\mathbf{r}^{b}-\mathbf{X}\mathbf{w}-\mathbf{y}^{(k)}\big\|_{2}^{2},\] where \(\mathbf{y}^{(k)}=-\left(\mathbf{X}\mathbf{w}^{(k)}-\mathbf{r}^{b}\right)^{+}\).

R session: Downside risk as tracking error measure (direct formulation)

In the original formulation, we used the empirical tracking error to measure the performance: \[\textsf{ETE}(\mathbf{w}) = \frac{1}{T}\big\|\mathbf{r}^b - \mathbf{X}\mathbf{w}\big\|_2^2.\] However, depending on the goals of the tracking, one could argue that if the performance of the portfolio is better than the index then it should not be penalized. For that purpose, we can use the downside risk (DR): \[\textsf{DR}(\mathbf{w})=\frac{1}{T}\big\|(\mathbf{r}^{b}-\mathbf{X}\mathbf{w})^{+}\big\|_{2}^{2},\] where \((x)^+=\text{max}(0,x)\).

The problem formulation is then \[\begin{array}{ll} \underset{\mathbf{w}}{\text{minimize}} & \frac{1}{T}\big\|(\mathbf{r}^{b}-\mathbf{X}\mathbf{w})^{+}\big\|_{2}^{2} + \lambda\|\mathbf{w}\|_0\\ \textsf{subject to} & \mathbf{w}^\top\mathbf{1}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}, \end{array}\] which, after approximating and majorizing the cardinality term \(\|\mathbf{w}\|_0\) as before, becomes the convex problem \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|(\mathbf{r}^{b}-\mathbf{X}\mathbf{w})^{+}\big\|_{2}^{2} + \lambda{\mathbf{d}_{p,u}^{(k)}}^\top\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}, \end{array}\]

We can easily solve this convex problem with a solver (note that we could also use a double majorization like before to further simplify the problem):

# parameters
gamma <- 0.5; p <- 1e-3  # parameters of the log-based approximation of the l0-norm
lambda <- 1e-7  # this achieves a sparsity level of K=20
# MM loop
w <- rep(1/N, N)
obj_value <- (1/T_trn)*norm(pmax(0, r_trn - X_trn %*% w), "2")^2 +
             lambda * sum(log(1+abs(w)/p)/log(1+gamma/p))
while(1) {
  d <- 1/(log(1+gamma/p)*(p+w))
  w_prev <- w
  w <- Variable(N)
  prob <- Problem(Minimize((1/T_trn)*sum((pos(as.matrix(r_trn) - as.matrix(X_trn) %*% w))^2) 
                           + lambda * t(d) %*% w), 
                  constraints = list(w >= 0, w <= 1, sum(w)==1))
  result <- solve(prob)
  w <- as.vector(result$getValue(w))
  obj_value <- c(obj_value, 
                 (1/T_trn)*norm(pmax(0, r_trn - X_trn %*% w), "2")^2 
                 + lambda * sum(log(1+abs(w)/p)/log(1+gamma/p)))
  if(norm(w-w_prev, "2")/norm(w_prev, "2") < 1e-3) break
}
w_DR_MM <- w
plot(obj_value, type = "b", pch = 19, cex = .8, 
     main = "Convergence", xlab = "iteration", ylab = "objective value")

cat("Sparsity level K =", sum(w_DR_MM > 1e-2))
R>> Sparsity level K = 20
barplot(w_DR_MM, xlab = "asset index", 
        main = sprintf("Sparse portfolio via MM (sparsity level K=%d)", sum(w_DR_MM > 1e-2)))

We can observe the difference from the Cum-PnL:

# plot of wealth
ret_all <- cbind(ret_all, 
                 "DR-MM-single" = X %*% w_DR_MM)
{ chart.CumReturns(ret_all, main = sprintf("Cumulative P&L (N=%d, K=%d)", N, K), 
                   wealth.index = TRUE, legend.loc = "topleft")
  addEventLines(xts("training", index(X[T_trn])), srt=90, pos=2, lwd = 2, col = "black") }

Let’s plot the out-of-sample Cum-PnL for a fair comparison:

chart.CumReturns(ret_all[-(1:T_trn), ], 
                 main = sprintf("Out-of-sample cumulative P&L (N=%d, K=%d)", N, K), 
                 wealth.index = TRUE, legend.loc = "topleft")

R session: Downside risk as tracking error measure (majorization approach)

Recall the original problem formulation \[\begin{array}{ll} \underset{\mathbf{w}}{\text{minimize}} & \frac{1}{T}\big\|(\mathbf{r}^{b}-\mathbf{X}\mathbf{w})^{+}\big\|_{2}^{2} + \lambda\|\mathbf{w}\|_0\\ \textsf{subject to} & \mathbf{w}^\top\mathbf{1}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}, \end{array}\] and the convex formulation after approximating and majorizing the cardinality term \(\|\mathbf{w}\|_0\): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|(\mathbf{r}^{b}-\mathbf{X}\mathbf{w})^{+}\big\|_{2}^{2} + \lambda{\mathbf{d}_{p,u}^{(k)}}^\top\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}. \end{array}\] This problem is convex and can be solved with a QP solver.

Nevertheless, we can further simplify the problem by majorizing the DR term as follows: \[\frac{1}{T}\big\|(\mathbf{r}^{b}-\mathbf{X}\mathbf{w})^{+}\big\|_{2}^{2} \le \frac{1}{T}\big\|\mathbf{r}^{b}-\mathbf{X}\mathbf{w}-\mathbf{y}^{(k)}\big\|_{2}^{2}, \] where \[\mathbf{y}^{(k)}=-\left(\mathbf{X}\mathbf{w}^{(k)}-\mathbf{r}^{b}\right)^{+}.\] Effectively, this majorization transforms the DR into a term that looks like the ETE! This means we can use all the methods developed before simply by substituting \(\mathbf{r}^{b}\) with \(\mathbf{r}^{b}-\mathbf{y}^{(k)}\). This is the resulting optimization problem: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{r}^{b}-\mathbf{X}\mathbf{w}-\mathbf{y}^{(k)}\big\|_{2}^{2} + \lambda{\mathbf{d}_{p,u}^{(k)}}^\top\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}, \end{array}\]

# parameters
gamma <- 0.5; p <- 1e-3  # parameters of the log-based approximation of the l0-norm
lambda <- 0.807324e-7  # this achieves a sparsity level close to 20
# MM loop
w <- rep(1/N, N)
obj_value <- (1/T_trn)*norm(pmax(0, r_trn - X_trn %*% w), "2")^2 +
             lambda * sum(log(1+abs(w)/p)/log(1+gamma/p))
while(1) {
  d <- 1/(log(1+gamma/p)*(p+w))
  y <- -pmax(0, X_trn %*% w - r_trn)
  w_prev <- w
  w <- Variable(N)
  prob <- Problem(Minimize((1/T_trn)*sum((as.matrix(r_trn) - as.matrix(X_trn) %*% w - y)^2) 
                           + lambda * t(d) %*% w), 
                  constraints = list(w >= 0, w <= 1, sum(w)==1))
  result <- solve(prob)
  w <- as.vector(result$getValue(w))
  obj_value <- c(obj_value, 
                 (1/T_trn)*norm(pmax(0, r_trn - X_trn %*% w), "2")^2
                 + lambda * sum(log(1+abs(w)/p)/log(1+gamma/p)))
  if(norm(w-w_prev, "2")/norm(w_prev, "2") < 1e-3) break
}
w_DR_MM_bis <- w
plot(obj_value, type = "b", pch = 19, cex = .8, 
     main = "Convergence", xlab = "iteration", ylab = "objective value")

cat("Sparsity level K =", sum(w_DR_MM_bis > 1e-2))
R>> Sparsity level K = 19

The portfolio differs somewhat depending on whether we majorize only the cardinality term or both the cardinality and the downside risk terms:

norm(w_DR_MM - w_DR_MM_bis, "2")
R>> [1] 0.2887955

However, both Cum-PnL look very similar:

# plot of wealth
ret_all <- cbind(ret_all, 
                 "DR-MM-single-bis" = X %*% w_DR_MM_bis)
{ chart.CumReturns(ret_all, main = sprintf("Cumulative P&L (N=%d, K=%d)", N, K), 
                   wealth.index = TRUE, legend.loc = "topleft")
  addEventLines(xts("training", index(X[T_trn])), srt=90, pos=2, lwd = 2, col = "black") }

R session: Downside risk as tracking error measure via package sparseIndexTracking

We can again use the package sparseIndexTracking:

w_DR_pkg <- spIndexTrack(X_trn, r_trn, lambda = 2.78e-7, measure = "dr")
cat("Sparsity level K =", sum(w_DR_pkg > 1e-2))
R>> Sparsity level K = 20

We can compare the DR:

# tracking error
DR_trn <- c("DR MM single"     = (1/T_trn)*norm(pmax(0, r_trn - X_trn %*% w_DR_MM), "2")^2,
            "DR MM single-bis" = (1/T_trn)*norm(pmax(0, r_trn - X_trn %*% w_DR_MM_bis), "2")^2,
            "DR-MM-pkg"        = (1/T_trn)*norm(pmax(0, r_trn - X_trn %*% w_DR_pkg), "2")^2)
print(DR_trn)
R>>     DR MM single DR MM single-bis        DR-MM-pkg 
R>>     1.247079e-07     1.457982e-07     2.466550e-07
DR <- c("DR MM single"     = (1/T)*norm(pmax(0, r - X %*% w_DR_MM), "2")^2,
        "DR MM single-bis" = (1/T)*norm(pmax(0, r - X %*% w_DR_MM_bis), "2")^2,
        "DR-MM-pkg"        = (1/T)*norm(pmax(0, r - X %*% w_DR_pkg), "2")^2)
print(DR)
R>>     DR MM single DR MM single-bis        DR-MM-pkg 
R>>     8.397384e-07     6.616777e-07     1.241276e-06
barplot(DR, main = "Tracking error", ylab = "DR", col = rainbow(4))

We can observe the difference from the Cum-PnL:

# plot of wealth
ret_all <- cbind(ret_all,
                 "DR-MM-pkg" = X %*% w_DR_pkg)
{ chart.CumReturns(ret_all, main = sprintf("Cumulative P&L (N=%d, K=%d)", N, K), 
                   wealth.index = TRUE, legend.loc = "topleft")
  addEventLines(xts("training", index(X[T_trn])), srt=90, pos=2, lwd = 2, col = "black") }

Extension to other penalty functions

  • Apart from the various performance measures, we can select a different penalty function.

  • We have used only the \(\ell_{2}\)-norm to penalize the differences between the portfolio and the index.

  • We can use the Huber penalty function for robustness against outliers (Benidis et al. 2018b): \[\phi(x)=\begin{cases} x^{2}, & \quad|x|\leq M,\\ M(2|x|-M), & \quad|x|>M. \end{cases}\]

  • The \(\ell_{1}\)-norm.

  • Many more…

Extension to Huber penalty function

Lemma 6

The function \(\phi(x)\) is majorized at \(x^{(k)}\) by the quadratic convex function \(f(x|x^{(k)})=a^{(k)}x^{2}+b^{(k)}\), where \[a^{(k)}=\begin{cases} 1, & \quad|x^{(k)}|\leq M,\\ \frac{M}{|x^{(k)}|}, & \quad|x^{(k)}|>M, \end{cases}\] and \[b^{(k)}=\begin{cases} 0, & \quad|x^{(k)}|\leq M,\\ M(|x^{(k)}|-M), & \quad|x^{(k)}|>M. \end{cases}\]
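
A quick numerical check of Lemma 6 (our own sketch) at a point with \(|x^{(k)}|>M\):

M <- 0.1; x_k <- 0.3
phi <- function(x) ifelse(abs(x) <= M, x^2, M*(2*abs(x) - M))
a_k <- if (abs(x_k) <= M) 1 else M/abs(x_k)
b_k <- if (abs(x_k) <= M) 0 else M*(abs(x_k) - M)
x <- seq(-1, 1, length.out = 1000)
all(a_k*x^2 + b_k >= phi(x) - 1e-12)  # TRUE: the quadratic majorizes phi
a_k*x_k^2 + b_k - phi(x_k)            # 0: equality at x_k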

Extension to Huber penalty function

R session: Huber penalty function as error measure (direct formulation)

In the original formulation, we used the empirical tracking error to measure the performance: \[\textsf{ETE}(\mathbf{w}) = \frac{1}{T}\big\|\mathbf{r}^b - \mathbf{X}\mathbf{w}\big\|_2^2.\] However, the \(\ell_2\)-norm is very sensitive to outliers. A way to make it more robust is to use a more robust penalty function like the Huber penalty function: \[\phi(x)=\begin{cases} x^{2}, & \quad|x|\leq M,\\ M(2|x|-M), & \quad|x|>M. \end{cases}\] The robust tracking error measure becomes \[\textsf{ETE}_\textsf{huber}(\mathbf{w}) = \frac{1}{T}\mathbf{1}^\top\boldsymbol{\phi}(\mathbf{r}^b - \mathbf{X}\mathbf{w}).\]

The problem formulation after majorizing the cardinality term \(\|\mathbf{w}\|_0\) as before is then \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\mathbf{1}^\top\boldsymbol{\phi}(\mathbf{r}^b - \mathbf{X}\mathbf{w}) + \lambda\mathbf{d}_{p,u}^{(k)^\top}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}, \end{array}\]

We can easily solve this convex problem with a solver:

# parameters
gamma <- 0.5; p <- 1e-3  # parameters of the log-based approximation of the l0-norm
lambda <- 5e-7  # this achieves a sparsity level close to K=20
# MM loop
phi <- function(x, M = 0.1) {
  ifelse(abs(x) <= M, x^2, M*(2*abs(x)-M))
}
w <- rep(1/N, N)
obj_value <- (1/T_trn)*sum(phi(r_trn - X_trn %*% w)) + lambda * sum(log(1+abs(w)/p)/log(1+gamma/p))
while(1) {
  d <- 1/(log(1+gamma/p)*(p+w))
  w_prev <- w
  w <- Variable(N)
  prob <- Problem(Minimize((1/T_trn)*sum(huber(as.matrix(r_trn) - as.matrix(X_trn) %*% w, M = 0.1))  # M matches phi()
                           + lambda * t(d) %*% w), 
                  constraints = list(w >= 0, w <= 1, sum(w)==1))
  result <- solve(prob)
  w <- as.vector(result$getValue(w))
  obj_value <- c(obj_value, 
                 (1/T_trn)*sum(phi(r_trn - X_trn %*% w)) 
                 + lambda * sum(log(1+abs(w)/p)/log(1+gamma/p)))
  if(norm(w-w_prev, "2")/norm(w_prev, "2") < 1e-3) break
}
w_Huber_MM <- w
plot(obj_value, type = "b", pch = 19, cex = .8, 
     main = "Convergence", xlab = "iteration", ylab = "objective value")

cat("Sparsity level K =", sum(w_Huber_MM > 1e-2))
R>> Sparsity level K = 19
barplot(w_Huber_MM, xlab = "asset index", 
        main = sprintf("Sparse portfolio via MM (sparsity level K=%d)", sum(w_Huber_MM > 1e-2)))

We can observe the difference from the Cum-PnL:

# plot of wealth
ret_all <- cbind(ret_all[, 1:6], 
                 "Huber-MM-single" = X %*% w_Huber_MM)
{ chart.CumReturns(ret_all, main = sprintf("Cumulative P&L (N=%d, K=%d)", N, K), 
                   wealth.index = TRUE, legend.loc = "topleft")
  addEventLines(xts("training", index(X[T_trn])), srt=90, pos=2, lwd = 2, col = "black") }

R session: Huber penalty function as error measure (majorization approach)

Recall the problem formulation after majorizing the cardinality term \(\|\mathbf{w}\|_0\): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\mathbf{1}^\top\boldsymbol{\phi}(\mathbf{r}^b - \mathbf{X}\mathbf{w}) + \lambda\mathbf{d}_{p,u}^{(k)^\top}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}, \end{array}\] This problem is convex and can be solved with a QP solver.

Nevertheless, we can further simplify the problem by majorizing the Huber term as follows: \[\frac{1}{T}\mathbf{1}^\top\boldsymbol{\phi}(\mathbf{r}^b - \mathbf{X}\mathbf{w}) \le \frac{1}{T}\mathbf{a}^{(k)^\top}(\mathbf{r}^b - \mathbf{X}\mathbf{w})^2, \] where \[a^{(k)}_i=\begin{cases} 1, & \quad |x^{(k)}_i|\leq M,\\ \frac{M}{|x^{(k)}_i|}, & \quad|x^{(k)}_i|>M. \end{cases}\]

Effectively, this majorization again transforms the Huber term into a term that looks like the ETE! This means we can use all the methods developed before. We can now solve the problem \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\mathbf{a}^{(k)^\top}(\mathbf{r}^b - \mathbf{X}\mathbf{w})^2 + \lambda\mathbf{d}_{p,u}^{(k)^\top}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}, \end{array}\]

# parameters
gamma <- 0.5; p <- 1e-3  # parameters of the log-based approximation of the l0-norm
lambda <- 5e-7  # this achieves a sparsity level of K=20
# MM loop
ak <- function(x, M = 0.1) {
  ifelse(abs(x) <= M, 1, M/abs(x))
}
w <- rep(1/N, N)
obj_value <- (1/T_trn)*sum(phi(r_trn - X_trn %*% w)) + lambda * sum(log(1+abs(w)/p)/log(1+gamma/p))
while(1) {
  d <- 1/(log(1+gamma/p)*(p+w))
  a <- ak(r_trn - X_trn %*% w)
  w_prev <- w
  w <- Variable(N)
  prob <- Problem(Minimize((1/T_trn)* t(a) %*% (as.matrix(r_trn) - as.matrix(X_trn) %*% w)^2 
                           + lambda * t(d) %*% w), 
                  constraints = list(w >= 0, w <= 1, sum(w)==1))
  result <- solve(prob)
  w <- as.vector(result$getValue(w))
  obj_value <- c(obj_value, 
                 (1/T_trn)*sum(phi(r_trn - X_trn %*% w)) 
                 + lambda * sum(log(1+abs(w)/p)/log(1+gamma/p)))
  if(norm(w-w_prev, "2")/norm(w_prev, "2") < 1e-3) break
}
w_Huber_MM_bis <- w
cat("Sparsity level K =", sum(w_Huber_MM_bis > 1e-2))
R>> Sparsity level K = 19
plot(obj_value, type = "b", pch = 19, cex = .8, 
     main = "Convergence", xlab = "iteration", ylab = "objective value")

We obtain essentially the same portfolio (up to numerical precision) whether we majorize only the cardinality term or both the cardinality and the tracking error terms:

norm(w_Huber_MM - w_Huber_MM_bis, "2")
R>> [1] 4.235976e-05

R session: Huber penalty function as error measure via package sparseIndexTracking

We can again use the package sparseIndexTracking:

w_Huber_pkg <- spIndexTrack(X_trn, r_trn, lambda = 8.18e-7, hub = 0.1, measure = "hete")
cat("Sparsity level K =", sum(w_Huber_pkg > 1e-2))
R>> Sparsity level K = 21

Numerical Experiments

Set up

For the numerical experiments we use historical data of two indices:

| Index | Data Period | \(T_{\text{trn}}\) | \(T_{\text{tst}}\) |
|---|---|---|---|
| S&P 500 | 01/01/10 - 31/12/15 | 252 | 252 |
| Russell 2000 | 01/06/06 - 31/12/15 | 1000 | 252 |



  • We use a rolling window approach.

  • Performance measure: magnitude of daily tracking error (MDTE) \[\text{MDTE}=\frac{1}{T-T_{\text{trn}}}\big\|\mathsf{diag}(\mathbf{X}\mathbf{W})-\mathbf{r}^{b}\big\|_{2},\] where \(\mathbf{X}\in\mathbb{R}^{(T-T_{\text{trn}})\times N}\) and \(\mathbf{r}^{b}\in\mathbb{R}^{T-T_{\text{trn}}}\).
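
As a sketch (our own helper), assuming X_tst is the \((T-T_{\text{trn}})\times N\) out-of-sample return matrix, W the \(N\times(T-T_{\text{trn}})\) matrix whose \(t\)-th column is the portfolio held on day \(t\), and r_b the index returns over the same period:

MDTE <- function(X_tst, W, r_b) {
  port_ret <- rowSums(X_tst * t(W))  # equals diag(X_tst %*% W)
  sqrt(sum((port_ret - r_b)^2)) / length(r_b)
}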

Benchmarks

We will use the following benchmark methods:

  • MIP solution by Gurobi solver (\(\textsf{MIP}_{\text{Gur}}\)).

  • Diversity Method (Jansen and Van Dijk 2002) where the \(\ell_{1/2}\)-“norm” approximation is used (\(\textsf{DM}_{1/2}\)).

  • Hybrid Half Thresholding (HHT) algorithm (Xu et al. 2015).

S&P 500 - w/o holding constraints

Russell 2000 - w/o holding constraints

S&P 500 - w/ holding constraints

Russell 2000 - w/ holding constraints

Tracking the S&P 500 index

📄 K. Benidis, Y. Feng, and D. P. Palomar, “Sparse portfolios for high-dimensional financial index tracking,” IEEE Trans. Signal Processing, vol. 66, no. 1, pp. 155–170, 2018.

Average running time of proposed methods

  • Comparison of \(\textsf{AS}_{1}\) and \(\textsf{AS}_{u}\):

Conclusions

Conclusions

  • In the industry, sparse index tracking is typically implemented in a suboptimal or discretionary way.

  • However, it can be mathematically formulated in a precise way and solved efficiently.

  • We have explored efficient algorithms that promote sparsity for the index tracking problem.

  • The algorithms are derived based on the MM optimization framework:
    • derivation of surrogate functions
    • majorization of convex problems for closed-form solutions.
  • Many possible extensions.
  • Same techniques can be used for active portfolio management.
  • More generally: if you know how to solve a problem, then inducing sparsity should be a piece of cake!

Resources

  • A software implementation of the algorithms described is available in the R package sparseIndexTracking.


  • The algorithms are derived and explained in full detail in

    📄 K. Benidis, Y. Feng, and D. P. Palomar, “Sparse portfolios for high-dimensional financial index tracking,” IEEE Trans. Signal Processing, vol. 66, no. 1, pp. 155–170, 2018.

    📘 K. Benidis, Y. Feng, and D. P. Palomar. Optimization Methods for Financial Index Tracking: From Theory to Practice. Foundations and Trends in Optimization, Now Publishers, 2018.

Thanks

References

Benidis, K., Feng, Y., & Palomar, D. P. (2018a). Sparse portfolios for high-dimensional financial index tracking. IEEE Trans. Signal Processing, 66(1), 155–170.

Benidis, K., Feng, Y., & Palomar, D. P. (2018b). Optimization Methods for Financial Index Tracking: From Theory to Practice. Foundations and Trends in Optimization, Now Publishers.

Jansen, R., & Van Dijk, R. (2002). Optimal benchmark tracking with small portfolios. The Journal of Portfolio Management, 28(2), 33–39.

Maringer, D., & Oyewumi, O. (2007). Index tracking with constrained portfolios. Intelligent Systems in Accounting, Finance and Management, 15(1-2), 57–71.

Sun, Y., Babu, P., & Palomar, D. P. (2017). Majorization-minimization algorithms in signal processing, communications, and machine learning. IEEE Trans. Signal Processing, 65(3), 794–816.

Xu, F., Xu, Z., & Xue, H. (2015). Sparse index tracking based on \(L_{1/2}\) model and algorithm. arXiv preprint.