MAFS5310 - Portfolio Optimization with R
MSc in Financial Mathematics
The Hong Kong University of Science and Technology (HKUST)
Fall 2020-21

Outline

  • Introduction

  • Sparse Index Tracking

    • Problem formulation
    • Interlude: Majorization-Minimization (MM) algorithm
    • Resolution via MM
  • Holding Constraints and Extensions

    • Problem formulation
    • Holding constraints via MM
    • Extensions
  • Numerical Experiments

  • Conclusions

Introduction

Investment strategies

Fund managers follow two basic investment strategies:


Active

  • Assumption: markets are not perfectly efficient.
  • Through expertise add value by choosing high performing assets.

Passive

  • Assumption: market cannot be beaten in the long run.
  • Conform to a defined set of criteria (e.g. achieve same return as an index).

Passive investment

The stock markets have historically risen; see, e.g., the S&P 500.

  • Partly misleading, e.g., due to inflation.
  • Still, reasonable returns can be obtained without the risks of active management.
  • This makes passive investment more attractive.

Index tracking

  • Index tracking is a popular passive portfolio management strategy.
  • Goal: construct a portfolio that replicates the performance of a financial index.

Index tracking

  • Index tracking, or benchmark replication, is an investment strategy aimed at mimicking the risk/return profile of a financial instrument.

  • For practical reasons, the strategy focuses on a reduced basket of representative assets.

  • The problem is also known as portfolio compression, and it is intimately related to compressed sensing and \(\ell_{1}\)-norm minimization techniques (Benidis et al. 2018a; Benidis et al. 2018b).

  • One example is the replication of an index, e.g., Hang Seng Index, based on a reduced basket of assets.

Definitions

  • Price and return of an asset or an index: \(p_{t}\) and \(r_{t}=\frac{p_{t}-p_{t-1}}{p_{t-1}}\)
  • Returns of an index in \(T\) days: \(\boldsymbol{\mathbf{r}}^{b}=[r_{1}^{b},\dots,r_{T}^{b}]^{\top}\in\mathbb{R}^{T}\)
  • Returns of \(N\) assets in \(T\) days: \(\mathbf{X}=[\mathbf{r}_{1},\dots,\mathbf{r}_{T}]^{\top}\in\mathbb{R}^{T\times N}\) with \(\mathbf{r}_{t}\in\mathbb{R}^{N}\)
  • Assume that an index is composed of a weighted collection of \(N\) assets with normalized index weights \(\mathbf{b}\) satisfying
    • \(\mathbf{b}>\mathbf{0}\)
    • \(\mathbf{b}^{\top}\mathbf{1}=1\)
    • \(\mathbf{X}\mathbf{b}=\mathbf{r}^{b}\)
  • We want to design a (sparse) tracking portfolio \(\mathbf{w}\) satisfying
    • \(\mathbf{w}\geq\mathbf{0}\)
    • \(\mathbf{w}^{\top}\mathbf{1}=1\)
    • \(\mathbf{X}\mathbf{w}\approx\mathbf{r}^{b}\)
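As a toy illustration of these definitions in R (synthetic data of ours, not market data):

# from prices to linear returns to index returns (toy synthetic data)
set.seed(42)
N_toy <- 5; T_toy <- 10
P <- matrix(cumprod(1 + rnorm((T_toy + 1) * N_toy, sd = 0.01)),
            nrow = T_toy + 1, ncol = N_toy)  # fake positive price paths p_t
X_toy <- diff(P) / P[1:T_toy, ]              # r_t = (p_t - p_{t-1})/p_{t-1}
b_toy <- rep(1/N_toy, N_toy)                 # equally weighted toy "index"
r_b_toy <- X_toy %*% b_toy                   # index returns r^b = X b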

Full replication

  • How should we select \(\mathbf{w}\)?
  • Straightforward solution: full replication \(\mathbf{w}=\mathbf{b}\)
    • Buy appropriate quantities of all the assets
    • Perfect tracking
  • But it has drawbacks:
    • We may be trying to hedge some given portfolio with just a few names (to simplify the operations)
    • We may want to deal properly with illiquid assets in the universe
    • We may want to control the transaction costs for small portfolios (AUM)

Sparse index tracking

  • How can we overcome these drawbacks?

    👉 Sparse index tracking.

  • Use a small number of assets: \(\mathsf{card}(\mathbf{w})<N\)
    • can allow hedging with just a few names
    • can avoid illiquid assets
    • can reduce transaction costs for small portfolios
  • Challenges:
    • Which assets should we select?
    • What should their relative weight be?

Sparse Index Tracking

Problem formulation

Sparse regression

Sparse regression: \[\begin{array}{ll} \underset{\mathbf{w}}{\mathsf{minimize}} & \left\Vert \mathbf{r}-\mathbf{X}\mathbf{w}\right\Vert _{2}^{2}+\lambda\left\Vert \mathbf{w}\right\Vert _{0} \end{array}\] tries to fit the observations by minimizing the error while promoting a sparse solution.

Tracking error

  • Recall that \(\mathbf{b}\in\mathbb{R}^{N}\) represents the actual benchmark weight vector and \(\mathbf{w}\in\mathbb{R}^{N}\) denotes the replicating portfolio.

  • Investment managers seek to minimize the following tracking error (TE) performance measure: \[\textsf{TE}\left(\mathbf{w}\right) =\left(\mathbf{w}-\mathbf{b}\right)^{\top}\boldsymbol{\Sigma}\left(\mathbf{w}-\mathbf{b}\right)\] where \(\boldsymbol{\Sigma}\) is the covariance matrix of the asset returns.

  • In practice, however, the benchmark weight vector \(\mathbf{b}\) may be unknown and the error measure is defined in terms of market observations.

  • A common tracking measure is the empirical tracking error (ETE): \[\textsf{ETE}(\mathbf{w})=\frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|^2_{2}\]
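For reference, both measures are one-liners in R (our helper functions, not part of the lecture code; TE_fun assumes the covariance matrix Sigma is given):

# empirical versions of the two tracking measures defined above
TE_fun  <- function(w, b, Sigma) as.numeric(t(w - b) %*% Sigma %*% (w - b))
ETE_fun <- function(w, X, r_b) sum((X %*% w - r_b)^2) / nrow(X)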

Formulation for sparse index tracking

  • Problem formulation for sparse index tracking (Maringer and Oyewumi 2007): \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda\|\mathbf{w}\|_{0}\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W} \end{array} \]
    • \(\|\mathbf{w}\|_{0}\) is the \(\ell_{0}\)-“norm” and denotes \(\mathsf{card}(\mathbf{w})\)
    • \(\mathcal{W}\) is a set of convex constraints (e.g., \(\mathcal{W}=\{\mathbf{w}|\mathbf{w}\geq\mathbf{0},\mathbf{w}^{\top}\mathbf{1}=1\}\))
    • we will treat any nonconvex constraint separately


  • This problem is too difficult to deal with directly:

    👉 Discontinuous, non-differentiable, non-convex objective function.

Existing methods

  • Two-step approach:
    1. stock selection:
      • largest market capitalization
      • most correlated with the index
      • a combination that cointegrates well with the index
    2. capital allocation:
      • naive allocation: proportional to the original weights
      • optimized allocation: usually a convex problem
  • Mixed Integer Programming (MIP)
    • practical only for small problem sizes, e.g. \(\binom{100}{20}>10^{20}\) (see the quick check after this list).
  • Genetic algorithms
    • can solve the MIP problems in reasonable time
    • but give worse performance and cannot prove optimality.
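The combinatorial blow-up quoted above is easy to check in R:

choose(100, 20)  # approx. 5.36e20, indeed larger than 1e20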

Existing methods

The two-step approach performs much worse than joint optimization, as the following R session illustrates.

R session: Index tracking with two-step approach

A simple design for sparse index tracking is to decompose the task into two steps: asset selection (choosing which assets are active) and capital allocation (optimizing their weights). We will assume we know the portfolio implemented by the actual index, which will be a dense portfolio (as opposed to sparse) \(\mathbf{b}\) such that \(\mathbf{X}\mathbf{b} = \mathbf{r}^b\). In practice, it is expensive to obtain \(\mathbf{b}\), so we will instead estimate it by regression.

  • Step 1: Asset selection. We can define the selection of \(K\) assets through the vector \(\mathbf{s}\), defined as \[s_i=\begin{cases} 1\quad & \text{if the }i\text{-th asset is selected}\\ 0 & \text{otherwise}, \end{cases}\] with \(\mathbf{1}^\top\mathbf{s}=K\). For example, we can simply choose the \(K\) assets with the largest weights in \(\mathbf{b}\).

  • Step 2: Capital allocation. Once the active assets are known, the portfolio design is simple. One option is to define the portfolio in the same proportion as the index composition (but only using the active assets): \[\mathbf{w}=\frac{\mathbf{b}\odot\mathbf{s}}{\mathbf{1}^\top(\mathbf{b}\odot\mathbf{s})}\] where \(\odot\) denotes the Hadamard (elementwise) product. Another option is to fit the index returns subject to the sparsity pattern given by the asset selection: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{r}^b - \mathbf{X}(\mathbf{w}\odot\mathbf{s})\big\|_2^2\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{w}\ge\mathbf{0}. \end{array}\]


Loading data
We start by loading the market data of the index S&P 500 and its underlying assets during 2010 (from the package sparseIndexTracking):

library(xts)
library(sparseIndexTracking)  #install.packages("sparseIndexTracking")
data(INDEX_2010)
T <- nrow(INDEX_2010$X)
N <- ncol(INDEX_2010$X)
X <- INDEX_2010$X
r <- INDEX_2010$SP500

# training and test sets
T_trn <- round(0.7*T)  # define the training set
T_tst <- T - T_trn
X_trn <- X[1:T_trn, ]
r_trn <- r[1:T_trn, ]
X_tst <- X[-(1:T_trn), ]
r_tst <- r[-(1:T_trn), ]

The data INDEX_2010 contains a list with two xts objects:
1. X: A \(T\times N\) xts with the daily linear returns of the \(N\) assets that were in the index during the year 2010 (total \(T\) trading days)
2. SP500: A \(T\times 1\) xts with the daily linear returns of the index S&P 500 during the same period.


Regressing the dense index portfolio
Since we don’t know the index composition \(\mathbf{b}\), we will get an estimate by solving the constrained regression \[\begin{array}{ll} \underset{\mathbf{b}}{\text{minimize}} & \frac{1}{T}\big\|\mathbf{r}^b - \mathbf{X}\mathbf{b}\big\|_2^2\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{b}=1\\ & \mathbf{b}\ge\mathbf{0}. \end{array}\]

library(CVXR)
b <- Variable(N)
prob <- Problem(Minimize(sum((as.matrix(r) - as.matrix(X) %*% b)^2)), 
                constraints = list(b >= 0, sum(b)==1))
result <- solve(prob)
b_SP500 <- as.vector(result$getValue(b))
barplot(b_SP500, main = "SP500 composition", xlab = "asset index")


Two-step approach: weights proportional to \(\mathbf{b}\)
To obtain \(\mathbf{s}\), we can simply choose the largest \(K\) elements of \(\mathbf{b}\):

K <- 20
idx_Klargest <- sort(b_SP500, decreasing = TRUE, index.return = TRUE)$ix[1:K]
s <- rep(0, N)
s[idx_Klargest] <- 1

Then we can proceed to define the portfolio in the same proportion as the index composition:

w_2step_a <- rep(0, N)
w_2step_a[idx_Klargest] <- b_SP500[idx_Klargest]
w_2step_a <- w_2step_a/sum(w_2step_a)
barplot(w_2step_a, main = "Two-step design with proportional weights", xlab = "asset index")


Two-step approach: fitted weights
We can also obtain the least squares design given the previous selection:

w <- Variable(N)
prob <- Problem(Minimize(sum((as.matrix(r_trn) - as.matrix(X_trn) %*% (w*s))^2)), 
                constraints = list(w >= 0, sum(w)==1))
result <- solve(prob)
w_2step_b <- as.vector(result$getValue(w))
barplot(w_2step_b, main = "Two-step design with fitted weights", xlab = "asset index")


Comparison. We can now compute the tracking errors of the designs with respect to the index returns:

# tracking error
ETE <- c("dense b"                = (1/T)*norm(r - X %*% b_SP500, "2")^2,
         "two-step sparse prop b" = (1/T)*norm(r - X %*% w_2step_a, "2")^2,
         "two-step sparse LS"     = (1/T)*norm(r - X %*% w_2step_b, "2")^2)
ETE <- as.matrix(ETE)
colnames(ETE) <- "ETE"
print(ETE)
R>>                                 ETE
R>> dense b                8.767757e-09
R>> two-step sparse prop b 5.950379e-06
R>> two-step sparse LS     4.304850e-06

We can also compare the tracking errors visually:

barplot(c(ETE), main = "Tracking error", ylab = "ETE", col = rainbow(4))

We can also plot the cumulative P&L:

library(PerformanceAnalytics)

# plot of wealth
ret_all <- cbind("SP500"                  = r,
                 "dense b"                = X %*% b_SP500,
                 "two-step sparse prop b" = X %*% w_2step_a,
                 "two-step sparse LS"     = X %*% w_2step_b)
{ chart.CumReturns(ret_all, main = sprintf("Cumulative P&L (N=%d, K=%d)", N, K), 
                   wealth.index = TRUE, legend.loc = "topleft")
  addEventLines(xts("training", index(X[T_trn])), srt=90, pos=2, lwd = 2, col = "black") }

Interlude: Majorization-Minimization (MM) algorithm

Interlude: Majorization-Minimization (MM)

  • Consider the following presumably difficult optimization problem: \[\begin{array}{ll} \underset{{\mathbf x}}{\textsf{minimize}} & f\left({\bf x}\right)\\ \textsf{subject to} & {\bf x}\in\mathcal{X}, \end{array}\] with \(\mathcal{X}\) being the feasible set and \(f\left({\bf x}\right)\) being continuous.

  • Idea: successively minimize a more manageable surrogate function \(u\left({\bf x},{\bf x}^{(k)}\right)\): \[{\bf x}^{(k+1)}=\arg\min_{{\bf x}\in\mathcal{X}}u\left({\bf x},{\bf x}^{(k)}\right),\] hoping the sequence of minimizers \(\left\{ {\bf x}^{(k)}\right\}\) will converge to an optimal \({\bf x}^{\star}\).

  • Question: how to construct \(u\left({\bf x},{\bf x}^{(k)}\right)\)?

  • Answer: that’s more like an art (Sun et al. 2017).

Interlude on MM: surrogate/majorizer

  • Construction rule: \[\begin{aligned} u\left({\bf y},{\bf y}\right) &= f\left({\bf y}\right),\ \forall{\bf y}\in\mathcal{X}\\ u\left({\bf x},{\bf y}\right) &\geq f\left({\bf x}\right),\ \forall{\bf x},{\bf y}\in\mathcal{X}\\ \left.u'\left({\bf x},{\bf y};{\bf d}\right)\right|_{{\bf x}={\bf y}} &= f'\left({\bf y};{\bf d}\right),\ \forall{\bf d}\text{ with }{\bf y}+{\bf d}\in\mathcal{X}\\ u\left({\bf x},{\bf y}\right) &\text{ is continuous in }{\bf x}\text{ and }{\bf y}. \end{aligned}\]

Interlude on MM: Algorithm

Algorithm MM

Set \(k=0\) and initialize with a feasible point \({\bf x}^{(0)}\in \mathcal{X}\).
repeat

  • \({\bf x}^{(k+1)}=\arg\min_{{\bf x}\in {\mathcal X}} u\left({\bf x},{\bf x}^{(k)}\right)\)

  • \(k \gets k+1\)

until convergence
return \(\mathbf{x}^{(k)}\)
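As a toy illustration of the MM recipe (our example, not from the lecture), consider minimizing \(f(x)=(x-1)^2+2|x|\) with the standard quadratic majorizer \(|x|\le\frac{x^{2}}{2|x^{(k)}|}+\frac{|x^{(k)}|}{2}\):

# toy MM: minimize f(x) = (x - 1)^2 + 2|x|
# surrogate at x_k: (x - 1)^2 + x^2/|x_k| + |x_k|  (touches f at x = x_k)
f <- function(x) (x - 1)^2 + 2*abs(x)
x <- 1  # feasible starting point
for (k in 1:100) {
  x_prev <- x
  x <- abs(x_prev)/(abs(x_prev) + 1)  # closed-form minimizer of the surrogate
  if (abs(x - x_prev) < 1e-8) break
}
c(x = x, f_x = f(x))

Each surrogate is minimized in closed form, and the iterates decrease \(f\) monotonically toward the true minimizer \(x^{\star}=0\) (slowly here, as MM convergence can be sublinear).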

Interlude on MM: Convergence

  • Under some technical assumptions, every limit point of the sequence \(\left\{ {\bf x}^{(k)}\right\}\) is a stationary point of the original problem.


  • If we further assume that the level set \(\mathcal{X}^{0}=\left\{ {\bf x}\,|\,f\left({\bf x}\right)\leq f\left({\bf x}^{(0)}\right)\right\}\) is compact, then \[\lim_{k\to\infty}d\left({\bf x}^{(k)},\mathcal{X}^{\star}\right)=0,\] where \(\mathcal{X}^{\star}\) is the set of stationary points.

Resolution via MM

Sparse index tracking via MM

Approximation of the \(\ell_{0}\)-norm (indicator function): \[\rho_{p,{\color{red}\gamma}}(w)=\frac{\log(1+|w|/p)}{\log(1+{\color{red}\gamma}/p)}.\]

  • Good approximation in the interval \([-\gamma,\gamma]\).
  • Concave for \(w\ge0\).
  • So-called folded-concave for \(w\in\mathbb{R}\).
  • For our problem we set \(\gamma=u\), where \(u\leq1\) is an upper bound on the weights (we can always choose \(u=1\)); the short R sketch below visualizes the approximation.
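A quick visual check (a sketch of ours, with arbitrary values of \(p\) and \(\gamma=u=1\)):

# visualize rho_{p,gamma} against the l0 indicator
rho <- function(w, p, gamma = 1) log(1 + abs(w)/p) / log(1 + gamma/p)
w_grid <- seq(-1, 1, length.out = 1001)
plot(w_grid, rho(w_grid, p = 1e-3), type = "l", xlab = "w", ylab = "rho(w)",
     main = "Approximation of the l0-norm")
lines(w_grid, rho(w_grid, p = 1e-1), lty = 2)    # larger p: smoother but looser
lines(w_grid, as.numeric(w_grid != 0), lty = 3)  # l0 indicator
legend("bottomright", legend = c("p = 1e-3", "p = 1e-1", "l0 indicator"), lty = 1:3)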

Approximate formulation

  • Continuous and differentiable approximate formulation: \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda\mathbf{1}^{\top}\boldsymbol{\rho}_{p,u}(\mathbf{w})\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W} \end{array} \] where \(\boldsymbol{\rho}_{p,u}(\mathbf{w})=[\rho_{p,u}(w_{1}),\dots,\rho_{p,u}(w_{N})]^{\top}\).


  • This problem is still non-convex: \(\boldsymbol{\rho}_{p,u}(\mathbf{w})\) is concave for \(\mathbf{w}\geq\mathbf{0}\).


  • We will use MM to deal with the non-convex part.

Majorization of \(\boldsymbol{\rho}_{p,\gamma}\)

Lemma 1

The function \(\rho_{p,\gamma}(w)\), with \(w\geq0\), is upperbounded at \(w^{(k)}\) by the surrogate function \[h_{p,\gamma}(w,w^{(k)})=d_{p,\gamma}(w^{(k)})w+c_{p,\gamma}(w^{(k)}),\] where \[\begin{aligned} d_{p,\gamma}(w^{(k)}) &= \frac{1}{\log(1+\gamma/p)(p+w^{(k)})},\\ c_{p,\gamma}(w^{(k)}) &= \frac{\log\left(1+w^{(k)}/p\right)}{\log(1+\gamma/p)}-\frac{w^{(k)}}{\log(1+\gamma/p)(p+w^{(k)})} \end{aligned}\] are constants.

Proof of Lemma 1

  • The function \(\rho_{p,\gamma}(w)\) is concave for \(w\geq0\).
  • An upper bound is its first-order Taylor approximation at any point \(w_{0}\in\mathbb{R}_{+}\): \[\begin{aligned} \rho_{p,\gamma}(w) &= \frac{\log(1+w/p)}{\log(1+\gamma/p)}\\ & \leq \frac{1}{\log(1+\gamma/p)}\left[\log\left(1+w_{0}/p\right)+\frac{1}{p+w_{0}}(w-w_{0})\right] \\ & = \color{red}{\underbrace{\color{black}{\frac{1}{\log(1+\gamma/p)(p+w_{0})}}}_{d_{p,\gamma}}}\color{black}{w} \\ & \qquad + \color{red}{\underbrace{\color{black}{\frac{\log\left(1+w_{0}/p\right)}{\log(1+\gamma/p)}-\frac{w_0}{\log(1+\gamma/p)(p+w_{0})}}}_{c_{p,\gamma}}} \end{aligned}\]
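A quick numerical check of the majorizer (our sketch; the expansion point \(w^{(k)}=0.2\) is arbitrary):

# verify h(w, w_k) >= rho(w) for w >= 0, with equality at w = w_k
p <- 1e-3; gamma <- 1; w_k <- 0.2
d  <- 1/(log(1 + gamma/p)*(p + w_k))
c0 <- log(1 + w_k/p)/log(1 + gamma/p) - w_k*d
w_grid <- seq(0, 1, length.out = 500)
all(d*w_grid + c0 >= log(1 + w_grid/p)/log(1 + gamma/p) - 1e-12)  # TRUE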


Iterative formulation via MM

  • Now in every iteration we need to solve the following problem: \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda{\mathbf{d}_{p,u}^{(k)}}^{\top}\mathbf{w}\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W} \end{array} \] where \(\mathbf{d}_{p,u}^{(k)}=\left[d_{p,u}(w_{1}^{(k)}),\dots,d_{p,u}(w_{N}^{(k)})\right]^{\top}\).


  • This problem is convex (actually a QP).


  • But it requires calling a solver at each iteration.

Algorithm LAIT

Algorithm 1: Linear Approximation for the Index Tracking problem (LAIT)

Set \(k=0\) and choose \(\mathbf{w}^{(0)}\in\mathcal{W}\).

repeat

  • Compute \(\mathbf{d}_{p,u}^{(k)}\)

  • \({\bf w}^{(k+1)}=\arg\min_{{\bf w}\in\mathcal{W}} \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda{\mathbf{d}_{p,u}^{(k)}}^{\top}\mathbf{w}\)

  • \(k \gets k+1\)

until convergence
return \(\mathbf{w}^{(k)}\)


R session: Sparse index tracking via MM (single majorization)

Recall our nonconvex and nondifferentiable optimization problem: \[\begin{array}{ll} \underset{\mathbf{w}}{\text{minimize}} & \frac{1}{T}\big\|\mathbf{r}^b - \mathbf{X}\mathbf{w}\big\|_2^2 + \lambda\|\mathbf{w}\|_0\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}. \end{array}\]

The idea now is to employ the Majorization-Minimization (MM) algorithm to iteratively solve a sequence of simpler surrogate problems.

The first step is to get rid of the nondifferentiability by approximating the cardinality operator \(\|\mathbf{w}\|_0\) by \(\mathbf{1}^{\top}\boldsymbol{\rho}_{p,\gamma}(\mathbf{w})\), where we have approximated the \(\ell_{0}\)-norm (indicator function) with \[\rho_{p,\gamma}(w)=\frac{\log(1+|w|/p)}{\log(1+\gamma/p)}.\]

Now we can derive a majorizer of the nonconvex term \(\mathbf{1}^{\top}\boldsymbol{\rho}_{p,\gamma}(\mathbf{w})\) (i.e., a tangent upper bound) simply by linearizing the log function. At iteration \(k\), the resulting problem is the convex QP: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda{\mathbf{d}_{p,\gamma}^{(k)}}^\top\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^\top\mathbf{w}=1\\ & \mathbf{0}\leq\mathbf{w}\leq u\mathbf{1}, \end{array}\] where \(d_{p,\gamma}(w^{(k)})=\frac{1}{\log(1+\gamma/p)(p+w^{(k)})}\).

We are ready to implement the iterative MM approach:

# parameters
gamma <- 0.5; p <- 1e-3  # parameters of the log-based approximation of the l0-norm
lambda <- 6e-7  # tuned to achieve a sparsity level around K=20
# MM loop
w <- rep(1/N, N)
obj_value <- (1/T_trn)*norm(r_trn - X_trn %*% w, "2")^2 + lambda * sum(log(1+abs(w)/p)/log(1+gamma/p))
while(1) {
  d <- 1/(log(1+gamma/p)*(p+w))
  w_prev <- w
  w <- Variable(N)
  prob <- Problem(Minimize((1/T_trn)*sum((as.matrix(r_trn) - as.matrix(X_trn) %*% w)^2) 
                           + lambda * t(d) %*% w), 
                  constraints = list(w >= 0, w <= 1, sum(w)==1))
  result <- solve(prob)
  w <- as.vector(result$getValue(w))
  obj_value <- c(obj_value, 
                 (1/T_trn)*norm(r_trn - X_trn %*% w, "2")^2 
                 + lambda * sum(log(1+abs(w)/p)/log(1+gamma/p)))
  if(norm(w-w_prev, "2")/norm(w_prev, "2") < 1e-3) break
}
w_MM_single <- w
plot(obj_value, type = "b", pch = 19, cex = .8, 
     main = "Convergence", xlab = "iteration", ylab = "objective value")

cat("Sparsity level K =", sum(w_MM_single > 1e-2))
R>> Sparsity level K = 17
barplot(w_MM_single, xlab = "asset index", 
        main = sprintf("Sparse portfolio via MM (sparsity level K=%d)", sum(w_MM_single > 1e-2)))

We can now compute the tracking error:

# tracking error
ETE <- rbind(ETE, 
             "MM-single" = (1/T)*norm(r - X %*% w_MM_single, "2")^2)
print(ETE)
R>>                                 ETE
R>> dense b                8.767757e-09
R>> two-step sparse prop b 5.950379e-06
R>> two-step sparse LS     4.304850e-06
R>> MM-single              2.754003e-06
barplot(c(ETE), main = "Tracking error", ylab = "ETE", col = rainbow(4))

As expected, the tracking error is much lower than with the two-step approaches.

We can also plot the cumulative P&L:

# plot of wealth
ret_all <- cbind(ret_all, 
                 "MM-single" = X %*% w_MM_single)
{ chart.CumReturns(ret_all, main = sprintf("Cumulative P&L (N=%d, K=%d)", N, K), 
                   wealth.index = TRUE, legend.loc = "topleft")
  addEventLines(xts("training", index(X[T_trn])), srt=90, pos=2, lwd = 2, col = "black") }

Should we stop here?

  • Advantages:

    ✅ The problem is convex.
    ✅ Can be solved efficiently by an off-the-shelf solver.


  • Disadvantages:

    ❌ Needs to be solved many times (once per iteration).
    ❌ Calling a solver many times significantly increases the running time.


  • Can we do something better?

    ✅ For specific constraint sets we can derive closed-form update algorithms!

Let’s rewrite the objective function

  • Expand the objective: \(\frac{1}{T}\big\|\mathbf{X}\mathbf{w}-\mathbf{r}^{b}\big\|_{2}^{2}+\lambda{\mathbf{d}_{p,u}^{(k)}}^{\top}\mathbf{w}=\frac{1}{T}\mathbf{w}^{\top}\mathbf{X}^{\top}\mathbf{X}\mathbf{w}+\left(\lambda\mathbf{d}_{p,u}^{(k)}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}\right)^{\top}\mathbf{w}+\mathit{const.}\)


  • Further upper-bound it:

Lemma 2

Let \(\mathbf{L}\) and \(\mathbf{M}\) be real symmetric matrices such that \(\mathbf{M}\succeq\mathbf{L}\). Then, for any point \(\mathbf{w}^{(k)}\in\mathbb{R}^{N}\) the following inequality holds: \[\mathbf{w}^{\top}\mathbf{L}\mathbf{w}\leq\mathbf{w}^{\top}\mathbf{M}\mathbf{w}+2{\mathbf{w}^{(k)}}^{\!\top}\!\!\left(\mathbf{L}-\mathbf{M}\right)\mathbf{w}-{\mathbf{w}^{(k)}}^{\top}\!\!\left(\mathbf{L}-\mathbf{M}\right)\mathbf{w}^{(k)}.\] Equality is achieved when \(\mathbf{w}=\mathbf{w}^{(k)}\).
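A quick numerical sanity check of Lemma 2 (our sketch, with random data and \(\mathbf{M}=\lambda_{\max}(\mathbf{L})\,\mathbf{I}\)):

# verify the quadratic upper bound of Lemma 2 on random data
set.seed(1)
A <- matrix(rnorm(25), 5, 5)
L <- crossprod(A)  # symmetric (PSD) L
M <- max(eigen(L, symmetric = TRUE)$values) * diag(5)  # M >= L
w_k <- rnorm(5); w <- rnorm(5)
lhs <- drop(t(w) %*% L %*% w)
rhs <- drop(t(w) %*% M %*% w + 2*t(w_k) %*% (L - M) %*% w - t(w_k) %*% (L - M) %*% w_k)
lhs <= rhs  # TRUE, with equality when w = w_k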

Let’s majorize the objective function

  • Based on Lemma 2:
    • Majorize the quadratic term \(\frac{1}{T}\mathbf{w}^{\top}\mathbf{X}^{\top}\mathbf{X}\mathbf{w}\).
    • In our case \(\mathbf{L}_{1}=\frac{1}{T}\mathbf{X}^{\top}\mathbf{X}\).
    • We set \(\mathbf{M}_{1}=\lambda_{\text{max}}^{(\mathbf{L}_{1})}\mathbf{I}\) so that \(\mathbf{M}_{1}\succeq\mathbf{L}_{1}\) holds.
  • The objective becomes: \[ \begin{aligned} & \mathbf{w}^{\top}\mathbf{L}_{1}\mathbf{w}+\left(\lambda\mathbf{d}_{p,u}^{(k)}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}\right)^{\top}\mathbf{w} \\ & \leq \mathbf{w}^{\top}\mathbf{M}_{1}\mathbf{w}+2{\mathbf{w}^{(k)}}^{\top}\left(\mathbf{L}_{1}-\mathbf{M}_{1}\right)\mathbf{w}-{\mathbf{w}^{(k)}}^{\top}\left(\mathbf{L}_{1}-\mathbf{M}_{1}\right)\mathbf{w}^{(k)} \\ & \qquad +\left(\lambda\mathbf{d}_{p,u}^{(k)}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}\right)^{\top}\mathbf{w} \\ & = \lambda_{\text{max}}^{(\mathbf{L}_{1})}\mathbf{w}^{\top}\mathbf{w}+\left(2\left(\mathbf{L}_{1}-\lambda_{\text{max}}^{(\mathbf{L}_{1})}\mathbf{I}\right)\mathbf{w}^{(k)}+\lambda\mathbf{d}_{p,u}^{(k)}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}\right)^{\top}\mathbf{w}+\mathit{const.} \end{aligned} \]

Specialized iterative formulation

The new optimization problem at the \((k+1)\)-th iteration becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{\top}\mathbf{w}+{\mathbf{q}_{1}^{(k)}}^{\top}\mathbf{w}\\ \textsf{subject to} & \begin{array}{l} \mathbf{w}^{\top}\mathbf{1}=1,\\ \mathbf{0}\leq\mathbf{w}\leq\mathbf{1}, \end{array} \biggl\}\mathcal{W} \end{array} \] where \[\mathbf{q}_{1}^{(k)}=\frac{1}{\lambda_{\text{max}}^{(\mathbf{L}_{1})}}\left(2\left(\mathbf{L}_{1}-\lambda_{\text{max}}^{(\mathbf{L}_{1})}\mathbf{I}\right)\mathbf{w}^{(k)}+\lambda{\mathbf{d}_{p,u}^{(k)}}-\frac{2}{T}\mathbf{X}^{\top}\mathbf{r}^{b}\right).\]


  • This problem can be solved with a closed-form update algorithm.

Solution

Proposition 1

The optimal solution to the previous problem with \(u=1\) is: \[w_{i}^{\star}=\begin{cases} -\frac{\mu+q_{i}}{2},\quad & i\in\mathcal{A},\\ 0,\quad & i\notin\mathcal{A}, \end{cases}\] with \[\mu=-\frac{\sum_{i\in\mathcal{A}}q_{i}+2}{\mathsf{card}(\mathcal{A})},\] and \[\mathcal{A}=\big\{ i\,\big|\,\mu+q_{i}<0\big\},\] where \(\mathcal{A}\) can be determined in \(O(\log(N))\) steps.
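This update is essentially a projection onto the simplex. A minimal R sketch (our implementation and naming; with \(u=1\) the upper bound is automatically satisfied, since nonnegative weights summing to one cannot exceed one):

# closed-form solver for: minimize w'w + q'w  s.t.  1'w = 1, w >= 0
solve_qp_simplex <- function(q) {
  N <- length(q)
  q_sorted <- sort(q)                    # assets enter A in order of increasing q
  mu <- -(cumsum(q_sorted) + 2)/(1:N)    # candidate multipliers for card(A) = 1..N
  m <- max(which(mu + q_sorted < 0))     # largest consistent active set
  pmax(-(mu[m] + q)/2, 0)                # w_i = -(mu + q_i)/2 on A, 0 elsewhere
}

# quick check: the output lies on the simplex
w_test <- solve_qp_simplex(rnorm(10))
c(sum = sum(w_test), min = min(w_test))  # sum = 1, min >= 0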

Algorithm SLAIT

Algorithm 2: Specialized Linear Approximation for the Index Tracking problem (SLAIT)

Set \(k=0\) and choose \(\mathbf{w}^{(0)}\in\mathcal{W}\).

repeat

  • Compute \(\mathbf{q}_{1}^{(k)}\)

  • \({\bf w}^{(k+1)}=\arg\min_{{\bf w}\in\mathcal{W}} \mathbf{w}^{\top}\mathbf{w}+{\mathbf{q}_{1}^{(k)}}^{\top}\mathbf{w}\)

  • \(k \gets k+1\)

until convergence
return \(\mathbf{w}^{(k)}\)
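To make SLAIT concrete, here is a compact sketch of the loop (ours), reusing gamma, p and lambda from the earlier R session and solve_qp_simplex() from above; the sparseIndexTracking package loaded earlier provides a polished implementation in spIndexTrack():

# SLAIT sketch: MM with the extra quadratic majorization and closed-form update
L1 <- (1/T_trn) * crossprod(as.matrix(X_trn))  # L1 = (1/T) X'X
lmax <- max(eigen(L1, symmetric = TRUE)$values)
c2 <- (2/T_trn) * as.vector(crossprod(as.matrix(X_trn), as.matrix(r_trn)))
w <- rep(1/N, N)  # feasible starting point
repeat {
  w_prev <- w
  d <- 1/(log(1 + gamma/p)*(p + w))  # majorizer slopes d_{p,u}^{(k)}
  q <- as.vector(2*(L1 %*% w) - 2*lmax*w + lambda*d - c2)/lmax
  w <- solve_qp_simplex(q)  # closed-form update (Proposition 1)
  if (norm(as.matrix(w - w_prev), "2") < 1e-4*norm(as.matrix(w_prev), "2")) break
}
w_SLAIT <- w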
