MAFS5310 - Portfolio Optimization with R
MSc in Financial Mathematics
The Hong Kong University of Science and Technology (HKUST)
Fall 2020-21

Outline

  • Robust Optimization

  • Markowitz Portfolio Optimization

  • Robust Maximum Return Portfolio Optimization

  • Robust Minimum Variance Portfolio Optimization

  • Robust Markowitz Portfolio Optimization

  • Robust Sharpe Ratio Portfolio Optimization

  • Summary

Robust Optimization

Convex optimization

  • A convex optimization problem is written as \[ \begin{array}{ll} \underset{\mathbf{x}}{\textsf{minimize}} & f_{0}\left(\mathbf{x}\right)\\ \textsf{subject to} & f_{i}\left(\mathbf{x}\right)\leq0\qquad i=1,\ldots,m\\ & \mathbf{h}\left(\mathbf{x}\right)=\mathbf{A}\mathbf{x}-\mathbf{b}=0 \end{array} \] where \(f_{0},\,f_{1},\ldots,f_{m}\) are convex and equality constraints are affine.

  • Convex problems enjoy a rich theory (KKT conditions, zero duality gap, etc.) as well as a large number of efficient numerical algorithms guaranteed to deliver an optimal solution \(\mathbf{x}^{\star}\).

  • Many off-the-shelf solvers exist in all major programming languages (e.g., R, Python, Matlab, Julia, C), tailored to specific classes of problems, namely LP, QP, QCQP, SOCP, SDP, GP, etc.

Useless in practice!

  • However, the obtained optimal solution \(\mathbf{x}^{\star}\) typically performs very poorly in practice.

  • In many cases, it can be totally useless!

  • Why is that?

  • Recall that a problem formulation contains not only the optimization variables \(\mathbf{x}\) but also the parameters \(\boldsymbol{\theta}\).

  • Such parameters define the problem instance and are typically estimated in practice, i.e., they are not exact: \(\hat{\boldsymbol{\theta}}\neq\boldsymbol{\theta}\) but hopefully close \(\hat{\boldsymbol{\theta}}\simeq\boldsymbol{\theta}\).

  • The question is whether a small error in the parameters is going to be detrimental or can be ignored. That depends on each particular type of problem.

  • In the case of portfolio optimization, small errors in the parameters \(\boldsymbol{\theta}=\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right)\) happen to have a huge effect on the solution \(\mathbf{x}^{\star}\), to the point that many practitioners avoid portfolio optimization altogether!

Parameters: \(\boldsymbol{\theta}\)

  • To make explicit the fact that the functions depend on parameters \(\boldsymbol{\theta}\), we can explicitly write \(f_{i}\left(\mathbf{x};\boldsymbol{\theta}\right)\) and \(h_{i}\left(\mathbf{x};\boldsymbol{\theta}\right)\).
  • For example, consider an LP: \[ \begin{array}{ll} \underset{\mathbf{x}}{\textsf{minimize}} & \mathbf{c}^{T}\mathbf{x}+\mathbf{d}\\ \textsf{subject to} & \mathbf{A}\mathbf{x}=\mathbf{b}. \end{array} \]

  • The parameters are \(\boldsymbol{\theta}=\left(\mathbf{A},\mathbf{b},\mathbf{c},\mathbf{d}\right)\).

  • The objective function is \(f_{0}\left(\mathbf{x};\boldsymbol{\theta}\right)=\mathbf{c}^{T}\mathbf{x}+\mathbf{d}\)

  • The constraint function is \(\mathbf{h}\left(\mathbf{x};\boldsymbol{\theta}\right)=\mathbf{A}\mathbf{x}-\mathbf{b}\)

  • In practice, we only have an estimate \(\hat{\boldsymbol{\theta}}\). So the problem can only be formulated and solved using \(\hat{\boldsymbol{\theta}}\), obtaining the solution \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\), which is different from the desired one \(\mathbf{x}^{\star}\left(\boldsymbol{\theta}\right)\).

Robust optimization

  • The naive approach is to pretend that \(\hat{\boldsymbol{\theta}}\) is close enough to \(\boldsymbol{\theta}\) and solve the approximated problem, obtaining \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\).

  • For some type of problems, it may be that \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\approx\mathbf{x}^{\star}\left(\boldsymbol{\theta}\right)\) and that’s it.

  • For many other problems, however, that’s not the case. So we cannot really rely on the naive solution \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\).

  • The solution is to consider instead a robust formulation that takes into account the fact that we know we only have an estimation of the parameters.

  • There are several ways to make the problem robust to parameter errors, mainly:

    • stochastic robust optimization (involving expectations)
    • worst-case robust optimization
    • chance programming or chance robust optimization.

Taxonomy of robust optimization

  • Stochastic optimization (SO): this includes expectations as well as chance constraints (requires probabilistic modeling of the parameter):
    📘 J. R. Birge and F. V. Louveaux. Introduction to Stochastic Programming. Springer, 2011.
    📘 A. P. Ruszczynski and A. Shapiro. Stochastic Programming. Elsevier, 2003.
    📘 A. Prekopa. Stochastic Programming. Kluwer Academic Publishers, 1995.


  • Robust optimization (RO): this includes the worst-case approach (requires definition of hard uncertainty set for the parameter):
    📘 A. Ben-Tal, L. E. Ghaoui, and A. Nemirovski. Robust Optimization. Princeton University Press, 2009.
    📄 A. Ben-Tal and A. Nemirovski, “Selected topics in robust convex optimization”, Mathematical Programming, 112 (1), 2008.
    📄 D. Bertsimas, D. B. Brown, and C. Caramanis, “Theory and applications of robust optimization”, SIAM Review, 53 (3), 2011.
    📄 M. S. Lobo. Robust and convex optimization with applications in finance. PhD thesis, Stanford University, 2000.

Stochastic optimization: Expectations

  • In stochastic robust optimization, one models the unknown parameter \(\boldsymbol{\theta}\) as a random variable that fluctuates around the estimated value \(\hat{\boldsymbol{\theta}}\).

  • Then, instead of considering the approximated function \(f(\mathbf{x};\hat{\boldsymbol{\theta}})\), it uses its expected value \(E_{\boldsymbol{\theta}}[f(\mathbf{x};\boldsymbol{\theta})]\), where \(E_{\boldsymbol{\theta}}[\cdot]\) denotes expectation over the random variable \(\boldsymbol{\theta}\).

  • The random variable is typically modeled around the estimated value as \(\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}+\boldsymbol{\delta}\) with \(\boldsymbol{\delta}\) following a zero-mean distribution such as Gaussian.

  • For example, if the function is quadratic, say, \(f(\mathbf{x};\hat{\boldsymbol{\theta}})=(\hat{\mathbf{c}}^{T}\mathbf{x})^{2}\), and we model the parameter as \(\mathbf{c}=\hat{\mathbf{c}}+\boldsymbol{\delta}\) with \(\boldsymbol{\delta}\) zero-mean and covariance matrix \(\mathbf{Q}\), then the expected value is \[\begin{aligned} E_{\boldsymbol{\theta}}[f(\mathbf{x};\boldsymbol{\theta})] &= E_{\boldsymbol{\delta}}[((\hat{\mathbf{c}}+\boldsymbol{\delta})^{T}\mathbf{x})^{2}]\\ &= E_{\boldsymbol{\delta}}[\mathbf{x}^{T}\hat{\mathbf{c}}\hat{\mathbf{c}}^{T}\mathbf{x}+2\mathbf{x}^{T}\hat{\mathbf{c}}\boldsymbol{\delta}^{T}\mathbf{x}+\mathbf{x}^{T}\boldsymbol{\delta}\boldsymbol{\delta}^{T}\mathbf{x}]\\ &= (\hat{\mathbf{c}}^{T}\mathbf{x})^{2}+\mathbf{x}^{T}\mathbf{Q}\mathbf{x}, \end{aligned}\] where the cross term vanishes because \(E[\boldsymbol{\delta}]=\mathbf{0}\), and the additional term \(\mathbf{x}^{T}\mathbf{Q}\mathbf{x}\) serves as a regularizer.
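As a sanity check, the closed-form expectation above can be verified by Monte Carlo; the following sketch uses arbitrary illustrative values for \(\hat{\mathbf{c}}\), \(\mathbf{x}\), and \(\mathbf{Q}\) (not from any market data):

```r
# Monte Carlo check of E[((c_hat + delta)' x)^2] = (c_hat' x)^2 + x' Q x
set.seed(42)
c_hat <- c(1, -2, 0.5)
x     <- c(0.2, 0.3, 0.5)
Q     <- diag(c(0.1, 0.2, 0.3))   # covariance of the error delta
closed_form <- sum(c_hat * x)^2 + drop(t(x) %*% Q %*% x)

n_samples <- 2e5
delta <- matrix(rnorm(n_samples * 3), n_samples, 3) %*% chol(Q)  # delta ~ N(0, Q)
mc_estimate <- mean((sum(c_hat * x) + drop(delta %*% x))^2)
```

With these values the closed form gives \((\hat{\mathbf{c}}^{T}\mathbf{x})^{2}+\mathbf{x}^{T}\mathbf{Q}\mathbf{x}=0.0225+0.097=0.1195\), and the Monte Carlo estimate agrees closely.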

Worst-case robust optimization

  • In worst-case robust optimization, the parameter is not characterized statistically. Instead, it is assumed that the true parameter lies in an uncertainty region centered around the estimated value: \(\boldsymbol{\theta}\in\mathcal{U}\).
  • The uncertainty region can be chosen depending on the problem. Typical choices include:
    • sphere region: \[\mathcal{U} =\{\boldsymbol{\theta}\mid\parallel\boldsymbol{\theta}-\hat{\boldsymbol{\theta}}\parallel_{2}\leq\delta\}\]
    • box region: \[\mathcal{U} =\{\boldsymbol{\theta}\mid\parallel\boldsymbol{\theta}-\hat{\boldsymbol{\theta}}\parallel_{\infty}\leq\delta\}\]
    • elliptical region: \[\mathcal{U} =\{\boldsymbol{\theta}\mid(\boldsymbol{\theta}-\hat{\boldsymbol{\theta}})^{T}\mathbf{S}^{-1}(\boldsymbol{\theta}-\hat{\boldsymbol{\theta}})\leq\delta^{2}\}\] where \(\mathbf{S}\succ\mathbf{0}\) defines the shape of the ellipsoid.
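These regions are straightforward to encode; the following small helpers (illustrative, not part of any package) test membership of a point \(\boldsymbol{\theta}\) in each region:

```r
# Membership tests for the three uncertainty regions (illustrative helpers)
in_sphere <- function(theta, theta_hat, delta)
  sqrt(sum((theta - theta_hat)^2)) <= delta        # ||theta - theta_hat||_2 <= delta
in_box <- function(theta, theta_hat, delta)
  max(abs(theta - theta_hat)) <= delta             # ||theta - theta_hat||_inf <= delta
in_ellipsoid <- function(theta, theta_hat, S, delta)
  drop(t(theta - theta_hat) %*% solve(S, theta - theta_hat)) <= delta^2

theta_hat <- c(0, 0)
theta     <- c(0.3, 0.4)   # Euclidean distance 0.5 from theta_hat
```

Note that, for the same \(\delta\), the sphere is contained in the box, since \(\parallel\cdot\parallel_{\infty}\leq\parallel\cdot\parallel_{2}\); the ellipsoid with \(\mathbf{S}=\mathbf{I}\) reduces to the sphere.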

Worst-case robust optimization: Example

Take the previous quadratic function \[f(\mathbf{x};\hat{\boldsymbol{\theta}})=(\hat{\mathbf{c}}^{T}\mathbf{x})^{2}\] and consider a sphere uncertainty region \[\mathcal{U}=\{\mathbf{c}\mid\parallel\mathbf{c}-\hat{\mathbf{c}}\parallel_{2}\leq\delta\}.\]


  • If the function is the objective to be minimized or it is a constraint of the form \(f(\mathbf{x};\hat{\boldsymbol{\theta}})\leq0\), then the worst-case value of that function is \[\begin{aligned} \underset{\mathbf{c}\in\mathcal{U}}{\max}\;\left|\mathbf{c}^{T}\mathbf{x}\right| & = \underset{\left\Vert \mathbf{e}\right\Vert_{2}\leq\delta}{\max}\left|(\hat{\mathbf{c}}+\mathbf{e})^{T}\mathbf{x}\right|\\ & \leq \left|\hat{\mathbf{c}}^{T}\mathbf{x}\right| + \underset{\left\Vert \mathbf{e}\right\Vert_{2}\leq\delta}{\max}\left|\mathbf{e}^{T}\mathbf{x}\right|\\ & = \left|\hat{\mathbf{c}}^{T}\mathbf{x}\right| + \delta\left\Vert\mathbf{x}\right\Vert_{2} \end{aligned}\] with the upper bound achieved by \(\mathbf{e}=\mathsf{sign}(\hat{\mathbf{c}}^{T}\mathbf{x})\,\delta\frac{\mathbf{x}}{\left\Vert \mathbf{x}\right\Vert_{2}}\).

Worst-case robust optimization: Example

Take the previous quadratic function \[f(\mathbf{x};\hat{\boldsymbol{\theta}})=(\hat{\mathbf{c}}^{T}\mathbf{x})^{2}\] and consider a sphere uncertainty region \[\mathcal{U}=\{\mathbf{c}\mid\parallel\mathbf{c}-\hat{\mathbf{c}}\parallel_{2}\leq\delta\}.\]


  • If the function is the objective to be maximized or it is a constraint of the form \(f(\mathbf{x};\hat{\boldsymbol{\theta}})\geq0\), then the worst-case value of that function is \[\begin{aligned} \underset{\mathbf{c}\in\mathcal{U}}{\min}\;\left|\mathbf{c}^{T}\mathbf{x}\right| & = \underset{\left\Vert \mathbf{e}\right\Vert_{2}\leq\delta}{\min}\left|(\hat{\mathbf{c}}+\mathbf{e})^{T}\mathbf{x}\right|\\ & \geq \left|\hat{\mathbf{c}}^{T}\mathbf{x}\right| - \underset{\left\Vert \mathbf{e}\right\Vert_{2}\leq\delta}{\max}\left|\mathbf{e}^{T}\mathbf{x}\right|\\ & = \left|\hat{\mathbf{c}}^{T}\mathbf{x}\right| - \delta\left\Vert\mathbf{x}\right\Vert_{2} \end{aligned}\] with the lower bound achieved by \(\mathbf{e}=-\mathsf{sign}(\hat{\mathbf{c}}^{T}\mathbf{x})\,\delta\frac{\mathbf{x}}{\left\Vert \mathbf{x}\right\Vert_{2}}\) (assuming \(\left|\hat{\mathbf{c}}^{T}\mathbf{x}\right|\geq\delta\left\Vert\mathbf{x}\right\Vert_{2}\); otherwise the minimum value is \(0\)).
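Both worst-case values can be checked numerically by sampling errors on the sphere \(\left\Vert\mathbf{e}\right\Vert_{2}=\delta\); the values below are arbitrary illustrative choices, picked so that \(|\hat{\mathbf{c}}^{T}\mathbf{x}|>\delta\left\Vert\mathbf{x}\right\Vert_{2}\) and the lower bound is attainable. Note that the extremal \(\mathbf{e}\) is proportional to \(\mathbf{x}\), with a sign depending on \(\hat{\mathbf{c}}^{T}\mathbf{x}\):

```r
# Numerical check of the worst-case values |c_hat' x| +/- delta * ||x||_2
set.seed(1)
c_hat <- c(0.5, -1, 2)
x     <- c(1, 0.2, -0.4)
delta <- 0.3
a  <- sum(c_hat * x)                    # c_hat' x  (= -0.5 here)
ub <- abs(a) + delta * sqrt(sum(x^2))   # worst case for a minimized objective
lb <- abs(a) - delta * sqrt(sum(x^2))   # worst case for a maximized objective

e <- matrix(rnorm(3 * 10000), ncol = 3) # random errors with ||e||_2 = delta
e <- delta * e / sqrt(rowSums(e^2))
vals <- abs(a + drop(e %*% x))

# extremal errors, aligned with x up to the sign of c_hat' x
e_max <- sign(a) * delta * x / sqrt(sum(x^2))
e_min <- -e_max
```

All sampled values stay inside \([\,\mathsf{lb},\mathsf{ub}\,]\), and `e_max`/`e_min` attain the two bounds exactly.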

Stochastic optimization: Chance constraints

  • The problem with expectations is that only the average behavior is controlled, with no guarantee on realizations worse than the average. For example, a constraint may be satisfied on average but still violated for many realizations.

  • The problem with worst-case programming is that it is too conservative as one deals with the worst possible case.

  • Chance programming tries to find a compromise. In particular, it also models the estimation errors statistically but instead of focusing on the average it guarantees a performance for, say, 95% of the cases.

  • The naive constraint \(f(\mathbf{x};\hat{\boldsymbol{\theta}})\leq0\) is replaced with \(\mathsf{Pr}_{\boldsymbol{\theta}}\left[f(\mathbf{x};\boldsymbol{\theta})\leq0\right]\geq1-\epsilon\), e.g., with \(\epsilon=0.05\) the constraint must hold in 95% of the realizations.

  • Chance or probabilistic constraints are generally very hard to deal with and one typically has to resort to approximations.
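One well-known tractable case is Gaussian uncertainty: if \(\mathbf{c}\sim\mathcal{N}(\hat{\mathbf{c}},\mathbf{Q})\), the chance constraint \(\mathsf{Pr}[\mathbf{c}^{T}\mathbf{x}\leq0]\geq1-\epsilon\) is equivalent (for \(\epsilon\leq0.5\)) to the second-order cone constraint \(\hat{\mathbf{c}}^{T}\mathbf{x}+z_{1-\epsilon}\Vert\mathbf{Q}^{1/2}\mathbf{x}\Vert_{2}\leq0\), where \(z_{1-\epsilon}\) is the standard normal quantile. A quick Monte Carlo sketch with arbitrary illustrative values, choosing \(\hat{\mathbf{c}}\) so the SOC constraint holds with equality:

```r
# Monte Carlo check of the Gaussian chance-constraint reformulation
set.seed(123)
eps <- 0.05
x <- c(1, 1)
Q <- matrix(c(0.04, 0.01, 0.01, 0.09), 2, 2)
sd_cx <- sqrt(drop(t(x) %*% Q %*% x))      # std dev of c' x
# pick c_hat so that c_hat' x = -qnorm(1 - eps) * sd_cx (equality in the SOC)
c_hat <- -qnorm(1 - eps) * sd_cx * x / sum(x^2)

n <- 1e5
c_samples <- sweep(matrix(rnorm(2 * n), n, 2) %*% chol(Q), 2, c_hat, "+")
prob <- mean(drop(c_samples %*% x) <= 0)   # should be close to 1 - eps
```

The empirical probability lands very close to the target \(1-\epsilon=0.95\), as the reformulation predicts.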

Markowitz Portfolio Optimization

Markowitz mean-variance portfolio (1952)

  • The idea of the Markowitz mean-variance portfolio (MVP) (Markowitz 1952) is to find a trade-off between the expected return \(\mathbf{w}^{T}\boldsymbol{\mu}\) and the risk of the portfolio measured by the variance \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}-\lambda\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1 \end{array}\] where \(\mathbf{w}^{T}\mathbf{1}=1\) is the capital budget constraint and \(\lambda\) is a parameter that controls how risk-averse the investor is.

  • This is a convex quadratic problem (QP) with only one linear constraint which admits a closed-form solution: \[\mathbf{w}_{\sf MVP} = \frac{1}{2\lambda}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{\mu}+\nu\mathbf{1}\right),\] where \(\nu\) is the optimal dual variable \(\nu=\frac{2\lambda-\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{1}}\).
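The closed form is easy to verify numerically: the weights must sum to one and satisfy the stationarity condition \(\boldsymbol{\mu}-2\lambda\boldsymbol{\Sigma}\mathbf{w}+\nu\mathbf{1}=\mathbf{0}\). A quick check with illustrative values (the covariance matrix is generated randomly, not estimated from data):

```r
# Verify the closed-form mean-variance portfolio on illustrative data
set.seed(7)
N <- 4
mu <- c(0.05, 0.03, 0.04, 0.02)
A <- matrix(rnorm(N * N), N, N)
Sigma <- crossprod(A) / N + 0.1 * diag(N)   # positive-definite covariance
lambda <- 2

Sigma_inv <- solve(Sigma)
nu <- (2 * lambda - sum(Sigma_inv %*% mu)) / sum(Sigma_inv)  # optimal dual variable
w_MVP <- drop(Sigma_inv %*% (mu + nu)) / (2 * lambda)        # closed-form solution
```

The dual variable \(\nu\) is exactly the value that makes the budget constraint hold.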

Practical constraints

  • Capital budget constraint: \[\mathbf{1}^T\mathbf{w} = 1.\]

  • Long-only constraint: \[\mathbf{w} \geq 0.\]

  • Dollar-neutral or self-financing constraint: \[\mathbf{1}^T\mathbf{w} = 0.\]

  • Holding constraint: \[\mathbf{l}\leq\mathbf{w}\leq \mathbf{u}\] where \(\mathbf{l}\in\mathbb{R}^{N}\) and \(\mathbf{u}\in\mathbb{R}^{N}\) are lower and upper bounds of the asset positions, respectively.

Practical constraints

  • Leverage constraint: \[\left\Vert \mathbf{w}\right\Vert _{1}\leq L.\]

  • Cardinality constraint: \[\left\Vert \mathbf{w}\right\Vert _{0} \leq K.\]

  • Turnover constraint: \[\left\Vert \mathbf{w}-\mathbf{w}_{0}\right\Vert _{1} \leq u\] where \(\mathbf{w}_{0}\) is the currently held portfolio.


  • Market-neutral constraint: \[\boldsymbol{\beta}^T\mathbf{w} = 0.\]
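As a quick sketch, the constraints above can be checked mechanically for a candidate portfolio; the helper below is hypothetical (not part of the lecture code), and the thresholds \(L\), \(K\), and the turnover budget are arbitrary:

```r
# Check a candidate portfolio against the practical constraints listed above
check_constraints <- function(w, w0, beta, l, u_hold, L = 1.5, K = 5, u_turn = 0.5) {
  c(budget         = isTRUE(all.equal(sum(w), 1)),      # 1'w = 1
    long_only      = all(w >= 0),                       # w >= 0
    holding        = all(w >= l) && all(w <= u_hold),   # l <= w <= u
    leverage       = sum(abs(w)) <= L,                  # ||w||_1 <= L
    cardinality    = sum(w != 0) <= K,                  # ||w||_0 <= K
    turnover       = sum(abs(w - w0)) <= u_turn,        # ||w - w0||_1 <= u
    market_neutral = isTRUE(all.equal(sum(beta * w), 0)))  # beta' w = 0
}

w    <- rep(1/4, 4)            # equally weighted portfolio
beta <- c(1, 1, -1, -1)        # hypothetical asset betas
checks <- check_constraints(w, w0 = w, beta = beta,
                            l = rep(0, 4), u_hold = rep(0.5, 4))
```

Note that the cardinality constraint is nonconvex, so while it is trivial to check, imposing it in an optimization problem is much harder than the others.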

Drawbacks of Markowitz’s formulation

Markowitz’s portfolio has never been fully embraced by practitioners, among other reasons because

  1. variance is not a good measure of risk in practice since it penalizes both the unwanted losses and the desired gains: the solution is to use alternative measures of risk, e.g., VaR and CVaR,

  2. it is highly sensitive to parameter estimation errors (of covariance matrix \(\boldsymbol{\Sigma}\) and especially mean vector \(\boldsymbol{\mu}\)): solution is robust optimization (Fabozzi 2007) and improved parameter estimation,

  3. it only considers the risk of the portfolio as a whole and ignores risk diversification (i.e., it may concentrate risk in just a few assets, as was observed in the 2008 financial crisis): the solution is the risk parity portfolio.



👉 We will here consider robust portfolio optimization against estimation errors of the parameters \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\).

Mean-variance tradeoff

Efficient frontier: the mean-variance trade-off curve (Pareto curve), which is not achieved in practice due to parameter estimation errors.

R session: Loading market data

We will load some stock market data and divide it into a training part (for the estimation of the expected return \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\), and subsequent portfolio design) and a test part (for the out-of-sample performance evaluation).

In particular, we will start by loading some stock data from three different sectors:

library(xts)  # to manipulate time series of stock data
library(quantmod)  # to download stock data
library(PerformanceAnalytics)  # to compute performance measures

# download data from YahooFinance
stock_namelist <- c("AAPL", "AMD", "ADI",  "ABBV", "AEZS", "A",  "APD", "AA","CF")
prices <- xts()
for (i in 1:length(stock_namelist)) {
  tmp <- Ad(getSymbols(stock_namelist[i], from = "2013-01-01", to = "2018-12-31", auto.assign = FALSE))
  tmp <- na.approx(tmp, na.rm = FALSE)  # interpolate NAs
  prices <- cbind(prices, tmp)
}
colnames(prices) <- stock_namelist
indexClass(prices) <- "Date"
str(prices)
R>> An 'xts' object on 2013-01-02/2016-12-30 containing:
R>>   Data: num [1:1008, 1:9] 55.5 54.8 53.2 52.9 53.1 ...
R>>  - attr(*, "dimnames")=List of 2
R>>   ..$ : NULL
R>>   ..$ : chr [1:9] "AAPL" "AMD" "ADI" "ABBV" ...
R>>   Indexed by objects of class: [Date] TZ: UTC
R>>   xts Attributes:  
R>>  NULL
head(prices)
R>>                AAPL  AMD      ADI     ABBV AEZS        A      APD       AA       CF
R>> 2013-01-02 55.47129 2.53 37.50844 27.60817  253 27.97831 66.43760 20.62187 29.44429
R>> 2013-01-03 54.77112 2.49 36.90318 27.38020  254 28.07852 66.20545 20.80537 29.30506
R>> 2013-01-04 53.24548 2.59 36.24679 27.03431  257 28.63300 67.09529 21.24121 29.96146
R>> 2013-01-07 52.93227 2.67 36.35761 27.08934  259 28.42591 67.03341 20.87419 29.84922
R>> 2013-01-08 53.07474 2.67 35.98254 26.49976  255 28.19877 67.15720 20.87419 29.41162
R>> 2013-01-09 52.24524 2.63 35.88875 26.64913  258 28.96036 68.06254 20.82831 30.44025
tail(prices)
R>>                AAPL   AMD      ADI     ABBV AEZS        A      APD    AA       CF
R>> 2016-12-22 111.8445 11.60 70.07727 55.82264 4.10 44.91877 136.0360 29.75 27.82244
R>> 2016-12-23 112.0657 11.58 70.44936 56.43826 4.10 45.14350 136.3992 29.71 28.37686
R>> 2016-12-27 112.7774 12.07 70.89778 56.58311 4.05 45.44642 137.2373 29.65 29.52265
R>> 2016-12-28 112.2965 11.55 70.18222 56.37488 3.55 44.67448 135.1328 29.43 29.22696
R>> 2016-12-29 112.2676 11.59 70.20129 56.79134 3.60 44.72544 135.2359 28.89 29.47645
R>> 2016-12-30 111.3924 11.34 69.28539 56.69175 3.60 44.64704 134.7206 28.08 29.08836
# compute data matrix with log-returns
X <- diff(log(prices))[-1]
N <- ncol(X)  # number of stocks
T <- nrow(X)  # number of days

We can take a look at the prices of the stocks:

plot(prices/rep(prices[1, ], each = nrow(prices)), col = rainbow10equal, legend.loc = "topleft",
     main = "Normalized prices")

We can now obtain the sample estimates from the returns \(\mathbf{x}_t\) (i.e., sample means and sample covariance matrix) as \[ \begin{align} \hat{\boldsymbol{\mu}} & = \frac{1}{T}\sum_{t=1}^T \mathbf{x}_t\\ \hat{\boldsymbol{\Sigma}} & = \frac{1}{T-1}\sum_{t=1}^T (\mathbf{x}_t - \hat{\boldsymbol{\mu}})(\mathbf{x}_t - \hat{\boldsymbol{\mu}})^T \end{align} \]

mu <- colMeans(X)
Sigma <- cov(X)

Now that we have some realistic parameters \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\), we will design a portfolio with some small noise in the parameters to observe the effect.

R session: Sensitivity of Global Maximum Return Portfolio (GMRP)

Let’s start with the global maximum return portfolio (GMRP) formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \boldsymbol{\mu}^T\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \] Of course, the optimal solution is to allocate the whole budget to the single stock with the largest expected return, which is not diversified at all. We can still implement it using a solver:

library(CVXR)

portfolioMaxReturn <- function(mu) {
  w <- Variable(length(mu))
  prob <- Problem(Maximize(t(w) %*% mu), 
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

w_GMRP <- portfolioMaxReturn(mu)
names(w_GMRP) <- colnames(X)
w_all_GMRP <- cbind(w_GMRP)

# plot allocation
barplot(t(w_all_GMRP), col = rainbow8equal[1], legend = colnames(w_all_GMRP), beside = TRUE,
        main = "Global maximum return portfolio allocation", xlab = "stocks", ylab = "dollars")

As expected, all the budget is allocated to a single stock.

Let’s now introduce some estimation error in \(\boldsymbol{\mu}\) to see the effect:

library(mvtnorm)

# generate a few noisy estimations of mu
set.seed(357)
for (i in 1:6) {
  mu_noisy <- as.vector(rmvnorm(n = 1, mean = mu, sigma = (1/T)*Sigma))
  w_GMRP_noisy <- portfolioMaxReturn(mu_noisy)
  w_all_GMRP <- cbind(w_all_GMRP, w_GMRP_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMRP), col = rainbow8equal[1:7], legend = colnames(w_all_GMRP), beside = TRUE,
        main = "Global maximum return portfolio allocation", xlab = "stocks", ylab = "dollars")

We can see that the optimal allocation changes totally from realization to realization, which is highly undesirable (in fact, it is totally unacceptable).

R session: Sensitivity of Global Minimum Variance Portfolio (GMVP)

Let’s continue with the GMVP formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0} \end{array} \]

portfolioGMVP <- function(Sigma) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Minimize(quad_form(w, Sigma)), 
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

w_GMVP <- portfolioGMVP(Sigma)
names(w_GMVP) <- colnames(X)
w_all_GMVP <- cbind(w_GMVP)

# plot allocation
barplot(t(w_all_GMVP), col = rainbow8equal[1], legend = colnames(w_all_GMVP), beside = TRUE,
        main = "Global minimum variance portfolio allocation", xlab = "stocks", ylab = "dollars")

The GMVP is totally different from the maximum return portfolio. It’s quite the opposite: the budget is allocated to most of the stocks to diversify the risk.

Let’s now introduce some estimation error in \(\boldsymbol{\Sigma}\) to see the effect:

library(PerformanceAnalytics)
# generate a few noisy estimations of Sigma
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = rep(0, N), sigma = Sigma)
  Sigma_noisy <- cov(X_noisy)
  
  w_GMVP_noisy <- portfolioGMVP(Sigma_noisy)
  w_all_GMVP <- cbind(w_all_GMVP, w_GMVP_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMVP), col = rainbow8equal[1:7], legend = colnames(w_all_GMVP), beside = TRUE,
        main = "Global minimum variance portfolio allocation", xlab = "stocks", ylab = "dollars")

chart.StackedBar(t(w_all_GMVP), main = "GMVP allocation", ylab = "w", space = 0, border = NA)

We can see that the optimal allocation still changes from realization to realization, although in this case it is not as extreme.

R session: Sensitivity of Markowitz’s mean-variance portfolio

Finally, let’s take a look at Markowitz’s mean-variance portfolio formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \boldsymbol{\mu}^T\mathbf{w} -\lambda\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0} \end{array} \]

portfolioMarkowitz <- function(mu, Sigma, lmd = 0.5) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Maximize(t(mu) %*% w - lmd*quad_form(w, Sigma)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

w_Markowitz <- portfolioMarkowitz(mu, Sigma)
names(w_Markowitz) <- colnames(X)
w_all_Markowitz <- cbind(w_Markowitz)

# plot allocation
barplot(t(w_all_Markowitz), col = rainbow8equal[1], legend = colnames(w_all_Markowitz), beside = TRUE,
        main = "Markowitz portfolio allocation", xlab = "stocks", ylab = "dollars")

The mean-variance portfolio does not seem very diversified. This is one of the reasons why practitioners do not use it. The other reason is its sensitivity, or lack of robustness, to errors in the estimated parameters, as we will explore next.

# generate a few noisy estimations of mu and Sigma
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  
  w_Markowitz_noisy <- portfolioMarkowitz(mu_noisy, Sigma_noisy)
  w_all_Markowitz <- cbind(w_all_Markowitz, w_Markowitz_noisy)
}

# plot to compare the allocations
barplot(t(w_all_Markowitz), col = rainbow8equal[1:7], legend = colnames(w_all_Markowitz), beside = TRUE,
        main = "Markowitz portfolio allocation", xlab = "stocks", ylab = "dollars")

chart.StackedBar(t(w_all_Markowitz), main = "Markowitz portfolio allocation", ylab = "w", space = 0, border = NA)

Again, the allocations are totally different from realization to realization. Totally unacceptable. Who is going to trust these allocations to invest their money?

Robust Global Maximum Return Portfolio Optimization

Global maximum return portfolio (GMRP)

  • The portfolio that maximizes the return (while ignoring the variance) is the linear program (LP) \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]

  • In practice, however, \(\boldsymbol{\mu}\) is unknown and has to be estimated \(\hat{\boldsymbol{\mu}}\), e.g., with the sample mean \[\hat{\boldsymbol{\mu}}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{x}_{t},\] where \(\mathbf{x}_{t}\) is the return at day \(t\).


  • Unfortunately, it is well known that the estimation of \(\boldsymbol{\mu}\) is extremely noisy in practice (Chopra and Ziemba 1993).

Worst-case robust GMRP

  • Instead of assuming that \(\boldsymbol{\mu}\) is known perfectly, we now assume it belongs to some convex uncertainty set, denoted by \(\mathcal{U}_{\boldsymbol{\mu}}\).

  • The worst-case robust formulation is \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}}{\min}\mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]

  • We assume the expected returns are only known within an ellipsoid: \[\mathcal{U}_{\boldsymbol{\mu}} = \left\{\boldsymbol{\mu}=\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u}\mid\left\Vert \mathbf{u}\right\Vert _{2}\leq1\right\}\] where one can use the estimated covariance matrix to shape the uncertainty ellipsoid, i.e., \(\mathbf{S}=\hat{\boldsymbol{\Sigma}}\).

Worst-case robust GMRP

  • We can solve easily the inner minimization: \[\begin{array}{ll} \underset{\boldsymbol{\mu},\mathbf{u}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \boldsymbol{\mu}=\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u},\\ & \left\Vert \mathbf{u}\right\Vert _{2}\leq1. \end{array}\]

  • It’s easy to find the minimum value using the Cauchy-Schwarz inequality: \[\begin{aligned} \mathbf{w}^{T}\boldsymbol{\mu} &= \mathbf{w}^{T}\left(\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u}\right)\\ &= \mathbf{w}^{T}\hat{\boldsymbol{\mu}}+\kappa\mathbf{w}^{T}\mathbf{S}^{1/2}\mathbf{u}\\ &\geq \mathbf{w}^{T}\hat{\boldsymbol{\mu}}-\kappa\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2} \end{aligned}\] with equality achieved with \(\mathbf{u}=-\frac{\mathbf{S}^{1/2}\mathbf{w}}{\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2}}\).
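This closed-form inner minimum can be verified numerically. The sketch below uses illustrative values and a Cholesky factorization as the square root, i.e., `t(S12) %*% S12 = S`, so that \(\mathbf{w}^{T}\mathbf{S}^{1/2}\mathbf{u}\) becomes the inner product of `S12 %*% w` with \(\mathbf{u}\):

```r
# Check: min over ||u||_2 <= 1 of w'(mu_hat + kappa*S^{1/2} u)
#        = w' mu_hat - kappa * ||S^{1/2} w||_2
set.seed(11)
w      <- c(0.5, 0.3, 0.2)
mu_hat <- c(0.04, 0.02, 0.03)
A <- matrix(rnorm(9), 3, 3)
S <- crossprod(A) + 0.1 * diag(3)   # positive-definite shape matrix
kappa <- 0.5

S12 <- chol(S)                      # t(S12) %*% S12 = S
sw  <- drop(S12 %*% w)
closed_form <- sum(w * mu_hat) - kappa * sqrt(sum(sw^2))

u <- matrix(rnorm(3 * 20000), ncol = 3)           # random u with ||u||_2 <= 1
u <- u / pmax(1, sqrt(rowSums(u^2)))
vals <- sum(w * mu_hat) + kappa * drop(u %*% sw)  # sampled objective values
u_star <- -sw / sqrt(sum(sw^2))                   # minimizer from Cauchy-Schwarz
```

No sampled \(\mathbf{u}\) beats the closed-form minimum, and `u_star` attains it exactly.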

Worst-case robust GMRP

  • Finally, the robust formulation becomes the SOCP \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\hat{\boldsymbol{\mu}}-\kappa\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]

  • Recall the vanilla problem formulation was the LP \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\hat{\boldsymbol{\mu}}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]

  • So, we have gone from an LP to an SOCP.

  • In general, when a problem is robustified, the complexity of the problem increases. For example:

    • LP becomes SOCP
    • QP also becomes SOCP
    • SOCP becomes SDP.

R session: Robust optimization of global maximum return portfolio

The robust formulation of the global maximum return portfolio (GMRP) reads \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}}{\min} \mathbf{w}^T\boldsymbol{\mu}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]

  • For an elliptical uncertainty set \[ \mathcal{U}_\boldsymbol{\mu}^e = \{\boldsymbol{\mu}=\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u}\ | \ \left\Vert \mathbf{u}\right\Vert_{2}\leq 1\}, \] where typically \(\mathbf{S}=\hat{\mathbf{\Sigma}}\), the robust formulation becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^T\hat{\boldsymbol{\mu}} - \kappa\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]

  • For a box uncertainty set \[ \mathcal{U}_\boldsymbol{\mu}^b = \{\boldsymbol{\mu}\ | -\boldsymbol{\delta}\leq\boldsymbol{\mu}-\hat{\boldsymbol{\mu}}\leq\boldsymbol{\delta}\}, \] where typically \(\delta_i = \kappa\hat{\sigma}_i\), the robust formulation becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^T\hat{\boldsymbol{\mu}} - |\mathbf{w}|^T\boldsymbol{\delta}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]

Let’s start by implementing the robust solution for the box uncertainty set:

portfolioMaxReturnRobustBox <- function(mu_hat, delta) {
  w <- Variable(length(mu_hat))
  prob <- Problem(Maximize( t(w) %*% mu_hat - t(abs(w)) %*% delta ), 
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMRP <- portfolioMaxReturn(mu)
names(w_GMRP) <- colnames(X)
w_all_GMRP_robust_box <- cbind(w_GMRP)

# multiple robust solutions
kappa <- 0.1
delta <- kappa*sqrt(diag(Sigma))  # box half-widths based on the asset volatilities
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  
  w_GMRP_robust_box_noisy <- portfolioMaxReturnRobustBox(mu_noisy, delta)
  w_all_GMRP_robust_box <- cbind(w_all_GMRP_robust_box, w_GMRP_robust_box_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMRP_robust_box), col = rainbow8equal[1:7], legend = colnames(w_all_GMRP_robust_box), 
        beside = TRUE, args.legend = list(bg = "white"),
        main = "Robust (box) global maximum return portfolio allocation", xlab = "stocks", ylab = "dollars")

Not much improvement with respect to the naive case; the sensitivity is still there. In fact, since \(\mathbf{w}\ge\mathbf{0}\), for the particular case of \(\boldsymbol{\delta}=\mathbf{1}\) it follows that \(|\mathbf{w}|^T\boldsymbol{\delta}=\mathbf{w}^T\mathbf{1}=1\), i.e., the penalty term is a constant that does not change the solution at all!
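This degeneracy is easy to see numerically: for any long-only portfolio, the box penalty with a constant vector \(\boldsymbol{\delta}=\kappa\mathbf{1}\) takes the same value, so it cannot discriminate between allocations (a trivial sketch with an illustrative \(\kappa\)):

```r
# For long-only portfolios, a constant-vector delta makes the box penalty
# |w|' delta identical for every feasible w
kappa <- 0.1
delta_const <- rep(kappa, 5)
w1 <- c(1, 0, 0, 0, 0)     # concentrated long-only portfolio
w2 <- rep(1/5, 5)          # equally weighted long-only portfolio
penalty1 <- sum(abs(w1) * delta_const)
penalty2 <- sum(abs(w2) * delta_const)
```

Both penalties equal \(\kappa\), confirming that the box robustification only shifts the objective by a constant in this case.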

Let’s try the robust solution for the elliptical uncertainty set:

portfolioMaxReturnRobustEllipsoid <- function(mu_hat, S, kappa = 0.1) {
  S12 <- chol(S)  # t(S12) %*% S12 = S
  w <- Variable(length(mu_hat))
  prob <- Problem(Maximize( t(w) %*% mu_hat - kappa*norm2(S12 %*% w) ), 
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMRP <- portfolioMaxReturn(mu)
names(w_GMRP) <- colnames(X)
w_all_GMRP_robust_ellipsoid <- cbind(w_GMRP)

# multiple robust solutions
kappa <- 0.2
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  
  w_GMRP_robust_ellipsoid_noisy <- portfolioMaxReturnRobustEllipsoid(mu_noisy, Sigma_noisy, kappa)
  w_all_GMRP_robust_ellipsoid <- cbind(w_all_GMRP_robust_ellipsoid, w_GMRP_robust_ellipsoid_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMRP_robust_ellipsoid), col = rainbow8equal[1:7], legend = colnames(w_all_GMRP_robust_ellipsoid), 
        beside = TRUE, args.legend = list(bg = "white"),
        main = "Robust (ellipsoid) global maximum return portfolio allocation", xlab = "stocks", ylab = "dollars")

chart.StackedBar(t(w_all_GMRP_robust_ellipsoid), 
                 main = "Robust (ellipsoid) global maximum return portfolio allocation", 
                 ylab = "w", space = 0, border = NA)

The elliptical uncertainty set seems to work better for the maximum return portfolio. It definitely provides more diversity compared to the naive solution, as well as being more stable (less sensitive). However, it is still too sensitive.

Summarizing, for the robust global maximum return portfolio, it seems that the ellipsoid uncertainty in \(\boldsymbol{\mu}\) is more stable (and diverse) than the box uncertainty (which is totally unstable). Nevertheless, the allocation is still too sensitive, owing to the fact that the global maximum return portfolio is probably not a good formulation to begin with, as it ignores the covariance matrix.

Robust Global Minimum Variance Portfolio Optimization

Global Minimum Variance Portfolio (GMVP)

  • The global minimum variance portfolio (GMVP) ignores the expected return and focuses on the risk only: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]

  • It is a simple convex QP with solution \[\mathbf{w}_{\sf GMVP}=\frac{1}{\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{1}}\boldsymbol{\Sigma}^{-1}\mathbf{1}.\]

  • It is widely used in academic papers for simplicity of evaluation and comparison of different estimators of the covariance matrix \(\boldsymbol{\Sigma}\) (while ignoring the estimation of \(\boldsymbol{\mu}\)).

  • In practice, \(\boldsymbol{\Sigma}\) is unknown and has to be estimated \(\hat{\boldsymbol{\Sigma}}\), e.g., with the sample covariance matrix.

  • Then, the naive portfolio becomes \[\mathbf{w}_{\sf GMVP}=\frac{1}{\mathbf{1}^{T}\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{1}}\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{1}.\]
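The closed-form solution above is easy to implement and sanity-check in a few lines of base R (a minimal sketch; `Sigma_toy` is a made-up covariance matrix for illustration, not real data):

```r
# closed-form GMVP: w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1)
gmvp <- function(Sigma) {
  w <- solve(Sigma, rep(1, nrow(Sigma)))  # Sigma^{-1} 1
  w / sum(w)                              # normalize so that 1^T w = 1
}

# toy covariance matrix (illustrative values only)
Sigma_toy <- matrix(c(1.0, 0.3, 0.1,
                      0.3, 1.5, 0.2,
                      0.1, 0.2, 2.0), 3, 3)
w_gmvp <- gmvp(Sigma_toy)
sum(w_gmvp)  # 1: the budget constraint holds
```

Since the GMVP is the global minimizer over all weights summing to one, its variance is no larger than that of, e.g., the equally weighted portfolio.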

Worst-case robust GMVP

  • Instead of assuming that \(\boldsymbol{\Sigma}\) is known perfectly, we now assume it belongs to some convex uncertainty set, denoted by \(\mathcal{U}_{\boldsymbol{\Sigma}}\).

  • The worst-case robust formulation is \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \underset{\boldsymbol{\Sigma}\in\mathcal{U}_{\boldsymbol{\Sigma}}}{\max}\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]

  • In particular, we will assume that the estimation comes from the sample covariance matrix \(\hat{\boldsymbol{\Sigma}}=\frac{1}{T}\mathbf{X}^{T}\mathbf{X}\) where \(\mathbf{X}\) is a \(T\times N\) matrix containing the return data (assumed demeaned already).

  • However, we will assume that we only observe a noisy data matrix \(\hat{\mathbf{X}}\), so the actual matrix can be written as \(\mathbf{X}=\hat{\mathbf{X}}+\boldsymbol{\Delta}\), where \(\boldsymbol{\Delta}\) is some error matrix bounded in norm.

  • Thus, we will then model the data matrix as \[\mathcal{U}_{\mathbf{X}} =\left\{\mathbf{X}\mid\left\Vert\mathbf{X}-\hat{\mathbf{X}}\right\Vert_{F}\leq\delta_{\mathbf{X}}\right\}.\]

  • The worst-case robust formulation becomes: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \underset{\mathbf{X}\in\mathcal{U}_{\mathbf{X}}}{\max}\mathbf{w}^{T}\frac{1}{T}\mathbf{X}^{T}\mathbf{X}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]

  • Let’s focus on the inner maximization (the constant factor \(\frac{1}{T}\) does not affect the maximizer, so we drop it): \[\underset{\mathbf{X}\in\mathcal{U}_{\mathbf{X}}}{\max}\;\mathbf{w}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{w}=\underset{\left\Vert \boldsymbol{\Delta}\right\Vert _{F}\leq\delta_{\mathbf{X}}}{\max}\left\Vert\left(\hat{\mathbf{X}}+\boldsymbol{\Delta}\right)\mathbf{w}\right\Vert_{2}^{2}\]

  • We first invoke the triangle inequality to get an upper bound: \[\left\Vert\left(\hat{\mathbf{X}}+\boldsymbol{\Delta}\right)\mathbf{w}\right\Vert_{2}\leq\left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\left\Vert \boldsymbol{\Delta}\mathbf{w}\right\Vert _{2}\] with equality achieved when the two vectors \(\hat{\mathbf{X}}\mathbf{w}\) and \(\boldsymbol{\Delta}\mathbf{w}\) are aligned.

  • Next, we invoke the norm inequality \[\left\Vert \boldsymbol{\Delta}\mathbf{w}\right\Vert _{2}\leq\left\Vert \boldsymbol{\Delta}\right\Vert _{F}\left\Vert \mathbf{w}\right\Vert _{2}\leq\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\] with equality achieved when \(\boldsymbol{\Delta}\) is rank-one with right singular vector aligned with \(\mathbf{w}\) and when \(\left\Vert \boldsymbol{\Delta}\right\Vert _{F}=\delta_{\mathbf{X}}\). (This follows easily from \(\mathbf{w}^{T}\mathbf{M}\mathbf{w}\leq\lambda_{\max}\left(\mathbf{M}\right)\left\Vert \mathbf{w}\right\Vert ^{2}\leq\mathsf{Tr}\left(\mathbf{M}\right)\left\Vert \mathbf{w}\right\Vert ^{2}\) for \(\mathbf{M}\succeq\mathbf{0}\).)
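The inequality \(\left\Vert \boldsymbol{\Delta}\mathbf{w}\right\Vert _{2}\leq\left\Vert \boldsymbol{\Delta}\right\Vert _{F}\left\Vert \mathbf{w}\right\Vert _{2}\) is straightforward to check numerically for random matrices (a base-R sketch):

```r
set.seed(1)
norm2 <- function(v) sqrt(sum(v^2))  # Euclidean norm

Delta <- matrix(rnorm(40), 10, 4)    # random 10 x 4 error matrix
w <- rnorm(4)

# ||Delta w||_2 <= ||Delta||_F ||w||_2
norm2(Delta %*% w) <= sqrt(sum(Delta^2)) * norm2(w)  # TRUE
```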

  • Finally, we can see that both upper bounds can be actually achieved if the error is properly chosen as \[\boldsymbol{\Delta}=\delta_{\mathbf{X}}\frac{\hat{\mathbf{X}}\mathbf{w}\mathbf{w}^{T}}{\left\Vert \mathbf{w}\right\Vert _{2}\left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}}.\]

  • Thus, \[\underset{\mathbf{X}\in\mathcal{U}_{\mathbf{X}}}{\max}\;\mathbf{w}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{w}=\left(\left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\right)^{2}.\]

  • The robust problem formulation finally becomes: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] which is a (convex) SOCP.
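We can verify numerically that the worst-case error \(\boldsymbol{\Delta}\) given above attains the upper bound exactly, i.e., that \(\Vert\boldsymbol{\Delta}\Vert_{F}=\delta_{\mathbf{X}}\) and \(\Vert(\hat{\mathbf{X}}+\boldsymbol{\Delta})\mathbf{w}\Vert_{2}=\Vert\hat{\mathbf{X}}\mathbf{w}\Vert_{2}+\delta_{\mathbf{X}}\Vert\mathbf{w}\Vert_{2}\) (a base-R sketch with random data):

```r
set.seed(42)
norm2 <- function(v) sqrt(sum(v^2))

X_hat <- matrix(rnorm(40), 10, 4)  # random 10 x 4 data matrix
w <- rep(1/4, 4)                   # any feasible portfolio
delta <- 0.5

# worst-case rank-one error, aligned with X_hat w and w
Delta <- delta * (X_hat %*% w) %*% t(w) / (norm2(w) * norm2(X_hat %*% w))

sqrt(sum(Delta^2))            # equals delta: the Frobenius-norm budget is used up
norm2((X_hat + Delta) %*% w)  # equals norm2(X_hat %*% w) + delta * norm2(w)
```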

  • Recall the vanilla problem formulation was the QP \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}^{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\]

  • Now, the robust problem formulation is the SOCP (from QP to SOCP) \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] which contains the regularization term \(\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\).

  • One common heuristic, called Tikhonov regularization, is to consider instead \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}^{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}^{2}=\mathbf{w}^{T}\left(\hat{\mathbf{X}}^{T}\hat{\mathbf{X}}+\delta_{\mathbf{X}}\mathbf{I}\right)\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] which is equivalent to the vanilla formulation but using the regularized sample covariance matrix \(\hat{\boldsymbol{\Sigma}}^{\mathsf{tik}}=\frac{1}{T}(\hat{\mathbf{X}}^{T}\hat{\mathbf{X}}+\delta_{\mathbf{X}}\mathbf{I})=\hat{\boldsymbol{\Sigma}}+\frac{\delta_{\mathbf{X}}}{T}\mathbf{I}\).
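Since the Tikhonov heuristic simply replaces \(\hat{\boldsymbol{\Sigma}}\) with \(\hat{\boldsymbol{\Sigma}}+\frac{\delta_{\mathbf{X}}}{T}\mathbf{I}\), it keeps the closed-form GMVP solution (a base-R sketch with random data and an illustrative \(\delta_{\mathbf{X}}\)):

```r
set.seed(42)
T <- 50; N <- 5; delta <- 1
X_hat <- matrix(rnorm(T * N), T, N)  # random demeaned data matrix

Sigma_hat <- crossprod(X_hat) / T               # (1/T) X^T X
Sigma_tik <- Sigma_hat + (delta / T) * diag(N)  # regularized covariance

w_tik <- solve(Sigma_tik, rep(1, N))
w_tik <- w_tik / sum(w_tik)  # closed-form GMVP with the regularized matrix
sum(w_tik)  # 1
```

A side benefit of the regularization is that \(\hat{\boldsymbol{\Sigma}}^{\mathsf{tik}}\) is always invertible, even when \(T<N\) and the sample covariance matrix is singular.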

R session: Robust optimization of GMVP

The robust formulation of the GMVP reads \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \underset{\boldsymbol{\Sigma}\in\mathcal{U}_{\boldsymbol{\Sigma}}}{\max} \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]

  • For a spherical uncertainty set on the data matrix \(\mathbf{X}\) (assuming we use the sample covariance matrix \(\hat{\boldsymbol{\Sigma}}=\frac{1}{T-1}\mathbf{X}^T\mathbf{X}\)) \[ \mathcal{U}_\mathbf{X}^s = \{\mathbf{X}\ | \ \|\mathbf{X}-\hat{\mathbf{X}}\|_F \le \delta\}, \] the robust formulation becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \|\hat{\mathbf{X}}\mathbf{w}\|_2 + \delta \|\mathbf{w}\|_2\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]

  • For a box uncertainty set on the covariance matrix \[ \mathcal{U}_\boldsymbol{\Sigma}^b = \{\boldsymbol{\Sigma}\ | \ \underline{\boldsymbol{\Sigma}} \le \boldsymbol{\Sigma} \le \overline{\boldsymbol{\Sigma}},\ \boldsymbol{\Sigma} \succeq \mathbf{0}\}, \] where one elementwise choice for the bounds is \(\underline{\Sigma}_{ij}=\delta^{-1}\hat{\Sigma}_{ij}\) and \(\overline{\Sigma}_{ij}=\delta\hat{\Sigma}_{ij}\) with \(\delta>1\) (with the two swapped whenever \(\hat{\Sigma}_{ij}<0\), as done in the code below), the robust formulation becomes

\[ \begin{array}{ll} \underset{\mathbf{w},\underline{\boldsymbol{\Lambda}},\overline{\boldsymbol{\Lambda}}}{\textsf{minimize}} & \textsf{Tr}(\overline{\boldsymbol{\Lambda}}\ \overline{\boldsymbol{\Sigma}}) - \textsf{Tr}(\underline{\boldsymbol{\Lambda}}\ \underline{\boldsymbol{\Sigma}})\\ \textsf{subject to} & \left[\begin{array}{cc}\overline{\boldsymbol{\Lambda}}-\underline{\boldsymbol{\Lambda}} & \mathbf{w}\\ \mathbf{w}^{T} & 1\end{array} \right]\succeq\mathbf{0}\\ & \overline{\boldsymbol{\Lambda}}\ge\mathbf{0},\ \underline{\boldsymbol{\Lambda}}\ge\mathbf{0}\\ & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]

  • For an elliptical uncertainty set on the covariance matrix \[ \mathcal{U}_{\boldsymbol{\Sigma}}^{e} = \{\boldsymbol{\Sigma}\ |\ (\mathsf{vec}(\boldsymbol{\Sigma})-\mathsf{vec}(\hat{\boldsymbol{\Sigma}}))^{T}\mathbf{S}_{\boldsymbol{\Sigma}}^{-1}(\mathsf{vec}(\boldsymbol{\Sigma})-\mathsf{vec}(\hat{\boldsymbol{\Sigma}}))\leq\delta^{2},\ \boldsymbol{\Sigma}\succeq\mathbf{0}\}, \] where one simple choice for \(\mathbf{S}_{\boldsymbol{\Sigma}}\) is the identity matrix, the robust formulation becomes \[ \begin{array}{ll} \underset{\mathbf{w},\mathbf{X},\mathbf{Z}}{\textsf{minimize}} & \textsf{Tr}\left(\hat{\boldsymbol{\Sigma}}\left(\mathbf{X}+\mathbf{Z}\right)\right) + \delta\left\Vert \mathbf{S}_{\boldsymbol{\Sigma}}^{1/2}\left(\textsf{vec}(\mathbf{X})+\textsf{vec}(\mathbf{Z})\right)\right\Vert _{2}\\ \textsf{subject to} & \mathbf{w}^{T}\mathbf{1}=1,\quad \mathbf{w}\ge\mathbf{0}\\ & \left[\begin{array}{cc}\mathbf{X} & \mathbf{w}\\ \mathbf{w}^{T} & 1\end{array} \right]\succeq\mathbf{0}\\ & \mathbf{Z}\succeq\mathbf{0}. \end{array} \]

Let’s start implementing the robust formulation for the box constraint on \(\boldsymbol{\Sigma}\):

# define function (check: https://cvxr.rbind.io/cvxr_functions/)
portfolioGMVPRobustBox <- function(Sigma_lb, Sigma_ub) {
  N <- nrow(Sigma_lb)
  w         <- Variable(N)
  Lambda_ub <- Variable(N, N)
  Lambda_lb <- Variable(N, N)
  Tmp       <- Variable(N+1, N+1, PSD = TRUE)  # auxiliary PSD variable
  prob <- Problem(Minimize(matrix_trace(Lambda_ub %*% Sigma_ub) - matrix_trace(Lambda_lb %*% Sigma_lb)),
                  constraints = list(w >= 0, sum(w) == 1,
                                     Lambda_ub >= 0, Lambda_lb >= 0,
                                     # impose the LMI by equating the block matrix to a PSD variable
                                     bmat(list(list(Lambda_ub - Lambda_lb, w),
                                               list(t(w),                  1))) == Tmp))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMVP <- portfolioGMVP(Sigma)
names(w_GMVP) <- colnames(X)
w_all_GMVP_robust_box <- cbind(w_GMVP)

# multiple robust solutions
delta <- 1.1
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  Sigma_lb <- ifelse(Sigma_noisy >= 0,  (1/delta)*Sigma_noisy, delta*Sigma_noisy)
  Sigma_ub <- ifelse(Sigma_noisy >= 0,  delta*Sigma_noisy, (1/delta)*Sigma_noisy)
  w_GMVP_robust_box_noisy <- portfolioGMVPRobustBox(Sigma_lb, Sigma_ub)
  w_all_GMVP_robust_box <- cbind(w_all_GMVP_robust_box, w_GMVP_robust_box_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMVP_robust_box), col = rainbow8equal[1:7], legend = colnames(w_all_GMVP_robust_box), 
        beside = TRUE, args.legend = list(bg = "white", x = "topleft"),
        main = "Robust (box) GMVP allocation", xlab = "stocks", ylab = "dollars")

chart.StackedBar(t(w_all_GMVP), main = "Naive GMVP allocation", ylab = "w", space = 0, border = NA)

chart.StackedBar(t(w_all_GMVP_robust_box), 
                 main = "Robust (box) GMVP allocation", ylab = "w", space = 0, border = NA)

The naive GMVP was already fairly stable. Nevertheless, the robust GMVP becomes even more stable (less sensitive to the estimation errors). Better results could be obtained by choosing more appropriate upper and lower bounds on \(\boldsymbol{\Sigma}\).

Let’s consider now the robust formulation for the spherical uncertainty region on the data matrix \(\mathbf{X}\):

# define function
portfolioGMVPRobustSphereX <- function(X_hat, delta) {
  N <- ncol(X_hat)
  w <- Variable(N)
  X_ <- scale(X_hat, center = TRUE, scale = FALSE)  # demean
  prob <- Problem(Minimize(norm2(X_ %*% w) + delta*norm2(w)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMVP <- portfolioGMVP(Sigma)
names(w_GMVP) <- colnames(X)
w_all_GMVP_robust_sphereX <- cbind(w_GMVP)

# multiple robust solutions
delta <- 0.1
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)

  w_GMVP_robust_sphereX_noisy <- portfolioGMVPRobustSphereX(X_noisy, delta)
  w_all_GMVP_robust_sphereX <- cbind(w_all_GMVP_robust_sphereX, w_GMVP_robust_sphereX_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMVP_robust_sphereX), col = rainbow8equal[1:7], legend = colnames(w_all_GMVP_robust_sphereX), 
        beside = TRUE, args.legend = list(bg = "white", x = "topleft"),
        main = "Robust (sphere in X) GMVP allocation", xlab = "stocks", ylab = "dollars")

chart.StackedBar(t(w_all_GMVP), main = "Naive GMVP allocation", ylab = "w", space = 0, border = NA)

chart.StackedBar(t(w_all_GMVP_robust_sphereX), 
                 main = "Robust (sphere in X) GMVP allocation", ylab = "w", space = 0, border = NA)