Robust Optimization
Markowitz Portfolio Optimization
Robust Maximum Return Portfolio Optimization
Robust Minimum Variance Portfolio Optimization
Robust Markowitz Portfolio Optimization
Robust Sharpe Ratio Portfolio Optimization
Summary
A convex optimization problem is written as \[ \begin{array}{ll} \underset{\mathbf{x}}{\textsf{minimize}} & f_{0}\left(\mathbf{x}\right)\\ \textsf{subject to} & f_{i}\left(\mathbf{x}\right)\leq0\qquad i=1,\ldots,m\\ & \mathbf{h}\left(\mathbf{x}\right)=\mathbf{A}\mathbf{x}-\mathbf{b}=0 \end{array} \] where \(f_{0},\,f_{1},\ldots,f_{m}\) are convex and equality constraints are affine.
Convex problems enjoy a rich theory (KKT conditions, zero duality gap, etc.) as well as a large number of efficient numerical algorithms guaranteed to deliver an optimal solution \(\mathbf{x}^{\star}\).
Many off-the-shelf solvers exist in most programming languages (e.g., R, Python, Matlab, Julia, C), tailored to specific classes of problems, such as LP, QP, QCQP, SOCP, SDP, and GP.
However, the obtained optimal solution \(\mathbf{x}^{\star}\) typically performs very poorly in practice.
In many cases, it can be totally useless!
Why is that?
Recall that a problem formulation contains not only the optimization variables \(\mathbf{x}\) but also the parameters \(\boldsymbol{\theta}\).
Such parameters define the problem instance and are typically estimated in practice, i.e., they are not exact: \(\hat{\boldsymbol{\theta}}\neq\boldsymbol{\theta}\) but hopefully close \(\hat{\boldsymbol{\theta}}\simeq\boldsymbol{\theta}\).
The question is whether a small error in the parameters is going to be detrimental or can be ignored. That depends on each particular type of problem.
In the case of portfolio optimization, small errors in the parameters \(\boldsymbol{\theta}=\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right)\) happen to have a huge effect on the solution \(\mathbf{x}^{\star}\), to the point that most practitioners avoid the use of portfolio optimization!
For example, consider an LP: \[ \begin{array}{ll} \underset{\mathbf{x}}{\textsf{minimize}} & \mathbf{c}^{T}\mathbf{x}+\mathbf{d}\\ \textsf{subject to} & \mathbf{A}\mathbf{x}=\mathbf{b}. \end{array} \]
The parameters are \(\boldsymbol{\theta}=\left(\mathbf{A},\mathbf{b},\mathbf{c},\mathbf{d}\right)\).
The objective function is \(f_{0}\left(\mathbf{x};\boldsymbol{\theta}\right)=\mathbf{c}^{T}\mathbf{x}+\mathbf{d}\)
The constraint function is \(\mathbf{h}\left(\mathbf{x};\boldsymbol{\theta}\right)=\mathbf{A}\mathbf{x}-\mathbf{b}\)
The naive approach is to pretend that \(\hat{\boldsymbol{\theta}}\) is close enough to \(\boldsymbol{\theta}\) and solve the approximated problem, obtaining \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\).
For some type of problems, it may be that \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\approx\mathbf{x}^{\star}\left(\boldsymbol{\theta}\right)\) and that’s it.
For many other problems, however, that’s not the case. So we cannot really rely on the naive solution \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\).
The solution is to consider instead a robust formulation that takes into account the fact that we know we only have an estimation of the parameters.
There are several ways to make the problem robust to parameter errors, mainly:
In stochastic robust optimization, one models the estimation \(\hat{\boldsymbol{\theta}}\) as a random variable that fluctuates around its true value \(\boldsymbol{\theta}\).
Then, instead of considering the approximated function \(f(\mathbf{x};\hat{\boldsymbol{\theta}})\), it uses its expected value \(E_{\boldsymbol{\theta}}[f(\mathbf{x};\boldsymbol{\theta})]\), where \(E_{\boldsymbol{\theta}}[\cdot]\) denotes expectation over the random variable \(\boldsymbol{\theta}\).
The random variable is typically modeled around the estimated value as \(\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}+\boldsymbol{\delta}\) with \(\boldsymbol{\delta}\) following a zero-mean distribution such as Gaussian.
For example, if the function is quadratic, say, \(f(\mathbf{x};\hat{\boldsymbol{\theta}})=(\hat{\mathbf{c}}^{T}\mathbf{x})^{2}\), and we model the parameter as \(\mathbf{c}=\hat{\mathbf{c}}+\boldsymbol{\delta}\) with \(\boldsymbol{\delta}\) zero-mean and covariance matrix \(\mathbf{Q}\), then the expected value is \[\begin{aligned} E_{\boldsymbol{\theta}}[f(\mathbf{x};\boldsymbol{\theta})] &= E_{\boldsymbol{\delta}}[((\hat{\mathbf{c}}+\boldsymbol{\delta})^{T}\mathbf{x})^{2}]\\ &= E_{\boldsymbol{\delta}}[\mathbf{x}^{T}\hat{\mathbf{c}}\hat{\mathbf{c}}^{T}\mathbf{x}+\mathbf{x}^{T}\boldsymbol{\delta}\boldsymbol{\delta}^{T}\mathbf{x}]\\ &= (\hat{\mathbf{c}}^{T}\mathbf{x})^{2}+\mathbf{x}^{T}\mathbf{Q}\mathbf{x} \end{aligned}\] where the additional term \(\mathbf{x}^{T}\mathbf{Q}\mathbf{x}\) serves as a regularizer.
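This identity is easy to verify numerically. Below is a minimal Monte Carlo sanity check; the vectors `c_hat`, `x`, and the covariance matrix `Q` are arbitrary toy numbers chosen for illustration:

```r
library(mvtnorm)  # for sampling the Gaussian error delta

set.seed(42)
c_hat <- c(1, -0.5, 2)       # estimated parameter (arbitrary)
Q <- diag(c(0.1, 0.2, 0.3))  # covariance of the error delta (arbitrary)
x <- c(0.2, 0.5, 0.3)        # a fixed point where we evaluate the function

# Monte Carlo estimate of E[((c_hat + delta)^T x)^2]
delta <- rmvnorm(n = 1e5, mean = rep(0, 3), sigma = Q)
mc_value <- mean((sum(c_hat * x) + as.vector(delta %*% x))^2)

# closed-form value: (c_hat^T x)^2 + x^T Q x
cf_value <- sum(c_hat * x)^2 + as.numeric(t(x) %*% Q %*% x)

mc_value  # approximately equal to cf_value
```

The Monte Carlo average approaches the closed-form value as the number of samples grows, confirming the regularizer term \(\mathbf{x}^{T}\mathbf{Q}\mathbf{x}\).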
Take the previous quadratic function \[f(\mathbf{x};\hat{\boldsymbol{\theta}})=(\hat{\mathbf{c}}^{T}\mathbf{x})^{2}\] and consider a spherical uncertainty region \[\mathcal{U}=\{\mathbf{c}\mid\left\Vert\mathbf{c}-\hat{\mathbf{c}}\right\Vert_{2}\leq\delta\}.\] The worst-case value over this region follows from the Cauchy-Schwarz inequality: \[\underset{\mathbf{c}\in\mathcal{U}}{\max}\;(\mathbf{c}^{T}\mathbf{x})^{2}=\left(|\hat{\mathbf{c}}^{T}\mathbf{x}|+\delta\left\Vert\mathbf{x}\right\Vert_{2}\right)^{2},\] achieved with \(\mathbf{c}=\hat{\mathbf{c}}\pm\delta\mathbf{x}/\left\Vert\mathbf{x}\right\Vert_{2}\), the sign chosen to match that of \(\hat{\mathbf{c}}^{T}\mathbf{x}\).
The problem with expectations is that only the average behavior is controlled, while nothing constrains the realizations worse than the average. For example, a constraint may be satisfied on average yet violated for many realizations.
The problem with worst-case programming is that it is too conservative as one deals with the worst possible case.
Chance programming tries to find a compromise. In particular, it also models the estimation errors statistically but instead of focusing on the average it guarantees a performance for, say, 95% of the cases.
The naive constraint \(f(\mathbf{x};\hat{\boldsymbol{\theta}})\leq0\) is replaced with \(\mathsf{Pr}_{\boldsymbol{\theta}}\left[f(\mathbf{x};\boldsymbol{\theta})\leq0\right]\geq1-\epsilon=0.95\) with \(\epsilon=0.05\).
Chance or probabilistic constraints are generally very hard to deal with and one typically has to resort to approximations.
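One well-known case where the chance constraint is tractable is a linear function with Gaussian uncertainty: if \(\mathbf{c}\sim\mathcal{N}(\hat{\mathbf{c}},\mathbf{Q})\), then \(\mathsf{Pr}\left[\mathbf{c}^{T}\mathbf{x}\leq0\right]\geq1-\epsilon\) is exactly equivalent to the second-order cone constraint \(\hat{\mathbf{c}}^{T}\mathbf{x}+\Phi^{-1}(1-\epsilon)\left\Vert\mathbf{Q}^{1/2}\mathbf{x}\right\Vert_{2}\leq0\). A quick numerical check with made-up numbers (`c_hat`, `Q`, and `x` are arbitrary):

```r
library(mvtnorm)  # for sampling the Gaussian parameter

eps <- 0.05
c_hat <- c(-1, -0.5)      # Gaussian mean (arbitrary)
Q <- diag(c(0.04, 0.09))  # Gaussian covariance (arbitrary)
x <- c(0.6, 0.4)          # candidate point

# SOC form of the chance constraint Pr[c^T x <= 0] >= 1 - eps
soc_lhs <- sum(c_hat * x) + qnorm(1 - eps) * sqrt(as.numeric(t(x) %*% Q %*% x))
soc_lhs <= 0  # TRUE: the chance constraint holds at x

# Monte Carlo confirmation of the probability
set.seed(123)
c_samples <- rmvnorm(n = 1e5, mean = c_hat, sigma = Q)
mean(c_samples %*% x <= 0)  # well above 1 - eps = 0.95
```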
The idea of the Markowitz mean-variance portfolio (MVP) (Markowitz 1952) is to find a trade-off between the expected return \(\mathbf{w}^{T}\boldsymbol{\mu}\) and the risk of the portfolio measured by the variance \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}-\lambda\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1 \end{array}\] where \(\mathbf{w}^{T}\mathbf{1}=1\) is the capital budget constraint and \(\lambda\) is a parameter that controls how risk-averse the investor is.
This is a convex quadratic problem (QP) with only one linear constraint which admits a closed-form solution: \[\mathbf{w}_{\sf MVP} = \frac{1}{2\lambda}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{\mu}+\nu\mathbf{1}\right),\] where \(\nu\) is the optimal dual variable \(\nu=\frac{2\lambda-\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{1}}\).
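As a sanity check of this closed form, here is a small sketch with a toy diagonal covariance matrix (the numbers `mu_toy` and `Sigma_toy` are arbitrary); by construction the resulting weights satisfy the budget constraint:

```r
# closed-form Markowitz mean-variance portfolio: w = (1/2lmd) Sigma^{-1} (mu + nu 1)
mvpClosedForm <- function(mu, Sigma, lmd = 0.5) {
  Sigma_inv <- solve(Sigma)
  ones <- rep(1, length(mu))
  # optimal dual variable: nu = (2 lmd - 1^T Sigma^{-1} mu) / (1^T Sigma^{-1} 1)
  nu <- (2*lmd - sum(ones * (Sigma_inv %*% mu))) / sum(ones * (Sigma_inv %*% ones))
  as.vector((1/(2*lmd)) * Sigma_inv %*% (mu + nu*ones))
}

mu_toy <- c(0.05, 0.03, 0.04)
Sigma_toy <- diag(c(0.10, 0.05, 0.08))
w <- mvpClosedForm(mu_toy, Sigma_toy, lmd = 0.5)
sum(w)  # = 1, the capital budget constraint
```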
Capital budget constraint: \[\mathbf{1}^T\mathbf{w} = 1.\]
Long-only constraint: \[\mathbf{w} \geq 0.\]
Dollar-neutral or self-financing constraint: \[\mathbf{1}^T\mathbf{w} = 0.\]
Holding constraint: \[\mathbf{l}\leq\mathbf{w}\leq \mathbf{u}\] where \(\mathbf{l}\in\mathbb{R}^{N}\) and \(\mathbf{u}\in\mathbb{R}^{N}\) are lower and upper bounds of the asset positions, respectively.
Leverage constraint: \[\left\Vert \mathbf{w}\right\Vert _{1}\leq L.\]
Cardinality constraint: \[\left\Vert \mathbf{w}\right\Vert _{0} \leq K.\]
Turnover constraint: \[\left\Vert \mathbf{w}-\mathbf{w}_{0}\right\Vert _{1} \leq u\] where \(\mathbf{w}_{0}\) is the currently held portfolio.
Markowitz’s portfolio has never been fully embraced by practitioners, among other reasons because
variance is not a good measure of risk in practice since it penalizes both the unwanted losses and the desired gains: the solution is to use alternative risk measures, e.g., VaR and CVaR,
it is highly sensitive to parameter estimation errors (of the covariance matrix \(\boldsymbol{\Sigma}\) and especially of the mean vector \(\boldsymbol{\mu}\)): the solution is robust optimization (Fabozzi 2007) and improved parameter estimation,
it only considers the risk of the portfolio as a whole and ignores risk diversification (i.e., it concentrates risk too much in a few assets, as observed in the 2008 financial crisis): the solution is the risk parity portfolio.
👉 We will here consider robust portfolio optimization against estimation errors of the parameters \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\).
Efficient frontier: the mean-variance trade-off curve (Pareto curve), which is not achieved in practice due to parameter estimation errors:
We will load some stock market data and divide it into a training part (for the estimation of the expected return \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\), and subsequent portfolio design) and a test part (for the out-of-sample performance evaluation).
In particular, we will start by loading some stock data from three different sectors:
library(xts)                  # to manipulate time series of stock data
library(quantmod)             # to download stock data
library(PerformanceAnalytics) # to compute performance measures

# download data from YahooFinance
stock_namelist <- c("AAPL", "AMD", "ADI", "ABBV", "AEZS", "A", "APD", "AA", "CF")
prices <- xts()
for (i in 1:length(stock_namelist)) {
  tmp <- Ad(getSymbols(stock_namelist[i], from = "2013-01-01", to = "2016-12-31",
                       auto.assign = FALSE))
  tmp <- na.approx(tmp, na.rm = FALSE)  # interpolate NAs
  prices <- cbind(prices, tmp)
}
colnames(prices) <- stock_namelist
indexClass(prices) <- "Date"
str(prices)
R>> An 'xts' object on 2013-01-02/2016-12-30 containing:
R>>   Data: num [1:1008, 1:9] 55.5 54.8 53.2 52.9 53.1 ...
R>>  - attr(*, "dimnames")=List of 2
R>>   ..$ : NULL
R>>   ..$ : chr [1:9] "AAPL" "AMD" "ADI" "ABBV" ...
R>>   Indexed by objects of class: [Date] TZ: UTC
R>>   xts Attributes:
R>> NULL
head(prices)
R>>                AAPL  AMD      ADI     ABBV AEZS        A      APD       AA       CF
R>> 2013-01-02 55.47129 2.53 37.50844 27.60817  253 27.97831 66.43760 20.62187 29.44429
R>> 2013-01-03 54.77112 2.49 36.90318 27.38020  254 28.07852 66.20545 20.80537 29.30506
R>> 2013-01-04 53.24548 2.59 36.24679 27.03431  257 28.63300 67.09529 21.24121 29.96146
R>> 2013-01-07 52.93227 2.67 36.35761 27.08934  259 28.42591 67.03341 20.87419 29.84922
R>> 2013-01-08 53.07474 2.67 35.98254 26.49976  255 28.19877 67.15720 20.87419 29.41162
R>> 2013-01-09 52.24524 2.63 35.88875 26.64913  258 28.96036 68.06254 20.82831 30.44025
tail(prices)
R>>                AAPL   AMD      ADI     ABBV AEZS        A      APD    AA       CF
R>> 2016-12-22 111.8445 11.60 70.07727 55.82264 4.10 44.91877 136.0360 29.75 27.82244
R>> 2016-12-23 112.0657 11.58 70.44936 56.43826 4.10 45.14350 136.3992 29.71 28.37686
R>> 2016-12-27 112.7774 12.07 70.89778 56.58311 4.05 45.44642 137.2373 29.65 29.52265
R>> 2016-12-28 112.2965 11.55 70.18222 56.37488 3.55 44.67448 135.1328 29.43 29.22696
R>> 2016-12-29 112.2676 11.59 70.20129 56.79134 3.60 44.72544 135.2359 28.89 29.47645
R>> 2016-12-30 111.3924 11.34 69.28539 56.69175 3.60 44.64704 134.7206 28.08 29.08836
# compute data matrix with log-returns
X <- diff(log(prices))[-1]
N <- ncol(X)  # number of stocks
T <- nrow(X)  # number of days
We can take a look at the prices of the stocks:
plot(prices/rep(prices[1, ], each = nrow(prices)), col = rainbow10equal,
     legend.loc = "topleft", main = "Normalized prices")
We can now obtain the sample estimates from the returns \(\mathbf{x}_t\) (i.e., the sample mean and sample covariance matrix) as \[ \begin{align} \hat{\boldsymbol{\mu}} & = \frac{1}{T}\sum_{t=1}^T \mathbf{x}_t\\ \hat{\boldsymbol{\Sigma}} & = \frac{1}{T-1}\sum_{t=1}^T (\mathbf{x}_t - \hat{\boldsymbol{\mu}})(\mathbf{x}_t - \hat{\boldsymbol{\mu}})^T \end{align} \]
mu <- colMeans(X)
Sigma <- cov(X)
Now that we have some realistic parameters \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\), we will design a portfolio with some small noise in the parameters to observe the effect.
Let’s start with the global maximum return portfolio (GMRP) formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \boldsymbol{\mu}^T\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \] Of course the optimal solution is to allocate all the budget on the single stock with largest expected return, not very diversified. We can still implement it using a solver:
library(CVXR)

portfolioMaxReturn <- function(mu) {
  w <- Variable(length(mu))
  prob <- Problem(Maximize(t(w) %*% mu),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

w_GMRP <- portfolioMaxReturn(mu)
names(w_GMRP) <- colnames(X)
w_all_GMRP <- cbind(w_GMRP)

# plot allocation
barplot(t(w_all_GMRP), col = rainbow8equal[1], legend = colnames(w_all_GMRP),
        beside = TRUE, main = "Global maximum return portfolio allocation",
        xlab = "stocks", ylab = "dollars")
As expected, all the budget is allocated to a single stock.
Let’s now introduce some estimation error in \(\boldsymbol{\mu}\) to see the effect:
library(mvtnorm)

# generate a few noisy estimations of mu
set.seed(357)
for (i in 1:6) {
  mu_noisy <- as.vector(rmvnorm(n = 1, mean = mu, sigma = (1/T)*Sigma))
  w_GMRP_noisy <- portfolioMaxReturn(mu_noisy)
  w_all_GMRP <- cbind(w_all_GMRP, w_GMRP_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMRP), col = rainbow8equal[1:7], legend = colnames(w_all_GMRP),
        beside = TRUE, main = "Global maximum return portfolio allocation",
        xlab = "stocks", ylab = "dollars")
We can see that the optimal allocation changes totally from realization to realization, which is highly undesirable (in fact, it is totally unacceptable).
Let’s continue with the GMVP formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0} \end{array} \]
portfolioGMVP <- function(Sigma) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Minimize(quad_form(w, Sigma)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

w_GMVP <- portfolioGMVP(Sigma)
names(w_GMVP) <- colnames(X)
w_all_GMVP <- cbind(w_GMVP)

# plot allocation
barplot(t(w_all_GMVP), col = rainbow8equal[1], legend = colnames(w_all_GMVP),
        beside = TRUE, main = "Global minimum variance portfolio allocation",
        xlab = "stocks", ylab = "dollars")
The GMVP is totally different from the maximum return portfolio. It’s quite the opposite: the budget is allocated to most of the stocks to diversify the risk.
Let’s now introduce some estimation error in \(\boldsymbol{\Sigma}\) to see the effect:
library(PerformanceAnalytics)

# generate a few noisy estimations of Sigma
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = rep(0, N), sigma = Sigma)
  Sigma_noisy <- cov(X_noisy)
  w_GMVP_noisy <- portfolioGMVP(Sigma_noisy)
  w_all_GMVP <- cbind(w_all_GMVP, w_GMVP_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMVP), col = rainbow8equal[1:7], legend = colnames(w_all_GMVP),
        beside = TRUE, main = "Global minimum variance portfolio allocation",
        xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_GMVP), main = "GMVP allocation", ylab = "w", space = 0, border = NA)
We can see that the optimal allocation still changes from realization to realization, although in this case it is not as extreme.
Finally, let’s take a look at Markowitz’s mean-variance portfolio formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \boldsymbol{\mu}^T\mathbf{w} -\lambda\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0} \end{array} \]
portfolioMarkowitz <- function(mu, Sigma, lmd = 0.5) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Maximize(t(mu) %*% w - lmd*quad_form(w, Sigma)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

w_Markowitz <- portfolioMarkowitz(mu, Sigma)
names(w_Markowitz) <- colnames(X)
w_all_Markowitz <- cbind(w_Markowitz)

# plot allocation
barplot(t(w_all_Markowitz), col = rainbow8equal[1], legend = colnames(w_all_Markowitz),
        beside = TRUE, main = "Markowitz portfolio allocation",
        xlab = "stocks", ylab = "dollars")
The mean-variance portfolio does not seem very diversified. This is one of the reasons why practitioners do not use it. The other reason is its sensitivity, i.e., lack of robustness, to errors in the estimated parameters, as we will explore next.
# generate a few noisy estimations of mu and Sigma
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  w_Markowitz_noisy <- portfolioMarkowitz(mu_noisy, Sigma_noisy)
  w_all_Markowitz <- cbind(w_all_Markowitz, w_Markowitz_noisy)
}

# plot to compare the allocations
barplot(t(w_all_Markowitz), col = rainbow8equal[1:7], legend = colnames(w_all_Markowitz),
        beside = TRUE, main = "Markowitz portfolio allocation",
        xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_Markowitz), main = "Markowitz portfolio allocation", ylab = "w", space = 0, border = NA)
Again, the allocations are totally different from realization to realization. Totally unacceptable. Who is going to trust these allocations to invest their money?
The portfolio that maximizes the return (while ignoring the variance) is the linear program (LP) \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]
In practice, however, \(\boldsymbol{\mu}\) is unknown and has to be estimated \(\hat{\boldsymbol{\mu}}\), e.g., with the sample mean \[\hat{\boldsymbol{\mu}}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{x}_{t},\] where \(\mathbf{x}_{t}\) is the return at day \(t\).
Instead of assuming that \(\boldsymbol{\mu}\) is known perfectly, we now assume it belongs to some convex uncertainty set, denoted by \(\mathcal{U}_{\boldsymbol{\mu}}\).
The worst-case robust formulation is \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}}{\min}\mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
We assume the expected returns are only known within an ellipsoid: \[\mathcal{U}_{\boldsymbol{\mu}} = \left\{\boldsymbol{\mu}=\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u}\mid\left\Vert \mathbf{u}\right\Vert _{2}\leq1\right\}\] where one can use the estimated covariance matrix to shape the uncertainty ellipsoid, i.e., \(\mathbf{S}=\hat{\boldsymbol{\Sigma}}\).
We can solve easily the inner minimization: \[\begin{array}{ll} \underset{\boldsymbol{\mu},\mathbf{u}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \boldsymbol{\mu}=\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u},\\ & \left\Vert \mathbf{u}\right\Vert _{2}\leq1. \end{array}\]
It’s easy to find the minimum value using the Cauchy-Schwarz inequality: \[\begin{aligned} \mathbf{w}^{T}\boldsymbol{\mu} &= \mathbf{w}^{T}\left(\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u}\right)\\ &= \mathbf{w}^{T}\hat{\boldsymbol{\mu}}+\kappa\mathbf{w}^{T}\mathbf{S}^{1/2}\mathbf{u}\\ &\geq \mathbf{w}^{T}\hat{\boldsymbol{\mu}}-\kappa\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2} \end{aligned}\] with equality achieved with \(\mathbf{u}=-\frac{\mathbf{S}^{1/2}\mathbf{w}}{\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2}}\).
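We can verify numerically that this choice of \(\mathbf{u}\) attains the bound. The sketch below uses a symmetric square root of an arbitrary positive definite \(\mathbf{S}\) and arbitrary toy values for \(\mathbf{w}\), \(\hat{\boldsymbol{\mu}}\), and \(\kappa\):

```r
set.seed(1)
N <- 4
w <- c(0.4, 0.3, 0.2, 0.1)
mu_hat <- rnorm(N)
S <- crossprod(matrix(rnorm(N*N), N, N))  # arbitrary positive definite S
kappa <- 0.5

# symmetric square root of S (so that S12 %*% S12 = S and t(S12) = S12)
e <- eigen(S)
S12 <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)

# worst-case u from the Cauchy-Schwarz argument
u_star <- -as.vector(S12 %*% w) / sqrt(sum((S12 %*% w)^2))
mu_worst <- mu_hat + kappa * as.vector(S12 %*% u_star)

sum(w * mu_worst)                                   # attained worst-case value
sum(w * mu_hat) - kappa * sqrt(sum((S12 %*% w)^2))  # lower bound: identical
```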
Finally, the robust formulation becomes the SOCP \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\hat{\boldsymbol{\mu}}-\kappa\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
Recall the vanilla problem formulation was the LP \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\hat{\boldsymbol{\mu}}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]
So, we have gone from an LP to an SOCP.
In general, when a problem is robustified its complexity increases. For example, robustifying an LP typically yields an SOCP, and robustifying a QP yields an SOCP or an SDP.
The robust formulation of the global maximum return portfolio (GMRP) reads \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}}{\min} \mathbf{w}^T\boldsymbol{\mu}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
For an elliptical uncertainty set \[ \mathcal{U}_\boldsymbol{\mu}^e = \{\boldsymbol{\mu}=\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u}\ | \ \left\Vert \mathbf{u}\right\Vert_{2}\leq 1\}, \] where typically \(\mathbf{S}=\hat{\mathbf{\Sigma}}\), the robust formulation becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^T\hat{\boldsymbol{\mu}} - \kappa\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
For a box uncertainty set \[ \mathcal{U}_\boldsymbol{\mu}^b = \{\boldsymbol{\mu}\ | -\boldsymbol{\delta}\leq\boldsymbol{\mu}-\hat{\boldsymbol{\mu}}\leq\boldsymbol{\delta}\}, \] where typically \(\delta_i = \kappa\hat{\sigma}_i\), the robust formulation becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^T\hat{\boldsymbol{\mu}} - |\mathbf{w}|^T\boldsymbol{\delta}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
Let’s start implementing the robust solution for the box constraint:
portfolioMaxReturnRobustBox <- function(mu_hat, delta) {
  w <- Variable(length(mu_hat))
  prob <- Problem(Maximize(t(w) %*% mu_hat - t(abs(w)) %*% delta),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMRP <- portfolioMaxReturn(mu)
names(w_GMRP) <- colnames(X)
w_all_GMRP_robust_box <- cbind(w_GMRP)

# multiple robust solutions
kappa <- 0.1
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  delta <- kappa*sqrt(diag(Sigma_noisy))  # box half-widths from the noisy estimate
  w_GMRP_robust_box_noisy <- portfolioMaxReturnRobustBox(mu_noisy, delta)
  w_all_GMRP_robust_box <- cbind(w_all_GMRP_robust_box, w_GMRP_robust_box_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMRP_robust_box), col = rainbow8equal[1:7],
        legend = colnames(w_all_GMRP_robust_box), beside = TRUE,
        args.legend = list(bg = "white"),
        main = "Robust (box) global maximum return portfolio allocation",
        xlab = "stocks", ylab = "dollars")
Not much improvement with respect to the naive case. The sensitivity is still there. In fact, for the particular case of \(\boldsymbol{\delta}=\mathbf{1}\), it follows that \(|\mathbf{w}|^T\boldsymbol{\delta}=\mathbf{w}^T\mathbf{1}=1\)!
Let’s try the robust solution for the elliptical constraint:
portfolioMaxReturnRobustEllipsoid <- function(mu_hat, S, kappa = 0.1) {
  S12 <- chol(S)  # t(S12) %*% S12 = S
  w <- Variable(length(mu_hat))
  prob <- Problem(Maximize(t(w) %*% mu_hat - kappa*norm2(S12 %*% w)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMRP <- portfolioMaxReturn(mu)
names(w_GMRP) <- colnames(X)
w_all_GMRP_robust_ellipsoid <- cbind(w_GMRP)

# multiple robust solutions
kappa <- 0.2
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  w_GMRP_robust_ellipsoid_noisy <- portfolioMaxReturnRobustEllipsoid(mu_noisy, Sigma_noisy, kappa)
  w_all_GMRP_robust_ellipsoid <- cbind(w_all_GMRP_robust_ellipsoid, w_GMRP_robust_ellipsoid_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMRP_robust_ellipsoid), col = rainbow8equal[1:7],
        legend = colnames(w_all_GMRP_robust_ellipsoid), beside = TRUE,
        args.legend = list(bg = "white"),
        main = "Robust (ellipsoid) global maximum return portfolio allocation",
        xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_GMRP_robust_ellipsoid), main = "Robust (ellipsoid) global maximum return portfolio allocation", ylab = "w", space = 0, border = NA)
The elliptical uncertainty set seems to work better for the maximum return portfolio. It definitely provides more diversity compared to the naive solution, as well as being more stable (less sensitive). However, it is still too sensitive.
Summarizing, for the robust global maximum return portfolio, the ellipsoid uncertainty set in \(\boldsymbol{\mu}\) seems more stable (and diverse) than the box uncertainty (which is totally unstable). Nevertheless, the allocation is still too sensitive, owing to the fact that the global maximum return portfolio is probably not a good formulation to start with, as it ignores the covariance matrix.
The global minimum variance portfolio (GMVP) ignores the expected return and focuses on the risk only: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]
It is a simple convex QP with solution \[\mathbf{w}_{\sf GMVP}=\frac{1}{\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{1}}\boldsymbol{\Sigma}^{-1}\mathbf{1}.\]
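A quick sketch of this closed form on a toy \(2\times2\) covariance matrix (the numbers are arbitrary); the less volatile asset receives the larger weight:

```r
# closed-form GMVP: w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1)
gmvpClosedForm <- function(Sigma) {
  w <- solve(Sigma, rep(1, nrow(Sigma)))  # Sigma^{-1} 1
  w / sum(w)                              # normalize to meet the budget constraint
}

Sigma_toy <- matrix(c(0.10, 0.02,
                      0.02, 0.05), 2, 2)
w <- gmvpClosedForm(Sigma_toy)
w        # more weight on the lower-variance (second) asset
sum(w)   # = 1
```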
It is widely used in academic papers for simplicity of evaluation and comparison of different estimators of the covariance matrix \(\boldsymbol{\Sigma}\) (while ignoring the estimation of \(\boldsymbol{\mu}\)).
In practice, \(\boldsymbol{\Sigma}\) is unknown and has to be estimated \(\hat{\boldsymbol{\Sigma}}\), e.g., with the sample covariance matrix.
Then, the naive portfolio becomes \[\mathbf{w}_{\sf GMVP}=\frac{1}{\mathbf{1}^{T}\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{1}}\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{1}.\]
Instead of assuming that \(\boldsymbol{\Sigma}\) is known perfectly, we now assume it belongs to some convex uncertainty set, denoted by \(\mathcal{U}_{\boldsymbol{\Sigma}}\).
The worst-case robust formulation is \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \underset{\boldsymbol{\Sigma}\in\mathcal{U}_{\boldsymbol{\Sigma}}}{\max}\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
In particular, we will assume that the estimation comes from the sample covariance matrix \(\hat{\boldsymbol{\Sigma}}=\frac{1}{T}\mathbf{X}^{T}\mathbf{X}\) where \(\mathbf{X}\) is a \(T\times N\) matrix containing the return data (assumed demeaned already).
However, we will assume that the data matrix is noisy \(\hat{\mathbf{X}}\) and the actual matrix can be written as \(\mathbf{X}=\hat{\mathbf{X}}+\boldsymbol{\Delta}\), where \(\boldsymbol{\Delta}\) is some error matrix bounded in its norm.
Thus, we will then model the data matrix as \[\mathcal{U}_{\mathbf{X}} =\left\{\mathbf{X}\mid\left\Vert\mathbf{X}-\hat{\mathbf{X}}\right\Vert_{F}\leq\delta_{\mathbf{X}}\right\}.\]
The worst-case robust formulation becomes: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \underset{\mathbf{X}\in\mathcal{U}_{\mathbf{X}}}{\max}\mathbf{w}^{T}\frac{1}{T}\mathbf{X}^{T}\mathbf{X}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
Let’s focus on the inner maximization: \[\underset{\mathbf{X}\in\mathcal{U}_{\mathbf{X}}}{\max}\;\mathbf{w}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{w}=\underset{\left\Vert \boldsymbol{\Delta}\right\Vert _{F}\leq\delta_{\mathbf{X}}}{\max}\left\Vert\left(\hat{\mathbf{X}}+\boldsymbol{\Delta}\right)\mathbf{w}\right\Vert_{2}^{2}\]
We first invoke the triangle inequality to get an upper bound: \[\left\Vert\left(\hat{\mathbf{X}}+\boldsymbol{\Delta}\right)\mathbf{w}\right\Vert_{2}\leq\left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\left\Vert \boldsymbol{\Delta}\mathbf{w}\right\Vert _{2}\] with equality achieved when the two vectors \(\hat{\mathbf{X}}\mathbf{w}\) and \(\boldsymbol{\Delta}\mathbf{w}\) are aligned.
Next, we invoke the norm inequality \[\left\Vert \boldsymbol{\Delta}\mathbf{w}\right\Vert _{2}\leq\left\Vert \boldsymbol{\Delta}\right\Vert _{F}\left\Vert \mathbf{w}\right\Vert _{2}\leq\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\] with equality achieved when \(\boldsymbol{\Delta}\) is rank-one with right singular vector aligned with \(\mathbf{w}\) and when \(\left\Vert \boldsymbol{\Delta}\right\Vert _{F}=\delta_{\mathbf{X}}\). (This follows easily from \(\mathbf{w}^{T}\mathbf{M}\mathbf{w}\leq\lambda_{\max}\left(\mathbf{M}\right)\left\Vert \mathbf{w}\right\Vert ^{2}\leq\mathsf{Tr}\left(\mathbf{M}\right)\left\Vert \mathbf{w}\right\Vert ^{2}\) for \(\mathbf{M}\succeq\mathbf{0}\).)
Finally, we can see that both upper bounds can be actually achieved if the error is properly chosen as \[\boldsymbol{\Delta}=\delta_{\mathbf{X}}\frac{\hat{\mathbf{X}}\mathbf{w}\mathbf{w}^{T}}{\left\Vert \mathbf{w}\right\Vert _{2}\left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}}.\]
Thus, \[\underset{\mathbf{X}\in\mathcal{U}_{\mathbf{X}}}{\max}\;\mathbf{w}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{w}=\left(\left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\right)^{2}.\]
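The claimed worst case can be checked numerically: with \(\boldsymbol{\Delta}\) chosen as above, its Frobenius norm equals \(\delta_{\mathbf{X}}\) and the perturbed variance attains the bound exactly. A sketch on random toy data (`X_hat`, `w`, and `delta_X` are arbitrary):

```r
set.seed(7)
T_obs <- 50
N <- 3
X_hat <- matrix(rnorm(T_obs*N), T_obs, N)  # arbitrary (demeaned) data matrix
w <- c(0.5, 0.3, 0.2)
delta_X <- 0.8

# worst-case perturbation: Delta = delta_X * X_hat w w^T / (||w|| ||X_hat w||)
Xw <- as.vector(X_hat %*% w)
Delta <- delta_X * (Xw %*% t(w)) / (sqrt(sum(w^2)) * sqrt(sum(Xw^2)))

sum(((X_hat + Delta) %*% w)^2)                  # attained worst-case value
(sqrt(sum(Xw^2)) + delta_X * sqrt(sum(w^2)))^2  # upper bound: identical
```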
The robust problem formulation finally becomes: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] which is a (convex) SOCP.
Recall the vanilla problem formulation was the QP \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}^{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\]
Now, the robust problem formulation is the SOCP (from QP to SOCP) \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] which contains the regularization term \(\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\).
One common heuristic, called Tikhonov regularization, is to consider instead \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}^{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}^{2}=\mathbf{w}^{T}\left(\hat{\mathbf{X}}^{T}\hat{\mathbf{X}}+\delta_{\mathbf{X}}\mathbf{I}\right)\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] which is equivalent to the vanilla formulation but using the regularized sample covariance matrix \(\hat{\boldsymbol{\Sigma}}^{\mathsf{tik}}=\frac{1}{T}(\hat{\mathbf{X}}^{T}\hat{\mathbf{X}}+\delta_{\mathbf{X}}\mathbf{I})=\hat{\boldsymbol{\Sigma}}+\frac{\delta_{\mathbf{X}}}{T}\mathbf{I}\).
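The algebraic identity behind this equivalence is easy to confirm on random data (the sizes and \(\delta_{\mathbf{X}}\) below are arbitrary):

```r
set.seed(3)
T_obs <- 20
N <- 4
X_hat <- matrix(rnorm(T_obs*N), T_obs, N)
w <- rnorm(N)
delta_X <- 0.5

# ||X_hat w||^2 + delta_X ||w||^2  ==  w^T (X_hat^T X_hat + delta_X I) w
lhs <- sum((X_hat %*% w)^2) + delta_X * sum(w^2)
rhs <- as.numeric(t(w) %*% (crossprod(X_hat) + delta_X*diag(N)) %*% w)
all.equal(lhs, rhs)  # TRUE
```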
The robust formulation of the GMVP reads \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \underset{\boldsymbol{\Sigma}\in\mathcal{U}_{\boldsymbol{\Sigma}}}{\max} \mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
For a spherical uncertainty set on the data matrix \(\mathbf{X}\) (assuming we use the sample covariance matrix \(\hat{\boldsymbol{\Sigma}}=\frac{1}{T-1}\mathbf{X}^T\mathbf{X}\)) \[ \mathcal{U}_\mathbf{X}^s = \{\mathbf{X}\ | \ \|\mathbf{X}-\hat{\mathbf{X}}\|_F \le \delta\}, \] the robust formulation becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \|\hat{\mathbf{X}}\mathbf{w}\|_2 + \delta \|\mathbf{w}\|_2\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
For a box uncertainty set on the covariance matrix \[ \mathcal{U}_\boldsymbol{\Sigma}^b = \{\boldsymbol{\Sigma}\ | \ \underline{\boldsymbol{\Sigma}} \le \boldsymbol{\Sigma} \le \overline{\boldsymbol{\Sigma}},\ \boldsymbol{\Sigma} \succeq \mathbf{0}\}, \] where a choice for the bounds is \(\underline{\boldsymbol{\Sigma}}=\delta^{-1}\hat{\boldsymbol{\Sigma}}\) and \(\overline{\boldsymbol{\Sigma}}=\delta\hat{\boldsymbol{\Sigma}}\) with \(\delta>1\), the robust formulation becomes
\[ \begin{array}{ll} \underset{\mathbf{w},\underline{\boldsymbol{\Lambda}},\overline{\boldsymbol{\Lambda}}}{\textsf{minimize}} & \textsf{Tr}(\overline{\boldsymbol{\Lambda}}\ \overline{\boldsymbol{\Sigma}}) - \textsf{Tr}(\underline{\boldsymbol{\Lambda}}\ \underline{\boldsymbol{\Sigma}})\\ \textsf{subject to} & \left[\begin{array}{cc}\overline{\boldsymbol{\Lambda}}-\underline{\boldsymbol{\Lambda}} & \mathbf{w}\\ \mathbf{w}^{T} & 1\end{array} \right]\succeq\mathbf{0}\\ & \overline{\boldsymbol{\Lambda}}\ge\mathbf{0},\ \underline{\boldsymbol{\Lambda}}\ge\mathbf{0}\\ & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
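For context, this SDP arises from the Lagrangian dual of the inner maximization: \(\overline{\boldsymbol{\Lambda}}\) and \(\underline{\boldsymbol{\Lambda}}\) are the multipliers of the elementwise upper and lower bounds and, assuming strong duality holds (a standard constraint qualification suffices), \[ \underset{\boldsymbol{\Sigma}\in\mathcal{U}_{\boldsymbol{\Sigma}}^b}{\max}\ \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} = \underset{\overline{\boldsymbol{\Lambda}},\underline{\boldsymbol{\Lambda}}\ge\mathbf{0}}{\min}\left\{\textsf{Tr}(\overline{\boldsymbol{\Lambda}}\,\overline{\boldsymbol{\Sigma}})-\textsf{Tr}(\underline{\boldsymbol{\Lambda}}\,\underline{\boldsymbol{\Sigma}})\ :\ \overline{\boldsymbol{\Lambda}}-\underline{\boldsymbol{\Lambda}}\succeq\mathbf{w}\mathbf{w}^T\right\}. \] The Schur complement then converts the quadratic matrix inequality \(\overline{\boldsymbol{\Lambda}}-\underline{\boldsymbol{\Lambda}}\succeq\mathbf{w}\mathbf{w}^T\) into the linear matrix inequality above.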
Let’s start implementing the robust formulation for the box constraint on \(\boldsymbol{\Sigma}\):
# define function (check: https://cvxr.rbind.io/cvxr_functions/)
portfolioGMVPRobustBox <- function(Sigma_lb, Sigma_ub) {
  N <- nrow(Sigma_lb)
  w <- Variable(N)
  Lambda_ub <- Variable(N, N)
  Lambda_lb <- Variable(N, N)
  Tmp <- Variable(N+1, N+1, PSD = TRUE)  # auxiliary PSD variable to impose the LMI
  prob <- Problem(Minimize(matrix_trace(Lambda_ub %*% Sigma_ub) -
                           matrix_trace(Lambda_lb %*% Sigma_lb)),
                  constraints = list(w >= 0, sum(w) == 1,
                                     Lambda_ub >= 0, Lambda_lb >= 0,
                                     bmat(list(list(Lambda_ub - Lambda_lb, w),
                                               list(t(w), 1))) == Tmp))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMVP <- portfolioGMVP(Sigma)
names(w_GMVP) <- colnames(X)
w_all_GMVP_robust_box <- cbind(w_GMVP)

# multiple robust solutions
delta <- 1.1
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  # elementwise bounds Sigma_lb <= Sigma <= Sigma_ub (mind the sign of each entry)
  Sigma_lb <- ifelse(Sigma_noisy >= 0, (1/delta)*Sigma_noisy, delta*Sigma_noisy)
  Sigma_ub <- ifelse(Sigma_noisy >= 0, delta*Sigma_noisy, (1/delta)*Sigma_noisy)
  w_GMVP_robust_box_noisy <- portfolioGMVPRobustBox(Sigma_lb, Sigma_ub)
  w_all_GMVP_robust_box <- cbind(w_all_GMVP_robust_box, w_GMVP_robust_box_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMVP_robust_box), col = rainbow8equal[1:7],
        legend = colnames(w_all_GMVP_robust_box), beside = TRUE,
        args.legend = list(bg = "white", x = "topleft"),
        main = "Robust (box) GMVP allocation", xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_GMVP), main = "Naive GMVP allocation", ylab = "w", space = 0, border = NA)
chart.StackedBar(t(w_all_GMVP_robust_box), main = "Robust (box) GMVP allocation", ylab = "w", space = 0, border = NA)
The naive GMVP was already fairly stable. Nevertheless, the robust GMVP becomes even more stable (less sensitive to estimation errors). Better results could be obtained by choosing tighter, more appropriate upper and lower bounds on \(\boldsymbol{\Sigma}\).
Let’s consider now the robust formulation for the spherical uncertainty region on the data matrix \(\mathbf{X}\):
# define function
portfolioGMVPRobustSphereX <- function(X_hat, delta) {
  N <- ncol(X_hat)
  w <- Variable(N)
  X_ <- scale(X_hat, center = TRUE, scale = FALSE)  # demean
  prob <- Problem(Minimize(norm2(X_ %*% w) + delta*norm2(w)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMVP <- portfolioGMVP(Sigma)
names(w_GMVP) <- colnames(X)
w_all_GMVP_robust_sphereX <- cbind(w_GMVP)

# multiple robust solutions
delta <- 0.1
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  w_GMVP_robust_sphereX_noisy <- portfolioGMVPRobustSphereX(X_noisy, delta)
  w_all_GMVP_robust_sphereX <- cbind(w_all_GMVP_robust_sphereX, w_GMVP_robust_sphereX_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMVP_robust_sphereX), col = rainbow8equal[1:7],
        legend = colnames(w_all_GMVP_robust_sphereX), beside = TRUE,
        args.legend = list(bg = "white", x = "topleft"),
        main = "Robust (sphere in X) GMVP allocation", xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_GMVP), main = "Naive GMVP allocation", ylab = "w", space = 0, border = NA)
chart.StackedBar(t(w_all_GMVP_robust_sphereX), main = "Robust (sphere in X) GMVP allocation", ylab = "w", space = 0, border = NA)