Robust Optimization
Markowitz Portfolio Optimization
Robust Maximum Return Portfolio Optimization
Robust Minimum Variance Portfolio Optimization
Robust Markowitz Portfolio Optimization
Robust Sharpe Ratio Portfolio Optimization
Summary
A convex optimization problem is written as \[ \begin{array}{ll} \underset{\mathbf{x}}{\textsf{minimize}} & f_{0}\left(\mathbf{x}\right)\\ \textsf{subject to} & f_{i}\left(\mathbf{x}\right)\leq0\qquad i=1,\ldots,m\\ & \mathbf{h}\left(\mathbf{x}\right)=\mathbf{A}\mathbf{x}-\mathbf{b}=0 \end{array} \] where \(f_{0},\,f_{1},\ldots,f_{m}\) are convex and equality constraints are affine.
Convex problems enjoy a rich theory (KKT conditions, zero duality gap, etc.) as well as a large number of efficient numerical algorithms guaranteed to deliver an optimal solution \(\mathbf{x}^{\star}\).
Many off-the-shelf solvers exist in most programming languages (e.g., R, Python, Matlab, Julia, C), tailored to specific classes of problems, such as LP, QP, QCQP, SOCP, SDP, and GP.
However, the obtained optimal solution \(\mathbf{x}^{\star}\) typically performs very poorly in practice.
In many cases, it can be totally useless!
Why is that?
Recall that a problem formulation contains not only the optimization variables \(\mathbf{x}\) but also the parameters \(\boldsymbol{\theta}\).
Such parameters define the problem instance and are typically estimated in practice, i.e., they are not exact: \(\hat{\boldsymbol{\theta}}\neq\boldsymbol{\theta}\) but hopefully close \(\hat{\boldsymbol{\theta}}\simeq\boldsymbol{\theta}\).
The question is whether a small error in the parameters is going to be detrimental or can be ignored. That depends on each particular type of problem.
In the case of portfolio optimization, small errors in the parameters \(\boldsymbol{\theta}=\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right)\) happen to have a huge effect on the solution \(\mathbf{x}^{\star}\), to the point that most practitioners avoid the use of portfolio optimization!
For example, consider an LP: \[ \begin{array}{ll} \underset{\mathbf{x}}{\textsf{minimize}} & \mathbf{c}^{T}\mathbf{x}+\mathbf{d}\\ \textsf{subject to} & \mathbf{A}\mathbf{x}=\mathbf{b}. \end{array} \]
The parameters are \(\boldsymbol{\theta}=\left(\mathbf{A},\mathbf{b},\mathbf{c},\mathbf{d}\right)\).
The objective function is \(f_{0}\left(\mathbf{x};\boldsymbol{\theta}\right)=\mathbf{c}^{T}\mathbf{x}+\mathbf{d}\)
The constraint function is \(\mathbf{h}\left(\mathbf{x};\boldsymbol{\theta}\right)=\mathbf{A}\mathbf{x}-\mathbf{b}\)
The naive approach is to pretend that \(\hat{\boldsymbol{\theta}}\) is close enough to \(\boldsymbol{\theta}\) and solve the approximated problem, obtaining \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\).
For some type of problems, it may be that \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\approx\mathbf{x}^{\star}\left(\boldsymbol{\theta}\right)\) and that’s it.
For many other problems, however, that’s not the case. So we cannot really rely on the naive solution \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\).
The solution is to consider instead a robust formulation that takes into account the fact that we know we only have an estimation of the parameters.
There are several ways to make the problem robust to parameter errors, mainly:
In stochastic robust optimization, one models the estimation \(\hat{\boldsymbol{\theta}}\) as a random variable that fluctuates around its true value \(\boldsymbol{\theta}\).
Then, instead of considering the approximated function \(f(\mathbf{x};\hat{\boldsymbol{\theta}})\), it uses its expected value \(E_{\boldsymbol{\theta}}[f(\mathbf{x};\boldsymbol{\theta})]\), where \(E_{\boldsymbol{\theta}}[\cdot]\) denotes expectation over the random variable \(\boldsymbol{\theta}\).
The random variable is typically modeled around the estimated value as \(\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}+\boldsymbol{\delta}\) with \(\boldsymbol{\delta}\) following a zero-mean distribution such as Gaussian.
For example, if the function is quadratic, say, \(f(\mathbf{x};\hat{\boldsymbol{\theta}})=(\hat{\mathbf{c}}^{T}\mathbf{x})^{2}\), and we model the parameter as \(\mathbf{c}=\hat{\mathbf{c}}+\boldsymbol{\delta}\) with \(\boldsymbol{\delta}\) zero-mean and covariance matrix \(\mathbf{Q}\), then the expected value is \[\begin{aligned} E_{\boldsymbol{\theta}}[f(\mathbf{x};\boldsymbol{\theta})] &= E_{\boldsymbol{\delta}}[((\hat{\mathbf{c}}+\boldsymbol{\delta})^{T}\mathbf{x})^{2}]\\ &= E_{\boldsymbol{\delta}}[\mathbf{x}^{T}\hat{\mathbf{c}}\hat{\mathbf{c}}^{T}\mathbf{x}+\mathbf{x}^{T}\boldsymbol{\delta}\boldsymbol{\delta}^{T}\mathbf{x}]\\ &= (\hat{\mathbf{c}}^{T}\mathbf{x})^{2}+\mathbf{x}^{T}\mathbf{Q}\mathbf{x} \end{aligned}\] where the additional term \(\mathbf{x}^{T}\mathbf{Q}\mathbf{x}\) serves as a regularizer.
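This identity is easy to verify numerically. Below is a minimal Monte Carlo sanity check; the vectors `c_hat`, `x`, and the covariance matrix `Q` are arbitrary toy numbers chosen for illustration:

```r
library(mvtnorm)  # for sampling the Gaussian error delta

set.seed(42)
c_hat <- c(1, -0.5, 2)       # estimated parameter (arbitrary)
Q <- diag(c(0.1, 0.2, 0.3))  # covariance of the error delta (arbitrary)
x <- c(0.2, 0.5, 0.3)        # a fixed point where we evaluate the function

# Monte Carlo estimate of E[((c_hat + delta)^T x)^2]
delta <- rmvnorm(n = 1e5, mean = rep(0, 3), sigma = Q)
mc_value <- mean((sum(c_hat * x) + as.vector(delta %*% x))^2)

# closed-form value: (c_hat^T x)^2 + x^T Q x
cf_value <- sum(c_hat * x)^2 + as.numeric(t(x) %*% Q %*% x)

mc_value  # approximately equal to cf_value
```

The Monte Carlo average approaches the closed-form value as the number of samples grows, confirming the regularizer term \(\mathbf{x}^{T}\mathbf{Q}\mathbf{x}\).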
Take the previous quadratic function \[f(\mathbf{x};\hat{\boldsymbol{\theta}})=(\hat{\mathbf{c}}^{T}\mathbf{x})^{2}\] and consider a spherical uncertainty region \[\mathcal{U}=\{\mathbf{c}\mid\left\Vert\mathbf{c}-\hat{\mathbf{c}}\right\Vert_{2}\leq\delta\}.\] The worst-case value over this region follows from the Cauchy-Schwarz inequality: \[\underset{\mathbf{c}\in\mathcal{U}}{\max}\;(\mathbf{c}^{T}\mathbf{x})^{2}=\left(|\hat{\mathbf{c}}^{T}\mathbf{x}|+\delta\left\Vert\mathbf{x}\right\Vert_{2}\right)^{2},\] achieved with \(\mathbf{c}=\hat{\mathbf{c}}\pm\delta\mathbf{x}/\left\Vert\mathbf{x}\right\Vert_{2}\), the sign chosen to match that of \(\hat{\mathbf{c}}^{T}\mathbf{x}\).
The problem with expectations is that only the average behavior is controlled, while nothing constrains the realizations worse than the average. For example, a constraint may be satisfied on average yet violated for many realizations.
The problem with worst-case programming is that it is too conservative as one deals with the worst possible case.
Chance programming tries to find a compromise. In particular, it also models the estimation errors statistically but instead of focusing on the average it guarantees a performance for, say, 95% of the cases.
The naive constraint \(f(\mathbf{x};\hat{\boldsymbol{\theta}})\leq0\) is replaced with \(\mathsf{Pr}_{\boldsymbol{\theta}}\left[f(\mathbf{x};\boldsymbol{\theta})\leq0\right]\geq1-\epsilon=0.95\) with \(\epsilon=0.05\).
Chance or probabilistic constraints are generally very hard to deal with and one typically has to resort to approximations.
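One well-known case where the chance constraint is tractable is a linear function with Gaussian uncertainty: if \(\mathbf{c}\sim\mathcal{N}(\hat{\mathbf{c}},\mathbf{Q})\), then \(\mathsf{Pr}\left[\mathbf{c}^{T}\mathbf{x}\leq0\right]\geq1-\epsilon\) is exactly equivalent to the second-order cone constraint \(\hat{\mathbf{c}}^{T}\mathbf{x}+\Phi^{-1}(1-\epsilon)\left\Vert\mathbf{Q}^{1/2}\mathbf{x}\right\Vert_{2}\leq0\). A quick numerical check with made-up numbers (`c_hat`, `Q`, and `x` are arbitrary):

```r
library(mvtnorm)  # for sampling the Gaussian parameter

eps <- 0.05
c_hat <- c(-1, -0.5)      # Gaussian mean (arbitrary)
Q <- diag(c(0.04, 0.09))  # Gaussian covariance (arbitrary)
x <- c(0.6, 0.4)          # candidate point

# SOC form of the chance constraint Pr[c^T x <= 0] >= 1 - eps
soc_lhs <- sum(c_hat * x) + qnorm(1 - eps) * sqrt(as.numeric(t(x) %*% Q %*% x))
soc_lhs <= 0  # TRUE: the chance constraint holds at x

# Monte Carlo confirmation of the probability
set.seed(123)
c_samples <- rmvnorm(n = 1e5, mean = c_hat, sigma = Q)
mean(c_samples %*% x <= 0)  # well above 1 - eps = 0.95
```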
The idea of the Markowitz mean-variance portfolio (MVP) (Markowitz 1952) is to find a trade-off between the expected return \(\mathbf{w}^{T}\boldsymbol{\mu}\) and the risk of the portfolio measured by the variance \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}-\lambda\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1 \end{array}\] where \(\mathbf{w}^{T}\mathbf{1}=1\) is the capital budget constraint and \(\lambda\) is a parameter that controls how risk-averse the investor is.
This is a convex quadratic problem (QP) with only one linear constraint which admits a closed-form solution: \[\mathbf{w}_{\sf MVP} = \frac{1}{2\lambda}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{\mu}+\nu\mathbf{1}\right),\] where \(\nu\) is the optimal dual variable \(\nu=\frac{2\lambda-\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{1}}\).
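As a sanity check of this closed form, here is a small sketch with a toy diagonal covariance matrix (the numbers `mu_toy` and `Sigma_toy` are arbitrary); by construction the resulting weights satisfy the budget constraint:

```r
# closed-form Markowitz mean-variance portfolio: w = (1/2lmd) Sigma^{-1} (mu + nu 1)
mvpClosedForm <- function(mu, Sigma, lmd = 0.5) {
  Sigma_inv <- solve(Sigma)
  ones <- rep(1, length(mu))
  # optimal dual variable: nu = (2 lmd - 1^T Sigma^{-1} mu) / (1^T Sigma^{-1} 1)
  nu <- (2*lmd - sum(ones * (Sigma_inv %*% mu))) / sum(ones * (Sigma_inv %*% ones))
  as.vector((1/(2*lmd)) * Sigma_inv %*% (mu + nu*ones))
}

mu_toy <- c(0.05, 0.03, 0.04)
Sigma_toy <- diag(c(0.10, 0.05, 0.08))
w <- mvpClosedForm(mu_toy, Sigma_toy, lmd = 0.5)
sum(w)  # = 1, the capital budget constraint
```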
Capital budget constraint: \[\mathbf{1}^T\mathbf{w} = 1.\]
Long-only constraint: \[\mathbf{w} \geq 0.\]
Dollar-neutral or self-financing constraint: \[\mathbf{1}^T\mathbf{w} = 0.\]
Holding constraint: \[\mathbf{l}\leq\mathbf{w}\leq \mathbf{u}\] where \(\mathbf{l}\in\mathbb{R}^{N}\) and \(\mathbf{u}\in\mathbb{R}^{N}\) are lower and upper bounds of the asset positions, respectively.
Leverage constraint: \[\left\Vert \mathbf{w}\right\Vert _{1}\leq L.\]
Cardinality constraint: \[\left\Vert \mathbf{w}\right\Vert _{0} \leq K.\]
Turnover constraint: \[\left\Vert \mathbf{w}-\mathbf{w}_{0}\right\Vert _{1} \leq u\] where \(\mathbf{w}_{0}\) is the currently held portfolio.
Markowitz’s portfolio has never been fully embraced by practitioners, among other reasons because
variance is not a good measure of risk in practice since it penalizes both the unwanted losses and the desired gains: the solution is to use alternative risk measures, e.g., VaR and CVaR,
it is highly sensitive to parameter estimation errors (of the covariance matrix \(\boldsymbol{\Sigma}\) and especially of the mean vector \(\boldsymbol{\mu}\)): the solution is robust optimization (Fabozzi 2007) and improved parameter estimation,
it only considers the risk of the portfolio as a whole and ignores risk diversification (i.e., it concentrates risk too much in a few assets, as observed in the 2008 financial crisis): the solution is the risk parity portfolio.
👉 We will here consider robust portfolio optimization against estimation errors of the parameters \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\).
Efficient frontier: the mean-variance trade-off curve (Pareto curve), which is not achieved in practice due to parameter estimation errors:
We will load some stock market data and divide it into a training part (for the estimation of the expected return \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\), and subsequent portfolio design) and a test part (for the out-of-sample performance evaluation).
In particular, we will start by loading some stock data from three different sectors:
library(xts)                  # to manipulate time series of stock data
library(quantmod)             # to download stock data
library(PerformanceAnalytics) # to compute performance measures

# download data from YahooFinance
stock_namelist <- c("AAPL", "AMD", "ADI", "ABBV", "AEZS", "A", "APD", "AA", "CF")
prices <- xts()
for (i in 1:length(stock_namelist)) {
  tmp <- Ad(getSymbols(stock_namelist[i], from = "2013-01-01", to = "2016-12-31",
                       auto.assign = FALSE))
  tmp <- na.approx(tmp, na.rm = FALSE)  # interpolate NAs
  prices <- cbind(prices, tmp)
}
colnames(prices) <- stock_namelist
indexClass(prices) <- "Date"
str(prices)
R>> An 'xts' object on 2013-01-02/2016-12-30 containing:
R>>   Data: num [1:1008, 1:9] 55.5 54.8 53.2 52.9 53.1 ...
R>>  - attr(*, "dimnames")=List of 2
R>>   ..$ : NULL
R>>   ..$ : chr [1:9] "AAPL" "AMD" "ADI" "ABBV" ...
R>>   Indexed by objects of class: [Date] TZ: UTC
R>>   xts Attributes:
R>> NULL
head(prices)
R>>                AAPL  AMD      ADI     ABBV AEZS        A      APD       AA       CF
R>> 2013-01-02 55.47129 2.53 37.50844 27.60817  253 27.97831 66.43760 20.62187 29.44429
R>> 2013-01-03 54.77112 2.49 36.90318 27.38020  254 28.07852 66.20545 20.80537 29.30506
R>> 2013-01-04 53.24548 2.59 36.24679 27.03431  257 28.63300 67.09529 21.24121 29.96146
R>> 2013-01-07 52.93227 2.67 36.35761 27.08934  259 28.42591 67.03341 20.87419 29.84922
R>> 2013-01-08 53.07474 2.67 35.98254 26.49976  255 28.19877 67.15720 20.87419 29.41162
R>> 2013-01-09 52.24524 2.63 35.88875 26.64913  258 28.96036 68.06254 20.82831 30.44025
tail(prices)
R>>                AAPL   AMD      ADI     ABBV AEZS        A      APD    AA       CF
R>> 2016-12-22 111.8445 11.60 70.07727 55.82264 4.10 44.91877 136.0360 29.75 27.82244
R>> 2016-12-23 112.0657 11.58 70.44936 56.43826 4.10 45.14350 136.3992 29.71 28.37686
R>> 2016-12-27 112.7774 12.07 70.89778 56.58311 4.05 45.44642 137.2373 29.65 29.52265
R>> 2016-12-28 112.2965 11.55 70.18222 56.37488 3.55 44.67448 135.1328 29.43 29.22696
R>> 2016-12-29 112.2676 11.59 70.20129 56.79134 3.60 44.72544 135.2359 28.89 29.47645
R>> 2016-12-30 111.3924 11.34 69.28539 56.69175 3.60 44.64704 134.7206 28.08 29.08836
# compute data matrix with log-returns
X <- diff(log(prices))[-1]
N <- ncol(X)  # number of stocks
T <- nrow(X)  # number of days
We can take a look at the prices of the stocks:
plot(prices/rep(prices[1, ], each = nrow(prices)), col = rainbow10equal,
     legend.loc = "topleft", main = "Normalized prices")
We can now obtain the sample estimates from the returns \(\mathbf{x}_t\) (i.e., the sample mean and sample covariance matrix) as \[ \begin{align} \hat{\boldsymbol{\mu}} & = \frac{1}{T}\sum_{t=1}^T \mathbf{x}_t\\ \hat{\boldsymbol{\Sigma}} & = \frac{1}{T-1}\sum_{t=1}^T (\mathbf{x}_t - \hat{\boldsymbol{\mu}})(\mathbf{x}_t - \hat{\boldsymbol{\mu}})^T \end{align} \]
mu <- colMeans(X)
Sigma <- cov(X)
Now that we have some realistic parameters \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\), we will design a portfolio with some small noise in the parameters to observe the effect.
Let’s start with the global maximum return portfolio (GMRP) formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \boldsymbol{\mu}^T\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \] Of course the optimal solution is to allocate all the budget on the single stock with largest expected return, not very diversified. We can still implement it using a solver:
library(CVXR)

portfolioMaxReturn <- function(mu) {
  w <- Variable(length(mu))
  prob <- Problem(Maximize(t(w) %*% mu),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

w_GMRP <- portfolioMaxReturn(mu)
names(w_GMRP) <- colnames(X)
w_all_GMRP <- cbind(w_GMRP)

# plot allocation
barplot(t(w_all_GMRP), col = rainbow8equal[1], legend = colnames(w_all_GMRP),
        beside = TRUE, main = "Global maximum return portfolio allocation",
        xlab = "stocks", ylab = "dollars")
As expected, all the budget is allocated to a single stock.
Let’s now introduce some estimation error in \(\boldsymbol{\mu}\) to see the effect:
library(mvtnorm)

# generate a few noisy estimations of mu
set.seed(357)
for (i in 1:6) {
  mu_noisy <- as.vector(rmvnorm(n = 1, mean = mu, sigma = (1/T)*Sigma))
  w_GMRP_noisy <- portfolioMaxReturn(mu_noisy)
  w_all_GMRP <- cbind(w_all_GMRP, w_GMRP_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMRP), col = rainbow8equal[1:7], legend = colnames(w_all_GMRP),
        beside = TRUE, main = "Global maximum return portfolio allocation",
        xlab = "stocks", ylab = "dollars")
We can see that the optimal allocation changes totally from realization to realization, which is highly undesirable (in fact, it is totally unacceptable).
Let’s continue with the GMVP formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0} \end{array} \]
portfolioGMVP <- function(Sigma) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Minimize(quad_form(w, Sigma)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

w_GMVP <- portfolioGMVP(Sigma)
names(w_GMVP) <- colnames(X)
w_all_GMVP <- cbind(w_GMVP)

# plot allocation
barplot(t(w_all_GMVP), col = rainbow8equal[1], legend = colnames(w_all_GMVP),
        beside = TRUE, main = "Global minimum variance portfolio allocation",
        xlab = "stocks", ylab = "dollars")
The GMVP is totally different from the maximum return portfolio. It’s quite the opposite: the budget is allocated to most of the stocks to diversify the risk.
Let’s now introduce some estimation error in \(\boldsymbol{\Sigma}\) to see the effect:
library(PerformanceAnalytics)

# generate a few noisy estimations of Sigma
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = rep(0, N), sigma = Sigma)
  Sigma_noisy <- cov(X_noisy)
  w_GMVP_noisy <- portfolioGMVP(Sigma_noisy)
  w_all_GMVP <- cbind(w_all_GMVP, w_GMVP_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMVP), col = rainbow8equal[1:7], legend = colnames(w_all_GMVP),
        beside = TRUE, main = "Global minimum variance portfolio allocation",
        xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_GMVP), main = "GMVP allocation", ylab = "w", space = 0, border = NA)
We can see that the optimal allocation still changes from realization to realization, although in this case it is not as extreme.
Finally, let’s take a look at Markowitz’s mean-variance portfolio formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \boldsymbol{\mu}^T\mathbf{w} -\lambda\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0} \end{array} \]
portfolioMarkowitz <- function(mu, Sigma, lmd = 0.5) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Maximize(t(mu) %*% w - lmd*quad_form(w, Sigma)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

w_Markowitz <- portfolioMarkowitz(mu, Sigma)
names(w_Markowitz) <- colnames(X)
w_all_Markowitz <- cbind(w_Markowitz)

# plot allocation
barplot(t(w_all_Markowitz), col = rainbow8equal[1], legend = colnames(w_all_Markowitz),
        beside = TRUE, main = "Markowitz portfolio allocation",
        xlab = "stocks", ylab = "dollars")
The mean-variance portfolio does not seem very diversified. This is one of the reasons why practitioners do not use it. The other reason is its sensitivity, i.e., lack of robustness, to errors in the estimated parameters, as we will explore next.
# generate a few noisy estimations of mu and Sigma
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  w_Markowitz_noisy <- portfolioMarkowitz(mu_noisy, Sigma_noisy)
  w_all_Markowitz <- cbind(w_all_Markowitz, w_Markowitz_noisy)
}

# plot to compare the allocations
barplot(t(w_all_Markowitz), col = rainbow8equal[1:7], legend = colnames(w_all_Markowitz),
        beside = TRUE, main = "Markowitz portfolio allocation",
        xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_Markowitz), main = "Markowitz portfolio allocation", ylab = "w", space = 0, border = NA)
Again, the allocations are totally different from realization to realization. Totally unacceptable. Who is going to trust these allocations to invest their money?
The portfolio that maximizes the return (while ignoring the variance) is the linear program (LP) \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]
In practice, however, \(\boldsymbol{\mu}\) is unknown and has to be estimated \(\hat{\boldsymbol{\mu}}\), e.g., with the sample mean \[\hat{\boldsymbol{\mu}}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{x}_{t},\] where \(\mathbf{x}_{t}\) is the return at day \(t\).
Instead of assuming that \(\boldsymbol{\mu}\) is known perfectly, we now assume it belongs to some convex uncertainty set, denoted by \(\mathcal{U}_{\boldsymbol{\mu}}\).
The worst-case robust formulation is \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}}{\min}\mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
We assume the expected returns are only known within an ellipsoid: \[\mathcal{U}_{\boldsymbol{\mu}} = \left\{\boldsymbol{\mu}=\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u}\mid\left\Vert \mathbf{u}\right\Vert _{2}\leq1\right\}\] where one can use the estimated covariance matrix to shape the uncertainty ellipsoid, i.e., \(\mathbf{S}=\hat{\boldsymbol{\Sigma}}\).
We can solve easily the inner minimization: \[\begin{array}{ll} \underset{\boldsymbol{\mu},\mathbf{u}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \boldsymbol{\mu}=\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u},\\ & \left\Vert \mathbf{u}\right\Vert _{2}\leq1. \end{array}\]
It’s easy to find the minimum value using the Cauchy-Schwarz inequality: \[\begin{aligned} \mathbf{w}^{T}\boldsymbol{\mu} &= \mathbf{w}^{T}\left(\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u}\right)\\ &= \mathbf{w}^{T}\hat{\boldsymbol{\mu}}+\kappa\mathbf{w}^{T}\mathbf{S}^{1/2}\mathbf{u}\\ &\geq \mathbf{w}^{T}\hat{\boldsymbol{\mu}}-\kappa\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2} \end{aligned}\] with equality achieved with \(\mathbf{u}=-\frac{\mathbf{S}^{1/2}\mathbf{w}}{\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2}}\).
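We can verify numerically that this choice of \(\mathbf{u}\) attains the bound. The sketch below uses a symmetric square root of an arbitrary positive definite \(\mathbf{S}\) and arbitrary toy values for \(\mathbf{w}\), \(\hat{\boldsymbol{\mu}}\), and \(\kappa\):

```r
set.seed(1)
N <- 4
w <- c(0.4, 0.3, 0.2, 0.1)
mu_hat <- rnorm(N)
S <- crossprod(matrix(rnorm(N*N), N, N))  # arbitrary positive definite S
kappa <- 0.5

# symmetric square root of S (so that S12 %*% S12 = S and t(S12) = S12)
e <- eigen(S)
S12 <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)

# worst-case u from the Cauchy-Schwarz argument
u_star <- -as.vector(S12 %*% w) / sqrt(sum((S12 %*% w)^2))
mu_worst <- mu_hat + kappa * as.vector(S12 %*% u_star)

sum(w * mu_worst)                                   # attained worst-case value
sum(w * mu_hat) - kappa * sqrt(sum((S12 %*% w)^2))  # lower bound: identical
```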
Finally, the robust formulation becomes the SOCP \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\hat{\boldsymbol{\mu}}-\kappa\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
Recall the vanilla problem formulation was the LP \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\hat{\boldsymbol{\mu}}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]
So, we have gone from an LP to an SOCP.
In general, when a problem is robustified its complexity increases. For example, robustifying an LP typically yields an SOCP, and robustifying a QP yields an SOCP or an SDP.
The robust formulation of the global maximum return portfolio (GMRP) reads \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}}{\min} \mathbf{w}^T\boldsymbol{\mu}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
For an elliptical uncertainty set \[ \mathcal{U}_\boldsymbol{\mu}^e = \{\boldsymbol{\mu}=\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u}\ | \ \left\Vert \mathbf{u}\right\Vert_{2}\leq 1\}, \] where typically \(\mathbf{S}=\hat{\mathbf{\Sigma}}\), the robust formulation becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^T\hat{\boldsymbol{\mu}} - \kappa\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
For a box uncertainty set \[ \mathcal{U}_\boldsymbol{\mu}^b = \{\boldsymbol{\mu}\ | -\boldsymbol{\delta}\leq\boldsymbol{\mu}-\hat{\boldsymbol{\mu}}\leq\boldsymbol{\delta}\}, \] where typically \(\delta_i = \kappa\hat{\sigma}_i\), the robust formulation becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^T\hat{\boldsymbol{\mu}} - |\mathbf{w}|^T\boldsymbol{\delta}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
Let’s start implementing the robust solution for the box constraint:
portfolioMaxReturnRobustBox <- function(mu_hat, delta) {
  w <- Variable(length(mu_hat))
  prob <- Problem(Maximize(t(w) %*% mu_hat - t(abs(w)) %*% delta),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMRP <- portfolioMaxReturn(mu)
names(w_GMRP) <- colnames(X)
w_all_GMRP_robust_box <- cbind(w_GMRP)

# multiple robust solutions
kappa <- 0.1
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  delta <- kappa*sqrt(diag(Sigma_noisy))  # box half-widths from the noisy estimate
  w_GMRP_robust_box_noisy <- portfolioMaxReturnRobustBox(mu_noisy, delta)
  w_all_GMRP_robust_box <- cbind(w_all_GMRP_robust_box, w_GMRP_robust_box_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMRP_robust_box), col = rainbow8equal[1:7],
        legend = colnames(w_all_GMRP_robust_box), beside = TRUE,
        args.legend = list(bg = "white"),
        main = "Robust (box) global maximum return portfolio allocation",
        xlab = "stocks", ylab = "dollars")
Not much improvement with respect to the naive case. The sensitivity is still there. In fact, for the particular case of \(\boldsymbol{\delta}=\mathbf{1}\), it follows that \(|\mathbf{w}|^T\boldsymbol{\delta}=\mathbf{w}^T\mathbf{1}=1\)!
Let’s try the robust solution for the elliptical constraint:
portfolioMaxReturnRobustEllipsoid <- function(mu_hat, S, kappa = 0.1) {
  S12 <- chol(S)  # t(S12) %*% S12 = S
  w <- Variable(length(mu_hat))
  prob <- Problem(Maximize(t(w) %*% mu_hat - kappa*norm2(S12 %*% w)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMRP <- portfolioMaxReturn(mu)
names(w_GMRP) <- colnames(X)
w_all_GMRP_robust_ellipsoid <- cbind(w_GMRP)

# multiple robust solutions
kappa <- 0.2
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  w_GMRP_robust_ellipsoid_noisy <- portfolioMaxReturnRobustEllipsoid(mu_noisy, Sigma_noisy, kappa)
  w_all_GMRP_robust_ellipsoid <- cbind(w_all_GMRP_robust_ellipsoid, w_GMRP_robust_ellipsoid_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMRP_robust_ellipsoid), col = rainbow8equal[1:7],
        legend = colnames(w_all_GMRP_robust_ellipsoid), beside = TRUE,
        args.legend = list(bg = "white"),
        main = "Robust (ellipsoid) global maximum return portfolio allocation",
        xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_GMRP_robust_ellipsoid), main = "Robust (ellipsoid) global maximum return portfolio allocation", ylab = "w", space = 0, border = NA)
The elliptical uncertainty set seems to work better for the maximum return portfolio. It definitely provides more diversity compared to the naive solution, as well as being more stable (less sensitive). However, it is still too sensitive.
Summarizing, for the robust global maximum return portfolio, the ellipsoid uncertainty set in \(\boldsymbol{\mu}\) seems more stable (and diverse) than the box uncertainty (which is totally unstable). Nevertheless, the allocation is still too sensitive, owing to the fact that the global maximum return portfolio is probably not a good formulation to start with, as it ignores the covariance matrix.
The global minimum variance portfolio (GMVP) ignores the expected return and focuses on the risk only: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]
It is a simple convex QP with solution \[\mathbf{w}_{\sf GMVP}=\frac{1}{\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{1}}\boldsymbol{\Sigma}^{-1}\mathbf{1}.\]
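A quick sketch of this closed form on a toy \(2\times2\) covariance matrix (the numbers are arbitrary); the less volatile asset receives the larger weight:

```r
# closed-form GMVP: w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1)
gmvpClosedForm <- function(Sigma) {
  w <- solve(Sigma, rep(1, nrow(Sigma)))  # Sigma^{-1} 1
  w / sum(w)                              # normalize to meet the budget constraint
}

Sigma_toy <- matrix(c(0.10, 0.02,
                      0.02, 0.05), 2, 2)
w <- gmvpClosedForm(Sigma_toy)
w        # more weight on the lower-variance (second) asset
sum(w)   # = 1
```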
It is widely used in academic papers for simplicity of evaluation and comparison of different estimators of the covariance matrix \(\boldsymbol{\Sigma}\) (while ignoring the estimation of \(\boldsymbol{\mu}\)).
In practice, \(\boldsymbol{\Sigma}\) is unknown and has to be estimated \(\hat{\boldsymbol{\Sigma}}\), e.g., with the sample covariance matrix.
Then, the naive portfolio becomes \[\mathbf{w}_{\sf GMVP}=\frac{1}{\mathbf{1}^{T}\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{1}}\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{1}.\]
Instead of assuming that \(\boldsymbol{\Sigma}\) is known perfectly, we now assume it belongs to some convex uncertainty set, denoted by \(\mathcal{U}_{\boldsymbol{\Sigma}}\).
The worst-case robust formulation is \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \underset{\boldsymbol{\Sigma}\in\mathcal{U}_{\boldsymbol{\Sigma}}}{\max}\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
In particular, we will assume that the estimation comes from the sample covariance matrix \(\hat{\boldsymbol{\Sigma}}=\frac{1}{T}\mathbf{X}^{T}\mathbf{X}\) where \(\mathbf{X}\) is a \(T\times N\) matrix containing the return data (assumed demeaned already).
However, we will assume that the data matrix is noisy \(\hat{\mathbf{X}}\) and the actual matrix can be written as \(\mathbf{X}=\hat{\mathbf{X}}+\boldsymbol{\Delta}\), where \(\boldsymbol{\Delta}\) is some error matrix bounded in its norm.
Thus, we will then model the data matrix as \[\mathcal{U}_{\mathbf{X}} =\left\{\mathbf{X}\mid\left\Vert\mathbf{X}-\hat{\mathbf{X}}\right\Vert_{F}\leq\delta_{\mathbf{X}}\right\}.\]
The worst-case robust formulation becomes: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \underset{\mathbf{X}\in\mathcal{U}_{\mathbf{X}}}{\max}\mathbf{w}^{T}\frac{1}{T}\mathbf{X}^{T}\mathbf{X}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
Let’s focus on the inner maximization: \[\underset{\mathbf{X}\in\mathcal{U}_{\mathbf{X}}}{\max}\;\mathbf{w}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{w}=\underset{\left\Vert \boldsymbol{\Delta}\right\Vert _{F}\leq\delta_{\mathbf{X}}}{\max}\left\Vert\left(\hat{\mathbf{X}}+\boldsymbol{\Delta}\right)\mathbf{w}\right\Vert_{2}^{2}\]
We first invoke the triangle inequality to get an upper bound: \[\left\Vert\left(\hat{\mathbf{X}}+\boldsymbol{\Delta}\right)\mathbf{w}\right\Vert_{2}\leq\left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\left\Vert \boldsymbol{\Delta}\mathbf{w}\right\Vert _{2}\] with equality achieved when the two vectors \(\hat{\mathbf{X}}\mathbf{w}\) and \(\boldsymbol{\Delta}\mathbf{w}\) are aligned.
Next, we invoke the norm inequality \[\left\Vert \boldsymbol{\Delta}\mathbf{w}\right\Vert _{2}\leq\left\Vert \boldsymbol{\Delta}\right\Vert _{F}\left\Vert \mathbf{w}\right\Vert _{2}\leq\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\] with equality achieved when \(\boldsymbol{\Delta}\) is rank-one with right singular vector aligned with \(\mathbf{w}\) and when \(\left\Vert \boldsymbol{\Delta}\right\Vert _{F}=\delta_{\mathbf{X}}\). (This follows easily from \(\mathbf{w}^{T}\mathbf{M}\mathbf{w}\leq\lambda_{\max}\left(\mathbf{M}\right)\left\Vert \mathbf{w}\right\Vert ^{2}\leq\mathsf{Tr}\left(\mathbf{M}\right)\left\Vert \mathbf{w}\right\Vert ^{2}\) for \(\mathbf{M}\succeq\mathbf{0}\).)
Finally, we can see that both upper bounds can be actually achieved if the error is properly chosen as \[\boldsymbol{\Delta}=\delta_{\mathbf{X}}\frac{\hat{\mathbf{X}}\mathbf{w}\mathbf{w}^{T}}{\left\Vert \mathbf{w}\right\Vert _{2}\left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}}.\]
Thus, \[\underset{\mathbf{X}\in\mathcal{U}_{\mathbf{X}}}{\max}\;\mathbf{w}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{w}=\left(\left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\right)^{2}.\]
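The claimed worst case can be checked numerically: with \(\boldsymbol{\Delta}\) chosen as above, its Frobenius norm equals \(\delta_{\mathbf{X}}\) and the perturbed variance attains the bound exactly. A sketch on random toy data (`X_hat`, `w`, and `delta_X` are arbitrary):

```r
set.seed(7)
T_obs <- 50
N <- 3
X_hat <- matrix(rnorm(T_obs*N), T_obs, N)  # arbitrary (demeaned) data matrix
w <- c(0.5, 0.3, 0.2)
delta_X <- 0.8

# worst-case perturbation: Delta = delta_X * X_hat w w^T / (||w|| ||X_hat w||)
Xw <- as.vector(X_hat %*% w)
Delta <- delta_X * (Xw %*% t(w)) / (sqrt(sum(w^2)) * sqrt(sum(Xw^2)))

sum(((X_hat + Delta) %*% w)^2)                  # attained worst-case value
(sqrt(sum(Xw^2)) + delta_X * sqrt(sum(w^2)))^2  # upper bound: identical
```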
The robust problem formulation finally becomes: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] which is a (convex) SOCP.
Recall the vanilla problem formulation was the QP \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}^{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\]
Now, the robust problem formulation is the SOCP (from QP to SOCP) \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] which contains the regularization term \(\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\).
One common heuristic, called Tikhonov regularization, is to consider instead \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}^{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}^{2}=\mathbf{w}^{T}\left(\hat{\mathbf{X}}^{T}\hat{\mathbf{X}}+\delta_{\mathbf{X}}\mathbf{I}\right)\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] which is equivalent to the vanilla formulation but using the regularized sample covariance matrix \(\hat{\boldsymbol{\Sigma}}^{\mathsf{tik}}=\frac{1}{T}(\hat{\mathbf{X}}^{T}\hat{\mathbf{X}}+\delta_{\mathbf{X}}\mathbf{I})=\hat{\boldsymbol{\Sigma}}+\frac{\delta_{\mathbf{X}}}{T}\mathbf{I}\).
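The algebraic identity behind this equivalence is easy to confirm on random data (the sizes and \(\delta_{\mathbf{X}}\) below are arbitrary):

```r
set.seed(3)
T_obs <- 20
N <- 4
X_hat <- matrix(rnorm(T_obs*N), T_obs, N)
w <- rnorm(N)
delta_X <- 0.5

# ||X_hat w||^2 + delta_X ||w||^2  ==  w^T (X_hat^T X_hat + delta_X I) w
lhs <- sum((X_hat %*% w)^2) + delta_X * sum(w^2)
rhs <- as.numeric(t(w) %*% (crossprod(X_hat) + delta_X*diag(N)) %*% w)
all.equal(lhs, rhs)  # TRUE
```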
The robust formulation of the GMVP reads \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \underset{\boldsymbol{\Sigma}\in\mathcal{U}_{\boldsymbol{\Sigma}}}{\max} \mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
For a spherical uncertainty set on the data matrix \(\mathbf{X}\) (assuming we use the sample covariance matrix \(\hat{\boldsymbol{\Sigma}}=\frac{1}{T-1}\mathbf{X}^T\mathbf{X}\)) \[ \mathcal{U}_\mathbf{X}^s = \{\mathbf{X}\ | \ \|\mathbf{X}-\hat{\mathbf{X}}\|_F \le \delta\}, \] the robust formulation becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \|\hat{\mathbf{X}}\mathbf{w}\|_2 + \delta \|\mathbf{w}\|_2\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
For a box uncertainty set on the covariance matrix \[ \mathcal{U}_\boldsymbol{\Sigma}^b = \{\boldsymbol{\Sigma}\ | \ \underline{\boldsymbol{\Sigma}} \le \boldsymbol{\Sigma} \le \overline{\boldsymbol{\Sigma}},\ \boldsymbol{\Sigma} \succeq \mathbf{0}\}, \] where a choice for the bounds is \(\underline{\boldsymbol{\Sigma}}=\delta^{-1}\hat{\boldsymbol{\Sigma}}\) and \(\overline{\boldsymbol{\Sigma}}=\delta\hat{\boldsymbol{\Sigma}}\) with \(\delta>1\), the robust formulation becomes
\[ \begin{array}{ll} \underset{\mathbf{w},\underline{\boldsymbol{\Lambda}},\overline{\boldsymbol{\Lambda}}}{\textsf{minimize}} & \textsf{Tr}(\overline{\boldsymbol{\Lambda}}\ \overline{\boldsymbol{\Sigma}}) - \textsf{Tr}(\underline{\boldsymbol{\Lambda}}\ \underline{\boldsymbol{\Sigma}})\\ \textsf{subject to} & \left[\begin{array}{cc}\overline{\boldsymbol{\Lambda}}-\underline{\boldsymbol{\Lambda}} & \mathbf{w}\\ \mathbf{w}^{T} & 1\end{array} \right]\succeq\mathbf{0}\\ & \overline{\boldsymbol{\Lambda}}\ge\mathbf{0},\ \underline{\boldsymbol{\Lambda}}\ge\mathbf{0}\\ & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
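For context, this SDP arises from the Lagrangian dual of the inner maximization: \(\overline{\boldsymbol{\Lambda}}\) and \(\underline{\boldsymbol{\Lambda}}\) are the multipliers of the elementwise upper and lower bounds and, assuming strong duality holds (a standard constraint qualification suffices), \[ \underset{\boldsymbol{\Sigma}\in\mathcal{U}_{\boldsymbol{\Sigma}}^b}{\max}\ \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} = \underset{\overline{\boldsymbol{\Lambda}},\underline{\boldsymbol{\Lambda}}\ge\mathbf{0}}{\min}\left\{\textsf{Tr}(\overline{\boldsymbol{\Lambda}}\,\overline{\boldsymbol{\Sigma}})-\textsf{Tr}(\underline{\boldsymbol{\Lambda}}\,\underline{\boldsymbol{\Sigma}})\ :\ \overline{\boldsymbol{\Lambda}}-\underline{\boldsymbol{\Lambda}}\succeq\mathbf{w}\mathbf{w}^T\right\}. \] The Schur complement then converts the quadratic matrix inequality \(\overline{\boldsymbol{\Lambda}}-\underline{\boldsymbol{\Lambda}}\succeq\mathbf{w}\mathbf{w}^T\) into the linear matrix inequality above.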
Let’s start implementing the robust formulation for the box constraint on \(\boldsymbol{\Sigma}\):
# define function (check: https://cvxr.rbind.io/cvxr_functions/)
portfolioGMVPRobustBox <- function(Sigma_lb, Sigma_ub) {
  N <- nrow(Sigma_lb)
  w <- Variable(N)
  Lambda_ub <- Variable(N, N)
  Lambda_lb <- Variable(N, N)
  Tmp <- Variable(N+1, N+1, PSD = TRUE)  # auxiliary PSD variable to impose the LMI
  prob <- Problem(Minimize(matrix_trace(Lambda_ub %*% Sigma_ub) -
                           matrix_trace(Lambda_lb %*% Sigma_lb)),
                  constraints = list(w >= 0, sum(w) == 1,
                                     Lambda_ub >= 0, Lambda_lb >= 0,
                                     bmat(list(list(Lambda_ub - Lambda_lb, w),
                                               list(t(w), 1))) == Tmp))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMVP <- portfolioGMVP(Sigma)
names(w_GMVP) <- colnames(X)
w_all_GMVP_robust_box <- cbind(w_GMVP)

# multiple robust solutions
delta <- 1.1
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  # elementwise bounds Sigma_lb <= Sigma <= Sigma_ub (mind the sign of each entry)
  Sigma_lb <- ifelse(Sigma_noisy >= 0, (1/delta)*Sigma_noisy, delta*Sigma_noisy)
  Sigma_ub <- ifelse(Sigma_noisy >= 0, delta*Sigma_noisy, (1/delta)*Sigma_noisy)
  w_GMVP_robust_box_noisy <- portfolioGMVPRobustBox(Sigma_lb, Sigma_ub)
  w_all_GMVP_robust_box <- cbind(w_all_GMVP_robust_box, w_GMVP_robust_box_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMVP_robust_box), col = rainbow8equal[1:7],
        legend = colnames(w_all_GMVP_robust_box), beside = TRUE,
        args.legend = list(bg = "white", x = "topleft"),
        main = "Robust (box) GMVP allocation", xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_GMVP), main = "Naive GMVP allocation", ylab = "w", space = 0, border = NA)
chart.StackedBar(t(w_all_GMVP_robust_box), main = "Robust (box) GMVP allocation", ylab = "w", space = 0, border = NA)
The naive GMVP was already fairly stable. Nevertheless, the robust GMVP becomes even more stable (less sensitive to estimation errors). Better results could be obtained by choosing tighter, more appropriate upper and lower bounds on \(\boldsymbol{\Sigma}\).
Let’s consider now the robust formulation for the spherical uncertainty region on the data matrix \(\mathbf{X}\):
# define function
portfolioGMVPRobustSphereX <- function(X_hat, delta) {
  N <- ncol(X_hat)
  w <- Variable(N)
  X_ <- scale(X_hat, center = TRUE, scale = FALSE)  # demean
  prob <- Problem(Minimize(norm2(X_ %*% w) + delta*norm2(w)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMVP <- portfolioGMVP(Sigma)
names(w_GMVP) <- colnames(X)
w_all_GMVP_robust_sphereX <- cbind(w_GMVP)

# multiple robust solutions
delta <- 0.1
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  w_GMVP_robust_sphereX_noisy <- portfolioGMVPRobustSphereX(X_noisy, delta)
  w_all_GMVP_robust_sphereX <- cbind(w_all_GMVP_robust_sphereX, w_GMVP_robust_sphereX_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMVP_robust_sphereX), col = rainbow8equal[1:7],
        legend = colnames(w_all_GMVP_robust_sphereX), beside = TRUE,
        args.legend = list(bg = "white", x = "topleft"),
        main = "Robust (sphere in X) GMVP allocation", xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_GMVP), main = "Naive GMVP allocation", ylab = "w", space = 0, border = NA)
chart.StackedBar(t(w_all_GMVP_robust_sphereX), main = "Robust (sphere in X) GMVP allocation", ylab = "w", space = 0, border = NA)