Robust Optimization
Markowitz Portfolio Optimization
Robust Maximum Return Portfolio Optimization
Robust Minimum Variance Portfolio Optimization
Robust Markowitz Portfolio Optimization
Robust Sharpe Ratio Portfolio Optimization
Summary
A convex optimization problem is written as \[ \begin{array}{ll} \underset{\mathbf{x}}{\textsf{minimize}} & f_{0}\left(\mathbf{x}\right)\\ \textsf{subject to} & f_{i}\left(\mathbf{x}\right)\leq0\qquad i=1,\ldots,m\\ & \mathbf{h}\left(\mathbf{x}\right)=\mathbf{A}\mathbf{x}-\mathbf{b}=0 \end{array} \] where \(f_{0},\,f_{1},\ldots,f_{m}\) are convex and equality constraints are affine.
Convex problems enjoy a rich theory (KKT conditions, zero duality gap, etc.) as well as a large number of efficient numerical algorithms guaranteed to deliver an optimal solution \(\mathbf{x}^{\star}\).
Many off-the-shelf solvers exist in most programming languages (e.g., R, Python, Matlab, Julia, C), tailored to specific classes of problems, namely LP, QP, QCQP, SOCP, SDP, GP, etc.
However, the obtained optimal solution \(\mathbf{x}^{\star}\) typically performs very poorly in practice.
In many cases, it can be totally useless!
Why is that?
Recall that a problem formulation contains not only the optimization variables \(\mathbf{x}\) but also the parameters \(\boldsymbol{\theta}\).
Such parameters define the problem instance and are typically estimated in practice, i.e., they are not exact: \(\hat{\boldsymbol{\theta}}\neq\boldsymbol{\theta}\) but hopefully close \(\hat{\boldsymbol{\theta}}\simeq\boldsymbol{\theta}\).
The question is whether a small error in the parameters is going to be detrimental or can be ignored. That depends on each particular type of problem.
In the case of portfolio optimization, small errors in the parameters \(\boldsymbol{\theta}=\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right)\) happen to have a huge effect in the solution \(\mathbf{x}^{\star}\). To the point that most practitioners avoid the use of portfolio optimization!
For example, consider an LP: \[ \begin{array}{ll} \underset{\mathbf{x}}{\textsf{minimize}} & \mathbf{c}^{T}\mathbf{x}+\mathbf{d}\\ \textsf{subject to} & \mathbf{A}\mathbf{x}=\mathbf{b}. \end{array} \]
The parameters are \(\boldsymbol{\theta}=\left(\mathbf{A},\mathbf{b},\mathbf{c},\mathbf{d}\right)\).
The objective function is \(f_{0}\left(\mathbf{x};\boldsymbol{\theta}\right)=\mathbf{c}^{T}\mathbf{x}+\mathbf{d}\)
The constraint function is \(\mathbf{h}\left(\mathbf{x};\boldsymbol{\theta}\right)=\mathbf{A}\mathbf{x}-\mathbf{b}\)
The naive approach is to pretend that \(\hat{\boldsymbol{\theta}}\) is close enough to \(\boldsymbol{\theta}\) and solve the approximated problem, obtaining \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\).
For some types of problems, it may be that \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\approx\mathbf{x}^{\star}\left(\boldsymbol{\theta}\right)\) and that is the end of the story.
For many other problems, however, that’s not the case. So we cannot really rely on the naive solution \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\).
The solution is to consider instead a robust formulation that takes into account the fact that we know we only have an estimation of the parameters.
There are several ways to make the problem robust to parameter errors, mainly: stochastic robust optimization, worst-case robust optimization, and chance programming.
In stochastic robust optimization, one models the estimation \(\hat{\boldsymbol{\theta}}\) as a random variable that fluctuates around its true value \(\boldsymbol{\theta}\).
Then, instead of considering the approximated function \(f(\mathbf{x};\hat{\boldsymbol{\theta}})\), it uses its expected value \(E_{\boldsymbol{\theta}}[f(\mathbf{x};\boldsymbol{\theta})]\), where \(E_{\boldsymbol{\theta}}[\cdot]\) denotes expectation over the random variable \(\boldsymbol{\theta}\).
The random variable is typically modeled around the estimated value as \(\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}+\boldsymbol{\delta}\) with \(\boldsymbol{\delta}\) following a zero-mean distribution such as Gaussian.
For example, if the function is quadratic, say, \(f(\mathbf{x};\hat{\boldsymbol{\theta}})=(\hat{\mathbf{c}}^{T}\mathbf{x})^{2}\), and we model the parameter as \(\mathbf{c}=\hat{\mathbf{c}}+\boldsymbol{\delta}\) with \(\boldsymbol{\delta}\) zero-mean and covariance matrix \(\mathbf{Q}\), then the expected value is \[\begin{aligned} E_{\boldsymbol{\theta}}[f(\mathbf{x};\boldsymbol{\theta})] &= E_{\boldsymbol{\delta}}[((\hat{\mathbf{c}}+\boldsymbol{\delta})^{T}\mathbf{x})^{2}]\\ &= E_{\boldsymbol{\delta}}[\mathbf{x}^{T}\hat{\mathbf{c}}\hat{\mathbf{c}}^{T}\mathbf{x}+2\mathbf{x}^{T}\hat{\mathbf{c}}\boldsymbol{\delta}^{T}\mathbf{x}+\mathbf{x}^{T}\boldsymbol{\delta}\boldsymbol{\delta}^{T}\mathbf{x}]\\ &= (\hat{\mathbf{c}}^{T}\mathbf{x})^{2}+\mathbf{x}^{T}\mathbf{Q}\mathbf{x} \end{aligned}\] where the cross term vanishes because \(\boldsymbol{\delta}\) is zero-mean, and the additional term \(\mathbf{x}^{T}\mathbf{Q}\mathbf{x}\) serves as a regularizer.
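As a quick sanity check of this identity, here is a minimal base-R Monte Carlo sketch; all numbers are illustrative (not taken from the stock data used later):

```r
set.seed(42)
c_hat <- c(1, -2, 0.5)                    # illustrative parameter estimate
x <- c(0.3, 0.4, 0.3)
Q <- diag(c(0.1, 0.2, 0.05))              # covariance of the zero-mean error delta

n_samples <- 1e6
delta <- matrix(rnorm(n_samples * 3), n_samples, 3) %*% chol(Q)  # delta ~ N(0, Q)
empirical   <- mean((sum(c_hat * x) + as.vector(delta %*% x))^2)
theoretical <- sum(c_hat * x)^2 + as.numeric(t(x) %*% Q %*% x)
stopifnot(abs(empirical - theoretical) < 2e-3)  # Monte Carlo matches the formula
```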
In worst-case robust optimization, one instead assumes the parameter lies in an uncertainty set \(\mathcal{U}\) around the estimate and optimizes the worst case over that set. Take the previous quadratic function \[f(\mathbf{x};\hat{\boldsymbol{\theta}})=(\hat{\mathbf{c}}^{T}\mathbf{x})^{2}\] and consider a sphere uncertainty region \[\mathcal{U}=\{\mathbf{c}\mid\left\Vert \mathbf{c}-\hat{\mathbf{c}}\right\Vert _{2}\leq\delta\}.\] The worst-case value is then \[\underset{\mathbf{c}\in\mathcal{U}}{\max}\;(\mathbf{c}^{T}\mathbf{x})^{2}=\left(|\hat{\mathbf{c}}^{T}\mathbf{x}|+\delta\left\Vert \mathbf{x}\right\Vert _{2}\right)^{2},\] which follows from the Cauchy-Schwarz inequality.
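For this sphere uncertainty set, the Cauchy-Schwarz inequality gives the worst-case value \((|\hat{\mathbf{c}}^{T}\mathbf{x}|+\delta\|\mathbf{x}\|_{2})^{2}\), attained by pushing \(\mathbf{c}\) away from \(\hat{\mathbf{c}}\) along \(\mathbf{x}\). A small base-R sketch (with illustrative numbers) checks both the maximizer and the bound:

```r
set.seed(1)
c_hat <- c(1, -2, 0.5); x <- c(0.3, 0.4, 0.3); delta <- 0.5   # illustrative
worst_val <- (abs(sum(c_hat * x)) + delta * sqrt(sum(x^2)))^2
# maximizer: push c away from c_hat along x, in the direction of sign(c_hat^T x)
c_star <- c_hat + delta * sign(sum(c_hat * x)) * x / sqrt(sum(x^2))
stopifnot(abs(sum(c_star * x)^2 - worst_val) < 1e-12)
# random c in the sphere never exceed the worst-case value
U <- matrix(rnorm(3e4), ncol = 3)
U <- U / sqrt(rowSums(U^2)) * runif(1e4)^(1/3)   # approx. uniform in the unit ball
vals <- (sum(c_hat * x) + delta * as.vector(U %*% x))^2
stopifnot(all(vals <= worst_val + 1e-12))
```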
The problem with expectations is that only the average behavior is controlled; nothing constrains the realizations that are worse than the average. For example, a constraint may be satisfied on average but violated for many realizations.
The problem with worst-case programming is that it is too conservative as one deals with the worst possible case.
Chance programming tries to find a compromise. In particular, it also models the estimation errors statistically but instead of focusing on the average it guarantees a performance for, say, 95% of the cases.
The naive constraint \(f(\mathbf{x};\hat{\boldsymbol{\theta}})\leq0\) is replaced with \(\mathsf{Pr}_{\boldsymbol{\theta}}\left[f(\mathbf{x};\boldsymbol{\theta})\leq0\right]\geq1-\epsilon=0.95\) with \(\epsilon=0.05\).
Chance or probabilistic constraints are generally very hard to deal with and one typically has to resort to approximations.
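One tractable special case: for an affine function with Gaussian parameter uncertainty \(\mathbf{c}\sim\mathcal{N}(\hat{\mathbf{c}},\mathbf{Q})\), the chance constraint \(\mathsf{Pr}[\mathbf{c}^{T}\mathbf{x}\leq0]\geq1-\epsilon\) (with \(\epsilon\leq0.5\)) is exactly equivalent to the SOCP-type constraint \(\hat{\mathbf{c}}^{T}\mathbf{x}+q_{1-\epsilon}\sqrt{\mathbf{x}^{T}\mathbf{Q}\mathbf{x}}\leq0\), where \(q_{1-\epsilon}\) is the standard normal quantile. A base-R sketch with illustrative numbers:

```r
eps <- 0.05
c_hat <- c(-1, -0.5)                          # illustrative Gaussian mean
Q <- matrix(c(0.04, 0.01, 0.01, 0.09), 2, 2)  # illustrative covariance
x <- c(0.6, 0.4)

m <- sum(c_hat * x)                           # mean of c^T x
s <- sqrt(as.numeric(t(x) %*% Q %*% x))       # std dev of c^T x
prob_exact <- pnorm(-m / s)                   # Pr[c^T x <= 0]
lhs_socp   <- m + qnorm(1 - eps) * s          # LHS of the SOCP reformulation
# the chance constraint holds iff the SOCP constraint holds
stopifnot((prob_exact >= 1 - eps) == (lhs_socp <= 0))
```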
The idea of the Markowitz mean-variance portfolio (MVP) (Markowitz 1952) is to find a trade-off between the expected return \(\mathbf{w}^{T}\boldsymbol{\mu}\) and the risk of the portfolio measured by the variance \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}-\lambda\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1 \end{array}\] where \(\mathbf{w}^{T}\mathbf{1}=1\) is the capital budget constraint and \(\lambda\) is a parameter that controls how risk-averse the investor is.
This is a convex quadratic problem (QP) with only one linear constraint which admits a closed-form solution: \[\mathbf{w}_{\sf MVP} = \frac{1}{2\lambda}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{\mu}+\nu\mathbf{1}\right),\] where \(\nu\) is the optimal dual variable \(\nu=\frac{2\lambda-\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{1}}\).
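The closed-form solution is easy to verify numerically in base R: the budget constraint must hold and, by stationarity, the gradient of the objective at \(\mathbf{w}_{\sf MVP}\) must equal \(-\nu\mathbf{1}\). The numbers below are illustrative:

```r
set.seed(11)
N <- 4; lambda <- 2                        # illustrative sizes
mu_ <- c(0.08, 0.10, 0.12, 0.07)
A <- matrix(rnorm(N^2), N, N)
Sigma_ <- crossprod(A) + diag(N)           # a positive-definite covariance
ones <- rep(1, N)
Sinv <- solve(Sigma_)

nu <- (2*lambda - sum(ones * (Sinv %*% mu_))) / sum(ones * (Sinv %*% ones))
w_mvp <- as.vector((1/(2*lambda)) * Sinv %*% (mu_ + nu*ones))

stopifnot(abs(sum(w_mvp) - 1) < 1e-10)     # budget constraint holds
grad <- mu_ - 2*lambda * as.vector(Sigma_ %*% w_mvp)  # gradient of the objective
stopifnot(max(abs(grad + nu*ones)) < 1e-8) # stationarity: gradient = -nu * 1
```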
Capital budget constraint: \[\mathbf{1}^T\mathbf{w} = 1.\]
Long-only constraint: \[\mathbf{w} \geq 0.\]
Dollar-neutral or self-financing constraint: \[\mathbf{1}^T\mathbf{w} = 0.\]
Holding constraint: \[\mathbf{l}\leq\mathbf{w}\leq \mathbf{u}\] where \(\mathbf{l}\in\mathbb{R}^{N}\) and \(\mathbf{u}\in\mathbb{R}^{N}\) are lower and upper bounds of the asset positions, respectively.
Leverage constraint: \[\left\Vert \mathbf{w}\right\Vert _{1}\leq L.\]
Cardinality constraint: \[\left\Vert \mathbf{w}\right\Vert _{0} \leq K.\]
Turnover constraint: \[\left\Vert \mathbf{w}-\mathbf{w}_{0}\right\Vert _{1} \leq u\] where \(\mathbf{w}_{0}\) is the currently held portfolio.
Markowitz’s portfolio has never been fully embraced by practitioners, among other reasons (Zhao et al. 2019) because
variance is not a good measure of risk in practice since it penalizes both the unwanted high losses and the desired low losses (i.e., gains): the solution is to use alternative measures of risk, e.g., VaR and CVaR,
it is highly sensitive to parameter estimation errors (of covariance matrix \(\boldsymbol{\Sigma}\) and especially mean vector \(\boldsymbol{\mu}\)): solution is robust optimization (Fabozzi 2007) and improved parameter estimation,
it only considers the risk of the portfolio as a whole and ignores the risk diversification (i.e., concentrates risk too much in few assets, this was observed in the 2008 financial crisis): solution is the risk parity portfolio.
👉 We will here consider robust portfolio optimization against estimation errors of the parameters \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\).
Efficient frontier: the mean-variance trade-off curve (Pareto curve), which is not achieved in practice due to parameter estimation errors:
We will load some stock market data and divide it into a training part (for the estimation of the expected return \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\), and subsequent portfolio design) and a test part (for the out-of-sample performance evaluation).
In particular, we will start by loading some stock data from three different sectors:
library(xts)                   # to manipulate time series of stock data
library(quantmod)              # to download stock data
library(PerformanceAnalytics)  # to compute performance measures

# download data from YahooFinance
stock_namelist <- c("AAPL", "AMD", "ADI", "ABBV", "AEZS", "A", "APD", "AA", "CF")
prices <- xts()
for (i in 1:length(stock_namelist)) {
  tmp <- Ad(getSymbols(stock_namelist[i], from = "2013-01-01", to = "2016-12-31",
                       auto.assign = FALSE))
  tmp <- na.approx(tmp, na.rm = FALSE)  # interpolate NAs
  prices <- cbind(prices, tmp)
}
colnames(prices) <- stock_namelist
indexClass(prices) <- "Date"
str(prices)
R>> An 'xts' object on 2013-01-02/2016-12-30 containing:
R>>   Data: num [1:1008, 1:9] 55.5 54.8 53.2 52.9 53.1 ...
R>>  - attr(*, "dimnames")=List of 2
R>>   ..$ : NULL
R>>   ..$ : chr [1:9] "AAPL" "AMD" "ADI" "ABBV" ...
R>>   Indexed by objects of class: [Date] TZ: UTC
R>>   xts Attributes:
R>>  NULL
head(prices)
R>>                AAPL  AMD      ADI     ABBV AEZS        A      APD       AA       CF
R>> 2013-01-02 55.47129 2.53 37.50844 27.60817  253 27.97831 66.43760 20.62187 29.44429
R>> 2013-01-03 54.77112 2.49 36.90318 27.38020  254 28.07852 66.20545 20.80537 29.30506
R>> 2013-01-04 53.24548 2.59 36.24679 27.03431  257 28.63300 67.09529 21.24121 29.96146
R>> 2013-01-07 52.93227 2.67 36.35761 27.08934  259 28.42591 67.03341 20.87419 29.84922
R>> 2013-01-08 53.07474 2.67 35.98254 26.49976  255 28.19877 67.15720 20.87419 29.41162
R>> 2013-01-09 52.24524 2.63 35.88875 26.64913  258 28.96036 68.06254 20.82831 30.44025
tail(prices)
R>>                AAPL   AMD      ADI     ABBV AEZS        A      APD    AA       CF
R>> 2016-12-22 111.8445 11.60 70.07727 55.82264 4.10 44.91877 136.0360 29.75 27.82244
R>> 2016-12-23 112.0657 11.58 70.44936 56.43826 4.10 45.14350 136.3992 29.71 28.37686
R>> 2016-12-27 112.7774 12.07 70.89778 56.58311 4.05 45.44642 137.2373 29.65 29.52265
R>> 2016-12-28 112.2965 11.55 70.18222 56.37488 3.55 44.67448 135.1328 29.43 29.22696
R>> 2016-12-29 112.2676 11.59 70.20129 56.79134 3.60 44.72544 135.2359 28.89 29.47645
R>> 2016-12-30 111.3924 11.34 69.28539 56.69175 3.60 44.64704 134.7206 28.08 29.08836
# compute data matrix with log-returns
X <- diff(log(prices))[-1]
N <- ncol(X)  # number of stocks
T <- nrow(X)  # number of days
We can take a look at the prices of the stocks:
plot(prices/rep(prices[1, ], each = nrow(prices)), col = rainbow10equal,
     legend.loc = "topleft", main = "Normalized prices")
We can now obtain the sample estimates from the returns \(\mathbf{x}_t\) (i.e., sample mean and sample covariance matrix) as \[ \begin{align} \hat{\boldsymbol{\mu}} & = \frac{1}{T}\sum_{t=1}^T \mathbf{x}_t\\ \hat{\boldsymbol{\Sigma}} & = \frac{1}{T-1}\sum_{t=1}^T (\mathbf{x}_t - \hat{\boldsymbol{\mu}})(\mathbf{x}_t - \hat{\boldsymbol{\mu}})^T \end{align} \]
mu <- colMeans(X)
Sigma <- cov(X)
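As a self-contained check (on synthetic data, so it does not depend on the downloaded prices), colMeans() and cov() implement exactly the two estimators above; note that cov() uses the \(1/(T-1)\) normalization:

```r
set.seed(123)
Tn <- 200; Nn <- 3                         # illustrative sample size and dimension
Z <- matrix(rnorm(Tn * Nn), Tn, Nn)
mu_manual <- colSums(Z) / Tn               # (1/T) * sum of returns
Zc <- sweep(Z, 2, mu_manual)               # remove the mean from each column
Sigma_manual <- t(Zc) %*% Zc / (Tn - 1)    # 1/(T-1) normalization
stopifnot(max(abs(mu_manual - colMeans(Z))) < 1e-12,
          max(abs(Sigma_manual - cov(Z))) < 1e-12)
```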
Now that we have some realistic parameters \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\), we will design a portfolio with some small noise in the parameters to observe the effect.
Let’s start with the global maximum return portfolio (GMRP) formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \boldsymbol{\mu}^T\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \] Of course the optimal solution is to allocate all the budget on the single stock with largest expected return, not very diversified. We can still implement it using a solver:
library(CVXR)

portfolioMaxReturn <- function(mu) {
  w <- Variable(length(mu))
  prob <- Problem(Maximize(t(w) %*% mu),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

w_GMRP <- portfolioMaxReturn(mu)
names(w_GMRP) <- colnames(X)
w_all_GMRP <- cbind(w_GMRP)

# plot allocation
barplot(t(w_all_GMRP), col = rainbow8equal[1], legend = colnames(w_all_GMRP),
        beside = TRUE, main = "Global maximum return portfolio allocation",
        xlab = "stocks", ylab = "dollars")
As expected, all the budget is allocated to a single stock.
Let’s now introduce some estimation error in \(\boldsymbol{\mu}\) to see the effect:
library(mvtnorm)

# generate a few noisy estimations of mu
set.seed(357)
for (i in 1:6) {
  mu_noisy <- as.vector(rmvnorm(n = 1, mean = mu, sigma = (1/T)*Sigma))
  w_GMRP_noisy <- portfolioMaxReturn(mu_noisy)
  w_all_GMRP <- cbind(w_all_GMRP, w_GMRP_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMRP), col = rainbow8equal[1:7], legend = colnames(w_all_GMRP),
        beside = TRUE, main = "Global maximum return portfolio allocation",
        xlab = "stocks", ylab = "dollars")
We can see that the optimal allocation changes totally from realization to realization, which is highly undesirable (in fact, it is totally unacceptable).
Let’s continue with the GMVP formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0} \end{array} \]
portfolioGMVP <- function(Sigma) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Minimize(quad_form(w, Sigma)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

w_GMVP <- portfolioGMVP(Sigma)
names(w_GMVP) <- colnames(X)
w_all_GMVP <- cbind(w_GMVP)

# plot allocation
barplot(t(w_all_GMVP), col = rainbow8equal[1], legend = colnames(w_all_GMVP),
        beside = TRUE, main = "Global minimum variance portfolio allocation",
        xlab = "stocks", ylab = "dollars")
The GMVP is totally different from the maximum return portfolio. It is quite the opposite: the budget is allocated to most of the stocks to diversify the risk.
Let’s now introduce some estimation error in \(\boldsymbol{\Sigma}\) to see the effect:
library(PerformanceAnalytics)

# generate a few noisy estimations of Sigma
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = rep(0, N), sigma = Sigma)
  Sigma_noisy <- cov(X_noisy)
  w_GMVP_noisy <- portfolioGMVP(Sigma_noisy)
  w_all_GMVP <- cbind(w_all_GMVP, w_GMVP_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMVP), col = rainbow8equal[1:7], legend = colnames(w_all_GMVP),
        beside = TRUE, main = "Global minimum variance portfolio allocation",
        xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_GMVP), main = "GMVP allocation", ylab = "w", space = 0, border = NA)
We can see that the optimal allocation still changes from realization to realization, although in this case it is not as extreme.
Finally, let’s take a look at Markowitz’s mean-variance portfolio formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \boldsymbol{\mu}^T\mathbf{w} -\lambda\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0} \end{array} \]
portfolioMarkowitz <- function(mu, Sigma, lmd = 0.5) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Maximize(t(mu) %*% w - lmd*quad_form(w, Sigma)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

w_Markowitz <- portfolioMarkowitz(mu, Sigma)
names(w_Markowitz) <- colnames(X)
w_all_Markowitz <- cbind(w_Markowitz)

# plot allocation
barplot(t(w_all_Markowitz), col = rainbow8equal[1], legend = colnames(w_all_Markowitz),
        beside = TRUE, main = "Markowitz portfolio allocation",
        xlab = "stocks", ylab = "dollars")
The mean-variance portfolio does not seem very diversified. This is one of the reasons why practitioners do not use it. The other reason is its sensitivity, or lack of robustness, to parameter estimation errors, as we will explore next.
# generate a few noisy estimations of mu and Sigma
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  w_Markowitz_noisy <- portfolioMarkowitz(mu_noisy, Sigma_noisy)
  w_all_Markowitz <- cbind(w_all_Markowitz, w_Markowitz_noisy)
}

# plot to compare the allocations
barplot(t(w_all_Markowitz), col = rainbow8equal[1:7], legend = colnames(w_all_Markowitz),
        beside = TRUE, main = "Markowitz portfolio allocation",
        xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_Markowitz), main = "Markowitz portfolio allocation", ylab = "w", space = 0, border = NA)
Again, the allocations are totally different from realization to realization. Totally unacceptable. Who is going to trust these allocations to invest their money?
The portfolio that maximizes the return (while ignoring the variance) is the linear program (LP) \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]
In practice, however, \(\boldsymbol{\mu}\) is unknown and has to be estimated \(\hat{\boldsymbol{\mu}}\), e.g., with the sample mean \[\hat{\boldsymbol{\mu}}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{x}_{t},\] where \(\mathbf{x}_{t}\) is the return at day \(t\).
Instead of assuming that \(\boldsymbol{\mu}\) is known perfectly, we now assume it belongs to some convex uncertainty set, denoted by \(\mathcal{U}_{\boldsymbol{\mu}}\).
The worst-case robust formulation is \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}}{\min}\mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
We assume the expected returns are only known within an ellipsoid: \[\mathcal{U}_{\boldsymbol{\mu}} = \left\{\boldsymbol{\mu}=\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u}\mid\left\Vert \mathbf{u}\right\Vert _{2}\leq1\right\}\] where one can use the estimated covariance matrix to shape the uncertainty ellipsoid, i.e., \(\mathbf{S}=\hat{\boldsymbol{\Sigma}}\).
We can solve easily the inner minimization: \[\begin{array}{ll} \underset{\boldsymbol{\mu},\mathbf{u}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \boldsymbol{\mu}=\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u},\\ & \left\Vert \mathbf{u}\right\Vert _{2}\leq1. \end{array}\]
It is easy to find the minimum value using the Cauchy-Schwarz inequality: \[\begin{aligned} \mathbf{w}^{T}\boldsymbol{\mu} &= \mathbf{w}^{T}\left(\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u}\right)\\ &= \mathbf{w}^{T}\hat{\boldsymbol{\mu}}+\kappa\mathbf{w}^{T}\mathbf{S}^{1/2}\mathbf{u}\\ &\geq \mathbf{w}^{T}\hat{\boldsymbol{\mu}}-\kappa\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2} \end{aligned}\] with equality achieved with \(\mathbf{u}=-\frac{\mathbf{S}^{1/2}\mathbf{w}}{\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2}}\).
Finally, the robust formulation becomes the SOCP \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\hat{\boldsymbol{\mu}}-\kappa\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
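The inner minimization can be checked numerically in base R: the minimizer \(\mathbf{u}^\star\) from the Cauchy-Schwarz argument attains the closed-form value, and no feasible \(\mathbf{u}\) does better. All numbers below are illustrative:

```r
set.seed(5)
N <- 4; kappa <- 0.3                      # illustrative sizes
mu_hat <- c(0.05, 0.08, 0.06, 0.09)
A <- matrix(rnorm(N^2), N, N)
S <- crossprod(A) + diag(N)               # a positive-definite shape matrix
e <- eigen(S)
S12 <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)  # symmetric sqrt of S
w <- c(0.4, 0.3, 0.2, 0.1)

closed_form <- sum(w * mu_hat) - kappa * sqrt(sum((S12 %*% w)^2))
# minimizer from the Cauchy-Schwarz argument
u_star <- -as.vector(S12 %*% w) / sqrt(sum((S12 %*% w)^2))
mu_worst <- mu_hat + kappa * as.vector(S12 %*% u_star)
stopifnot(abs(sum(w * mu_worst) - closed_form) < 1e-10)

# random feasible u (||u|| <= 1) never attain a smaller value
U <- matrix(rnorm(4000), ncol = N)
U <- U / pmax(1, sqrt(rowSums(U^2)))
vals <- sum(w * mu_hat) + kappa * as.vector(U %*% (S12 %*% w))
stopifnot(all(vals >= closed_form - 1e-10))
```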
Recall the vanilla problem formulation was the LP \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\hat{\boldsymbol{\mu}}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]
So, we have gone from an LP to an SOCP.
In general, when a problem is robustified, its complexity increases. For example, a robustified LP may become an SOCP, and a robustified QP may become an SOCP or an SDP, as we will see next.
The robust formulation of the global maximum return portfolio (GMRP) reads \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}}{\min} \mathbf{w}^T\boldsymbol{\mu}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
For an elliptical uncertainty set \[ \mathcal{U}_\boldsymbol{\mu}^e = \{\boldsymbol{\mu}=\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u}\ | \ \left\Vert \mathbf{u}\right\Vert_{2}\leq 1\}, \] where typically \(\mathbf{S}=\hat{\mathbf{\Sigma}}\), the robust formulation becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^T\hat{\boldsymbol{\mu}} - \kappa\left\Vert \mathbf{S}^{1/2}\mathbf{w}\right\Vert _{2}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
For a box uncertainty set \[ \mathcal{U}_\boldsymbol{\mu}^b = \{\boldsymbol{\mu}\ | -\boldsymbol{\delta}\leq\boldsymbol{\mu}-\hat{\boldsymbol{\mu}}\leq\boldsymbol{\delta}\}, \] where typically \(\delta_i = \kappa\hat{\sigma}_i\), the robust formulation becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^T\hat{\boldsymbol{\mu}} - |\mathbf{w}|^T\boldsymbol{\delta}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
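The box case can be verified in base R as well: the worst-case \(\boldsymbol{\mu}\) pushes each coordinate against the sign of \(w_i\), giving \(\mathbf{w}^T\hat{\boldsymbol{\mu}} - |\mathbf{w}|^T\boldsymbol{\delta}\). Illustrative numbers below (a short position is included so the signs matter):

```r
mu_hat <- c(0.05, 0.08, -0.02, 0.09)     # illustrative estimate
delta  <- c(0.01, 0.03, 0.02, 0.02)      # per-asset box half-widths
w <- c(0.5, -0.2, 0.4, 0.3)              # includes a short position

closed_form <- sum(w * mu_hat) - sum(abs(w) * delta)
mu_worst <- mu_hat - delta * sign(w)     # each coordinate pushed against w
stopifnot(abs(sum(w * mu_worst) - closed_form) < 1e-12)

# random mu in the box never yield a smaller objective
set.seed(3)
M <- matrix(runif(4000, -1, 1), ncol = 4)
vals <- sum(w * mu_hat) + as.vector(M %*% (w * delta))
stopifnot(all(vals >= closed_form - 1e-12))
```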
Let’s start implementing the robust solution for the box constraint:
portfolioMaxReturnRobustBox <- function(mu_hat, delta) {
  w <- Variable(length(mu_hat))
  prob <- Problem(Maximize(t(w) %*% mu_hat - t(abs(w)) %*% delta),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMRP <- portfolioMaxReturn(mu)
names(w_GMRP) <- colnames(X)
w_all_GMRP_robust_box <- cbind(w_GMRP)

# multiple robust solutions
kappa <- 0.1
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  delta <- kappa*sqrt(diag(Sigma_noisy))  # box half-widths from the noisy estimate
  w_GMRP_robust_box_noisy <- portfolioMaxReturnRobustBox(mu_noisy, delta)
  w_all_GMRP_robust_box <- cbind(w_all_GMRP_robust_box, w_GMRP_robust_box_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMRP_robust_box), col = rainbow8equal[1:7],
        legend = colnames(w_all_GMRP_robust_box), beside = TRUE,
        args.legend = list(bg = "white"),
        main = "Robust (box) global maximum return portfolio allocation",
        xlab = "stocks", ylab = "dollars")
Not much improvement with respect to the naive case. The sensitivity is still there.
Let’s try the robust solution for the elliptical constraint:
portfolioMaxReturnRobustEllipsoid <- function(mu_hat, S, kappa = 0.1) {
  S12 <- chol(S)  # t(S12) %*% S12 = S
  w <- Variable(length(mu_hat))
  prob <- Problem(Maximize(t(w) %*% mu_hat - kappa*norm2(S12 %*% w)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMRP <- portfolioMaxReturn(mu)
names(w_GMRP) <- colnames(X)
w_all_GMRP_robust_ellipsoid <- cbind(w_GMRP)

# multiple robust solutions
kappa <- 0.2
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  w_GMRP_robust_ellipsoid_noisy <- portfolioMaxReturnRobustEllipsoid(mu_noisy, Sigma_noisy, kappa)
  w_all_GMRP_robust_ellipsoid <- cbind(w_all_GMRP_robust_ellipsoid, w_GMRP_robust_ellipsoid_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMRP_robust_ellipsoid), col = rainbow8equal[1:7],
        legend = colnames(w_all_GMRP_robust_ellipsoid), beside = TRUE,
        args.legend = list(bg = "white"),
        main = "Robust (ellipsoid) global maximum return portfolio allocation",
        xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_GMRP_robust_ellipsoid), main = "Robust (ellipsoid) global maximum return portfolio allocation", ylab = "w", space = 0, border = NA)
The elliptical uncertainty set seems to work better for the maximum return portfolio. It definitely provides more diversity compared to the naive solution, as well as being more stable (less sensitive). However, it is still too sensitive.
Summarizing, for the robust global maximum return portfolio, the ellipsoid uncertainty set in \(\boldsymbol{\mu}\) is more stable (and diverse) than the box uncertainty set (which is totally unstable). Nevertheless, the allocation is still too sensitive, owing to the fact that the global maximum return portfolio is probably not a good formulation to start with, as it ignores the covariance matrix.
The global minimum variance portfolio (GMVP) ignores the expected return and focuses on the risk only: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]
It is a simple convex QP with solution \[\mathbf{w}_{\sf GMVP}=\frac{1}{\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{1}}\boldsymbol{\Sigma}^{-1}\mathbf{1}.\]
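The closed form is easy to sanity-check in base R: the weights sum to one, and any feasible perturbation (one that keeps \(\mathbf{1}^T\mathbf{w}=1\)) increases the variance. Illustrative numbers:

```r
set.seed(9)
N <- 5                                    # illustrative size
A <- matrix(rnorm(N^2), N, N)
Sigma_ <- crossprod(A) + diag(N)          # a positive-definite covariance
ones <- rep(1, N)

w_gmvp <- solve(Sigma_, ones) / sum(solve(Sigma_, ones))
stopifnot(abs(sum(w_gmvp) - 1) < 1e-10)   # budget constraint holds

v0 <- as.numeric(t(w_gmvp) %*% Sigma_ %*% w_gmvp)
for (i in 1:100) {
  d <- rnorm(N); d <- d - mean(d)         # perturbation with 1^T d = 0 keeps feasibility
  w_pert <- w_gmvp + 0.1*d
  stopifnot(as.numeric(t(w_pert) %*% Sigma_ %*% w_pert) >= v0 - 1e-12)
}
```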
It is widely used in academic papers for simplicity of evaluation and comparison of different estimators of the covariance matrix \(\boldsymbol{\Sigma}\) (while ignoring the estimation of \(\boldsymbol{\mu}\)).
In practice, \(\boldsymbol{\Sigma}\) is unknown and has to be estimated \(\hat{\boldsymbol{\Sigma}}\), e.g., with the sample covariance matrix.
Then, the naive portfolio becomes \[\mathbf{w}_{\sf GMVP}=\frac{1}{\mathbf{1}^{T}\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{1}}\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{1}.\]
Instead of assuming that \(\boldsymbol{\Sigma}\) is known perfectly, we now assume it belongs to some convex uncertainty set, denoted by \(\mathcal{U}_{\boldsymbol{\Sigma}}\).
The worst-case robust formulation is \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \underset{\boldsymbol{\Sigma}\in\mathcal{U}_{\boldsymbol{\Sigma}}}{\max}\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
In particular, we will assume that the estimation comes from the sample covariance matrix \(\hat{\boldsymbol{\Sigma}}=\frac{1}{T}\mathbf{X}^{T}\mathbf{X}\) where \(\mathbf{X}\) is a \(T\times N\) matrix containing the return data (assumed demeaned already).
However, we will assume that the data matrix is noisy \(\hat{\mathbf{X}}\) and the actual matrix can be written as \(\mathbf{X}=\hat{\mathbf{X}}+\boldsymbol{\Delta}\), where \(\boldsymbol{\Delta}\) is some error matrix bounded in its norm.
Thus, we will then model the data matrix as \[\mathcal{U}_{\mathbf{X}} =\left\{\mathbf{X}\mid\left\Vert\mathbf{X}-\hat{\mathbf{X}}\right\Vert_{F}\leq\delta_{\mathbf{X}}\right\}.\]
The worst-case robust formulation becomes: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \underset{\mathbf{X}\in\mathcal{U}_{\mathbf{X}}}{\max}\mathbf{w}^{T}\frac{1}{T}\mathbf{X}^{T}\mathbf{X}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
Let’s focus on the inner maximization: \[\underset{\mathbf{X}\in\mathcal{U}_{\mathbf{X}}}{\max}\;\mathbf{w}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{w}=\underset{\left\Vert \boldsymbol{\Delta}\right\Vert _{F}\leq\delta_{\mathbf{X}}}{\max}\left\Vert\left(\hat{\mathbf{X}}+\boldsymbol{\Delta}\right)\mathbf{w}\right\Vert_{2}^{2}\]
We first invoke the triangle inequality to get an upper bound: \[\left\Vert\left(\hat{\mathbf{X}}+\boldsymbol{\Delta}\right)\mathbf{w}\right\Vert_{2}\leq\left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\left\Vert \boldsymbol{\Delta}\mathbf{w}\right\Vert _{2}\] with equality achieved when the two vectors \(\hat{\mathbf{X}}\mathbf{w}\) and \(\boldsymbol{\Delta}\mathbf{w}\) are aligned.
Next, we invoke the norm inequality \[\left\Vert \boldsymbol{\Delta}\mathbf{w}\right\Vert _{2}\leq\left\Vert \boldsymbol{\Delta}\right\Vert _{F}\left\Vert \mathbf{w}\right\Vert _{2}\leq\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\] with equality achieved when \(\boldsymbol{\Delta}\) is rank-one with right singular vector aligned with \(\mathbf{w}\) and when \(\left\Vert \boldsymbol{\Delta}\right\Vert _{F}=\delta_{\mathbf{X}}\). (This follows easily from \(\mathbf{w}^{T}\mathbf{M}\mathbf{w}\leq\lambda_{\max}\left(\mathbf{M}\right)\left\Vert \mathbf{w}\right\Vert ^{2}\leq\mathsf{Tr}\left(\mathbf{M}\right)\left\Vert \mathbf{w}\right\Vert ^{2}\) for \(\mathbf{M}\succeq\mathbf{0}\).)
Finally, we can see that both upper bounds can be actually achieved if the error is properly chosen as \[\boldsymbol{\Delta}=\delta_{\mathbf{X}}\frac{\hat{\mathbf{X}}\mathbf{w}\mathbf{w}^{T}}{\left\Vert \mathbf{w}\right\Vert _{2}\left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}}.\]
Thus, \[\underset{\mathbf{X}\in\mathcal{U}_{\mathbf{X}}}{\max}\;\mathbf{w}^{T}\mathbf{X}^{T}\mathbf{X}\mathbf{w}=\left(\left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\right)^{2}.\]
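The achievability argument can be verified numerically in base R: with \(\boldsymbol{\Delta}\) chosen as above, its Frobenius norm equals \(\delta_{\mathbf{X}}\) and both inequalities hold with equality. Illustrative sizes below:

```r
set.seed(13)
Tn <- 50; N <- 4; delta_X <- 0.7          # illustrative sizes
X_hat <- matrix(rnorm(Tn * N), Tn, N)
w <- c(0.4, 0.1, 0.3, 0.2)

Xw <- as.vector(X_hat %*% w)
# the worst-case rank-one error matrix from the derivation
Delta <- delta_X * outer(Xw, w) / (sqrt(sum(w^2)) * sqrt(sum(Xw^2)))
stopifnot(abs(sqrt(sum(Delta^2)) - delta_X) < 1e-10)  # ||Delta||_F = delta_X

lhs <- sqrt(sum(((X_hat + Delta) %*% w)^2))           # achieved worst-case norm
rhs <- sqrt(sum(Xw^2)) + delta_X * sqrt(sum(w^2))     # upper bound
stopifnot(abs(lhs - rhs) < 1e-10)
```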
The robust problem formulation finally becomes: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] which is a (convex) SOCP.
Recall the vanilla problem formulation was the QP \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}^{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\]
Now, the robust problem formulation is the SOCP (from QP to SOCP) \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] which contains the regularization term \(\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}\).
One common heuristic, called Tikhonov regularization, is to consider instead \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \left\Vert\hat{\mathbf{X}}\mathbf{w}\right\Vert_{2}^{2}+\delta_{\mathbf{X}}\left\Vert \mathbf{w}\right\Vert _{2}^{2}=\mathbf{w}^{T}\left(\hat{\mathbf{X}}^{T}\hat{\mathbf{X}}+\delta_{\mathbf{X}}\mathbf{I}\right)\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] which is equivalent to the vanilla formulation but using the regularized sample covariance matrix \(\hat{\boldsymbol{\Sigma}}^{\mathsf{tik}}=\frac{1}{T}(\hat{\mathbf{X}}^{T}\hat{\mathbf{X}}+\delta_{\mathbf{X}}\mathbf{I})=\hat{\boldsymbol{\Sigma}}+\frac{\delta_{\mathbf{X}}}{T}\mathbf{I}\).
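A quick base-R sketch (illustrative numbers) of what the Tikhonov regularizer does to the sample covariance matrix: every eigenvalue is shifted up by \(\delta_{\mathbf{X}}/T\), which improves the conditioning:

```r
set.seed(17)
Tn <- 60; N <- 5; delta_X <- 0.1           # illustrative sizes
X_hat <- matrix(rnorm(Tn * N, sd = 0.02), Tn, N)  # demeaned "returns"
Sigma_hat <- t(X_hat) %*% X_hat / Tn
Sigma_tik <- (t(X_hat) %*% X_hat + delta_X * diag(N)) / Tn

# eigenvalues are uniformly shifted up by delta_X / T ...
stopifnot(max(abs(eigen(Sigma_tik)$values -
                  (eigen(Sigma_hat)$values + delta_X / Tn))) < 1e-10)
# ... which improves the condition number
stopifnot(kappa(Sigma_tik, exact = TRUE) < kappa(Sigma_hat, exact = TRUE))
```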
The robust formulation of the GMVP reads \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \underset{\boldsymbol{\Sigma}\in\mathcal{U}_{\boldsymbol{\Sigma}}}{\max} \mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
For a spherical uncertainty set on the data matrix \(\mathbf{X}\) (assuming we use the sample covariance matrix \(\hat{\boldsymbol{\Sigma}}=\frac{1}{T-1}\mathbf{X}^T\mathbf{X}\)) \[ \mathcal{U}_\mathbf{X}^s = \{\mathbf{X}\ | \ \|\mathbf{X}-\hat{\mathbf{X}}\|_F \le \delta\}, \] the robust formulation becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \|\hat{\mathbf{X}}\mathbf{w}\|_2 + \delta \|\mathbf{w}\|_2\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
For a box uncertainty set on the covariance matrix \[ \mathcal{U}_\boldsymbol{\Sigma}^b = \{\boldsymbol{\Sigma}\ | \ \underline{\boldsymbol{\Sigma}} \le \boldsymbol{\Sigma} \le \overline{\boldsymbol{\Sigma}},\ \boldsymbol{\Sigma} \succeq \mathbf{0}\}, \] where a choice for the bounds is \(\underline{\boldsymbol{\Sigma}}=\delta^{-1}\hat{\boldsymbol{\Sigma}}\) and \(\overline{\boldsymbol{\Sigma}}=\delta\hat{\boldsymbol{\Sigma}}\) with \(\delta>1\), the robust formulation becomes
\[ \begin{array}{ll} \underset{\mathbf{w},\underline{\boldsymbol{\Lambda}},\overline{\boldsymbol{\Lambda}}}{\textsf{minimize}} & \textsf{Tr}(\overline{\boldsymbol{\Lambda}}\ \overline{\boldsymbol{\Sigma}}) - \textsf{Tr}(\underline{\boldsymbol{\Lambda}}\ \underline{\boldsymbol{\Sigma}})\\ \textsf{subject to} & \left[\begin{array}{cc}\overline{\boldsymbol{\Lambda}}-\underline{\boldsymbol{\Lambda}} & \mathbf{w}\\ \mathbf{w}^{T} & 1\end{array} \right]\succeq\mathbf{0}\\ & \overline{\boldsymbol{\Lambda}}\ge\mathbf{0},\ \underline{\boldsymbol{\Lambda}}\ge\mathbf{0}\\ & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
Let’s start implementing the robust formulation for the box constraint on \(\boldsymbol{\Sigma}\):
# define function (check: https://cvxr.rbind.io/post/cvxr_functions/)
portfolioGMVPRobustBox <- function(Sigma_lb, Sigma_ub) {
  N <- nrow(Sigma_lb)
  w <- Variable(N)
  Lambda_ub <- Variable(N, N)
  Lambda_lb <- Variable(N, N)
  prob <- Problem(Minimize(matrix_trace(Lambda_ub %*% Sigma_ub) - matrix_trace(Lambda_lb %*% Sigma_lb)),
                  constraints = list(w >= 0, sum(w) == 1,
                                     Lambda_ub >= 0, Lambda_lb >= 0,
                                     bmat(list(list(Lambda_ub - Lambda_lb, w),
                                               list(t(w), 1))) == Semidef(N+1)))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMVP <- portfolioGMVP(Sigma)
names(w_GMVP) <- colnames(X)
w_all_GMVP_robust_box <- cbind(w_GMVP)

# multiple robust solutions
delta <- 3
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  Sigma_lb <- (1/delta)*Sigma_noisy
  #diag(Sigma_lb) <- diag(Sigma_noisy)
  Sigma_ub <- (1/delta)*Sigma_noisy
  diag(Sigma_ub) <- diag(Sigma_noisy)
  w_GMVP_robust_box_noisy <- portfolioGMVPRobustBox(Sigma_lb, Sigma_ub)
  w_all_GMVP_robust_box <- cbind(w_all_GMVP_robust_box, w_GMVP_robust_box_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMVP_robust_box), col = rainbow8equal[1:7], legend = colnames(w_all_GMVP_robust_box),
        beside = TRUE, args.legend = list(bg = "white", x = "topleft"),
        main = "Robust (box) GMVP allocation", xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_GMVP), main = "Naive GMVP allocation", ylab = "w", space = 0, border = NA)
chart.StackedBar(t(w_all_GMVP_robust_box), main = "Robust (box) GMVP allocation", ylab = "w", space = 0, border = NA)
The naive GMVP was already fairly stable. Nevertheless, the robust GMVP becomes even more stable (less sensitive to the estimation errors). Better results could be obtained by choosing more appropriate upper and lower bounds on \(\boldsymbol{\Sigma}\).
Let’s consider now the robust formulation for the spherical uncertainty region on the data matrix \(\mathbf{X}\):
# define function
portfolioGMVPRobustSphereX <- function(X_hat, delta) {
  N <- ncol(X_hat)
  w <- Variable(N)
  X_ <- scale(X_hat, center = TRUE, scale = FALSE)  # demean
  prob <- Problem(Minimize(norm2(X_ %*% w) + delta*norm2(w)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMVP <- portfolioGMVP(Sigma)
names(w_GMVP) <- colnames(X)
w_all_GMVP_robust_sphereX <- cbind(w_GMVP)

# multiple robust solutions
delta <- 0.1
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  w_GMVP_robust_sphereX_noisy <- portfolioGMVPRobustSphereX(X_noisy, delta)
  w_all_GMVP_robust_sphereX <- cbind(w_all_GMVP_robust_sphereX, w_GMVP_robust_sphereX_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMVP_robust_sphereX), col = rainbow8equal[1:7], legend = colnames(w_all_GMVP_robust_sphereX),
        beside = TRUE, args.legend = list(bg = "white", x = "topleft"),
        main = "Robust (sphere in X) GMVP allocation", xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_GMVP), main = "Naive GMVP allocation", ylab = "w", space = 0, border = NA)
chart.StackedBar(t(w_all_GMVP_robust_sphereX), main = "Robust (sphere in X) GMVP allocation", ylab = "w", space = 0, border = NA)
This robust GMVP with spherical uncertainty in \(\mathbf{X}\) seems quite stable, and the solution is not sensitive to the choice of \(\delta\), so it is very straightforward to use (as opposed to the previous case with box uncertainty in \(\boldsymbol{\Sigma}\), which requires a careful choice of the upper and lower bounds). Note that better results could be obtained with a better choice for the shape of the ellipsoid instead of just a sphere.
Finally, let’s consider the robust formulation for the elliptical uncertainty region on the covariance matrix \(\boldsymbol{\Sigma}\):
# define function
portfolioGMVPRobustEllipsoid <- function(Sigma_hat, S, delta) {
  N <- ncol(Sigma_hat)
  S12 <- chol(S)  # t(S12) %*% S12 = S
  w <- Variable(N)
  X <- Variable(N, N)
  Z <- Variable(N, N)
  prob <- Problem(Minimize(matrix_trace(Sigma_hat %*% (X + Z)) + delta*norm2(S12 %*% (vec(X) + vec(Z)))),
                  constraints = list(w >= 0, sum(w) == 1,
                                     bmat(list(list(X, w), list(t(w), 1))) == Semidef(N+1),
                                     Z == Semidef(N)))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_GMVP <- portfolioGMVP(Sigma)
names(w_GMVP) <- colnames(X)
w_all_GMVP_robust_ellipsoid <- cbind(w_GMVP)

# multiple robust solutions
delta <- 0.0005
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  Sigma_noisy <- cov(X_noisy)
  w_GMVP_robust_ellipsoid_noisy <- portfolioGMVPRobustEllipsoid(Sigma_noisy, diag(N^2), delta)
  w_all_GMVP_robust_ellipsoid <- cbind(w_all_GMVP_robust_ellipsoid, w_GMVP_robust_ellipsoid_noisy)
}

# plot to compare the allocations
barplot(t(w_all_GMVP_robust_ellipsoid), col = rainbow8equal[1:7], legend = colnames(w_all_GMVP_robust_ellipsoid),
        beside = TRUE, args.legend = list(bg = "white", x = "topleft"),
        main = "Robust (ellipsoid) GMVP allocation", xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_GMVP), main = "Naive GMVP allocation", ylab = "w", space = 0, border = NA)
chart.StackedBar(t(w_all_GMVP_robust_ellipsoid), main = "Robust (ellipsoid) GMVP allocation", ylab = "w", space = 0, border = NA)
This robust GMVP with spherical uncertainty in \(\boldsymbol{\Sigma}\) is also stable but seems to be a bit less accurate than the previous robust GMVP with spherical uncertainty in \(\mathbf{X}\) (see, e.g., the allocation to stock AMD).
Summarizing, for the robust GMVP it seems that the spherical uncertainty in \(\mathbf{X}\) is the best uncertainty set in terms of stability (least sensitivity among different noisy realizations) and accuracy compared to the clairvoyant allocation.
Instead of assuming \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\) are known perfectly, now we assume they belong to some convex uncertainty sets, denoted as \(\mathcal{U}_{\boldsymbol{\mu}}\) and \(\mathcal{U}_{\boldsymbol{\Sigma}}\), respectively.
The worst-case formulation will consider the worst-case points within those uncertainty sets.
A conservative and practical investment approach is to optimize the worst-case objective over the uncertainty sets (Cornuejols and Tütüncü 2006), (Fabozzi 2007): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}}{\min}\mathbf{w}^{T}\boldsymbol{\mu}-\lambda\underset{\boldsymbol{\Sigma}\in\mathcal{U}_{\boldsymbol{\Sigma}}}{\max}\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{w}^{T}\mathbf{1}=1,\quad\mathbf{w}\in\mathcal{W}. \end{array}\]
The two key issues are the choice of the uncertainty sets \(\mathcal{U}_{\boldsymbol{\mu}}\) and \(\mathcal{U}_{\boldsymbol{\Sigma}}\), and the resolution of the resulting inner minimization and maximization problems.
Box uncertainty set: \[\mathcal{U}_{\boldsymbol{\mu}}^{b} =\left\{\boldsymbol{\mu}\mid-\boldsymbol{\delta}\leq\boldsymbol{\mu}-\hat{\boldsymbol{\mu}}\leq\boldsymbol{\delta}\right\},\] where the predefined parameters \(\hat{\boldsymbol{\mu}}\) and \(\boldsymbol{\delta}\) denote the location and size of the box uncertainty set, respectively.
Easily, the worst-case mean is \[\underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}^{b}}{\min}\mathbf{w}^{T}\boldsymbol{\mu} =\mathbf{w}^{T}\hat{\boldsymbol{\mu}}+\underset{-\boldsymbol{\delta}\leq\boldsymbol{\gamma}\leq\boldsymbol{\delta}}{\min}\mathbf{w}^{T}\boldsymbol{\gamma}=\mathbf{w}^{T}\hat{\boldsymbol{\mu}}-|\mathbf{w}|^{T}\boldsymbol{\delta},\] where \(|\mathbf{w}|\) denotes elementwise absolute value of \(\mathbf{w}\).
Note that it is a concave function of \(\mathbf{w}\) (as it should be since it is the minimum of linear functions).
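This closed form is easy to verify numerically (a base-R sketch with arbitrary synthetic \(\hat{\boldsymbol{\mu}}\), \(\boldsymbol{\delta}\), and \(\mathbf{w}\)):

```r
set.seed(11)
N_assets <- 4
w <- rnorm(N_assets)
mu_hat <- rnorm(N_assets)
delta_box <- runif(N_assets, 0.01, 0.1)

# closed-form worst-case mean: w' mu_hat - |w|' delta
wc <- sum(w * mu_hat) - sum(abs(w) * delta_box)

# random feasible means mu = mu_hat + gamma with |gamma_i| <= delta_i
samp <- replicate(5000, sum(w * (mu_hat + runif(N_assets, -delta_box, delta_box))))

# the worst case is attained at gamma_i = -delta_i * sign(w_i)
attained <- sum(w * (mu_hat - delta_box * sign(w)))

min(samp) >= wc          # TRUE: no feasible mean goes below the bound
all.equal(attained, wc)  # TRUE: the bound is tight
```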
Elliptical uncertainty set: \[\mathcal{U}_{\boldsymbol{\mu}}^{e} =\left\{\boldsymbol{\mu}\mid(\boldsymbol{\mu}-\hat{\boldsymbol{\mu}})^{T}\mathbf{S}_{\boldsymbol{\mu}}^{-1}(\boldsymbol{\mu}-\hat{\boldsymbol{\mu}})\leq\delta_{\boldsymbol{\mu}}^{2}\right\},\] where the predefined parameters \(\hat{\boldsymbol{\mu}}\), \(\delta_{\boldsymbol{\mu}}>0\), and \(\mathbf{S}_{\boldsymbol{\mu}}\succ\mathbf{0}\) denote the location, size, and the shape of the uncertainty set, respectively.
The worst-case mean is \[\begin{aligned} \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}^{e}}{\min}\mathbf{w}^{T}\boldsymbol{\mu} &= \underset{\left\Vert \mathbf{S}_{\boldsymbol{\mu}}^{-1/2}\boldsymbol{\gamma}\right\Vert _{2}\leq\delta_{\boldsymbol{\mu}}}{\min}\mathbf{w}^{T}(\hat{\boldsymbol{\mu}}+\boldsymbol{\gamma})=\mathbf{w}^{T}\hat{\boldsymbol{\mu}}+\underset{\left\Vert \mathbf{S}_{\boldsymbol{\mu}}^{-1/2}\boldsymbol{\gamma}\right\Vert _{2}\leq\delta_{\boldsymbol{\mu}}}{\min}\mathbf{w}^{T}\boldsymbol{\gamma}\\ &=\mathbf{w}^{T}\hat{\boldsymbol{\mu}}+\underset{\left\Vert \tilde{\boldsymbol{\gamma}}\right\Vert _{2}\leq\delta_{\boldsymbol{\mu}}}{\min}\mathbf{w}^{T}\mathbf{S}_{\boldsymbol{\mu}}^{1/2}\tilde{\boldsymbol{\gamma}}=\mathbf{w}^{T}\hat{\boldsymbol{\mu}}-\delta_{\boldsymbol{\mu}}\left\Vert \mathbf{S}_{\boldsymbol{\mu}}^{1/2}\mathbf{w}\right\Vert _{2}. \end{aligned}\]
Note that it is a concave function of \(\mathbf{w}\) (as it should be since it is the minimum of linear functions).
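The same kind of numerical check works for the elliptical case, using a symmetric square root \(\mathbf{S}_{\boldsymbol{\mu}}^{1/2}\) (a sketch with arbitrary synthetic data):

```r
set.seed(42)
N_assets <- 4
A <- matrix(rnorm(N_assets * N_assets), N_assets, N_assets)
S_mu <- crossprod(A) + diag(N_assets)  # shape matrix S > 0
e <- eigen(S_mu, symmetric = TRUE)
S12 <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)  # symmetric S^{1/2}
w <- rnorm(N_assets)
delta_mu <- 0.7

# closed-form worst case of w' gamma over the ellipsoid: -delta ||S^{1/2} w||
wc <- -delta_mu * sqrt(sum((S12 %*% w)^2))

# feasible gamma = S^{1/2} u with ||u|| <= delta
samp <- replicate(5000, {
  u <- rnorm(N_assets); u <- delta_mu * u / sqrt(sum(u^2))
  sum(w * (S12 %*% u))
})

# minimizer: u* = -delta * S^{1/2} w / ||S^{1/2} w||
u_star <- -delta_mu * (S12 %*% w) / sqrt(sum((S12 %*% w)^2))
attained <- sum(w * (S12 %*% u_star))

min(samp) >= wc          # TRUE: no feasible gamma goes below the bound
all.equal(attained, wc)  # TRUE: the bound is tight
```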
The equivalent dual problem is (Lobo and Boyd 2000) \[\begin{array}{ll} \underset{\overline{\boldsymbol{\Lambda}},\,\underline{\boldsymbol{\Lambda}}}{\textsf{minimize}} & \textsf{Tr}(\overline{\boldsymbol{\Lambda}}\,\overline{\boldsymbol{\Sigma}})-\textsf{Tr}(\underline{\boldsymbol{\Lambda}}\,\underline{\boldsymbol{\Sigma}})\\ \textsf{subject to} & \begin{bmatrix}\overline{\boldsymbol{\Lambda}}-\underline{\boldsymbol{\Lambda}} & \mathbf{w}\\ \mathbf{w}^{T} & 1 \end{bmatrix}\succeq\mathbf{0},\\ & \overline{\boldsymbol{\Lambda}}\geq\mathbf{0},\quad\underline{\boldsymbol{\Lambda}}\geq\mathbf{0}, \end{array}\] which is a convex SDP.
The constraints are jointly convex in the inner dual variables \(\overline{\boldsymbol{\Lambda}}\) and \(\underline{\boldsymbol{\Lambda}}\) and the outer variable \(\mathbf{w}\).
Since the problem is convex, strong duality holds (zero duality gap).
Thus, the maximum objective value equals the minimum objective value of its dual problem: \[\begin{array}{ll} \underset{\mathbf{Z}}{\textsf{minimize}} & \mathsf{Tr}\left(\hat{\boldsymbol{\Sigma}}\left(\mathbf{w}\mathbf{w}^{T}+\mathbf{Z}\right)\right)+\delta_{\boldsymbol{\Sigma}}\left\Vert \mathbf{S}_{\boldsymbol{\Sigma}}^{1/2}\left(\mathsf{vec}(\mathbf{w}\mathbf{w}^{T})+\mathsf{vec}(\mathbf{Z})\right)\right\Vert _{2}\\ \textsf{subject to} & \mathbf{Z}\succeq\mathbf{0}. \end{array}\]
We can now plug in this inner problem in the original problem: \[\begin{array}{ll} \underset{\mathbf{w},\mathbf{Z}}{\textsf{maximize}} & \begin{array}{r} \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}}{\min}\mathbf{w}^{T}\boldsymbol{\mu}-\lambda\left(\mathsf{Tr}\left(\hat{\boldsymbol{\Sigma}}\left(\mathbf{w}\mathbf{w}^{T}+\mathbf{Z}\right)\right)\right.\hspace{2cm}\\ \left.+\delta_{\boldsymbol{\Sigma}}\left\Vert \mathbf{S}_{\boldsymbol{\Sigma}}^{1/2}\left(\mathsf{vec}(\mathbf{w}\mathbf{w}^{T})+\mathsf{vec}(\mathbf{Z})\right)\right\Vert _{2}\right) \end{array}\\ \textsf{subject to} & \mathbf{w}^{T}\mathbf{1}=1,\quad\mathbf{w}\in\mathcal{W}\\ & \mathbf{Z}\succeq\mathbf{0}. \end{array}\]
However, now this problem contains a complicated term with the composition of \(\mathsf{vec}(\mathbf{w}\mathbf{w}^{T})\) and the norm \(\left\Vert \cdot\right\Vert _{2}\).
We can further include a new variable as \(\mathbf{X}=\mathbf{w}\mathbf{w}^{T}\) so that the objective function becomes nicer.
But this constraint is not convex!
Luckily we can instead use \(\mathbf{X}\succeq\mathbf{w}\mathbf{w}^{T}\) (because we can easily show that at the optimal point it will be achieved with equality), which can be further expressed as \(\left[\begin{array}{cc} \mathbf{X} & \mathbf{w}\\ \mathbf{w}^{T} & 1 \end{array}\right]\succeq\mathbf{0}\).
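The Schur-complement characterization can be checked numerically: the block matrix is positive semidefinite precisely when \(\mathbf{X}\succeq\mathbf{w}\mathbf{w}^T\) (a small sketch with arbitrary data):

```r
set.seed(9)
N_a <- 3
w <- rnorm(N_a)

min_eig <- function(M) min(eigen(M, symmetric = TRUE)$values)

# X strictly dominates w w^T  =>  the block matrix is PSD
X_ok <- tcrossprod(w) + 0.1 * diag(N_a)
M_ok <- rbind(cbind(X_ok, w), c(w, 1))
min_eig(M_ok) >= -1e-9  # TRUE

# X fails to dominate w w^T  =>  the block matrix is indefinite
X_bad <- tcrossprod(w) - 0.1 * diag(N_a)
M_bad <- rbind(cbind(X_bad, w), c(w, 1))
min_eig(M_bad) < 0      # TRUE
```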
The final problem is \[\begin{array}{ll} \underset{\mathbf{w},\mathbf{X},\mathbf{Z}}{\textsf{maximize}} & \begin{array}{r} \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}}{\min}\mathbf{w}^{T}\boldsymbol{\mu}-\lambda\left(\mathsf{Tr}\left(\hat{\boldsymbol{\Sigma}}\left(\mathbf{X}+\mathbf{Z}\right)\right)\right.\hspace{2cm}\\ \left.+\delta_{\boldsymbol{\Sigma}}\left\Vert \mathbf{S}_{\boldsymbol{\Sigma}}^{1/2}\left(\mathsf{vec}(\mathbf{X})+\mathsf{vec}(\mathbf{Z})\right)\right\Vert _{2}\right) \end{array}\\ \textsf{subject to} & \mathbf{w}^{T}\mathbf{1}=1,\quad\mathbf{w}\in\mathcal{W}\\ &\left[\begin{array}{cc} \mathbf{X} & \mathbf{w}\\ \mathbf{w}^{T} & 1 \end{array}\right]\succeq\mathbf{0}\\ & \mathbf{Z}\succeq\mathbf{0}. \end{array}\]
Consider the two uncertainty sets: \[\begin{aligned} \mathcal{U}_{\boldsymbol{\mu}}^{b} &= \left\{\boldsymbol{\mu}\mid-\boldsymbol{\delta}\leq\boldsymbol{\mu}-\hat{\boldsymbol{\mu}}\leq\boldsymbol{\delta}\right\},\\ \mathcal{U}_{\boldsymbol{\Sigma}}^{b} &= \left\{\boldsymbol{\Sigma}\mid\underline{\boldsymbol{\Sigma}}\leq\boldsymbol{\Sigma}\leq\overline{\boldsymbol{\Sigma}},\boldsymbol{\Sigma}\succeq\mathbf{0}\right\}. \end{aligned}\]
The worst-case robust formulation \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}}{\min}\mathbf{w}^{T}\boldsymbol{\mu}-\lambda\underset{\boldsymbol{\Sigma}\in\mathcal{U}_{\boldsymbol{\Sigma}}}{\max}\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{w}^{T}\mathbf{1}=1,\quad\mathbf{w}\in\mathcal{W}. \end{array}\] becomes \[\begin{array}[t]{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \begin{array}[t]{r} \mathbf{w}^{T}\hat{\boldsymbol{\mu}}-|\mathbf{w}|^{T}\boldsymbol{\delta}-\lambda \begin{array}[t]{l} \thinspace\underset{\overline{\boldsymbol{\Lambda}},\underline{\boldsymbol{\Lambda}}}{\min}\left\{ \mathsf{Tr}(\overline{\boldsymbol{\Lambda}}\,\overline{\boldsymbol{\Sigma}})-\mathsf{Tr}(\underline{\boldsymbol{\Lambda}}\,\underline{\boldsymbol{\Sigma}})\right\}\\ \textsf{subject to}\quad \begin{array}[t]{l} \begin{bmatrix}\overline{\boldsymbol{\Lambda}}-\underline{\boldsymbol{\Lambda}} & \mathbf{w}\\ \mathbf{w}^{T} & 1 \end{bmatrix}\succeq\mathbf{0},\\ \overline{\boldsymbol{\Lambda}}\geq\mathbf{0},\quad\underline{\boldsymbol{\Lambda}}\geq\mathbf{0}, \end{array} \end{array} \end{array}\\ \textsf{subject to} & \mathbf{w}^{T}\mathbf{1}=1,\quad\mathbf{w}\in\mathcal{W}. \end{array}\]
📘 F. J. Fabozzi. Robust Portfolio Optimization and Management. Wiley, 2007.
📄 D. Goldfarb and G. Iyengar, “Robust portfolio selection problems”, Mathematics of Operations Research, vol. 28, no. 1, pp. 1–38, 2003.
📄 R. H. Tütüncü and M. Koenig, “Robust asset allocation”, Annals of Operations Research, vol. 132, no. 1, pp. 157–187, 2004.
📄 L. El Ghaoui, M. Oks, and F. Oustry, “Worst-case value-at-risk and robust portfolio optimization: A conic programming approach”, Operations Research, vol. 51, no. 4, pp. 543–556, 2003.
📄 Z. Lu, “Robust portfolio selection based on a joint ellipsoidal uncertainty set”, Optimization Methods & Software, vol. 26, no. 1, pp. 89–104, 2011.
The naive portfolio may achieve a better mean-variance objective than the robust one, but occasionally it can be pretty bad; the robust portfolio is more stable.
In terms of Sharpe ratio, the robust portfolio is clearly superior to the naive one (note that the mean-variance portfolio is not the same as the maximum Sharpe ratio portfolio).
The robust formulation of the mean-variance Markowitz portfolio reads \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \underset{\boldsymbol{\mu}\in\mathcal{U}_{\boldsymbol{\mu}}}{\min} \mathbf{w}^T\boldsymbol{\mu} - \lambda\underset{\boldsymbol{\Sigma}\in\mathcal{U}_{\boldsymbol{\Sigma}}}{\max} \mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \] There are many choices for the uncertainty sets \(\mathcal{U}_{\boldsymbol{\mu}}\) and \(\mathcal{U}_{\boldsymbol{\Sigma}}\) leading to many combinations for the robust Markowitz portfolio. In light of our previous experiments with the maximum-return portfolio and GMVP, we will simply consider one choice given by an elliptical uncertainty set for the mean \(\boldsymbol{\mu}\) and a spherical uncertainty set for the data matrix \(\mathbf{X}\): \[ \mathcal{U}_\boldsymbol{\mu}^e = \{\boldsymbol{\mu}=\hat{\boldsymbol{\mu}}+\kappa\mathbf{S}^{1/2}\mathbf{u}\ | \ \left\Vert \mathbf{u}\right\Vert_{2}\leq 1\}, \] with \(\mathbf{S}=\hat{\mathbf{\Sigma}}\), and \[ \mathcal{U}_\mathbf{X}^s = \{\mathbf{X}\ | \ \|\mathbf{X}-\hat{\mathbf{X}}\|_F \le \delta\}. \]
The robust formulation with those uncertainty sets becomes \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^T\hat{\boldsymbol{\mu}} - \kappa\|\hat{\mathbf{\Sigma}}^{1/2}\mathbf{w}\|_{2} -\frac{\lambda}{T-1}\left(\|\hat{\mathbf{X}}\mathbf{w}\|_2 + \delta \|\mathbf{w}\|_2\right)^2\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \] It can be further simplified by noting that \(\hat{\mathbf{\Sigma}} = \frac{1}{T-1}\hat{\mathbf{X}}^T\hat{\mathbf{X}}\) implies \(\|\hat{\mathbf{\Sigma}}^{1/2}\mathbf{w}\|_2 = \frac{1}{\sqrt{T-1}}\|\hat{\mathbf{X}}\mathbf{w}\|_2\): \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^T\hat{\boldsymbol{\mu}} - \kappa\|\hat{\mathbf{\Sigma}}^{1/2}\mathbf{w}\|_{2} - \lambda\left(\|\hat{\mathbf{\Sigma}}^{1/2}\mathbf{w}\|_2 + \frac{\delta}{\sqrt{T-1}} \|\mathbf{w}\|_2\right)^2\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]
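The norm identity used in this simplification can be verified directly on synthetic data (with the data matrix demeaned, so that cov() and the Gram matrix agree):

```r
set.seed(13)
T_obs <- 40; N_assets <- 3
X_syn <- matrix(rnorm(T_obs * N_assets), T_obs, N_assets)  # synthetic returns
X_demeaned <- scale(X_syn, center = TRUE, scale = FALSE)
w <- rnorm(N_assets)

lhs <- sqrt(drop(t(w) %*% cov(X_syn) %*% w))             # ||Sigma_hat^{1/2} w||
rhs <- sqrt(sum((X_demeaned %*% w)^2)) / sqrt(T_obs - 1)  # ||X_hat w|| / sqrt(T-1)

all.equal(lhs, rhs)  # TRUE
```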
Let’s implement this robust portfolio optimization:
portfolioMarkowitzRobust <- function(mu_hat, Sigma_hat, kappa, delta_, lmd = 0.5) {
  N <- length(mu_hat)
  S12 <- chol(Sigma_hat)  # t(S12) %*% S12 = Sigma_hat
  w <- Variable(N)
  prob <- Problem(Maximize(t(w) %*% mu_hat - kappa*norm2(S12 %*% w)
                           - lmd*(norm2(S12 %*% w) + delta_*norm2(w))^2),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# clairvoyant solution
w_Markowitz <- portfolioMarkowitz(mu, Sigma)
names(w_Markowitz) <- colnames(X)
w_all_Markowitz_robust <- cbind(w_Markowitz)

# multiple robust solutions
kappa <- 1.0
delta <- 0.1*sqrt(T)
set.seed(357)
for (i in 1:6) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  w_Markowitz_robust_noisy <- portfolioMarkowitzRobust(mu_noisy, Sigma_noisy, kappa, delta/sqrt(T-1))
  w_all_Markowitz_robust <- cbind(w_all_Markowitz_robust, w_Markowitz_robust_noisy)
}

# plot to compare the allocations
barplot(t(w_all_Markowitz_robust), col = rainbow8equal[1:7], legend = colnames(w_all_Markowitz_robust),
        beside = TRUE, args.legend = list(bg = "white", x = "topleft"),
        main = "Robust Markowitz portfolio allocation", xlab = "stocks", ylab = "dollars")
chart.StackedBar(t(w_all_Markowitz), main = "Naive Markowitz allocation", ylab = "w", space = 0, border = NA)
chart.StackedBar(t(w_all_Markowitz_robust), main = "Robust Markowitz portfolio allocation", ylab = "w", space = 0, border = NA)
We can see that the robust mean-variance Markowitz portfolio is more stable. By playing with the parameters \(\kappa\) and \(\delta\) one can obtain very different results (the naive formulation is recovered for \(\kappa=0\) and \(\delta=0\)). For large \(\delta\), the allocation tends to the uniform \(1/N\) allocation.
Now that we know how to obtain less sensitive (more stable) allocations, we can study what the performance is. We will consider the mean-variance Markowitz as an example; in particular, we will compute the naive and robust Markowitz portfolios for many random realizations of the estimated parameters \(\hat{\boldsymbol{\mu}}\) and \(\hat{\boldsymbol{\Sigma}}\), and then assess them in terms of the real parameters \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\).
# clairvoyant solutions
w_Markowitz <- portfolioMarkowitz(mu, Sigma)
names(w_Markowitz) <- colnames(X)

portfolioMaxSharpeRatio <- function(mu, Sigma) {
  w_ <- Variable(nrow(Sigma))
  prob <- Problem(Minimize(quad_form(w_, Sigma)),
                  constraints = list(w_ >= 0, t(mu) %*% w_ == 1))
  result <- solve(prob)
  w <- as.vector(result$getValue(w_)/sum(result$getValue(w_)))
  names(w) <- colnames(Sigma)
  return(w)
}
w_MaxSR <- portfolioMaxSharpeRatio(mu, Sigma)
names(w_MaxSR) <- colnames(X)

# multiple naive and robust solutions
kappa <- 0.25          # smaller gives some bad outliers
delta <- 0.01*sqrt(T)  # larger gives a much more stable performance
w_all_Markowitz_naive <- NULL
w_all_Markowitz_robust <- NULL
set.seed(357)
for (i in 1:300) {
  X_noisy <- rmvnorm(n = T, mean = mu, sigma = Sigma)
  mu_noisy <- colMeans(X_noisy)
  Sigma_noisy <- cov(X_noisy)
  w_Markowitz_noisy <- portfolioMarkowitz(mu_noisy, Sigma_noisy)
  w_Markowitz_robust_noisy <- portfolioMarkowitzRobust(mu_noisy, Sigma_noisy, kappa, delta/sqrt(T-1))
  w_all_Markowitz_naive <- cbind(w_all_Markowitz_naive, w_Markowitz_noisy)
  w_all_Markowitz_robust <- cbind(w_all_Markowitz_robust, w_Markowitz_robust_noisy)
}

# performance
mean_variance <- function(w, mu, Sigma) return(t(w) %*% mu - 0.5 * t(w) %*% Sigma %*% w)
Sharpe_ratio <- function(w, mu, Sigma) return(t(w) %*% mu / sqrt(t(w) %*% Sigma %*% w))
mean_variance_clairvoyant <- mean_variance(w_Markowitz, mu, Sigma)
mean_variance_naive <- apply(w_all_Markowitz_naive, MARGIN = 2, FUN = mean_variance, mu, Sigma)
mean_variance_robust <- apply(w_all_Markowitz_robust, MARGIN = 2, FUN = mean_variance, mu, Sigma)
SR_clairvoyant <- Sharpe_ratio(w_MaxSR, mu, Sigma)
SR_naive <- apply(w_all_Markowitz_naive, MARGIN = 2, FUN = Sharpe_ratio, mu, Sigma)
SR_robust <- apply(w_all_Markowitz_robust, MARGIN = 2, FUN = Sharpe_ratio, mu, Sigma)
Let’s plot the mean-variance achieved, i.e., \(\boldsymbol{\mu}^T\mathbf{w} -0.5\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\):
# plots
library(rbokeh)  # devtools::install_github('bokeh/rbokeh')
figure(width = 800, title = "Performance Markowitz", xlab = "realization", ylab = "mean-variance",
       legend_location = "bottom_right") %>%
  ly_points(mean_variance_naive, color = "blue", legend = "naive") %>%
  ly_points(mean_variance_robust, color = "red", legend = "robust") %>%
  ly_abline(h = mean_variance_clairvoyant, legend = "clairvoyant")
We can see that the robust Markowitz portfolio is more stable and does not suffer from the really bad outliers.
Let’s plot now the Sharpe ratio achieved, i.e., \(\boldsymbol{\mu}^T\mathbf{w}/\sqrt{\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}}\):
figure(width = 800, title = "Performance Markowitz", xlab = "realization", ylab = "Sharpe-ratio",
       legend_location = "bottom_right") %>%
  ly_points(SR_naive, color = "blue", legend = "naive") %>%
  ly_points(SR_robust, color = "red", legend = "robust") %>%
  ly_abline(h = SR_clairvoyant, legend = "clairvoyant")
Observe that the clairvoyant mean-variance solution does not achieve the maximum Sharpe ratio because it was not designed for that (hence the benchmark line is computed from the maximum Sharpe ratio portfolio instead). Clearly, the robust Markowitz portfolios show a much higher Sharpe ratio than the naive ones. This final plot illustrates well the power of robust optimization applied to portfolio design.
We will consider one particular example based on modeling the returns via a factor model: \[\mathbf{r}=\boldsymbol{\mu}+\mathbf{V}^{T}\mathbf{f}+\boldsymbol{\epsilon}\] where \(\mathbf{f}\) denotes the random factors distributed according to \(\mathbf{f}\sim\mathcal{N}\left(\mathbf{0},\mathbf{F}\right)\) and \(\boldsymbol{\epsilon}\) denotes a random residual error with uncorrelated elements distributed according to \(\boldsymbol{\epsilon}\sim\mathcal{N}\left(\mathbf{0},\mathbf{D}\right)\) with \(\mathbf{D}=\mathsf{diag}\left(\mathbf{d}\right)\).
The returns are then distributed according to \(\mathbf{r}\sim\mathcal{N}\left(\boldsymbol{\mu},\mathbf{V}^{T}\mathbf{F}\mathbf{V}+\mathbf{D}\right)\) and the obtained return using portfolio \(\mathbf{w}\) has mean \(\boldsymbol{\mu}^{T}\mathbf{w}\) and variance \(\mathbf{w}^{T}\left(\mathbf{V}^{T}\mathbf{F}\mathbf{V}+\mathbf{D}\right)\mathbf{w}\).
We will consider uncertainty in the knowledge of \(\boldsymbol{\mu}\), \(\mathbf{V}\), and \(\mathbf{D}\), while \(\mathbf{F}\) is assumed known; in fact, we consider \(\mathbf{F}=\mathbf{I}\).
We can then formulate the robust mean-variance problem as \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \underset{\mathbf{V}\in\mathcal{S}_{\mathbf{V}},\mathbf{D}\in\mathcal{S}_{\mathbf{D}}}{\textsf{max}} \mathbf{w}^{T}\left(\mathbf{V}^{T}\mathbf{V}+\mathbf{D}\right)\mathbf{w}\\ \textsf{subject to} & \underset{\boldsymbol{\mu}\in\mathcal{S}_{\boldsymbol{\mu}}}{\textsf{min}}\mathbf{w}^{T}\boldsymbol{\mu}\geq\beta\\ & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
We will define the uncertainty as follows: \[\begin{aligned} \mathcal{S}_{\boldsymbol{\mu}} &= \left\{ \boldsymbol{\mu}\mid\boldsymbol{\mu}=\boldsymbol{\mu}_{0}+\boldsymbol{\delta},\,\left|\delta_{i}\right|\leq\gamma_{i},\,i=1,\ldots,M\right\}\\ \mathcal{S}_{\mathbf{V}} &= \left\{ \mathbf{V}\mid\mathbf{V}=\mathbf{V}_{0}+\boldsymbol{\Delta},\,\left\Vert \boldsymbol{\Delta}\right\Vert _{F}\leq\rho\right\}\\ \mathcal{S}_{\mathbf{D}} &= \left\{ \mathbf{D}\mid\mathbf{D}=\mathsf{diag}\left(\mathbf{d}\right),\,d_{i}\in\left[\underline{d}_{i},\overline{d}_{i}\right],\,i=1,\ldots,M\right\} . \end{aligned}\]
Let’s now elaborate on each of the inner optimizations…
First, consider the worst case mean return: \[\underset{\boldsymbol{\mu}\in\mathcal{S}_{\boldsymbol{\mu}}}{\textsf{min}}\;\boldsymbol{\mu}^{T}\mathbf{w}=\boldsymbol{\mu}_{0}^{T}\mathbf{w}+\underset{\left|\delta_{i}\right|\leq\gamma_{i}}{\textsf{min}}\boldsymbol{\delta}^{T}\mathbf{w}=\boldsymbol{\mu}_{0}^{T}\mathbf{w}-\boldsymbol{\gamma}^{T}\left|\mathbf{w}\right|\] which is a concave function.
Second, let’s turn to the second term of the worst-case variance: \[\underset{\mathbf{D}\in\mathcal{S}_{\mathbf{D}}}{\textsf{max}} \;\mathbf{w}^{T}\mathbf{D}\mathbf{w}=\underset{d_{i}\in\left[\underline{d}_{i},\overline{d}_{i}\right]}{\textsf{max}} \sum_{i=1}^{M}d_{i}w_{i}^{2}=\sum_{i=1}^{M}\overline{d}_{i}w_{i}^{2}=\mathbf{w}^{T}\overline{\mathbf{D}}\mathbf{w}.\]
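This is immediate since each \(d_i\) multiplies the nonnegative quantity \(w_i^2\); a quick numerical check (with arbitrary synthetic bounds) confirms that the upper bounds are always active:

```r
set.seed(3)
M_assets <- 5
w <- rnorm(M_assets)
d_lb <- runif(M_assets, 0.01, 0.05)
d_ub <- d_lb + runif(M_assets, 0.00, 0.05)

wc <- sum(d_ub * w^2)  # closed form: every upper bound is active

# random feasible diagonals d_i in [d_lb_i, d_ub_i]
samp <- replicate(5000, {
  d <- runif(M_assets, d_lb, d_ub)
  sum(d * w^2)
})
max(samp) <= wc  # TRUE: no feasible D exceeds the worst case
```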
Finally, let’s focus on the first term of the worst-case variance: \[\underset{\mathbf{V}\in\mathcal{S}_{\mathbf{V}}}{\textsf{max}}\; \mathbf{w}^{T}\mathbf{V}^{T}\mathbf{V}\mathbf{w}\equiv\underset{\left\Vert \boldsymbol{\Delta}\right\Vert _{F}\leq\rho}{\textsf{max}} \left\Vert \mathbf{V}_{0}\mathbf{w}+\boldsymbol{\Delta}\mathbf{w}\right\Vert =\left\Vert \mathbf{V}_{0}\mathbf{w}\right\Vert +\rho\left\Vert \mathbf{w}\right\Vert.\]
Proof: From the triangle inequality we have \[\begin{aligned} \left\Vert \mathbf{V}_{0}\mathbf{w}+\boldsymbol{\Delta}\mathbf{w}\right\Vert &\leq \left\Vert \mathbf{V}_{0}\mathbf{w}\right\Vert +\left\Vert \boldsymbol{\Delta}\mathbf{w}\right\Vert\\ &= \left\Vert \mathbf{V}_{0}\mathbf{w}\right\Vert +\sqrt{\mathbf{w}^{T}\boldsymbol{\Delta}^{T}\boldsymbol{\Delta}\mathbf{w}}\\ &\leq \left\Vert \mathbf{V}_{0}\mathbf{w}\right\Vert +\left\Vert \mathbf{w}\right\Vert \left\Vert \boldsymbol{\Delta}\right\Vert _{F}\\ &\leq \left\Vert \mathbf{V}_{0}\mathbf{w}\right\Vert +\left\Vert \mathbf{w}\right\Vert \rho, \end{aligned}\] but this upper bound is achieved by the worst-case variable \[\boldsymbol{\Delta}=\mathbf{u}\frac{\mathbf{w}^{T}}{\left\Vert \mathbf{w}\right\Vert }\rho,\] where \[\mathbf{u}=\left\{ \begin{array}{lll} \frac{\mathbf{V}_{0}\mathbf{w}}{\left\Vert \mathbf{V}_{0}\mathbf{w}\right\Vert } & & \text{if }\mathbf{V}_{0}\mathbf{w}\neq\mathbf{0}\\ \text{any unit-norm vector} & & \text{otherwise}. \end{array}\right.\]
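The worst-case \(\boldsymbol{\Delta}\) from the proof can be verified numerically (a sketch with arbitrary sizes): it lies on the boundary of the Frobenius ball and attains the bound exactly.

```r
set.seed(1)
K_factors <- 3; M_assets <- 5
V0 <- matrix(rnorm(K_factors * M_assets), K_factors, M_assets)
w <- rnorm(M_assets)
rho <- 0.5
nrm <- function(x) sqrt(sum(x^2))

# worst-case perturbation from the proof: Delta = rho * u w' / ||w||
u <- V0 %*% w / nrm(V0 %*% w)
Delta <- rho * u %*% t(w) / nrm(w)

all.equal(nrm(c(Delta)), rho)  # TRUE: ||Delta||_F = rho (boundary of the ball)
all.equal(nrm(V0 %*% w + Delta %*% w),
          nrm(V0 %*% w) + rho * nrm(w))  # TRUE: the upper bound is attained
```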
Recall the maximum Sharpe ratio portfolio formulation in convex form: \[\begin{array}{ll} \underset{\tilde{\mathbf{w}}}{\textsf{minimize}} & \tilde{\mathbf{w}}^{T}\boldsymbol{\Sigma}\tilde{\mathbf{w}}\\ \textsf{subject to} & \tilde{\mathbf{w}}^{T}(\boldsymbol{\mu}-r_{f}\mathbf{1})=1\\ & \mathbf{1}^{T}\tilde{\mathbf{w}}\geq0\\ & \tilde{\mathbf{w}}\geq\mathbf{0}. \end{array}\]
The portfolio is then obtained with the correct scaling factor as \[\mathbf{w}=\tilde{\mathbf{w}}/(\mathbf{1}^{T}\tilde{\mathbf{w}}).\]
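For the unconstrained case the solution direction is \(\tilde{\mathbf{w}}\propto\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}-r_f\mathbf{1})\), which can be used to sanity-check the rescaling step (a sketch with made-up values mu_ex, Sigma_ex, and rf; the covariance is diagonal so the long-only constraint is inactive):

```r
set.seed(7)
N_assets <- 4
mu_ex <- runif(N_assets, 0.05, 0.15)       # synthetic expected returns
rf <- 0.02
Sigma_ex <- diag(runif(N_assets, 0.02, 0.08))  # diagonal covariance => long-only optimum

SR <- function(w) (sum(w * mu_ex) - rf) / sqrt(drop(t(w) %*% Sigma_ex %*% w))

w_tilde <- solve(Sigma_ex, mu_ex - rf)  # unconstrained direction Sigma^{-1}(mu - rf 1)
w_sr <- w_tilde / sum(w_tilde)          # rescale so that 1' w = 1

# no random long-only portfolio beats the rescaled solution
rand_SR <- replicate(5000, { v <- abs(rnorm(N_assets)); SR(v / sum(v)) })
SR(w_sr) >= max(rand_SR)  # TRUE
```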
Naive optimization: optimization problems formulated assuming that the parameters are perfectly known when they are not.
👉 the naive solution \(\mathbf{x}^{\star}(\hat{\boldsymbol{\theta}})\) may totally differ from the desired one \(\mathbf{x}^{\star}\left(\boldsymbol{\theta}\right)\) (or not, depends on the type of problem)
Optimization under uncertainty of parameters: model the parameters as members of uncertainty sets and optimize the worst case over those sets, i.e., robust (worst-case) optimization.
For more information visit: https://www.danielppalomar.com
Chopra, V., & Ziemba, W. (1993). The effect of errors in means, variances and covariances on optimal portfolio choice. Journal of Portfolio Management.
Cornuejols, G., & Tütüncü, R. (2006). Optimization methods in finance. Cambridge University Press.
Fabozzi, F. J. (2007). Robust portfolio optimization and management. Wiley.
Lobo, M. S., & Boyd, S. (2000). The worst-case risk of a portfolio.
Markowitz, H. (1952). Portfolio selection. J. Financ., 7(1), 77–91.
Zhao, Z., Zhou, R., Palomar, D. P., & Feng, Y. (2019). Portfolio optimization. submitted.