MAFS6010R - Portfolio Optimization with R
MSc in Financial Mathematics
The Hong Kong University of Science and Technology (HKUST)
Fall 2019-20

Outline

  • Primer on Financial Data

  • Modeling the Returns

  • Portfolio Basics

  • Heuristic Portfolios

  • Markowitz’s Modern Portfolio Theory (MPT)

    • Mean-variance portfolio (MVP)
    • Global minimum variance portfolio (GMVP)
    • Maximum Sharpe ratio portfolio (MSRP)
  • Risk-Based Portfolios (GMVP, IVP, RPP, MDP, MDCP)

  • Comparison of Portfolios

  • Conclusions

Primer on Financial Data

Asset log-prices

  • Let \(p_{t}\) be the price of an asset at (discrete) time index \(t\).
  • The fundamental model is based on modeling the log-prices \(y_{t}\triangleq\log p_{t}\) as a random walk: \[y_{t}=\mu+y_{t-1}+\epsilon_{t}\]
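A quick way to get a feel for this model is to simulate it (a minimal sketch with arbitrary illustrative parameter values):

# simulate log-prices as a random walk: y_t = mu + y_{t-1} + eps_t
set.seed(123)
T_sim <- 500
drift <- 5e-4                        # mu
eps <- rnorm(T_sim, sd = 0.01)       # i.i.d. innovations
y <- log(100) + cumsum(drift + eps)  # log-prices, starting at p_0 = 100
plot(exp(y), type = "l", xlab = "t", ylab = "price")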

Asset returns

  • For stocks, returns are used for the modeling since they are “stationary” (as opposed to the previous random walk).

  • Simple return (a.k.a. linear or net return) is \[R_{t} \triangleq\frac{p_{t}-p_{t-1}}{p_{t-1}}=\frac{p_{t}}{p_{t-1}}-1.\]

  • Log-return (a.k.a. continuously compounded return) is \[r_{t} \triangleq y_{t}-y_{t-1}=\log\frac{p_{t}}{p_{t-1}}=\log\left(1+R_{t}\right).\]

  • Observe that the log-return is “stationary”: \[r_{t}=y_{t}-y_{t-1}=\mu+\epsilon_{t}\]

  • Note that \(r_{t}\approx R_{t}\) when \(R_{t}\) is small (i.e., the changes in \(p_t\) are small) (Ruppert 2010).
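As a quick numerical check of this approximation (a minimal sketch, not part of the original analysis), we can compare the two returns over a grid of small price changes:

# compare log-returns and simple returns for small price changes
R <- seq(-0.05, 0.05, by = 0.01)  # simple returns
r <- log(1 + R)                   # corresponding log-returns
round(cbind(R = R, r = r, abs_error = abs(r - R)), 5)

The absolute error stays below 0.002 over this range, which is why the approximation is routinely used for daily returns.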

Log-returns vs simple returns

S&P 500 index - Log-returns

Autocorrelation

ACF of S&P 500 log-returns:

Autocorrelation

ACF of S&P 500 absolute value of log-returns:

Non-Gaussianity and asymmetry

Histograms of S&P 500 log-returns:

Asymmetry

Skewness of S&P 500 log-returns vs frequency (dangerous!):

Heavy-tailness

QQ plots of S&P 500 log-returns:

Heavy-tailness vs frequency

Kurtosis of S&P 500 log-returns vs frequency:

Volatility clustering

S&P 500 log-returns:

Volatility clustering removed

Standardized S&P 500 log-returns:

Conditional heavy-tailness

QQ plots of standardized S&P 500 log-returns (conditional heavy-tailness aka aggregational Gaussianity):

Frequency of data


  • Low frequency (weekly, monthly): Gaussian distributions seem to fit reality after correcting for volatility clustering (except for the asymmetry), but the nonstationarity is a big issue



  • Medium frequency (daily): definitely heavy tails even after correcting for volatility clustering, as well as asymmetry



  • High frequency (intraday, 30min, 5min, tick-data): below 5min the market microstructure noise starts to reveal itself

Modeling the Returns

Returns of the universe

  • In practice, we don’t just deal with one asset but with a whole universe of \(N\) assets.

  • We denote the log-returns of the \(N\) assets at time \(t\) with the vector \(\mathbf{r}_{t}\in\mathbb{R}^{N}\).

  • The time index \(t\) can denote any arbitrary period such as days, weeks, months, 5-min intervals, etc.

  • \(\mathcal{F}_{t-1}\) denotes the previous historical data.

  • Econometrics aims at modeling \(\mathbf{r}_{t}\) conditional on \(\mathcal{F}_{t-1}\).

  • \(\mathbf{r}_{t}\) is a multivariate stochastic process with conditional mean and covariance matrix denoted as (Feng and Palomar 2016) \[\begin{aligned} \boldsymbol{\mu}_{t} &\triangleq\textsf{E}\left[\mathbf{r}_{t}\mid\mathcal{F}_{t-1}\right]\\ \boldsymbol{\Sigma}_{t} &\triangleq\textsf{Cov}\left[\mathbf{r}_{t}\mid\mathcal{F}_{t-1}\right]=\textsf{E}\left[(\mathbf{r}_{t}-\boldsymbol{\mu}_{t})(\mathbf{r}_{t}-\boldsymbol{\mu}_{t})^{T}\mid\mathcal{F}_{t-1}\right]. \end{aligned}\]

i.i.d. model

  • For simplicity we will assume that \(\mathbf{r}_{t}\) follows an i.i.d. distribution (which is not very inaccurate in general).


  • That is, both the conditional mean and conditional covariance are constant: \[\begin{aligned} \boldsymbol{\mu}_{t} &= \boldsymbol{\mu},\\ \boldsymbol{\Sigma}_{t} &= \boldsymbol{\Sigma}. \end{aligned}\]


  • Although it is a very simple model, it is one of the most fundamental assumptions behind many important works, e.g., the Nobel prize-winning Markowitz portfolio theory (Markowitz 1952).

Factor models

  • Factor models are special cases of the i.i.d. model with the covariance matrix being decomposed into two parts: low dimensional factors and marginal noise.

  • The factor model is \[\mathbf{r}_{t}=\boldsymbol{\alpha}+\mathbf{B}\mathbf{f}_{t}+\mathbf{w}_{t},\] where

    • \(\boldsymbol{\alpha}\) denotes a constant vector
    • \(\mathbf{f}_{t}\in\mathbb{R}^{K}\) with \(K\ll N\) is a vector of a few factors that are responsible for most of the randomness in the market
    • \(\mathbf{B}\in\mathbb{R}^{N\times K}\) denotes how the low dimensional factors affect the higher dimensional market assets
    • \(\mathbf{w}_{t}\) is a white noise residual vector that has only a marginal effect.
  • The factors can be explicit or implicit.

  • Widely used by practitioners (they buy factors at a high premium).

  • Connections with Principal Component Analysis (PCA) (Jolliffe 2002).
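As a rough illustration of this connection (a sketch on synthetic data, not from the original slides), we can generate returns from a single-factor model and check that the first principal component recovers the hidden factor:

# sketch: recover an implicit factor via PCA (synthetic single-factor data)
set.seed(42)
N_sim <- 5; T_sim <- 1000
f <- rnorm(T_sim)                    # hidden factor f_t
B <- runif(N_sim, 0.5, 1.5)          # factor loadings
X <- outer(f, B) + 0.1*matrix(rnorm(T_sim*N_sim), T_sim, N_sim)  # r_t = B*f_t + w_t
f_hat <- prcomp(X, center = TRUE)$x[, 1]  # scores of the first principal component
abs(cor(f_hat, f))                   # close to 1: PCA recovers the factor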

Time-series models

  • The previous models are i.i.d., but there are hundreds of other models attempting to capture the time correlation or time structure of the returns, as well as the volatility clustering or heteroskedasticity.

  • To capture the time correlation we have mean models: VAR, VMA, VARMA, VARIMA, VECM, etc.

  • To capture the volatility clustering we have covariance models: ARCH, GARCH, VEC, DVEC, BEKK, CCC, DCC, etc.

  • Standard textbook references (Lütkepohl 2007; Tsay 2005, 2013):

    📘 H. Lütkepohl. New Introduction to Multiple Time Series Analysis. Springer, 2007.

    📘 R. S. Tsay. Multivariate Time Series Analysis: With R and Financial Applications. John Wiley & Sons, 2013.

  • Simple introductory reference (Feng and Palomar 2016):

    📘 Y. Feng and D. P. Palomar. A Signal Processing Perspective on Financial Engineering. Foundations and Trends in Signal Processing, Now Publishers, 2016.

Fitting process

  • Before we can use a model, we need to estimate the model parameters (for example, in the i.i.d. model: \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\)) using a training set.
  • Then we use cross-validation to select the best fit (assuming we have several candidate models, each with a different fit).
  • Finally, we can use the best-fitted model on the test data (aka out-of-sample data) for performance evaluation.

  • Be careful: if you use the test data many times, it effectively stops being out-of-sample and becomes cross-validation!

Sample estimates

  • Consider the i.i.d. model: \[\mathbf{r}_{t}=\boldsymbol{\mu}+\mathbf{w}_{t},\] where \(\boldsymbol{\mu}\in\mathbb{R}^{N}\) is the mean and \(\mathbf{w}_{t}\in\mathbb{R}^{N}\) is an i.i.d. process with zero mean and constant covariance matrix \(\boldsymbol{\Sigma}\).

  • The mean vector \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\) have to be estimated in practice based on \(T\) observations.

  • The simplest estimators are the sample estimators:

    • sample mean: \(\quad\hat{\boldsymbol{\mu}} =\frac{1}{T}\sum_{t=1}^{T}\mathbf{r}_{t}\)
    • sample covariance matrix: \(\quad\hat{\boldsymbol{\Sigma}} =\frac{1}{T-1}\sum_{t=1}^{T}(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}})(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}})^{T}.\)
      Note that the factor \(1/\left(T-1\right)\) is used instead of \(1/T\) to get an unbiased estimator (asymptotically for \(T\rightarrow\infty\) they coincide).
  • Many more sophisticated estimators exist, namely: shrinkage estimators, Black-Litterman estimators, etc.

Least-Square (LS) estimator

  • Minimize the least-square error in the \(T\) observed i.i.d. samples: \[\underset{\boldsymbol{\mu}}{\textsf{minimize}} \quad\frac{1}{T}\sum_{t=1}^{T}\left\Vert \mathbf{r}_{t}-\boldsymbol{\mu}\right\Vert _{2}^{2}.\]

  • The optimal solution is the sample mean: \[\hat{\boldsymbol{\mu}}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{r}_{t}.\]

  • The sample covariance of the residuals \(\hat{\mathbf{w}}_{t}=\mathbf{r}_{t}-\hat{\boldsymbol{\mu}}\) is the sample covariance matrix: \[\hat{\boldsymbol{\Sigma}}=\frac{1}{T-1}\sum_{t=1}^{T}\left(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}}\right)\left(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}}\right)^{T}.\]

Maximum Likelihood Estimator (MLE)

  • Assume \(\mathbf{r}_{t}\) are i.i.d. and follow a Gaussian distribution: \[f(\mathbf{r}) =\frac{1}{\sqrt{\left(2\pi\right)^{N}\left|\boldsymbol{\Sigma}\right|}}e^{-\frac{1}{2}(\mathbf{r}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{r}-\boldsymbol{\mu})}\] where
    • \(\boldsymbol{\mu}\in\mathbb{R}^{N}\) is a mean vector that gives the location
    • \(\boldsymbol{\Sigma}\in\mathbb{R}^{N\times N}\) is a positive definite covariance matrix that defines the shape.

MLE

  • Given the \(T\) i.i.d. samples \(\mathbf{r}_{t}, \;t=1,\ldots,T,\) the negative log-likelihood function is \[\begin{aligned} \ell(\boldsymbol{\mu},\boldsymbol{\Sigma}) &= -\log\prod_{t=1}^{T}f(\mathbf{r}_{t})\\ &=\frac{T}{2}\log\left|\boldsymbol{\Sigma}\right| + \sum_{t=1}^{T}\frac{1}{2}(\mathbf{r}_{t}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{r}_{t}-\boldsymbol{\mu})+\text{const}. \end{aligned}\]

  • Setting the derivative of \(\ell(\boldsymbol{\mu},\boldsymbol{\Sigma})\) w.r.t. \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}^{-1}\) to zeros and solving the equations yield: \[\begin{aligned} \hat{\boldsymbol{\mu}} &= \frac{1}{T}\sum_{t=1}^{T}\mathbf{r}_{t}\\ \hat{\boldsymbol{\Sigma}} &= \frac{1}{T}\sum_{t=1}^{T}\left(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}}\right)\left(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}}\right)^{T}. \end{aligned}\]
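Note that the MLE of \(\boldsymbol{\Sigma}\) uses the factor \(1/T\) instead of \(1/(T-1)\), i.e., it is the (slightly biased) version of the sample covariance matrix. A quick numerical check on synthetic data (a minimal sketch):

# sketch: Gaussian MLE covariance vs unbiased sample covariance
set.seed(1)
X <- matrix(rnorm(200*3), 200, 3)
Xc <- sweep(X, 2, colMeans(X))        # center the data
Sigma_mle <- crossprod(Xc) / nrow(X)  # (1/T) * sum of outer products
max(abs(Sigma_mle - cov(X)*(nrow(X)-1)/nrow(X)))  # ~ 0: they coincide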

Parameter estimation

  • The parameter estimates \(\hat{\boldsymbol{\mu}}\) and \(\hat{\boldsymbol{\Sigma}}\) are only good for large \(T\), otherwise the estimation error is unacceptable.

  • For instance, the sample mean is a particularly inefficient estimator, producing very noisy estimates (Meucci 2005).

  • In practice, \(T\) cannot be large enough due to either:

    • unavailability of data or
    • lack of stationarity of data.
  • As a consequence, the estimates contain too much estimation error and a portfolio design (e.g., Markowitz mean-variance) based on those estimates can be severely affected (Chopra and Ziemba 1993).

  • Indeed, this is why Markowitz’s portfolio and its extensions are rarely used by practitioners.
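To get a feeling for how severe this is, the following simulation (a sketch, assuming Gaussian i.i.d. returns with typical daily magnitudes and using MASS::mvrnorm) compares the relative estimation error of the sample mean and the sample covariance matrix:

# sketch: relative estimation error of sample mean vs sample covariance
library(MASS)
set.seed(357)
N_sim <- 10
mu_true <- rep(1e-3, N_sim)        # typical daily mean return
Sigma_true <- diag(1e-4, N_sim)    # typical daily volatility of 1%
res <- sapply(c(50, 200, 1000), function(Tobs) {
  X <- mvrnorm(Tobs, mu_true, Sigma_true)
  c(T = Tobs,
    err_mu    = sqrt(sum((colMeans(X) - mu_true)^2)) / sqrt(sum(mu_true^2)),
    err_Sigma = norm(cov(X) - Sigma_true, "F") / norm(Sigma_true, "F"))
})
t(res)  # the relative error of the mean is an order of magnitude larger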

Portfolio Basics

Portfolio return

  • Suppose the capital budget is \(B\) dollars.

  • The portfolio \(\mathbf{w}\in\mathbb{R}^{N}\) denotes the normalized dollar weights of the \(N\) assets such that \(\mathbf{1}^{T}\mathbf{w}=1\) (so \(B\mathbf{w}\) denotes dollars invested in the assets).

  • For each asset \(i\), the initial wealth is \(Bw_{i}\) and the end wealth is \[Bw_{i}\left(p_{i,t}/p_{i,t-1}\right)=Bw_{i}\left(R_{it}+1\right).\]

  • Then the portfolio return is \[R_{t}^{p}= \frac{\sum_{i=1}^{N}Bw_{i}\left(R_{it}+1\right)-B}{B}=\sum_{i=1}^{N}w_{i}R_{it}\approx\sum_{i=1}^{N}w_{i}r_{it}=\mathbf{w}^{T}\mathbf{r}_{t}\]

  • The portfolio expected return and variance are \(\mathbf{w}^{T}\boldsymbol{\mu}\) and \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\), respectively.

Performance measures

  • Expected return: \(\mathbf{w}^{T}\boldsymbol{\mu}\)

  • Volatility: \(\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\)

  • Sharpe Ratio (SR): expected return per unit of risk \[\mathsf{SR} =\frac{\mathbf{w}^{T}\boldsymbol{\mu}-r_{f}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\] where \(r_{f}\) is the risk-free rate (e.g., interest rate on a three-month U.S. Treasury bill).

  • Information Ratio (IR): SR with \(r_{f}=0\).

  • Drawdown: decline from a historical peak of the cumulative profit \(X(t)\): \[D(T)=\max_{t\in[0,T]}X(t)-X(T)\]

  • VaR (Value at Risk): quantile of the loss.

  • ES (Expected Shortfall) or CVaR (Conditional Value at Risk): expected value of the loss above some quantile.
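All of these measures are easy to compute empirically from a realized return series. A minimal sketch (the vector ret below is a synthetic placeholder; in practice one would use the realized portfolio returns):

# sketch: empirical performance measures from a vector of daily returns
set.seed(7)
ret <- rnorm(500, mean = 5e-4, sd = 0.01)       # placeholder daily returns
SR <- mean(ret)/sd(ret) * sqrt(252)             # annualized Sharpe ratio (r_f = 0)
VaR95 <- -unname(quantile(ret, 0.05))           # 95% VaR: quantile of the loss
ES95 <- -mean(ret[ret <= quantile(ret, 0.05)])  # 95% ES: mean loss beyond the VaR
cum_pnl <- cumsum(ret)                          # cumulative P&L (not compounded)
max_dd <- max(cummax(cum_pnl) - cum_pnl)        # maximum drawdown max_T D(T)
c(SR = SR, VaR95 = VaR95, ES95 = ES95, maxDD = max_dd)

(The package PerformanceAnalytics, used later in the R sessions, provides more refined versions of all of these measures.)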

Practical constraints

  • Capital budget constraint: \[\mathbf{1}^T\mathbf{w} = 1.\]

  • Long-only constraint: \[\mathbf{w} \geq 0.\]

  • Dollar-neutral or self-financing constraint: \[\mathbf{1}^T\mathbf{w} = 0.\]

  • Holding constraint: \[\mathbf{l}\leq\mathbf{w}\leq \mathbf{u}\] where \(\mathbf{l}\in\mathbb{R}^{N}\) and \(\mathbf{u}\in\mathbb{R}^{N}\) are lower and upper bounds of the asset positions, respectively.

Practical constraints

  • Leverage constraint: \[\left\Vert \mathbf{w}\right\Vert _{1}\leq L.\]

  • Cardinality constraint: \[\left\Vert \mathbf{w}\right\Vert _{0} \leq K.\]

  • Turnover constraint: \[\left\Vert \mathbf{w}-\mathbf{w}_{0}\right\Vert _{1} \leq u\] where \(\mathbf{w}_{0}\) is the currently held portfolio.


  • Market-neutral constraint: \[\boldsymbol{\beta}^T\mathbf{w} = 0.\]
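Except for the cardinality constraint, all of the above are convex and can be written almost verbatim in CVXR (the solver package used later in the R sessions). A minimal sketch of a mean-variance design with several of these constraints (the values of gamma, tau, and u are illustrative placeholders):

library(CVXR)

# sketch: mean-variance design with budget, leverage, turnover, and max-position constraints
design_constrained <- function(mu, Sigma, lmd = 0.5,
                               w0 = rep(1/length(mu), length(mu)),  # currently held portfolio
                               gamma = 1.5, tau = 0.5, u = 0.3) {
  w <- Variable(length(mu))
  prob <- Problem(Maximize(t(mu) %*% w - lmd*quad_form(w, Sigma)),
                  constraints = list(sum(w) == 1,               # capital budget
                                     p_norm(w, 1) <= gamma,     # leverage
                                     p_norm(w - w0, 1) <= tau,  # turnover
                                     norm_inf(w) <= u))         # max position
  as.vector(solve(prob)$getValue(w))
}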

Leverage and margin requirement

  • The capital budget constraint \(\mathbf{1}^T\mathbf{w}=1\) means the investor can use all the cash from short-selling to buy stocks.
  • In practice, one always trades stocks via a broker with some margin requirements on the positions (Jacobs et al. 2005). For example:
    • long positions: one may borrow as much as 50% of the value of the position from the broker;
    • short positions: the cash from short-selling is controlled by the broker as cash collateral and one is also required to have some equity as the “initial margin” to establish the positions, in general, at least 50% of the short value.
  • Thus, the margin requirement turns out to be \(0.5\,\mathbf{1}^{T}\mathbf{w}^{+}+0.5\,\mathbf{1}^{T}\mathbf{w}^{-}\leq1\), where \(\mathbf{w}=\mathbf{w}^{+}-\mathbf{w}^{-}\) with \(\mathbf{w}^{+},\mathbf{w}^{-}\geq\mathbf{0}\) denoting the long and short positions, or, equivalently, \(\left\Vert \mathbf{w}\right\Vert _{1}\leq2\).

Portfolio return with leverage

  • Without leverage, the unnormalized portfolio \(\mathbf{w}\) satisfies \(\mathbf{1}^{T}\mathbf{w}=B\), but with a leverage of \(L\) it is \(\mathbf{1}^{T}\mathbf{w}=B\times L\).

  • At the time of the investment in the market: the portfolio held is \(\mathbf{w}\), the cash (which may be negative if there is leverage) is \(c=B-\mathbf{1}^{T}\mathbf{w}\), and the net asset value (NAV) is \(\textsf{NAV}=c+\mathbf{1}^{T}\mathbf{w}=B\).

  • After one period, the prices have changed, so the cash is the same \(c_{\textsf{new}}=c\), but the portfolio held now is \(\mathbf{w}_{\textsf{new}}=\mathbf{w}\times\mathbf{p}_{t}/\mathbf{p}_{t-1}\). So the NAV is \[\textsf{NAV}_{\textsf{new}}=c_{\textsf{new}}+\mathbf{1}^{T}\mathbf{w}_{\textsf{new}}=B-\mathbf{1}^{T}\mathbf{w}+\mathbf{w}^{T}\left(\mathbf{p}_{t}/\mathbf{p}_{t-1}\right)=B+\mathbf{w}^{T}\mathbf{r}_{t}.\]

  • The portfolio return can be computed as \[R_{t}^{p}=\frac{\textsf{NAV}_{\textsf{new}}-\textsf{NAV}}{\textsf{NAV}}=\frac{B+\mathbf{w}^{T}\mathbf{r}_{t}-B}{B}=\frac{\mathbf{w}^{T}\mathbf{r}_{t}}{B}=\bar{\mathbf{w}}^{T}\mathbf{r}_{t}\] where \(\bar{\mathbf{w}}\) denotes the normalized portfolio \(\bar{\mathbf{w}}=\mathbf{w}/B\). Same as before!

Linear returns vs log-returns revisited

  • Recall that we have modeled the log prices \(\mathbf{y}_{t}=\log\mathbf{p}_{t}\) as a random walk and, hence, the log-returns \(\mathbf{r}_{t}=\mathbf{y}_{t}-\mathbf{y}_{t-1}\) as an i.i.d. process with mean \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\).

  • However, for the portfolio return the appropriate returns to be used are the linear returns \(R_{i,t}=p_{i,t}/p_{i,t-1}-1\) (recall \(r_{i,t}=\log\left(1+R_{i,t}\right)\)): \[R_{t}^{p}=\mathbf{w}^{T}\mathbf{R}_{t}\approx\mathbf{w}^{T}\mathbf{r}_{t}.\]

  • Thus, when we formulate the portfolio optimization problem, we should use the moments of the linear returns: \(\textsf{E}\left[\mathbf{R}_{t}\mid\mathcal{F}_{t-1}\right]\) and \(\textsf{Cov}\left[\mathbf{R}_{t}\mid\mathcal{F}_{t-1}\right]\) instead of the ones for the log-returns \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\).

  • Fortunately, they can be related as follows: \[\begin{aligned} \mathbf{R}_{t} &= \exp\left(\mathbf{r}_{t}\right)-1 \\ \textsf{E}\left[\mathbf{R}_{t}\mid\mathcal{F}_{t-1}\right] & =\exp\left(\boldsymbol{\mu}+\frac{1}{2}\textsf{diag}\left(\boldsymbol{\Sigma}\right)\right)-1\\ \textsf{Cov}\left[\mathbf{R}_{t}\mid\mathcal{F}_{t-1}\right]_{ij} &= \exp\left(\mu_{i}+\mu_{j}+\frac{1}{2}\left(\Sigma_{ii}+\Sigma_{jj}\right)\right)\times\left(\exp\left(\Sigma_{ij}\right)-1\right). \end{aligned}\]

R session: Loading market data

We will load some stock market data and divide it into a training part (for the estimation of the expected return \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\), and subsequent portfolio design) and a test part (for the out-of-sample performance evaluation).

In particular, we will start by loading some stock data from three different sectors:

library(xts)  # to manipulate time series of stock data
library(quantmod)  # to download stock data
library(PerformanceAnalytics)  # to compute performance measures

# download data from YahooFinance
stock_namelist <- c("AAPL", "AMD", "ADI",  "ABBV", "AEZS", "A",  "APD", "AA","CF")
prices <- xts()
for (i in 1:length(stock_namelist)) {
  tmp <- Ad(getSymbols(stock_namelist[i], from = "2013-01-01", to = "2016-12-31", auto.assign = FALSE))
  tmp <- na.approx(tmp, na.rm = FALSE)  # interpolate NAs
  prices <- cbind(prices, tmp)
}
colnames(prices) <- stock_namelist
indexClass(prices) <- "Date"
str(prices)
R>> An 'xts' object on 2013-01-02/2016-12-30 containing:
R>>   Data: num [1:1008, 1:9] 55.5 54.8 53.2 52.9 53.1 ...
R>>  - attr(*, "dimnames")=List of 2
R>>   ..$ : NULL
R>>   ..$ : chr [1:9] "AAPL" "AMD" "ADI" "ABBV" ...
R>>   Indexed by objects of class: [Date] TZ: UTC
R>>   xts Attributes:  
R>>  NULL
head(prices)
R>>                AAPL  AMD      ADI     ABBV AEZS        A      APD       AA       CF
R>> 2013-01-02 55.47129 2.53 37.50844 27.60817  253 27.97831 66.43760 20.62187 29.44429
R>> 2013-01-03 54.77112 2.49 36.90318 27.38020  254 28.07852 66.20545 20.80537 29.30506
R>> 2013-01-04 53.24548 2.59 36.24679 27.03431  257 28.63300 67.09529 21.24121 29.96146
R>> 2013-01-07 52.93227 2.67 36.35761 27.08934  259 28.42591 67.03341 20.87419 29.84922
R>> 2013-01-08 53.07474 2.67 35.98254 26.49976  255 28.19877 67.15720 20.87419 29.41162
R>> 2013-01-09 52.24524 2.63 35.88875 26.64913  258 28.96036 68.06254 20.82831 30.44025
tail(prices)
R>>                AAPL   AMD      ADI     ABBV AEZS        A      APD    AA       CF
R>> 2016-12-22 111.8445 11.60 70.07727 55.82264 4.10 44.91877 136.0360 29.75 27.82244
R>> 2016-12-23 112.0657 11.58 70.44936 56.43826 4.10 45.14350 136.3992 29.71 28.37686
R>> 2016-12-27 112.7774 12.07 70.89778 56.58311 4.05 45.44642 137.2373 29.65 29.52265
R>> 2016-12-28 112.2965 11.55 70.18222 56.37488 3.55 44.67448 135.1328 29.43 29.22696
R>> 2016-12-29 112.2676 11.59 70.20129 56.79134 3.60 44.72544 135.2359 28.89 29.47645
R>> 2016-12-30 111.3924 11.34 69.28539 56.69175 3.60 44.64704 134.7206 28.08 29.08836
# compute log-returns and linear returns
X_log <- diff(log(prices))[-1]
X_lin <- (prices/lag(prices) - 1)[-1]

# or alternatively...
X_log <- CalculateReturns(prices, "log")[-1]
X_lin <- CalculateReturns(prices)[-1]

N <- ncol(X_log)  # number of stocks
T <- nrow(X_log)  # number of days

We can take a look at the prices of the stocks:

plot(prices/rep(prices[1, ], each = nrow(prices)), col = rainbow10equal, legend.loc = "topleft",
     main = "Normalized prices")

We now divide the data into a training set and test set:

T_trn <- round(0.7*T)  # 70% of data
X_log_trn <- X_log[1:T_trn, ]
X_log_tst <- X_log[(T_trn+1):T, ]
X_lin_trn <- X_lin[1:T_trn, ]
X_lin_tst <- X_lin[(T_trn+1):T, ]

R session: Linear vs log daily returns

We can now use the training set to obtain the sample estimates from the returns \(\mathbf{x}_t\) (i.e., sample mean and sample covariance matrix) as \[ \begin{aligned} \hat{\boldsymbol{\mu}} & = \frac{1}{T}\sum_{t=1}^T \mathbf{x}_t\\ \hat{\boldsymbol{\Sigma}} & = \frac{1}{T-1}\sum_{t=1}^T (\mathbf{x}_t - \hat{\boldsymbol{\mu}})(\mathbf{x}_t - \hat{\boldsymbol{\mu}})^T \end{aligned} \]

However, it is not totally clear whether we should use linear returns or log-returns to estimate \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\). Clearly, for the portfolio design we need the expected return \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\) of the linear returns. There are three main philosophies in the estimation procedure for \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\):

  1. estimate them directly from the linear returns (even though linear returns are not supposed to be easily modeled): \(\hat{\boldsymbol{\mu}} = \hat{\boldsymbol{\mu}}^\textsf{lin}\) and \(\hat{\boldsymbol{\Sigma}} = \hat{\boldsymbol{\Sigma}}^\textsf{lin}\)
  2. estimate them from the log returns (and ignore the approximation error): \(\hat{\boldsymbol{\mu}} = \hat{\boldsymbol{\mu}}^\textsf{log}\) and \(\hat{\boldsymbol{\Sigma}} = \hat{\boldsymbol{\Sigma}}^\textsf{log}\)
  3. estimate them from the log returns but properly transform them to the linear-return moments: \[ \begin{aligned} \hat{\boldsymbol{\mu}} & = \exp\left( \hat{\boldsymbol{\mu}}^\textsf{log} + \frac{1}{2}\textsf{diag}(\hat{\boldsymbol{\Sigma}}^\textsf{log})\right) - 1\\ \hat{\Sigma}_{ij} & = \exp\left( \hat{\mu}_i^\textsf{log} + \hat{\mu}_j^\textsf{log} + \frac{1}{2}(\hat{\Sigma}_{ii}^\textsf{log} + \hat{\Sigma}_{jj}^\textsf{log}) \right) \times (\exp(\hat{\Sigma}_{ij}^\textsf{log}) - 1) \end{aligned} \]

In the following, we will explore these three different ways for daily returns. For the comparison we will use the simple Global Minimum Variance Portfolio (GMVP), \[ \mathbf{w}_\textsf{GMVP} = \frac{1}{\mathbf{1}^T\boldsymbol{\Sigma}^{-1}\mathbf{1}}\boldsymbol{\Sigma}^{-1}\mathbf{1}, \] as well as the mean-variance portfolio (MVP) as the solution to the problem: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}-\lambda\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]

# Method 1: estimate directly from linear returns
mu_lin <- colMeans(X_lin_trn)
Xc <- X_lin_trn - matrix(mu_lin, T_trn, N, byrow = TRUE)  # remove mean (center data)
Sigma_lin <- 1/(T_trn-1) * t(Xc) %*% Xc
# or more conveniently:
Sigma_lin_ <- cov(X_lin_trn)
norm(Sigma_lin - Sigma_lin_, "F")  # sanity check
R>> [1] 4.019848e-18
# Method 2: estimate directly from log returns
mu_log <- colMeans(X_log_trn)
Sigma_log <- cov(X_log_trn)

# Method 3: estimate from log returns plus transformation
momentsReturnLog2Lin <- function(mu, Sigma) {
  N <- ncol(Sigma)
  mu_ <- exp(mu + 0.5*diag(Sigma)) - 1
  Sigma_ <- matrix(NA, nrow = N, ncol = N)
  for(ii in 1:N)
    for(jj in 1:N)
      Sigma_[ii, jj] <- exp(mu[ii] + mu[jj] + 0.5*(Sigma[ii, ii]+Sigma[jj, jj])) * (exp(Sigma[ii, jj])-1)
  return( list(mu=mu_, Sigma=Sigma_) )
}

tmp <- momentsReturnLog2Lin(mu_log, Sigma_log)
mu_log_trans <- tmp$mu
Sigma_log_trans <- tmp$Sigma

Now let’s compute and compare the three GMVP portfolios:

# create function for GMVP
GMVP <- function(Sigma) {
  ones <- rep(1, nrow(Sigma))
  Sigma_inv_1 <- solve(Sigma, ones)  # same as: solve(Sigma) %*% ones
  w <- (1/as.numeric(ones %*% Sigma_inv_1)) * Sigma_inv_1
  return(w)
}

# compute the three versions of GMVP
w_GMVP_lin <- GMVP(Sigma_lin)
w_GMVP_log <- GMVP(Sigma_log)
w_GMVP_log_trans <- GMVP(Sigma_log_trans)
w_GMVP_all <- cbind("GMVP_lin"       = w_GMVP_lin, 
                    "GMVP_log"       = w_GMVP_log, 
                    "GMVP_log_trans" = w_GMVP_log_trans)
w_GMVP_all
R>>        GMVP_lin   GMVP_log GMVP_log_trans
R>> AAPL 0.18865586 0.18725916     0.18703112
R>> AMD  0.01608760 0.01571003     0.01575742
R>> ADI  0.08229133 0.08404712     0.08407072
R>> ABBV 0.13928582 0.13836379     0.13818423
R>> AEZS 0.01377123 0.01452367     0.01463238
R>> A    0.14917832 0.14789974     0.14821945
R>> APD  0.27735998 0.28026594     0.28007130
R>> AA   0.03009397 0.03032780     0.03047517
R>> CF   0.10327589 0.10160275     0.10155822
# plot to compare the allocations
barplot(t(w_GMVP_all), col = rainbow8equal[1:3], legend = colnames(w_GMVP_all), 
        main = "GMVP allocation", xlab = "stocks", ylab = "dollars", beside = TRUE)

And now the three MVP portfolios:

library(CVXR)

# create function for MVP
MVP <- function(mu, Sigma, lmd = 0.5) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Maximize(t(mu) %*% w - lmd*quad_form(w, Sigma)),
                  constraints = list(sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# compute the three versions of MVP
w_MVP_lin <- MVP(mu_lin, Sigma_lin)
w_MVP_log <- MVP(mu_log, Sigma_log)
w_MVP_log_trans <- MVP(mu_log_trans, Sigma_log_trans)
w_MVP_all <- cbind("MVP_lin"       = w_MVP_lin, 
                   "MVP_log"       = w_MVP_log, 
                   "MVP_log_trans" = w_MVP_log_trans)
w_MVP_all
R>>           MVP_lin     MVP_log MVP_log_trans
R>>  [1,]  1.82349683  1.78880418     1.8028310
R>>  [2,] -0.44019876 -0.80020154    -0.4097064
R>>  [3,] -0.08553791  0.03403578    -0.0881237
R>>  [4,]  1.81463970  1.96492811     1.8217410
R>>  [5,] -0.64990649 -1.30141524    -0.8394611
R>>  [6,] -2.53548501 -2.23254728    -2.5396476
R>>  [7,]  1.86916151  2.40358620     1.9655884
R>>  [8,] -1.19378174 -1.20062371    -1.1803529
R>>  [9,]  0.39761187  0.34343349     0.4671312
# plot to compare the allocations
barplot(t(w_MVP_all), col = rainbow8equal[1:3], 
        legend = colnames(w_MVP_all), args.legend = list(x = "bottomright", inset = 0.04), 
        main = "MVP allocation", xlab = "stocks", ylab = "dollars", beside = TRUE)

Finally, we can do an in-sample evaluation (no need to use the test sample for this):

# compute returns of the portfolios
ret_GMVP_all_trn <- xts(X_lin_trn %*% w_GMVP_all, index(X_lin_trn))
ret_MVP_all_trn <- xts(X_lin_trn %*% w_MVP_all, index(X_lin_trn))
# compare them
StdDev.annualized(ret_GMVP_all_trn)
R>>                                GMVP_lin  GMVP_log GMVP_log_trans
R>> Annualized Standard Deviation 0.1575621 0.1575663      0.1575672
SharpeRatio.annualized(ret_GMVP_all_trn)
R>>                                 GMVP_lin GMVP_log GMVP_log_trans
R>> Annualized Sharpe Ratio (Rf=0%) 1.193824 1.188617       1.187221
SharpeRatio.annualized(ret_MVP_all_trn)
R>>                                   MVP_lin MVP_log MVP_log_trans
R>> Annualized Sharpe Ratio (Rf=0%) 0.5270622     NaN           NaN

For the GMVP, which only uses the covariance matrix, clearly there is no apparent difference. This is not unexpected since the daily returns of the stocks are roughly within the interval [-0.04, 0.04] and the approximation \(r_t = \log(1 + R_t) \approx R_t\) is very accurate:

apply(X_lin_trn, 2, sd)
R>>       AAPL        AMD        ADI       ABBV       AEZS          A        APD         AA         CF 
R>> 0.01661471 0.03105342 0.01523081 0.01617068 0.07467274 0.01398818 0.01296057 0.01918079 0.01787465

However, for the mean-variance portfolio, which also uses the expected return, there is some small difference and the methods can be easily ranked:

  1. the best method is to properly model the log-returns and then transform the moments to the linear return case
  2. the second best method is to use directly the linear returns to estimate the moments
  3. the worst method is to use the log-returns to estimate the moments and then use them as if they were the linear-return moments.

Now, another question is whether these differences will become more evident for weekly or monthly rebalancing where the approximation \(r_t = \log(1 + R_t) \approx R_t\) may not be so accurate.

R session: Linear vs log weekly returns

Let’s repeat the previous analysis but using weekly returns:

periodicity(prices)
R>> Daily periodicity from 2013-01-02 to 2016-12-30
# change periodicity of prices
prices_weekly <- xts()
for (i in 1:ncol(prices))
  prices_weekly <- cbind(prices_weekly, Cl(to.weekly(prices[, i])))
colnames(prices_weekly) <- colnames(prices)
indexClass(prices_weekly) <- "Date"

periodicity(prices_weekly)
R>> Weekly periodicity from 2013-01-04 to 2016-12-30
head(prices_weekly)
R>>                AAPL  AMD      ADI     ABBV AEZS        A      APD       AA       CF
R>> 2013-01-04 53.24548 2.59 36.24679 27.03431  257 28.63300 67.09529 21.24121 29.96146
R>> 2013-01-11 52.56855 2.67 36.02515 26.92660  274 29.01381 68.37980 20.50717 30.99721
R>> 2013-01-18 50.51755 2.46 36.70712 29.68687  259 29.49481 68.62740 20.64481 31.12366
R>> 2013-01-25 44.44331 2.85 37.33794 29.90961  259 30.25639 68.37205 20.71362 32.71777
R>> 2013-02-01 45.83153 2.60 38.07959 29.57551  289 30.25639 68.65834 20.64481 32.67231
R>> 2013-02-08 50.01838 2.59 38.95763 28.83573  302 30.11611 68.36431 20.57622 32.13240
# recompute returns
X_weekly_log <- CalculateReturns(prices_weekly, "log")[-1]
X_weekly_lin <- CalculateReturns(prices_weekly)[-1]
T_weekly <- nrow(X_weekly_log)  # number of weeks

# split data into training and test data
T_weekly_trn <- round(0.7*T_weekly)  # 70% of data
X_weekly_log_trn <- X_weekly_log[1:T_weekly_trn, ]
X_weekly_log_tst <- X_weekly_log[(T_weekly_trn+1):T_weekly, ]
X_weekly_lin_trn <- X_weekly_lin[1:T_weekly_trn, ]
X_weekly_lin_tst <- X_weekly_lin[(T_weekly_trn+1):T_weekly, ]

# estimate mu and Sigma
mu_weekly_lin <- colMeans(X_weekly_lin_trn)
Sigma_weekly_lin <- cov(X_weekly_lin_trn)

mu_weekly_log <- colMeans(X_weekly_log_trn)
Sigma_weekly_log <- cov(X_weekly_log_trn)

tmp <- momentsReturnLog2Lin(mu_weekly_log, Sigma_weekly_log)
mu_weekly_log_trans <- tmp$mu
Sigma_weekly_log_trans <- tmp$Sigma

# compute GMVPs
w_GMVP_weekly_lin <- GMVP(Sigma_weekly_lin)
w_GMVP_weekly_log <- GMVP(Sigma_weekly_log)
w_GMVP_weekly_log_trans <- GMVP(Sigma_weekly_log_trans)
w_GMVP_weekly_all <- cbind("GMVP_lin"       = w_GMVP_weekly_lin, 
                           "GMVP_log"       = w_GMVP_weekly_log, 
                           "GMVP_log_trans" = w_GMVP_weekly_log_trans)
barplot(t(w_GMVP_weekly_all), col = rainbow8equal[1:3], legend = colnames(w_GMVP_weekly_all), 
        main = "GMVP allocation for weekly returns", xlab = "stocks", ylab = "dollars", beside = TRUE)

# compute MVPs
w_MVP_weekly_lin <- MVP(mu_weekly_lin, Sigma_weekly_lin)
w_MVP_weekly_log <- MVP(mu_weekly_log, Sigma_weekly_log)
w_MVP_weekly_log_trans <- MVP(mu_weekly_log_trans, Sigma_weekly_log_trans)
w_MVP_weekly_all <- cbind("MVP_lin"       = w_MVP_weekly_lin, 
                          "MVP_log"       = w_MVP_weekly_log, 
                          "MVP_log_trans" = w_MVP_weekly_log_trans)
barplot(t(w_MVP_weekly_all), col = rainbow8equal[1:3], legend = colnames(w_MVP_weekly_all), 
        main = "MVP allocation for weekly returns", xlab = "stocks", ylab = "dollars", beside = TRUE)

The allocations seem to be very similar.

Let’s look at the portfolios performance:

# compute weekly returns of portfolios
ret_GMVP_weekly_all_trn <- xts(X_weekly_lin_trn %*% w_GMVP_weekly_all, index(X_weekly_lin_trn))
ret_MVP_weekly_all_trn <- xts(X_weekly_lin_trn %*% w_MVP_weekly_all, index(X_weekly_lin_trn))

# compare them
StdDev.annualized(ret_GMVP_weekly_all_trn)
R>>                                GMVP_lin  GMVP_log GMVP_log_trans
R>> Annualized Standard Deviation 0.1495471 0.1495629      0.1495584
SharpeRatio(ret_MVP_weekly_all_trn, FUN = "StdDev")
R>>                                 MVP_lin   MVP_log MVP_log_trans
R>> StdDev Sharpe (Rf=0%, p=95%): 0.2223933 0.2138044       0.22141

Again, there is no significant difference for the GMVP. However, for the MVP (which involves also the expected return) there is some small difference.

R session: Linear vs log monthly returns

Let’s repeat the previous analysis but using monthly returns:

# change periodicity of prices
prices_monthly <- xts()
for (i in 1:ncol(prices))
  prices_monthly <- cbind(prices_monthly, Cl(to.monthly(prices[, i])))
colnames(prices_monthly) <- colnames(prices)
indexClass(prices_monthly) <- "Date"

# recompute returns
X_monthly_log <- CalculateReturns(prices_monthly, "log")[-1]
X_monthly_lin <- CalculateReturns(prices_monthly)[-1]
T_monthly <- nrow(X_monthly_log)  # number of months

# split data into training and test data
T_monthly_trn <- round(0.7*T_monthly)  # 70% of data
X_monthly_log_trn <- X_monthly_log[1:T_monthly_trn, ]
X_monthly_log_tst <- X_monthly_log[(T_monthly_trn+1):T_monthly, ]
X_monthly_lin_trn <- X_monthly_lin[1:T_monthly_trn, ]
X_monthly_lin_tst <- X_monthly_lin[(T_monthly_trn+1):T_monthly, ]

# estimate mu and Sigma
mu_monthly_lin <- colMeans(X_monthly_lin_trn)
Sigma_monthly_lin <- cov(X_monthly_lin_trn)

mu_monthly_log <- colMeans(X_monthly_log_trn)
Sigma_monthly_log <- cov(X_monthly_log_trn)

tmp <- momentsReturnLog2Lin(mu_monthly_log, Sigma_monthly_log)
mu_monthly_log_trans <- tmp$mu
Sigma_monthly_log_trans <- tmp$Sigma

# compute GMVPs
w_GMVP_monthly_lin <- GMVP(Sigma_monthly_lin)
w_GMVP_monthly_log <- GMVP(Sigma_monthly_log)
w_GMVP_monthly_log_trans <- GMVP(Sigma_monthly_log_trans)
w_GMVP_monthly_all <- cbind("GMVP_lin"       = w_GMVP_monthly_lin, 
                            "GMVP_log"       = w_GMVP_monthly_log, 
                            "GMVP_log_trans" = w_GMVP_monthly_log_trans)
barplot(t(w_GMVP_monthly_all), col = rainbow8equal[1:3], legend = colnames(w_GMVP_monthly_all), 
        main = "GMVP allocation for monthly returns", xlab = "stocks", ylab = "dollars", beside = TRUE)

# compute MVPs
w_MVP_monthly_lin <- MVP(mu_monthly_lin, Sigma_monthly_lin)
w_MVP_monthly_log <- MVP(mu_monthly_log, Sigma_monthly_log)
w_MVP_monthly_log_trans <- MVP(mu_monthly_log_trans, Sigma_monthly_log_trans)
w_MVP_monthly_all <- cbind("MVP_lin"       = w_MVP_monthly_lin, 
                           "MVP_log"       = w_MVP_monthly_log, 
                           "MVP_log_trans" = w_MVP_monthly_log_trans)
barplot(t(w_MVP_monthly_all), col = rainbow8equal[1:3], legend = colnames(w_MVP_monthly_all), 
        main = "MVP allocation for monthly returns", xlab = "stocks", ylab = "dollars", beside = TRUE)

The allocations seem to be very similar.

Let’s look at the portfolios performance:

# compute monthly returns of portfolios
ret_GMVP_monthly_all_trn <- xts(X_monthly_lin_trn %*% w_GMVP_monthly_all, index(X_monthly_lin_trn))
ret_MVP_monthly_all_trn <- xts(X_monthly_lin_trn %*% w_MVP_monthly_all, index(X_monthly_lin_trn))

# compare them
StdDev.annualized(ret_GMVP_monthly_all_trn)
R>>                                GMVP_lin  GMVP_log GMVP_log_trans
R>> Annualized Standard Deviation 0.1363082 0.1365655      0.1364513
SharpeRatio(ret_MVP_monthly_all_trn, FUN = "StdDev")
R>>                                 MVP_lin   MVP_log MVP_log_trans
R>> StdDev Sharpe (Rf=0%, p=95%): 0.6103515 0.6000131     0.5985821

Again, there is no significant difference.

Heuristic Portfolios

Heuristic portfolios

  • Heuristic portfolios are not formally derived from a sound mathematical foundation. Instead, they are intuitive and based on common sense.


  • We will explore the following simple and heuristic portfolios:
    • Buy & Hold (B&H)
    • Buy & Rebalance
    • equally weighted portfolio (EWP) or \(1/N\) portfolio
    • quintile portfolio
    • global maximum return portfolio (GMRP).

Buy & Hold (B&H)

  • The simplest investment strategy consists of selecting just one asset, allocating the whole budget \(B\) to it:

    • Buy & Hold (B&H): chooses one asset and sticks to it forever.
    • Buy & Rebalance: chooses one asset but reevaluates that choice regularly.
  • The belief behind such an investment is that the asset will gradually increase in value over the investment period.

  • There is no diversification in this strategy.

  • One can use different methods (like fundamental analysis or technical analysis) to make the choice.

  • Mathematically, it can be expressed as \[\mathbf{w} = \mathbf{e}_i\] where \(\mathbf{e}_i\) denotes the canonical vector with a 1 on the \(i\)th position and 0 elsewhere.

Buy & Hold (B&H)

Cumulative PnL of 9 possible B&H for 9 assets:

R session: Buy & Hold (B&H)

We will estimate \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\) from the in-sample log-returns:

mu <- colMeans(X_log_trn)
Sigma <- cov(X_log_trn)

Buy & Hold simply means that we allocate the whole budget to one stock and we stick to it. Since we have \(N=9\) stocks in our universe, we can define \(N=9\) different B&H portfolios, which we will store as column vectors.

# a B&H portfolio is trivially the zero vector with a one on the stock held
w_BnH <- diag(N)
rownames(w_BnH) <- colnames(X_lin)
colnames(w_BnH) <- paste0("B&H - ", colnames(X_lin))
w_BnH
R>>      B&H - AAPL B&H - AMD B&H - ADI B&H - ABBV B&H - AEZS B&H - A B&H - APD B&H - AA B&H - CF
R>> AAPL          1         0         0          0          0       0         0        0        0
R>> AMD           0         1         0          0          0       0         0        0        0
R>> ADI           0         0         1          0          0       0         0        0        0
R>> ABBV          0         0         0          1          0       0         0        0        0
R>> AEZS          0         0         0          0          1       0         0        0        0
R>> A             0         0         0          0          0       1         0        0        0
R>> APD           0         0         0          0          0       0         1        0        0
R>> AA            0         0         0          0          0       0         0        1        0
R>> CF            0         0         0          0          0       0         0        0        1

We can now compute the performance of those \(N=9\) portfolios in the training data with the package PerformanceAnalytics:

# compute returns of all B&H portfolios
ret_BnH <- xts(X_lin %*% w_BnH, index(X_lin))
ret_BnH_trn <- ret_BnH[1:T_trn, ]
ret_BnH_tst <- ret_BnH[-c(1:T_trn), ]
head(ret_BnH)
R>>              B&H - AAPL    B&H - AMD    B&H - ADI   B&H - ABBV   B&H - AEZS      B&H - A
R>> 2013-01-03 -0.012622240 -0.015810277 -0.016136502 -0.008257409  0.003952569  0.003581918
R>> 2013-01-04 -0.027854806  0.040160643 -0.017786948 -0.012632777  0.011811024  0.019747441
R>> 2013-01-07 -0.005882358  0.030888031  0.003057512  0.002035524  0.007782101 -0.007232493
R>> 2013-01-08  0.002691496  0.000000000 -0.010316326 -0.021764499 -0.015444015 -0.007990737
R>> 2013-01-09 -0.015628866 -0.014981273 -0.002606487  0.005636580  0.011764706  0.027007986
R>> 2013-01-10  0.012395942 -0.003802281  0.012114243  0.002949440 -0.003875969  0.007381675
R>>                B&H - APD     B&H - AA     B&H - CF
R>> 2013-01-03 -0.0034942112  0.008898612 -0.004728862
R>> 2013-01-04  0.0134405698  0.020948193  0.022398933
R>> 2013-01-07 -0.0009223002 -0.017278772 -0.003746079
R>> 2013-01-08  0.0018466463  0.000000000 -0.014660383
R>> 2013-01-09  0.0134809381 -0.002197690  0.034973559
R>> 2013-01-10 -0.0009096340 -0.012114520  0.015029411
# performance measures
library(PerformanceAnalytics)
t(table.AnnualizedReturns(ret_BnH_trn))
R>>            Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> B&H - AAPL            0.2629             0.2638                    0.9969
R>> B&H - AMD            -0.0773             0.4930                   -0.1568
R>> B&H - ADI             0.1517             0.2418                    0.6276
R>> B&H - ABBV            0.2167             0.2567                    0.8442
R>> B&H - AEZS           -0.7375             1.1854                   -0.6221
R>> B&H - A               0.0846             0.2221                    0.3808
R>> B&H - APD             0.2150             0.2057                    1.0448
R>> B&H - AA              0.0271             0.3045                    0.0891
R>> B&H - CF              0.1674             0.2838                    0.5901
t(table.AnnualizedReturns(ret_BnH_tst))
R>>            Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> B&H - AAPL            0.0375             0.2386                    0.1572
R>> B&H - AMD             3.2190             0.8017                    4.0153
R>> B&H - ADI             0.2000             0.2453                    0.8151
R>> B&H - ABBV            0.1532             0.3080                    0.4973
R>> B&H - AEZS           -0.3470             1.3677                   -0.2538
R>> B&H - A               0.2220             0.2329                    0.9532
R>> B&H - APD             0.1450             0.1917                    0.7561
R>> B&H - AA              0.2154             0.4755                    0.4531
R>> B&H - CF             -0.3103             0.5015                   -0.6188

Note how the in-sample (ex ante) performance is not maintained out-of-sample (ex post).

We can compute many other performance measures:

table.DownsideRisk(ret_BnH_trn)
R>> VaR calculation produces unreliable result (inverse risk) for column: 1 : -0.434364986114259
R>>                               B&H - AAPL B&H - AMD B&H - ADI B&H - ABBV B&H - AEZS B&H - A
R>> Semi Deviation                    0.0120    0.0221    0.0104     0.0120     0.0423  0.0099
R>> Gain Deviation                    0.0111    0.0231    0.0118     0.0095     0.0936  0.0091
R>> Loss Deviation                    0.0123    0.0237    0.0104     0.0117     0.0528  0.0093
R>> Downside Deviation (MAR=210%)     0.0162    0.0263    0.0149     0.0163     0.0471  0.0148
R>> Downside Deviation (Rf=0%)        0.0115    0.0220    0.0100     0.0115     0.0434  0.0097
R>> Downside Deviation (0%)           0.0115    0.0220    0.0100     0.0115     0.0434  0.0097
R>> Maximum Drawdown                  0.2586    0.6524    0.2639     0.2630     0.9845  0.2296
R>> Historical VaR (95%)             -0.0246   -0.0441   -0.0231    -0.0272    -0.0667 -0.0227
R>> Historical ES (95%)              -0.0370   -0.0744   -0.0324    -0.0377    -0.1512 -0.0304
R>> Modified VaR (95%)               -0.0264   -0.0474   -0.0192    -0.0273         NA -0.0217
R>> Modified ES (95%)                -0.0531   -0.0756   -0.0192    -0.0404    -0.0038 -0.0332
R>>                               B&H - APD B&H - AA B&H - CF
R>> Semi Deviation                   0.0087   0.0130   0.0120
R>> Gain Deviation                   0.0099   0.0145   0.0140
R>> Loss Deviation                   0.0081   0.0122   0.0117
R>> Downside Deviation (MAR=210%)    0.0134   0.0178   0.0165
R>> Downside Deviation (Rf=0%)       0.0083   0.0128   0.0116
R>> Downside Deviation (0%)          0.0083   0.0128   0.0116
R>> Maximum Drawdown                 0.1984   0.5359   0.3573
R>> Historical VaR (95%)            -0.0196  -0.0267  -0.0263
R>> Historical ES (95%)             -0.0268  -0.0417  -0.0386
R>> Modified VaR (95%)              -0.0170  -0.0276  -0.0222
R>> Modified ES (95%)               -0.0191  -0.0347  -0.0222

To compute the wealth or cumulative P&L, we have two options: one assumes the same quantity is repeatedly invested, whereas the other assumes reinvesting (compounding):

# compute cumulative wealth
wealth_arith_BnH_trn <- 1 + cumsum(ret_BnH_trn)  # initial budget of 1$
wealth_geom_BnH_trn <- cumprod(1 + ret_BnH_trn)  # initial budget of 1$

# plots
# same as: 
#   plot(wealth_arith_BnH_trn[, 1], main = "Buy & Hold performance (not compounded)", ylab = "wealth")
chart.CumReturns(ret_BnH_trn[, 1], main = "Buy & Hold performance (not compounded)", 
                 geometric = FALSE, wealth.index = TRUE)

# same as: 
#   plot(wealth_geom_BnH_trn[, 1], main = "Buy & Hold performance (compounded)", ylab = "wealth")
chart.CumReturns(ret_BnH_trn[, 1], main = "Buy & Hold performance (compounded)", 
                 geometric = TRUE, wealth.index = TRUE)

We can plot many more plots with the package PerformanceAnalytics:

# more plots
chart.CumReturns(ret_BnH, main = "Buy & Hold performance", 
                 wealth.index = TRUE, legend.loc = "topleft", colorset = rich10equal)

charts.PerformanceSummary(ret_BnH_trn, main = "Buy & Hold performance", 
                          wealth.index = TRUE, colorset = rich10equal)

chart.Boxplot(ret_BnH_trn)

chart.RiskReturnScatter(ret_BnH_trn, symbolset = 21, bg = "red")

Equally weighted portfolio (EWP) or \(1/N\) portfolio

  • One of the most important goals of quantitative portfolio management is to achieve diversification across the different assets in a portfolio.

  • A simple way to achieve diversification is by allocating the capital equally across all the assets.

  • This strategy is called equally weighted portfolio (EWP), \(1/N\) portfolio, uniform portfolio, or maximum deconcentration portfolio: \[\mathbf{w} = \frac{1}{N}\mathbf{1}.\]

  • It has been called “Talmudic rule” (Duchin and Levy 2009) since the Babylonian Talmud recommended this strategy approximately 1,500 years ago: “A man should always place his money, one third in land, a third in merchandise, and keep a third in hand.”

  • It has gained much interest due to its superior historical performance and the emergence of several equally weighted ETFs (DeMiguel et al. 2009). For example, Standard & Poor’s has developed many S&P 500 equal weighted indices.

Empirical performance of \(1/N\) portfolio

The following table is from (DeMiguel et al. 2009) showing the superiority of the \(1/N\) portfolio:

Quintile Portfolio

  • The quintile portfolio is widely used by practitioners.

  • Two types: long-only quintile portfolio and long-short quintile portfolio.

  • Basic idea: 1) rank the \(N\) stocks according to some criterion, 2) divide them into five parts, and 3) long the top part (and possibly short the bottom part).

  • One can rank the stocks in a multitude of ways (typically based on expensive factors that investment funds buy at a premium price).

  • If we restrict ourselves to price data, three common possible rankings are according to:

    1. \(\quad\boldsymbol{\mu}\)
    2. \(\dfrac{\boldsymbol{\mu}}{\textsf{diag}(\boldsymbol{\Sigma})}\)
    3. \(\dfrac{\boldsymbol{\mu}}{\sqrt{\textsf{diag}(\boldsymbol{\Sigma})}}\)

Global maximum return portfolio (GMRP)

  • Another simple way to make an investment from the \(N\) assets is to invest only in the one with the highest return.

  • Mathematically, the global maximum return portfolio (GMRP) is formulated as \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1, \quad \mathbf{w}\ge\mathbf{0}. \end{array}\]

  • This problem is convex and can be optimally solved, but of course the solution is trivial: to allocate all the budget to the asset with maximum return.

  • However, this seemingly good portfolio lacks diversification and performs poorly because past performance is not a guarantee of future performance.

  • In addition, the estimation of \(\boldsymbol{\mu}\) is extremely noisy in practice (Chopra and Ziemba 1993).

Example

Dollar allocation for EWP, QuintP, and GMRP:

R session: Comparison of \(1/N\) portfolio, quintile portfolio, and GMRP

We are now ready to consider several portfolio designs and compare their performance (the underlying constraint is simply \(\mathbf{w}\ge\mathbf{0}\) and \(\mathbf{1}^T\mathbf{w}=1\)).

The EWP or \(1/N\) portfolio allocates equal dollar weight to each stock: \(\mathbf{w} = \frac{1}{N}\mathbf{1}\).

w_EWP <- rep(1/N, N)
names(w_EWP) <- colnames(X_lin)
w_EWP
R>>      AAPL       AMD       ADI      ABBV      AEZS         A       APD        AA        CF 
R>> 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111

Quintile portfolios are widely used by practitioners. The idea is to 1) rank the \(N\) stocks, 2) divide them into five parts, and 3) long the top part (and possibly short the bottom part). One can rank the stocks in a multitude of ways (typically based on expensive factors that investment funds buy at a premium price). For our experiments, we will consider three possible rankings according to:

  1. \(\boldsymbol{\mu}\)
  2. \(\boldsymbol{\mu}/\textsf{diag}(\boldsymbol{\Sigma})\)
  3. \(\boldsymbol{\mu}/\sqrt{\textsf{diag}(\boldsymbol{\Sigma})}\)
# find indices of sorted stocks
i1 <- sort(mu, decreasing = TRUE, index.return = TRUE)$ix
i2 <- sort(mu/diag(Sigma), decreasing = TRUE, index.return = TRUE)$ix
i3 <- sort(mu/sqrt(diag(Sigma)), decreasing = TRUE, index.return = TRUE)$ix

# create portfolios
w_QuintP_1 <- w_QuintP_2 <- w_QuintP_3 <- rep(0, N)
w_QuintP_1[i1[1:round(N/5)]] <- 1/round(N/5)
w_QuintP_2[i2[1:round(N/5)]] <- 1/round(N/5)
w_QuintP_3[i3[1:round(N/5)]] <- 1/round(N/5)
w_QuintP <- cbind("QuintP (mu)"        = w_QuintP_1, 
                  "QuintP (mu/sigma2)" = w_QuintP_2, 
                  "QuintP (mu/sigma)"  = w_QuintP_3)
rownames(w_QuintP) <- colnames(X_lin)
w_QuintP
R>>      QuintP (mu) QuintP (mu/sigma2) QuintP (mu/sigma)
R>> AAPL         0.5                0.5               0.5
R>> AMD          0.0                0.0               0.0
R>> ADI          0.0                0.0               0.0
R>> ABBV         0.5                0.0               0.0
R>> AEZS         0.0                0.0               0.0
R>> A            0.0                0.0               0.0
R>> APD          0.0                0.5               0.5
R>> AA           0.0                0.0               0.0
R>> CF           0.0                0.0               0.0

The global maximum return portfolio (GMRP) chooses the stock with the highest return during the in-sample period:

i_max <- which.max(mu)
w_GMRP <- rep(0, N)
w_GMRP[i_max] <- 1
names(w_GMRP) <- colnames(X_lin)
w_GMRP
R>> AAPL  AMD  ADI ABBV AEZS    A  APD   AA   CF 
R>>    1    0    0    0    0    0    0    0    0

We can now compare the allocations of the portfolios:

# put together all portfolios
w_heuristic <- cbind("EWP" = w_EWP, w_QuintP, "GMRP" = w_GMRP)
round(w_heuristic, digits = 2)
R>>       EWP QuintP (mu) QuintP (mu/sigma2) QuintP (mu/sigma) GMRP
R>> AAPL 0.11         0.5                0.5               0.5    1
R>> AMD  0.11         0.0                0.0               0.0    0
R>> ADI  0.11         0.0                0.0               0.0    0
R>> ABBV 0.11         0.5                0.0               0.0    0
R>> AEZS 0.11         0.0                0.0               0.0    0
R>> A    0.11         0.0                0.0               0.0    0
R>> APD  0.11         0.0                0.5               0.5    0
R>> AA   0.11         0.0                0.0               0.0    0
R>> CF   0.11         0.0                0.0               0.0    0
barplot(t(w_heuristic), col = rainbow8equal[1:5], legend = colnames(w_heuristic), beside = TRUE,
        main = "Portfolio allocation of heuristic portfolios", xlab = "stocks", ylab = "dollars")

Then we can compare the performance (in-sample vs out-of-sample):

# compute returns of all portfolios
ret_heuristic <- xts(X_lin %*% w_heuristic, index(X_lin))
ret_heuristic$`QuintP (mu/sigma2)` <- NULL  # remove since it coincides with "QuintP (mu/sigma)"
ret_heuristic_trn <- ret_heuristic[1:T_trn, ]
ret_heuristic_tst <- ret_heuristic[-c(1:T_trn), ]

# performance
t(table.AnnualizedReturns(ret_heuristic_trn))
R>>                   Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> EWP                          0.0405             0.2059                    0.1967
R>> QuintP (mu)                  0.2573             0.1988                    1.2945
R>> QuintP (mu/sigma)            0.2518             0.1870                    1.3461
R>> GMRP                         0.2629             0.2638                    0.9969
t(table.AnnualizedReturns(ret_heuristic_tst))
R>>                   Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> EWP                          0.3397             0.2770                    1.2265
R>> QuintP (mu)                  0.1103             0.2142                    0.5151
R>> QuintP (mu/sigma)            0.0988             0.1748                    0.5652
R>> GMRP                         0.0375             0.2386                    0.1572

Let’s plot the wealth evolution (cumulative PnL) over time:

{ chart.CumReturns(ret_heuristic, main = "Cumulative return of heuristic portfolios", 
                   wealth.index = TRUE, legend.loc = "topleft", colorset = rich8equal)
  addEventLines(xts("training", index(X_lin[T_trn])), srt=90, pos=2, lwd = 2, col = "darkblue") }

charts.PerformanceSummary(ret_heuristic, main = "Performance of heuristic portfolios", 
                          wealth.index = TRUE, colorset = rich8equal)

Finally, we can plot the risk-return scatter plot:

chart.RiskReturnScatter(ret_heuristic_trn, symbolset = 21, bg = "red",
                        main = "Annualized Return and Risk (in-sample)")

Markowitz’s Modern Portfolio Theory (MPT)

Risk control

  • In finance, the expected return \(\mathbf{w}^{T}\boldsymbol{\mu}\) is very relevant as it quantifies the average benefit.


  • However, in practice, the average performance is not enough to characterize an investment and one needs to control the probability of going bankrupt.


  • Risk measures control how risky an investment strategy is.


  • The most basic measure of risk is given by the variance (Markowitz 1952): a higher variance means the return distribution is more spread out, so large losses are more likely.


  • There are more sophisticated risk measures such as downside risk, VaR, ES, etc.

Mean-variance portfolio (MVP)

Mean-variance tradeoff

  • The mean return \(\mathbf{w}^{T}\boldsymbol{\mu}\) and the variance (risk) \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\) (equivalently, the standard deviation or volatility \(\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\)) constitute two important performance measures.


  • Usually, the higher the mean return the higher the variance and vice-versa.


  • Thus, we are faced with two objectives to be optimized: it is a multi-objective optimization problem.


  • They define a fundamental mean-variance tradeoff curve (Pareto curve).


  • The choice of a specific point in this tradeoff curve depends on how aggressive or risk-averse the investor is.

Mean-variance tradeoff

Markowitz’s mean-variance portfolio (1952)

  • The idea of Markowitz’s mean-variance portfolio (MVP) (Markowitz 1952) is to find a trade-off between the expected return \(\mathbf{w}^{T}\boldsymbol{\mu}\) and the risk of the portfolio measured by the variance \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}-\lambda\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1 \end{array}\] where \(\mathbf{w}^{T}\mathbf{1}=1\) is the capital budget constraint and \(\lambda\) is a parameter that controls how risk-averse the investor is.

  • This is also referred to as Modern Portfolio Theory (MPT).

  • This is a convex quadratic problem (QP) with only one linear constraint which admits a closed-form solution: \[\mathbf{w}_{\sf MVP} = \frac{1}{2\lambda}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{\mu}+\nu\mathbf{1}\right),\] where \(\nu\) is the optimal dual variable \(\nu=\frac{2\lambda-\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{1}}\).
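This closed-form solution is straightforward to code (a minimal sketch, assuming mu and Sigma have already been estimated, e.g., from the training data):

# sketch: closed-form MVP (budget constraint only, shorting allowed)
MVP_closed_form <- function(mu, Sigma, lmd = 0.5) {
  Sigma_inv_mu <- solve(Sigma, mu)                  # Sigma^{-1} mu
  Sigma_inv_1  <- solve(Sigma, rep(1, length(mu)))  # Sigma^{-1} 1
  nu <- (2*lmd - sum(Sigma_inv_mu)) / sum(Sigma_inv_1)
  w <- (1/(2*lmd)) * (Sigma_inv_mu + nu*Sigma_inv_1)
  return(w)
}
# sanity check: sum(MVP_closed_form(mu, Sigma)) should return 1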

Markowitz’s mean-variance portfolio (1952)

  • There are two obvious alternative reformulations of Markowitz’s portfolio.

  • Maximization of mean return: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\leq\alpha\\ & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]

  • Minimization of risk: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{w}^{T}\boldsymbol{\mu}\geq\beta\\ & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]

  • The three formulations give different points on the Pareto optimal curve.

  • They all require choosing one parameter (\(\alpha\), \(\beta\), or \(\lambda\)).

  • By sweeping over this parameter, one can recover the whole Pareto optimal curve.

Efficient frontier

The previous three problems result in the same mean-variance trade-off curve (Pareto curve):
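The curve can be traced numerically by sweeping the risk-aversion parameter \(\lambda\) (a sketch reusing the illustrative MVP_closed_form() defined earlier, with mu and Sigma assumed available):

# sketch: trace the efficient frontier by sweeping lambda
lmd_sweep <- 10^seq(-1, 2, length.out = 50)
frontier <- t(sapply(lmd_sweep, function(lmd) {
  w <- MVP_closed_form(mu, Sigma, lmd)
  c(vol = sqrt(as.numeric(t(w) %*% Sigma %*% w)), ret = sum(w*mu))
}))
plot(frontier[, "vol"], frontier[, "ret"], type = "l",
     xlab = "volatility", ylab = "expected return", main = "Efficient frontier")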

Markowitz’s portfolio with practical constraints

  • A general Markowitz’s portfolio with practical constraints could be: \[\begin{array}{lll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}-\lambda\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w} & \\ \textsf{subject to} & \mathbf{w}^{T}\mathbf{1}=1 & \text{budget}\\ & \mathbf{w}\ge\mathbf{0} & \text{no shorting}\\ & \left\Vert \mathbf{w}\right\Vert _{1}\leq\gamma & \text{leverage}\\ & \left\Vert \mathbf{w}-\mathbf{w}_{0}\right\Vert _{1}\leq\tau & \text{turnover}\\ & \left\Vert \mathbf{w}\right\Vert _{\infty}\leq u & \text{max position}\\ & \left\Vert \mathbf{w}\right\Vert _{0}\leq K & \text{sparsity} \end{array}\] where:
    • \(\gamma\geq1\) controls the amount of shorting and leveraging
    • \(\tau>0\) controls the turnover (to control the transaction costs in the rebalancing)
    • \(u\) limits the position in each stock
    • \(K\) controls the cardinality of the portfolio (to select a small set of stocks from the universe).
  • Without the sparsity constraint, the problem can be rewritten as a QP.
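
As a sketch of how such a design could be coded with CVXR (used throughout the R sessions below), keeping the budget, leverage, turnover, and max-position constraints (the no-shorting and the nonconvex sparsity constraints are left out here); the values of gamma, tau, u, and the previous portfolio w0 are hypothetical placeholders:

library(CVXR)

# Markowitz portfolio with budget, leverage, turnover, and max-position
# constraints (sketch; gamma, tau, and u are illustrative values)
MVP_constrained <- function(mu, Sigma, w0, lmd = 0.5, gamma = 1.5, tau = 0.5, u = 0.3) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Maximize(t(mu) %*% w - lmd*quad_form(w, Sigma)),
                  constraints = list(sum(w) == 1,                   # budget
                                     cvxr_norm(w, 1) <= gamma,      # leverage
                                     cvxr_norm(w - w0, 1) <= tau,   # turnover
                                     max_entries(abs(w)) <= u))     # max position
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}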

Markowitz’s MVP as a regression

  • Interestingly, the MVP formulation can be interpreted as a regression.

  • The key observation is that the variance of the portfolio can be seen as an \(\ell_2\)-norm error term: \[ \begin{aligned} \mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w} &= \mathbf{w}^T\textsf{E}\left[(\mathbf{r}_t-\boldsymbol{\mu})(\mathbf{r}_t-\boldsymbol{\mu})^T\right]\mathbf{w}\\ &= \textsf{E}\left[(\mathbf{w}^T(\mathbf{r}_t-\boldsymbol{\mu}))^2\right]\\ &= \textsf{E}\left[(\mathbf{w}^T\mathbf{r}_t-\rho)^2\right] \end{aligned} \] where \(\rho = \mathbf{w}^T\boldsymbol{\mu}\).

  • The sample approximation of the expected value is \[ \textsf{E}\left[(\mathbf{w}^T\mathbf{r}_t - \rho)^2\right] \approx \frac{1}{T}\sum_{t=1}^T (\mathbf{w}^T\mathbf{r}_t - \rho)^2 = \frac{1}{T}\|\mathbf{R}\mathbf{w} - \rho\mathbf{1}\|^2 \] where \(\mathbf{R} \triangleq \left[\mathbf{r}_1,\ldots,\mathbf{r}_T \right]^T\); the constant factor \(1/T\) can be absorbed into \(\lambda\) and is dropped in what follows.

Markowitz’s MVP as a regression

  • Let’s start by rewriting the MVP as a minimization: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w} - \frac{1}{\lambda}\mathbf{w}^{T}\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1 \end{array}\]

  • Thus, we can finally write the MVP as the regression \[\begin{array}{ll} \underset{\mathbf{w},\rho}{\textsf{minimize}} & \|\mathbf{R}\mathbf{w} - \rho\mathbf{1}\|^2 - \frac{1}{\lambda}\rho\\ \textsf{subject to} & \rho = \mathbf{w}^T\boldsymbol{\mu}\\ & \mathbf{1}^T\mathbf{w}=1\\ \end{array}\]

  • If instead of the mean-variance formulation, we fix the expected return to some value and minimize the variance, then \(\rho\) simply becomes a fixed value rather than a variable.

  • Intuitively, this reformulation tries to achieve the expected return \(\rho\) with minimum variance in the \(\ell_2\)-norm sense.

R session: Markowitz’s MVP

Markowitz’s mean-variance portfolio (MVP) with no shorting is formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \boldsymbol{\mu}^T\mathbf{w} -\lambda\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]

This problem does not have a closed-form solution and we need to resort to a solver. It is very convenient to use the package CVXR (although the computational cost is higher and the solution less robust than with a dedicated QP solver such as quadprog, which should be used if necessary):

library(CVXR)

# create function for MVP
MVP <- function(mu, Sigma, lmd = 0.5) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Maximize(t(mu) %*% w - lmd*quad_form(w, Sigma)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  w <- as.vector(result$getValue(w))
  names(w) <- colnames(Sigma)
  return(w)
}

# this function can now be used as
w_MVP <- MVP(mu, Sigma, lmd = 2)

Global minimum variance portfolio (GMVP)

Global minimum variance portfolio (GMVP)

  • Recall the risk minimization formulation: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{w}^{T}\boldsymbol{\mu}\geq\beta\\ & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]

  • The global minimum variance portfolio (GMVP) ignores the expected return and focuses on the risk only: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]

  • It is a simple convex QP with solution \[\mathbf{w}_{\sf GMVP}=\frac{1}{\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{1}}\boldsymbol{\Sigma}^{-1}\mathbf{1}.\]

  • It is widely used in academic papers for simplicity of evaluation and comparison of different estimators of the covariance matrix \(\boldsymbol{\Sigma}\) (while ignoring the estimation of \(\boldsymbol{\mu}\)).
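
A minimal R sketch of the closed-form solution above (the function name is ours):

# closed-form GMVP (sketch): w = Sigma^{-1} 1 / (1^T Sigma^{-1} 1)
GMVP_closed_form <- function(Sigma) {
  w <- solve(Sigma, rep(1, nrow(Sigma)))  # Sigma^{-1} 1
  w <- w / sum(w)                         # normalize by 1^T Sigma^{-1} 1
  return(w)
}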

GMVP with leverage constraints

  • The GMVP is typically considered with no-short constraints \(\mathbf{w}\geq\mathbf{0}\): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1, \;\mathbf{w}\geq\mathbf{0}. \end{array}\]

  • However, if short-selling is allowed, one needs to limit the amount of leverage to avoid impractical solutions with very large positive and negative weights that cancel out.

  • A sensible GMVP formulation with leverage is \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1, \;\left\Vert \mathbf{w}\right\Vert _{1}\leq\gamma \end{array}\] where \(\gamma\geq1\) is a parameter that controls the amount of leverage:

    • \(\gamma=1\) means no shorting (so equivalent to \(\mathbf{w}\geq\mathbf{0}\))
    • \(\gamma>1\) allows some shorting as well as leverage in the longs, e.g., \(\gamma=1.5\) would allow the portfolio \(\mathbf{w}=\left(1.25,-0.25\right)\).
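
A sketch of this leverage-constrained formulation with CVXR (the leverage level gamma is an illustrative choice):

library(CVXR)

# GMVP with leverage constraint ||w||_1 <= gamma (sketch)
GMVP_leverage <- function(Sigma, gamma = 1.5) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Minimize(quad_form(w, Sigma)),
                  constraints = list(sum(w) == 1,
                                     cvxr_norm(w, 1) <= gamma))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}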

R session: GMVP

The Global Minimum Variance Portfolio (GMVP) with no shorting is formulated as \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0} \end{array} \]

Since a closed-form solution does not exist with the constraint \(\mathbf{w}\ge\mathbf{0}\), we need to resort to a solver. We can conveniently use the package CVXR (although the computational cost is higher and the solution less robust than with a dedicated QP solver such as quadprog, which should be used if necessary):

library(CVXR)

# create function for GMVP
GMVP <- function(Sigma) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Minimize(quad_form(w, Sigma)), 
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  w <- as.vector(result$getValue(w))
  names(w) <- colnames(Sigma)
  return(w)
}

# this function can now be used as
w_GMVP <- GMVP(Sigma)

Maximum Sharpe ratio portfolio (MSRP)

Maximum Sharpe ratio portfolio (MSRP)

  • Markowitz’s mean-variance framework provides portfolios along the Pareto-optimal frontier and the choice depends on the risk-aversion of the investor.

  • But typically one measures an investment with the Sharpe ratio: only one portfolio on the Pareto-optimal frontier achieves the maximum Sharpe ratio.

  • Precisely, Sharpe (1966) first proposed the maximization of the Sharpe ratio: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \dfrac{\mathbf{w}^{T}\boldsymbol{\mu}-r_{f}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1, \quad\left(\mathbf{w}\geq\mathbf{0}\right) \end{array}\] where \(r_{f}\) is the return of a risk-free asset.

  • However, this problem is not convex!

  • This problem belongs to the class of fractional programming (FP), for which many resolution methods are available.

Interlude: Fractional Programming (FP)

  • Fractional programming (FP) is a family of optimization problems involving ratios.
  • Its history can be traced back to a paper on economic expansion by von Neumann (1937).
  • It has since inspired studies in economics, management science, optics, information theory, communication systems, graph theory, computer science, etc.
  • Given functions \(f(\mathbf{x})\ge0\) and \(g(\mathbf{x})>0\), a single-ratio FP is \[\begin{array}{ll} \underset{\mathbf{x}}{\textsf{maximize}} & \dfrac{f(\mathbf{x})}{g(\mathbf{x})}\\ \textsf{subject to} & \mathbf{x}\in\cal{X}. \end{array}\]
  • FP has been widely studied and extended to deal with multiple ratios such as \[\begin{array}{ll} \underset{\mathbf{x}}{\textsf{maximize}} & \min_i\dfrac{f_i(\mathbf{x})}{g_i(\mathbf{x})}\\ \textsf{subject to} & \mathbf{x}\in\cal{X}. \end{array}\]

Interlude: How to solve FP

  • FPs are nonconvex problems, so in principle they are tough to solve (Stancu-Minasian 1992).

  • However, the so-called concave-convex FP can be easily solved in different ways.

  • We will focus on the concave-convex single-ratio FP: \[\begin{array}{ll} \underset{\mathbf{x}}{\textsf{maximize}} & \dfrac{f(\mathbf{x})}{g(\mathbf{x})}\\ \textsf{subject to} & \mathbf{x}\in\cal{X}. \end{array}\] where \(f(\mathbf{x})\ge0\) is concave and \(g(\mathbf{x})>0\) is convex.

  • Main approaches:

    • via bisection method (aka sandwich technique)
    • via Dinkelbach transform
    • via Schaible transform (Charnes-Cooper transform for the linear FP case)

Interlude: Solving FP via bisection

  • The idea is to realize that a concave-convex FP, while not convex, is quasi-convex.

  • This can be easily seen by rewriting the problem in epigraph form: \[\begin{array}{ll} \underset{\mathbf{x},t}{\textsf{maximize}} & t\\ \textsf{subject to} & t \le \dfrac{f(\mathbf{x})}{g(\mathbf{x})}\\ & \mathbf{x}\in\cal{X}. \end{array}\]

  • If we now fix the variable \(t\) to some value (so it is no longer a variable), then the constraint can be rewritten as \(t g(\mathbf{x}) \le f(\mathbf{x})\), which is convex (recall \(g(\mathbf{x})>0\)), and checking whether that value of \(t\) is achievable becomes a convex feasibility problem: \[\begin{array}{ll} \underset{\;}{\textsf{find}} & \mathbf{x}\\ \textsf{subject to} & t g(\mathbf{x}) \le f(\mathbf{x})\\ & \mathbf{x}\in\cal{X}. \end{array}\]

  • At this point, one can easily solve this convex problem optimally with a solver and then solve for \(t\) via the bisection algorithm (aka sandwich technique). It converges to the global optimal solution.

Interlude: Solving FP via bisection

  • Recall the quasi-convex problem we want to solve: \[\begin{array}{ll} \underset{\mathbf{x},t}{\textsf{maximize}} & t\\ \textsf{subject to} & t g(\mathbf{x}) \le f(\mathbf{x})\\ & \mathbf{x}\in\cal{X}. \end{array}\]

Algorithm 1: Bisection method (aka sandwich technique)

Given upper- and lower-bounds on \(t\): \(t^{\textsf{ub}}\) and \(t^{\textsf{lb}}.\)
1. compute mid-point: \(t=\left(t^{\textsf{ub}}+t^{\textsf{lb}}\right)/2\)
2. solve the following feasibility problem: \[\begin{array}{ll} \underset{\;}{\textsf{find}} & \mathbf{x}\\ \textsf{subject to} & t g(\mathbf{x}) \le f(\mathbf{x})\\ & \mathbf{x}\in\cal{X}. \end{array}\]
3. if feasible, then set \(t^{\textsf{lb}}=t\); otherwise set \(t^{\textsf{ub}}=t\)
4. if \(t^{\textsf{ub}}-t^{\textsf{lb}}>\epsilon\) go to step 1; otherwise finish.

Interlude: Solving FP via Dinkelbach transform

  • The Dinkelbach transform was proposed in (Dinkelbach 1967).

  • It reformulates the original concave-convex FP problem \[\begin{array}{ll} \underset{\mathbf{x}}{\textsf{maximize}} & \dfrac{f(\mathbf{x})}{g(\mathbf{x})}\\ \textsf{subject to} & \mathbf{x}\in\cal{X}. \end{array}\] as the convex problem \[\begin{array}{ll} \underset{\mathbf{x}}{\textsf{maximize}} & f(\mathbf{x}) - y g(\mathbf{x})\\ \textsf{subject to} & \mathbf{x}\in\cal{X} \end{array}\] with a new auxiliary variable \(y\), which is iteratively updated by \[y^{(k)} = \frac{f(\mathbf{x}^{(k)})}{g(\mathbf{x}^{(k)})}\] where \(k\) is the iteration index.

Interlude: Solving FP via Dinkelbach transform

Algorithm 2: Dinkelbach method

Set \(k=0\) and initialize \(\mathbf{x}^{(0)}\)
repeat

  • Set \(y^{(k)} = \frac{f(\mathbf{x}^{(k)})}{g(\mathbf{x}^{(k)})}\)

  • Obtain next point \(\mathbf{x}^{(k+1)}\) by solving \[\begin{array}{ll} \underset{\mathbf{x}}{\textsf{maximize}} & f(\mathbf{x}) - y^{(k)} g(\mathbf{x})\\ \textsf{subject to} & \mathbf{x}\in\cal{X} \end{array}\]

  • \(k \gets k+1\)

until convergence
return \(\mathbf{x}^{(k)}\)

  • The Dinkelbach method can be shown to converge to the global optimum of the original concave-convex FP by carefully analyzing the monotonic increase of \(y^{(k)}\) and the function \(F(y) = \max_{\mathbf{x}\in\cal{X}}\; f(\mathbf{x}) - y g(\mathbf{x})\).

Interlude: Solving Linear FP via Charnes-Cooper transform

  • The Charnes-Cooper transform was proposed in (Charnes and Cooper 1962) to solve the linear FP (LFP) case (Bajalinov 2003): \[\begin{array}{ll} \underset{\mathbf{x}}{\textsf{maximize}} & \dfrac{\mathbf{c}^T\mathbf{x}+\alpha}{\mathbf{d}^T\mathbf{x}+\beta}\\ \textsf{subject to} & \mathbf{A}\mathbf{x}\le\mathbf{b} \end{array}\] over the set \(\{\mathbf{x} \mid \mathbf{d}^T\mathbf{x}+\beta > 0\}\).

  • The Charnes-Cooper transform introduces two new variables \[ \mathbf{y} = \frac{1}{\mathbf{d}^T\mathbf{x}+\beta}\mathbf{x}, \quad t = \frac{1}{\mathbf{d}^T\mathbf{x}+\beta}. \]

Interlude: Solving Linear FP via Charnes-Cooper transform

  • The LFP becomes then a linear program (LP): \[\begin{array}{ll} \underset{\mathbf{y},t}{\textsf{maximize}} & \mathbf{c}^T\mathbf{y}+\alpha t\\ \textsf{subject to} & \mathbf{A}\mathbf{y}\le\mathbf{b}t\\ & \mathbf{d}^T\mathbf{y}+\beta t = 1\\ & t\ge0\\ \end{array}\]

  • The solution for \(\mathbf{y}\) and \(t\) yields the solution of the original problem as \[\mathbf{x} = \frac{1}{t}\mathbf{y}.\]

  • Note that the number of constraints in the LP formulation has increased.
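
To make the transform concrete, here is a toy numerical sketch using the lpSolve package; the instance (maximize \((x_1+2x_2)/(x_1+x_2+1)\) over \(\mathbf{0}\le\mathbf{x}\le\mathbf{1}\)) is made up for illustration:

library(lpSolve)

# LP variables are z = (y1, y2, t); lpSolve assumes z >= 0 by default
obj <- c(1, 2, 0)               # (c, alpha) with c = (1,2), alpha = 0
con <- rbind(c(1, 0, -1),       # y1 - t <= 0  (from x1 <= 1)
             c(0, 1, -1),       # y2 - t <= 0  (from x2 <= 1)
             c(1, 1,  1))       # d^T y + beta*t = 1, with d = (1,1), beta = 1
sol <- lp("max", obj, con, c("<=", "<=", "="), c(0, 0, 1))
z <- sol$solution
x_opt <- z[1:2] / z[3]          # recover x = y/t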

Interlude: Solving Linear FP via Charnes-Cooper transform

Proof:

  • Any feasible point \(\mathbf{x}\) in the original LFP leads to a feasible point \((\mathbf{y},t)\) in the LP (via the equations in the Charnes-Cooper transform) with the same objective value. The objective is \[ \frac{\mathbf{c}^T\mathbf{x}+\alpha}{\mathbf{d}^T\mathbf{x}+\beta} = \frac{\mathbf{c}^T\mathbf{y}/t+\alpha}{\mathbf{d}^T\mathbf{y}/t+\beta} = \frac{\mathbf{c}^T\mathbf{y}+\alpha t}{\mathbf{d}^T\mathbf{y}+\beta t} \] and since it is scale invariant, we can choose to set the denominator equal to 1.

  • Conversely, any feasible point \((\mathbf{y},t)\) in the LP leads to a feasible point \(\mathbf{x}\) in the original LFP (via \(\mathbf{x} = \frac{1}{t}\mathbf{y}\)). From the denominator constraint \[ \mathbf{d}^T\mathbf{y}+\beta t = 1 \Longleftrightarrow \mathbf{d}^T\mathbf{y}/t+\beta = 1/t \Longleftrightarrow 1/(\mathbf{d}^T\mathbf{x}+\beta) = t \] which leads to the objective \[ \mathbf{c}^T\mathbf{y}+\alpha t = t (\mathbf{c}^T\mathbf{y}/t+\alpha) = t (\mathbf{c}^T\mathbf{x}+\alpha) = (\mathbf{c}^T\mathbf{x}+\alpha)/(\mathbf{d}^T\mathbf{x}+\beta). \]

Interlude: Solving FP via Schaible transform

  • The Schaible transform is a generalization of the Charnes-Cooper transform proposed in (Schaible 1974) to solve the concave-convex FP: \[\begin{array}{ll} \underset{\mathbf{x}}{\textsf{maximize}} & \dfrac{f(\mathbf{x})}{g(\mathbf{x})}\\ \textsf{subject to} & \mathbf{x}\in\cal{X}. \end{array}\]

  • The Schaible transform introduces two new variables (note that \(\mathbf{x} = \mathbf{y}/t\)): \[ \mathbf{y} = \frac{\mathbf{x}}{g(\mathbf{x})}, \quad t = \frac{1}{g(\mathbf{x})}. \]

  • The original concave-convex FP is equivalent to the convex problem: \[\begin{array}{ll} \underset{\mathbf{y},t}{\textsf{maximize}} & t f\left(\frac{\mathbf{y}}{t}\right)\\ \textsf{subject to} & t g\left(\frac{\mathbf{y}}{t}\right) \le 1\\ & t \ge 0\\ & \mathbf{y}/t\in\cal{X}. \end{array}\]

Solving the Maximum Sharpe ratio portfolio (MSRP)

  • Recall the maximization of the Sharpe ratio proposed in (Sharpe 1966): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \dfrac{\mathbf{w}^{T}\boldsymbol{\mu}-r_{f}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1, \quad\left(\mathbf{w}\geq\mathbf{0}\right). \end{array}\]



  • This problem is nonconvex, but once it is recognized as a fractional program (FP), we can consider its resolution via:
    • bisection method (aka sandwich technique),
    • Dinkelbach transform, and
    • Schaible transform (aka Charnes-Cooper transform for the linear FP case).

R session: Maximum Sharpe ratio portfolio via a general-purpose nonlinear solver

The maximum Sharpe ratio portfolio (MSRP) is given by the solution to the nonconvex problem \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \dfrac{\mathbf{w}^{T}\boldsymbol{\mu}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1, \quad\mathbf{w}\geq\mathbf{0}. \end{array}\]

We will solve this problem with the general-purpose nonlinear solver nloptr in R:

library(nloptr)

# define the nonconvex objective function (note: nloptr::slsqp() minimizes
# its objective by default, so passing fn_SR directly will actually
# minimize the Sharpe ratio)
fn_SR <- function(w) {
  return(as.numeric(t(w) %*% mu / sqrt(t(w) %*% Sigma %*% w)))
}
  
# initial point
w0 <- rep(1/N, N)

res <- nloptr::slsqp(w0, fn_SR,
                     lower = rep(0, N),  # w >= 0
                     heq = function(w) return(sum(w) - 1))    # sum(w) = 1
w_nonlinear_solver <- res$par
res
R>> $par
R>> [1] 1.126793e-19 1.611956e-01 6.598538e-18 1.156482e-18 8.388044e-01 0.000000e+00 1.909453e-18
R>> [8] 5.671798e-17 0.000000e+00
R>> 
R>> $value
R>> [1] -0.07987582
R>> 
R>> $iter
R>> [1] 68
R>> 
R>> $convergence
R>> [1] 4
R>> 
R>> $message
R>> [1] "NLOPT_XTOL_REACHED: Optimization stopped because xtol_rel or xtol_abs (above) was reached."
# res <- alabama::constrOptim.nl(w0, fn_SR,
#                                hin = function(w) return(w),  # w >= 0
#                                heq = function(w) return(sum(w) - 1),    # sum(w) = 1
#                                control.outer = list(trace = FALSE))
# w_nonlinear_solver <- res$par
# res

Maximum Sharpe ratio portfolio via bisection

  • The idea is to realize that the problem, while not convex, is quasi-convex.

  • This can be easily seen by rewriting the problem in epigraph form: \[\begin{array}{ll} \underset{\mathbf{w},t}{\textsf{maximize}} & t\\ \textsf{subject to} & t \leq \dfrac{\mathbf{w}^{T}\boldsymbol{\mu}-r_{f}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\\ & \mathbf{1}^{T}\mathbf{w}=1, \quad\left(\mathbf{w}\geq\mathbf{0}\right). \end{array}\]

  • If now we fix the variable \(t\) to some value (so it is not a variable anymore), the problem is easily recognized as a (convex) second order cone program (SOCP): \[\begin{array}{ll} \underset{\;}{\textsf{find}} & \mathbf{w}\\ \textsf{subject to} & t \left\Vert \boldsymbol{\Sigma}^{1/2}\mathbf{w}\right\Vert_{2}\leq\mathbf{w}^{T}\boldsymbol{\mu}-r_{f}\\ & \mathbf{1}^{T}\mathbf{w}=1, \quad\left(\mathbf{w}\geq\mathbf{0}\right). \end{array}\]

  • At this point, one can easily solve the convex problem with an SOCP solver and then solve for \(t\) via the bisection algorithm (aka sandwich technique).

R session: Maximum Sharpe ratio portfolio via bisection

We are going to solve the nonconvex problem \[\begin{array}{ll} \underset{\mathbf{w},t}{\textsf{maximize}} & t\\ \textsf{subject to} & t \leq \dfrac{\mathbf{w}^{T}\boldsymbol{\mu}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\\ & \mathbf{1}^{T}\mathbf{w}=1, \quad\left(\mathbf{w}\geq\mathbf{0}\right). \end{array}\] via bisection on \(t\) with the following (convex) SOCP problem for a given \(t\): \[\begin{array}{ll} \underset{\;}{\textsf{find}} & \mathbf{w}\\ \textsf{subject to} & t \left\Vert \boldsymbol{\Sigma}^{1/2}\mathbf{w}\right\Vert_{2}\leq\mathbf{w}^{T}\boldsymbol{\mu}\\ & \mathbf{1}^{T}\mathbf{w}=1, \quad\left(\mathbf{w}\geq\mathbf{0}\right). \end{array}\]

# define the inner solver based on an SOCP solver 
# (we will simply use CVXR for convenience, see: https://cvxr.rbind.io/cvxr_functions/)
library(CVXR)

# square-root of matrix Sigma
Sigma_12 <- chol(Sigma)
max(abs(t(Sigma_12) %*% Sigma_12 - Sigma))  # sanity check
R>> [1] 5.421011e-20
# create function that solves the SOCP feasibility problem for a given t
SOCP_bisection <- function(t) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Maximize(0),
                  constraints = list(t*cvxr_norm(Sigma_12 %*% w, 2) <= t(mu) %*% w,
                                     sum(w) == 1,
                                     w >= 0))
  result <- solve(prob)
  return(list("status" = result$status, "w" = as.vector(result$getValue(w))))
}

# now run the bisection algorithm
t_lb <- 0   # for sure the problem is feasible in this case
t_ub <- 10  # a tighter upper bound could be chosen, but a Sharpe ratio of 10 surely cannot be achieved
while(t_ub - t_lb > 1e-6) {
  t <- (t_ub + t_lb)/2  # midpoint
  if(SOCP_bisection(t)$status == "infeasible")
    t_ub <- t
  else
    t_lb <- t
}
w_bisection <- SOCP_bisection(t_lb)$w

# comparison between two solutions
round(cbind(w_nonlinear_solver, w_bisection), digits = 3)
R>>       w_nonlinear_solver w_bisection
R>>  [1,]              0.000       0.325
R>>  [2,]              0.161       0.000
R>>  [3,]              0.000       0.000
R>>  [4,]              0.000       0.223
R>>  [5,]              0.839       0.000
R>>  [6,]              0.000       0.000
R>>  [7,]              0.000       0.399
R>>  [8,]              0.000       0.000
R>>  [9,]              0.000       0.053
# Sharpe ratio of two solutions
c("nonlinear_solver" = fn_SR(w_nonlinear_solver), 
  "bisection"        = fn_SR(w_bisection))
R>> nonlinear_solver        bisection 
R>>      -0.07987582       0.07747009

We can see that the general-purpose nonlinear solver could not properly solve the nonconvex problem (recall that nloptr::slsqp() minimizes its objective, so the Sharpe ratio was in fact minimized here). However, using bisection we can solve the problem optimally.

Maximum Sharpe ratio portfolio via Dinkelbach

  • The Dinkelbach transform proposed in (Dinkelbach 1967) reformulates the original concave-convex FP problem \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \dfrac{\mathbf{w}^{T}\boldsymbol{\mu}-r_{f}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1, \quad\left(\mathbf{w}\geq\mathbf{0}\right) \end{array}\] as the convex problem \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu} - y\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1, \quad\left(\mathbf{w}\geq\mathbf{0}\right) \end{array}\] with a new auxiliary variable \(y\), which is iteratively updated by \[y^{(k)} = \frac{\mathbf{w}^{(k)T}\boldsymbol{\mu}-r_{f}}{\sqrt{\mathbf{w}^{(k)T}\boldsymbol{\Sigma}\mathbf{w}^{(k)}}}\] where \(k\) is the iteration index.

Maximum Sharpe ratio portfolio via Dinkelbach

  • By noting that \(\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}} = \left\Vert \boldsymbol{\Sigma}^{1/2}\mathbf{w}\right\Vert_{2}\), where \(\boldsymbol{\Sigma}^{1/2}\) satisfies \(\boldsymbol{\Sigma}^{T/2}\boldsymbol{\Sigma}^{1/2}=\boldsymbol{\Sigma}\), we can finally rewrite the problem as an SOCP.

  • The iterative Dinkelbach-based method obtains \(\mathbf{w}^{(k+1)}\) by solving the following problem at the \(k\)th iteration: \[\begin{array}{ll} \underset{\mathbf{w},t}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu} - y^{(k)} t\\ \textsf{subject to} & t \ge \left\Vert \boldsymbol{\Sigma}^{1/2}\mathbf{w}\right\Vert_{2}\\ & \mathbf{1}^{T}\mathbf{w}=1, \quad\left(\mathbf{w}\geq\mathbf{0}\right) \end{array}\] where \[y^{(k)} = \frac{\mathbf{w}^{(k)T}\boldsymbol{\mu}-r_{f}}{\sqrt{\mathbf{w}^{(k)T}\boldsymbol{\Sigma}\mathbf{w}^{(k)}}}.\]

R session: Maximum Sharpe ratio portfolio via Dinkelbach

We are going to solve the nonconvex problem \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \dfrac{\mathbf{w}^{T}\boldsymbol{\mu}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1, \quad \mathbf{w}\geq\mathbf{0} \end{array}\] by iteratively solving the (convex) SOCP problem for a given \(y^{(k)}\): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu} - y^{(k)} \left\Vert \boldsymbol{\Sigma}^{1/2}\mathbf{w}\right\Vert_{2}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1, \quad \mathbf{w}\geq\mathbf{0}. \end{array}\] where \[y^{(k)} = \frac{\mathbf{w}^{(k)T}\boldsymbol{\mu}}{\sqrt{\mathbf{w}^{(k)T}\boldsymbol{\Sigma}\mathbf{w}^{(k)}}}.\]

# define the inner solver based on an SOCP solver 
# (we will simply use CVXR for convenience, see: https://cvxr.rbind.io/cvxr_functions/)
library(CVXR)

# square-root of matrix Sigma
Sigma_12 <- chol(Sigma)
max(abs(t(Sigma_12) %*% Sigma_12 - Sigma))  # sanity check
R>> [1] 5.421011e-20
# create function that solves the Dinkelbach subproblem for a given y
SOCP_Dinkelbach <- function(y) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Maximize(t(mu) %*% w - y*cvxr_norm(Sigma_12 %*% w, 2)),
                  constraints = list(sum(w) == 1,
                                     w >= 0))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# initial point (has to satisfy t(w_k) %*% mu>=0)
i_max <- which.max(mu)
w_k <- rep(0, N)  
w_k[i_max] <- 1

# now the iterative Dinkelbach algorithm
k <- 1
while(k == 1 || max(abs(w_k - w_prev)) > 1e-6) {
  w_prev <- w_k
  y_k <- as.numeric(t(w_k) %*% mu / sqrt(t(w_k) %*% Sigma %*% w_k))
  w_k <- SOCP_Dinkelbach(y_k)
  k <- k + 1
}
w_Dinkelbach <- w_k
cat("Number of iterarions:", k-1)
R>> Number of iterarions: 4
# comparison among three solutions
round(cbind(w_nonlinear_solver, w_bisection, w_Dinkelbach), digits = 3)
R>>       w_nonlinear_solver w_bisection w_Dinkelbach
R>>  [1,]              0.000       0.325        0.325
R>>  [2,]              0.161       0.000        0.000
R>>  [3,]              0.000       0.000        0.000
R>>  [4,]              0.000       0.223        0.223
R>>  [5,]              0.839       0.000        0.000
R>>  [6,]              0.000       0.000        0.000
R>>  [7,]              0.000       0.399        0.399
R>>  [8,]              0.000       0.000        0.000
R>>  [9,]              0.000       0.053        0.052
# Sharpe ratio of three solutions
c("nonlinear_solver" = fn_SR(w_nonlinear_solver), 
  "bisection"        = fn_SR(w_bisection),
  "Dinkelbach"       = fn_SR(w_Dinkelbach))
R>> nonlinear_solver        bisection       Dinkelbach 
R>>      -0.07987582       0.07747009       0.07747018

As expected, the Dinkelbach method converges to the optimal solution, like the bisection method (but unlike the general-purpose nonlinear solver).

Maximum Sharpe ratio portfolio via Schaible

  • The Schaible transform is a generalization of the Charnes-Cooper transform proposed in (Schaible 1974) to solve a concave-convex FP: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \dfrac{\mathbf{w}^{T}\boldsymbol{\mu}-r_{f}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1, \quad\left(\mathbf{w}\geq\mathbf{0}\right). \end{array}\]

  • The Schaible transform introduces two new variables (note that \(\mathbf{w} = \tilde{\mathbf{w}}/t\)): \[ \tilde{\mathbf{w}} = \frac{\mathbf{w}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}, \quad t = \frac{1}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}. \]

  • The original maximum Sharpe ratio portfolio is equivalent to the convex quadratic problem (QP): \[\begin{array}{ll} \underset{\tilde{\mathbf{w}},t}{\textsf{maximize}} & \tilde{\mathbf{w}}^{T}\boldsymbol{\mu} - r_{f} t\\ \textsf{subject to} & \tilde{\mathbf{w}}^{T}\boldsymbol{\Sigma}\tilde{\mathbf{w}} \le 1\\ & t \ge 0\\ & \mathbf{1}^T\tilde{\mathbf{w}}=t, \quad\left(\tilde{\mathbf{w}}\geq\mathbf{0}\right). \end{array}\]

Maximum Sharpe ratio portfolio via Schaible

  • The previous problem can be simplified by eliminating the variable \(t=\mathbf{1}^T\tilde{\mathbf{w}}\): \[\begin{array}{ll} \underset{\tilde{\mathbf{w}}}{\textsf{maximize}} & \tilde{\mathbf{w}}^T\left(\boldsymbol{\mu} - r_{f}\mathbf{1}\right)\\ \textsf{subject to} & \tilde{\mathbf{w}}^T\boldsymbol{\Sigma}\tilde{\mathbf{w}} \le 1\\ & \mathbf{1}^T\tilde{\mathbf{w}} \ge 0, \quad\left(\tilde{\mathbf{w}}\geq\mathbf{0}\right) \end{array}\] from which we can recover the original solution as \(\mathbf{w} = \tilde{\mathbf{w}}/t = \tilde{\mathbf{w}}/\left(\mathbf{1}^T\tilde{\mathbf{w}}\right)\).

  • Interestingly, we can get the following slightly different formulation by starting with a minimization of a ratio: \[\begin{array}{ll} \underset{\tilde{\mathbf{w}}}{\textsf{minimize}} & \tilde{\mathbf{w}}^{T}\boldsymbol{\Sigma}\tilde{\mathbf{w}}\\ \textsf{subject to} & \tilde{\mathbf{w}}^{T}(\boldsymbol{\mu}-r_{f}\mathbf{1})=1\\ & \mathbf{1}^{T}\tilde{\mathbf{w}}\geq0, \quad\left(\tilde{\mathbf{w}}\geq\mathbf{0}\right). \end{array}\]

Proof of convex reformulation of MSRP

  • Start with the original problem formulation: \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \frac{\mathbf{w}^{T}\boldsymbol{\mu}-r_{f}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1\\ & \left(\mathbf{w}\geq\mathbf{0}\right) \end{array} \Longleftrightarrow \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \frac{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}{\mathbf{w}^{T}(\boldsymbol{\mu}-r_{f}\mathbf{1})}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1\\ & \left(\mathbf{w}\geq\mathbf{0}\right) \end{array} \]
  • Now, since the objective is scale invariant w.r.t. \(\mathbf{w}\), we can choose the proper scaling factor for our convenience. We define \(\tilde{\mathbf{w}}=t\mathbf{w}\) with the scaling factor \(t=1/\left(\mathbf{w}^{T}\boldsymbol{\mu}-r_{f}\right)>0\), so that the objective becomes \(\tilde{\mathbf{w}}^{T}\boldsymbol{\Sigma}\tilde{\mathbf{w}}\), the sum constraint \(\mathbf{1}^{T}\tilde{\mathbf{w}}=t\), and the problem is \[\begin{array}{ll} \underset{\mathbf{w},\tilde{\mathbf{w}},t}{\textsf{minimize}} & \sqrt{\tilde{\mathbf{w}}^{T}\boldsymbol{\Sigma}\tilde{\mathbf{w}}}\\ \textsf{subject to} & t=1/\mathbf{w}^{T}(\boldsymbol{\mu}-r_{f}\mathbf{1})>0\\ & \tilde{\mathbf{w}}=t\mathbf{w}\\ & \mathbf{1}^{T}\tilde{\mathbf{w}}=t>0\\ & \left(\tilde{\mathbf{w}}\geq\mathbf{0}\right). \end{array}\]

Proof of convex reformulation of MSRP

  • The constraint \(t=1/\mathbf{w}^{T}(\boldsymbol{\mu}-r_{f}\mathbf{1})\) can be rewritten in terms of \(\tilde{\mathbf{w}}\) as \(1=\tilde{\mathbf{w}}^{T}(\boldsymbol{\mu}-r_{f}\mathbf{1})\). So the problem becomes \[\begin{array}{ll} \underset{\mathbf{w},\tilde{\mathbf{w}},t}{\textsf{minimize}} & \tilde{\mathbf{w}}^{T}\boldsymbol{\Sigma}\tilde{\mathbf{w}}\\ \textsf{subject to} & \tilde{\mathbf{w}}^{T}(\boldsymbol{\mu}-r_{f}\mathbf{1})=1\\ & \tilde{\mathbf{w}}=t\mathbf{w}\\ & \mathbf{1}^{T}\tilde{\mathbf{w}}=t>0\\ & \left(\tilde{\mathbf{w}}\geq\mathbf{0}\right). \end{array}\]

  • Now, note that the strict inequality \(t>0\) is equivalent to \(t\geq0\) because \(t=0\) can never happen: \(\tilde{\mathbf{w}}\) would then be zero and the first constraint could not be satisfied.

Proof of convex reformulation of MSRP

  • Finally, we can now get rid of \(\mathbf{w}\) and \(t\) in the formulation, as they can be directly obtained as \(t=\mathbf{1}^{T}\tilde{\mathbf{w}}\) and \(\mathbf{w}=\tilde{\mathbf{w}}/t\): \[\begin{array}{ll} \underset{\tilde{\mathbf{w}}}{\textsf{minimize}} & \tilde{\mathbf{w}}^{T}\boldsymbol{\Sigma}\tilde{\mathbf{w}}\\ \textsf{subject to} & \tilde{\mathbf{w}}^{T}(\boldsymbol{\mu}-r_{f}\mathbf{1})=1\\ & \mathbf{1}^{T}\tilde{\mathbf{w}}\geq0\\ & \left(\tilde{\mathbf{w}}\geq\mathbf{0}\right). \end{array}\]

  • QED!

  • Recall that the portfolio is then obtained with the correct scaling factor as \(\mathbf{w}=\tilde{\mathbf{w}}/(\mathbf{1}^{T}\tilde{\mathbf{w}})\).

R session: MSRP

The maximum Sharpe ratio portfolio (MSRP) is the nonconvex problem \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \dfrac{\mathbf{w}^{T}\boldsymbol{\mu}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1, \quad \mathbf{w}\geq\mathbf{0} \end{array}\] that can be rewritten in convex form as \[ \begin{array}{ll} \underset{\tilde{\mathbf{w}}}{\textsf{minimize}} & \tilde{\mathbf{w}}^T\mathbf{\Sigma}\tilde{\mathbf{w}}\\ {\textsf{subject to}} & \tilde{\mathbf{w}}^T\boldsymbol{\mu} = 1\\ & \tilde{\mathbf{w}}\ge\mathbf{0} \end{array} \] and then \(\mathbf{w} = \tilde{\mathbf{w}}/(\mathbf{1}^T\tilde{\mathbf{w}})\).

This is a quadratic problem (QP) and we can conveniently use CVXR (although one is advised to use a specific QP solver like quadprog for speed and stability):

# create function for MSRP
MSRP <- function(mu, Sigma) {
  w_ <- Variable(nrow(Sigma))
  prob <- Problem(Minimize(quad_form(w_, Sigma)),
                  constraints = list(w_ >= 0, t(mu) %*% w_ == 1))
  result <- solve(prob)
  w <- as.vector(result$getValue(w_)/sum(result$getValue(w_)))
  names(w) <- colnames(Sigma)
  return(w)
}

# this function can now be used as
w_MSRP <- MSRP(mu, Sigma)

# comparison among solutions
round(cbind(w_nonlinear_solver, w_bisection, w_Dinkelbach, w_MSRP), digits = 3)
R>>      w_nonlinear_solver w_bisection w_Dinkelbach w_MSRP
R>> AAPL              0.000       0.325        0.325  0.325
R>> AMD               0.161       0.000        0.000  0.000
R>> ADI               0.000       0.000        0.000  0.000
R>> ABBV              0.000       0.223        0.223  0.224
R>> AEZS              0.839       0.000        0.000  0.000
R>> A                 0.000       0.000        0.000  0.000
R>> APD               0.000       0.399        0.399  0.399
R>> AA                0.000       0.000        0.000  0.000
R>> CF                0.000       0.053        0.052  0.052
# Sharpe ratio of different solutions
c("nonlinear_solver" = fn_SR(w_nonlinear_solver), 
  "bisection"        = fn_SR(w_bisection),
  "Dinkelbach"       = fn_SR(w_Dinkelbach),
  "Schaible"         = fn_SR(w_MSRP))
R>> nonlinear_solver        bisection       Dinkelbach         Schaible 
R>>      -0.07987582       0.07747009       0.07747018       0.07747018

As expected, the bisection method, Dinkelbach method, and Schaible method give the optimal solution, unlike the general-purpose nonlinear solver which solves a nonconvex problem.

Sharpe ratio portfolio on the efficient frontier

Risk-Based Portfolios (GMVP, IVP, RPP, MDP, MDCP)

Risk-based portfolios

  • Risk-based portfolios try to bypass the high sensitivity of Markowitz’s mean-variance portfolio to estimation errors in the expected returns by not using the expected returns at all. They are based only on the covariance matrix (Ardia et al. 2017).


  • We will explore the following risk-based portfolios:
    • global minimum variance portfolio (GMVP)
    • inverse volatility portfolio (IVP)
    • risk parity portfolio (RPP) or equal risk portfolio (ERP)
    • most diversified portfolio (MDP)
    • maximum decorrelation portfolio (MDCP).

Global minimum variance portfolio (GMVP)

  • As previously seen, the global minimum variance portfolio (GMVP) can be seen as a particular case of Markowitz’s mean-variance portfolio when the expected return is totally ignored: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]

  • It is a simple convex QP with solution \[\mathbf{w}_{\sf GMVP}=\frac{1}{\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{1}}\boldsymbol{\Sigma}^{-1}\mathbf{1}.\]

  • It is widely used in academic papers for simplicity of evaluation and comparison of different estimators of the covariance matrix \(\boldsymbol{\Sigma}\) (while ignoring the estimation of \(\boldsymbol{\mu}\)).

Inverse volatility portfolio (IVP)

  • The aim of the inverse volatility portfolio (IVP) is to control the portfolio risk (the risk parity portfolio being a refined version (Qian 2005)).

  • The IVP is defined as \[\mathbf{w} = \frac{\boldsymbol{\sigma}^{-1}}{\mathbf{1}^T\boldsymbol{\sigma}^{-1}}\] where \(\boldsymbol{\sigma}^{2} = \mathsf{diag}(\boldsymbol{\Sigma})\) is the vector of asset variances.

  • Lower weights are given to high-volatility assets and higher weights to low-volatility assets.

  • IVP is also called the “equal volatility” portfolio since the weighted constituent assets have equal volatility: \[{\sf sd}(w_ir_i) = w_i\sigma_i = \frac{1}{\mathbf{1}^T\boldsymbol{\sigma}^{-1}},\] which is the same for every asset.

  • Note that the GMVP when the covariance matrix is diagonal leads to an inverse-variance solution: \[\mathbf{w} = \frac{\boldsymbol{\sigma}^{-2}}{\mathbf{1}^T\boldsymbol{\sigma}^{-2}}.\]
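
A quick numerical check of this last point (a sketch reusing the closed-form GMVP function sketched earlier):

# with a diagonalized covariance matrix, the GMVP reduces to inverse-variance weights
w_diag   <- GMVP_closed_form(diag(diag(Sigma)))
w_invvar <- (1/diag(Sigma)) / sum(1/diag(Sigma))
max(abs(w_diag - w_invvar))  # should be numerically zero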

R session: IVP

The Inverse volatility portfolio (IVP) has the simple closed-form solution: \[\mathbf{w} = \frac{\boldsymbol{\sigma}^{-1}}{\mathbf{1}^T\boldsymbol{\sigma}^{-1}}.\]

Its implementation in R is trivial:

# create function for IVP
IVP <- function(Sigma) {
  sigma <- sqrt(diag(Sigma))
  w <- 1/sigma
  w <- w/sum(w)
  return(w)
}

# this function can now be used as
w_IVP <- IVP(Sigma)

Risk parity portfolio (RPP)

  • The risk parity portfolio (RPP) or equal risk portfolio (ERP) aims at equalizing the risk contribution from the invested assets in the global portfolio risk (Qian 2005).

  • It is a sounder formulation than the inverse volatility portfolio (IVP), which ignores the asset correlations, since it properly takes into account the whole covariance matrix.

  • From “dollar” to risk diversification:
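
These slides do not include an R session for the RPP, but as a sketch, the CRAN package riskParityPortfolio can compute it (package name and usage stated to the best of our knowledge; check its vignette for the exact interface), and we can verify that the relative risk contributions are equalized:

library(riskParityPortfolio)

# risk parity portfolio (sketch)
w_RPP <- riskParityPortfolio(Sigma)$w

# relative risk contributions RRC_i = w_i (Sigma w)_i / (w^T Sigma w)
RRC <- as.vector(w_RPP * (Sigma %*% w_RPP)) / as.numeric(t(w_RPP) %*% Sigma %*% w_RPP)
round(RRC, 3)  # should all be approximately 1/N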

Most diversified portfolio (MDP)

  • In (Choueifaty and Coignard 2008), it was postulated that markets are risk-efficient, so that investments will produce returns in proportion to their total risk (measured by volatility).

  • The diversification ratio (DR) is defined analogously to the Sharpe ratio (SR), but with the weighted average volatility \(\mathbf{w}^{T}\boldsymbol{\sigma}\) in place of the expected return (a small helper sketch is given after this list): \[\mathsf{DR} = \frac{\mathbf{w}^{T}\boldsymbol{\sigma}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}.\]

  • For long-only portfolios, it can be shown that \(\mathsf{DR}\ge1\). For a single stock, \(\mathsf{DR}=1\).

  • The most diversified portfolio (MDP) is obtained as the maximization of DR (akin to the maximization of the Sharpe ratio): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \dfrac{\mathbf{w}^{T}\boldsymbol{\sigma}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1. \end{array}\]
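
As a small helper sketch to illustrate the DR defined above:

# diversification ratio of a portfolio w (sketch)
DR <- function(w, Sigma) {
  sigma <- sqrt(diag(Sigma))
  as.numeric(t(w) %*% sigma / sqrt(t(w) %*% Sigma %*% w))
}

# e.g., a single-stock portfolio has DR = 1
DR(c(1, rep(0, nrow(Sigma) - 1)), Sigma)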

Most diversified portfolio (MDP)

  • The MDP has some interesting properties:
    • the correlation of some portfolio \(\mathbf{w}\) with the MDP \(\mathbf{w}_{\sf MDP}\) is proportional to the DR of the portfolio: \[\rho = \frac{\mathsf{DR}(\mathbf{w})}{\mathsf{DR}(\mathbf{w}_{\sf MDP})},\]
    • as a consequence, all the assets in the MDP have the same positive correlation to the MDP,
    • also, any stock not held by the MDP is more correlated to the MDP than any of the stocks that belong to it (this illustrates that all assets in the universe considered are effectively represented in the MDP, even if the portfolio does not physically hold them);
    • if all the stocks in the universe have the same volatility, then the MDP is equivalent to the GMVP;
    • the squared \(\mathsf{DR}\) can be interpreted as the effective number of independent risk factors in the portfolio (Choueifaty et al. 2013),
    • as a consequence, the MDP has a DR equal to the square root of the effective number of independent risk factors available in the entire market (typically larger than that of the market index).

R session: MDP

The most diversified portfolio (MDP) is formulated exactly as the maximum Sharpe ratio portfolio (MSRP) but using \(\boldsymbol{\sigma}\) in lieu of \(\boldsymbol{\mu}\): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \dfrac{\mathbf{w}^{T}\boldsymbol{\sigma}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1\\ & \mathbf{w}\ge\mathbf{0}. \end{array}\]



Therefore, we can use the same R function created to solve the MSRP:

w_MDP <- MSRP(mu = sqrt(diag(Sigma)), Sigma)

Maximum decorrelation portfolio (MDCP)

  • The maximum decorrelation portfolio (MDCP) (Christoffersen et al. 2012) is closely related to GMVP and MDP, but applies to the case where an investor believes all assets have similar returns and volatility, but heterogeneous correlations.

  • The MDCP is formulated as \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\mathbf{C}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] where \(\mathbf{C} \triangleq \mathsf{Diag}(\boldsymbol{\Sigma})^{-1/2}\boldsymbol{\Sigma}\mathsf{Diag}(\boldsymbol{\Sigma})^{-1/2}\) is the correlation matrix.

  • Interestingly, when the weights derived from the MDCP are divided by their respective volatilities and re-standardized so that they sum to 1, we retrieve the MDP weights.

  • The MDCP happens to

    • maximize the DR when all assets have equal volatility and
    • maximize the SR when all assets have equal risks and returns.

R session: MDCP

The maximum decorrelation portfolio (MDCP) is formally formulated as the GMVP but using \(\mathbf{C}\) in lieu of \(\boldsymbol{\Sigma}\): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\mathbf{C}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^{T}\mathbf{w}=1 \end{array}\] where \(\mathbf{C} \triangleq \mathsf{Diag}(\boldsymbol{\Sigma})^{-1/2}\boldsymbol{\Sigma}\mathsf{Diag}(\boldsymbol{\Sigma})^{-1/2}\) is the correlation matrix.



Therefore, we can use the same R function created to solve the GMVP:

# create function for MDCP based on GMVP()
MDCP <- function(Sigma) {
  C <- diag(1/sqrt(diag(Sigma))) %*% Sigma %*% diag(1/sqrt(diag(Sigma)))
  colnames(C) <- colnames(Sigma)
  return(GMVP(Sigma = C))
}

# this function can now be used as
w_MDCP <- MDCP(Sigma)
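
As a quick sanity check of the relationship with the MDP stated above (a sketch):

# dividing the MDCP weights by their volatilities and re-normalizing
# should recover the MDP weights
w_rescaled <- w_MDCP / sqrt(diag(Sigma))
w_rescaled <- w_rescaled / sum(w_rescaled)
round(cbind(w_rescaled, w_MDP), digits = 3)  # the two columns should match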

Comparison of Portfolios

Comparison of portfolios: dollar allocation

Comparison of portfolios: cumulative P&L

Comparison of portfolios: drawdown

Comparison of portfolios over multiple datasets

  • We have backtested a number of different portfolio designs.

  • However, we have only used one set of market data and we should not draw any conclusion from that.

  • Also, we have used a training data window to design the portfolio and evaluated it in a subsequent test data window.

  • A serious backtesting requires:

    • multiple datasets
    • evaluation on a rolling-window basis.
  • In R, the package portfolioBacktest makes this very simple.
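
As a sketch of the typical usage pattern of portfolioBacktest (the argument names and the bundled sample data dataset10 are quoted from the package documentation to the best of our recollection; check the package vignette for the exact interface):

library(portfolioBacktest)

# a portfolio function receives a dataset (a list of xts objects, with
# $adjusted containing the adjusted prices) and returns the weights
GMVP_fun <- function(dataset, ...) {
  X <- diff(log(dataset$adjusted))[-1]  # log-returns
  Sigma <- cov(X)
  w <- solve(Sigma, rep(1, ncol(Sigma)))
  return(w / sum(w))
}

# backtest over multiple datasets on a rolling-window basis
bt <- portfolioBacktest(list("GMVP" = GMVP_fun), dataset_list = dataset10,
                        lookback = 252, optimize_every = 20, rebalance_every = 5)
summaryTable(backtestSummary(bt))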

Comparison of portfolios over multiple datasets

The following performance table is obtained with the R package portfolioBacktest:

Comparison of portfolios over multiple datasets

Comparison of portfolios over multiple datasets

R session: Comparison of MVP, GMVP, IVP, MSRP, MDP, MDCP, as well as the previous heuristic portfolios (EWP, quintile portfolio, and GMRP)

We are now ready to consider several portfolio designs and compare their performance (the underlying constraint is simply \(\mathbf{w}\ge\mathbf{0}\) and \(\mathbf{1}^T\mathbf{w}=1\)).

Recall the three heuristic portfolios we want to compare:

w_heuristic <- cbind("EWP"    = w_EWP, 
                     "QuintP" = w_QuintP[, "QuintP (mu/sigma)"], 
                     "GMRP"   = w_GMRP)

Now we stack the Markowitz-based portfolios we want to compare:

w_Markowitz <- cbind("MVP"  = w_MVP, 
                     "GMVP" = w_GMVP, 
                     "IVP"  = w_IVP, 
                     "MSRP" = w_MSRP, 
                     "MDP"  = w_MDP, 
                     "MDCP" = w_MDCP)

We can now compare the allocations of the portfolios:

w_all <- cbind(w_heuristic, w_Markowitz)
barplot(t(w_all), col = rainbow10equal[1:9], legend = colnames(w_all), beside = TRUE,
        main = "Portfolio allocation", xlab = "stocks", ylab = "dollars")

The performance is (in-sample vs out-of-sample):

# compute returns of all portfolios
ret_all <- xts(X_lin %*% w_all, index(X_lin))
ret_all_trn <- ret_all[1:T_trn, ]
ret_all_tst <- ret_all[-c(1:T_trn), ]

# performance
t(table.AnnualizedReturns(ret_all_trn))
R>>        Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> EWP               0.0405             0.2059                    0.1967
R>> QuintP            0.2518             0.1870                    1.3461
R>> GMRP              0.2629             0.2638                    0.9969
R>> MVP               0.2518             0.1719                    1.4649
R>> GMVP              0.1873             0.1576                    1.1886
R>> IVP               0.1386             0.1644                    0.8434
R>> MSRP              0.2471             0.1669                    1.4807
R>> MDP               0.0935             0.1886                    0.4955
R>> MDCP             -0.1027             0.3287                   -0.3123
t(table.AnnualizedReturns(ret_all_tst))
R>>        Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> EWP               0.3397             0.2770                    1.2265
R>> QuintP            0.0988             0.1748                    0.5652
R>> GMRP              0.0375             0.2386                    0.1572
R>> MVP               0.1196             0.1676                    0.7139
R>> GMVP              0.1430             0.1779                    0.8038
R>> IVP               0.2494             0.2110                    1.1821
R>> MSRP              0.1033             0.1664                    0.6207
R>> MDP               0.3146             0.2573                    1.2229
R>> MDCP              0.4050             0.4246                    0.9538

We can observe that:

  • as expected, the MSRP achieves the maximum ex ante (in-sample) Sharpe ratio; however, this is not maintained ex post (out-of-sample);
  • as expected, the EWP achieves the best ex post performance;
  • the MVP performs poorly;
  • other top performers include the IVP and MDP.

Let’s plot the wealth evolution (cumulative PnL) over the whole time:

{ chart.CumReturns(ret_all, main = "Cumulative return of portfolios", 
                   wealth.index = TRUE, legend.loc = "topleft", colorset = rich10equal)
  addEventLines(xts("training", index(X_lin[T_trn])), srt=90, pos=2, lwd = 2, col = "darkblue") }

and let’s zoom in the out-of-sample period:

chart.CumReturns(ret_all_tst, main = "Cumulative return of portfolios (out-of-sample)",
                   wealth.index = TRUE, legend.loc = "topleft", colorset = rich10equal)

Let’s look at the drawdown:

chart.Drawdown(ret_all_tst, main = "Drawdown of portfolios (out-of-sample)", 
               legend.loc = "bottomleft", colorset = rich10equal)

Clearly the MDCP and the GMRP have the worst drawdown and are unacceptable.

Finally, we can plot the (in-sample) return-risk scatter plot along with the efficient frontier:

# first, compute the efficient frontier
w_frontier_trn <- NULL
lmd_sweep <- exp(seq(-6, 6, by = 0.5))
for (lmd in lmd_sweep)
  w_frontier_trn <- cbind(w_frontier_trn, MVP(mu, Sigma, lmd))
ret_frontier_trn <- xts(X_lin_trn %*% w_frontier_trn, index(X_lin_trn))
mu_frontier_trn <- table.AnnualizedReturns(ret_frontier_trn)[1, ]
sd_frontier_trn <- table.AnnualizedReturns(ret_frontier_trn)[2, ]

# plot in-sample sd-mu scatter plot
maxSR <- table.AnnualizedReturns(ret_all_trn[, "MSRP"])[3, ]
chart.RiskReturnScatter(ret_all_trn,
                        main = "Annualized Return and Risk (in-sample)",
                        symbolset = c(rep(21, 3), rep(22, 6)), 
                        colorset = c(rep("darkred", 3), rep("darkblue", 6)),
                        bg = "black",
                        add.sharpe = maxSR)
lines(sd_frontier_trn, mu_frontier_trn)

Observe that this nice return-risk scatter plot deforms in an unpredictable way when we evaluate it on the out-of-sample data, as we can see:

# compute the efficient frontier again but based on the test data
mu_tst <- colMeans(X_log_tst)
Sigma_tst <- cov(X_log_tst)

w_frontier_tst <- NULL
lmd_sweep <- exp(seq(-6, 6, by = 0.5))
for (lmd in lmd_sweep)
  w_frontier_tst <- cbind(w_frontier_tst, MVP(mu_tst, Sigma_tst, lmd))
ret_frontier_tst <- xts(X_lin_tst %*% w_frontier_tst, index(X_lin_tst))
mu_frontier_tst <- table.AnnualizedReturns(ret_frontier_tst)[1, ]
sd_frontier_tst <- table.AnnualizedReturns(ret_frontier_tst)[2, ]

# plot out-of-sample sd-mu scatter plot
chart.RiskReturnScatter(ret_all_tst,
                        main = "Annualized Return and Risk (out-of-sample)",
                        symbolset = c(rep(21, 3), rep(22, 6)),  
                        colorset = c(rep("darkred", 3), rep("darkblue", 6)),
                        bg = "black",
                        add.sharpe = NA,
                        ylim = c(0, 1))
lines(sd_frontier_tst, mu_frontier_tst)

Again this ex post return-risk scatter plot shows that the winners are EWP, IVP, and MDP.

R session: Slower rebalancing frequencies

In the previous performance analysis of the different portfolios, we were implicitly assuming a daily rebalancing, which is why computing the return of the portfolio was as easy as ret_portf <- X_lin %*% w. Now, we will consider lower rebalancing frequencies (weekly, monthly, quarterly, and yearly) for a given portfolio (in particular, we will just use the EWP portfolio w_EWP). Observe that on the day of the rebalancing the portfolio held will be the designed one, w_EWP, but in the subsequent days without rebalancing the portfolio slowly deviates as the prices of the stocks change (recall that the portfolio denotes the dollar or capital allocation).

The package PerformanceAnalytics has the convenient function Return.portfolio() that allows us to compute the portfolio return with different rebalancing schemes. Let’s start with a sanity check for the daily rebalancing:

# choose portfolio for the comparison
w <- w_EWP

# recall the computation for the daily rebalancing (with daily returns)
ret_daily_rebal <- X_lin %*% w

# we can alternatively use this function:
ret_daily_rebal_ <- Return.portfolio(X_lin, weights = w, rebalance_on = "days")
norm(ret_daily_rebal - ret_daily_rebal_)  # sanity check
R>> [1] 8.420819e-14

Now, let’s use a yearly rebalancing frequency to observe the slow deviation as the prices change:

# let's observe how the portfolio slowly deviates from its original design
tmp <- Return.portfolio(X_lin, weights = w, rebalance_on = "years", verbose = TRUE)
round(head(tmp$BOP.Weight, 15), digits = 4)
R>>              AAPL    AMD    ADI   ABBV   AEZS      A    APD     AA     CF
R>> 2013-01-03 0.1111 0.1111 0.1111 0.1111 0.1111 0.1111 0.1111 0.1111 0.1111
R>> 2013-01-04 0.1103 0.1099 0.1099 0.1107 0.1121 0.1121 0.1113 0.1127 0.1111
R>> 2013-01-07 0.1063 0.1134 0.1071 0.1085 0.1125 0.1134 0.1119 0.1141 0.1127
R>> 2013-01-08 0.1056 0.1168 0.1073 0.1086 0.1133 0.1125 0.1117 0.1120 0.1122
R>> 2013-01-09 0.1067 0.1177 0.1070 0.1070 0.1124 0.1124 0.1127 0.1129 0.1114
R>> 2013-01-10 0.1043 0.1152 0.1060 0.1069 0.1130 0.1147 0.1135 0.1119 0.1145
R>> 2013-01-11 0.1053 0.1144 0.1069 0.1069 0.1122 0.1152 0.1130 0.1102 0.1159
R>> 2013-01-14 0.1037 0.1155 0.1051 0.1068 0.1186 0.1135 0.1127 0.1089 0.1152
R>> 2013-01-15 0.1008 0.1151 0.1059 0.1083 0.1164 0.1147 0.1130 0.1094 0.1164
R>> 2013-01-16 0.0978 0.1170 0.1055 0.1101 0.1174 0.1141 0.1128 0.1096 0.1157
R>> 2013-01-17 0.1015 0.1183 0.1053 0.1127 0.1127 0.1132 0.1119 0.1087 0.1157
R>> 2013-01-18 0.0998 0.1180 0.1067 0.1144 0.1120 0.1142 0.1122 0.1084 0.1143
R>> 2013-01-22 0.1000 0.1068 0.1075 0.1181 0.1124 0.1158 0.1134 0.1099 0.1161
R>> 2013-01-23 0.1004 0.1058 0.1071 0.1146 0.1122 0.1164 0.1137 0.1118 0.1179
R>> 2013-01-24 0.1007 0.1161 0.1060 0.1172 0.1097 0.1138 0.1108 0.1095 0.1162
chart.StackedBar(tmp$BOP.Weight, main = "Evolution of uniform portfolio with yearly rebalancing",
                 ylab = "w", space = 0, border = NA)

We can now compare the different rebalancing frequencies:

# now different rebalancing frequencies
ret_weekly_rebal <- Return.portfolio(X_lin, weights = w, rebalance_on = "weeks")
ret_monthly_rebal <- Return.portfolio(X_lin, weights = w, rebalance_on = "months")
ret_quarterly_rebal <- Return.portfolio(X_lin, weights = w, rebalance_on = "quarters")
ret_yearly_rebal <- Return.portfolio(X_lin, weights = w, rebalance_on = "years")
ret_allfreqs <- cbind(ret_daily_rebal, ret_weekly_rebal, ret_monthly_rebal, ret_quarterly_rebal, ret_yearly_rebal)
colnames(ret_allfreqs) <- c("daily", "weekly", "monthly", "quarterly", "yearly")
round(head(ret_allfreqs, 25), digits = 4)
R>>              daily  weekly monthly quarterly  yearly
R>> 2013-01-03 -0.0050 -0.0050 -0.0050   -0.0050 -0.0050
R>> 2013-01-04  0.0078  0.0079  0.0079    0.0079  0.0079
R>> 2013-01-07  0.0010  0.0010  0.0010    0.0010  0.0010
R>> 2013-01-08 -0.0073 -0.0073 -0.0073   -0.0073 -0.0073
R>> 2013-01-09  0.0064  0.0062  0.0064    0.0064  0.0064
R>> 2013-01-10  0.0032  0.0033  0.0031    0.0031  0.0031
R>> 2013-01-11  0.0087  0.0087  0.0089    0.0089  0.0089
R>> 2013-01-14 -0.0075 -0.0075 -0.0075   -0.0075 -0.0075
R>> 2013-01-15 -0.0019 -0.0018 -0.0016   -0.0016 -0.0016
R>> 2013-01-16  0.0042  0.0041  0.0035    0.0035  0.0035
R>> 2013-01-17  0.0099  0.0101  0.0100    0.0100  0.0100
R>> 2013-01-18 -0.0070 -0.0070 -0.0075   -0.0075 -0.0075
R>> 2013-01-22  0.0055  0.0055  0.0054    0.0054  0.0054
R>> 2013-01-23  0.0160  0.0158  0.0152    0.0152  0.0152
R>> 2013-01-24 -0.0119 -0.0117 -0.0104   -0.0104 -0.0104
R>> 2013-01-25  0.0055  0.0061  0.0064    0.0064  0.0064
R>> 2013-01-28  0.0034  0.0034  0.0025    0.0025  0.0025
R>> 2013-01-29  0.0198  0.0210  0.0202    0.0202  0.0202
R>> 2013-01-30 -0.0140 -0.0152 -0.0154   -0.0154 -0.0154
R>> 2013-01-31 -0.0106 -0.0114 -0.0116   -0.0116 -0.0116
R>> 2013-02-01  0.0101  0.0101  0.0101    0.0103  0.0103
R>> 2013-02-04 -0.0196 -0.0196 -0.0195   -0.0196 -0.0196
R>> 2013-02-05  0.0102  0.0102  0.0101    0.0092  0.0092
R>> 2013-02-06  0.0032  0.0030  0.0030    0.0033  0.0033
R>> 2013-02-07  0.0113  0.0111  0.0110    0.0105  0.0105
# performance
t(table.AnnualizedReturns(ret_allfreqs))
R>>           Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> daily                0.1224             0.2296                    0.5334
R>> weekly               0.1186             0.2299                    0.5160
R>> monthly              0.1050             0.2268                    0.4629
R>> quarterly            0.0859             0.2066                    0.4156
R>> yearly               0.1310             0.2020                    0.6485
chart.CumReturns(ret_allfreqs, main = "Daily wealth for different rebalancing frequencies",
                 wealth.index = TRUE, legend.loc = "topleft", colorset = rich6equal)

R session: Rolling window portfolios

Until now, we have considered static portfolios in the sense that they are first designed based on a training set and then they remain fixed and are used in the test set. In a more realistic setting, however, one wants to implement this procedure on a rolling-window basis. That is, with some frequency the portfolio is reoptimized (since the dynamics of the asset returns may change with time) and rebalanced.

Recall the procedure for the static portfolio where one estimates \(\boldsymbol\mu\) and \(\boldsymbol\Sigma\) from the training set, designs some portfolio (say, Markowitz mean-variance), and then applies it to the test set with some rebalancing frequency (say, monthly):

# recall the procedure for the static portfolio (Markowitz, for example):
T_trn <- round(0.5*T)
X_log_trn <- X_log[1:T_trn, ]
X_log_tst <- X_log[(T_trn+1):T, ]
X_lin_trn <- X_lin[1:T_trn, ]
X_lin_tst <- X_lin[(T_trn+1):T, ]
mu <- colMeans(X_log_trn)
Sigma <- cov(X_log_trn)
w_MVP_static <- MVP(mu, Sigma, lmd = 2)
ret_static <- Return.portfolio(X_lin_tst, weights = w_MVP_static, rebalance_on = "months")

Now we want to reoptimize the portfolio on a rolling-window basis. R provides some options:

  • The package PerformanceAnalytics provides two functions to handle the two main rolling-window strategies: apply.rolling() is for a rolling window of fixed length, whereas apply.fromstart() is for an expanding rolling window. However, those only work independently on each column of the multivariate time series, which is not what we want.
  • The package xts has the function period.apply() but it only works with nonoverlapping periods, which is not what we want. What we want is to choose a window of, say, 6 months (about 120 days) and shift it, say, every month.

We will then do it by ourselves with a simple loop:

# create empty portfolio matrix
w_MVP_rolling <- X_log
w_MVP_rolling[] <- NA
head(w_MVP_rolling)
R>>            AAPL AMD ADI ABBV AEZS  A APD AA CF
R>> 2013-01-03   NA  NA  NA   NA   NA NA  NA NA NA
R>> 2013-01-04   NA  NA  NA   NA   NA NA  NA NA NA
R>> 2013-01-07   NA  NA  NA   NA   NA NA  NA NA NA
R>> 2013-01-08   NA  NA  NA   NA   NA NA  NA NA NA
R>> 2013-01-09   NA  NA  NA   NA   NA NA  NA NA NA
R>> 2013-01-10   NA  NA  NA   NA   NA NA  NA NA NA
# find rebalancing indices
rebal_indices <- T_trn + endpoints(X_log_tst, on = "months")
rebal_indices
R>>  [1]  504  523  542  564  585  605  627  649  670  691  713  733  755  774  794  816  837  858  880
R>> [20]  900  923  944  965  986 1007
index(X_log)[rebal_indices]
R>>  [1] "2015-01-02" "2015-01-30" "2015-02-27" "2015-03-31" "2015-04-30" "2015-05-29" "2015-06-30"
R>>  [8] "2015-07-31" "2015-08-31" "2015-09-30" "2015-10-30" "2015-11-30" "2015-12-31" "2016-01-29"
R>> [15] "2016-02-29" "2016-03-31" "2016-04-29" "2016-05-31" "2016-06-30" "2016-07-29" "2016-08-31"
R>> [22] "2016-09-30" "2016-10-31" "2016-11-30" "2016-12-30"
# run the rolling window loop
lookback <- 10*20  # maximum value is: floor(T_trn/20)*20
for (i in 1:length(rebal_indices)) {
  # estimate moments
  X_ <- X_log[(rebal_indices[i] - lookback + 1):rebal_indices[i], ]
  mu <- colMeans(X_)
  Sigma <- cov(X_)
  # design portfolio
  w_MVP_rolling[rebal_indices[i], ] <- MVP(mu, Sigma, lmd = 2)
}
w_MVP_rolling <- na.omit(w_MVP_rolling)
w_MVP_rolling
R>>                    AAPL          AMD          ADI         ABBV         AEZS            A
R>> 2015-01-02 1.000000e+00 1.580691e-10 5.050330e-10 1.202175e-09 1.105367e-10 3.573050e-10
R>> 2015-01-30 1.000000e+00 3.539053e-10 7.050078e-10 2.239440e-09 2.554891e-10 6.893228e-10
R>> 2015-02-27 9.999998e-01 1.590879e-09 5.307164e-09 5.757445e-09 1.121228e-09 3.135692e-09
R>> 2015-03-31 7.476072e-01 2.305288e-09 1.534448e-08 1.164569e-08 1.782426e-09 6.584706e-09
R>> 2015-04-30 6.529623e-01 1.929058e-09 2.915103e-08 1.511416e-02 1.810234e-09 9.038248e-09
R>> 2015-05-29 2.315103e-01 6.080842e-10 2.774413e-01 4.580175e-08 3.180671e-10 2.338672e-09
R>> 2015-06-30 4.207176e-08 1.175655e-09 3.356819e-01 1.311964e-08 4.809086e-10 2.857770e-09
R>> 2015-07-31 8.392660e-08 3.044102e-09 5.745496e-01 2.863960e-01 8.534075e-10 1.274881e-08
R>> 2015-08-31 1.592957e-08 1.396224e-09 4.926272e-01 1.703659e-08 3.254074e-10 3.791135e-09
R>> 2015-09-30 5.014320e-01 1.780290e-09 3.394764e-01 6.258564e-09 3.285026e-10 6.633887e-09
R>> 2015-10-30 5.959166e-01 4.504492e-09 4.040833e-01 2.368998e-08 3.681050e-10 2.538993e-08
R>> 2015-11-30 1.573484e-08 3.875573e-09 2.973573e-01 6.790072e-02 9.198252e-10 6.347419e-01
R>> 2015-12-31 7.729590e-10 7.191776e-02 3.307433e-09 2.178741e-01 4.398168e-11 7.102081e-01
R>> 2016-01-29 1.447055e-08 1.663432e-08 4.935063e-02 2.961420e-01 4.236551e-10 4.289580e-01
R>> 2016-02-29 5.134762e-08 1.863314e-07 1.944425e-07 3.292062e-06 2.568049e-09 2.192275e-01
R>> 2016-03-31 2.747988e-08 1.267846e-01 3.120421e-08 3.602898e-08 1.447580e-09 2.777190e-01
R>> 2016-04-29 6.306332e-10 2.687206e-01 1.237174e-09 1.758966e-09 1.033592e-10 1.059529e-08
R>> 2016-05-31 4.182577e-09 4.299935e-01 7.054894e-09 5.718436e-09 5.535112e-10 5.700065e-01
R>> 2016-06-30 9.988696e-10 4.381333e-01 1.585390e-09 4.372585e-09 3.829997e-10 5.618667e-01
R>> 2016-07-29 2.242834e-09 4.987464e-01 4.403256e-09 4.133897e-02 5.280437e-10 4.599146e-01
R>> 2016-08-31 8.031004e-10 5.430246e-01 1.242330e-09 2.333549e-09 1.985657e-10 4.569754e-01
R>> 2016-09-30 1.911442e-08 4.377899e-01 4.943627e-08 1.596992e-01 2.734715e-09 6.312447e-08
R>> 2016-10-31 2.248867e-07 5.312681e-01 2.115053e-01 2.539852e-08 1.146040e-06 5.974307e-08
R>> 2016-11-30 1.266082e-08 6.462092e-01 3.537907e-01 1.425341e-08 5.633842e-09 1.241059e-08
R>> 2016-12-30 1.215596e-08 6.751752e-01 2.282577e-01 9.656699e-02 3.525486e-09 3.534123e-08
R>>                     APD           AA           CF
R>> 2015-01-02 1.037021e-09 1.582697e-09 8.306670e-10
R>> 2015-01-30 2.004292e-09 1.293498e-09 5.901921e-09
R>> 2015-02-27 1.693633e-08 3.335767e-09 1.733711e-07
R>> 2015-03-31 3.507708e-07 4.284927e-09 2.523924e-01
R>> 2015-04-30 1.669919e-08 4.136112e-09 3.319235e-01
R>> 2015-05-29 3.046181e-09 9.222353e-10 4.910484e-01
R>> 2015-06-30 5.425461e-09 1.349488e-09 6.643180e-01
R>> 2015-07-31 3.138748e-08 2.726832e-09 1.390543e-01
R>> 2015-08-31 6.386453e-02 9.593828e-10 4.435083e-01
R>> 2015-09-30 1.020273e-01 1.448415e-09 5.706429e-02
R>> 2015-10-30 5.743667e-08 1.693634e-09 7.838458e-09
R>> 2015-11-30 1.291114e-08 2.487516e-09 5.077051e-09
R>> 2015-12-31 1.205238e-09 4.518472e-10 4.028364e-10
R>> 2016-01-29 2.255493e-01 2.856509e-09 2.805078e-09
R>> 2016-02-29 7.807688e-01 1.891941e-08 1.650022e-08
R>> 2016-03-31 5.954963e-01 1.086565e-08 4.094872e-09
R>> 2016-04-29 7.312794e-01 2.159469e-09 2.712549e-10
R>> 2016-05-31 1.034081e-08 4.375698e-09 1.199165e-09
R>> 2016-06-30 4.289797e-09 9.780217e-10 3.301711e-10
R>> 2016-07-29 4.645468e-09 1.922269e-09 6.623378e-10
R>> 2016-08-31 1.264529e-08 1.402590e-09 2.218040e-10
R>> 2016-09-30 4.025108e-01 9.409569e-09 2.342977e-09
R>> 2016-10-31 2.572251e-01 4.424621e-08 9.895388e-09
R>> 2016-11-30 1.963114e-08 2.841839e-08 5.565432e-09
R>> 2016-12-30 4.998978e-08 1.334315e-08 3.895068e-09

We can now plot the evolution of the portfolio weights over time:

# compute portfolio returns
tmp <- Return.portfolio(X_lin_tst, weights = w_MVP_rolling, verbose = TRUE)
ret_rolling <- tmp$returns
chart.StackedBar(tmp$BOP.Weight, main = "Evolution of rolling-window Markowitz's mean-variance portfolio",
                 ylab = "w", space = 0, border = NA)

We can now compare the static and rolling window versions:

# performance
ret_MVP <- cbind(ret_static, ret_rolling)
colnames(ret_MVP) <- c("Markowitz MVP - static", "Markowitz MVP - rolling window")

t(table.AnnualizedReturns(ret_MVP))
R>>                                Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> Markowitz MVP - static                    0.0350             0.1978                    0.1770
R>> Markowitz MVP - rolling window            0.3586             0.2957                    1.2126
chart.CumReturns(ret_MVP, main = "Daily wealth for static vs rolling window",
                 wealth.index = TRUE, legend.loc = "topleft", colorset = rich6equal)

chart.Drawdown(ret_MVP, main = "Drawdown for static vs rolling window", 
               legend.loc = "bottomleft", colorset = rich6equal)

R session: Assessment of portfolios with the package portfolioBacktest

  • We have backtested a number of different portfolio designs.

  • However, we have only used one set of market data, and we should not draw conclusions from a single realization.

  • Also, we have used a training data window to design the portfolio and evaluated it in a subsequent test data window.

  • Serious backtesting requires:

    • multiple datasets
    • evaluation on a rolling-window basis.
  • We will use the R package portfolioBacktest, which makes this very simple.

First, let’s load the market data using the package portfolioBacktest:

library(portfolioBacktest)

# load the SP500 assets
data("SP500_symbols")
SP500_YAHOO <- stockDataDownload(stock_symbols = SP500_symbols, from = "2010-12-01", to = "2018-12-01")

# generate 100 random samples each containing 50 random assets over a random window of two years
N_datasets <- 100
mydataset <- stockDataResample(SP500_YAHOO, N = 50, T = 252*2, num_datasets = N_datasets)
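Each resampled dataset is a list of xts objects; in particular, its $adjusted element (adjusted prices) is what the portfolio functions below will use. A quick sanity check (illustrative only):

# inspect the first resampled dataset
names(mydataset[[1]])
dim(mydataset[[1]]$adjusted)  # expected: (252*2) days x 50 assets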

Second, let’s define the portfolios that we want to backtest:

library(CVXR)

QuintP <- function(dataset) {
  prices <- dataset$adjusted
  N <- ncol(prices)
  X <- diff(log(prices))[-1]  # returns
  mu <- colMeans(X)
  Sigma <- cov(X)
  
  # portfolio design: rank assets by the score mu/sigma^2 and
  # equally weight the top quintile
  idx <- sort(mu/diag(Sigma), decreasing = TRUE, index.return = TRUE)$ix
  w <- rep(0, N)
  w[idx[1:round(N/5)]] <- 1/round(N/5)
  return(w)
}

IVP <- function(dataset) {
  prices <- dataset$adjusted
  N <- ncol(prices)
  X <- diff(log(prices))[-1]  # returns
  Sigma <- cov(X)

  # portfolio design
  sigma <- sqrt(diag(Sigma))
  w <- 1/sigma
  w <- w/sum(w)
  return(w)
}

GMVP <- function(dataset) {
  prices <- dataset$adjusted
  N <- ncol(prices)
  X <- diff(log(prices))[-1]  # returns
  Sigma <- cov(X)
  
  # portfolio design
  w <- Variable(N)
  prob <- Problem(Minimize(quad_form(w, Sigma)),
                  constraints = list(sum(w) == 1, w >= 0))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

MVP <- function(dataset) {
  prices <- dataset$adjusted
  N <- ncol(prices)
  X <- diff(log(prices))[-1]  # returns
  mu <- colMeans(X)
  Sigma <- cov(X)
  
  # portfolio design
  lmd <- 0.5  # risk-aversion parameter
  w <- Variable(N)
  prob <- Problem(Maximize(t(mu) %*% w - lmd*quad_form(w, Sigma)),
                  constraints = list(sum(w) == 1, w >= 0))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

portfolio_list <- list("QuintP" = QuintP,
                       "IVP"    = IVP,
                       "GMVP"   = GMVP,
                       "MVP"    = MVP)

Then, let’s proceed with the backtesting based on 100 datasets randomly chosen from market data:

bt_all_port <- portfolioBacktest(portfolio_funs = portfolio_list,
                                 dataset = mydataset,
                                 benchmark = c("uniform", "index"),
                                 T_rolling_window = 252*2/3,  # lookback of 168 days (1/3 of each dataset)
                                 optimize_every = 20,         # reoptimize every 20 trading days (~monthly)
                                 rebalance_every = 1,         # rebalance daily
                                 show_progress_bar = FALSE,
                                 paral_datasets = 5)          # parallelize over 5 datasets at a time
res_summary_median <- backtestSummary(bt_all_port)

Now we are ready to compare the portfolios via tables and plots.

A leaderboard is a table comparing the portfolios across several performance measures:

summaryTable(res_summary_median, type = "DT", order_col = 2, order_dir = "desc")


A barplot provides the same information as the leaderboard but in a visual way:

summaryBarPlot(res_summary_median, measures = c("Sharpe ratio", "max drawdown"))


A boxplot is probably the best way to properly compare the performance of different portfolios with a single performance measure:

backtestBoxPlot(bt_all_port, "Sharpe Ratio")

We can confirm the good performance of the EWP (the uniform \(1/N\) benchmark) and the poor performance of Markowitz's MVP.

Conclusions

Drawbacks of Markowitz’s portfolio

Markowitz’s portfolio has never been fully embraced by practitioners, for the following reasons among others (Zhao et al. 2019):

  1. variance is not a good measure of risk in practice since it penalizes deviations in both directions, i.e., both the unwanted high losses and the desired gains: the solution is to use alternative risk measures, e.g., VaR and CVaR (McNeil et al. 2005);

  2. it is highly sensitive to parameter estimation errors (i.e., to errors in the covariance matrix \(\boldsymbol{\Sigma}\) and especially in the mean vector \(\boldsymbol{\mu}\)): the solution is robust optimization (Fabozzi 2007) and improved parameter estimation (Ledoit and Wolf 2004), as illustrated in the sketch after this list;

  3. it only considers the risk of the portfolio as a whole and ignores risk diversification (i.e., it may concentrate risk in too few assets, as was observed in the 2008 financial crisis): the solution is the risk parity portfolio (Qian 2005).
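To make point 2 concrete, here is a minimal sketch (not from the original material): we simulate returns from a known model in which all assets are identical, so the true optimal mean-variance weights are uniform (\(1/N\) each), and observe how much the weights computed from sample estimates deviate across realizations; for simplicity, we use the unconstrained mean-variance direction \(\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}\):

# sensitivity of mean-variance weights to estimation errors (illustrative sketch)
library(MASS)
set.seed(42)
N <- 5
T <- 252
mu_true <- rep(0.5/252, N)              # identical true daily means
Sigma_true <- diag(N) * 0.2^2/252       # identical true daily variances
for (k in 1:3) {
  X <- mvrnorm(T, mu_true, Sigma_true)  # one year of simulated returns
  w <- solve(cov(X), colMeans(X))       # unconstrained MV direction: Sigma^{-1} mu
  print(round(w/sum(w), 2))             # normalized weights; true value is 0.2 each
}

Even with a full year of data and only five assets, the normalized weights typically deviate substantially from the true value of 0.2, which is the essence of the “error-maximizer” criticism below.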

Markowitz’s portfolio vs \(1/N\) portfolio

  • Markowitz’s mean-variance portfolio based on sample estimates of the moments is notorious for producing extreme weights that fluctuate substantially over time and perform poorly out of sample.

  • The \(1/N\) portfolio has been claimed to outperform Markowitz’s mean-variance portfolio (DeMiguel et al. 2009), although this may not hold if: i) the estimation window is long; ii) the ex-ante (true) Sharpe ratio of the mean-variance portfolio is much higher than that of the \(1/N\) portfolio; or iii) the number of assets is small (fewer parameters to estimate).

  • Markowitz’s mean-variance portfolio is sometimes cynically referred to as “error-maximizer.”

  • However, other studies show that Markowitz’s mean-variance portfolio is superior to the \(1/N\) portfolio (Kritzman et al. 2010).

Basic references

  • Short introduction to different portfolios (Zhao et al. 2019):

    📄 Z. Zhao, R. Zhou, D. P. Palomar, and Y. Feng, “Portfolio Optimization,” submitted, 2019.

  • Textbook on financial data (Ruppert and Matteson 2015):

    📘 D. Ruppert and D. Matteson. Statistics and Data Analysis for Financial Engineering: With R Examples. Springer, 2015.

  • Textbooks on portfolio optimization (Cornuejols and Tütüncü 2006; Fabozzi 2007; Feng and Palomar 2016):

    📘 G. Cornuéjols and R. Tütüncü. Optimization Methods in Finance. Cambridge University Press, 2006.

    📘 F. J. Fabozzi. Robust Portfolio Optimization and Management. Wiley, 2007.

    📘 Y. Feng and D. P. Palomar. A Signal Processing Perspective on Financial Engineering. Foundations and Trends in Signal Processing, Now Publishers, 2016.

Thanks

References

Ardia, D., Bolliger, G., Boudt, K., & Gagnon-Fleury, J.-P. (2017). The impact of covariance misspecification in risk-based portfolios. Annals of Operations Research, 254(1-2), 2–16. https://doi.org/10.1007/s10479-017-2474-7

Bajalinov, E. B. (2003). Linear-fractional programming: Theory, methods, applications and software. Kluwer Academic Publishers.

Charnes, A., & Cooper, W. W. (1962). Programming with linear fractional functionals. Naval Research Logistics Quarterly, 9(3-4), 181–186.

Chopra, V., & Ziemba, W. (1993). The effect of errors in means, variances and covariances on optimal portfolio choice. Journal of Portfolio Management.

Choueifaty, Y., & Coignard, Y. (2008). Toward maximum diversification. Journal of Portfolio Management.

Choueifaty, Y., Froidure, T., & Reynier, J. (2013). Properties of the most diversified portfolio. Journal of Investment Strategies.

Christoffersen, P., Errunza, V., Jacobs, K., & Langlois, H. (2012). Is the potential for international diversification disappearing? A dynamic copula approach. The Review of Financial Studies.

Cornuejols, G., & Tütüncü, R. (2006). Optimization methods in finance. Cambridge University Press.

DeMiguel, V., Garlappi, L., & Uppal, R. (2009). Optimal versus naive diversification: How inefficient is the 1/n portfolio strategy? The Review of Financial Studies.

Dinkelbach, W. (1967). On nonlinear fractional programming. Management Science, 13(7), 492–498.

Duchin, R., & Levy, H. (2009). Markowitz versus the talmudic portfolio diversification strategies. Journal of Portfolio Management, 35(2), 71.

Fabozzi, F. J. (2007). Robust portfolio optimization and management. Wiley.

Feng, Y., & Palomar, D. P. (2016). A signal processing perspective on financial engineering. Foundations and Trends in Signal Processing, Now Publishers.

Jacobs, B. I., Levy, K. N., & Markowitz, H. M. (2005). Portfolio optimization with factors, scenarios, and realistic short positions. Operations Research, 53(4), 586–599.

Jolliffe, I. (2002). Principal component analysis. Springer-Verlag.

Kritzman, M., Page, S., & Turkington, D. (2010). In defense of optimization: The fallacy of 1/n. Financial Analysts Journal, 66(2).

Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411.

Lütkepohl, H. (2007). New introduction to multiple time series analysis. Springer.

Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7(1), 77–91.

McNeil, A. J., Frey, R., & Embrechts, P. (2005). Quantitative risk management: Concepts, techniques and tools. Princeton University Press.

Meucci, A. (2005). Risk and asset allocation. Springer.

Qian, E. (2005). Risk parity portfolios: Efficient portfolios through true diversification. PanAgora Asset Management.

Ruppert, D. (2010). Statistics and data analysis for financial engineering. Springer.

Ruppert, D., & Matteson, D. (2015). Statistics and data analysis for financial engineering (2nd Ed.). Springer.

Schaible, S. (1974). Parameter-free convex equivalent and dual programs of fractional programming problems. Zeitschrift fur Operations Research, 18(5), 187–196.

Sharpe, W. F. (1966). Mutual fund performance. The Journal of Business, 39(1), 119–138.

Stancu-Minasian, I. M. (1992). Fractional programming: Theory, methods and applications. Kluwer Academic Publishers.

Tsay, R. S. (2005). Analysis of financial time series (2nd ed.). John Wiley & Sons.

Tsay, R. S. (2013). Multivariate time series analysis: With R and financial applications. John Wiley & Sons.

von Neumann, J. (1937). Über ein ökonomisches Gleichgewichtssystem und eine Verallgemeinerung des Brouwerschen Fixpunktsatzes. Ergebnisse eines Mathematischen Kolloquiums, 8, 73–83.

Zhao, Z., Zhou, R., Palomar, D. P., & Feng, Y. (2019). Portfolio optimization. Submitted.