MAFS5310 - Portfolio Optimization with R
MSc in Financial Mathematics
The Hong Kong University of Science and Technology (HKUST)
Fall 2020-21

Outline

  • Introduction

  • Warm-Up: Markowitz Portfolio

    • Signal model
    • Markowitz formulation
    • Drawbacks of Markowitz portfolio
  • Risk Parity Portfolio

    • Problem formulation
    • Solution to the naive diagonal formulation
    • Solution to the vanilla convex formulation
    • Solution to the general nonconvex formulation
  • Conclusions

Introduction

Motivation

Markowitz’s portfolio has never been fully embraced by practitioners, among other reasons because

  1. variance is not a good measure of risk in practice since it penalizes both the unwanted high losses and the desired low losses: the solution is to use alternative measures for risk, e.g., VaR and CVaR,

  2. it is highly sensitive to parameter estimation errors (i.e., to the covariance matrix \(\boldsymbol{\Sigma}\) and especially to the mean vector \(\boldsymbol{\mu}\)): solution is robust optimization and improved parameter estimation,

  3. it only considers the risk of the portfolio as a whole and ignores the risk diversification (i.e., concentrates risk too much in few assets, this was observed in the 2008 financial crisis): solution is the risk parity portfolio.



👉 We will address here the risk diversification among the assets by properly redefining the portfolio formulation.

Warm-Up: Markowitz Portfolio

Signal model

Returns

  • Let us denote the log-returns of \(N\) assets at time \(t\) with the vector \(\mathbf{r}_{t}\in\mathbb{R}^{N}\) (i.e., \(r_{it}=\log{p_{i,t}}-\log{p_{i,t-1}}\)).

  • Note that the log-returns are almost the same as the linear returns \(R_{it}=\frac{p_{i,t}-p_{i,t-1}}{p_{i,t-1}}\), i.e., \(r_{it}\approx R_{it}\).

  • The time index \(t\) can denote any arbitrary period such as days, weeks, months, 5-min intervals, etc.

  • \(\mathcal{F}_{t-1}\) denotes the previous historical data.

  • Econometrics aims at modeling \(\mathbf{r}_{t}\) conditional on \(\mathcal{F}_{t-1}\).

  • \(\mathbf{r}_{t}\) is a multivariate stochastic process with conditional mean and covariance matrix denoted as (Feng and Palomar 2016) \[\begin{aligned} \boldsymbol{\mu}_{t} &\triangleq\textsf{E}\left[\mathbf{r}_{t}\mid\mathcal{F}_{t-1}\right]\\ \boldsymbol{\Sigma}_{t} &\triangleq\textsf{Cov}\left[\mathbf{r}_{t}\mid\mathcal{F}_{t-1}\right]=\textsf{E}\left[(\mathbf{r}_{t}-\boldsymbol{\mu}_{t})(\mathbf{r}_{t}-\boldsymbol{\mu}_{t})^{T}\mid\mathcal{F}_{t-1}\right]. \end{aligned}\]

i.i.d. model

  • For simplicity we will assume that \(\mathbf{r}_{t}\) follows an i.i.d. distribution (which is not very innacurate in general).


  • That is, both the conditional mean and conditional covariance are constant: \[\begin{aligned} \boldsymbol{\mu}_{t} &= \boldsymbol{\mu},\\ \boldsymbol{\Sigma}_{t} &= \boldsymbol{\Sigma}. \end{aligned}\]


  • Very simple model, however, it is one of the most fundamental assumptions for many important works, e.g., the Nobel prize-winning Markowitz portfolio theory (Markowitz 1952).

Parameter estimation

  • Consider the i.i.d. model: \[\mathbf{r}_{t}=\boldsymbol{\mu}+\mathbf{w}_{t},\] where \(\boldsymbol{\mu}\in\mathbb{R}^{N}\) is the mean and \(\mathbf{w}_{t}\in\mathbb{R}^{N}\) is an i.i.d. process with zero mean and constant covariance matrix \(\boldsymbol{\Sigma}\).

  • The mean vector \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\) have to be estimated in practice based on \(T\) observations.

  • The simplest estimators are the sample estimators:

    • sample mean: \(\quad\hat{\boldsymbol{\mu}} =\frac{1}{T}\sum_{t=1}^{T}\mathbf{r}_{t}\)
    • sample covariance matrix: \(\quad\hat{\boldsymbol{\Sigma}} =\frac{1}{T-1}\sum_{t=1}^{T}(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}})(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}})^{T}.\)
  • Many more sophisticated estimators exist, namely: shrinkage estimators, Black-Litterman estimators, etc.

Parameter estimation

  • The parameter estimates \(\hat{\boldsymbol{\mu}}\) and \(\hat{\boldsymbol{\Sigma}}\) are only good for large \(T\), otherwise the estimation error is unacceptable.

  • For instance, the sample mean is particularly a very inefficient estimator, with very noisy estimates (Meucci 2005).

  • In practice, \(T\) cannot be large enough due to either:

    • unavailability of data or
    • lack of stationarity of data.
  • As a consequence, the estimates contain too much estimation error and a portfolio design (e.g., Markowitz mean-variance) based on those estimates can be severely affected (Chopra and Ziemba 1993).

  • Indeed, this is why Markowitz portfolio and other extensions are rarely used by practitioners.

Markowitz formulation

Portfolio return

  • Suppose the capital budget is \(B\) dollars.

  • The portfolio \(\mathbf{w}\in\mathbb{R}^{N}\) denotes the normalized dollar weights of the \(N\) assets such that \(\mathbf{1}^{T}\mathbf{w}=1\) (so \(B\mathbf{w}\) denotes dollars invested in the assets).

  • For each asset \(i\), the initial wealth is \(Bw_{i}\) and the end wealth is \[Bw_{i}\left(p_{i,t}/p_{i,t-1}\right)=Bw_{i}\left(R_{it}+1\right).\]

  • Then the portfolio return is \[R_{t}^{p}= \frac{\sum_{i=1}^{N}Bw_{i}\left(R_{it}+1\right)-B}{B}=\sum_{i=1}^{N}w_{i}R_{it}\approx\sum_{i=1}^{N}w_{i}r_{it}=\mathbf{w}^{T}\mathbf{r}_{t}\]

  • The portfolio expected return and variance are \(\mathbf{w}^{T}\boldsymbol{\mu}\) and \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\), respectively.

Performance measures

  • Expected return: \(\mathbf{w}^{T}\boldsymbol{\mu}\)

  • Volatility: \(\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\)

  • Sharpe Ratio (SR): expected excess return per unit of risk \[\mathsf{SR} =\frac{\mathbf{w}^{T}\boldsymbol{\mu}-r_{f}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\] where \(r_{f}\) is the risk-free rate (e.g., interest rate on a three-month U.S. Treasury bill).

  • Information Ratio (IR): SR with respect to a benchmark (e.g., the market index): \(\mathsf{IR} =\frac{\textsf{E}\left[\mathbf{w}^T\mathbf{r}_t - r_{b,t}\right]}{\sqrt{\textsf{Var}\left[\mathbf{w}^T\mathbf{r}_t - r_{b,t}\right]}}\).

  • Drawdown: decline from a historical peak of the cumulative profit \(X(t)\): \[D(T)=\max_{t\in[0,T]}X(t)-X(T)\]

  • VaR (Value at Risk): quantile of the loss.

  • ES (Expected Shortfall) or CVaR (Conditional Value at Risk): expected value of the loss above some quantile.

Practical constraints

  • Capital budget constraint: \[\mathbf{1}^T\mathbf{w} = 1.\]

  • Long-only constraint: \[\mathbf{w} \geq 0.\]

  • Dollar-neutral or self-financing constraint: \[\mathbf{1}^T\mathbf{w} = 0.\]

  • Holding constraint: \[\mathbf{l}\leq\mathbf{w}\leq \mathbf{u}\] where \(\mathbf{l}\in\mathbb{R}^{N}\) and \(\mathbf{u}\in\mathbb{R}^{N}\) are lower and upper bounds of the asset positions, respectively.

Practical constraints

  • Leverage constraint: \[\left\Vert \mathbf{w}\right\Vert _{1}\leq L.\]

  • Cardinality constraint: \[\left\Vert \mathbf{w}\right\Vert _{0} \leq K.\]

  • Turnover constraint: \[\left\Vert \mathbf{w}-\mathbf{w}_{0}\right\Vert _{1} \leq u\] where \(\mathbf{w}_{0}\) is the currently held portfolio.


  • Market-neutral constraint: \[\boldsymbol{\beta}^T\mathbf{w} = 0.\]

Risk control

  • In finance, the expected return \(\mathbf{w}^{T}\boldsymbol{\mu}\) is very relevant as it quantifies the average benefit.


  • However, in practice, the average performance is not enough to characterize an investment and one needs to control the probability of going bankrupt.


  • Risk measures control how risky an investment strategy is.


  • The most basic measure of risk is given by the variance (Markowitz 1952): a higher variance means that there are large peaks in the distribution which may cause a big loss.


  • There are more sophisticated risk measures such as downside risk, VaR, ES, etc.

Mean-variance tradeoff

  • The mean return \(\mathbf{w}^{T}\boldsymbol{\mu}\) and the variance (risk) \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\) (equivalently, the standard deviation or volatility \(\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\)) constitute two important performance measures.


  • Usually, the higher the mean return the higher the variance and vice-versa.


  • Thus, we are faced with two objectives to be optimized: it is a multi-objective optimization problem.


  • They define a fundamental mean-variance tradeoff curve (Pareto curve).


  • The choice of a specific point in this tradeoff curve depends on how agressive or risk-averse the investor is.

Mean-variance tradeoff

Markowitz mean-variance portfolio (1952)

  • The idea of the Markowitz mean-variance portfolio (MVP) (Markowitz 1952) is to find a trade-off between the expected return \(\mathbf{w}^{T}\boldsymbol{\mu}\) and the risk of the portfolio measured by the variance \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\): \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \mathbf{w}^{T}\boldsymbol{\mu}-\lambda\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1 \end{array}\] where \(\mathbf{w}^{T}\mathbf{1}=1\) is the capital budget constraint and \(\lambda\) is a parameter that controls how risk-averse the investor is.
  • This is a convex quadratic problem (QP) with only one linear constraint which admits a closed-form solution: \[\mathbf{w}_{\sf MVP} = \frac{1}{2\lambda}\boldsymbol{\Sigma}^{-1}\left(\boldsymbol{\mu}+\nu\mathbf{1}\right),\] where \(\nu\) is the optimal dual variable \(\nu=\frac{2\lambda-\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{1}}\).

Global Minimum Variance Portfolio (GMVP)

  • The global minimum variance portfolio (GMVP) ignores the expected return and focuses on the risk only: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]

  • It is a simple convex QP with solution \[\mathbf{w}_{\sf GMVP}=\frac{1}{\mathbf{1}^{T}\boldsymbol{\Sigma}^{-1}\mathbf{1}}\boldsymbol{\Sigma}^{-1}\mathbf{1}.\]

  • It is widely used in academic papers for simplicity of evaluation and comparison of different estimators of the covariance matrix \(\boldsymbol{\Sigma}\) (while ignoring the estimation of \(\boldsymbol{\mu}\)).

Drawbacks of Markowitz portfolio

Drawbacks of Markowitz’s formulation

Markowitz’s portfolio has never been fully embraced by practitioners, among other reasons because

  1. variance is not a good measure of risk in practice since it penalizes both the unwanted high losses and the desired low losses: the solution is to use alternative measures for risk, e.g., VaR and CVaR,

  2. it is highly sensitive to parameter estimation errors (i.e., to the covariance matrix \(\boldsymbol{\Sigma}\) and especially to the mean vector \(\boldsymbol{\mu}\)): solution is robust optimization and improved parameter estimation,

  3. it only considers the risk of the portfolio as a whole and ignores the risk diversification (i.e., concentrates risk too much in few assets, this was observed in the 2008 financial crisis): solution is the risk parity portfolio.



👉 We will address here the risk diversification among the assets by properly redefining the portfolio formulation.

Lack of diversification of Markowitz portfolio

Markowitz mean-variance portfolio (MVP) is typically concentrated in very few assets, while GMVP is more diversified (but not totally):

R session: Loading market data

We will load some stock market data and divide it into a training part (for the estimation of the expected return \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\), and subsequent portfolio design) and a test part (for the out-of-sample performance evaluation).

In particular, we will start by loading some stock data from three different sectors:

library(xts)  # to manipulate time series of stock data
library(quantmod)  # to download stock data
library(PerformanceAnalytics)  # to compute performance measures

# download data from YahooFinance
stock_namelist <- c("AAPL", "AMD", "ADI",  "ABBV", "AEZS", "A",  "APD", "AA","CF")
prices <- xts()
for (i in 1:length(stock_namelist)) {
  tmp <- Ad(getSymbols(stock_namelist[i], from = "2014-01-01", to = "2016-12-31", auto.assign = FALSE))
  tmp <- na.approx(tmp, na.rm = FALSE)  # interpolate NAs
  prices <- cbind(prices, tmp)
}
colnames(prices) <- stock_namelist
indexClass(prices) <- "Date"
R>> Warning: 'indexClass<-' is deprecated.
R>> Use 'tclass<-' instead.
R>> See help("Deprecated") and help("xts-deprecated").
str(prices)
R>> An 'xts' object on 2014-01-02/2016-12-30 containing:
R>>   Data: num [1:756, 1:9] 17.5 17.1 17.2 17.1 17.2 ...
R>>  - attr(*, "dimnames")=List of 2
R>>   ..$ : NULL
R>>   ..$ : chr [1:9] "AAPL" "AMD" "ADI" "ABBV" ...
R>>   Indexed by objects of class: [Date] TZ: UTC
R>>   xts Attributes:  
R>>  NULL
head(prices)
R>>                AAPL  AMD      ADI     ABBV AEZS        A      APD       AA       CF
R>> 2014-01-02 17.48146 3.95 41.93015 38.52222  144 37.71119 86.09941 24.48568 33.62581
R>> 2014-01-03 17.09747 4.00 42.21093 38.75936  146 38.18753 85.92943 24.57870 33.39415
R>> 2014-01-06 17.19070 4.13 41.97269 37.34386  149 37.99967 86.13805 24.48568 32.94953
R>> 2014-01-07 17.06776 4.18 42.19391 37.41798  147 38.54311 85.07185 24.50894 33.16105
R>> 2014-01-08 17.17585 4.18 42.29601 37.32164  146 39.17375 85.05639 25.18329 33.60711
R>> 2014-01-09 16.95651 4.09 42.13434 37.95898  119 39.18716 84.72417 24.85774 34.53233
tail(prices)
R>>                AAPL   AMD      ADI     ABBV AEZS        A      APD    AA       CF
R>> 2016-12-22 27.31051 11.60 67.92088 50.60495 4.10 44.34312 131.7363 29.75 26.71214
R>> 2016-12-23 27.36453 11.58 68.28153 51.16303 4.10 44.56498 132.0880 29.71 27.24443
R>> 2016-12-27 27.53832 12.07 68.71616 51.29434 4.05 44.86401 132.8996 29.65 28.34450
R>> 2016-12-28 27.42090 11.55 68.02262 51.10558 3.55 44.10196 130.8616 29.43 28.06061
R>> 2016-12-29 27.41385 11.59 68.04111 51.48311 3.60 44.15227 130.9614 28.89 28.30014
R>> 2016-12-30 27.20013 11.34 67.15337 51.39283 3.60 44.07487 130.4625 28.08 27.92754
# compute log-returns and linear returns
X_log <- diff(log(prices))[-1]
X_lin <- (prices/lag(prices) - 1)[-1]

# or alternatively...
X_log <- CalculateReturns(prices, "log")[-1]
X_lin <- CalculateReturns(prices)[-1]

N <- ncol(X_log)  # number of stocks
T <- nrow(X_log)  # number of days

We can take a look at the prices of the stocks:

plot(prices/rep(prices[1, ], each = nrow(prices)), col = rainbow10equal, legend.loc = "topleft",
     main = "Normalized prices")

We now divide the data into a training set and test set:

T_trn <- round(0.7*T)  # 70% of data
X_log_trn <- X_log[1:T_trn, ]
X_log_tst <- X_log[(T_trn+1):T, ]
X_lin_trn <- X_lin[1:T_trn, ]
X_lin_tst <- X_lin[(T_trn+1):T, ]

We can now use the training set to obtain the sample estimates from the returns \(\mathbf{x}_t\) (i.e., sample means and sample covariance matrix) as \[ \begin{aligned} \hat{\boldsymbol{\mu}} & = \frac{1}{T}\sum_{t=1}^T \mathbf{x}_t\\ \hat{\boldsymbol{\Sigma}} & = \frac{1}{T-1}\sum_{t=1}^T (\mathbf{x}_t - \hat{\boldsymbol{\mu}})(\mathbf{x}_t - \hat{\boldsymbol{\mu}})^T \end{aligned} \]

mu <- colMeans(X_log_trn)
Sigma <- cov(X_log_trn)

R session: Lack of diversification of Markowitz portfolio

The Markowitz mean-variance portfolio (MVP) with no shorting is \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{maximize}} & \boldsymbol{\mu}^T\mathbf{w} -\lambda\mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]

For completeness, we can also consider the Global Minimum Variance Portfolio (GMVP), which doesn’t make use of \(\boldsymbol{\mu}\): \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \mathbf{w}^T\mathbf{\Sigma}\mathbf{w}\\ {\textsf{subject to}} & \mathbf{1}^T\mathbf{w} = 1\\ & \mathbf{w}\ge\mathbf{0}. \end{array} \]

We can compute the optimal \(\mathbf{w}\) with a solver (a closed-form solution does not exist for \(\mathbf{w}\ge\mathbf{0}\)):

library(CVXR)  # interface for convex optimization solvers

# define portfolio formulations
portolioMarkowitz <- function(mu, Sigma, lmd = 0.5) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Maximize(t(mu) %*% w - lmd*quad_form(w, Sigma)),
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

portolioGMVP <- function(Sigma) {
  w <- Variable(nrow(Sigma))
  prob <- Problem(Minimize(quad_form(w, Sigma)), 
                  constraints = list(w >= 0, sum(w) == 1))
  result <- solve(prob)
  return(as.vector(result$getValue(w)))
}

# compute portfolios
w_Markowitz <- portolioMarkowitz(mu, Sigma)
w_GMVP <- portolioGMVP(Sigma)

We can now plot the allocation of the portfolios:

# put together all portfolios
w_all <- cbind(w_GMVP, w_Markowitz)
rownames(w_all) <- colnames(X_lin)
colnames(w_all) <- c("GMVP", "Markowitz MVP")

# plot
barplot(t(w_all), col = rainbow8equal[1:2],
        main = "Portfolio allocation", xlab = "stocks", ylab = "dollars", beside = TRUE, 
        legend = colnames(w_all))  #args.legend = list(x = "topleft", inset = 0.04)

We can see that the Markowitz MVP concentrates all the budget in one single asset! The GMVP is much more diversified.

Then we can compare the performance (in-sample vs out-of-sample):

# compute returns of all portfolios
ret_all <- xts(X_lin %*% w_all, index(X_lin))
ret_all_trn <- ret_all[1:T_trn, ]
ret_all_tst <- ret_all[-c(1:T_trn), ]

# performance in-sample
t(table.AnnualizedReturns(ret_all_trn))
R>>               Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> GMVP                     0.0447             0.1762                    0.2538
R>> Markowitz MVP            0.1200             0.1968                    0.6100
# performance out-of-sample
t(table.AnnualizedReturns(ret_all_tst))
R>>               Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> GMVP                     0.3285             0.1516                    2.1672
R>> Markowitz MVP            0.2709             0.1445                    1.8752

We can see that the Markowitz MVP performs better than the GMVP in-sample, but much worse out-of-sample! This is due to the bad estimation of \(\mathbf{\mu}\).

Let us plot the cumulative PnL over time:

{ chart.CumReturns(ret_all, main = "Performance of different portfolios", 
                   wealth.index = TRUE, legend.loc = "topleft", colorset = rich8equal)
  addEventLines(xts("training", index(X_lin[T_trn])), srt=90, pos=2, lwd = 2, col = "darkblue") }

The cum PnL may seem contradictory at first because the Markowitz MVP seems to be doing much better than the GMVP. This is however a visual effect. The drawdown is very instructive:

{ chart.Drawdown(ret_all, main = "Drawdown of different portfolios", 
                 legend.loc = "bottomleft", colorset = rich8equal)
  addEventLines(xts("training", index(X_lin[T_trn])), srt=90, pos=2, lwd = 2, col = "darkblue") }

We can see that the drawdown of the Markowitz mean-variance portfolio is indeed much worse than that of the GMVP.

R session: Sensitivity to parameters of Markowitz portfolio

To study the sensitivity with respect to the parameters \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\), we will design multiple portfolios based on different samples from the training set:

# compute 8 different set of portfolios from different input data
w_GMVP_acc <- w_Markowitz_acc <- NULL
for (i in 1:8) {
  # sample means with random samples
  idx <- sample(1:T_trn, T_trn/2)
  mu_ <- colMeans(X_log_trn[idx, ])
  Sigma_ <- cov(X_log_trn[idx, ])
  
  # design portfolios
  w_Markowitz_acc <- cbind(w_Markowitz_acc, portolioMarkowitz(mu_, Sigma_))
  w_GMVP_acc <- cbind(w_GMVP_acc, portolioGMVP(Sigma_))
}
rownames(w_GMVP_acc) <- rownames(w_Markowitz_acc) <- colnames(X_lin)

Let’s plot different realizations of the GMVP portfolio:

barplot(t(w_GMVP_acc), col = rainbow8equal,
        main = "Different realizations of GMVP portfolio", 
        xlab = "stocks", ylab = "dollars", beside = TRUE)

The GMVP is not very sensitive since all the realizations have a similar allocation.

Let’s plot now the different realizations of the Markowitz mean-variance portfolio:

barplot(t(w_Markowitz_acc), col = rainbow8equal,
        main = "Different realizations of Markowitz mean-variance portfolio", 
        xlab = "stocks", ylab = "dollars", beside = TRUE)

The mean-variance portfolio is highly sensitive. So sensitive that the portfolio at each realization is totally different!

Of course the reason for this very distinct behavior is that the sensitivity w.r.t. \(\boldsymbol{\mu}\) is much greater than w.r.t. \(\boldsymbol{\Sigma}\) (also, the estimation error in the former is larger than in the latter) (Chopra and Ziemba 1993).

Risk Parity Portfolio

Motivation

  • The Markowitz mean-variance portfolio has never been fully embraced by practitioners, among other reasons (Zhao et al. 2019) because
    • it only considers the risk of the portfolio as a whole and ignores the risk diversification (i.e., concentrates risk too much in few assets, this was observed in the 2008 financial crisis)
    • it is highly sensitive to the estimation errors in the parameters (i.e., small estimation errors in the parameters may change completely the designed portfolio) (Chopra and Ziemba 1993)


  • Although portfolio management did not change much during the 40 years after the seminal works of Markowitz and Sharpe, the development of risk budgeting techniques marked an important milestone in deepening the relationship between risk and asset management.

Motivation

  • Since the global financial crisis in 2008, risk management has particularly become more important than performance management in portfolio optimization
    • risk parity became a popular financial model after the global financial crisis in 2008 (Asness et al. 2012; Qian 2005).


  • The alternative risk parity portfolio design has been receiving significant attention from both the theoretical and practical sides because it
    • diversifies the risk, instead of the capital, among the assets
    • is less sensitive to parameter estimation errors.


  • Today, pension funds and institutional investors are using this approach in the development of smart indexing and the redefinition of long-term investment policies.

From “dollar” to risk diversification

  • Risk parity is an approach to portfolio management that focuses on allocation of risk rather than allocation of capital.


  • The risk parity approach asserts that when asset allocations are adjusted to the same risk level, the portfolio can achieve a higher Sharpe ratio and can be more resistant to market downturns.


  • While the minimum variance portfolio tries to minimize the variance (with the disadvantage that a few assets may be the ones contributing most to the risk), the risk parity portfolio tries to constrain each asset (or asset class, such as bonds, stocks, real estate, etc.) to contribute equally to the portfolio overall volatility.

From “dollar” to risk diversification

  • The term “risk parity” was coined by Edward Qian from PanAgora Asset Management (Qian 2005) and was then adopted by the asset management industry.


  • Some of its theoretical components were developed in the 1950s and 1960s but the first risk parity fund, called the “All Weather” fund, was pioneered by Bridgewater Associates LP in 1996.


  • Interest in the risk parity approach has increased since the late 2000s financial crisis as the risk parity approach fared better than traditionally constructed portfolios.


  • Some portfolio managers have expressed skepticism about the practical application of the concept and its effectiveness in all types of market conditions but others point to its performance during the financial crisis of 2007-2008 as an indication of its potential success.

From “dollar” to risk diversification

Problem formulation

Risk contribution

  • One of the important concepts in portfolio management is quantifying the risk of individual components to the total portfolio risk

  • Given a portfolio \(\mathbf{w}\in\mathbb{R}^{N}\) and the return covariance matrix \(\boldsymbol{\Sigma}\), the portfolio volatility is \[\sigma(\mathbf{w})=\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}.\]

  • Following Euler’s theorem, the volatility can be decomposed as \[\sigma\left(\mathbf{w}\right)=\sum_{i=1}^{N}w_i\frac{\partial\sigma}{\partial w_i} = \sum_{i=1}^N\frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\]

  • The marginal risk contribution (MRC) of the \(i\)th asset to the total risk \(\sigma(\mathbf{w})\) is defined as \[{\sf MRC}_i = \frac{\partial \sigma}{\partial w_i} = \frac{\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\]

    • measures the sensitivity of the portfolio volatility to the \(i\)th asset weight
    • MRC can be defined based on other risk measures, like VaR and CVaR.

Risk contribution

  • The risk contribution (RC) from the \(i\)th asset to the total risk \(\sigma(\mathbf{w})\) is defined as \[{\sf RC}_i = w_i\frac{\partial\sigma}{\partial w_i}=\frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\]

  • Observe that (from Euler’s theorem) \[\sum_{i=1}^N {\sf RC}_i = \sigma(\mathbf{w}).\]

  • The relative risk contribution (RRC) is defined as the ratio of its RC to the total portfolio risk \(\sigma(\mathbf{w})\): \[{\sf RRC}_i = \frac{{\sf RC}_i}{\sigma(\mathbf{w})} = \frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\] so that \(\sum_{i=1}^N {\sf RRC}_i = 1\).

Risk parity portfolio (RPP)

  • Goal: to allocate the weights so that all the assets contribute the same amount of risk, effectively “equalizing” the risk.

  • The risk parity portfolio (RPP) or equal risk portfolio (ERP) equalizes the risk contributions: \[{\sf RC}_i = \sigma(\mathbf{w})/N\] or \[{\sf RRC}_i = 1/N.\]

  • Note the parallel with the equal weight portfolio (EWP) (aka uniform portfolio): \[w_i = 1/N.\]

  • While the EWP equalizes the capital allocation \(w_i = 1/N\), the RPP equalizes the risk allocation \({\sf RRC}_i = 1/N\).

Risk contribution of EWP

Risk contribution of RPP

Risk budgeting portfolio (RBP)

  • The RPP aims at allocating the total risk evenly across the assets.

  • More generally, the risk budgeting portfolio (RBP) allocates the risk according to the risk profile determined by the weights \(\mathbf{b}\) (with \(\mathbf{1}^T\mathbf{b}=1\) and \(\mathbf{b}\ge \mathbf{0}\)): \[{\sf RC}_i = b_i \sigma(\mathbf{w})\] or \[{\sf RRC}_i = b_i.\]

  • We can rewrite \({\sf RRC}_i = \frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}} = b_i\) simply as \[w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i = b_i \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}, \qquad i=1,\ldots,N.\]

  • Obviously, RPP is a special case of RBP with \(b_i=1/N\).

  • We will consider the more general RBP and we will generally call it RPP with some abuse of terminology

  • In general, finding a risk parity portfolio is not trivial.

Risk contribution of RBP

Risk budgeting portfolio with budget \(\mathbf{b} \propto (2,2,2,1,1,1,1,1,1)\):

Solution to the naive diagonal formulation

RPP: The diagonal case

  • Suppose that the covariance matrix of the returns is diagonal, \(\boldsymbol{\Sigma}={\sf Diag(\boldsymbol{\sigma}^2)}\), and that the portfolio has the constraints \(\mathbf{1}^T\mathbf{w}=1\) and \(\mathbf{w}\ge\mathbf{0}\).

  • We can then write the risk parity/budgeting constraints \(w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i = b_i \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\) as \[w_i^2 \sigma_i^2 = b_i \sum_{j=1}^N w_j^2 \sigma_j^2\] or simply \[w_i^2 \sigma_i^2 \propto b_i\] which leads to \[w_i \propto \sqrt{b_i}/\sigma_i.\]

  • Observe that the portfolio is inversely proportional to the assets volatilities.

RPP: The diagonal case

  • The RPP in the diagonal case is then \[w_i = \frac{\sqrt{b_i}/\sigma_i}{\sum_{j=1}^N \sqrt{b_j}/\sigma_j}, \qquad i=1,\ldots,N.\] or, in terms of \(\boldsymbol{\Sigma}\), \[ w_i = \frac{\sqrt{b_i}/\sqrt{\Sigma_{ii}}}{\sum_{j=1}^N\sqrt{b_j}/\sqrt{\Sigma_{jj}}}, \qquad i=1,\ldots,N. \]

  • However, for non-diagonal \(\boldsymbol{\Sigma}\) or with other additional constraints, a closed-form solution does not exist in general and some optimization procedures have to be constructed.

  • The previous diagonal solution can be used even when \(\boldsymbol{\Sigma}\) is not diagonal and is then called naive risk budgeting portfolio.

Risk contribution of naive RPP

The risk contribution of the naive RPP is not perfectly equalized (as expected):

Inverse volatility portfolio

  • Similar to RPP, the aim of inverse volatility portfolio (IVP) is to control the portfolio risk.

  • The IVP is defined as \[\mathbf{w} = \frac{\boldsymbol{\sigma}^{-1}}{\mathbf{1}^T\boldsymbol{\sigma}^{-1}}\] where \(\boldsymbol{\sigma}^2 = {\sf Diag(\boldsymbol{\Sigma})}\).

  • Lower weights are given to high volatility assets and higher weights to low volatility assets

  • IVP is also called “equal volatility” portfolio since the weighted constituent assets have equal volatility: \[{\sf sd}(w_ir_i) = w_i\sigma_i = 1/N.\]

  • Observe that the IVP coincides with the naive risk parity portfolio.

R session: Solving the naive diagonal case

For the diagonal case, \(\boldsymbol{\Sigma}={\sf Diag(\boldsymbol{\sigma}^2)}\), we can simply use the closed form solution for the risk parity portfolio (RPP) \[\mathbf{w} = \frac{\boldsymbol{\sigma}^{-1}}{\mathbf{1}^T\boldsymbol{\sigma}^{-1}}.\]

Even for the nondiagonal case, we can still use the diagonal solution (termed then naive RPP) using just the diagonal elements \(\boldsymbol{\sigma}^2 = {\sf Diag(\boldsymbol{\Sigma})}\).

For comparison purposed we will also consider the equally weighted portfolio (EWP) (aka uniform portfolio): \[\mathbf{w} = \frac{1}{N}\mathbf{1}.\]

# compute EWP
w_EWP <- rep(1/N, N)

# compute naive RPP
sigma2 <- diag(Sigma)
w_RPP_naive <- 1/sqrt(sigma2)
w_RPP_naive <- w_RPP_naive/sum(w_RPP_naive)

# add portfolios to the two previous ones
w_all <- cbind(w_all, 
               "EWP"         = w_EWP,
               "RPP (naive)" = w_RPP_naive)  

# plot
barplot(t(w_all), col = rainbow8equal[1:4],
        main = "Portfolio allocation", xlab = "stocks", ylab = "dollars", beside = TRUE, 
        legend = colnames(w_all))

Obviously the EWP has a uniform dollar allocation, but is the risk allocation also uniform?

Let’s plot the risk contribution:

# compute risk contributions
risk_all <- cbind("GMVP"          = as.vector(w_GMVP * (Sigma %*% w_GMVP)),
                  "Markowitz MVP" = as.vector(w_Markowitz * (Sigma %*% w_Markowitz)),
                  "EWP"           = as.vector(w_EWP * (Sigma %*% w_EWP)),
                  "RPP (naive)"    = as.vector(w_RPP_naive * (Sigma %*% w_RPP_naive)))
rownames(risk_all) <- colnames(X_lin)
RRC_all <- sweep(risk_all, MARGIN = 2, STATS = colSums(risk_all), FUN = "/")  # normalize each column

# plot
barplot(t(RRC_all), col = rainbow8equal[1:4],
        main = "Relative risk contribution", xlab = "stocks", ylab = "risk", beside = TRUE, 
        legend = colnames(RRC_all))

Observe how the RPP has a more even risk contribution distribution than the other portfolios. However, it is not perfect because the design is the naive one assuming a diagonal covariance matrix.

R session: PnL comparison of GMVP, Markowitz MVP, and RPP

It is instructive now to compare the performance (in-sample vs out-of-sample):

# compute returns of all portfolios
ret_all <- xts(X_lin %*% w_all[, c("GMVP", "Markowitz MVP", "EWP", "RPP (naive)")], 
               order.by = index(X_lin))
ret_all_trn <- ret_all[1:T_trn, ]
ret_all_tst <- ret_all[-c(1:T_trn), ]

# performance in-sample
t(table.AnnualizedReturns(ret_all_trn))
R>>               Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> GMVP                     0.0447             0.1762                    0.2538
R>> Markowitz MVP            0.1200             0.1968                    0.6100
R>> EWP                     -0.1172             0.2574                   -0.4553
R>> RPP (naive)             -0.0180             0.1884                   -0.0955
# performance out-of-sample
t(table.AnnualizedReturns(ret_all_tst))
R>>               Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> GMVP                     0.3285             0.1516                    2.1672
R>> Markowitz MVP            0.2709             0.1445                    1.8752
R>> EWP                      0.7153             0.2246                    3.1841
R>> RPP (naive)              0.5699             0.1874                    3.0412

We can see that the in-sample the Markowitz MVP has the best Sharpe ratio and the RPP is quite bad. However, when it come to out-of-sample performance, the EWP and RPP are much better.

Let us plot the cumulative PnL over time (recall that it looks deceptive to the untrained eye):

{ chart.CumReturns(ret_all, main = "Cum PnL of different portfolios", 
                   wealth.index = TRUE, legend.loc = "topleft", colorset = rainbow8equal)
  addEventLines(xts("training", index(X_lin[T_trn])), srt=90, pos=2, lwd = 2, col = "darkblue") }

The drawdown, on the other hand, clearly shows the superior performance of the RPP:

{ chart.Drawdown(ret_all, main = "Drawdown of different portfolios", 
                 legend.loc = "bottomleft", colorset = rainbow6equal)
  addEventLines(xts("training", index(X_lin[T_trn])), srt=90, pos=2, lwd = 2, col = "darkblue") }

We can see that the drawdown of the Markowitz MVP and EWP are the worst, while the GMVP and RPP are the safest.

Solution to the vanilla convex formulation

RPP: Unveiling the hidden convexity

  • Consider the risk budgeting equations for an arbitrary covariance matrix \(\boldsymbol{\Sigma}\): \[w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i = b_i \mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}, \qquad i=1,\ldots,N\] with \(\mathbf{1}^T\mathbf{w}=1\) and \(\mathbf{w} \ge \mathbf{0}\).

  • If we define \(\mathbf{x}=\mathbf{w}/\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\), then we can rewrite the risk budgeting equations as \(x_i\left(\boldsymbol{\Sigma}\mathbf{x}\right)_i = b_i\) or, more compactly in vector form, as \[\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\] with \(\mathbf{x} \ge \mathbf{0}\) and we can always recover the portfolio by normalizing: \(\mathbf{w} = \mathbf{x}/(\mathbf{1}^T\mathbf{x})\).

  • At this point, we can use a nonlinear multivariate root finder for \(\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\). For example, in R we can use the package rootSolve.

Risk contribution of vanilla RPP

The risk contribution of the vanilla RPP is perfectly equalized (unlike the naive diagonal design):

R session: Solving the risk budgeting equations as a system of nonlinear equations

We can directly solve the risk budgeting equations \[\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\] by interpreting them as a system of nonlinear equations.

Define the nonlinear function \[F(\mathbf{x}) = \boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}/\mathbf{x}.\] We can see that the solution to the risk budgeting equations correspond to the roots of the function \(F(\mathbf{x})\): \[F(\mathbf{x}) = \mathbf{0}.\] We can use the root finder function multiroot() from the package rootSolve:

library(rootSolve)

b <- rep(1/N, N)

# function definition F(x) = Sigma %*% x - b/x
f_root <- function(x, parms) {
  Sigma <- parms
  N <- nrow(Sigma)
  return(Sigma %*% x - b/x)
}

# finding the root
x_root <- multiroot(f_root, start = b, parms = Sigma)$root
w_root <- x_root/sum(x_root)

# sanity check
Sigma %*% x_root - b/x_root
R>>               [,1]
R>> AAPL -6.494678e-06
R>> AMD  -3.048127e-07
R>> ADI  -2.389429e-07
R>> ABBV -7.691218e-07
R>> AEZS -1.395804e-09
R>> A    -1.854531e-06
R>> APD  -9.681762e-06
R>> AA   -5.856150e-07
R>> CF   -1.175333e-07

Let’s plot the risk contribution and see:

w_all <- cbind(w_all, 
               "RPP (root)" = w_root)
# compute risk contributions
risk_all <- cbind(risk_all, 
                  "RPP (root)" = as.vector(w_root * (Sigma %*% w_root)))
RRC_all <- sweep(risk_all, MARGIN = 2, STATS = colSums(risk_all), FUN = "/")  # normalize each column

# plot
barplot(t(RRC_all), col = rainbow8equal[1:5],
        main = "Relative risk contribution", xlab = "stocks", ylab = "risk", beside = TRUE, 
        legend = colnames(RRC_all))

We can see that the RPP based on finding the root gives a perfectly equalized risk contribution (better than the naive solution, of course).

RPP: Unveiling the hidden convexity

  • Interestingly, Spinu (2013) realized that precisely the risk budgeting equation \(\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\) corresponds to the gradient of the convex function \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\) set to zero: \[\nabla f(\mathbf{x}) = \boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}/\mathbf{x} = \mathbf{0}.\]

  • This is precisely the optimality condition for the minimization of \(f(\mathbf{x})\).

  • Thus, we can finally formulate the risk budgeting problem as the following convex optimization problem: \[\underset{\mathbf{x}\ge\mathbf{0}}{\textsf{minimize}} \quad \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\] which has optimality condition \(\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\).

RPP: Unveiling the hidden convexity

  • Griveau-Billion et al. (2013) proposed a slightly different formulation (also convex): \[\underset{\mathbf{x}\ge\mathbf{0}}{\textsf{minimize}} \quad \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}} - \mathbf{b}^T\log(\mathbf{x})\] with optimality condition \(\frac{\boldsymbol{\Sigma}\mathbf{x}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}} = \mathbf{b}/\mathbf{x}\) or \(\frac{\boldsymbol{\Sigma}\mathbf{x}}{\sigma} = \mathbf{b}/\mathbf{x}\).

  • It looks like the optimal solution is not what we want, but after a careful inspection we can conclude that it is just a different normalization factor from \(\mathbf{w}\).

  • Simply define \(\tilde{\mathbf{x}}=\mathbf{x}/\sigma^{1/2}=\mathbf{w}/\sigma^{3/2}\) to obtain the optimality condition \[\boldsymbol{\Sigma}\tilde{\mathbf{x}} = \mathbf{b}/\tilde{\mathbf{x}}\] from which we can recover the portfolio by normalizing: \(\mathbf{w} = \tilde{\mathbf{x}}/(\mathbf{1}^T\tilde{\mathbf{x}})\).

RPP: Unveiling the hidden convexity

  • Kaya and Lee (2012) proposed yet another reformulation in convex form as the solution to \[ \begin{array}{ll} \underset{\mathbf{x}\ge\mathbf{0}}{\textsf{maximize}} & \mathbf{b}^T\log(\mathbf{x})\\ {\textsf{subject to}} & \sigma(\mathbf{x}) \le \sigma_0.\\ \end{array} \]

  • Ignoring the nonnegativity constraint, the Lagrangian of this constrained convex optimization problem is \[L(\mathbf{x};\lambda) = \mathbf{b}^T\log(\mathbf{x}) + \lambda (\sigma_0 - \sqrt{\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x}})\] with gradient \[ \nabla_\mathbf{x} L(\mathbf{x};\lambda) = \mathbf{b}/\mathbf{x} - \lambda \frac{\boldsymbol{\Sigma}\mathbf{x}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}} \]

  • Defining \(\tilde{\mathbf{x}}=(\lambda^{1/2}/\sigma^{1/2})\mathbf{x}\), we can rewrite \(\nabla_\mathbf{x} L(\mathbf{x};\lambda) = \mathbf{0}\) as \[\mathbf{b}/\tilde{\mathbf{x}} = \boldsymbol{\Sigma}\tilde{\mathbf{x}}\] which is the desired risk parity/budgeting condition.

Solving the RPP problem

  • A direct way is to attempt to directly solve the nonlinear equations \(\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\) with a nonlinear multivariate root finder:

    • in R we can use the function multiroot from the package rootSolve
    • in Matlab we can use the function fsolve.
  • An indirect way is to solve some of the previous convex formulations: \[\underset{\mathbf{x}\ge\mathbf{0}}{\textsf{minimize}} \quad \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\]

  • Unfortunately, these convex problems do not conform with the classes most solvers embrace (i.e., LP, QP, QCQP, SOCP, SDP, GP, etc.).

  • We can still solve them with a general-purpose solver:

    • in R we can use the function optim
    • in Matlab we can use the function fmincon
  • But if we really aim for speed and computational efficiency, there are simple iterative algorithms that can be tailored to the problem at hand, like the cyclical coordinate descent algorithm and the Newton algorithm.

RPP: Newton method

  • Gradient and Newton methods are the most fundamental numerical methods for optimization (Boyd and Vandenberghe 2004).

  • The gradient method obtains the iterates based on the gradient \(\nabla f\) of the objective function \(f(\mathbf{x})\) as \[\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \mu\nabla f(\mathbf{x}^{(k)})\] but has a slow convergence.

  • The Newton method also incorporates the Hessian \({\sf H}\): \[\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - {\sf H}^{-1}(\mathbf{x}^{(k)})\nabla f(\mathbf{x}^{(k)})\] obtaining much faster convergence.

  • In practice, one may use the backtracking method to properly adjust the step size of each iteration.

  • For our function \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\), the gradient and Hessian are given by \[\begin{aligned} \nabla f(\mathbf{x}) &= \boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}/\mathbf{x}\\ {\sf H}(\mathbf{x}) &= \boldsymbol{\Sigma} + {\sf Diag}(\mathbf{b}/\mathbf{x}^2). \end{aligned}\]

Block coordinate descent (BCD)

The BCD method (aka Gauss-Seidel method) minimizes the function \(f(\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_N)\) with respect to each block of variables one by one in a sequential manner (Bertsekas 1999).



Algorithm 1: BCD

Set \(k=0\) and initialize \(\mathbf{x}^{(0)}\)
repeat

  • Solve sequentially for \(i=1,\ldots,N\): \[ \mathbf{x}_i^{(k+1)} = \arg \min_{\mathbf{x}_i} f\left(\mathbf{x}_1^{(k+1)},\ldots, \mathbf{x}_{i-1}^{(k+1)},\mathbf{x}_i,\mathbf{x}_{i+1}^{(k)},\ldots,\mathbf{x}_{N}^{(k)}\right) \]

  • \(k \gets k+1\)

until convergence
return \(\mathbf{x}^{(k)}\)

Convergence of BCD



Proposition 1:

If \(f(\mathbf{x})\) is continuously differentiable and each minimization has a unique solution, then every limit point of the algorithm is a stationary point (optimal point for a convex problem).

RPP: Cyclical coordinate descent algorithm

  • The cyclical coordinate descent algorithm is a particular case of the BCD method where \(f(\mathbf{x})\) is minimized in a cyclical manner with respect to each element of the variable \(\mathbf{x} = (x_1,x_2,\ldots,x_N)\).

  • The minimization of \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x})\) with respect to \(x_i\) is (denote \(\mathbf{x}_{-i}=(x_1,\cdots,x_{i-1},0,x_{i+1},\cdots,x_N)\)) \[\underset{x_i\ge 0}{\textsf{minimize}} \quad \frac{1}{2}x_i^2\sigma_i^2 + x_i(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{:,i}) - b_i\log{x_i}\] with gradient \(\nabla_i f = x_i\sigma_i^2 + (\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{:,i}) - b_i/x_i.\)

  • Setting the gradient to zero gives us the second order equation \[x_i^2\sigma_i^2 + x_i(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{:,i}) - b_i = 0\] with positive solution given by \[x_i^\star = \frac{-(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{:,i})+\sqrt{(\mathbf{x}_{-i}^T\boldsymbol{\Sigma}_{:,i})^2+4\sigma_i^2b_i}}{2\sigma_i^2}.\]

R session: Solving the vanilla convex case

In order to solve the nonlinear equations \(\boldsymbol{\Sigma}\mathbf{x} = \mathbf{b}/\mathbf{x}\), we will minimize the following convex optimization problem: \[\underset{\mathbf{x}\ge\mathbf{0}}{\textsf{minimize}} \quad \frac{1}{2}\mathbf{x}^{T}\boldsymbol{\Sigma}\mathbf{x} - \mathbf{b}^T\log(\mathbf{x}).\] We will solve this problem with the general-purpose nonlinear solver optim() in R:

# initial point
x0 <- rep(1/N, N)

# function definition
fn_convex <- function(x, Sigma) {
  N <- nrow(Sigma)
  return(0.5 * t(x) %*% Sigma %*% x - (1/N)*sum(log(x)))
}

# optimize with general-purpose solver
result <- optim(par = x0, fn = fn_convex, Sigma = Sigma, method = "BFGS")
x_convex <- result$par
w_RPP_convex <- x_convex/sum(x_convex)

# sanity check of the solution
b <- rep(1/N, N)
Sigma %*% x_convex - b/x_convex
R>>               [,1]
R>> AAPL -1.921622e-05
R>> AMD   7.124787e-05
R>> ADI  -1.084764e-05
R>> ABBV -1.788146e-05
R>> AEZS -3.709676e-05
R>> A     5.178955e-07
R>> APD  -4.696979e-05
R>> AA    3.113723e-05
R>> CF   -2.474985e-05
# plot
w_all <- cbind(w_all, "RPP (convex)" = w_RPP_convex)
barplot(t(w_all), col = rainbow8equal[1:6],
        main = "Portfolio allocation", xlab = "stocks", ylab = "dollars", beside = TRUE, 
        legend = colnames(w_all))

Let’s plot the risk contribution:

# compute risk contributions
risk_all <- cbind(risk_all, 
                  "RPP (convex)" = as.vector(w_RPP_convex * (Sigma %*% w_RPP_convex)))
RRC_all <- sweep(risk_all, MARGIN = 2, STATS = colSums(risk_all), FUN = "/")  # normalize each column

# plot
barplot(t(RRC_all), col = rainbow8equal[1:6],
        main = "Relative risk contribution", xlab = "stocks", ylab = "risk", beside = TRUE, 
        legend = colnames(RRC_all))

Now we can clearly see that the RPP based on the convex formulation gives a perfectly equalized risk contribution!

Solution to the general nonconvex formulation

RPP: General formulation

  • The previous methods are based on a convex reformulation of the problem so they are guaranteed to converge to the optimal risk budgeting solution.
  • However, they can only be employed for the simplest risk budgeting formulation with a simplex constraint set (i.e., \(\mathbf{1}^T\mathbf{w}=1\) and \(\mathbf{w} \ge \mathbf{0}\)).
  • They cannot be used if
    • we have other constraints like allowing shortselling or box constraints: \(l_i \le w_i \le u_i\)
    • on top of the risk budgeting constraints \(w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i = b_i \;\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\) we have other objectives like maximizing the expected return \(\mathbf{w}^T\boldsymbol{\mu}\) or minimizing the overall variance \(\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\).
  • In those more general cases, we need more sophisticated formulations, which unfortunately are not convex.
  • In the R programming language there is a package called riskParityPortfolio that can solve very efficiently all the formulations.
  • We will overview the different general formulations and the solution methods.

RPP formulations

  • The idea is to try to achieve equal risk contributions \({\sf RC}_i = \frac{w_i\left(\boldsymbol{\Sigma}\mathbf{w}\right)_i}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\) by penalizing the differences between the terms \(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}\).

  • Maillard et al. (2010) aimed at solving: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i,j=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}-w_{j}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{j}\right)^{2}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]

  • This is a simplified formulation with a single-index summation (objective only has \(N\) terms instead of \(N^2\)): \[\begin{array}{ll} \underset{\mathbf{w},\theta}{\textsf{minimize}} & \sum_{i=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i} - \theta\right)^{2}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]

R session: Solving the nonconvex case with a general-purpose nonlinear solver

Consider now the nonconvex problem formulation \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i,j=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}-w_{j}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{j}\right)^{2}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1,\quad\mathbf{w}\ge\mathbf{0}. \end{array}\]

We will solve it with the general-purpose nonlinear solver optim() in R (but this is totally ignoring the constraints, so a better solver should be used):

# initial point
x0 <- rep(1/N, N)

# function definition
fn_nonconvex <- function(w, Sigma) {
  N <- length(w)
  risks <-  w * (Sigma %*% w)
  g <- rep(risks, times = N) - rep(risks, each = N)
  return(sum(g^2))
}

# optimize with general-purpose solver
result <- optim(par = x0, fn = fn_nonconvex, Sigma = Sigma, method = "BFGS")
x_gen_solver <- result$par
w_RPP_gen_solver <- x_gen_solver/sum(x_gen_solver)

# plot
w_all <- cbind(w_all, "RPP (gen-solver)" = w_RPP_gen_solver)
barplot(t(w_all), col = rainbow8equal[1:7],
        main = "Portfolio allocation", xlab = "stocks", ylab = "dollars", beside = TRUE, 
        legend = colnames(w_all))

Let’s plot the risk contribution:

# compute risk contributions
risk_all <- cbind(risk_all, 
                  "RPP (gen-solver)" = as.vector(w_RPP_gen_solver * (Sigma %*% w_RPP_gen_solver)))
RRC_all <- sweep(risk_all, MARGIN = 2, STATS = colSums(risk_all), FUN = "/")  # normalize each column

# plot
barplot(t(RRC_all), col = rainbow8equal[1:7],
        main = "Relative risk contribution", xlab = "stocks", ylab = "risk", beside = TRUE, 
        legend = colnames(RRC_all))

We can observe that the solution based on the general solver for the nonconvex formulation is not as perfectly equalized as that from the convex formulation. The reason is that when solving a nonconvex problem one does not have any guarantee of global optimality. In this particular case, we know that perfect risk contribution equalization can actually be achieved (as it is by the solution from the convex formulation) but the nonconvex formulation cannot achieve it.

RBP formulations

  • This formulation is again based on the double-index summation with budgets: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i,j=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{b_i} - \frac{w_{j}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{j}}{b_j}\right)^{2}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]

  • This one on a single-index summation: \[\begin{array}{ll} \underset{\mathbf{w},\theta}{\textsf{minimize}} & \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{b_i} - \theta \right)^{2}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]

RBP formulations

  • Bruder and Roncalli (2012) proposed a formulation based on the RRC: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}-b_{i}\right)^{2}\\ \textsf{subject to} & \mathbf{w}^{T}\mathbf{1}=1. \end{array}\]

  • This one is instead based on the RC: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{{\sqrt{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}}} - b_i\sqrt{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}\right)^{2}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]

  • This one is also similar: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i} - b_i\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\right)^{2}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1. \end{array}\]

RPP: References

  • Two standard textbooks (Qian 2016; Roncalli 2013):

    📘 T. Roncalli, Introduction to Risk Parity and Budgeting. CRC Press, 2013.

    📘 E. Qian, Risk Parity Fundamentals. CRC Press, 2016.

  • A unified general formulation and advanced algorithms can be found in (Feng and Palomar 2015, 2016):

    📄 Y. Feng and D. P. Palomar, “SCRIP: Successive convex optimization methods for risk parity portfolios design,” IEEE Trans. Signal Process., vol. 63, no. 19, pp. 5285–5300, 2015.

    📘 Y. Feng and D. P. Palomar. A Signal Processing Perspective on Financial Engineering. Foundations and Trends in Signal Processing, Now Publishers, 2016.

    • A software implementation of the algorithms is available in the R package riskParityPortfolio.

R session: Solving the RPP with R packages

There are very few R packages for risk parity portfolio, namely, riskParityPortfolio, cccp, and FinCovRegularization.

Let’s illustrate the use of the package riskParityPortfolio (it is much faster and reliable than the others and it allows for combinations of the risk parity concentration term with expected return maximization and other variations):

library(riskParityPortfolio)
#?riskParityPortfolio  # to get help for the function

# use package
rpp <- riskParityPortfolio(Sigma)
names(rpp)
R>> [1] "w"                          "relative_risk_contribution" "is_feasible"
rpp$w
R>>       AAPL        AMD        ADI       ABBV       AEZS          A        APD         AA         CF 
R>> 0.14569676 0.07792281 0.12163699 0.12503892 0.04230565 0.13103465 0.14869527 0.09034471 0.11732425
rpp$relative_risk_contribution
R>>      AAPL       AMD       ADI      ABBV      AEZS         A       APD        AA        CF 
R>> 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111
c(rpp$w * (Sigma %*% rpp$w)) / as.numeric(t(rpp$w) %*% Sigma %*% rpp$w)
R>> [1] 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111 0.1111111
# plot
w_all <- cbind(w_all, "RPP (package)" = rpp$w)
barplot(t(w_all), col = rainbow8equal[1:8],
        main = "Portfolio allocation", xlab = "stocks", ylab = "dollars", beside = TRUE, 
        legend = colnames(w_all))

Let’s plot the risk contribution:

# compute risk contributions
risk_all <- cbind(risk_all, 
                  "RPP (package)" = as.vector(rpp$w * (Sigma %*% rpp$w)))
RRC_all <- sweep(risk_all, MARGIN = 2, STATS = colSums(risk_all), FUN = "/")  # normalize each column

# plot
barplot(t(RRC_all), col = rainbow8equal[1:8],
        main = "Relative risk contribution", xlab = "stocks", ylab = "risk", beside = TRUE, 
        legend = colnames(RRC_all))

Unified RPP problem formulation

A more general risk parity formulation is (Feng and Palomar 2015) \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i=1}^{N}g_{i}\left(\mathbf{w}\right)^2+\lambda F\left(\mathbf{w}\right)\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W} \end{array}\] where

  • \(\sum_{i=1}^{N}g_{i}\left(\mathbf{w}\right)^2\): risk concentration measurement, e.g., \[g_{i}\left(\mathbf{w}\right)\triangleq\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}-\frac{1}{N},\]
  • \(F\left(\mathbf{w}\right)\): preference, e.g., \(0\), \(-\boldsymbol{\mu}^{T}\mathbf{w}\), \(-\boldsymbol{\mu}^{T}\mathbf{w}+\nu\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\),

  • \(\lambda\geq0\): trade-off parameter,

  • \(\mathbf{w}\in\mathcal{W}\): capital budget (\(\mathbf{1}^T\mathbf{w}=1\)) & other convex constraints.

Challenge: the problem is highly nonconvex due to the term \(\sum_{i=1}^{N}g_{i}\left(\mathbf{w}\right)^2\).

Risk concentration term

  • The previous general formulation contains the risk concentration term \(R(\mathbf{w}) = \sum_{i=1}^{N}g_{i}\left(\mathbf{w}\right)^{2}\), which can be written in a compact way to represent the many formulations presented before.

  • Define \(\mathbf{M}_i\in\mathbb{R}^{N\times N}\) as a sparse matrix with its \(i\)th row equal to that of the covariance matrix \(\boldsymbol{\Sigma}\).

  • Examples:

    • \(R(\mathbf{w}) = \sum_{i,j=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}-w_{j}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{j}\right)^{2}\) corresponds to \[g_{i,j}(\mathbf{w})=\mathbf{w}^T(\mathbf{M}_i-\mathbf{M}_j)\mathbf{w}\]

    • \(R(\mathbf{w}) = \sum_{i=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i} - \theta\right)^{2}\) corresponds to \[g_i(\mathbf{w})=\mathbf{w}^T\mathbf{M}_i\mathbf{w}-\theta\]

    • \(R(\mathbf{w}) = \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}-b_{i}\right)^{2}\) corresponds to \[g_i(\mathbf{w})=\frac{\mathbf{w}^T\mathbf{M}_i\mathbf{w}}{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}-b_i.\]

Risk concentration term

  • More examples:

    • \(R(\mathbf{w}) = \sum_{i,j=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{b_i} - \frac{w_{j}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{j}}{b_j}\right)^{2}\) corresponds to \[g_{i,j}(\mathbf{w})=\mathbf{w}^T(\mathbf{M}_i/b_i-\mathbf{M}_j/b_j)\mathbf{w}\]

    • \(R(\mathbf{w}) = \sum_{i=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i} - b_i\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}\right)^{2}\) corresponds to \[g_i(\mathbf{w})=\mathbf{w}^T(\mathbf{M}_i-b_i\boldsymbol{\Sigma})\mathbf{w}\]

    • \(R(\mathbf{w}) = \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{{\sqrt{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}}} - b_i\sqrt{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}\right)^{2}\) corresponds to \[g_i(\mathbf{w})=\frac{\mathbf{w}^T\mathbf{M}_i\mathbf{w}}{\sqrt{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}}-b_i\sqrt{\mathbf{w}^T\boldsymbol{\Sigma}\mathbf{w}}\]

    • \(R(\mathbf{w}) = \sum_{i=1}^{N}\left(\frac{w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}}{b_i} - \theta \right)^{2}\) corresponds to \[g_i(\mathbf{w})=\mathbf{w}^T\mathbf{M}_i\mathbf{w}/b_i-\theta.\]

Solving the unified nonconvex RPP problem

  • Recall the unified nonconvex RPP formulation: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i=1}^{N}g_{i}\left(\mathbf{w}\right)^2+\lambda F\left(\mathbf{w}\right)\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W}. \end{array}\]

  • We can solve this with some general-purpose multivariate optimization solver:

    • in R we can use the function optim
    • in Matlab we can use the function fmincon


  • However, for our RPP problem, such off-the-shelf nonlinear numerical optimization methods can be slow and may get stuck at some unsatisfactory points.

  • This is because the structure of the objective is not exploited.

  • We can develop some tailored numerical algorithm with much faster convergence speed and computational efficiency; in particular, we will use the framework of successive convex approximation (SCA).

Slow convergence of general-purpose solvers

Successive Convex Approximation (SCA)

  • Consider our difficult nonconvex problem: \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & U(\mathbf{w})\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W}. \end{array}\]

  • Basic idea of SCA: solve a difficult problem via solving a sequence of simpler problems.

  • Minimize \(U(\mathbf{w})\) over \(\mathbf{w}\in\mathcal{W}\) via SCA (Scutari et al. 2014):

    • Approximation: find \(\tilde{U}\left(\mathbf{w};\mathbf{w}^{k}\right)\) that approximates the function \(U\left(\mathbf{w}\right)\) at the point \(\mathbf{w}^{k}\) with
      • \(\tilde{U}\left(\mathbf{w};\mathbf{w}^{k}\right)\): uniformly strongly convex & cont. differentiable
      • \(\nabla\tilde{U}\left(\mathbf{w};\mathbf{w}^{k}\right)\): Lipschitz continuous on \(\mathcal{W}\)
      • \(\nabla\tilde{U}\left(\mathbf{w};\mathbf{w}^{k}\right)|_{\mathbf{w}=\mathbf{w}^{k}}=\nabla U\left(\mathbf{w}\right)|_{\mathbf{w}=\mathbf{w}^{k}}\)
    • Minimization: minimize \(\tilde{U}\left(\mathbf{w};\mathbf{w}^{k}\right)\) to get the update \[\mathbf{w}^{k+1} \triangleq\arg\min_{\mathbf{w}\in\mathcal{W}}\tilde{U}\left(\mathbf{w};\mathbf{w}^{k}\right).\]

Construction of approximation

Minimization

One more iteration

Classical methods as SCA

  • (Unconstrained) gradient descent: Choose \[\tilde{U}\left(\mathbf{w};\mathbf{w}^{k}\right)=U\left(\mathbf{w}^{k}\right)+\nabla U\left(\mathbf{w}^{k}\right)^{T}\left(\mathbf{w}-\mathbf{w}^{k}\right)+\frac{1}{2\alpha^{k}}\left\Vert \mathbf{w}-\mathbf{w}^{k}\right\Vert _{2}^{2}.\] Setting the derivative w.r.t. \(\mathbf{w}\) to zero yields: \[\mathbf{w}^{k+1} =\mathbf{w}^{k}-\alpha^{k}\nabla U\left(\mathbf{w}^{k}\right).\]

  • (Unconstrained) Newton’s method: Choose \[\begin{aligned} \tilde{U}\left(\mathbf{w};\mathbf{w}^{k}\right) = U\left(\mathbf{w}^{k}\right) & + \nabla U\left(\mathbf{w}^{k}\right)^{T}\left(\mathbf{w}-\mathbf{w}^{k}\right)\\ & + \frac{1}{2\alpha^{k}}\left(\mathbf{w}-\mathbf{w}^{k}\right)^{T}\nabla^{2}U\left(\mathbf{w}^{k}\right)\left(\mathbf{w}-\mathbf{w}^{k}\right). \end{aligned}\] Setting the derivative w.r.t. \(\mathbf{w}\) to zero yields: \[\mathbf{w}^{k+1} =\mathbf{w}^{k}-\alpha^{k}\left(\nabla^{2}U\left(\mathbf{w}^{k}\right)\right)^{-1}\nabla U\left(\mathbf{w}^{k}\right).\]

SCA for RPP optimization

  • Recall the objective \[U\left(\mathbf{w}\right)=\sum_{i=1}^Ng_i(\mathbf{w})^{2}+\lambda F\left(\mathbf{w}\right).\]

  • At the \(k\)th iteration \(\mathbf{w}^{k}\), linearize \(g_i(\mathbf{w})\) to construct \[\begin{aligned} \tilde{U}\left(\mathbf{w},\mathbf{w}^{k}\right)=\, \overset{P\left(\mathbf{w};\mathbf{w}^{k}\right)\triangleq}{\overbrace{\sum_{i=1}^{N}\left(g_{i}\left(\mathbf{w}^{k}\right)+\nabla g_{i}\left(\mathbf{w}^{k}\right)^{T}\left(\mathbf{w}-\mathbf{w}^{k}\right)\right)^{2}}}\\ +\frac{\tau}{2}\left\Vert \mathbf{w}-\mathbf{w}^{k}\right\Vert _{2}^{2}+\lambda F\left(\mathbf{w}\right) \end{aligned}\]

  • Idea: lineare nonconvex functions \(g_{i}\left(\mathbf{w}\right)\) inside the square leads to quadratic convex \(P\left(\mathbf{w};\mathbf{w}^{k}\right)\) that approximates \(R(\mathbf{w})=\sum_{i=1}^{N}g_i(\mathbf{w})^2\), with \(\nabla P\left(\mathbf{w},\mathbf{w}^{k}\right)|_{\mathbf{w}=\mathbf{w}^{k}}=\nabla R\left(\mathbf{w}\right)|_{\mathbf{w}=\mathbf{w}^{k}}\).

SCA for RPP optimization

  • \(P\left(\mathbf{w};\mathbf{w}^{k}\right)=\sum_{i=1}^{N}\left(g_{i}\left(\mathbf{w}^{k}\right)+\nabla g_{i}\left(\mathbf{w}^{k}\right)^{T}\left(\mathbf{w}-\mathbf{w}^{k}\right)\right)^{2}\) can be rewritten more compactly as \[P\left(\mathbf{w};\mathbf{w}^{k}\right) = \|\mathbf{A}^k\left(\mathbf{w}-\mathbf{w}^k\right) + \mathbf{g}\left(\mathbf{w}^{k}\right)\|^2\] where \[\begin{aligned} \mathbf{A}^{k} &\triangleq \left[\nabla g_{1}\left(\mathbf{w}^{k}\right),\dots,\nabla g_{N}\left(\mathbf{w}^{k}\right)\right]^{T},\\ \mathbf{g}\left(\mathbf{w}^{k}\right) & \triangleq \left[g_{1}\left(\mathbf{w}^{k}\right),\dots,g_{N}\left(\mathbf{w}^{k}\right)\right]^{T}. \end{aligned}\]

  • We can further expand \(P\left(\mathbf{w};\mathbf{w}^{k}\right)\) as \[\begin{aligned} P\left(\mathbf{w};\mathbf{w}^{k}\right) = \left(\mathbf{w}-\mathbf{w}^k\right)^T\left(\mathbf{A}^k\right)^T\mathbf{A}^k\left(\mathbf{w}-\mathbf{w}^k\right) & + \mathbf{g}\left(\mathbf{w}^{k}\right)^T\mathbf{g}\left(\mathbf{w}^{k}\right)\\ & + 2\mathbf{g}\left(\mathbf{w}^{k}\right)^T\mathbf{A}^k\left(\mathbf{w}-\mathbf{w}^k\right)\\ \end{aligned}\]

SCA for RPP optimization

  • The quadratic program (QP) approximation problem at the \(k\)th iteration is \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \tilde{U}\left(\mathbf{w},\mathbf{w}^{k}\right)=\frac{1}{2}\mathbf{w}^{T}\mathbf{Q}^{k}\mathbf{w}+\mathbf{w}^{T}\mathbf{q}^{k}+\lambda F\left(\mathbf{w}\right)\\ \textsf{subject to} & \mathbf{w}\in\mathcal{W} \end{array} \] where \[\begin{aligned} \mathbf{Q}^{k} &\triangleq 2\left(\mathbf{A}^{k}\right)^{T}\mathbf{A}^{k}+\tau\mathbf{I},\\ \mathbf{q}^{k} &\triangleq 2\left(\mathbf{A}^{k}\right)^{T}\mathbf{g}\left(\mathbf{w}^{k}\right)-\mathbf{Q}^{k}\mathbf{w}^{k}, \end{aligned}\]

  • This problem can be solved direclty with a QP solver or, depending on the constraints in \(\mathcal{W}\), one may derive simpler closed-form solutions.

  • For example, if we only have equality constraints in the form \(\mathbf{C}\mathbf{w}=\mathbf{c}\), then from the KKT optimality conditions the optimal solution is found as \(\hat{\mathbf{w}}^k=-{(\mathbf{Q}^k)}^{-1}(\mathbf{q}^k+\mathbf{C}^T\boldsymbol{\lambda}^k)\) where \(\boldsymbol{\lambda}^k=-\left(\mathbf{C}{(\mathbf{Q}^k)}^{-1}\mathbf{C}^T \right)^{-1}\left(\mathbf{C}{(\mathbf{Q}^k)}^{-1}\mathbf{q}^k+\mathbf{c}\right)\).

RPP: Sequential numerical algorithm

Algorithm 2: Successive Convex optimization for RIsk Parity portfolio (SCRIP)

Set \(k=0\), \(\mathbf{w}^{0}\in\mathcal{W}\), \(\tau>0\), \(\{\gamma^{k}\}\in(0,1]\)
repeat

  • Solve QP problem to get the optimal solution \(\hat{\mathbf{w}}^{k}\) (global minimum)

  • \(\mathbf{w}^{k+1}=\mathbf{w}^{k}+\gamma^{k}\left(\hat{\mathbf{w}}^{k}-\mathbf{w}^{k}\right)\)

  • \(k \gets k+1\)

until convergence
return \(\mathbf{w}^k\)

More advanced algorithms can be found in (Feng and Palomar 2015):

📃 Y. Feng and D. P. Palomar, “SCRIP: Successive convex optimization methods for risk parity portfolios design,” IEEE Trans. Signal Process., vol. 63, no. 19, pp. 5285–5300, 2015.

Convergence analysis



Proposition 2:

Under some technical conditions, suppose \(\tau>0\), \(\gamma^{k}\in\left(0,1\right]\), \(\gamma^{k}\rightarrow0\), \(\sum_{k}\gamma^{k}=+\infty\) and \(\sum_{k}\left(\gamma^{k}\right)^{2}<+\infty\), and let \(\left\{ \mathbf{w}^{k}\right\}\) be the sequence generated by Algorithm 2. Then, either Algorithm 1 converges in a finite number of iterations to a stationary point or every limit of \(\left\{ \mathbf{w}^{k}\right\}\) (at least one such point exists) is a stationary point.

Fast numerical convergence of SCA

R session: Solving the nonconvex case with the SCA method

We will now devise our own algorithm based on the successive convex approximation (SCA) method. Basically, at the \(k\)th iteration, one approximates the problem by a convex one around the current point \(\mathbf{w}^{(k)}\). In our case, the approximated problem is a QP: \[ \begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \tilde{U}\left(\mathbf{w},\mathbf{w}^{k}\right)=\frac{1}{2}\mathbf{w}^{T}\mathbf{Q}^{k}\mathbf{w}+\mathbf{w}^{T}\mathbf{q}^{k}+\lambda F\left(\mathbf{w}\right)\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1,\quad\mathbf{w}\ge\mathbf{0}, \end{array} \] where \[\begin{aligned} \mathbf{Q}^{k} &\triangleq 2\left(\mathbf{A}^{k}\right)^{T}\mathbf{A}^{k}+\tau\mathbf{I},\\ \mathbf{q}^{k} &\triangleq 2\left(\mathbf{A}^{k}\right)^{T}\mathbf{g}\left(\mathbf{w}^{k}\right)-\mathbf{Q}^{k}\mathbf{w}^{k},\\ \mathbf{A}^{k} &\triangleq \left[\nabla g_{1}\left(\mathbf{w}^{k}\right),\dots,\nabla g_{N}\left(\mathbf{w}^{k}\right)\right]^{T},\\ \mathbf{g}^{k} & \triangleq \left[g_{1}\left(\mathbf{w}^{k}\right),\dots,g_{N}\left(\mathbf{w}^{k}\right)\right]^{T}. \end{aligned}\]

This QP can be solved by a QP solver at each iteration (we will use the function solve.QP() from the package quadprog), obtaining \(\hat{\mathbf{w}}^{k}\), and the next iterate is formed by introducing some smoothing: \[\mathbf{w}^{k+1}=\mathbf{w}^{k}+\gamma^{k}\left(\hat{\mathbf{w}}^{k}-\mathbf{w}^{k}\right),\] where \(\gamma^{k}\) can be chosen in practice as \[\gamma^{k} = \gamma^{k-1}(1-\zeta\gamma^{k-1})\] with \(\gamma_0\in(0,1]\) and \(\zeta\in(0,1)\).

We are ready to implement our own algorithm based on the SCA approach. First we define a function that computes the functions \(g_{i,j}\) and their gradients (which are needed for \(\mathbf{g}^{k}\) and \(\mathbf{A}^{k}\)):

compute_gA <- function(w, Sigma) {
  N <- length(w)
  g <- rep(NA, N^2)
  A <- matrix(NA, N^2, N)
  for (i in 1:N) {
    Mi <- matrix(0, N, N)
    Mi[i, ] <- Sigma[i, ]
    for (j in 1:N) {
      Mj <- matrix(0, N, N)
      Mj[j, ] <- Sigma[j, ]
      #g[i + (j-1)*N]   <- t(w) %*% (Mi - Mj) %*% w
      g[i + (j-1)*N]   <- w[i]*(Sigma[i, ] %*% w) - w[j]*(Sigma[j, ] %*% w)  # faster
      A[i + (j-1)*N, ] <- (Mi + t(Mi) - Mj - t(Mj)) %*% w
    }
  }
  # # g can be computed much more efficiently with this code:
  # wSw <- w * (Sigma %*% w)
  # g <- rep(wSw, times = N) - rep(wSw, each = N)
  return(list(g = g, A = A))
}

Now we can implement the main loop of the SCA algorithm:

library(quadprog)  # install.packages("quadprog")

# parameters
max_iter <- 40
tau <- 1e-6
zeta <- 0.1
gamma <- 0.99
# initial point
obj_value <- NULL
w_SCA <- rep(1/N, N)  # initial point
for (k in 1:max_iter) {
  # compute parameters for QP
  gA <- compute_gA(w_SCA, Sigma)
  g <- gA$g
  A <- gA$A
  Q <- 2 * t(A) %*% A + tau*diag(N)  # faster code is: crossprod(A) = t(A) %*% A
  q <- 2 * t(A) %*% g - Q %*% w_SCA
  obj_value <- c(obj_value, sum(g^2))
  
  # # solve the inner QP with CVXR
  # w_ <- Variable(N)
  # prob <- Problem(Minimize(0.5*quad_form(w_, Q) + t(q) %*% w_),
  #                 constraints = list(sum(w_) == 1))
  # result <- solve(prob)
  # w_ <- as.vector(result$getValue(w_))
  
  # solve the problem with solve.QP() which is much faster than CVXR)
  w_ <- solve.QP(Q, -q, matrix(1, N, 1), 1, meq = 1)$solution
  
  # next w
  gamma <- gamma*(1 - zeta*gamma)
  w_SCA_prev <- w_SCA
  w_SCA <- w_SCA + gamma*(w_ - w_SCA)
  
  # stopping criterion
  if (max(abs(w_SCA - w_SCA_prev)) <= 1e-4*max(abs(w_SCA_prev)))
    break
  # if (k>1 && abs(obj_value[k] - obj_value[k-1]) <= 1e-2*obj_value[k-1])
  #   break
}
cat("Number of iterations:", k)
R>> Number of iterations: 25
plot(obj_value, type = "b", col = "blue",
     main = "Convergence of SCA", xlab = "iteration", ylab = "objective value")

We can now plot the achieved dollar allocation:

# plot
w_all <- cbind(w_all, "RPP (SCA)" = w_SCA)
barplot(t(w_all), col = rainbow10equal[1:9],
        main = "Portfolio allocation", xlab = "stocks", ylab = "dollars", beside = TRUE, 
        legend = colnames(w_all))

and the risk contribution:

# compute risk contributions
risk_all <- cbind(risk_all, 
                  "RPP (SCA)" = as.vector(w_SCA * (Sigma %*% w_SCA)))
RRC_all <- sweep(risk_all, MARGIN = 2, STATS = colSums(risk_all), FUN = "/")  # normalize each column

# plot
barplot(t(RRC_all), col = rainbow10equal[1:9],
        main = "Relative risk contribution", xlab = "stocks", ylab = "risk", beside = TRUE, 
        legend = colnames(RRC_all))

We can observe that the achieved risk contributions are perfectly equalized for the solution based on: i) convex formulation, ii) package riskParityPortfolio, and iii) SCA.

R session: Solving the general nonconvex formulation including the expected return

Thus far, the risk parity formulation was dealing only with the risk concentration term while ignoring the expected return. We can now design a risk parity portfolio that also takes into account the expected return \(\mathbf{w}^T\boldsymbol{\mu}\) as \[\begin{array}{ll} \underset{\mathbf{w}}{\textsf{minimize}} & \sum_{i,j=1}^{N}\left(w_{i}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{i}-w_{j}\left(\boldsymbol{\Sigma}\mathbf{w}\right)_{j}\right)^{2} - \lambda \mathbf{w}^T\boldsymbol{\mu}\\ \textsf{subject to} & \mathbf{1}^T\mathbf{w}=1,\quad\mathbf{w}\in\mathcal{W}, \end{array}\]

The R package riskParityPortfolio can deal with such formulation:

library(riskParityPortfolio)
#?riskParityPortfolio  # to get help for the function

# use package
rpp_mu <- riskParityPortfolio(Sigma, mu = mu, lmd_mu = 5e-5, formulation = "rc-double-index")

# plot
w_all <- cbind(w_all, "RPP + mu" = rpp_mu$w)
barplot(t(w_all), col = rainbow6equal,
        main = "Portfolio allocation", xlab = "stocks", ylab = "dollars", beside = TRUE, 
        legend = colnames(w_all))

Let’s compare the performance (in-sample vs out-of-sample):

# compute returns of all portfolios
ret_all <- xts(X_lin %*% w_all, index(X_lin))
ret_all_trn <- ret_all[1:T_trn, ]
ret_all_tst <- ret_all[-c(1:T_trn), ]

# performance in-sample
t(table.AnnualizedReturns(ret_all_trn))
R>>               Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> GMVP                     0.0447             0.1762                    0.2538
R>> Markowitz MVP            0.1200             0.1968                    0.6100
R>> EWP                     -0.1172             0.2574                   -0.4553
R>> RPP (naive)             -0.0180             0.1884                   -0.0955
R>> RPP (convex)            -0.0353             0.1954                   -0.1804
R>> RPP + mu                 0.0256             0.1826                    0.1400
# performance out-of-sample
t(table.AnnualizedReturns(ret_all_tst))
R>>               Annualized Return Annualized Std Dev Annualized Sharpe (Rf=0%)
R>> GMVP                     0.3285             0.1516                    2.1672
R>> Markowitz MVP            0.2709             0.1445                    1.8752
R>> EWP                      0.7153             0.2246                    3.1841
R>> RPP (naive)              0.5699             0.1874                    3.0412
R>> RPP (convex)             0.5980             0.1932                    3.0955
R>> RPP + mu                 0.4938             0.1700                    2.9052

The new RPP that takes into account the expected return has better performance in-sample, but not out-of-sample. This is because during the training set \(\boldsymbol{\mu}\) is too different from the test set. In order to properly exploit the RPP + mu formulation, we should take a rolling window approach (which is out of the scope of this analysis).

Let us plot the cumulative PnL over time (recall that it looks deceptive to the untrained eye):

{ chart.CumReturns(ret_all, main = "Cum PnL", 
                   wealth.index = TRUE, legend.loc = "topleft", colorset = rainbow6equal)
  addEventLines(xts("training", index(X_lin[T_trn])), srt=90, pos=2, lwd = 2, col = "darkblue") }

The drawdown, on the other hand, clearly shows the superior performance of the risk-parity portfolio including the expected return:

{ chart.Drawdown(ret_all, main = "Drawdown",
                 legend.loc = "bottomleft", colorset = rainbow6equal)
  addEventLines(xts("training", index(X_lin[T_trn])), srt=90, pos=2, lwd = 2, col = "darkblue") }

chart.Drawdown(ret_all_tst, main = "Drawdown during test phase", 
               legend.loc = "bottomleft", colorset = rainbow6equal)

Overall, the best in-sample performance goes to GMVP and Markowitz MVP, but they are quite bad out-of-sample. The best out-of-sample in terms of Sharpe ratio are the EWP and all the RPP. Amont those best performers, after a more carefull inspection of the drawdown, it seems that the EWP is the worst. So finally we can conclude that the RPP are the best.

R session: Sensitivity of RPP to parameters

To study the sensitivity with respect to the parameters \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\), we will design multiple portfolios based on different samples from the training set:

# compute 8 different set of portfolios from different input data
w_RPP_mu_acc <- w_RPP_acc <- NULL
for (i in 1:8) {
  # sample means with random samples
  idx <- sample(1:T_trn, T_trn/2)
  mu_ <- colMeans(X_log_trn[idx, ])
  Sigma_ <- cov(X_log_trn[idx, ])
  
  # design risk-parity portfolio
  w_RPP_acc <- cbind(w_RPP_acc, 
                     riskParityPortfolio(Sigma_)$w)
  w_RPP_mu_acc <- cbind(w_RPP_mu_acc, 
                        riskParityPortfolio(Sigma_, mu = mu_, lmd_mu = 5e-5, 
                                            formulation = "rc-double-index")$w)
}
rownames(w_RPP_mu_acc) <- rownames(w_RPP_acc) <- colnames(X_lin)

The RPP is not very sensitive to the parameter \(\boldsymbol{\Sigma}\) since all the realizations have a similar allocation:

barplot(t(w_RPP_acc), col = rainbow8equal,
        main = "Different realizations of RPP", 
        xlab = "stocks", ylab = "dollars", beside = TRUE)

Let’s check out the RPP that takes into account the expecter return:

barplot(t(w_RPP_mu_acc), col = rainbow8equal,
        main = "Different realizations of RPP with expected return", 
        xlab = "stocks", ylab = "dollars", beside = TRUE)

We can observe that this risk-parity portfolio is still not very sensitive to the parameters including \(\boldsymbol{\mu}\). However, if the value of \(\lambda\) is increased to, say, \(10^{-5}\) then one can observe a stronger sensitivity.

Conclusions

Conclusions

  • We have reviewed the Markowitz portfolio formulation and understood that it has many practical flaws that make it impractical. Indeed, it has not been embraced by practitioners.

  • We have learned about the risk parity portfolio formulation.

  • We have explored different numerical methods for the risk parity portfolio:

    • the closed-form solution for the naive diagonal formulation
    • many algorithms for the vanilla convex formulation
    • the successive convex approximation (SCA) method for the general nonconvex formulation.

  • The performance of risk parity portfolio vs. Markowitz portfolio is much improved.

  • Side result: we have learned how to develop efficient numerical algorithms for nonconvex problems based on SCA.

Thanks

References

Asness, C. S., Frazzini, A., & Pedersen, L. H. (2012). Leverage aversion and risk parity. Financial Analysts Journal, 68(1), 47–59.

Bertsekas, D. P. (1999). Nonlinear programming. Athena Scientific.

Boyd, S. P., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.

Bruder, B., & Roncalli, T. (2012). Managing risk exposures using the risk budgeting approach. University Library of Munich, Germany.

Chopra, V., & Ziemba, W. (1993). The effect of errors in means, variances and covariances on optimal portfolio choice. Journal of Portfolio Management.

Feng, Y., & Palomar, D. P. (2015). SCRIP: Successive convex optimization methods for risk parity portfolios design. IEEE Trans. Signal Processing, 63(19), 5285–5300.

Feng, Y., & Palomar, D. P. (2016). A Signal Processing Perspective on Financial Engineering. Foundations; Trends in Signal Processing, Now Publishers.

Griveau-Billion, T., Richard, J.-C., & Roncalli, T. (2013). A fast algorithm for computing high-dimensional risk parity portfolios. SSRN. https://ssrn.com/abstract=2325255

Kaya, H., & Lee, W. (2012). Demystifying risk parity. Neuberger Berman.

Maillard, S., Roncalli, T., & Teiletche, J. (2010). The properties of equally weighted risk contribution portfolios. Journal of Portfolio Management, 36(4), 60–70.

Markowitz, H. (1952). Portfolio selection. J. Financ., 7(1), 77–91.

Meucci, A. (2005). Risk and asset allocation. Springer.

Qian, E. (2005). Risk parity portfolios: Efficient portfolios through true diversification. PanAgora Asset Management.

Qian, E. E. (2016). Risk parity fundamentals. CRC Press.

Roncalli, T. (2013). Introduction to risk parity and budgeting. CRC Press.

Scutari, G., Facchinei, F., Song, P., Palomar, D. P., & Pang, J.-S. (2014). Decomposition by partial linearization: Parallel optimization of multi-agent systems. IEEE Trans. Signal Processing, 62(3), 641–656.

Spinu, F. (2013). An algorithm for computing risk parity weights. SSRN. https://ssrn.com/abstract=2297383

Zhao, Z., Zhou, R., Palomar, D. P., & Feng, Y. (2019). Portfolio optimization. submitted.