MAFS5310 - Portfolio Optimization with R
MSc in Financial Mathematics
The Hong Kong University of Science and Technology (HKUST)
Fall 2020-21

Outline

  • Primer on Financial Data

  • Modeling the Returns

  • Portfolio Basics

  • Heuristic Portfolios

  • Markowitz’s Modern Portfolio Theory (MPT)

    • Mean-variance portfolio (MVP)
    • Global minimum variance portfolio (GMVP)
    • Maximum Sharpe ratio portfolio (MSRP)
  • Risk-Based Portfolios (GMVP, IVP, RPP, MDP, MDCP)

  • Comparison of Portfolios

  • Conclusions

Primer on Financial Data

Asset log-prices

  • Let \(p_{t}\) be the price of an asset at (discrete) time index \(t\).
  • The fundamental model is based on modeling the log-prices \(y_{t}\triangleq\log p_{t}\) as a random walk: \[y_{t}=\mu+y_{t-1}+\epsilon_{t}\]

Asset returns

  • For stocks, returns are used for the modeling since they are “stationary” (as opposed to the previous random walk).

  • Simple return (a.k.a. linear or net return) is \[R_{t} \triangleq\frac{p_{t}-p_{t-1}}{p_{t-1}}=\frac{p_{t}}{p_{t-1}}-1.\]

  • Log-return (a.k.a. continuously compounded return) is \[r_{t} \triangleq y_{t}-y_{t-1}=\log\frac{p_{t}}{p_{t-1}}=\log\left(1+R_{t}\right).\]

  • Observe that the log-return is “stationary”: \[r_{t}=y_{t}-y_{t-1}=\mu+\epsilon_{t}\]

  • Note that \(r_{t}\approx R_{t}\) when \(R_{t}\) is small (i.e., the changes in \(p_t\) are small) (Ruppert 2010).

S&P 500 index - Log-returns

Autocorrelation

ACF of S&P 500 log-returns:

Autocorrelation

ACF of S&P 500 absolute value of log-returns:

Non-Gaussianity and asymmetry

Histograms of S&P 500 log-returns:

Heavy-tailness

QQ plots of S&P 500 log-returns:

Volatility clustering

S&P 500 log-returns:

Volatility clustering removed

Standardized S&P 500 log-returns:

Conditional heavy-tailness

QQ plots of standardized S&P 500 log-returns (conditional heavy-tailness aka aggregational Gaussianity):

Frequency of data


  • Low frequency (weekly, monthly): Gaussian distributions seems to fit reality after correcting for volatility clustering (except for the asymmetry), but the nonstationarity is a big issue



  • Medium frequency (daily): definitely heavy tails even after correcting for volatility clustering, as well as asymmetry



  • High frequency (intraday, 30min, 5min, tick-data): below 5min the noise microstructure starts to reveal itself

Modeling the Returns

Returns of the universe

  • In practice, we don’t just deal with one asset but with a whole universe of \(N\) assets.

  • We denote the log-returns of the \(N\) assets at time t with the vector \(\mathbf{r}_{t}\in\mathbb{R}^{N}\).

  • The time index \(t\) can denote any arbitrary period such as days, weeks, months, 5-min intervals, etc.

  • \(\mathcal{F}_{t-1}\) denotes the previous historical data.

  • Econometrics aims at modeling \(\mathbf{r}_{t}\) conditional on \(\mathcal{F}_{t-1}\).

  • \(\mathbf{r}_{t}\) is a multivariate stochastic process with conditional mean and covariance matrix denoted as (Feng and Palomar 2016) \[\begin{aligned} \boldsymbol{\mu}_{t} &\triangleq\textsf{E}\left[\mathbf{r}_{t}\mid\mathcal{F}_{t-1}\right]\\ \boldsymbol{\Sigma}_{t} &\triangleq\textsf{Cov}\left[\mathbf{r}_{t}\mid\mathcal{F}_{t-1}\right]=\textsf{E}\left[(\mathbf{r}_{t}-\boldsymbol{\mu}_{t})(\mathbf{r}_{t}-\boldsymbol{\mu}_{t})^{T}\mid\mathcal{F}_{t-1}\right]. \end{aligned}\]

i.i.d. model

  • For simplicity we will assume that \(\mathbf{r}_{t}\) follows an i.i.d. distribution (which is not very innacurate in general).


  • That is, both the conditional mean and conditional covariance are constant: \[\begin{aligned} \boldsymbol{\mu}_{t} &= \boldsymbol{\mu},\\ \boldsymbol{\Sigma}_{t} &= \boldsymbol{\Sigma}. \end{aligned}\]


  • Very simple model, however, it is one of the most fundamental assumptions for many important works, e.g., the Nobel prize-winning Markowitz’s portfolio theory (Markowitz 1952).

Factor models

  • Factor models are special cases of the i.i.d. model with the covariance matrix being decomposed into two parts: low dimensional factors and marginal noise.

  • The factor model is \[\mathbf{r}_{t}=\boldsymbol{\alpha}+\mathbf{B}\mathbf{f}_{t}+\mathbf{w}_{t},\] where

    • \(\boldsymbol{\alpha}\) denotes a constant vector
    • \(\mathbf{f}_{t}\in\mathbb{R}^{K}\) with \(K\ll N\) is a vector of a few factors that are responsible for most of the randomness in the market
    • \(\mathbf{B}\in\mathbb{R}^{N\times K}\) denotes how the low dimensional factors affect the higher dimensional market assets
    • \(\mathbf{w}_{t}\) is a white noise residual vector that has only a marginal effect.
  • The factors can be explicit or implicit.

  • Widely used by practitioners (they buy factors at a high premium).

  • Connections with Principal Component Analysis (PCA) (Jolliffe 2002).

Time-series models

  • The previous models are i.i.d., but there are hundreds of other models attempting to capture the time correlation or time structure of the returns, as well as the volatility clustering or heteroskedasticity.

  • To capture the time correlation we have mean models: VAR, VMA, VARMA, VARIMA, VECM, etc.

  • To capture the volatility clustering we have covariance models: ARCH, GARCH, VEC, DVEC, BEKK, CCC, DCC, etc.

  • Standard textbook references (Lütkepohl 2007; Tsay 2010, 2013):

    📘 H. Lutkepohl. New Introduction to Multiple Time Series Analysis. Springer, 2007.

    📘 R. S. Tsay. Multivariate Time Series Analysis: With R and Financial Applications. John Wiley & Sons, 2013.


  • Simple introductory reference (Feng and Palomar 2016):

    📘 Y. Feng and D. P. Palomar. A Signal Processing Perspective on Financial Engineering. Foundations and Trends in Signal Processing, Now Publishers, 2016.

Fitting process

  • Before we can use a model, we need to estimate the model parameters (for example, in the i.i.d. model: \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\)) using a training set.
  • Then use cross-validation to select the best fit (assuming we have different possible models each with a different fit).
  • Finally, we can use the best fitted model in the test data (aka out-of-sample data) for performance evaluation.

  • Be careful: if you use the test data many times, basically it is not out-of-sample anymore but cross-validation!

Sample estimates

  • Consider the i.i.d. model: \[\mathbf{r}_{t}=\boldsymbol{\mu}+\mathbf{w}_{t},\] where \(\boldsymbol{\mu}\in\mathbb{R}^{N}\) is the mean and \(\mathbf{w}_{t}\in\mathbb{R}^{N}\) is an i.i.d. process with zero mean and constant covariance matrix \(\boldsymbol{\Sigma}\).

  • The mean vector \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\) have to be estimated in practice based on \(T\) observations.

  • The simplest estimators are the sample estimators:

    • sample mean: \(\quad\hat{\boldsymbol{\mu}} =\frac{1}{T}\sum_{t=1}^{T}\mathbf{r}_{t}\)
    • sample covariance matrix: \(\quad\hat{\boldsymbol{\Sigma}} =\frac{1}{T-1}\sum_{t=1}^{T}(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}})(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}})^{T}.\)
      Note that the factor \(1/\left(T-1\right)\) is used instead of \(1/T\) to get an unbiased estimator (asymptotically for \(T\rightarrow\infty\) they coincide).
  • Many more sophisticated estimators exist, namely: shrinkage estimators, Black-Litterman estimators, etc.

Least-Square (LS) estimator

  • Minimize the least-square error in the \(T\) observed i.i.d. samples: \[\underset{\boldsymbol{\mu}}{\textsf{minimize}} \quad\frac{1}{T}\sum_{t=1}^{T}\left\Vert \mathbf{r}_{t}-\boldsymbol{\mu}\right\Vert _{2}^{2}.\]

  • The optimal solution is the sample mean: \[\hat{\boldsymbol{\mu}}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{r}_{t}.\]

  • The sample covariance of the residuals \(\hat{\mathbf{w}}_{t}=\mathbf{r}_{t}-\hat{\boldsymbol{\mu}}\) is the sample covariance matrix: \[\hat{\boldsymbol{\Sigma}}=\frac{1}{T-1}\sum_{t=1}^{T}\left(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}}\right)\left(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}}\right)^{T}.\]

Maximum Likelihood Estimator (MLE)

  • Assume \(\mathbf{r}_{t}\) are i.i.d. and follow a Gaussian distribution: \[f(\mathbf{r}) =\frac{1}{\sqrt{\left(2\pi\right)^{N}\left|\boldsymbol{\Sigma}\right|}}e^{-\frac{1}{2}(\mathbf{r}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{r}-\boldsymbol{\mu})}.\] where
    • \(\boldsymbol{\mu}\in\mathbb{R}^{N}\) is a mean vector that gives the location
    • \(\boldsymbol{\Sigma}\in\mathbb{R}^{N\times N}\) is a positive definite covariance matrix that defines the shape.

MLE

  • Given the \(T\) i.i.d. samples \(\mathbf{r}_{t}, \;t=1,\ldots,T,\) the negative log-likelihood function is \[\begin{aligned} \ell(\boldsymbol{\mu},\boldsymbol{\Sigma}) &= -\log\prod_{t=1}^{T}f(\mathbf{r}_{t})\\ &=\frac{T}{2}\log\left|\boldsymbol{\Sigma}\right| + \sum_{t=1}^{T}\frac{1}{2}(\mathbf{r}_{t}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{r}_{t}-\boldsymbol{\mu})+\text{const}. \end{aligned}\]

  • Setting the derivative of \(\ell(\boldsymbol{\mu},\boldsymbol{\Sigma})\) w.r.t. \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}^{-1}\) to zeros and solving the equations yield: \[\begin{aligned} \hat{\boldsymbol{\mu}} &= \frac{1}{T}\sum_{t=1}^{T}\mathbf{r}_{t}\\ \hat{\boldsymbol{\Sigma}} &= \frac{1}{T}\sum_{t=1}^{T}\left(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}}\right)\left(\mathbf{r}_{t}-\hat{\boldsymbol{\mu}}\right)^{T}. \end{aligned}\]

Parameter estimation

  • The parameter estimates \(\hat{\boldsymbol{\mu}}\) and \(\hat{\boldsymbol{\Sigma}}\) are only good for large \(T\), otherwise the estimation error is unacceptable.

  • For instance, the sample mean is particularly a very inefficient estimator, with very noisy estimates (Meucci 2005).

  • In practice, \(T\) cannot be large enough due to either:

    • unavailability of data or
    • lack of stationarity of data.
  • As a consequence, the estimates contain too much estimation error and a portfolio design (e.g., Markowitz mean-variance) based on those estimates can be severely affected (Chopra and Ziemba 1993).

  • Indeed, this is why Markowitz’s portfolio and other extensions are rarely used by practitioners.

Portfolio Basics

Portfolio return

  • Suppose the capital budget is \(B\) dollars.

  • The portfolio \(\mathbf{w}\in\mathbb{R}^{N}\) denotes the normalized dollar weights of the \(N\) assets such that \(\mathbf{1}^{T}\mathbf{w}=1\) (so \(B\mathbf{w}\) denotes dollars invested in the assets).

  • For each asset \(i\), the initial wealth is \(Bw_{i}\) and the end wealth is \[Bw_{i}\left(p_{i,t}/p_{i,t-1}\right)=Bw_{i}\left(R_{it}+1\right).\]

  • Then the portfolio return is \[R_{t}^{p}= \frac{\sum_{i=1}^{N}Bw_{i}\left(R_{it}+1\right)-B}{B}=\sum_{i=1}^{N}w_{i}R_{it}\approx\sum_{i=1}^{N}w_{i}r_{it}=\mathbf{w}^{T}\mathbf{r}_{t}\]

  • The portfolio expected return and variance are \(\mathbf{w}^{T}\boldsymbol{\mu}\) and \(\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}\), respectively.

Performance measures

  • Expected return: \(\mathbf{w}^{T}\boldsymbol{\mu}\)

  • Volatility: \(\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}\)

  • Sharpe Ratio (SR): expected excess return per unit of risk \[\mathsf{SR} =\frac{\mathbf{w}^{T}\boldsymbol{\mu}-r_{f}}{\sqrt{\mathbf{w}^{T}\boldsymbol{\Sigma}\mathbf{w}}}\] where \(r_{f}\) is the risk-free rate (e.g., interest rate on a three-month U.S. Treasury bill).

  • Information Ratio (IR): SR with respect to a benchmark (e.g., the market index): \(\mathsf{IR} =\frac{\textsf{E}\left[\mathbf{w}^T\mathbf{r}_t - r_{b,t}\right]}{\sqrt{\textsf{Var}\left[\mathbf{w}^T\mathbf{r}_t - r_{b,t}\right]}}\).

  • Drawdown: decline from a historical peak of the cumulative profit \(X(t)\): \[D(T)=\max_{t\in[0,T]}X(t)-X(T)\]

  • VaR (Value at Risk): quantile of the loss.

  • ES (Expected Shortfall) or CVaR (Conditional Value at Risk): expected value of the loss above some quantile.

Practical constraints

  • Capital budget constraint: \[\mathbf{1}^T\mathbf{w} = 1.\]

  • Long-only constraint: \[\mathbf{w} \geq 0.\]

  • Dollar-neutral or self-financing constraint: \[\mathbf{1}^T\mathbf{w} = 0.\]

  • Holding constraint: \[\mathbf{l}\leq\mathbf{w}\leq \mathbf{u}\] where \(\mathbf{l}\in\mathbb{R}^{N}\) and \(\mathbf{u}\in\mathbb{R}^{N}\) are lower and upper bounds of the asset positions, respectively.

Practical constraints

  • Leverage constraint: \[\left\Vert \mathbf{w}\right\Vert _{1}\leq L.\]

  • Cardinality constraint: \[\left\Vert \mathbf{w}\right\Vert _{0} \leq K.\]

  • Turnover constraint: \[\left\Vert \mathbf{w}-\mathbf{w}_{0}\right\Vert _{1} \leq u\] where \(\mathbf{w}_{0}\) is the currently held portfolio.


  • Market-neutral constraint: \[\boldsymbol{\beta}^T\mathbf{w} = 0.\]

Wealth, NAV, or cumulative return

  • The Net Asset Value (NAV) is the value of the portfolio (aka wealth, cumulative return, or cumulative PnL).

  • The portfolio return in terms of NAV is \(R_{t}^{p} = \frac{\textsf{NAV}_t - \textsf{NAV}_{t-1}}{\textsf{NAV}_{t-1}}\) (note that \(1 + R_{t}^{p} = \frac{\textsf{NAV}_t}{\textsf{NAV}_{t-1}}\)).

  • Also recall that the portfolio return is calculated from the assets’ returns as \(R_{t}^{p} = \mathbf{w}^{T}\mathbf{r}_{t}\).

  • Thus, if we fully reinvest, it follows that we can compute the wealth of a portfolio as \[ \textsf{NAV}_t = \textsf{NAV}_0 \times (1+R_{1}^{p}) \times (1+R_{2}^{p}) \times \dots \times (1+R_{t}^{p}). \] where \(\textsf{NAV}_0=B_0\) is the initial budget.

  • If, instead, we keep investing the same initial budget \(B_0\) at each period, then \[\begin{aligned} \textsf{NAV}_t &= \textsf{NAV}_0 + \textsf{NAV}_0 \times R_{1}^{p} + \textsf{NAV}_0 \times R_{2}^{p} + \dots + \textsf{NAV}_0 \times R_{t}^{p}\\ &= \textsf{NAV}_0 \times (1 + R_{1}^{p} + R_{2}^{p} + \dots + R_{t}^{p}) \end{aligned}\]

R session: Loading market data

We will load some stock market data and divide it into a training part (for the estimation of the expected return \(\boldsymbol{\mu}\) and covariance matrix \(\boldsymbol{\Sigma}\), and subsequent portfolio design) and a test part (for the out-of-sample performance evaluation).

In particular, we will start by loading some stock data from three different sectors:

library(xts)  # to manipulate time series of stock data
library(quantmod)  # to download stock data
library(PerformanceAnalytics)  # to compute performance measures

# download data from YahooFinance
stock_namelist <- c("AAPL", "AMD", "ADI",  "ABBV", "AEZS", "A",  "APD", "AA","CF")
prices <- xts()
for (i in 1:length(stock_namelist)) {
  tmp <- Ad(getSymbols(stock_namelist[i], from = "2013-01-01", to = "2016-12-31", auto.assign = FALSE))
  tmp <- na.approx(tmp, na.rm = FALSE)  # interpolate NAs
  prices <- cbind(prices, tmp)
}
colnames(prices) <- stock_namelist
tclass(prices) <- "Date"
str(prices)
R>> An 'xts' object on 2013-01-02/2016-12-30 containing:
R>>   Data: num [1:1008, 1:9] 17.1 16.9 16.4 16.3 16.4 ...
R>>  - attr(*, "dimnames")=List of 2
R>>   ..$ : NULL
R>>   ..$ : chr [1:9] "AAPL" "AMD" "ADI" "ABBV" ...
R>>   Indexed by objects of class: [Date] TZ: UTC
R>>   xts Attributes:  
R>>  NULL
head(prices)
R>>                AAPL  AMD      ADI     ABBV AEZS        A      APD       AA       CF
R>> 2013-01-02 17.09469 2.53 36.35426 25.36849  253 27.86081 64.63237 20.62187 28.26927
R>> 2013-01-03 16.87892 2.49 35.76762 25.15901  254 27.96059 64.40654 20.80537 28.13558
R>> 2013-01-04 16.40876 2.59 35.13143 24.84119  257 28.51276 65.27223 21.24121 28.76579
R>> 2013-01-07 16.31224 2.67 35.23884 24.89175  259 28.30653 65.21201 20.87419 28.65803
R>> 2013-01-08 16.35615 2.67 34.87529 24.35000  255 28.08034 65.33247 20.87419 28.23788
R>> 2013-01-09 16.10052 2.63 34.78441 24.48724  258 28.83873 66.21318 20.82831 29.22548
tail(prices)
R>>                AAPL   AMD      ADI     ABBV AEZS        A      APD    AA       CF
R>> 2016-12-22 27.58162 11.60 67.92088 51.29409 4.10 44.42307 132.3398 29.75 26.71214
R>> 2016-12-23 27.63617 11.58 68.28153 51.85978 4.10 44.64533 132.6931 29.71 27.24443
R>> 2016-12-27 27.81169 12.07 68.71616 51.99287 4.05 44.94490 133.5083 29.65 28.34450
R>> 2016-12-28 27.69310 11.55 68.02262 51.80154 3.55 44.18148 131.4610 29.43 28.06061
R>> 2016-12-29 27.68598 11.59 68.04111 52.18421 3.60 44.23188 131.5613 28.89 28.30014
R>> 2016-12-30 27.47014 11.34 67.15337 52.09269 3.60 44.15434 131.0601 28.08 27.92754
# compute log-returns and linear returns
X_log <- diff(log(prices))[-1]
X_lin <- (prices/lag(prices) - 1)[-1]

# or alternatively...
X_log <- CalculateReturns(prices, "log")[-1]
X_lin <- CalculateReturns(prices)[-1]

N <- ncol(X_log)  # number of stocks
T <- nrow(X_log)  # number of days

We can take a look at the prices of the stocks:

plot(prices/rep(prices[1, ], each = nrow(prices)), col = rainbow10equal, legend.loc = "topleft",
     main = "Normalized prices")

We now divide the data into a training set and test set:

T_trn <- round(0.7*T)  # 70% of data
X_log_trn <- X_log[1:T_trn, ]
X_log_tst <- X_log[(T_trn+1):T, ]
X_lin_trn <- X_lin[1:T_trn, ]
X_lin_tst <- X_lin[(T_trn+1):T, ]

Heuristic Portfolios

Heuristic portfolios

  • Heuristic portfolios are not formally derived from a sound mathematical foundation. Instead, they are intuitive and based on common sense.


  • We will explore the following simple and heuristic portfolios:
    • Buy & Hold (B&H)
    • Buy & Rebalance
    • equally weighted portfolio (EWP) or \(1/N\) portfolio
    • quintile portfolio
    • global maximum return portfolio (GMRP).

Buy & Hold (B&H)

  • The simplest investment strategy consists of selecting just one asset, allocating the whole budget \(B\) to it:

    • Buy & Hold (B&H): chooses one asset and sticks to it forever.
    • Buy & Rebalance: chooses one asset but it reevaluates that choice regularly.
  • The belief behind such investment is that the asset will increase gradually in value over the investment period.

  • There is no diversification in this strategy.

  • One can use different methods (like fundamental analysis or technical analysis) to make the choice.

  • Mathematically, it can be expressed as \[\mathbf{w} = \mathbf{e}_i\] where \(\mathbf{e}_i\) denotes the canonical vector with a 1 on the \(i\)th position and 0 elsewhere.

Buy & Hold (B&H)

Cumulative PnL of 9 possible B&H for 9 assets: