The Elusive Pursuit: Why Overfitting Is a Phantom Menace in Portfolio Management
Overfitting, the bane of many an investor's existence, lurks like a phantom in the world of portfolio management. It's a term as familiar to seasoned investors as it is misunderstood by newcomers. But why does overfitting matter now more than ever? As investment technology advances at breakneck speed, our models grow more complex, and with them, the risk of overfitting. Today, we're going to demystify this spectral foe and arm ourselves with knowledge to combat it.
The Overfitting Conundrum: A Tale of Two Polynomials
Picture this: you're trying to predict stock prices using a polynomial model. You've got two polynomials, one quadratic (`p2`) and one fourth-degree (`p4`). You create these in R like so:
```r
p2 <- poly(1:100, degree=2)
p4 <- poly(1:100, degree=4)
```
Now, you're trying to predict future returns using past data. But here's where things get tricky. The more parameters your model has, the more it can 'fit' the noise in your historical data. Let's see how this plays out:
```r
days <- 1:70
# 'true4' is the underlying fourth-degree signal; its exact construction
# wasn't shown above, so here is one illustrative choice built from p4:
true4 <- drop(p4 %*% c(1, -1, 1, -1))
reg.pt4.1 <- lm(true4[1:70] ~ days)
reg.pt4.2 <- lm(true4[1:70] ~ poly(days, 2))
reg.pt4.4 <- lm(true4[1:70] ~ poly(days, 4))
```
Surprised? The linear model (`reg.pt4.1`) often outperforms the quadratic (`reg.pt4.2`) and even the 'right' fourth-degree model (`reg.pt4.4`). Why? Because the more complex models can overfit the noise in your data, leading to poor predictions.
The Devil in the Details: Noise and Overfitting
Real-world returns come with noise. Let's add some noise to our polynomial:
```r
noise4 <- true4 + rnorm(100, sd=.5)
```
Now, check out how our predictions fare:

The linear prediction beats even the 'right' model. The fourth-degree prediction starts strong but ends poorly. This is overfitting in action.
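The experiment above can be rerun as a rough Python sketch (`numpy.polyfit` stands in for R's `lm` with `poly`; the quartic coefficients below are invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Days 1..100 with an illustrative quartic 'true' signal
# (these roots and the 1e-7 scale are made up for demonstration).
x = np.arange(1, 101, dtype=float)
true4 = 1e-7 * (x - 20) * (x - 45) * (x - 70) * (x - 95)
noisy = true4 + rng.normal(scale=0.5, size=100)

train, future = slice(0, 70), slice(70, 100)
mse = {}
for degree in (1, 2, 4):
    # Fit only on the first 70 days...
    coefs = np.polyfit(x[train], noisy[train], degree)
    # ...then judge the fit on the 30 days the model never saw.
    pred = np.polyval(coefs, x[future])
    mse[degree] = float(np.mean((pred - true4[future]) ** 2))

print(mse)  # out-of-sample error for the linear, quadratic, and quartic fits
```

The higher-degree fits have more freedom to chase the noise in the training window, and that freedom is exactly what gets amplified when they extrapolate.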
Overfitting in All Its Glory: Pure Noise
Let's try our exercise on pure noise:
```r
noise0 <- rnorm(100)
```
The results? Downright bizarre predictions from the polynomial fits:

The Moral of the Story: Less is Often More
The more parameters in your model, the greater the risk of overfitting. Even with perfect knowledge of the generating mechanism, simpler models can outperform complex ones. For practical smoothing work, splines are a better choice than global polynomials: their flexibility is local, so a wild stretch of data doesn't warp the whole fit.
Understanding Overfitting: A Mathematical Deep Dive
Overfitting occurs when our model is too complex relative to the amount of data we have and the noise it contains. It's like forcing a high-degree curve exactly through a handful of points: you get a wild, wiggly line that doesn't generalize.
Mathematically, overfitting happens when we keep driving down the error on our training data, shaving off bias at the cost of ballooning variance. This is the bias-variance tradeoff.
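For squared-error loss, the tradeoff can be written down explicitly: the expected prediction error at a point $x$ splits into three pieces,

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\bigl[\hat{f}(x)\bigr]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Adding parameters shrinks the bias term while inflating the variance term, and nothing you do touches the noise term. The quartic fits above lose on exactly that trade.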

In high dimensions or with small datasets, overfitting becomes more likely. This is why regularization techniques like LASSO and Ridge regression are popular—they introduce a penalty term to discourage overfitting.
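To see how the penalty works, here is a minimal sketch in plain numpy using the closed-form ridge solution (the data and the penalty weight `lam=10.0` are arbitrary choices for demonstration, not a real portfolio model):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression: minimise ||y - Xw||^2 + lam * ||w||^2.
    Closed form: w = (X'X + lam * I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
w_true = np.array([1.5, -2.0] + [0.0] * 8)  # only 2 of 10 features matter
y = X @ w_true + rng.normal(scale=0.5, size=50)

w_ols = ridge_fit(X, y, lam=0.0)     # ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)  # penalised fit

# The penalty shrinks the coefficient vector toward zero, accepting a
# little bias in exchange for lower variance.
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

LASSO replaces the squared penalty with an absolute-value one, which has no closed form but can drive irrelevant coefficients exactly to zero.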
Portfolio Implications: From C to AGG
Overfitting can wreak havoc on a portfolio: poor return forecasts translate directly into suboptimal asset allocation. Let's consider how it might affect some familiar assets:
- C (Citigroup): If your model overfits to recent performance, you might underestimate long-term stability and miss out on steady growth.
- BAC (Bank of America): Overfitting to short-term noise could lead you to misjudge the bank's recovery potential after a scandal or downturn.
- MS (Morgan Stanley): A complex model might overfit to recent sector trends, causing you to overweight MS relative to other financials with more stable fundamentals.
But it's not all doom and gloom: spotting overfitting in a widely followed model can point you toward assets the crowd is mispricing.
Combating Overfitting: Practical Strategies
To combat overfitting, consider these strategies:
1. Cross-validation: Divide your data into training and validation sets. Train your model on one set and validate it on the other. Repeat this process with different subsets to get a better sense of how well your model generalizes.
2. Regularization: As mentioned earlier, techniques like LASSO or Ridge regression can help prevent overfitting by adding a penalty term to discourage complexity.
3. Early stopping: If you're training a neural network or other iterative model, stop training once performance on a validation set starts to degrade. This can prevent your model from memorizing the noise in your data.
4. Simpler models: Don't reach for complex models when simpler ones will do. Start with linear regression and only add complexity if it improves performance.
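Strategy 1 can be sketched in a few lines of plain numpy, with polynomial degree standing in for model complexity (the data here is invented noisy linear data, not market returns):

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=42):
    """Average held-out MSE of a degree-`degree` polynomial over k folds."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        held_out = folds[i]
        train = np.concatenate(folds[:i] + folds[i + 1:])
        # Fit on k-1 folds, score on the fold that was held out.
        coefs = np.polyfit(x[train], y[train], degree)
        errs.append(np.mean((np.polyval(coefs, x[held_out]) - y[held_out]) ** 2))
    return float(np.mean(errs))

# Noisy linear data: extra polynomial wiggle should not pay off out of sample.
rng = np.random.default_rng(7)
x = np.linspace(0.0, 1.0, 60)
y = 2.0 * x + rng.normal(scale=0.3, size=60)

for degree in (1, 2, 4):
    print(degree, kfold_mse(x, y, degree))
```

Picking the degree (or any hyperparameter) by the lowest cross-validated error, rather than the lowest training error, is the basic defense against rewarding noise-chasing.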
The Final Showdown: Your Action Plan
Overfitting is a constant challenge, but here's how you can fight back:
1. Inspect your data: Always check your data for noise and outliers that could lead to overfitting.
2. Keep models simple: Start with simpler models and only add complexity if it improves performance.
3. Use cross-validation and regularization: Cross-validation assesses how well your model generalizes, while regularization penalizes needless complexity.
4. Stay vigilant: Regularly review and update your models to ensure they're not overfitting to historical data.