Catching Overfitting is Easier Than You Think
- Julean Albidone
- Sep 14, 2023
- 4 min read
Let's say you are a newly hired quantitative researcher. You did great in your interviews, but now your chief investment officer wants to really put you to the test.
For your first test, your CIO gives you a dataset and asks you to find an alpha signal in it. You get to work right away, eager to prove yourself.
Knowing how scarce alpha is in the market, you work very diligently, going over the data dozens of times. You create various low-pass filters. You develop numerous factors and metrics. After working tirelessly, you eventually manage to build an alpha signal from the data and proudly hand it to your CIO.
Now here's the thing: your CIO was very clever. She didn't give you just any dataset; she gave you a completely randomly generated one, a purely synthetic dataset with no discernible pattern whatsoever.
You've essentially fit your algorithm to pure noise. You have been completely fooled, and your CIO is not too pleased that her new analyst was taken in so easily.
Let me give you a hypothetical to better illustrate this example. If you had tracked each of your dozen or so backtests, what you would have found is a roughly normal distribution of results. Some backtests perform well, some perform poorly, but on average they are centered about zero; that is, they have zero alpha. If you are not tracking these, you remain blissfully unaware of this distribution.
Every researcher is incentivized to maximize their return, so the ignorant quant will simply keep trying different backtests until they find really good performance. Unbeknownst to them, they have simply picked a backtest in the right-most tail of a random distribution centered about zero. They can expect zero return going forward.
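To make that hypothetical concrete, here is a minimal sketch, purely illustrative and with made-up parameters (not the CIO's dataset), that simulates a large number of backtests on pure noise and summarizes the resulting distribution of Sharpe ratios:

```python
# Purely illustrative simulation: many "strategies" backtested on pure noise.
import numpy as np

rng = np.random.default_rng(42)
n_strategies = 1_000          # hypothetical number of backtests a researcher might run
n_days = 252 * 5              # five years of daily "returns"

# Each strategy's daily P&L is Gaussian noise, so there is no alpha by construction.
pnl = rng.normal(loc=0.0, scale=0.01, size=(n_strategies, n_days))

# Annualized Sharpe ratio of each backtest.
sharpe = pnl.mean(axis=1) / pnl.std(axis=1) * np.sqrt(252)

print(f"mean Sharpe: {sharpe.mean():+.2f}")   # close to zero: the distribution is centered on zero
print(f"best Sharpe: {sharpe.max():+.2f}")    # the lucky right-tail pick looks attractive by chance alone
```

Picking the best of those thousand backtests and reporting only that one is exactly the trap described above.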
This pitfall is more common than you think. It is essentially the central problem to overcome in finance: separating signal from noise.
So how can a researcher defend against this? There are a few simple heuristics that can guide you. I'll also mention a more complex technique that builds on the same idea.
The first robustness technique, often called the ranked long/short technique, is to take one of your factors, bucket it by signal strength, and see how return performance varies across the buckets. You can create 5 buckets (quintiles), 10 buckets (deciles), and so on, to build a histogram. Typically, if the signal is not random noise, you should see a strongly negative return where the factor is weakest, gradually increasing to a strongly positive return where the factor is strongest (or vice versa if the factor has a negative relationship with returns). In general, you want to see a clear slope from one end of the histogram to the other. If the factor is random, you will see no discernible trend across the buckets.
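As an illustration, here is a minimal sketch of the ranked long/short check. The DataFrame layout, the column names factor and fwd_return, and the toy data are assumptions for the example, not a prescribed format:

```python
# A minimal sketch of the ranked long/short check.
import numpy as np
import pandas as pd

def bucket_returns(df: pd.DataFrame, n_buckets: int = 10) -> pd.Series:
    """Average forward return per factor bucket (1 = weakest signal, n_buckets = strongest)."""
    buckets = pd.qcut(df["factor"], q=n_buckets, labels=False, duplicates="drop") + 1
    return df.groupby(buckets)["fwd_return"].mean()

# Toy data with a genuine (but noisy) positive factor-return relationship.
rng = np.random.default_rng(0)
n = 5_000
factor = rng.normal(size=n)
fwd_return = 0.002 * factor + rng.normal(scale=0.02, size=n)
df = pd.DataFrame({"factor": factor, "fwd_return": fwd_return})

# For a real signal, the output should slope from negative (bucket 1) to positive (bucket 10);
# for a random factor, the buckets show no discernible trend.
print(bucket_returns(df))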
A second robustness technique you can easily deploy is to generate minor variations of the factor in question. For example, a common momentum lookback for long-term investing is a stock's return over the past 12 months: simply go long the stocks with the highest 12-month return and short the stocks with the lowest. What you will find is that performance is very similar with an 11-month or 13-month lookback. That is a good sign and means your signal is robust. I would go so far as to sweep a wide range of lookbacks to see where the sensitivity moves in and out of positive return. What you don't want to see is a highly positive return at one lookback and flat or negative return at an adjacent one; that would be a clear sign that your factor is fit to noise.
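Here is a minimal sketch of that lookback-sensitivity sweep. The momentum_ls_returns and lookback_sensitivity helpers, the monthly-price DataFrame layout, and the random-walk toy data are hypothetical and only meant to show the shape of the check:

```python
# A minimal sketch of a momentum lookback-sensitivity sweep.
import numpy as np
import pandas as pd

def momentum_ls_returns(prices: pd.DataFrame, lookback: int, top_n: int = 10) -> pd.Series:
    """Monthly return of an equal-weight long/short momentum portfolio for one lookback."""
    momentum = prices.pct_change(lookback)      # trailing `lookback`-month return
    fwd = prices.pct_change().shift(-1)         # next month's return
    longs = momentum.rank(axis=1, ascending=False) <= top_n
    shorts = momentum.rank(axis=1, ascending=True) <= top_n
    return (fwd[longs].mean(axis=1) - fwd[shorts].mean(axis=1)).dropna()

def lookback_sensitivity(prices: pd.DataFrame, lookbacks=range(6, 19)) -> pd.Series:
    """Average monthly long/short return for each lookback."""
    return pd.Series({lb: momentum_ls_returns(prices, lb).mean() for lb in lookbacks})

# Purely illustrative random-walk prices: 120 months of 50 stocks.
rng = np.random.default_rng(1)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.05, size=(120, 50)), axis=0)),
    columns=[f"stock_{i}" for i in range(50)],
)
print(lookback_sensitivity(prices))
# On real data, adjacent lookbacks (11, 12, 13 months) should produce similar returns;
# a spike at a single lookback with flat neighbors is a red flag for noise fitting.
```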
Here's one last technique, but I'll warn you that this one is more complex to carry out. Remember the hypothetical normal distribution we discussed earlier, the one centered about zero? What if we could actually create that distribution in our research? One of the easiest ways to do this is through cross-validation. Cross-validation is a topic that deserves its own post, but in summary it involves splitting your data into many different train and test sets and building a performance distribution across all of those models. Using this technique you can easily extend the distribution to hundreds of model performances trained on different periods of history, different assets, and different factors. From this distribution you can begin to estimate the likelihood that your result is overfit. Typically you want to see a strongly positive mean and median return and only a minimal negative tail. This helps ensure that your factors are robust across time and market regimes, and it provides added assurance about out-of-sample performance.
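As a rough sketch of the idea, the snippet below uses scikit-learn's TimeSeriesSplit to build a distribution of out-of-sample fold performances. The synthetic factor data, the Ridge model, and the sign-of-prediction trading rule are placeholders; the point is the distribution of fold Sharpe ratios, not the specific model:

```python
# A rough sketch of a cross-validated performance distribution.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(7)
n_obs, n_factors = 2_000, 5
X = rng.normal(size=(n_obs, n_factors))                 # synthetic factor exposures
y = X @ rng.normal(scale=0.01, size=n_factors) + rng.normal(scale=0.02, size=n_obs)  # forward returns

fold_sharpes = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=20).split(X):
    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    # Trade the sign of the out-of-sample prediction and record the fold's
    # annualized Sharpe (assuming daily observations).
    fold_pnl = np.sign(model.predict(X[test_idx])) * y[test_idx]
    fold_sharpes.append(fold_pnl.mean() / fold_pnl.std() * np.sqrt(252))

fold_sharpes = np.array(fold_sharpes)
print(f"mean {fold_sharpes.mean():+.2f}  median {np.median(fold_sharpes):+.2f}  worst {fold_sharpes.min():+.2f}")
# A healthy signal shows a positive mean and median with only a small negative tail.
```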
In conclusion, the path to generating alpha is strewn with potential pitfalls. The primary risk, overfitting to noise, can fool even the best quants. However, as highlighted above, there are tools and techniques at your disposal to robustly test and validate strategies, ensuring they are grounded in true economic rationale and not merely statistical mirages. The ranked long/short technique and the minor-variation technique provide a quick and dirty sanity check on a factor's robustness, while the more sophisticated cross-validation technique adds a further layer of fidelity and understanding. Together, these techniques will help you lay the foundation for genuine, sustainable returns.