Manage overfitting

How to manage overfitting

Wikipedia gives the following definition of overfitting: "Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model that has been overfit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data."

An over-optimized trading system can make your backtest lie: what you see is probably not what would happen in reality. Using one dataset both to backtest and to find the best combination of parameters can be a big mistake. The more parameters you have (stop loss, take profit and so on), the greater the risk that your model loses its ability to ride a trend and ends up fitting nothing but noise. The following image shows a wonderful equity line.

[Image: a winning backtest that is lying]

Is it real, or just an artifact of overfitting?

If we look at a wider historical dataset, say 40 years of history instead of 8, the result is quite different.

[Image: the real backtest]

This does not necessarily mean that your model is bad; it could just be that this stock started a definite trend after 2007.

The behavior of a stock, or of any kind of future, can be described in two ways: by the noise it makes and, inside that noise, by the precise behavior it has. The market follows a precise idea, so the best thing to do is to clear away the noise and find the way the market moves. It may sound strange, but it is important to start with a raw strategy and test it on an arbitrarily chosen dataset (I advise a 10-year backtest). If it is profitable, optimize it, but remember to use few parameters, especially during the initial tests.
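To see why a small parameter search matters, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the prices are a synthetic random walk with no edge to find, the strategy is a toy moving-average crossover, and the names `noise_prices`, `backtest` and `best_of_grid` are hypothetical. It compares the best result from a single parameter pair with the best result from a wider grid on the same noise.

```python
import random
from itertools import product


def noise_prices(n_days, seed):
    """Synthetic zero-drift random walk (assumption: pure noise, no real edge)."""
    rng = random.Random(seed)
    price = 100.0
    prices = []
    for _ in range(n_days):
        price *= 1.0 + rng.gauss(0.0, 0.01)
        prices.append(price)
    return prices


def backtest(prices, fast, slow):
    """Toy crossover: hold the asset on day t when the fast average of the
    previous closes is above the slow one, otherwise stay flat.
    Returns the total return as a fraction (0.10 means +10%)."""
    equity = 1.0
    for t in range(slow, len(prices)):
        # Averages use closes up to t-1 only, so there is no look-ahead.
        fast_avg = sum(prices[t - fast:t]) / fast
        slow_avg = sum(prices[t - slow:t]) / slow
        if fast_avg > slow_avg:
            equity *= prices[t] / prices[t - 1]
    return equity - 1.0


def best_of_grid(prices, fasts, slows):
    """Try every (fast, slow) pair with fast < slow and keep the best one."""
    results = {
        (f, s): backtest(prices, f, s)
        for f, s in product(fasts, slows)
        if f < s
    }
    best = max(results, key=results.get)
    return best, results[best]


prices = noise_prices(1000, seed=1)
small = best_of_grid(prices, fasts=[10], slows=[50])
large = best_of_grid(prices, fasts=range(5, 30, 5), slows=range(50, 201, 50))
print(f"1 combination:   best {small[0]} -> {small[1]:+.2%}")
print(f"20 combinations: best {large[0]} -> {large[1]:+.2%}")
```

The wider grid contains the single pair, so its best result can only be equal or better, even though the data holds nothing to find. That is the point: the more combinations you try, the better the winner looks, and the less that look tells you.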

Once you are satisfied, take a dataset of 30 years (or even more) and run the same backtest with the same parameters as before. There are three possibilities for the 20 years that were not previously backtested: the model loses money, is flat, or behaves exactly as it did over the last 10 years.

The last scenario is the best one, because it means that the model has been working for 30 years. An almost flat equity line means that, given the way you open positions, the results in that period were essentially random. In my opinion this can be a good sign too, because it means that a new behavior has emerged since then. The case of losses can be interesting, because it means that there has always been a precise behavior behind the price, but it has changed. In that case the analyst has to investigate the causes of the change and try to forecast the next one in order to catch it in time.
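The check on the older years can be sketched roughly as follows. This is only an illustration under stated assumptions: the input is the list of daily strategy returns produced by one backtest run with fixed parameters, a year is taken as 252 trading days, and the 1% per year band that counts as "flat" is an arbitrary threshold. The function names are hypothetical.

```python
def total_return(daily_returns):
    """Compound a list of daily returns into one total return fraction."""
    equity = 1.0
    for r in daily_returns:
        equity *= 1.0 + r
    return equity - 1.0


def annualized(total, years):
    """Convert a total return over `years` into a compound yearly rate."""
    return (1.0 + total) ** (1.0 / years) - 1.0


def evaluate_untested_years(daily_returns, recent_years=10,
                            days_per_year=252, flat_band=0.01):
    """Split the run into the recent (optimized) years and the older,
    previously untested years, then label the older period.

    Assumptions: 252 trading days per year; yearly returns within
    +/- flat_band count as flat (an arbitrary threshold)."""
    split = len(daily_returns) - recent_years * days_per_year
    if split <= 0:
        raise ValueError("need more history than the optimized period")
    older, recent = daily_returns[:split], daily_returns[split:]
    older_yearly = annualized(total_return(older), len(older) / days_per_year)
    recent_yearly = annualized(total_return(recent), recent_years)

    if older_yearly < -flat_band:
        label = "loses money"
    elif older_yearly <= flat_band:
        label = "flat"
    else:
        # Profitable beyond the band; compare the two yearly rates to
        # judge how closely the older years match the recent ones.
        label = "holds up"
    return label, older_yearly, recent_yearly
```

For a 30-year run you would call `evaluate_untested_years(daily_returns)` and read the label together with the two yearly rates. The label alone is coarse, so the comparison between `older_yearly` and `recent_yearly` is what tells you whether the model really behaves as it did over the last 10 years.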

As a general rule, never overfit the model. Make up your mind about a trading idea, run raw backtests, and decide from this first pass whether they are good or not. As a second step, optimize the strategy for the period under analysis (10 years is enough), and finally run a longer test to check its profitability on data it was not fitted to. If the profits are still there, well... you might have found your way to make money.