ORATS University - Backtesting methodology

Our accurate historical options data and volatility summaries lay the groundwork for tackling a critical step in the research process: backtesting.

Why backtesting?

A robust options trading strategy depends on accurate backtesting practices. A backtest shows you how the strategy performed historically, and gives you an idea of what to expect. You can learn a lot about a strategy with a backtest. While historical performance is not an indicator of future performance, a backtest can help you identify things like:

The risk and reward potential of a strategy
The relative performance of different entry criteria like days to expiration and strike deltas
The impact of market conditions on strategy performance
The importance of exit criteria like stop losses and profit targets

These are benefits, but there are also complications with backtesting. Let’s look at some of the common issues traders face when backtesting options.

Common pitfalls of options backtesting

Having been in the backtesting business for over a decade, we see many traders fall into the same traps. The complexity of backtesting options makes it easy to:

Use inaccurate or unrealistic trade execution prices
Overfit the data to create unrealistic returns
Accidentally follow path dependency
Misunderstand notional vs. margin return

We’ve carefully designed our new backtester to solve each of these issues.

Using inaccurate or unrealistic trade execution prices

Traders often gravitate to the end-of-day closing price of the option as the correct price to use for a backtest. However, we’ve found that closing prices are not the best representation of an option's true end-of-day value. Rather, our data shows that 14 minutes before the close is the closest you can get without experiencing deterioration in the quality of the quote. Our backtester therefore uses quotes from 14 minutes before the close to simulate trades.

Slippage assumptions are key for a realistic backtest, and ours are based on years of experience. We assume slippage of 75% of the bid-ask width for single-leg trades, scaling down to 56% for four-leg spreads. Assuming that a trade fills past the mid-price is reasonable; for multi-leg strategies, however, the percentage of the width traveled is smaller.
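The slippage arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the backtester's actual code: the function name `fill_price`, its parameters, and the example quotes are hypothetical. The text gives only the 75% (single leg) and 56% (four legs) figures, so the caller supplies the percentage, and applying it to a spread's net bid and ask is also an assumption.

```python
def fill_price(bid: float, ask: float, side: str, slippage_pct: float) -> float:
    """Estimate a fill by traveling slippage_pct of the bid-ask width.

    A buyer starts at the bid and moves toward the ask; a seller starts at
    the ask and moves toward the bid. slippage_pct=0.50 is the mid-price.
    """
    if ask < bid:
        raise ValueError("ask must be >= bid")
    if not 0.0 <= slippage_pct <= 1.0:
        raise ValueError("slippage_pct must be between 0 and 1")
    width = ask - bid
    if side == "buy":
        return bid + slippage_pct * width
    if side == "sell":
        return ask - slippage_pct * width
    raise ValueError("side must be 'buy' or 'sell'")


# Hypothetical single-leg quote of 1.00 bid, 1.20 ask, using the 75% figure.
print(round(fill_price(1.00, 1.20, "buy", 0.75), 2))   # 1.15
print(round(fill_price(1.00, 1.20, "sell", 0.75), 2))  # 1.05

# Hypothetical four-leg spread quoted 2.00 bid, 2.40 ask (net), using 56%.
print(round(fill_price(2.00, 2.40, "buy", 0.56), 3))   # 2.224
```

In both cases the simulated fill is worse than the mid-price, which is the point of the assumption: the buyer pays more than mid and the seller receives less.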

Overfitting the data to create unrealistic returns

When running multiple backtests on the same data set, there is a risk of overfitting. An example would be testing 10 different days to expiration for the same symbol and strategy while varying no other parameters, and then choosing the best-performing backtest to trade. Instead, you should look at the other similar backtests to see if they also performed well. Backtests with similar inputs but a wide variety of results are a bad sign. That’s why in our new backtester, we’ve added a “find similar” button to help do this automatically.
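One way to picture the neighbor comparison is the rough Python sketch below. It is not how the “find similar” button works, which the text does not describe. The function name `neighbor_check`, the `window` parameter, and the sample returns are all hypothetical and made up for illustration.

```python
from statistics import mean, pstdev


def neighbor_check(results: dict[int, float], window: int = 1) -> dict:
    """Compare the best backtest with the backtests whose input is closest.

    results maps one varied input (here, days to expiration) to a return.
    """
    keys = sorted(results)
    best = max(keys, key=lambda k: results[k])
    i = keys.index(best)
    # Take up to `window` inputs on each side of the best one.
    nearby = keys[max(0, i - window): i + window + 1]
    neighbors = [results[k] for k in nearby if k != best]
    return {
        "best_input": best,
        "best_result": results[best],
        "neighbor_mean": mean(neighbors) if neighbors else None,
        "overall_stdev": pstdev(list(results.values())),
    }


# Made-up annual returns (%) by days to expiration: 40 DTE is an outlier.
sample = {20: 4.0, 30: 4.5, 40: 12.0, 50: 3.8, 60: 4.2}
print(neighbor_check(sample))
```

Here the best result (12.0 at 40 days) sits far above the average of its neighbors at 30 and 50 days, which is the kind of isolated spike that suggests overfitting rather than a robust edge.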

Accidentally following path dependency

Path dependency refers to how the order and timing of trades can significantly affect a strategy's performance. Ignoring this aspect can lead to misleading results and unrealistic expectations. Our backtester accounts for path dependency in a unique way: it enters one trade per day on every day the entry criteria are met. This means you can have 10 trades on at the same time, each entered one day after the previous one. While this isn’t how most people trade in real life, it provides more accurate performance metrics because it removes the start-date bias that would occur if you only entered one trade at a time. For example, starting a backtest on the first day of the year might produce very different results than starting the backtest a few weeks later.
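The difference between the two entry schemes can be sketched as follows. This is an illustrative toy, not the backtester's implementation: days are plain integer indexes, and the function names, the 10-day holding period, and the always-true entry signal are hypothetical.

```python
def daily_entries(signals: list[bool], hold_days: int) -> list[tuple[int, int]]:
    """Open one trade on every day the entry signal is True.

    Returns (entry_day, exit_day) pairs; exits are capped at the last day.
    """
    last = len(signals) - 1
    return [(d, min(d + hold_days, last)) for d, ok in enumerate(signals) if ok]


def open_trades_on(trades: list[tuple[int, int]], day: int) -> int:
    """Count trades that are open on a given day (exit day excluded)."""
    return sum(1 for entry, exit_ in trades if entry <= day < exit_)


def sequential_entries(signals: list[bool], hold_days: int, start: int) -> list[int]:
    """One trade at a time: re-enter only after the previous trade exits."""
    entries, day = [], start
    while day < len(signals):
        if signals[day]:
            entries.append(day)
            day += hold_days
        else:
            day += 1
    return entries


signals = [True] * 15                       # entry criteria met every day
trades = daily_entries(signals, hold_days=10)
print(open_trades_on(trades, day=9))        # 10 overlapping trades
print(sequential_entries(signals, 10, 0))   # [0, 10]
print(sequential_entries(signals, 10, 3))   # [3, 13]
```

With one trade at a time, moving the start date from day 0 to day 3 changes which days are traded at all, so the result depends on the starting point. Entering every day samples all of those paths at once.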

Misunderstanding notional vs. margin return

Notional and margin returns are both crucial elements for assessing the profitability and efficiency of a strategy. By understanding how these returns work, investors and traders can make informed decisions when it comes to selecting the strategies that are right for them.

In options trading, the notional value of an option is the value of the shares it controls, that is, the shares that would change hands if the option were exercised or assigned. For example, suppose I buy a call option priced at $5 on a $100 stock. Because this option controls 100 shares of the underlying, the notional value of the trade is 100 * $100 = $10,000. However, I would only need to pay $5 * 100 = $500 to open the trade. Thus, the notional return measures performance relative to the $10,000, while the margin return measures performance relative to the $500.
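The arithmetic in this example can be written out as a short Python sketch. The $100 profit is a hypothetical figure added for illustration, and the function names are not part of any ORATS product.

```python
SHARES_PER_CONTRACT = 100  # a standard equity option controls 100 shares


def notional_value(stock_price: float, contracts: int = 1) -> float:
    """Value of the shares controlled by the option position."""
    return stock_price * SHARES_PER_CONTRACT * contracts


def premium_paid(option_price: float, contracts: int = 1) -> float:
    """Cash needed to open a long option position."""
    return option_price * SHARES_PER_CONTRACT * contracts


notional = notional_value(100.0)   # 10000.0
capital = premium_paid(5.0)        # 500.0
pnl = 100.0                        # hypothetical profit on the trade

print(f"notional return: {pnl / notional:.1%}")  # 1.0%
print(f"margin return:   {pnl / capital:.1%}")   # 20.0%
```

The same $100 profit is a 1% return on the notional value but a 20% return on the capital actually put up.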

Because the notional value is usually much larger than the capital required to open a trade, notional returns are typically much smaller in magnitude than margin returns. We use notional returns in the backtester because they help standardize and normalize performance across all different types of strategies and symbols. We show the margin return to highlight how efficiently the strategy used the capital at its disposal. However, you have to be careful with margin returns, as sometimes they are not the most accurate reflection of a real-world trading environment.

Generally, if you are using both options and stocks, or if you want to normalize and compare disparate trading strategies, the notional calculation is best. If you want to see how much you would make on the amount at risk from a brokerage firm's perspective, margin returns are best. A brokerage firm's margin requirement does not always reflect the actual risk, so care is needed. For example, a short put may require 20% margin in a portfolio-margined account or 100% margin in a cash account. The margin returns would be significantly different under the two approaches.
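To see how much the account type can move the margin return, consider the rough sketch below. It assumes the margin requirement is a percentage of the put's notional value, which the text does not state explicitly. The $100 strike, the $200 profit, and the function name `margin_return` are hypothetical.

```python
def margin_return(pnl: float, notional: float, margin_rate: float) -> float:
    """Return on the margin posted, with margin as a fraction of notional.

    Assumption: margin = margin_rate * notional. Real broker formulas vary.
    """
    if not 0.0 < margin_rate <= 1.0:
        raise ValueError("margin_rate must be in (0, 1]")
    return pnl / (margin_rate * notional)


notional = 100.0 * 100   # hypothetical short put, $100 strike, 100 shares
pnl = 200.0              # hypothetical profit

print(f"notional return:          {pnl / notional:.1%}")
print(f"portfolio margin (20%):   {margin_return(pnl, notional, 0.20):.1%}")
print(f"cash account (100%):      {margin_return(pnl, notional, 1.00):.1%}")
```

Under these assumptions the same trade shows a 10% margin return in the portfolio-margined account and 2% in the cash account, while the notional return is 2% in both. That stability is one reason notional returns are easier to compare across backtests.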

At ORATS, we show margin returns but have more data on notional returns because they are more conservative and consistent and allow better comparisons between backtests.

Creating a database of 300 million+ backtests

The pitfalls listed above are only a brief selection of the most common situations we see traders deal with when backtesting options. ORATS was inspired to create an even easier, faster, and more effective way to backtest, one that not only incorporates solutions to the above but also changes how you think about backtesting entirely.

Our original custom backtester has been around for over five years. We threw in every feature we could dream of, which made for a highly functional but overly complex product. We wanted to build a new backtester that was ultra-fast, easy to use, and still provided traders with just as much value as the original. While at a team retreat in Florida in early 2023, we had an idea. What if, instead of requiring you to input a bunch of parameters and run your own backtest, we did it for you? You would simply query a database of backtests (that already exist) for one that meets your investment objective. This would get you to the same result as before, but without the hassle, time commitment, and inevitable errors that were a part of creating your own backtest.

After crunching the numbers, we determined this would mean running millions - if not billions - of backtests, before you even touched the software. This presented a difficult challenge from an engineering standpoint, because we had to figure out how to efficiently store and access all of this data. After a few months, we were ready to launch with a few million backtests for SPY. Now we have over 100 symbols, 15 strategies, and more than 300 million backtests.
