Backtesting Models at Scale
Welcome to another edition of “In the Minds of Our Analysts.”
At System2, we foster a culture that encourages our team to explore ideas, investigate, write them up, and share their perspectives on a range of topics. This series provides a space for our analysts to share their insights.
All opinions expressed by System2 employees and their guests are solely their own and do not reflect the opinions of System2. This post is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of System2 may maintain positions in the securities discussed in this post.
Today’s post was written by Seth Leonard.
This might get nerdy, so get your blue-light glasses out.
In an earlier post, I wrote about how, despite the impressive results of machine learning models like ChatGPT, ML still has a hard time delivering good stock picks. It’s a Catch-22, really: if an accessible ML model were able to pick stocks effectively, everyone would use it, and its recommendations would already be priced in, eliminating any alpha. But what if you come up with something really good that no one else has? What do you do?
Every statistical and ML model is prone to overfitting. The simplest way to judge out-of-sample performance is to split the data into a training set and an evaluation set and fit the model only on the former. But when using a time series model that depends on all observations up to the current date, that’s not so simple. You’ll have to backtest sequentially, which can be computationally intensive. System2 recently tested our inflation nowcasts using just such an approach, a process that took more than a day to run on 48 cores. Since we didn’t want to tie up our own machines, we ran it on AWS. That sounds simple enough, but the devil is in the details. In case anyone else out there is looking to do something similar, I’ll describe some of those details here. This post primarily covers building your Dockerfile and getting it to AWS ECR, because running jobs on AWS Batch is pretty well documented elsewhere.
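Before getting to the infrastructure, here’s a heavily simplified sketch of what “backtesting sequentially” means in practice. Everything in it is hypothetical: fit_model() and nowcast() stand in for your own routines, and the data is assumed to be a tibble with a date column.

```r
library(dplyr)

# Expanding-window backtest: refit the model using only the data that would
# have been available at each date, then record the out-of-sample nowcast.
run_backtest <- function(data, eval_dates, fit_model, nowcast) {
  results <- vector("list", length(eval_dates))
  for (i in seq_along(eval_dates)) {
    train <- filter(data, date <= eval_dates[i])   # only what was observable then
    model <- fit_model(train)                      # refit from scratch each period
    results[[i]] <- nowcast(model, eval_dates[i])  # one-row tibble of predictions
  }
  bind_rows(results)
}
```

Each iteration refits the model from scratch, which is exactly why the exercise gets expensive enough to justify renting compute.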
The Task
The Tools
The Process
Let’s assume you have the routines you want to backtest written out already in Python, R, or any other language. With that ready to go, the first step is building your Docker container. The idea here is that the container should have everything you need, but nothing you don’t, to keep it as lightweight as possible. Because we’re using R and the Tidyverse, we’ll begin with an R-specific base image, then add all the system dependencies needed for the other packages we use (i.e., the Tidyverse).
The Dockerfile
Our Dockerfile is a plain text file (no extension, just titled Dockerfile) that has instructions for what we want in our container. A few notes:
Your build environment (what Docker calls the build context) is everything in the folder that contains your Dockerfile. If you have scripts or credentials you want in your container, put them here. But don’t put anything you don’t need in this folder, or your build environment will be huge.
Put the most basic things you’re not likely to change first. If you need to rebuild, Docker can reuse its cached layers up to the first line you change.
We begin with our base image and some basic utilities, including git:
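A minimal sketch of what that looks like, assuming a rocker base image (the exact image and R version here are placeholders; use whatever matches your setup):

```dockerfile
# Base image with R preinstalled (image and version are examples)
FROM rocker/r-ver:4.3.1

# Basic utilities; curl and unzip are needed for the AWS CLI install below
RUN apt-get update -q && apt-get install -yq \
    git \
    curl \
    unzip
```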
Since none of this is done interactively, the -y tells apt-get to answer yes to any prompts and run the whole process; the -q is optional and tells the install to run quietly. Next, we add the AWS command line tools:
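Something along these lines installs AWS CLI v2; the download URL is AWS’s standard x86_64 installer (use the aarch64 one on ARM):

```dockerfile
# Install AWS CLI v2 from the official installer, then clean up
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
    unzip -q awscliv2.zip && \
    ./aws/install && \
    rm -rf awscliv2.zip aws

# Print the version to confirm the install worked
RUN aws --version
```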
The last line just prints the AWS CLI version to make sure things installed. Next come all the Tidyverse, Rcpp, and other system dependencies. I figured these out by reading through the build errors and adding them one at a time. If you’re using the Tidyverse in Docker, I highly recommend copying and pasting the list instead.
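The exact list depends on your base image and packages, but for the Tidyverse it typically looks something like this (these are the usual system libraries behind curl, openssl, xml2, and the graphics stack):

```dockerfile
# System libraries commonly required by Tidyverse packages
RUN apt-get update -q && apt-get install -yq \
    libcurl4-openssl-dev \
    libssl-dev \
    libxml2-dev \
    libfontconfig1-dev \
    libfreetype6-dev \
    libharfbuzz-dev \
    libfribidi-dev \
    libpng-dev \
    libtiff5-dev \
    libjpeg-dev
```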
Next, install all the R packages needed. We run this from an R file in our build environment. But if you’re having trouble, it may help to install one at a time using RUN Rscript -e 'install.packages("R.utils")' (for example).
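In the Dockerfile, that’s just a copy-and-run pair. The file name install_packages.R is a placeholder for whatever your install script is called:

```dockerfile
# Copy the package install script from the build context and run it
COPY install_packages.R /tmp/install_packages.R
RUN Rscript /tmp/install_packages.R
```

And the script itself is nothing more than ordinary install.packages() calls (the package list here is illustrative):

```r
# install_packages.R: set a CRAN mirror and install what the backtest needs
options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages(c("tidyverse", "Rcpp", "aws.s3"))
```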
Finally, we’ll also copy over everything else in our build environment (be careful it’s not full of junk!); this includes our actual backtesting script.
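The tail of the Dockerfile might look like the following; backtest.R and the /home/backtest working directory are illustrative names:

```dockerfile
# Copy the rest of the build environment, including the backtesting script
WORKDIR /home/backtest
COPY . .

# Run the backtest when the container starts
CMD ["Rscript", "backtest.R"]
```

With the Dockerfile written, build the image from the folder that contains it:

```bash
docker build -t mycontainer .
```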
The -t mycontainer is a tag naming the image mycontainer for easy reference. You’ll then need to log in to AWS ECR. In my case, the command is:
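It looks roughly like this; the profile, region, and account number below are placeholders for your own:

```bash
aws ecr get-login-password --profile myprofile --region us-east-2 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-2.amazonaws.com
```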
--profile is only necessary if you’re not using your default AWS profile. The region is in the format us-east-2 (or whatever yours is), and the account number is your AWS account number. You can then tag your container and push it to AWS ECR, though you’ll have to make sure you have a repository set up on ECR first. In this case, it’s called backtest.
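With the same placeholder account number and region as above, that’s:

```bash
docker tag mycontainer:latest 123456789012.dkr.ecr.us-east-2.amazonaws.com/backtest:latest
docker push 123456789012.dkr.ecr.us-east-2.amazonaws.com/backtest:latest
```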
Phew. From here you can set up a job on Batch to run your container. As mentioned, you’ll need to make sure your results get saved to S3; otherwise, they’ll be lost when the job finishes. In our case, that happens within the R script:
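We won’t reproduce the actual backtesting script, but the save step can be as simple as the following, using the aws.s3 package; the bucket and object key are placeholders, and backtest_results stands in for whatever object holds your output:

```r
library(aws.s3)

# Persist the results to S3 so they survive after the Batch job exits
s3saveRDS(
  backtest_results,
  object = "backtests/inflation-nowcast-results.rds",  # placeholder key
  bucket = "my-results-bucket"                          # placeholder bucket
)
```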
This requires that your Docker container have the necessary permissions to access S3. There are multiple ways to accomplish this. One of the best is simply to ensure that the IAM role assigned to your Batch job has the necessary permissions, in this case the ability to write to S3.
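For reference, the relevant piece of that role’s policy is just a standard S3 write statement; the bucket name is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::my-results-bucket/*"
    }
  ]
}
```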
What’s neat about Batch is that you can schedule things to run whenever you want. So if you have a data file that updates daily, you can schedule it for a once-a-day run, and not pay to have an EC2 instance running all the time. Clever.