Quantitative Investing and Machine Learning

Welcome to another edition of “In the Minds of Our Analysts.”

At System2, we encourage our team to investigate, write down, and share their perspectives on various topics. This series gives our analysts a space to share their insights.

All opinions expressed by System2 employees and their guests are solely their own and do not reflect the opinions of System2. This post is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of System2 may maintain positions in the securities discussed in this post.

Today’s post was written by Seth Leonard.

Garbage Can Recycling Bin Basketball

Every office has a certain amount of banter (whether so-and-so can make the three-point shot with a crumpled-up piece of paper, that sort of thing), and System2 is no different. Since most staffers here are both data scientists and developers, you won’t be surprised to learn our banter tends towards nerdy stuff. Here’s a tidbit a coworker shared that I particularly liked:

With the rise of AI for the masses in the form of ChatGPT, Stable Diffusion, and Soundraw, a popular question is: why can’t I use this stuff to make lots of money on the stock market?

In Theory, Theory and Practice are the Same

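In theory, the price of an equity ought to equal the expected discounted flow of future dividends. In mathematical terms (assuming a fixed discount rate), this means

$$P_t = E_t\left[\sum_{j=0}^{\infty} \beta^{j} D_{t+j}\right]$$

where $P_t$ is the price today and $D_{t+j}$ is the dividend paid $j$ periods from now.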

The discount factor β comes from the fact that we have a preference for money (consumption) today over money tomorrow, and we measure that preference by the interest rate. That is, how much are we willing to pay to move tomorrow’s consumption forward in time? How much do we need to be compensated to put off consumption (save money) until a later date?

We can break this asset price into two more meaningful chunks:

$$P_t = D_t + \beta E_t(P_{t+1})$$

Today’s price simply depends on the dividend and on what the price is expected to be tomorrow. Because we know today’s dividend, when it comes to assessing whether an asset is correctly priced, all the action is in the price tomorrow (and the day after that, and the day after that…). No kidding. As an aside, if this were true, then why would anyone pay money for a stock like Zillow? Well, it turns out that $E_t(P_{t+1})$ incorporates a lot more than future dividends. It’s more the “he thinks that she thinks that they think…”

What that means is that if future price movements are expected, then they are already incorporated into today’s price. If you want to buy an undervalued stock, you need to have expectations of future prices that are not shared by the market (and you need to be correct). Because it’s so easy these days to spin up an ML model in R or Python and run it on historical price movements, doing so won’t help you find pricing information that is news to the market. That’s a longish way to answer the above question.
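To make the pricing logic above concrete, here is a minimal Python sketch. It assumes a known, flat dividend stream (so the expectation drops out) and a discount factor of 0.95. Both numbers and the function name `present_value` are made up for illustration. It checks that the discounted-sum price and the "today's dividend plus discounted price tomorrow" form give the same answer.

```python
def present_value(dividends, beta):
    """Price as the discounted sum of a known dividend stream, starting today."""
    return sum(beta ** j * d for j, d in enumerate(dividends))


beta = 0.95                # hypothetical discount factor
dividends = [1.0] * 200    # hypothetical: a flat dividend of 1 for 200 periods

price_today = present_value(dividends, beta)
price_tomorrow = present_value(dividends[1:], beta)

# Recursive form: today's dividend plus the discounted price tomorrow.
recursive_price = dividends[0] + beta * price_tomorrow

print(f"discounted sum: {price_today:.4f}")
print(f"recursive form: {recursive_price:.4f}")
```

With a long enough flat stream, the price approaches 1 / (1 - β) = 20. Everything interesting in practice is hidden in the part this sketch assumes away: nobody knows the future dividends or tomorrow's price.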

Quantitative Investing

For the above reasons, traditional quantitative investing (if there is such a thing) has focused not on smarter models, but on better information. The recipe is simple; it’s usually some variation of:

  1. Pay for superior data

  2. Take the assets with the strongest positive signal and go long

  3. Take the assets with the strongest negative signal and go short

In this case, superior data refers to the sort of data sold by the vendors we worked with at the recent Eagle Alpha conference in New York. Not much ML in there, though you might find it on the marketing brochure.
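Steps 2 and 3 of the recipe can be sketched in a few lines of Python. This is an illustration under stated assumptions, not anyone's production strategy: the tickers and signal values are invented, the helper name `long_short_weights` is hypothetical, and equal weighting within each leg is just one simple choice.

```python
def long_short_weights(signals, n_legs):
    """Equal-weight long the n_legs strongest signals, short the n_legs weakest."""
    if n_legs < 1 or 2 * n_legs > len(signals):
        raise ValueError("n_legs must be at least 1 and at most half the universe")
    ranked = sorted(signals, key=signals.get)  # tickers in ascending signal order
    shorts, longs = ranked[:n_legs], ranked[-n_legs:]
    weights = {ticker: 0.0 for ticker in signals}
    for ticker in longs:
        weights[ticker] = 1.0 / n_legs
    for ticker in shorts:
        weights[ticker] = -1.0 / n_legs
    return weights


# Hypothetical signal scores from some purchased dataset (higher = more bullish).
signals = {"AAA": 1.2, "BBB": -0.4, "CCC": 0.3, "DDD": -1.5, "EEE": 0.9, "FFF": -0.1}

weights = long_short_weights(signals, n_legs=2)
for ticker, weight in sorted(weights.items()):
    print(f"{ticker}: {weight:+.2f}")
```

The weights sum to zero, so the portfolio is dollar neutral. All of the value sits in the `signals` dictionary, which is the point of step 1.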

ML, Statistics, and Quantitative Modeling

If you want to make a model, don’t throw out statistics. It’s been around for a while; what is perhaps the most powerful theorem in stats was published in 1763. That doesn’t mean it’s outdated. Statistics (as compared to ML) is particularly powerful when we don’t have gigabytes of training data. A brief aside on the distinction between the two:

Statistics:

Fitting distributions to observed data to make inferences about variables. For example:

is the solution to

where

Machine Learning:

source: xkcd

So how to incorporate ML into quantitative strategies? The ultimate predict-anything-HAL-9000-AI bot is still out there waiting to be invented, and when it is, markets will become perfectly stable (unless, of course, we train it to be human). Though System2’s focus as a firm is using alt data for fundamental investing, we are really, really familiar with many interesting datasets. So ML strategies for quantitative investing have become something of a hobby of mine. Here are a few insights:

  • Don’t try to predict raw price movements. It depends on the stock, but something like 50% of price volatility comes from the aggregate market, not anything specific to the stock itself. Have a look here. It turns out basic, easy-to-access factors do a pretty good job of capturing that market component (we use ETFs as market benchmarks). Once we strip out market factors (by OLS), what we want to estimate is the residual.

Correctly predicting that stock-specific residual needs data that, if not entirely unknown to the market, is at least underutilized. To us, that means alt data: Brain, NewMark Risk, and EPFR are a few of the many we look at.

  • Long top half, short bottom half is tried and, well, maybe not quite true, but commonly used. However, what is really interesting (and much harder) is correctly identifying movements for a single stock.

  • Try different models. To the extent that their errors are uncorrelated, pooling the results will improve out-of-sample performance.

  • For an interesting long-short exercise, try estimating the difference in returns between two similar firms (e.g., Home Depot vs Lowe’s). This is the best way to strip out non-firm-specific factors and is the area where System2 has seen some of the most promising results.

  • Stretch your time horizon. It’s easier to find excess returns at short horizons because it’s harder to actually implement those strategies in practice (due to trading costs and the speed at which trades can be executed). A longer horizon will make your strategy more useful.
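Two of the ideas above, stripping out market factors by OLS and differencing two similar firms, can be sketched in plain Python. This is a sketch on simulated data under stated assumptions: a single market factor stands in for the ETF benchmarks, the two stocks and their market loadings (1.1 and 0.9) are invented, and the helper names `ols_fit`, `residuals`, and `variance` are hypothetical.

```python
import random


def ols_fit(x, y):
    """Single-factor OLS of y on x with an intercept; returns (alpha, beta)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    beta = cov_xy / var_x
    alpha = mean_y - beta * mean_x
    return alpha, beta


def residuals(x, y):
    """What is left of y after removing the fitted market component."""
    alpha, beta = ols_fit(x, y)
    return [yi - alpha - beta * xi for xi, yi in zip(x, y)]


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


random.seed(0)
n_days = 500
# Simulated daily returns for a market ETF (assumption: 1% daily volatility).
market = [random.gauss(0.0, 0.01) for _ in range(n_days)]
# Two hypothetical, similar firms: a market loading plus firm-specific noise.
stock_a = [1.1 * m + random.gauss(0.0, 0.01) for m in market]
stock_b = [0.9 * m + random.gauss(0.0, 0.01) for m in market]

# Strip the market factor out of stock A; the residual is the modeling target.
_, beta_a = ols_fit(market, stock_a)
resid_a = residuals(market, stock_a)
market_share_a = 1.0 - variance(resid_a) / variance(stock_a)

# Pair difference: most, though not all, of the market exposure cancels.
spread = [a - b for a, b in zip(stock_a, stock_b)]
_, beta_spread = ols_fit(market, spread)

print(f"share of stock A variance explained by the market: {market_share_a:.2f}")
print(f"market beta of stock A: {beta_a:.2f}")
print(f"market beta of the A-minus-B spread: {beta_spread:.2f}")
```

In this simulation, the spread keeps a small market loading because the two invented betas differ. With real data you would regress on several ETF factors instead of one, and the residual or the spread would be what you feed to the model.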

Happy geeking. You might not find $100 bills on the sidewalk, but there’s still a lot of underutilized data out there which can give you an edge.

matei zatreanu