Overcoming the Challenge of Extracting Financials from SEC Filings

Welcome to another edition of “In the Minds of Our Analysts.”

At System2, we encourage our team to investigate topics, write down their thoughts, and share their perspectives. This series gives our analysts a space to share their insights.

All opinions expressed by System2 employees and their guests are solely their own and do not reflect the opinions of System2. This post is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of System2 may maintain positions in the securities discussed in this post.

Today’s post was written by Aquiba Benarroch.


Investment analysts sift through vast amounts of financial data to identify opportunities and make informed decisions. But before they can get to the analysis, they have to clean the financial statement data, a critical step for making the analysis accurate and reliable. Professional investors can access clean (and standardized) financials from S&P Capital IQ and Bloomberg, but retail investors have few choices other than collecting the information manually.

The accuracy of investment analysis depends heavily on the quality of the data. Financial statements can be messy, prone to restatements and inconsistent accounting labels. Analysts risk making flawed investment decisions based on inaccurate information if they don’t take the time to clean the data.

The Potential Solution: XBRL

This is where the SEC comes into play. It has made financial statement data more accessible through eXtensible Business Reporting Language (XBRL). Because companies tag the figures in their filings with standardized labels, the financial data can be extracted programmatically. That sounds great, but the documentation could use some work. On top of that, companies frequently restate past financials, which makes apples-to-apples comparisons harder.
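Restatements show up in the tagged data as the same period being reported more than once, in different filings. The sketch below assumes a simplified list of facts for a single concept, using the `end`, `val`, `form`, and `filed` field names from the SEC's company-facts JSON (the values are made up). It keeps the most recently filed value for each period. `latest_values` is a hypothetical helper, and real data also carries start dates and fiscal-period fields, so quarterly and annual figures would need to be separated first.

```python
from datetime import date

# Hypothetical facts for a single XBRL concept (say, revenue), using field
# names from the SEC's company-facts JSON. The values are made up.
facts = [
    {"end": "2021-12-31", "val": 1000, "form": "10-K", "filed": "2022-02-15"},
    {"end": "2022-12-31", "val": 1200, "form": "10-K", "filed": "2023-02-14"},
    # The 2021 figure is reported again, restated, in the next annual report.
    {"end": "2021-12-31", "val": 980, "form": "10-K", "filed": "2023-02-14"},
]


def latest_values(facts: list[dict]) -> dict[str, int]:
    """Return {period end: value}, keeping the most recently filed value per period."""
    best: dict[str, dict] = {}
    for fact in facts:
        current = best.get(fact["end"])
        filed = date.fromisoformat(fact["filed"])
        if current is None or filed > date.fromisoformat(current["filed"]):
            best[fact["end"]] = fact
    # Sort by period end so the output reads chronologically.
    return {end: best[end]["val"] for end in sorted(best)}


print(latest_values(facts))  # {'2021-12-31': 980, '2022-12-31': 1200}
```

"Latest filing wins" is only one policy. An analyst comparing as-originally-reported numbers would keep the earliest filing instead.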

Other Solutions

Companies like AlphaSense (and its product Sentieo) or Canalyst (from Tegus) are candidates to fill this need. They collect and analyze thousands of filings from companies worldwide and have a rich history of documents with historical financial statements. Still, for the retail investor, their APIs are challenging because they require some engineering to get the desired outcome. Also, their costs may put them out of reach for the little guy trading at home.

Other products, such as SEC-API, aim to simplify extracting financial statement data. They’re cheap: free for the first 100 API calls, or unlimited for $55/mo. SEC-API is essentially a wrapper on top of the SEC's EDGAR data, which may already be an improvement. But it has yet to bridge the gap between organizing the data and making it easily digestible for end users, especially retail investors.

Given that SEC-API is the most cost-effective and promising, let's go through some of the roadblocks we need to clear before getting the historical financials just the way we want.

Navigating the SEC-API documentation

There is a section in SEC-API's documentation titled “Access financial statements in EDGAR filings”. It looks promising. But to retrieve the financial statement data, we first need to do one of the following:

  1. Build functions around the XBRL converter API

  2. Download each quarterly financial statement in Excel (which defeats the purpose of using the API)

  3. Download the SEC’s HTML file containing the financials (which also defeats the purpose of the API).

So let’s go with number 1, the XBRL converter API.

The good news is that we can extract all the financial statement items. The bad news is that we need to do some data engineering. First, we need links to the SEC filings containing the financial statements we want, for every year and quarter. Each filing can be identified by its HTML URL, its XBRL URL, or its accession number.
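One way to collect those links is sketched below. It assumes data in the shape of EDGAR's submissions JSON (`https://data.sec.gov/submissions/CIK##########.json`), whose `filings.recent` section stores parallel arrays such as `accessionNumber`, `form`, `reportDate`, and `primaryDocument`. The CIK, accession numbers, and file names here are made up, and `filing_urls` is a hypothetical helper that builds each filing's HTML URL from the EDGAR archive path.

```python
# Hypothetical excerpt shaped like EDGAR's submissions JSON. The CIK,
# accession numbers, and document names are made up.
submissions = {
    "cik": "1234567",
    "filings": {
        "recent": {
            "accessionNumber": ["0001234567-24-000010", "0001234567-23-000042", "0001234567-23-000040"],
            "form": ["10-K", "10-Q", "8-K"],
            "reportDate": ["2023-12-31", "2023-09-30", "2023-10-15"],
            "primaryDocument": ["example-20231231.htm", "example-20230930.htm", "example-8k.htm"],
        }
    },
}


def filing_urls(submissions: dict, forms: tuple[str, ...] = ("10-K", "10-Q")) -> list[tuple[str, str]]:
    """Return (report date, HTML URL) pairs for the requested form types."""
    recent = submissions["filings"]["recent"]
    # EDGAR archive paths use the CIK without leading zeros.
    cik = str(int(submissions["cik"]))
    urls = []
    for accession, form, report_date, document in zip(
        recent["accessionNumber"], recent["form"], recent["reportDate"], recent["primaryDocument"]
    ):
        if form not in forms:
            continue  # exact match, so 8-Ks and amendments such as "10-K/A" are skipped
        folder = accession.replace("-", "")  # archive folders drop the dashes
        urls.append((report_date, f"https://www.sec.gov/Archives/edgar/data/{cik}/{folder}/{document}"))
    return urls


for report_date, url in filing_urls(submissions):
    print(report_date, url)
```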

Once we've collected the links for the company and passed each one to the XBRL converter, we get a response in JSON format.
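Calling the converter takes one HTTP request per filing. The sketch below assumes the endpoint `https://api.sec-api.io/xbrl-to-json` and the `htm-url`, `xbrl-url`, `accession-no`, and `token` query parameters as described in SEC-API's documentation. Check them against the current docs before relying on them. `converter_url` and `fetch_xbrl_json` are hypothetical helper names.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: endpoint and query-parameter names as described in SEC-API's
# XBRL-to-JSON documentation; verify them before use.
CONVERTER_ENDPOINT = "https://api.sec-api.io/xbrl-to-json"


def converter_url(
    token: str,
    htm_url: Optional[str] = None,
    xbrl_url: Optional[str] = None,
    accession_no: Optional[str] = None,
) -> str:
    """Build the request URL from exactly one filing identifier plus the API key."""
    identifiers = {"htm-url": htm_url, "xbrl-url": xbrl_url, "accession-no": accession_no}
    given = {name: value for name, value in identifiers.items() if value is not None}
    if len(given) != 1:
        raise ValueError("pass exactly one of htm_url, xbrl_url, or accession_no")
    return f"{CONVERTER_ENDPOINT}?{urlencode({**given, 'token': token})}"


def fetch_xbrl_json(url: str) -> dict:
    """Perform the request and parse the JSON body (needs network access and a valid key)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


# Example (not executed here): one request per filing link.
# data = fetch_xbrl_json(converter_url("YOUR_API_KEY", accession_no="0001234567-24-000010"))
```

SEC-API also offers a Python package, `sec-api`, which may be more convenient. The plain-HTTP version above simply avoids depending on it.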

Then we need to translate the JSON into a pandas DataFrame, which can be time-consuming depending on how the JSON is structured. We repeat this for every filing link and hope that all periods are captured and that line-item names are consistent across filings (it is unclear whether the financial statement names are standardized at this point).
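The sketch below flattens one statement from a converter response into a pandas DataFrame, then combines several filings into a concept-by-period table. It assumes the response maps statement names (here `StatementsOfIncome`) to concept names. Each concept is assumed to hold a list of facts with `period`, `value`, and an optional `segment`, following the layout of SEC-API's examples. The filings and numbers are made up, and the helper names are hypothetical. Facts with a `segment` are dimensional breakdowns (for example, revenue by product line), so they are skipped. Rows that come back with gaps are a quick signal that a period is missing or that a line item was tagged under a different name in some filings.

```python
import pandas as pd

# Hypothetical converter responses for two annual filings, in the assumed
# layout: statement -> concept -> list of facts. Numbers are made up.
filing_2022 = {
    "StatementsOfIncome": {
        "SalesRevenueNet": [
            {"period": {"startDate": "2022-01-01", "endDate": "2022-12-31"}, "value": "1000"}
        ],
        "NetIncomeLoss": [
            {"period": {"startDate": "2022-01-01", "endDate": "2022-12-31"}, "value": "120"}
        ],
    }
}
filing_2023 = {
    "StatementsOfIncome": {
        "Revenues": [
            {"period": {"startDate": "2023-01-01", "endDate": "2023-12-31"}, "value": "1200"},
            {"period": {"startDate": "2022-01-01", "endDate": "2022-12-31"}, "value": "1000"},
            # Dimensional breakdown (e.g. one product line); skipped below.
            {"period": {"startDate": "2023-01-01", "endDate": "2023-12-31"}, "value": "700",
             "segment": {"dimension": "srt:ProductOrServiceAxis", "value": "ex:WidgetsMember"}},
        ],
        "NetIncomeLoss": [
            {"period": {"startDate": "2023-01-01", "endDate": "2023-12-31"}, "value": "150"},
            {"period": {"startDate": "2022-01-01", "endDate": "2022-12-31"}, "value": "110"},  # restated
        ],
    }
}


def statement_to_frame(xbrl_json: dict, statement: str = "StatementsOfIncome") -> pd.DataFrame:
    """Flatten one statement into long format: one row per (concept, period)."""
    rows = []
    for concept, facts in xbrl_json.get(statement, {}).items():
        for fact in facts:
            if "segment" in fact:
                continue  # keep company-wide totals only
            period = fact["period"]
            start, end = period.get("startDate"), period.get("endDate", period.get("instant"))
            label = end if start is None else f"{start}/{end}"
            rows.append({"concept": concept, "period": label, "value": float(fact["value"])})
    return pd.DataFrame(rows, columns=["concept", "period", "value"])


def combine_filings(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Stack filings (oldest first) and pivot to a concept-by-period table."""
    long = pd.concat(frames, ignore_index=True)
    # A period reported in several filings keeps the value from the latest one.
    long = long.drop_duplicates(subset=["concept", "period"], keep="last")
    return long.pivot(index="concept", columns="period", values="value")


def incomplete_concepts(wide: pd.DataFrame) -> list[str]:
    """Concepts missing in at least one period: possible gaps or renamed tags."""
    return sorted(wide.index[wide.isna().any(axis=1)])


wide = combine_filings([statement_to_frame(filing_2022), statement_to_frame(filing_2023)])
print(wide)
print(incomplete_concepts(wide))  # ['SalesRevenueNet']
```

Here the combined table flags `SalesRevenueNet`, because the older filing tagged revenue under a different name than the newer one. Mapping such names onto one another still has to be done by hand or with a lookup table.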

Finally, we need to do this for every single company. Luckily, our XBRL converter function would already work; we would only need to collect each company's SEC links, or figure out how to find every company's filing links or accession numbers programmatically. But that’s a topic for another day.
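A first step toward that is mapping tickers to the Central Index Keys (CIKs) that EDGAR uses to identify companies. The sketch assumes data in the shape of the SEC's `company_tickers.json` file, where each entry has `cik_str`, `ticker`, and `title`. The entries below are made up, and `ticker_to_cik` and `submissions_url` are hypothetical helpers. The zero-padded CIK slots into the EDGAR submissions URL that lists a company's filings.

```python
# Hypothetical excerpt shaped like the SEC's company_tickers.json; the
# companies and CIKs below are made up.
company_tickers = {
    "0": {"cik_str": 1234567, "ticker": "EXMP", "title": "Example Corp"},
    "1": {"cik_str": 7654321, "ticker": "SMPL", "title": "Sample Holdings Inc"},
}


def ticker_to_cik(data: dict) -> dict[str, str]:
    """Map each (upper-cased) ticker to its 10-digit, zero-padded CIK."""
    return {row["ticker"].upper(): str(row["cik_str"]).zfill(10) for row in data.values()}


def submissions_url(cik10: str) -> str:
    """EDGAR endpoint listing a company's filings, keyed by the padded CIK."""
    return f"https://data.sec.gov/submissions/CIK{cik10}.json"


ciks = ticker_to_cik(company_tickers)
print(submissions_url(ciks["EXMP"]))  # https://data.sec.gov/submissions/CIK0001234567.json
```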

Conclusion

In a nutshell, pulling clean financial statement data from SEC filings is still a tough nut to crack for investment analysts, and even more so for retail investors. XBRL has made some headway in standardizing financial statement data, but it's still far from easy to use. Tools like Sentieo and Canalyst are helpful but require data engineering skills that the ordinary retail investor might not have. Other products, like SEC-API, are promising but haven't quite hit the mark.

matei zatreanu