MEFA (Make Earth Flat Again!)


This is about all the times I wished the Earth were flat. While working at System2, I can’t help but think about how much easier chunks of my job would be if the Earth were actually flat. Don’t worry, I’m a Glober (that is, what someone who believes the world is flat would call me). When working on geospatial data, code, and analysis, it’s impossible to be a Flerfer (Flat Earth believer). I’m sure the same applies to airplane pilots.

Life must have been simpler when everyone thought the Earth was flat. Unfortunately back then, data science didn’t exist as a profession. But assuming we could achieve a flat Earth, how would things be better for data science (or anyone)?

What Kind of Flat Earth?

First of all, there are a lot of ideas about how the Earth is flat in the vibrant and growing Flat Earth community. One of my favorite topics is how gravity might work. The idea I like best is that the Earth is constantly accelerating upward, however impractical that may seem to me.

To borrow some ideas from the Flerfers, my flat Earth optimized for data science would have the North Pole at the center of a circle and the South Pole stretched along the circumference. Gravity would come from the rest of the Earth “underneath” as a cone so my nicely terraformed planet wouldn’t immediately collapse (I am not a geologist). My flat Earth would still have a tilted axis (I like seasons) and a 24-hour rotation (changing time would be scope creep). While we’re on this journey of impractical solutions to practical problems, I’d still want my flat Earth to orbit the sun, but I’d trim the orbit to exactly 365 days a year to get rid of leap years (another thing I hate in code).

Generated by Bing Image Creator using prompt - “flat earth as a disc in space in vertical orientation with ice on the edges and the north pole as the center”

The image above is the best version I could get from DALL-E 3. The center, unfortunately, is not the North Pole, and I couldn’t find a prompt that would generate the mass of earth as a cone underneath. What’s the point? How is any of this better?

No More Time Zones

With a flat Earth, the sun would break the horizon at the same time for everyone. We wouldn’t need time zones. At System2 it can be tough taking calls from Singapore and Hong Kong (we’re based in NYC). Coordinating with team members in other time zones and figuring out core hours for company-wide meetings is always tricky. Client coverage can also be tough because requests often come at the end of the day. Imagine if everyone had the same hours! No more jet lag when traveling!

From a code and data perspective, time zones and daylight saving time (DST) have long been the bane of developers. Here’s a fantastic list of 30+ misconceptions people unlearn over their careers working with time zones: https://www.zainrizvi.io/blog/falsehoods-programmers-believe-about-time-zones/. Some highlights:

  • There are more time zones than there are countries in the world.

  • Some time zone offsets are not whole hours; they end in :30 or :45.

  • In the West Bank, the local time can depend on whether you’re an Israeli settler or a Palestinian.

As an example, here’s what the time zones look like in the US. Notice how often the boundaries don’t fall on state lines? Imagine trying to make a watch that always shows the right local time no matter where you are. You’d probably need GPS and a whole lot of code.

Unfortunately, there isn’t any fantastic library to deal with time zones. Time zone support is often part of the programming language or the underlying operating system. By default, Windows keeps the hardware clock in local time, while macOS and Linux keep it in UTC. Fundamental differences like these can affect file exports and cause headaches for data science work.

In practice for code and data, it’s best to store local time with its UTC offset, but sadly not everyone got the memo. Working with local time only is the worst: without information on location, you can’t even convert back to UTC, which is necessary for sorting or comparisons. Working with UTC only is better because you can compare timestamps, but it’s not great for answering human questions like, “Where is everyone at 3 AM?”
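Here is a minimal Python sketch of why keeping the offset pays off, using only the standard library. The timestamps and the helper name to_utc are made up for illustration and are not System2 code.

```python
from datetime import datetime, timezone

# Hypothetical event log: local wall-clock time plus UTC offset (ISO 8601).
events = [
    "2024-03-01T09:00:00-05:00",  # New York
    "2024-03-01T21:30:00+08:00",  # Singapore
    "2024-03-01T19:30:00+05:45",  # Kathmandu, a 45-minute offset
]


def to_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with an offset and convert it to UTC."""
    return datetime.fromisoformat(stamp).astimezone(timezone.utc)


# Sorting by the UTC instant gives the true chronological order...
chronological = sorted(events, key=to_utc)

# ...while the stored string still answers "what did the local clock say?"
local_hours = [datetime.fromisoformat(s).hour for s in chronological]
```

With local time alone, the New York event would sort first even though it happened last. With UTC alone, the local hours would have to be reconstructed from location data.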

Easier Distance Calculations

The Earth isn’t a perfect sphere and the shortest distance between two points on a sphere-like object is not a straight line on a map. Ever notice how a lot of maps show curved lines? The shortest route from NYC to Bangkok goes over the top of the Earth (purple) instead of the red line through Africa.

Google maps courtesy of https://academo.org/demos/geodesics/

In practice, most locations are given as longitude and latitude, which are angles measured in degrees. To find the distance between two points, you can’t just take the change in longitude and latitude and apply the Pythagorean theorem (a^2 + b^2 = c^2), because the distance a degree covers varies with where you are on the planet: a degree of longitude spans about 111 km at the equator and shrinks to nothing at the poles.

At System2, most of our code is in Python, and we use the geopy library, which provides geodesic distance calculations (treating the Earth as an ellipsoid). Yay for open source code and re-use!
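To give a feel for the math involved, here is a rough sketch of the simpler spherical version (the haversine formula) in plain Python. It is not what geopy does for geodesic distances, which accounts for the ellipsoid. The 6371 km radius, the function names, and the rounded city coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a perfect sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def km_per_degree_longitude(lat):
    """Approximate east-west length of one degree of longitude at a latitude."""
    return math.radians(1) * EARTH_RADIUS_KM * math.cos(math.radians(lat))


# Approximate coordinates, for illustration only.
nyc = (40.71, -74.01)
bangkok = (13.76, 100.50)

great_circle = haversine_km(*nyc, *bangkok)  # roughly 13,900 km
```

The helper km_per_degree_longitude shows why Pythagoras fails on raw degrees: the same one-degree step is about twice as long at the equator as it is at 60° latitude. With geopy installed, geopy.distance.geodesic(nyc, bangkok).km should give the ellipsoidal answer, which typically differs from the spherical one by well under one percent.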

If the surface of the Earth were flat, we wouldn’t even need angular coordinates. We’d use rectangular coordinates where (0, 0) would be the North Pole. The x-axis could split the former East and West hemispheres. The y-axis could split the US into East and West (let’s make maps US centric!). See my improvements below.

With a new coordinate system that everyone is familiar with at an elementary school level, calculating distances would be easy. Pythagoras and Euclid could be great again! School kids wouldn’t have to worry as much about learning polar vs rectangular coordinates! Distances between any two points could be figured out with a ruler and a scale! Calculating distances for large datasets would drastically speed up!
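For contrast with the real-world math, here is the entire flat-Earth distance calculation as a tongue-in-cheek sketch. The coordinates are invented and assume positions measured in km from the North Pole at (0, 0).

```python
import math


def flat_distance_km(p, q):
    """Straight-line distance between two (x, y) points measured in km."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


# Made-up flat-Earth coordinates in km from the North Pole at (0, 0).
north_pole = (0.0, 0.0)
some_city = (300.0, -400.0)

distance = flat_distance_km(north_pole, some_city)  # 500.0, a 3-4-5 triangle
```

There is no trigonometry and no Earth radius, and nothing depends on where on the map the two points sit.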

No More Map Projections

Because the world isn’t flat, creating a 2D representation requires a projection. Every projection has drawbacks. Most people are familiar with the Mercator projection (which prioritizes shapes and angles), and most online maps use a variant of it called Web Mercator. Another common choice is the equirectangular (aka plate carrée) projection, where lat/lon correspond directly to x/y.

Outside of data science, map projections mess with our sense of scale, shape, and distance. In a Mercator projection, Greenland looks about the size of Africa. In the plate carrée projection, Greenland looks about 1/3 the size of Africa. In reality, Greenland has about 1/14th the land area of Africa. The equator is usually drawn below the middle of the map, and the UK sits in the middle. Maybe if Australia had a more popular projection, we’d end up seeing the world like this:

In data science, we have to pick a projection to use. The projection system is used to translate latitude and longitude coordinates to x/y coordinates. Projections can do more than just make an ellipsoid flat. A projection like Albers USA is useful because it moves Hawaii and Alaska to the left of Texas and corrects the land area distortion.
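As a rough illustration of what “translating lat/lon to x/y” means, here are the two projections mentioned above in their simplest spherical form. This is a sketch under the assumption of a perfect sphere of radius 6371 km; production libraries work with ellipsoids and many more parameters, and the function names here are mine.

```python
import math

R = 6371.0  # sphere radius in km; real projections usually use an ellipsoid


def equirectangular(lat, lon):
    """Plate carree: x and y are just scaled longitude and latitude."""
    return R * math.radians(lon), R * math.radians(lat)


def mercator(lat, lon):
    """Spherical Mercator: y is stretched more and more toward the poles."""
    phi = math.radians(lat)
    return R * math.radians(lon), R * math.log(math.tan(math.pi / 4 + phi / 2))


def mercator_area_inflation(lat):
    """How many times larger a small area is drawn at this latitude
    compared with the same area at the equator."""
    return 1 / math.cos(math.radians(lat)) ** 2


# 72 degrees north is roughly the middle of Greenland (an approximation).
greenland_inflation = mercator_area_inflation(72)
```

At that latitude the Mercator inflation factor comes out a little above 10, which is in the right ballpark to explain why Greenland and Africa look similar in size on a Mercator map.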

Projection systems usually require quite a bit of complicated math. Because Python is slow and it’s not fun to re-invent complicated wheels, at System2 we rely heavily on the PROJ library that is part of OSGeo. Using the wrong projection system when plotting yields weird problems such as having everything off by a few feet or miles.

If we could make the Earth flat, our 2-dimensional maps could accurately convey both distances and shapes. Projection systems would not need to involve trigonometry. We’d probably keep some around like Albers USA to show all the US territories together but projection systems wouldn’t distort shapes, angles, or distances. Geospatial work would become more accessible to more people.

Improved Geocoding

When working with geospatial analysis, it’s incredibly helpful to work with standard units of space independent of administrative boundaries (states, counties, census blocks). Some geospatial schemes use square tiles, some use hexagons, and some use rectangles. As with projection systems, an ellipsoid Earth complicates tilings.

At System2 we’ve used the public domain Geohash for a lot of analyses. Geohash is nice because it’s hierarchical, indexes nicely in a database, is easy to understand, and aligns with lines of latitude and longitude. You divide the world into 32 tiles, each labeled by a letter or numeral. You then subdivide that tile into 32 smaller tiles and repeat until you have your desired precision. System2’s offices are in dr5ru. Here’s a picture of dr5ru from this interactive tool.
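The subdivision described above can be sketched in a few lines of Python. This follows the standard public-domain Geohash scheme (bits alternate between longitude and latitude, and every 5 bits become one base-32 character). The function name and the sample Midtown Manhattan coordinate are my own choices for illustration, not System2’s actual address.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=5):
    """Encode a lat/lon pair (degrees) as a Geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1  # point is in the eastern half
                lon_lo = mid
            else:
                bits <<= 1              # point is in the western half
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1  # point is in the northern half
                lat_lo = mid
            else:
                bits <<= 1              # point is in the southern half
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:              # 5 bits -> one of 32 characters
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# An illustrative point in Midtown Manhattan (assumed, for demonstration).
midtown = geohash_encode(40.7484, -73.9857)  # "dr5ru"
```

Because each extra character only refines the previous tile, a shorter hash is always a prefix of a longer one, which is what makes Geohash index so nicely in a database.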

As you zoom out or move closer to the poles, the area represented by each tile starts varying wildly. The distance from the center of one tile to the next can also vary considerably. This is all because the world isn’t flat.
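A back-of-the-envelope sketch makes the variation concrete. It assumes a spherical Earth (about 111.19 km per degree on a 6371 km sphere) and uses a function name of my own choosing, so treat the numbers as approximate.

```python
import math

KM_PER_DEGREE = 111.19  # approx. length of one degree on a 6371 km sphere


def geohash_cell_size_km(precision, lat):
    """Approximate (width, height) in km of a Geohash cell at a latitude."""
    total_bits = 5 * precision
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit
    lat_bits = total_bits // 2
    width_deg = 360.0 / 2 ** lon_bits
    height_deg = 180.0 / 2 ** lat_bits
    # East-west extent shrinks with the cosine of the latitude.
    width_km = width_deg * KM_PER_DEGREE * math.cos(math.radians(lat))
    height_km = height_deg * KM_PER_DEGREE
    return width_km, height_km


equator_cell = geohash_cell_size_km(5, 0)   # about 4.9 km x 4.9 km
nordic_cell = geohash_cell_size_km(5, 60)   # about half as wide, same height
```

A 5-character tile is roughly square at the equator but only about half as wide at 60° latitude, so counting “events per tile” is not comparing like with like unless you correct for tile area.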

At System2, we’ve also used H3 from Uber which solves some of those issues using hexagons and pentagons but it’s more complicated. The hierarchy is not as clean (you can’t nicely fit 7 hexagons inside a big hexagon).

There are a variety of geocoding / geospatial schemes out there to explore, each with its own strengths and weaknesses depending on what you’re doing, so there aren’t any “strong standards”. Google also created its own scheme called S2. Here’s an easy overview of popular schemes.

While we’re thinking of flattening the Earth’s surface to make data science easier, maybe it’s worth considering making it a shape that’s friendlier to tiling? A square would be nice. In the meantime geocoding tiles is an important part of the toolkit of working with geospatial data. It provides nice visualizations, but more importantly makes processing large amounts of geospatial data efficient by minimizing a lot of the math.

Shameless Plug

For now, terraforming our planet to make data science easier is impractical. In case it wasn’t sufficiently clear, I’m a glober (the world is a globe) and not a flerfer (flat Earther). This article is meant to highlight some of the challenges of working with data that a lot of people don’t think about as a result of the Earth being round. If you enjoy working with time zones and doing geospatial stuff, I’d encourage you to apply to System2 via our website. If you have money that you’d like to throw at a problem in the real world and want to hire an experienced data team, System2 has worked on complicated geospatial analysis in cases that dealt with fires and insurance claims data, how forecasting weather impacts companies, and telecom.


Disclaimer: All opinions expressed by System2 employees and their guests are solely their own and do not reflect the opinions of System2. This post is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of System2 may maintain positions in the securities discussed in this post.

Matei Zatreanu