Learning to Code

Welcome to another edition of “In the Minds of Our Analysts.”

At System2, we foster a culture of encouraging our team to express their thoughts, investigate, pen down, and share their perspectives on various topics. This series provides a space for our analysts to showcase their insights.

All opinions expressed by System2 employees and their guests are solely their own and do not reflect the opinions of System2. This post is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of System2 may maintain positions in the securities discussed in this post.

Today’s post was written by David Cheng.


What’s this about?

Are you thinking about learning to code? Are you just starting to code? Have you been writing code for a while? This post is a high-level guide to what your journey will be like. It is mostly about Python but applies to many other structurally similar languages.

Why Code?

At System2, writing code is an essential part of what we do. We’ll write code to:

  • Scrape - fetch data from websites.

  • Wrangle data - a “-” should be zero, all the times are wrong, dates are a weird format.

  • Calculate - figure out bias and re-weight, separate real growth from seasonality, forecast.

  • Tell a story - determine an appropriate visualization, add English to make it digestible, publish it in an interactive format.

  • Make it live - automate updates to the data, add alerts for significant events.

In any field that uses coding, more often than not you’ll have to learn more than one language. In data science, people often start with R and then learn Python (or vice versa). Then they’ll have to beef up their SQL skills. A good subset picks up Scala because PySpark is too limiting. Everyone does some scripting in bash to make their lives easier. As a result, a big part of writing code is learning new languages and new ways of doing things.

The Journey

When you first start with a new language, inevitably all the code you write will be terrible. You’re just trying to get stuff to work. In each stage, you learn some concepts and then:

  • You learn how to do a bunch of things terribly.

  • You learn when to do those things.

  • You learn to do it in a not terrible way.

Before jumping into the stages and concepts in each one, it’s helpful to explain how to not do things terribly.

Not Doing Things Terribly

Terrible code is:

  • Painful to understand

  • Scary to fix

  • A sin

I’d like people to write “good code” but that implies a false threshold that creates a binary outcome. I find it more helpful to focus on writing “better code”. At System2, the rules to follow in order of priority are:

  • Rule 1: It needs to work and be used. Otherwise it should be deleted or kept on a branch. Use git and don’t leave lots of red herrings around. Red herrings confuse and mislead.

  • Rule 2: Maintainable. Is it easy to read? Is it easy to test? Does it have “nice” errors? Does it do any more than it needs to? Be considerate of whoever may have to use or fix your code in the future.

  • Rule 3: Performant. Is it efficient? Does it use memory well? Are you using the database to do database things or are you trying to recreate one in Python?
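To make the “nice errors” question in Rule 2 a little more concrete, here is a minimal sketch. The function name `parse_amount` and the sample inputs are hypothetical; the only thing borrowed from the post is the wrangling case where a “-” should be zero.

```python
def parse_amount(raw: str) -> float:
    """Convert a raw text cell to a number (hypothetical helper)."""
    text = raw.strip().replace(",", "")
    if text == "-":
        # Assumption for this sketch: a lone dash means zero.
        return 0.0
    try:
        return float(text)
    except ValueError:
        # A "nice" error says what was wrong and what was expected,
        # so whoever hits it later does not have to read this function.
        raise ValueError(
            f"Cannot parse amount {raw!r}: expected a number like '1,234.5' "
            "or '-' for zero"
        ) from None
```

Calling `parse_amount("n/a")` fails with a message that names the offending value, which is usually enough for the next person to find the bad row.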

You can follow these rules and still make terrible code. The order is important! Making code faster (Rule 3) often makes code harder to follow (Rule 2). Adding flexibility to code to make it more reusable can bring down maintenance (Rule 2) but then we may never make use of the flexibility and we have a bunch of unused stuff (Rule 1). So how do you measure code quality?

With that said and hopefully present in the back of your mind, let’s move on to the journey of learning to code!

The Journey to Hell is Paved with Good Intentions

The following are my generalizations on the stages of learning how to code in Python. These are also applicable to other general purpose programming languages (not SQL, not bash). They’re generally sequential, and I believe having completed one stage definitely makes the subsequent stage easier to complete. In practice, I see people are often forced to tackle multiple stages at once.

At each stage, I try to highlight a common problem I encounter that’s specific to that stage. The problem shows up when folks are close to completing the stage. These are problems that sit between mastering the skills of the stage and knowing when to apply them.

Stage 1: Learning to Code

At first, your code is no different than most scripts. The only reason why you’re coding is to glue together various libraries you used Google to find. Eventually, you’re figuring out language features, like generators, and you’ve gotten to know a couple of libraries well. You’re at the end of this stage when your code has structure, that is, it’s organized in modules, functions have sensible arguments, and you’re minimizing the use of global state or module-level variables.

You know you don’t quite get it when… you have too many functions

  • Symptoms

    • Your code does 3 things but somehow you accomplish it with 6 functions.

    • Your code looks deceptively short and clean, but five minutes into your code review people have furrowed brows trying to mentally simulate a stack.

    • Your functions don’t do much and are usually used by one other function.

  • Causes

    • Belief that long code is always bad. You replace blocks of code with functions to make sure no function has more than n lines.

      • Why this is wrong: Your code is deceptively short and painful for humans to read. It isn’t really any shorter; you’ve just hidden it under a rug that’s under another rug.

      • How to fix: Don’t break it up into a function unless something else will really use it.

    • Belief that any repeated code, no matter how small, is bad, so you replace it all with functions.

      • Why this is wrong: It’s sometimes just not worth it, especially if unrelated pieces of code end up tied to each other because they share a minor function.

  • Solution

    • Don’t make a function unless it has more than one user.
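A small sketch of the “too many functions” symptom, with made-up names and data. The first version hides three trivial steps behind single-use helpers; the second does the same work in one place.

```python
# Over-split version: each helper has exactly one caller.
def _strip(row):
    return [cell.strip() for cell in row]


def _drop_empty(row):
    return [cell for cell in row if cell]


def _to_upper(row):
    return [cell.upper() for cell in row]


def _clean(row):
    return _to_upper(_drop_empty(_strip(row)))


def clean_rows_split(rows):
    return [_clean(row) for row in rows]


# Same behavior, readable top to bottom without jumping between functions.
def clean_rows(rows):
    cleaned = []
    for row in rows:
        cells = [cell.strip() for cell in row]            # trim whitespace
        cells = [cell for cell in cells if cell]          # drop empty cells
        cleaned.append([cell.upper() for cell in cells])  # normalize case
    return cleaned
```

Neither version is shorter in any meaningful sense. The difference is how many places a reviewer has to look to understand what happens to a row.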

Stage 2: Using Libraries

At first, you Google and pip install the first library you find. Then your first instinct is to Google then pip install anything remotely challenging you might have to do. At the end of this stage, after getting burned a few times, you start to check to see when the library was last updated, how many stars it has on GitHub, and realize a lot of pip packages are crap.

You know you don’t quite get it when… you start writing functions to do things the library you’re using already does

  • Symptoms

    • You’re using a popular library and building your own library of functions around it for things you keep needing to do.

    • You wrap a library with your own code that does nothing.

  • Causes

    • You didn’t read the documentation or look at the source. So you write stuff to do things you think the library should do.

      • Why this is wrong: Depending on the project’s maturity, you may have to look at the source code, which means more work today. But over time their version of your function will have fewer bugs and more features than yours. If you disagree, you should contribute your changes to the project.

  • Solution

    • Make time in your calendar (or someone else’s) to go over your latest pip package.

    • Before writing a function, check the documentation and the source code.
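As an illustration of rewriting what a library already provides, here is a sketch that uses the standard library’s `collections.Counter` as a stand-in for “the library you’re using.” The function names and the word list are hypothetical.

```python
from collections import Counter


# Hand-rolled version: works, but it is one more thing to test and maintain.
def top_words_by_hand(words, n):
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# What a look at the documentation would have turned up.
def top_words(words, n):
    return Counter(words).most_common(n)
```

For example, with `words = ["a", "b", "a", "c", "a", "b"]`, both functions return the two most frequent words and their counts. Only one of them is code you have to own.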

Stage 3: OOPsies

At first, you struggle to see the difference between modules and classes. Aren’t they just a collection of functions and variables? You then make everything into a class before realizing classes should be used as data structures with operators (like pandas) OR for building scaffolding to build specific use cases through inheritance (like scrapy) OR for things that maintain state (like sqlalchemy’s Engine). At the end of this stage, you’re comfortable solving problems using all three cases.

You know you don’t quite get it when… you have classes that shouldn’t be classes

  • Symptoms

    • Classes that are never extended.

    • You find yourself instantiating a class to just call a method.

    • Your classes are like puzzles. You have to call the right methods in the right order for it to work.

  • Causes

    • Belief that everything is better in a class, especially functions. You create classes that hold functions thinking that if anyone wants to add a function they can just inherit your class.

      • Why this is wrong: If your functions aren’t using self (Python), putting them in a class isn’t giving you much. Classes are meant to hold state or structures.

    • Belief that a deep hierarchy of classes will help your code make more sense. You group similar classes beneath a common parent (car and bicycle inherit vehicle).

      • Why this is wrong: Modules can also organize “related” things. Unless there’s a serious case of re-use, you usually don’t gain much by having a parent with a “shared” function that both children will inherit. It also makes your code harder to read (oops, didn’t realize the parent had a function) and fragile.

    • Belief that classes are inherently more extensible than modules. If I put my functions in a class, people can add to it by inheriting my class and adding their own method.

      • Why this is wrong: People can import all functions of your module into another module too.

  • Solution

    • Rationalize your class(es) with the following questions. If all the answers are no, you should probably kill it:

      • Is it used for a data structure? What are its benefits over dictionaries and lists?

      • Is your class used as a framework? Is it easier to use than doing something similar in a function library (mix-ins vs inheritance)?

      • Do you have state to keep?
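The questions above can be applied to a small, hypothetical example. `DateUtils` fails all three (no data structure, no framework, no state), while `RunningMean` passes the last one because it has state to keep. All names here are invented for illustration.

```python
# A class used only as a bag of functions: `self` is never used.
class DateUtils:
    def quarter(self, month):
        return (month - 1) // 3 + 1


# The same logic as a plain module-level function. No instance needed.
def quarter(month):
    """Return the calendar quarter (1-4) for a month number (1-12)."""
    return (month - 1) // 3 + 1


# A class that earns its keep: it remembers values between calls.
class RunningMean:
    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        """Record a value and return the mean of everything seen so far."""
        self.count += 1
        self.total += value
        return self.total / self.count
```

With the class version you end up writing `DateUtils().quarter(5)`, instantiating an object just to call a method, which is one of the symptoms listed earlier.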

Stage 4: I See Design Patterns Everywhere

At first, you Google up words like “Singleton.” You read a post about design patterns. Maybe you venture to read the Gang of Four book or a post about it. You eventually start using patterns to describe code and realize object-oriented programming was just sort of the beginning. At the end of this stage, you’ve implemented and solved problems using the 10 most commonly used design patterns.

You know you don’t quite get it when… your code is so full of design patterns that most of it doesn’t actually do anything

  • Symptoms

    • Your conversations about code are densely populated with design pattern vocabulary, similar to Crossfit enthusiasts talking about their workouts.

    • You have more classes than you need.

    • There’s lots of code or scaffolding to support a future that never comes.

  • Causes

    • Belief that by using enough design patterns, all future changes can be isolated to a thin slice of code instead of rippling through.

    • Belief that by designing in flexibility, the code will live forever and be easier to maintain.

    • Belief that more design patterns are always better.

  • Solution

    • This should be a common theme at this point (and like your parents probably told you): “Everything in moderation.” Use design patterns sparingly when they solve a problem at hand.
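A hedged sketch of what “scaffolding to support a future that never comes” can look like. The growth calculation and every class name below are hypothetical; the point is only the ratio of structure to actual work.

```python
from abc import ABC, abstractmethod


# Pattern-heavy version: a Strategy hierarchy plus a factory
# wrapped around two one-line formulas.
class GrowthStrategy(ABC):
    @abstractmethod
    def compute(self, current, previous):
        ...


class PercentGrowth(GrowthStrategy):
    def compute(self, current, previous):
        return (current - previous) / previous


class AbsoluteGrowth(GrowthStrategy):
    def compute(self, current, previous):
        return current - previous


class GrowthStrategyFactory:
    _registry = {"percent": PercentGrowth, "absolute": AbsoluteGrowth}

    @classmethod
    def create(cls, kind):
        return cls._registry[kind]()


# Plain version: the same two formulas with no scaffolding.
def growth(current, previous, kind="percent"):
    if kind == "percent":
        return (current - previous) / previous
    if kind == "absolute":
        return current - previous
    raise ValueError(f"Unknown growth kind: {kind!r}")
```

If a third and fourth kind of growth really do show up, along with callers who need to plug in their own, the pattern may start to pay for itself. Until then the plain function is easier to read and to delete.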

Stage 5: Everything Services

At first, you’re getting familiar with REST or GraphQL and building endpoints. You then delve into the messy world of supporting authentication and authorization for the endpoints. Because micro-services are still a thing, you get a lot deeper into containerization. At the end of this stage, you can make anything into a deployable endpoint.

You know you don’t quite get it when… most of your endpoints are used by your other endpoints

  • Symptoms

    • Every use case requires so many endpoints, you start building endpoints that consolidate other endpoints.

    • You suspect your cloud bill is a lot bigger than it should be.

    • People on your team steal credentials to reach “around” your endpoints and hit your database directly.

  • Causes

    • Belief that endpoints are good for everything.

      • Why this is wrong: Endpoints are slow. Like everything in every earlier stage, do you really need it? Will it really be widely used?

    • Belief that endpoints are no different from providing a function and hence cost-free.

      • Why this is wrong: Code auto-complete rarely works on endpoints. Someone is eventually going to provide a Python library to make calling endpoints easier so now you’ve doubled the code base and halved the speed (at least).

    • Belief that distributed micro-services will be more robust.

      • Why this is wrong: When a critical service fails, everything will still fail, but the stack traces will be harder to follow.

  • Solution

    • Endpoint calls are expensive. Don’t do them unless there’s a use case for it.

    • Make sure you have the team size and the actual need to break your system up into micro-services and only do so for things that will truly be shared.
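To illustrate the “doubled the code base” point, here is a rough sketch comparing a plain function with the client wrapper someone eventually writes to call it as a service. Everything here is hypothetical: the function, the URL, and the request and response shapes are assumptions made up for this example, and the wrapper is only defined, never called.

```python
import json
from urllib import request


# The actual logic: a plain function, importable and auto-completable.
def seasonal_index(values):
    """Return each value divided by the mean of all values."""
    mean = sum(values) / len(values)
    return [v / mean for v in values]


# Hypothetical address of a service exposing the same logic.
SERVICE_URL = "http://localhost:8000/seasonal-index"


# The convenience wrapper that tends to appear once the logic lives
# behind an endpoint: more code, a network hop, serialization both ways.
def seasonal_index_via_endpoint(values, url=SERVICE_URL, timeout=5.0):
    payload = json.dumps({"values": values}).encode("utf-8")
    req = request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        # Assumed response shape: {"index": [...]}
        return json.loads(resp.read())["index"]
```

The wrapper does no work of its own, yet it needs its own error handling, timeouts, and tests. That overhead is worth paying when the logic is truly shared across teams or languages, and hard to justify when the only caller is another Python module in the same repository.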

Get Ready to Start Over

Will there ever be one coding language good for everything? Will there ever be a multi-function tool to replace all tools? Probably not. So you’ll have to learn another language at some point.

Will the same stages exist? The first four stages are pretty applicable to other general-purpose languages but with fairly nuanced differences. Python classes are not like Java classes. Functional programming languages don’t really go down the object-oriented route and have a different language of design patterns.

Wrapping it up

Every language has different levels of structures:

  • Stage 1 was about mastering what is given in the language (loops, generators, modules, functions).

  • Stage 2 was about leveraging other libraries.

  • Stage 3 was about object-oriented programming, which is a big part of a lot of programming languages. Unfortunately, Python is missing important chunks (IMHO like function overloading and interfaces).

  • Stage 4 was about what you do with object-oriented programming.

  • Stage 5 was about building parts of systems by building services.

Going through all those stages while trying to make your code good (Rules 1-3) is challenging. Have fun!

matei zatreanu