Monorepo Madness

Our Journey from Monolith to Multirepo to Monorepo

September 12, 2019

Over the past few years at Aloompa, we have experimented with a number of different architectures, code management strategies and deployment systems. When it comes to building large flexible systems, there is never a single silver bullet, but rather a series of trade-offs between warring pros and cons.

Today, I wanted to explore three different approaches we have taken for managing our repositories. I hope to illuminate some of the pros and cons of each.

The Monolith

When I first joined Aloompa about five years ago, we were maintaining a handful of monolithic codebases. If you aren’t familiar with the term, a monolith is a single codebase that contains all of your code, regardless of concern.

Here are a few pros and cons we experienced while using this approach:

Easy to maintain

This architecture is really simple to maintain because everything is in one place. You don’t have to worry about managing sub-packages or versions because everything is versioned together.

Very Interconnected

Often, an entire monolith will be running on a single server or cluster of servers. This makes deployment and server management simpler, but it also means that when something goes wrong, everything breaks. We saw this once when a package we were only using for a small component got completely removed from the npm registry. This resulted in all of our servers deploying in an invalid state. We had services that technically had nothing to do with each other all breaking at once simply because they all deployed together.

Huge codebase

As a codebase grows and matures, it gets harder to maintain and manage because of the cognitive overhead it takes to reason about how all of the pieces are interconnected. Beyond the developer experience, deployments and CI tend to take longer as there is more and more code to deploy and test.

Multirepo

After experiencing a huge amount of pain from the interconnectedness of our monoliths, we migrated most of our services to a multirepo architecture. Multirepos are about as much of a departure as you could possibly make from a monolith. With multirepo architecture, spinning up new repositories became second nature. Every new React app, API or SDK was an opportunity to start fresh with a whole new repo. Using private npm for our package management, we were able to merge all of the small pieces of our app together to create the whole organism.

It was a beautiful type of architectural anarchy, fraught with its own list of pros and cons:

Easy to innovate

When new features often meant entirely new repositories, we could choose to build each new feature with the latest and greatest toolset. We were never locked into any one language or framework.

Difficult to upgrade everything

With so many disparate services, maintenance became a nightmare. While it wasn’t necessarily difficult to upgrade the React version of a single package, finding the time to upgrade all of them was inconceivable. It forced us to reserve the highest level of maintenance for the packages where we most often fixed bugs and handled feature requests.

Being picky about what to upgrade could be either a pro or a con depending on your perspective.

Difficult to manage code reviews and pull requests

During the height of our multirepo architecture, it was not at all uncommon for a pull request to have a comment like “make sure you merge in the change in the shared components library before you merge this in.” That, of course, was a clear indication that the process was broken.

When it is possible to deploy a set of repositories in the wrong order, resulting in an unexpected state, there is definitely a problem.
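To make the ordering problem concrete, here is a minimal sketch of the bookkeeping those pull request comments were doing by hand: given which repositories depend on which, work out an order in which every dependency lands before the things that use it. The repository names and the dependency map are hypothetical, not our real repos.

```typescript
// Hypothetical repository names; each entry lists the repos it depends on.
type DependencyMap = Record<string, string[]>;

// Returns an order in which every repo comes after its dependencies
// (a depth-first topological sort). Throws if the dependencies form a cycle.
function safeMergeOrder(deps: DependencyMap): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (name: string): void => {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") {
      throw new Error(`Dependency cycle involving ${name}`);
    }
    state.set(name, "visiting");
    // Repos that are only mentioned as dependencies have no entry of their own.
    for (const dep of deps[name] ?? []) visit(dep);
    state.set(name, "done");
    order.push(name);
  };

  for (const name of Object.keys(deps)) visit(name);
  return order;
}

const exampleDeps: DependencyMap = {
  "web-app": ["shared-components", "api-sdk"],
  "api-sdk": [],
  "shared-components": [],
};
const mergeOrder = safeMergeOrder(exampleDeps);
```

With separate repositories, nothing enforces this order. It lives in people's heads and in review comments.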

Difficult to manage the development environments

When we had multiple packages working together to create a single new feature, we would use npm link to link the packages together. But that sometimes resulted in having multiple repositories running in our local development environment in wildly different branch states.

Furthermore, there was no true guarantee that the local dev environment was the same as what would be deployed.

The Monorepo

Early this year, we migrated most of our primary repositories to a single monorepo.

In a way, a monorepo takes the best ideas from the rigid oneness of the monolith and the fresh-faced unrule of the multirepo.

The essence of a monorepo is one single repository containing a series of very componentized packages.

As always, there are good and bad things about taking a monorepo approach:

Better Deployments and Continuous Integration

We quickly found that managing a monorepo introduced its own challenges. To help ease the pain, we chose to use a tool called Lerna, named for the multi-headed Hydra defeated by Hercules in Greek mythology. It turned out to be an excellent utility for managing the packages within our single repository.

It provided basic tooling for running scripts only in the packages that had changed, which significantly reduced our time on the CI server.
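As a rough illustration of the idea of running scripts only where something changed, the sketch below maps a list of changed file paths to the packages that own them. It is not Lerna's implementation. It assumes a packages/<name>/... directory layout, and the file paths are made up.

```typescript
// Maps changed file paths to the packages that own them, assuming a
// hypothetical packages/<name>/... layout. Files outside that directory
// (such as a top-level README) do not belong to any package.
function changedPackages(changedFiles: string[], packagesDir = "packages"): string[] {
  const names = new Set<string>();
  for (const file of changedFiles) {
    const parts = file.split("/");
    // Need at least packages/<name>/<file> to attribute a change to a package.
    if (parts.length > 2 && parts[0] === packagesDir) {
      names.add(parts[1]);
    }
  }
  return Array.from(names).sort();
}

const touched = changedPackages([
  "packages/api/src/index.ts",
  "packages/api/package.json",
  "packages/web/app.tsx",
  "README.md",
]);
```

A real tool also has to account for packages that depend on the changed ones, which this sketch leaves out.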

Great Local Development Experience

Using lerna bootstrap to install our dependencies at the top-level automatically linked all of our required sibling dependencies in the monorepo, which let us have our local dev environment in the correct state at all times. No more worrying about getting all of the branches in sync for local development.
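The sketch below shows the basic question that sibling linking has to answer: which of a package's dependencies live in the same repository and can be linked locally instead of fetched from the registry? It is a simplified illustration with hypothetical package names and types, not how Lerna is implemented.

```typescript
// A pared-down, hypothetical view of a package.json inside the monorepo.
interface WorkspacePackage {
  name: string;
  dependencies?: Record<string, string>;
}

// For each package, lists the dependencies that are siblings in the same
// repository and could therefore be linked locally.
function siblingLinks(packages: WorkspacePackage[]): Record<string, string[]> {
  const localNames = new Set(packages.map((pkg) => pkg.name));
  const links: Record<string, string[]> = {};
  for (const pkg of packages) {
    links[pkg.name] = Object.keys(pkg.dependencies ?? {}).filter((dep) =>
      localNames.has(dep)
    );
  }
  return links;
}

const links = siblingLinks([
  { name: "web-app", dependencies: { react: "^16.9.0", "shared-components": "^1.0.0" } },
  { name: "shared-components", dependencies: { react: "^16.9.0" } },
]);
```

Here web-app would be linked to the local shared-components, while react still comes from the registry.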

Difficult to Get the CI Just Right

We had a vision of being able to push a change to a single package and only have our tests and deployments run for that single package instead of the entire unchanged repository. This ended up being more difficult to get perfect than we initially thought.

Currently, we have three deployment stages: develop, staging and production. We wanted to run our tests against every change on every branch we pushed, but only run deployments from our develop, master and production branches. To achieve this, we made heavy use of Lerna’s --since parameter, which limits a command to the packages that have changed since a given git ref. We ended up appending something like this for our four types of branches:

feature/branch: --since origin/develop
develop: --since origin/master
master: --since origin/production
production: --since `echo $(git describe --tags) | sed 's/production.*/production/'`

None of it was particularly hard to figure out, aside from how we could use git tags to determine which packages had already been deployed to production.
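The branch rules above can be sketched as a small function that picks the ref to compare against. This is an illustration rather than our actual CI script. The function name is made up, and the tag format in the example is an assumption about what git describe --tags might print.

```typescript
// Picks the ref to pass to --since for a given branch, mirroring the list above.
// latestTag stands in for the output of `git describe --tags` and is only
// used on the production branch.
function sinceRef(branch: string, latestTag: string): string {
  switch (branch) {
    case "develop":
      return "origin/master";
    case "master":
      return "origin/production";
    case "production":
      // Same substitution as the sed expression: cut the description off
      // right after the first "production".
      return latestTag.replace(/production.*/, "production");
    default:
      // Any other branch is treated as a feature branch.
      return "origin/develop";
  }
}

// Hypothetical tag description: a tag ending in "production", three commits back.
const productionRef = sinceRef("production", "v1.4.0-production-3-g1a2b3c4");
```

The production case is the odd one out because there is no branch further downstream to compare against, so the last production tag takes that role.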

We made a conscious decision that every package would have a package.json with a build, test and release script. For every package, we would run the three scripts in that order. Outside of the scripts being run in order, every package was free to do whatever it needed because there was enough structure around the CI process to do literally anything.
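A minimal sketch of that contract might look like the following: check that a package defines all three scripts, then produce the commands to run in order. The type and function names are hypothetical, and the script contents in the example are placeholders.

```typescript
// The three scripts every package agrees to provide, in the order they run.
const REQUIRED_SCRIPTS = ["build", "test", "release"] as const;

// A pared-down, hypothetical view of a package.json.
interface PackageManifest {
  name: string;
  scripts?: Record<string, string>;
}

// Lists the required scripts a package has not defined.
function missingScripts(manifest: PackageManifest): string[] {
  const scripts = manifest.scripts ?? {};
  return REQUIRED_SCRIPTS.filter((name) => !(name in scripts));
}

// Returns the commands to run for a package, in order, or throws if the
// package does not honor the contract.
function pipelineCommands(manifest: PackageManifest): string[] {
  const missing = missingScripts(manifest);
  if (missing.length > 0) {
    throw new Error(`${manifest.name} is missing scripts: ${missing.join(", ")}`);
  }
  return REQUIRED_SCRIPTS.map((name) => `npm run ${name}`);
}

const commands = pipelineCommands({
  name: "example-package",
  scripts: { build: "tsc", test: "jest", release: "echo releasing" },
});
```

What each script does internally is left entirely to the package. The CI only cares that the three names exist.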

Takeaways

The biggest takeaway is that there is no perfect architecture. Our team has definitely loved the organization of our monorepo, but there was also a period of growing pains as we got it set up. Not all of our code is in a monorepo. Some of our features are still in the wild, living their best life as multirepos. We even still have monolithic architecture running some of our most important systems.

At the end of the day, that’s okay.

For a company that is growing and innovating as fast as Aloompa, we don’t have the time to take months out of our roadmap to migrate everything to a single architecture. What it all comes down to is what adds the most value.

Tyson Cadenhead

Tyson is the Chief Technology Officer at Aloompa.

He has a passion for Functional Programming, GraphQL, the Serverless architecture and React.

When he's not writing code or working with his team, Tyson enjoys playing guitar, growing vegetables and spending time with his family.

Tyson primarily works remotely to help support the needs of his oldest son who has level 3 autism.