Can’t Git No Satisfaction: Why We Need a New-Gen Source Control
Remember the good old days of enterprise software? When everything had to be installed on-premises? To install an application, you’d have to set up a big, vertically scalable server. You would then have to execute a single process written in C/C++, Java or .NET. Well, as you know, those days are long gone.
Everything has changed with the transition to the cloud and SaaS. Today, instead of comprising a single vertically scalable process, most applications comprise multiple horizontally scalable processes. This model was first pioneered by Google’s borg and by Netflix on EC2. Nowadays, though, you no longer have to be a large enterprise to access microservice infrastructures. Kubernetes and serverless have made microservices viable and accessible to even small startups and lone coders.
Let’s Git down to business
So where does Git fit into the picture? Git is an excellent match for single-process applications, but it starts to fail when it comes to multi-process applications. This is precisely what gave birth to the endless “mono-repo vs. multi-repo” flame-wars.
Each side of this debate classifies the other as zealous extremists (as only developers can!), but both of them miss the crux of the matter: Git and its accompanying ecosystem are not yet fit for the task of developing modern cloud-native applications.
Shots fired: multi-repos suck
Before we dive in, let’s answer this: what’s great about Git? It’s the almighty atomic commit, the groundbreaking (at the time) branching capabilities, and the ever-useful blame. Well, these beloved features all but disappear in a multi-repo setup. Working in multiple repositories comes with significant drawbacks, which is why it’s not at all surprising that some of the biggest names in the tech world, including Google and Facebook, have gone down the mono-repo path at a huge investment of time and resources.
Dependency management in a multi-repo setup is a nightmare. Instead of having everything in a single repository, you end up with repositories pointing to each other using two git features (git submodules and git subtree) and language-specific dependency management such as npm or Maven. The very existence of the many different methods to manage multi-repos is in itself proof that none of these tools are enough on their own. Git’s “source-of-truth” is no longer a single folder on your computer but a mishmash of source providers and various artifactories.
In developers’ everyday work, repository separation becomes an artificial barrier that impacts technological decisions. This creates a Conway’s Law effect, making early design decisions about component boundaries very hard to change. It also makes large scale refactorings a much trickier business.
However, the biggest failure of the multi-repo is cultural. Instead of having all your source code readily available to all developers, they have to jump hurdles to figure out which repo they need and then clone it. These seemingly-small obstacles often become high fences: developers stop reading and updating code in components and repositories that aren’t directly in their responsibility.
With all these engineering, operations and cultural barriers, why doesn’t everyone go the mono-repo route?
Take no prisoners: mono-repos suck too
Once you’ve packed everything into a single repository, figuring out the connections within the repository becomes a challenge. For humans, that can chip away at the original architecture, breaking away useful abstractions and jumbling everything together.
For machines, this lack of separation within the repo is even worse. When you push a code change to a repo, automated processes kick in. CI systems build and test the code, and then CD systems deploy it. Sometimes it’s to a test or staging environment, and sometimes directly to production.
There are certain components you will need to build and deploy hundreds of times a day. At the same time, there are other more delicate and mission-critical components. These require human supervision and extra precaution. The problem with mono-repository is that it mixes all of these components into one. More surprising is the fact that today’s vast Git CI ecosystem, with its impressive offerings in both the hosted and the SaaS space, doesn’t even try to tackle the issue. In fact, not only will Git CI tools rebuild and redeploy your entire repo, they are often built explicitly for multi-repo projects.
Another issue is large repository sizes. Git doesn’t handle large repos gracefully. You can easily end up with repo sizes that don’t fit in your hard-drive, or clone time that ends up in the hours. For big projects, this requires careful management and pruning of commit history. It is also essential to avoid committing dependencies, auto-generated files and other large files which may be necessary for specific scenarios.
Is there still hope for multi-repos?
There are new tools that seek to bring some of the benefits of mono-repos to multi-repos. These tools try to set up a configuration that would unite multiple repos under a single umbrella/abstraction layer, thus making managing multiple-repositories easier — for example, TwoSigma’s Git-meta, mateodelnorte’s meta, gitslave , and a bunch of others.
These tools bring back a bit of sanity into the complexities of managing multi-repos, reducing some of the toil and error-prone manual operations. But none of them truly give back the control and power of a single Git repo.
You can’t have your cake and Git it too
The downsides of multi-repos are real. You can’t deny the value of a (truly) single source of truth, (truly) atomic commits, and a (truly) single place to develop and collaborate. On the other hand, none of the downsides of mono-repos are inherent. All of them are related to the current implementation of the Git source control tool itself and its accompanying eco-system, especially CI/CD tools.
It’s time for a new generation of source control that wasn’t purely designed for open-source projects, C and the Linux kernel. A source control designed for delivering modern applications in a polyglot cloud-native world. One that embraces code dependencies and helps the engineering team define and manage them, rather than scaring them away. A source control that treats CI, CD, and releases as first-class citizens, rather than relying on the very useful add-ons provided by GitHub and its community.