Debugging in Production: How To Stop Fearing The Inevitable
You’ve been staring at your screen for hours, trying to work out why a certain bug occurs only for end-users in your production environment. You’ve tried a multitude of approaches: adding log lines in all the likely places, logging potentially relevant variables to capture the state in which the bug occurs, and the list goes on. Sound familiar?
So what happens when you’re stuck? Do you reach for the last resort, debugging in production, and hope for the best? This is the type of debugging most developers hope never to need, yet often have to do. The ability to debug in production is valuable because it removes the need to reproduce an issue: the developer can find the problem and fix it while the system is still running.
“Each new user of a new system uncovers a new class of bugs.” – Brian W. Kernighan
Debugging modern infrastructures and their challenges
Many common debugging methods are complex, slow, and not cost-effective. Much of the time spent debugging goes into understanding what exactly is happening and working out how to get the data needed to fix it.
The rise of modern infrastructures, such as serverless and microservices, took away much of the visibility developers used to have into their software. Visibility became limited at scale, and teams became slower to understand and debug what was happening in production. Comparing these modern infrastructures with older monoliths makes some key differences apparent. The main one is that monoliths are simpler structures.
Debugging approaches: Modern infrastructures vs. monoliths
There is a clear difference between debugging monolithic infrastructures and debugging microservices. Monolithic applications are easier to debug because a monolith is a single system with a single codebase. Most significantly, because it is a single entity, its logging is kept in a single location instead of being split across services. Microservices, in comparison, are multiple independent services that are maintained separately, and their sheer number makes debugging more complicated.
Serverless architectures are similar to microservices, yet are even more complex, which complicates debugging even further. Serverless applications are broken down into smaller pieces than microservices and are also fully managed by the cloud provider. Tracing is a crucial component of debugging serverless applications, as it helps you see the whole picture. Yet as serverless architectures become more complex, tracing becomes harder to do, often resulting in wasted time and resources.
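To make the idea of tracing concrete, here is a minimal sketch in Python of one common ingredient: attaching the same trace ID to every log line produced while handling one request, so lines from different components can be stitched back together. The names handle_order, charge_card, and the "orders" and "payments" loggers are hypothetical, and both "services" run in a single process here. In a real distributed system the ID would have to travel with each request between services, which this sketch does not show.

```python
import contextvars
import logging
import uuid

# Holds the trace ID of the request currently being handled.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the current trace ID onto every log record."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True


def build_logger(name, stream):
    """Create a logger whose output lines start with the trace ID."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(trace_id)s %(name)s %(message)s"))
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    return logger


def charge_card(logger, amount):
    # Hypothetical downstream component; it never sees the ID explicitly.
    logger.info("charging %s", amount)


def handle_order(stream, amount, trace_id=None):
    # Assumption: a new ID is minted when the caller does not supply one.
    token = trace_id_var.set(trace_id or uuid.uuid4().hex)
    try:
        build_logger("orders", stream).info("order received")
        charge_card(build_logger("payments", stream), amount)
    finally:
        trace_id_var.reset(token)
```

Calling handle_order(sys.stdout, 42) would print two lines that share one ID, one from each logger. Searching the logs for that ID then recovers the path of a single request.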
Debugging in production – the plot thickens
Production debugging is even more challenging than traditional debugging. So why do we even need to debug in production? Debugging in production lets you find the issue and fix it while the program is still running. It also eliminates the need to reproduce issues. Reproducing a problem is like looking for a needle in a haystack: sometimes it works, often it doesn’t. Removing that step matters because it can save precious time and resources.
Debugging in production does carry risks, and three stand out. The first is that production debugging can disrupt the current users of the running application. The second is that it can slow the application down or even crash it altogether, which would also greatly disrupt users. Last but not least, debugging might require restarting the application, which brings the first two dangers with it, since the app would need to be stopped and would be unavailable to current users.
Debugging techniques: how to debug in production
So how do you dip your toes in the deep waters of production debugging? There are a few ways you can go about it, some more code-friendly than others. One way is to simply ignore the issue until you can’t possibly ignore it any longer. Another way is to test absolutely everything you possibly can, in the hopes that you’ll leave no room for doubt when something goes wrong. A third path you could take would be to write as many logs as possible, as well as monitor everything as much as possible.
Of course, you could also, as Mark Zuckerberg so aptly put it, “move fast and break things”. That is, you ship software updates faster to take care of whatever went wrong, without actually investing the resources necessary to debug well. Last, and definitely not least, is the newest approach, which focuses on decoupling the data layer from the application and lends visibility into production on demand.
The most common method out of these options is logging. As you debug, you go through the log files to find the data you need to understand where the bug is and what happened at the time of the error. This is time-consuming, as you have to access and sift through many log files. On top of that, it’s often necessary to write additional logs, and then redeploy and restart your application, just to get additional data.
The optimal debugging tool would be on par with the fifth and newest method. It would give you the full stack trace and all the variable data you’d need to debug your app, while eliminating the need to restart the application in order to debug it. It would also not disrupt the app’s users while you debug an issue, nor would it slow down the app’s performance to gather the required debug data.
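As a small illustration of the logging approach and its limits, the Python sketch below logs the variables of a hypothetical apply_discount function at DEBUG level and lets the verbosity be raised on the running process. The "checkout" logger and the set_log_level helper are assumptions made for this example. How set_log_level would be triggered in a live system (an admin endpoint, a signal, a config reload) is left open.

```python
import logging

logger = logging.getLogger("checkout")


def set_log_level(level_name):
    """Change verbosity on the live logger; no restart involved."""
    level = logging.getLevelName(level_name.upper())
    # getLevelName returns an int for known names and a string otherwise.
    if not isinstance(level, int):
        raise ValueError(f"unknown log level: {level_name!r}")
    logger.setLevel(level)
    return level


def apply_discount(price, percent):
    # This line is emitted only when the logger is set to DEBUG.
    logger.debug("apply_discount price=%r percent=%r", price, percent)
    return round(price * (1 - percent / 100), 2)
```

This only helps with log lines that already exist in the deployed code. If the variable you need was never logged, you are back to editing the code, redeploying, and restarting.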
Production debugging: looking to the future
The future of production debugging lies in being able to reach the necessary data, whenever and wherever it’s needed. To debug successfully in production, you need a method that won’t break anything, slow down any applications, or disturb the users. One such way is to set breakpoints that don’t stop the application, but rather let it continue to run while giving visibility into the running code and letting developers collect the data they need. No more breaking things, just more data, more simply.
Is your elusive production bug still getting you down? Hopefully, this article gave you a better perspective on production debugging and it’ll be less of a scary journey for you moving forward. If you need someone to hold your hand while you proceed on your debugging path, let us know and we’ll do our best, virtually.
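The idea of a breakpoint that collects data without pausing can be sketched in a few lines of Python. The toy below uses the standard sys.settrace hook to copy a function's local variables at the moment it returns, while the function itself runs to completion. The helpers install_snapshot and remove_snapshot and the example compute_total are hypothetical names, and this is not how any particular production tool works. Note also that sys.settrace adds overhead to every function call in the thread, which a real non-breaking breakpoint would have to avoid.

```python
import sys


def install_snapshot(func_name, snapshots):
    """Record local variables whenever `func_name` returns, without pausing it."""

    def local_tracer(frame, event, arg):
        if event == "return":
            # Copy the locals so later changes cannot alter the snapshot.
            snapshots.append({"locals": dict(frame.f_locals), "returned": arg})
        return local_tracer

    def global_tracer(frame, event, arg):
        # Only trace frames that belong to the function of interest.
        if event == "call" and frame.f_code.co_name == func_name:
            return local_tracer
        return None

    sys.settrace(global_tracer)


def remove_snapshot():
    """Stop collecting snapshots in the current thread."""
    sys.settrace(None)


def compute_total(prices, tax_rate):
    # Hypothetical application code; it is unaware of the snapshot.
    subtotal = sum(prices)
    tax = subtotal * tax_rate
    return subtotal + tax
```

After install_snapshot("compute_total", snapshots), every call to compute_total appends one entry to snapshots holding prices, tax_rate, subtotal, tax, and the returned value. The caller gets its result exactly as before.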