Setting Up a Live Debugging Dashboard to Catch a Thief of Time
They say that procrastination is the thief of time. In the world of software development, there are some additional “time thieves” that prevent our teams from developing new features or slow them down as they attempt to fix issues.
As software engineers or R&D managers, we take it for granted that our teams spend a lot of their time waiting for code to compile, tests to run, and deployments to finish. We know that many a coffee break was justified by an unavoidable waiting period created by an automated process, one that causes idle downtime and unwanted context switches in the engineer’s work process. We invest in building automated tools and DevOps practices to ensure these waits have a minimal impact on the velocity of our teams. But one thief still lurks in the darkness, stealing precious time without us even realizing it. This thief is called debugging.
Debugging in the best of times and the worst of times
We are used to thinking of debugging as something that happens only after feature development has finished. In the traditional world of waterfall, debugging mostly happens after the code has been thrown over the wall to the QA team. In the world of DevOps, however, we expect that debugging will mostly happen after the feature is deployed to production and we see customers interacting with it in ways we didn’t anticipate.
We know that we will then face the challenges of reproducing the issue locally, writing a unit test that covers it, connecting to a remote machine, struggling to get the right version of the code, and other such time-consuming tasks that will slow us down as we try to reach the root cause.
In reality, we know that a lot of debugging happens during feature development. As soon as a developer has written a couple of lines of code, the code already has at least one bug. The obvious bugs are cleared when testing the feature locally on the developer’s laptop. Some will only be discovered later in testing or staging environments. And no matter how hard we try, some will be encountered by customers in the production environment.
We usually give more weight to those production bugs, as we measure our MTTR and try to understand the impact of production issues on our business. We also tend to take for granted that debugging locally and debugging in dev/staging/testing environments will always happen. But as with other KPIs in the world of software development, without measurement, the weight we give to debugging in the production environment will often be biased, because we will not realize how many hours our teams invested in debugging locally or in other remote environments.
Oh, where does the time go?
In other development and troubleshooting domains, we make sure to have very strict and precise time tracking. We measure how long it takes to develop a feature, either by directly using designated tools or indirectly based on our agile planning and retrospective ceremonies. We measure how long it takes to fix an issue, and we track it rigorously in our support ticketing systems. Then, if we think our engineers have had enough coffee that day, we may even invest not only in improving our CI/CD systems, but also in measuring and reducing the time wasted on compiling, building, and running automated testing flows.
In some teams, the above time tracking is part of the release ceremony. Before every sprint, goals are set to reduce idle time and issue resolution time. At the end of every sprint, teams hope that by reducing wasted time, they are able to release more features and keep their customers satisfied. However, such measurement is rarely allocated to debugging efforts, which are a significant part of every feature development cycle. Often, this is because there are no specific tools for measuring debugging time or because of the misconception that “debugging just happens”. We can’t anticipate how many bugs we will have and we can’t anticipate how long it will take to fix them.
Saving time to spend more time
When a feature has been developed, developers believe that they can’t say how much of the development time was spent debugging. When a feature has been released, teams measure how much they invested in writing it, but not how much they keep investing in debugging it in production.
This is where the Rookout Live Debugging Heatmap comes in. This new feature was developed to help uncover the hidden time thief and empower developers to steal back the time it took from them. That allows them to spend more time building cool features and to resolve bugs much faster.
Rookout Live Debugging Heatmap shows how much time was spent debugging. Take it one step further, and it shows us a breakdown of how much time was spent debugging in a specific application, environment, or version.
It will show you, for example, that even though you thought the most important issues happen in production, 80% of remote debug sessions actually happen in your staging environment. Debugging in staging is just as hard and time-consuming as debugging in production, and the fact that we solve most bugs in staging means two things:
One, that we saved a bunch of bugs from being discovered by our customers.
And two, that we saved a lot of time and effort. This means we saved many programming hours that can instead be spent on developing more features, fixing more bugs, and drinking more coffee.
It will also show you that even though you thought the overall quality and user experience of your application were high, because the APM dashboards are green and very few Jira tickets are about resolving issues in the application, the reality is that every day your developers spend hours debugging issues that are neither tracked in Jira nor monitored by the APM. This knowledge will help you prioritize refactoring, test coverage, and logging efforts to improve the stability of your application. Lastly, it will help you pinpoint the stability and quality of specific releases, which will allow you to track the ongoing improvement in your team’s velocity and quality.
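To make the idea of a breakdown concrete, here is a minimal sketch of how debugging time could be grouped by application, environment, or version. It is an illustration only and not how Rookout computes the heatmap: the DebugSession record, its field names, and the sample numbers are all hypothetical, chosen so that staging accounts for 80% of the debugging time.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DebugSession:
    """One debugging session. Every field is a hypothetical illustration."""
    application: str
    environment: str
    version: str
    minutes: float


def minutes_by(sessions, key):
    """Sum debugging minutes per distinct value of the attribute named `key`."""
    totals = defaultdict(float)
    for session in sessions:
        totals[getattr(session, key)] += session.minutes
    return dict(totals)


def share_by(sessions, key):
    """Return each group's fraction of the total debugging time."""
    totals = minutes_by(sessions, key)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {group: minutes / grand_total for group, minutes in totals.items()}


# Made-up sample data, not real measurements.
SAMPLE_SESSIONS = [
    DebugSession("checkout", "staging", "1.4.0", 40),
    DebugSession("checkout", "staging", "1.4.1", 30),
    DebugSession("search", "staging", "2.0.0", 10),
    DebugSession("checkout", "production", "1.4.0", 20),
]

if __name__ == "__main__":
    for environment, share in share_by(SAMPLE_SESSIONS, "environment").items():
        print(f"{environment}: {share:.0%}")
```

Passing "application" or "version" instead of "environment" gives the same breakdown along a different dimension, which is the kind of slicing described above.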
Always leave time for coffee. And chocolate.
In addition to the above, at Rookout we have added features that let you know who in your team is debugging, how often your team has live debugging sessions, and how much time and effort Rookout has saved your team.
That last one is based on feedback we received from customers. These customers told us that 10 minutes of debugging with Rookout will often save them 1-2 hours of having to reproduce the issue locally, wait for log lines to be added, find the correct repo and more.
For some of our customers, the ratio is even higher. Recently a customer told us that in certain environments adding a log line will take up to 24 hours, as the application is deployed once a day. So, having Rookout around would reduce their MTTR significantly.
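As a rough worked example of that estimate, the sketch below multiplies a number of debugging sessions by the 1-2 hour range quoted above. The function name and the assumption that every session saves the same amount of time are ours, for illustration only; this is not the formula Rookout uses.

```python
def estimate_hours_saved(session_count, low_hours=1.0, high_hours=2.0):
    """Return a (low, high) range of hours saved for a number of sessions.

    Assumes, hypothetically, that each session saves between `low_hours`
    and `high_hours`, following the 1-2 hour range customers reported.
    """
    if session_count < 0:
        raise ValueError("session_count must not be negative")
    return (session_count * low_hours, session_count * high_hours)


if __name__ == "__main__":
    low, high = estimate_hours_saved(12)
    print(f"12 sessions: between {low:.0f} and {high:.0f} hours saved")
```

For the once-a-day deployment case, the same function can be called with a larger upper bound, for example high_hours=24.0.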
Other customers suggested we integrate into their lifecycle management or support systems, so we can track how much debugging time was invested in each customer incident. We may get to that in the near future, and once we do, we hope the heatmap will be even more helpful.
With the Rookout Live Debugging Heatmap coming to your Rookout dashboard, we expect it will reveal a lot of time saved for you. What will you do with that extra time? Will you develop more features to increase your customer satisfaction? Will you fix even more bugs, improving the quality and user experience of your service? Or will you go and have another cup of coffee? (Maybe with a biscuit along with it. Chocolate is preferable. Just make sure it’s not nougat.)