Distributed Tracing with Jaeger 101
We all know of, or maybe work for, organizations that are phasing out old monolithic systems in favor of distributed systems with microservice architectures. And for good reason! Microservice architectures allow system components to be scaled independently; deployments are decoupled and continuous; and small, agile dev teams can work quickly, efficiently, and in parallel.
But when it comes to debugging, the romance with microservice architecture fades mighty fast. As complex, distributed systems at scale, they are exceptionally hard to debug: There is no way to isolate a single instance, as you would do for a monolith, and reproduce the problem.
Root causes of failure can rarely, if ever, be identified by looking at individual services, since the sum of the parts, in this case, is definitely not equal to the whole. The performance of all the distinct services does not provide a full picture of application performance.
I recently discussed the topic of distributed tracing at a Meetup and presented a demo of Jaeger, a robust open-source tracing tool that adheres to the OpenTracing standard. Distributed tracing allows us to track requests as they pass through the multiple transactions and workflows of distributed systems. Each service records timing and other metadata for its part of the request; once reassembled, that data provides a valuable, complete picture of runtime application behavior.
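To make the reassembly step concrete, here is a minimal sketch in plain Python. It is not Jaeger's data model or API: the Span class, its field names, and the sample services are all assumptions made up for illustration. The idea it shows is that every record carries a shared trace ID and a pointer to its parent, so records that arrive out of order from different services can be stitched back into one call tree.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """One timed operation in a trace (hypothetical model, not Jaeger's)."""
    trace_id: str             # shared by every span of the same request
    span_id: str              # unique within the trace
    parent_id: Optional[str]  # None marks the root span
    service: str
    operation: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def reassemble(spans: List[Span]) -> Dict[Optional[str], List[Span]]:
    """Index spans by parent ID, ordered by start time."""
    children: Dict[Optional[str], List[Span]] = defaultdict(list)
    for span in sorted(spans, key=lambda s: s.start_ms):
        children[span.parent_id].append(span)
    return children


def render(children: Dict[Optional[str], List[Span]],
           parent_id: Optional[str] = None, depth: int = 0) -> List[str]:
    """Walk the tree from the root and produce an indented timeline."""
    lines: List[str] = []
    for span in children.get(parent_id, []):
        indent = "  " * depth
        lines.append(f"{indent}{span.service}:{span.operation} {span.duration_ms}ms")
        lines.extend(render(children, span.span_id, depth + 1))
    return lines


# Spans as they might arrive at a collector: out of order, from three services.
spans = [
    Span("t1", "c", "b", "inventory", "SELECT stock", 20, 50),
    Span("t1", "a", None, "frontend", "GET /checkout", 0, 120),
    Span("t1", "b", "a", "orders", "create_order", 10, 100),
]

timeline = render(reassemble(spans))
print("\n".join(timeline))
```

Printed as a timeline, the three records show the frontend request wrapping the order call, which in turn wraps the inventory query. That nesting is exactly what you cannot see by looking at each service's numbers on its own.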
Distributed tracing, however, has its drawbacks as well, in the form of source code instrumentation that is complex, fragile, and difficult to maintain. In addition, many current systems use application-level implementations with incompatible APIs, to which developers are reluctant to commit, particularly for multilingual systems that require a different tool for each platform.
To address these issues, the OpenTracing project advances the development of robust, vendor-neutral APIs and distributed tracing instrumentation for popular platforms. For the demo, which entailed transforming a monolith into several small microservices, I used Jaeger, a distributed tracing tool developed by Uber Technologies. Jaeger can be used to monitor distributed, microservice-based systems for context propagation, distributed transactions, root cause analysis, and more. And because it adheres to the OpenTracing standard, it allows you to move from Jaeger to Datadog (and other solutions) without rewriting your instrumentation code.
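Here is a rough sketch of what context propagation and a vendor-neutral API buy you, again in plain Python rather than the real OpenTracing or Jaeger client libraries. Everything in it is an assumption for illustration: the Tracer interface, the x-demo-trace header name, and the two toy services. The service code talks only to the abstract interface and passes the trace context along in request headers, so the backend behind the interface can be swapped without touching the services.

```python
import uuid
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

# Hypothetical header name, chosen for this sketch only.
TRACE_HEADER = "x-demo-trace"


class Tracer(ABC):
    """Vendor-neutral interface; the services depend only on this."""

    @abstractmethod
    def report(self, trace_id: str, span_id: str,
               parent_id: Optional[str], name: str) -> None:
        ...


class InMemoryTracer(Tracer):
    """Stand-in backend that keeps spans in a list."""

    def __init__(self) -> None:
        self.spans: List[Tuple[str, str, Optional[str], str]] = []

    def report(self, trace_id, span_id, parent_id, name):
        self.spans.append((trace_id, span_id, parent_id, name))


class LoggingTracer(Tracer):
    """Alternative stand-in backend that just prints each span."""

    def report(self, trace_id, span_id, parent_id, name):
        print(f"trace={trace_id} span={span_id} parent={parent_id} {name}")


def start_span(tracer: Tracer, name: str, headers: Dict[str, str]) -> Dict[str, str]:
    """Extract the caller's context, record a span, return headers for the next hop."""
    incoming = headers.get(TRACE_HEADER)
    if incoming:
        trace_id, parent_id = incoming.split(":")   # continue the caller's trace
    else:
        trace_id, parent_id = uuid.uuid4().hex, None  # no context: start a new trace
    span_id = uuid.uuid4().hex[:8]
    tracer.report(trace_id, span_id, parent_id, name)
    return {TRACE_HEADER: f"{trace_id}:{span_id}"}


def inventory_service(tracer: Tracer, headers: Dict[str, str]) -> None:
    start_span(tracer, "inventory.check", headers)


def order_service(tracer: Tracer, headers: Dict[str, str]) -> None:
    outgoing = start_span(tracer, "orders.create", headers)
    inventory_service(tracer, outgoing)  # the "network call" carries the context


tracer = InMemoryTracer()
order_service(tracer, {})
for span in tracer.spans:
    print(span)
```

Calling order_service(LoggingTracer(), {}) instead runs the same two services against a different backend with no change to their code. That is the property a shared standard is meant to give you across real tracing vendors.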
I am happy to share my slides as well as the code that I used for my demo. Have a look, recreate the demo for yourself, and definitely share your feedback or any questions you have!