Distributed Tracing in Microservices (Spring Cloud Sleuth + Zipkin)
- Jul 20, 2024
In this article, we'll look at how to use Spring Cloud Sleuth and Zipkin to implement distributed tracing.
Spring Cloud Sleuth is used to collect trace information, and Zipkin is used for the storage and visualization of these traces.
What are traces and spans?
Spans and traces are the fundamental building blocks of distributed tracing.
A trace or trace tree is a collection of operations that represents a unique transaction handled by an application and its constituent services.
A span represents a single operation within a trace. Spans can be subdivided into sub-spans, forming the trace tree.
A span might refer to another span as its parent, indicating a relationship between the operations involved in the trace.
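The parent/child relationship between spans can be sketched in a few lines of Java. This is only an illustrative model: the `SpanNode` and `TraceTree` classes and the short ids are hypothetical and are not the span types used by Sleuth or Zipkin.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration; not Sleuth's or Zipkin's own span type.
class SpanNode {
    final String traceId;
    final String spanId;
    final String parentId; // null for the root span of the trace
    final String name;

    SpanNode(String traceId, String spanId, String parentId, String name) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentId = parentId;
        this.name = name;
    }
}

class TraceTree {
    // Renders the spans of one trace as an indented tree, starting at the root span(s).
    static String render(List<SpanNode> spans) {
        StringBuilder out = new StringBuilder();
        for (SpanNode s : spans) {
            if (s.parentId == null) {
                append(out, s, spans, 0);
            }
        }
        return out.toString();
    }

    // Writes one span, then recurses into every span that names it as parent.
    private static void append(StringBuilder out, SpanNode node, List<SpanNode> all, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.name).append(" [span ").append(node.spanId).append("]\n");
        for (SpanNode s : all) {
            if (node.spanId.equals(s.parentId)) {
                append(out, s, all, depth + 1);
            }
        }
    }

    public static void main(String[] args) {
        // Made-up ids; both spans share the same trace id.
        List<SpanNode> spans = Arrays.asList(
            new SpanNode("t1", "s1", null, "order-service"),
            new SpanNode("t1", "s2", "s1", "product-service"));
        System.out.print(render(spans));
    }
}
```

Every span carries the same trace id, while the parent id is what turns a flat list of spans into a tree.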
Tracing with Spring Cloud Sleuth
The trace-id is created when a request hits the first service in the chain. On subsequent calls, the existing trace-id is passed along, typically as an HTTP header.
We do not need to write any custom code to create or propagate the trace context; Spring Cloud Sleuth does this automatically, typically by intercepting incoming and outgoing requests. It also configures the logging context to include the trace-id and related variables.
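To make the propagation idea concrete, here is a rough sketch of what an interceptor has to do, assuming the B3 header names (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`) used by Zipkin-compatible tracers. The `B3Propagation` class is hypothetical and heavily simplified; Sleuth does all of this for us, so this code is not needed in the actual services.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical, simplified illustration of trace-context propagation.
class B3Propagation {
    static final String TRACE_ID = "X-B3-TraceId";
    static final String SPAN_ID = "X-B3-SpanId";
    static final String PARENT_SPAN_ID = "X-B3-ParentSpanId";

    // Generates a random 64-bit id rendered as 16 lowercase hex characters.
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    // Builds the headers for an outgoing call from the headers of the incoming request.
    // A plain map is used here; real HTTP header lookup is case-insensitive.
    static Map<String, String> outgoingHeaders(Map<String, String> incoming) {
        String traceId = incoming.get(TRACE_ID);
        String currentSpanId = incoming.get(SPAN_ID);
        if (traceId == null) {
            // First service in the chain: start a new trace.
            traceId = newId();
        }
        if (currentSpanId == null) {
            // The root span reuses the trace id as its span id.
            currentSpanId = traceId;
        }
        Map<String, String> out = new HashMap<>();
        out.put(TRACE_ID, traceId);             // unchanged for the whole chain
        out.put(SPAN_ID, newId());              // new span for the downstream operation
        out.put(PARENT_SPAN_ID, currentSpanId); // links the new span to the caller
        return out;
    }
}
```

Calling `outgoingHeaders` with an empty map starts a new trace; feeding its result into a second call keeps the trace id and makes the previous span the parent.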
Spring Cloud Sleuth can send trace information to Zipkin either synchronously over HTTP or asynchronously using a message broker such as "Kafka" or "RabbitMQ".
Zipkin comes with native support for storing trace information either in memory or in a database such as Apache Cassandra, Elasticsearch, or MySQL.
Implementation
To demonstrate the distributed tracing, we have two microservices, "order-service" and "product-service," where "order-service" calls a REST API exposed by "product-service."
To send a request to "product-service", the "order-service" keeps a list of available "product-service" instances, which it fetches periodically from the Eureka server.
We also have an edge server developed using Spring Cloud Gateway.
A browser request to "order-service" comes through this "Edge server," and the "order-service" in turn calls the "product-service" and sends back the response to the "Edge server" to send it back to the browser.
Whenever the "Edge server" routes a request to "order-service", or the "order-service" sends a request to "product-service", the request goes to an instance fetched from the Eureka server.
1) Run Zipkin
Zipkin receives the traces from all the different services. It aggregates them based on the trace-id and provides multiple views for lookup.
If you have Java 8 or higher installed, the quickest way to get started is to fetch the latest release as a self-contained executable jar:
curl -sSL https://zipkin.io/quickstart.sh | bash -s
java -jar zipkin.jar
By default, Zipkin listens on port 9411, and its UI is available at http://localhost:9411/.
2) Enable distributed tracing
We need to add the below-mentioned dependencies to use "Spring Cloud Sleuth" and send trace information to "Zipkin."
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
    <version>2.2.8.RELEASE</version>
</dependency>
2.1) The "spring-cloud-eureka-server"
This instance of the "Netflix Eureka server" helps in service discovery from "order-service" to "product-service" and also from the "Edge server" to "order-service".
For more information, visit How to use Netflix Eureka as a discovery service in Spring BOOT.
Configuration
Setting "spring.sleuth.sampler.probability" to 1.0 makes sure that all traces are sent to Zipkin. Another important property is "spring.zipkin.baseUrl", which is the base URL of the Zipkin server.
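To illustrate what a sampling probability means, the sketch below decides per trace whether it should be reported. It is a simplified illustration under the assumption of a plain random draw; the `ProbabilitySampler` class is hypothetical and is not how Sleuth implements its sampler internally.

```java
import java.util.Random;

// Hypothetical, simplified sampler; Sleuth's own sampler is implemented differently.
class ProbabilitySampler {
    private final double probability;
    private final Random random;

    ProbabilitySampler(double probability, Random random) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        this.probability = probability;
        this.random = random;
    }

    // Returns true when the trace should be reported to Zipkin.
    // nextDouble() is in [0.0, 1.0), so 1.0 samples everything and 0.0 samples nothing.
    boolean isSampled() {
        return random.nextDouble() < probability;
    }
}
```

With a probability of 1.0 every trace is reported, which is convenient for a demo; a lower value reports only a fraction of traces and reduces the load on Zipkin.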
2.2) The "spring-cloud-gateway"
A browser request to "order-service" will come through this "Edge server," and the "order-service" in turn will call the "product-service" and send back the response to the "Edge server" to send it back to the browser.
For more information, visit How to use Spring Cloud Gateway in Spring BOOT.
Configuration
As with the Eureka server, setting "spring.sleuth.sampler.probability" to 1.0 sends all traces to Zipkin, and "spring.zipkin.baseUrl" points to the Zipkin server.
2.3) The "product-service"
This is a simple Spring Boot microservice with a straightforward REST API.
Configuration
Here too, setting "spring.sleuth.sampler.probability" to 1.0 sends all traces to Zipkin, and "spring.zipkin.baseUrl" points to the Zipkin server.
2.4) The "order-service"
Like "product-service," this too is a simple Spring Boot microservice with a couple of straightforward REST APIs.
Configuration
Again, setting "spring.sleuth.sampler.probability" to 1.0 sends all traces to Zipkin, and "spring.zipkin.baseUrl" points to the Zipkin server.
Test
Let's start the "Zipkin" server and then the four microservices in this order: "spring-cloud-eureka-server", "spring-cloud-gateway", "product-service" and "order-service".
Now that everything is in place, we can send an API call using the Swagger UI for "order-service", exposed through the "Edge server" at http://localhost:8080/order-service/swagger-ui.html.
Let's see the generated logs of "order-service":
2023-01-24 20:55:55.534 TRACE [order-service,b3de0baad168aa5b,b3de0baad168aa5b] 1285 --- [nio-8081-exec-7] o.s.web.servlet.DispatcherServlet : POST "/order/", parameters={}, headers={masked} in DispatcherServlet 'dispatcherServlet'
2023-01-24 20:55:55.618 TRACE [order-service,b3de0baad168aa5b,b3de0baad168aa5b] 1285 --- [nio-8081-exec-7] o.s.web.servlet.DispatcherServlet : No view rendering, null ModelAndView returned.
2023-01-24 20:55:55.618 DEBUG [order-service,b3de0baad168aa5b,b3de0baad168aa5b] 1285 --- [nio-8081-exec-7] o.s.web.servlet.DispatcherServlet : Completed 200 OK, headers={masked}
Each log statement carries the service name, TraceId and SpanId, in that order. These values are populated automatically by Spring Cloud Sleuth.
A SpanId is scoped to a single service.
In this service, the TraceId and SpanId are the same, because this is the first service to be traced. This changes with the subsequent call(s).
Now, let's look at the generated logs of "product-service" as well:
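As a small illustration, the bracketed block can be pulled out of a log line with a regular expression. The `TraceLogParser` class below is a hypothetical helper, and its pattern assumes the three-field "[service,traceId,spanId]" layout seen in the log lines above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes the "[service,traceId,spanId]" layout shown in the logs.
class TraceLogParser {
    private static final Pattern CONTEXT =
        Pattern.compile("\\[([^,\\]]+),([0-9a-f]+),([0-9a-f]+)\\]");

    // Returns {service, traceId, spanId}, or null if the line carries no trace context.
    static String[] parse(String logLine) {
        Matcher m = CONTEXT.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = "2023-01-24 20:55:55.534 TRACE "
            + "[order-service,b3de0baad168aa5b,b3de0baad168aa5b] 1285 --- "
            + "[nio-8081-exec-7] o.s.web.servlet.DispatcherServlet : POST \"/order/\"";
        String[] ctx = parse(line);
        System.out.println("service=" + ctx[0] + " traceId=" + ctx[1] + " spanId=" + ctx[2]);
    }
}
```

The thread name block, such as "[nio-8081-exec-7]", contains no commas, so the pattern skips it and only matches the trace context.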
2023-01-24 20:55:55.577 TRACE [product-service,b3de0baad168aa5b,9b98bd843c83c293] 1287 --- [nio-8082-exec-5] o.s.web.servlet.DispatcherServlet : GET "/product/10", parameters={}, headers={masked} in DispatcherServlet 'dispatcherServlet'
2023-01-24 20:55:55.603 TRACE [product-service,b3de0baad168aa5b,9b98bd843c83c293] 1287 --- [nio-8082-exec-5] o.s.web.servlet.DispatcherServlet : No view rendering, null ModelAndView returned.
2023-01-24 20:55:55.603 DEBUG [product-service,b3de0baad168aa5b,9b98bd843c83c293] 1287 --- [nio-8082-exec-5] o.s.web.servlet.DispatcherServlet : Completed 200 OK, headers={masked}
The TraceId is the same for "order-service" and "product-service": b3de0baad168aa5b.
With this id, we can find all the relevant logs from both services and troubleshoot an issue by following a single TraceId through them.
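Following one TraceId across services can be sketched as a simple filter-and-sort over the combined log lines. The `TraceLogCorrelator` class is a hypothetical helper, and the sample lines in `main` are abbreviated versions of the logs shown earlier; in practice a log aggregation tool would usually do this job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper that builds a cross-service timeline for one trace.
class TraceLogCorrelator {
    // Keeps the lines whose second bracket field is the trace id, ordered by timestamp.
    static List<String> timeline(String traceId, List<String> logLines) {
        List<String> matches = new ArrayList<>();
        for (String line : logLines) {
            // The trace id sits between two commas: "[service,traceId,spanId]".
            if (line.contains("," + traceId + ",")) {
                matches.add(line);
            }
        }
        // Lines start with "yyyy-MM-dd HH:mm:ss.SSS", so text order is chronological order.
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "2023-01-24 20:55:55.618 DEBUG [order-service,b3de0baad168aa5b,b3de0baad168aa5b] Completed 200 OK",
            "2023-01-24 20:55:55.577 TRACE [product-service,b3de0baad168aa5b,9b98bd843c83c293] GET \"/product/10\"",
            "2023-01-24 20:55:55.534 TRACE [order-service,b3de0baad168aa5b,b3de0baad168aa5b] POST \"/order/\"");
        for (String line : timeline("b3de0baad168aa5b", lines)) {
            System.out.println(line);
        }
    }
}
```

Sorted this way, the "product-service" entries fall between the start and the completion of the "order-service" request, which mirrors the call flow.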
Zipkin UI
Open the Zipkin UI and refresh; you will see an entry for the request.
It provides helpful information such as the TraceId, start and end times, response status, latency and the total processing time of the request. The entry also lists all of the services that were invoked while the request was processed.
We can also view a service dependency graph to visualize the interactions.
Together, this information gives quick insight into each service's performance, reliability, and availability.
Source code: GitHub