Distributed Tracing Terminology

  • OpenTelemetry: The industry-standard open-source observability framework that includes tools, APIs, and SDKs that help in instrumenting a system to generate, collect, and export telemetry data.

  • Telemetry: The process of collecting the data that a system automatically transmits about its behavior, such as traces, metrics, and logs.

  • Observability: The degree to which you can understand the current state of a complex system based only on the outputs of the system.

  • Instrumentation: In microservices environments, instrumentation usually refers to code added to a service so that monitoring data can be collected to measure performance, diagnose errors, and write tracing information. Modern tracing tools usually support instrumentation in multiple languages and frameworks. Additionally, tools such as the OpenTelemetry Java Agent and the monitoring agents of some APM vendors offer automatic instrumentation that does not require you to manually change your code.

  • Request: A request (also known as a transaction) is how the services and applications in a distributed system communicate with each other. Requests are typically processed in multiple components of the system. Each service in the distributed system can use different technologies and transport protocols to handle the requests, for example HTTP, Kafka, or MQTT. The request can also travel between local infrastructure and cloud services.

  • Trace: In distributed tracing, a trace represents the end-to-end execution path of a request as it is processed through multiple microservices. Each trace is composed of multiple spans.

  • Span: In distributed tracing, a span represents a single unit of work in a trace, such as an API call or database query. Each microservice in the path of a specific request through the distributed system contributes a span that represents one named and timed operation in the workflow of the tracked request.

    • Root span: The root span is the first span in a trace and has no parent span of its own. All subsequent work units are captured in child spans, each of which references the span that triggered it as its parent.

    • Child span: A child span is a sub-operation that the parent span triggers, such as a function call or a call to a database or another service.

  • Trace context: The trace context is used to track the request through microservices, similar to the shipping label on a parcel.
    The trace context contains a trace ID, a span ID, and a sampled flag. The trace ID uniquely identifies the trace, and the span ID uniquely identifies one span within it. The sampled flag shows whether the spans of the request are recorded.
    When a request passes from one service to the next, the trace context is added to the metadata of the transport protocol in use, for example HTTP headers, Kafka record headers, or MQTT user properties.

  • Sampling: A mechanism to control the amount of data the distributed trace exports.

    • Head-based sampling: Applies a sampling decision to a single trace when the trace starts.

    • Tail-based sampling: Applies a sampling decision at the end of the workflow for a single trace. Tail-based sampling is commonly used for latency analysis since end-to-end latency cannot be calculated until the end of the request workflow.

  • Exporter: The component that batches the telemetry data that is obtained in a trace and transports it to the destination backend or endpoint in the proper format, for example OTLP or Zipkin.
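
The trace context and head-based sampling entries above can be illustrated with a short sketch. It assumes the W3C Trace Context `traceparent` header layout (version, trace ID, span ID, and flags as lowercase hex fields, with the lowest flag bit marking "sampled"); this glossary does not prescribe a specific format. The names `TraceContext`, `parse_traceparent`, `child_traceparent`, and `start_trace` are hypothetical helpers for illustration, not part of any tracing library.

```python
import random
import re
import secrets
from dataclasses import dataclass

# Assumed layout (W3C Trace Context "traceparent" header):
# version-traceid-spanid-flags, all lowercase hex.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


@dataclass(frozen=True)
class TraceContext:
    trace_id: str  # shared by every span in the trace
    span_id: str   # identifies the span that sent the request
    sampled: bool  # whether the spans of this request are recorded


def parse_traceparent(header: str) -> TraceContext:
    """Read the trace context from incoming request metadata."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    _version, trace_id, span_id, flags = match.groups()
    # The lowest bit of the flags field is the sampled flag.
    return TraceContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_traceparent(parent: TraceContext) -> str:
    """Build the header for an outgoing call: same trace ID, new span ID."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if parent.sampled else "00"
    return f"00-{parent.trace_id}-{new_span_id}-{flags}"


def start_trace(sample_ratio: float) -> TraceContext:
    """Head-based sampling: decide once, when the root span is created."""
    sampled = random.random() < sample_ratio
    return TraceContext(secrets.token_hex(16), secrets.token_hex(8), sampled)


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    ctx = parse_traceparent(incoming)
    # The outgoing header keeps the trace ID and the sampling decision.
    print(child_traceparent(ctx))
```

Because the sampling decision travels inside the trace context, downstream services in this sketch simply honor the flag set at the start of the trace. A tail-based sampler could not work this way, since it has to see the finished trace before deciding. The sketch skips some validation a production parser would need, such as rejecting all-zero IDs.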