HiveMQ Enterprise Distributed Tracing Extension

The HiveMQ Enterprise Distributed Tracing Extension is a first-of-its-kind monitoring tool that lets you track the performance of MQTT PUBLISH messages within an MQTT broker.

The extension uses an enterprise-grade approach based on the open OpenTelemetry standard to add tracing capabilities to the HiveMQ MQTT broker and its integrations. Together, HiveMQ logging, monitoring, and tracing make your MQTT broker fully observable.

Use the extension to bring MQTT visibility to the Application Performance Monitoring (APM) solution of your choice. With the HiveMQ Enterprise Distributed Tracing Extension, you can see exactly how long the HiveMQ MQTT broker takes to process a request.

The insights you gain from the extension help you eliminate gaps in your APM solution, speed up root-cause analysis, reduce your mean time to resolve (MTTR), and increase the mean time between failures (MTBF) across your entire distributed architecture.

Features

Tracing of sent and received MQTT PUBLISH messages within the HiveMQ broker, including publish inbound and outbound interceptors.
Tracing of sent and received Kafka records in the HiveMQ Enterprise Extension for Kafka, including Kafka to MQTT transformers and MQTT to Kafka transformers.
Tracing of publish authorizers for MQTT PUBLISH packets.
Configurable sampling and filtering of MQTT PUBLISH messages to precisely control the type and amount of data exported.
Support for open standards when sending traces, ensuring maximum interoperability without restrictive vendor lock-in.
Ease of integration with your existing Application Performance Monitoring (APM) solution for end-to-end observability.
Ability to build alerting and monitoring dashboards that give you clear insights into your entire process.

Distributed Tracing Concepts

Distributed tracing tracks application requests as they move through the microservices that make up your distributed applications. The ability to observe and collect data on requests as they flow from one service to another gives you the visibility you need to pinpoint where and why performance issues and failures occur in a complex system.

To create a high-level overview of the progress of a request, distributed tracing solutions gather telemetry data from each component in a distributed system through which the request passes. The resulting end-to-end picture of the request path is called a trace.

The individual sections of work that each component in the distributed system contributes to the trace are called spans.

The spans of a trace give you detailed information for every part of a system (without the need for full knowledge of each component). For complex modern systems that are made up of numerous services, distributed tracing is a valuable way to increase observability.

Spans

In distributed tracing, all traces are composed of spans. A span describes a single operation with a start time and an end time, such as a database request, an outgoing remote request, or a function invocation. The spans of a single request are linked together by parent-child relationships to form a tree. The resulting tree is your trace. The root element of the tree is called the root span.
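
To make the parent-child structure concrete, the following Python sketch links a flat list of spans into a tree through their parent IDs and prints each span with its duration. The `Span` class, its fields, and the span names in the usage example are hypothetical illustrations, not the data model or span names that the extension exports.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    span_id: str
    name: str
    start_ms: int
    end_ms: int
    parent_id: Optional[str] = None  # None marks the root span
    children: List["Span"] = field(default_factory=list)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def build_trace(spans: List[Span]) -> Span:
    """Link spans to their parents and return the single root span."""
    by_id = {span.span_id: span for span in spans}
    roots = []
    for span in spans:
        if span.parent_id is None:
            roots.append(span)
        else:
            by_id[span.parent_id].children.append(span)
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root span, found {len(roots)}")
    return roots[0]


def render(span: Span, depth: int = 0) -> List[str]:
    """Return one indented line per span, children ordered by start time."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms} ms)"]
    for child in sorted(span.children, key=lambda c: c.start_ms):
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Hypothetical span names; the real names are defined by the extension.
    trace = build_trace([
        Span("a", "PUBLISH inbound", 0, 12),
        Span("b", "publish authorizer", 1, 3, parent_id="a"),
        Span("c", "PUBLISH outbound", 6, 11, parent_id="a"),
    ])
    print("\n".join(render(trace)))
```

Running the module prints the root span first and its two child spans indented beneath it, which is the same shape an APM tool typically shows as a waterfall.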

The HiveMQ Enterprise Distributed Tracing Extension implements all standard attributes of OpenTelemetry messaging spans and adds additional MQTT-specific message attributes. For detailed information, see HiveMQ Distributed Tracing Spans.

Sampling

When the HiveMQ Enterprise Distributed Tracing Extension is enabled, the extension automatically collects tracing information for all requests that pass through the HiveMQ broker. This 'always-on' behavior can produce large amounts of trace data. Sampling allows you to limit the number of traces you need to process. There are two common types of sampling. Each sampling method has advantages and disadvantages:

Head-based sampling: In head-based sampling, the decision whether to sample a trace is made at the beginning of the trace. In this case, the HiveMQ extension randomly selects traces and propagates the sampling decision to other services in the configured trace context. Head-based sampling is an efficient way to quickly sample large amounts of trace data in real time, is easy to set up, and has little impact on application performance. Head-based sampling is the most commonly used sampling method.

A disadvantage of head-based sampling is that you cannot choose to sample only traces that contain errors, because the sampling decision is made before any errors occur.
Tail-based sampling: In tail-based sampling, the decision whether to sample a trace takes place after all spans in the trace are complete. Tail-based sampling makes it possible to filter your traces based on specific criteria. Tail sampling is performed by an external processor. For more information, see Tail Sampling with OpenTelemetry.

A disadvantage of tail-based sampling is that it can increase the load on your system.
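
The difference between the two approaches can be sketched in a few lines of Python. The head-based function decides from the trace ID alone, loosely modeled on the trace-ID-ratio idea used by OpenTelemetry samplers (simplified here to compare the lower 64 bits of the ID against a bound). The tail-based function sees the finished spans and keeps only traces that contain an error. Both function names and the `error` span field are assumptions for illustration.

```python
from typing import Dict, List

_LOW_64_BITS = (1 << 64) - 1


def head_sample(trace_id: int, ratio: float) -> bool:
    """Head-based: decide when the trace starts, from the trace ID alone.

    The decision is deterministic for a given trace ID, so every service
    that receives the same ID reaches the same result. Nothing is known
    yet about errors that may happen later in the trace.
    """
    bound = int(ratio * (1 << 64))
    return (trace_id & _LOW_64_BITS) < bound


def tail_sample(finished_spans: List[Dict]) -> bool:
    """Tail-based: decide after all spans are complete.

    Keeps the trace only if at least one span recorded an error.
    """
    return any(span.get("error", False) for span in finished_spans)
```

With `ratio=0.1`, roughly one in ten random 128-bit trace IDs (for example, from `random.getrandbits(128)`) is kept, whether or not its spans later fail. `tail_sample` can make that distinction, but only after the whole trace has been collected, which is why it runs in an external processor.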

For more information on distributed tracing terms and definitions, see Distributed Tracing Terminology.

Requirements

A running HiveMQ Platform.
For production use, a valid HiveMQ Enterprise Distributed Tracing Extension license.
A compatible OpenTelemetry endpoint or APM vendor agent to receive tracing data from the extension.

If you do not provide a valid license, HiveMQ automatically uses a free evaluation license.
Evaluation licenses for HiveMQ Enterprise Extensions are valid for 5 hours. For more license information or to request an extended evaluation period, contact HiveMQ sales.

Installation

Place your HiveMQ Enterprise Distributed Tracing Extension license file (.elic) in the license folder of your HiveMQ installation. (Skip this step if you are using a trial version of the extension.)
All HiveMQ Enterprise Extensions are preinstalled in your HiveMQ release bundle and are disabled by default.
```
└─ <HiveMQ folder>
    ├─ bin
    ├─ conf
    ├─ data
    ├─ extensions
    │   ├─ hivemq-distributed-tracing-extension
    │   └─ ...
    ├─ license
    ├─ log
    └─ ...
```
Before you enable the extension, you need to configure the extension to fit your individual use case.
For your convenience, we provide a hivemq-distributed-tracing-extension.example.xml example configuration file that you can copy and modify as desired.
The included hivemq-distributed-tracing-extension.xsd file outlines the schema and elements that can be used in the XML configuration.
Your completed configuration file must be named hivemq-distributed-tracing-extension.xml.
For detailed information on configuration options, see Configuration.
To enable the HiveMQ Enterprise Distributed Tracing Extension, locate the hivemq-distributed-tracing-extension folder in the extensions directory of your HiveMQ installation and remove the DISABLED file (if present).
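
If you script your deployments, the enable step could look like the following Python sketch, which refuses to remove the DISABLED file until a hivemq-distributed-tracing-extension.xml file exists. The function name and the `hivemq_home` argument are hypothetical; the folder and file names come from the steps above.

```python
from pathlib import Path

EXTENSION_FOLDER = "hivemq-distributed-tracing-extension"
CONFIG_FILE = "hivemq-distributed-tracing-extension.xml"


def enable_tracing_extension(hivemq_home: Path) -> bool:
    """Remove the DISABLED marker, but only once the configuration file exists.

    Returns True if no DISABLED file remains afterwards.
    """
    extension_dir = hivemq_home / "extensions" / EXTENSION_FOLDER
    if not (extension_dir / CONFIG_FILE).is_file():
        raise FileNotFoundError(
            f"{CONFIG_FILE} not found in {extension_dir}; configure the extension first"
        )
    marker = extension_dir / "DISABLED"
    marker.unlink(missing_ok=True)  # the DISABLED file may already be absent
    return not marker.exists()
```

Checking for the configuration file first mirrors the order of the steps above: configure the extension, then enable it.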

To function properly, the HiveMQ Enterprise Distributed Tracing Extension must be installed on all HiveMQ broker nodes in your HiveMQ cluster and the configuration file on each node must be identical.
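
One way to confirm that every node uses the same configuration is to compare checksums of the configuration files. This Python sketch assumes you have already collected a copy of hivemq-distributed-tracing-extension.xml from each node (how you collect them is up to you); the helper names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def config_digest(path: Path) -> str:
    """SHA-256 of the raw file bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def configs_identical(paths: Iterable[Path]) -> bool:
    """True if all collected configuration files are byte-for-byte identical."""
    return len({config_digest(path) for path in paths}) <= 1
```

A byte-level comparison is stricter than necessary: it also reports differences in whitespace or comments, which is usually acceptable for a requirement that the files be identical.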

Configuration

The HiveMQ Enterprise Distributed Tracing Extension supports hot reload of the extension configuration. Changes that you make to the extension configuration take effect while the extension is running, with no need to restart. When the extension recognizes that a valid configuration has been loaded, the previous configuration file is automatically archived in the config-archive folder of the extension home folder.

If you load an invalid configuration at runtime and a previous valid configuration exists in the archive, HiveMQ uses the previous configuration.

The HiveMQ Enterprise Distributed Tracing Extension requires configuration of the extension, your HiveMQ broker, and all extensions that support tracing.

HiveMQ Distributed Tracing Extension Configuration

In the HiveMQ Enterprise Distributed Tracing Extension, you configure the global OpenTelemetry setup that is used to capture, process, and export spans in the HiveMQ broker and its extensions.

The global configuration includes:

Service name
Trace context propagation
Batch Span Processor
Span Exporter

In addition to the global configuration, each HiveMQ component with tracing support has a domain-specific configuration. The traceable functionality of the individual component determines the scope of the domain-specific configuration, for example, custom sampling rules that are based on the metadata of domain-specific message types such as MQTT packets or Kafka records.

Distributed Tracing Extension Configuration File

The hivemq-distributed-tracing-extension.example.xml file is located in the hivemq-distributed-tracing-extension folder within the extensions folder of your HiveMQ installation.

The extension uses a simple but powerful XML-based configuration.

The hivemq-distributed-tracing-extension.example.xml file contains a basic configuration example that shows the parameters you need to set up the extension with an OpenTelemetry Protocol (OTLP) exporter endpoint.
Additional example configurations with Zipkin and OTLP OpenTelemetry exporters are included in the examples folder of your extension directory.

If you copy and reuse one of the example files, be sure to rename the file to hivemq-distributed-tracing-extension.xml before you enable your extension. For more information, see Installation.

Example HiveMQ Enterprise Distributed Tracing Extension configuration

```
<?xml version="1.0" encoding="UTF-8"?>
<hivemq-distributed-tracing-extension xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                                      xsi:noNamespaceSchemaLocation="config.xsd">
    <service-name>HiveMQ Broker</service-name>

    <propagators>
        <propagator>tracecontext</propagator>
    </propagators>

    <batch-span-processor>
        <schedule-delay>5000</schedule-delay>
        <max-queue-size>2048</max-queue-size>
        <max-export-batch-size>512</max-export-batch-size>
        <export-timeout>30000</export-timeout>
    </batch-span-processor>

    <exporters>
        <otlp-exporter>
            <id>my-otlp-exporter</id>
            <endpoint>http://localhost:4317</endpoint>
            <protocol>grpc</protocol>
        </otlp-exporter>
    </exporters>
</hivemq-distributed-tracing-extension>
```
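
Before you enable the extension, a quick sanity check can catch typos in enumerated values. The following Python sketch uses the standard library XML parser to check a few documented constraints: a non-empty service-name, known propagator values, and, where a protocol is set, an OTLP protocol of grpc or http/protobuf. It is not a replacement for validating against the XSD that ships with the extension, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Union

PROPAGATORS = {"tracecontext", "jaeger", "b3", "b3multi"}
OTLP_PROTOCOLS = {"grpc", "http/protobuf"}


def check_tracing_config(xml_data: Union[str, bytes]) -> List[str]:
    """Return the problems found; an empty list means none were detected."""
    root = ET.fromstring(xml_data)
    problems = []

    if not (root.findtext("service-name") or "").strip():
        problems.append("service-name is missing or empty")

    for propagator in root.findall("propagators/propagator"):
        value = (propagator.text or "").strip()
        if value not in PROPAGATORS:
            problems.append(f"unknown propagator {value!r}")

    for exporter in root.findall("exporters/otlp-exporter"):
        exporter_id = exporter.findtext("id", default="<no id>")
        protocol = exporter.findtext("protocol")
        if protocol is not None and protocol.strip() not in OTLP_PROTOCOLS:
            problems.append(f"otlp-exporter {exporter_id}: unsupported protocol {protocol.strip()!r}")

    return problems
```

Passing the raw bytes of the example configuration above (for example, read with `Path.read_bytes()`) returns an empty list.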

service-name

String

The logical name of the service. The service-name must be the same for all instances of a horizontally scaled service. All nodes in the same HiveMQ cluster must have the same service name. For ease of identification, we recommend that each HiveMQ cluster has a unique service name. For example, HiveMQ Megafactory EMEA.

propagators

Complex Type

Determines which distributed tracing header format is used for all incoming and outgoing requests. For example, MQTT PUBLISH and Kafka records.

tracecontext: W3C Trace Context (includes W3C Baggage)
jaeger: Jaeger (includes Jaeger baggage)
b3: Zipkin B3 (single header)
b3multi: Zipkin B3 (multi header)

batch-span-processor

Complex Type

Lists the options for the internal batch span processor. The batch span processor accepts finished spans and groups them into batches. Batching improves data compression and reduces the number of outgoing connections needed to transmit the data. This processor supports both size-based and time-based batching.

schedule-delay: The interval, in milliseconds, between two consecutive exports. The default value is 5000.
max-queue-size: The maximum number of spans that the queue can hold. When the queue is full, additional spans are dropped. The default value is 2048.
max-export-batch-size: The maximum number of spans in a single export batch. The default value is 512.
export-timeout: The maximum allowed time, in milliseconds, to export data. The default value is 30000.

exporters

Complex Type

Lists the transport components that are used to send telemetry data to the configured endpoints.

otlp-exporter: An OTLP exporter that uses gRPC or HTTP as its communication protocol.
- id: The user-defined name of the selected OTLP exporter.
- endpoint: The destination to which the configured OTLP exporter transfers tracing data. For example, http://otel-collector:4317.
- protocol: The transport protocol used by the configured OTLP exporter. Must be grpc or http/protobuf.
zipkin-exporter: A Zipkin exporter that sends JSON in Zipkin format to a specified HTTP URL.
- id: The user-defined name of the selected Zipkin exporter.
- endpoint: The destination to which the configured Zipkin exporter transfers tracing data. For example, http://otel-collector:9411/api/v2/spans.
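
The interaction between max-queue-size, max-export-batch-size, and schedule-delay can be modeled as follows. This is a deliberately simplified, single-threaded Python model, not the OpenTelemetry SDK implementation: spans that arrive while the queue is full are dropped, each tick exports any full batches, and once the schedule delay has elapsed the remaining spans are exported as well. The class name and counters are hypothetical, although `processed` and `dropped` loosely mirror the spans.processed and spans.dropped metrics described in the Metrics section.

```python
from collections import deque
from typing import Any, Deque, List


class BatchSpanProcessorModel:
    """Simplified, single-threaded model of size- and time-based batching."""

    def __init__(self, max_queue_size: int = 2048, max_export_batch_size: int = 512,
                 schedule_delay_ms: int = 5000) -> None:
        self.queue: Deque[Any] = deque()
        self.max_queue_size = max_queue_size
        self.max_export_batch_size = max_export_batch_size
        self.schedule_delay_ms = schedule_delay_ms
        self.last_scheduled_export_ms = 0
        self.processed = 0  # finished spans accepted into the queue
        self.dropped = 0    # finished spans rejected because the queue was full

    def on_span_end(self, span: Any) -> None:
        if len(self.queue) >= self.max_queue_size:
            self.dropped += 1
        else:
            self.queue.append(span)
            self.processed += 1

    def tick(self, now_ms: int) -> List[List[Any]]:
        """Return the batches that would be handed to the exporter at now_ms."""
        due = now_ms - self.last_scheduled_export_ms >= self.schedule_delay_ms
        batches = []
        # Size-based: export full batches. Time-based: when due, drain the rest too.
        while self.queue and (due or len(self.queue) >= self.max_export_batch_size):
            size = min(self.max_export_batch_size, len(self.queue))
            batches.append([self.queue.popleft() for _ in range(size)])
        if due:
            self.last_scheduled_export_ms = now_ms
        return batches
```

With the default values, more than 2048 finished spans waiting between exports would start increasing the dropped count, which is the situation the spans.dropped metric reports.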

HiveMQ Broker Tracing Configuration

In the HiveMQ broker, you configure the domain-specific sampling rules and outbound context propagation for incoming and outgoing MQTT packets.

HiveMQ Broker Tracing Configuration File

The tracing.xml file is located in the conf folder of your HiveMQ installation.

The provided tracing.xml file contains a deactivated basic configuration for tracing MQTT PUBLISH packets.

Example HiveMQ broker tracing configuration

```
<?xml version="1.0" encoding="UTF-8" ?>
<tracing xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="tracing.xsd">
    <context-propagation>
        <outbound-context-propagation>
            <enabled>true</enabled>
            <exclude>
                <client-id-patterns>
                    <client-id-pattern>iot-device-.*</client-id-pattern>
                </client-id-patterns>
            </exclude>
        </outbound-context-propagation>
    </context-propagation>
    <sampling>
        <publish-sampling>
            <enabled>true</enabled>
            <include>
                <topic-filters>
                    <topic-filter>car/command/#</topic-filter>
                </topic-filters>
            </include>
        </publish-sampling>
    </sampling>
</tracing>
```

The following <outbound-context-propagation> values can be configured:

Table 1. Available outbound context propagation parameters
Parameter	Type	Description
`enabled`	Boolean	Determines whether the trace context is added to outgoing MQTT PUBLISH packets.
`include`	Complex Type	See HiveMQ Broker Include/Exclude Configuration.
`exclude`	Complex Type	See HiveMQ Broker Include/Exclude Configuration.

The following <publish-sampling> values can be configured:

Table 2. Available publish sampling parameters
Parameter	Type	Description
`enabled`	Boolean	If no trace context is present in the PUBLISH user properties, the `enabled` setting determines whether incoming MQTT PUBLISH packets are sampled.
`include`	Complex Type	See HiveMQ Broker Include/Exclude Configuration.
`exclude`	Complex Type	See HiveMQ Broker Include/Exclude Configuration.
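
The `enabled` rule above applies only when an incoming PUBLISH carries no trace context. The Python sketch below illustrates that decision for the W3C Trace Context format (the `tracecontext` propagator): it parses a `traceparent` value (version, trace ID, parent ID, and flags, where bit 0x01 of the flags means sampled) and otherwise falls back to `enabled`. Two points are assumptions, not confirmed broker behavior: that the context travels in an MQTT user property named `traceparent`, and that the sampled flag of an existing context is honored as is.

```python
import re
from typing import Iterable, Optional, Tuple

# version-traceid-parentid-flags in lowercase hex (W3C Trace Context, version 00)
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_sampled_flag(traceparent: str) -> Optional[bool]:
    """Return the sampled flag, or None if the value is not a valid traceparent."""
    match = _TRACEPARENT.fullmatch(traceparent.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None  # forbidden version or all-zero IDs
    return bool(int(flags, 16) & 0x01)


def should_sample(user_properties: Iterable[Tuple[str, str]], sampling_enabled: bool) -> bool:
    """Follow an incoming trace context if present; otherwise apply `enabled`."""
    for name, value in user_properties:
        if name == "traceparent":  # assumed user property name
            sampled = parse_sampled_flag(value)
            if sampled is not None:
                return sampled  # assumption: the upstream decision is honored
    return sampling_enabled
```

An invalid `traceparent` is treated the same as a missing one, so the `enabled` setting decides in both cases.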

HiveMQ Broker Include/Exclude Configuration

The optional <include> and <exclude> tags give you the ability to refine your sampling and outbound context propagation configuration. By default, everything is included. Inclusions are applied before exclusions.
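
The include-then-exclude order can be expressed as a small predicate. In this hypothetical Python sketch, each rule is a function that returns True when a message matches. The assumption that a message counts as included when it matches any include rule is not confirmed by this documentation.

```python
from typing import Callable, Dict, Sequence

Message = Dict[str, str]
Rule = Callable[[Message], bool]


def is_selected(message: Message, include: Sequence[Rule] = (), exclude: Sequence[Rule] = ()) -> bool:
    # Step 1: inclusions. Without include rules, every message is included.
    included = not include or any(rule(message) for rule in include)
    # Step 2: exclusions are applied only to messages that passed step 1.
    return included and not any(rule(message) for rule in exclude)
```

For example, an include rule for `car/command/#` combined with an exclude rule for `iot-device-.*` selects command messages from every client except IoT devices.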

The following <include> and <exclude> options can be configured:

Table 3. Available include and exclude parameters
Parameter	Required	Type	Description
`client-id-patterns`		Complex Type	See HiveMQ Broker Client ID Patterns Configuration. You can define as many individual `<client-id-pattern>` tags as your use case requires.
`topic-filters`		Complex Type	See HiveMQ Broker Topic Filters Configuration. You can define as many individual `<topic-filter>` tags as your use case requires.

HiveMQ Broker Client ID Patterns Configuration

The following <client-id-patterns> values can be configured:

Parameter	Required	Type	Description
`client-id-pattern`		String	A pattern that uses regular expressions to match MQTT client IDs. For example, `iot-device-.*`.
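
Whether a pattern must match the whole client ID or only part of it matters for patterns such as `iot-device-.*`. The Python sketch below assumes whole-string matching (the behavior of Java's `String.matches()`). The helper names are hypothetical, and Python's regular expression syntax differs from Java's in some advanced features.

```python
import re
from typing import Iterable, List


def compile_client_id_patterns(patterns: Iterable[str]) -> List[re.Pattern]:
    return [re.compile(p) for p in patterns]


def client_id_matches(client_id: str, patterns: List[re.Pattern]) -> bool:
    # Assumption: the whole client ID must match, as with Java's String.matches().
    return any(p.fullmatch(client_id) for p in patterns)
```

With `re.search` instead of `fullmatch`, a client ID such as `gateway-iot-device-42` would also match `iot-device-.*`.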

HiveMQ Broker Topic Filters Configuration

The following <topic-filters> values can be configured:

Table 4. Available topic filter parameters
Parameter	Required	Type	Description
`topic-filter`		String	Any valid MQTT topic filter, including the wildcards `+` and `#`. For example, `car/commands/#`.
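
Topic filter matching follows the rules of the MQTT specification: `+` matches exactly one topic level, `#` matches the parent level and any number of child levels, and wildcards at the first level do not match topics that start with `$`. The following Python function is a minimal sketch of those rules for checking which topics a configured filter would select. It assumes the filter is already valid and is not the broker's own implementation.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a (valid) topic filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    # Wildcards at the first level do not match topics that start with "$".
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False

    for index, level in enumerate(filter_levels):
        if level == "#":
            return True  # matches the parent level and everything below it
        if index >= len(topic_levels):
            return False  # the filter has more levels than the topic
        if level != "+" and level != topic_levels[index]:
            return False  # literal level mismatch
    # Without "#", the number of levels must be identical.
    return len(filter_levels) == len(topic_levels)
```

For example, `car/command/#` matches both `car/command` and `car/command/engine/start`, but not `car/status`.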

HiveMQ Enterprise Extension for Kafka Tracing Configuration

In the HiveMQ Enterprise Extension for Kafka, you configure the domain-specific sampling rules and outbound context propagation for incoming and outgoing Kafka records.

Outbound context propagation configuration for MQTT to Kafka mappings and transformers
Sampling configuration for Kafka to MQTT mappings and transformers

Extension for Kafka Configuration File

The kafka-configuration.xml file is located in the Kafka extension folder within the extensions folder of your HiveMQ installation.

You can add outbound context propagation and sampling configuration to existing mappings and transformers.

Example HiveMQ Enterprise Extension for Kafka tracing configuration

```
<?xml version="1.0" encoding="UTF-8" ?>
<kafka-configuration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                     xsi:noNamespaceSchemaLocation="kafka-extension.xsd">
    <kafka-clusters>
        <kafka-cluster>
            <id>cluster01</id>
            <bootstrap-servers>kafka:9092</bootstrap-servers>
        </kafka-cluster>
    </kafka-clusters>

    <mqtt-to-kafka-mappings>
        <mqtt-to-kafka-mapping>
            <cluster-id>cluster01</cluster-id>
            <id>mqtt-to-kafka-mapping-01</id>
            <mqtt-topic-filters>
                <mqtt-topic-filter>car/command/#</mqtt-topic-filter>
            </mqtt-topic-filters>
            <kafka-topic>my-kafka-topic</kafka-topic>
            <tracing>
                <outbound-context-propagation>
                    <enabled>true</enabled>
                </outbound-context-propagation>
            </tracing>
        </mqtt-to-kafka-mapping>
    </mqtt-to-kafka-mappings>

    <kafka-to-mqtt-transformers>
        <kafka-to-mqtt-transformer>
            <cluster-id>cluster01</cluster-id>
            <id>kafka-to-mqtt-transformer-01</id>
            <kafka-topics>
                <kafka-topic>my-kafka-topic</kafka-topic>
            </kafka-topics>
            <transformer>com.example.KafkaToMqttTransformer</transformer>
            <tracing>
                <sampling>
                    <enabled>true</enabled>
                </sampling>
            </tracing>
        </kafka-to-mqtt-transformer>
    </kafka-to-mqtt-transformers>
</kafka-configuration>
```

You can use the optional <tracing> tags to enable or disable outbound context propagation and sampling for each mapping or transformer. If no <tracing> tag is defined, the respective inner configuration is disabled.

The required <outbound-context-propagation> tag applies only to MQTT to Kafka mappings and transformers.

The following <outbound-context-propagation> values can be configured:

Table 5. Available outbound context propagation parameters
Parameter	Required	Type	Description
`enabled`		Boolean	Determines if a trace context is added to outgoing Kafka records, using the globally configured propagator.

The required <sampling> tag applies only to Kafka to MQTT mappings and transformers.

The following <sampling> values can be configured:

Table 6. Available sampling parameters
Parameter	Required	Type	Description
`enabled`		Boolean	If no trace context is present in the Kafka record headers, the `enabled` setting determines whether incoming Kafka records are sampled.
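
To review which mappings and transformers have tracing turned on, you could read the flags directly from kafka-configuration.xml. The Python sketch below covers only the two element types shown in the example (mqtt-to-kafka-mapping and kafka-to-mqtt-transformer) and treats a missing <tracing> tag as disabled, as described above. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union


def _flag(element: ET.Element, path: str) -> bool:
    # A missing <tracing> tag (or a missing inner tag) counts as disabled.
    return element.findtext(path, default="false").strip().lower() == "true"


def tracing_flags(kafka_config: Union[str, bytes]) -> Dict[str, Dict[str, bool]]:
    """Map each mapping or transformer ID to its tracing flags."""
    root = ET.fromstring(kafka_config)
    flags: Dict[str, Dict[str, bool]] = {}
    for mapping in root.findall("mqtt-to-kafka-mappings/mqtt-to-kafka-mapping"):
        flags[mapping.findtext("id", default="")] = {
            "outbound-context-propagation": _flag(mapping, "tracing/outbound-context-propagation/enabled")
        }
    for transformer in root.findall("kafka-to-mqtt-transformers/kafka-to-mqtt-transformer"):
        flags[transformer.findtext("id", default="")] = {
            "sampling": _flag(transformer, "tracing/sampling/enabled")
        }
    return flags
```

For the example configuration above, both mqtt-to-kafka-mapping-01 and kafka-to-mqtt-transformer-01 report their respective flag as enabled.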

Metrics

The HiveMQ Enterprise Distributed Tracing Extension exposes several useful metrics that you can use to monitor extension behavior.

The following table lists each metric the HiveMQ Enterprise Distributed Tracing Extension exposes. For increased readability, the com.hivemq.extensions.distributed-tracing. prefix is omitted in the table.
For more information on HiveMQ metrics, see metric types.

Table 7. Available HiveMQ Distributed Tracing Extension Metrics
Name	Type	Description
queue-size.current	`Gauge`	The current size of the batch span processor queue.
spans.started	`Counter`	The number of spans that are started.
spans.processed	`Counter`	The number of spans that finished and are successfully added to the span processor queue.
spans.dropped	`Counter`	The number of spans that finished but are dropped due to a full queue.
exporter.seen	`Counter`	The number of spans successfully delivered to the exporter from the queue.
exporter.exported	`Counter`	The number of spans successfully exported.
exporter.failed	`Counter`	The number of spans that are not exported, for example, due to an unreachable export endpoint.
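
The counters can be combined into simple health indicators for dashboards or alerts. This Python sketch assumes you already have a snapshot of metric values keyed by their full names. It treats processed plus dropped as the number of finished spans, and exporter.failed divided by exporter.seen as the export failure ratio. Both ratios are an interpretation of the metric descriptions above, and the function name is hypothetical.

```python
from typing import Dict, Mapping

PREFIX = "com.hivemq.extensions.distributed-tracing."


def tracing_health(metrics: Mapping[str, float]) -> Dict[str, float]:
    """Derive simple ratios from one snapshot of the extension's metrics."""

    def value(name: str) -> float:
        return metrics.get(PREFIX + name, 0.0)

    processed = value("spans.processed")
    dropped = value("spans.dropped")
    seen = value("exporter.seen")
    failed = value("exporter.failed")
    finished = processed + dropped  # assumption: each finished span is either queued or dropped

    return {
        "drop_ratio": dropped / finished if finished else 0.0,
        "export_failure_ratio": failed / seen if seen else 0.0,
        "queue_size": value("queue-size.current"),
    }
```

Because counters grow for the lifetime of the broker, alerts are usually more meaningful when you apply the same ratios to the increase between two snapshots rather than to lifetime totals.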

For more information on HiveMQ metrics, see Standard HiveMQ Metrics and HiveMQ REST API Metrics.

Next Steps

If you have additional questions or want expert support to set up your HiveMQ Enterprise Distributed Tracing Extension, contact us. We are happy to assist you.