HiveMQ Swarm Monitoring
Monitor Your Swarm with InfluxDB
InfluxDB is a widely used open-source time-series database. It is written in Go and optimized for fast, high-availability storage and retrieval of time-series data.
InfluxDB is a popular choice for gathering and visualizing application metrics.
You can configure the InfluxDB connection in two ways: with environment variables or in the HiveMQ Swarm config.xml file.
Environment-variable Based Configuration
The following environment variables are available for configuration of your InfluxDB connection:
| Variable | Definition |
|---|---|
| | The InfluxDB host. |
| | The name of the InfluxDB database. |
| | Optional string used for basic HTTP authorization, for example admin:admin. |
| | Optional prefix that is added to the names of all metrics. |
| | Interval, in seconds, at which metrics are sent to InfluxDB. |
| | Adds the InfluxDB tag hostname. |
| | Adds the InfluxDB tag {suffix}. |
File-based InfluxDB Configuration
If no InfluxDB configuration is provided through environment variables, HiveMQ Swarm reads the configuration from the HiveMQ Swarm config.xml configuration file.
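<swarm>
    <metrics>
        <!-- Name of the InfluxDB database -->
        <influxDB>myInfluxDB</influxDB>
        <!-- InfluxDB host -->
        <influxDBHost>http://localhost:8086</influxDBHost>
        <!-- Optional string for basic HTTP authorization, for example admin:admin -->
        <influxDBAuthString>authString</influxDBAuthString>
        <!-- How often metrics are sent to InfluxDB -->
        <influxDBInterval>20m</influxDBInterval>
        <!-- Optional prefix for all metric names -->
        <influxDBPrefix>myPrefix</influxDBPrefix>
        <!-- Additional InfluxDB tags as key/value pairs -->
        <influxDBTags>
            <influxDBTag>
                <key>myTag</key>
                <value>myValue</value>
            </influxDBTag>
        </influxDBTags>
    </metrics>
</swarm>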
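To check which InfluxDB settings a config.xml file actually contains, you can read its `<metrics>` element with a few lines of Python. The following is a minimal sketch that assumes the element layout shown above; the helper name `read_influxdb_settings` is hypothetical and not part of HiveMQ Swarm, which parses its configuration internally.

```python
import xml.etree.ElementTree as ET

# Example config.xml content, copied from the InfluxDB example above.
CONFIG_XML = """
<swarm>
    <metrics>
        <influxDB>myInfluxDB</influxDB>
        <influxDBHost>http://localhost:8086</influxDBHost>
        <influxDBAuthString>authString</influxDBAuthString>
        <influxDBInterval>20m</influxDBInterval>
        <influxDBPrefix>myPrefix</influxDBPrefix>
        <influxDBTags>
            <influxDBTag>
                <key>myTag</key>
                <value>myValue</value>
            </influxDBTag>
        </influxDBTags>
    </metrics>
</swarm>
"""

# Simple text settings that appear directly under <metrics>.
SETTING_NAMES = (
    "influxDB",
    "influxDBHost",
    "influxDBAuthString",
    "influxDBInterval",
    "influxDBPrefix",
)


def read_influxdb_settings(xml_text: str) -> dict:
    """Return the InfluxDB settings found under <swarm><metrics> (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    metrics = root.find("metrics")
    if metrics is None:
        return {}  # no <metrics> element, so no file-based InfluxDB configuration
    settings = {}
    for name in SETTING_NAMES:
        value = metrics.findtext(name)
        if value is not None:
            settings[name] = value.strip()
    # Collect each <influxDBTag> as a key/value pair.
    settings["tags"] = {
        (tag.findtext("key") or "").strip(): (tag.findtext("value") or "").strip()
        for tag in metrics.findall("influxDBTags/influxDBTag")
    }
    return settings


if __name__ == "__main__":
    print(read_influxdb_settings(CONFIG_XML))
```

Missing optional elements are simply absent from the returned dictionary, which makes it easy to spot a typo in an element name.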
Monitor Your Swarm with Prometheus
Prometheus is a popular open-source solution for event monitoring and alerting.
Prometheus provides a simple and powerful dimensional data model, a flexible query language, an efficient time-series database, and real-time metrics.
When the REST service of HiveMQ Swarm is enabled, the metrics are provided through the /metrics endpoint. The following config.xml example enables the REST service with an HTTP listener:
<swarm>
    <commander>
        <agents>
            <agent>
                <host>localhost</host>
                <port>3881</port>
            </agent>
        </agents>
    </commander>
    <rest>
        <enabled>true</enabled>
        <listeners>
            <http>
                <enabled>true</enabled>
                <bindPort>8080</bindPort>
                <!-- 0.0.0.0 listens on all network interfaces -->
                <bindAddress>0.0.0.0</bindAddress>
            </http>
        </listeners>
    </rest>
</swarm>
Make sure that your Prometheus server can reach the IP address of the network interface that the REST service binds to.
The metrics of HiveMQ Swarm agents reset when a scenario finishes. Because Prometheus gathers metrics periodically, the final metric values might not be scraped before the reset. To avoid this, we recommend adding a stage at the end of your scenario that uses the delay command to wait for at least two times the Prometheus scrape interval.
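How long the final delay must be depends on the scrape interval configured in Prometheus. The following Python sketch shows the arithmetic. It assumes single-unit duration strings such as 5s or 1m; compound durations such as 1h30m, which Prometheus also accepts, are deliberately rejected. The function names are hypothetical.

```python
import re

# Seconds per unit for the simple duration suffixes handled here.
UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> float:
    """Convert a single-unit duration such as '5s', '1m', or '500ms' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def min_final_delay_seconds(scrape_interval: str, factor: int = 2) -> float:
    """Shortest delay for the final stage: factor times the Prometheus scrape interval."""
    return factor * parse_duration(scrape_interval)


if __name__ == "__main__":
    # A 5s scrape interval calls for a final delay of at least 10 seconds.
    print(min_final_delay_seconds("5s"))
```

Rounding the result up to a comfortable margin does no harm; a delay that is too short is what causes the last values to be lost.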
Test Your REST Service Configuration
To test the configuration of your REST service, use your browser to navigate to <restservice-ip>:<restservice-port>/metrics. For example, http://localhost:8181/metrics.
Output similar to the following confirms that the metrics are available:
# HELP com_hivemq_messages_incoming_publish_rate_total Generated from Dropwizard metric import (metric=com.hivemq.messages.incoming.publish.rate, type=com.codahale.metrics.Meter)
# TYPE com_hivemq_messages_incoming_publish_rate_total counter
com_hivemq_messages_incoming_publish_rate_total 0.0
# HELP com_hivemq_messages_incoming_pubrec_rate_total Generated from Dropwizard metric import (metric=com.hivemq.messages.incoming.pubrec.rate, type=com.codahale.metrics.Meter)
# TYPE com_hivemq_messages_incoming_pubrec_rate_total counter
com_hivemq_messages_incoming_pubrec_rate_total 0.0
...
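If you want to check the endpoint from a script instead of a browser, the text format is simple enough to parse for a quick sanity check. The following is a minimal Python sketch that assumes samples without labels or timestamps, like the output above; the function names are hypothetical. For anything beyond a smoke test, the parser in the official Prometheus Python client (`prometheus_client.parser`) is the safer choice.

```python
from urllib.request import urlopen

# Excerpt of the /metrics output shown above.
SAMPLE = """\
# HELP com_hivemq_messages_incoming_publish_rate_total Generated from Dropwizard metric import (metric=com.hivemq.messages.incoming.publish.rate, type=com.codahale.metrics.Meter)
# TYPE com_hivemq_messages_incoming_publish_rate_total counter
com_hivemq_messages_incoming_publish_rate_total 0.0
# HELP com_hivemq_messages_incoming_pubrec_rate_total Generated from Dropwizard metric import (metric=com.hivemq.messages.incoming.pubrec.rate, type=com.codahale.metrics.Meter)
# TYPE com_hivemq_messages_incoming_pubrec_rate_total counter
com_hivemq_messages_incoming_pubrec_rate_total 0.0
"""


def parse_metrics(text: str) -> dict:
    """Parse unlabeled 'name value' samples into {name: (type, value)}."""
    types = {}
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("# TYPE "):
            # Format: '# TYPE <metric name> <metric type>'
            _, _, name, metric_type = line.split(maxsplit=3)
            types[name] = metric_type
        elif line.startswith("#"):
            continue  # HELP lines and other comments
        else:
            # Assumes no timestamp after the value and no spaces inside labels.
            name, value = line.rsplit(maxsplit=1)
            values[name] = float(value)
    return {name: (types.get(name, "untyped"), value) for name, value in values.items()}


def fetch_metrics(url: str, timeout: float = 5.0) -> dict:
    """Download and parse a /metrics page, for example http://localhost:8181/metrics."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))


if __name__ == "__main__":
    for name, (metric_type, value) in parse_metrics(SAMPLE).items():
        print(f"{name} [{metric_type}] = {value}")
```

Against a running agent, call `fetch_metrics` with your own `<restservice-ip>:<restservice-port>/metrics` URL; a non-empty result means the REST service is serving metrics.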
Install Prometheus
- Download Prometheus and install the Prometheus application on a machine of your choice. For best results, we recommend that you do not run Prometheus on the same machine as HiveMQ Swarm. A step-by-step getting started guide and detailed configuration information are available in the Prometheus documentation.
- To enable Prometheus to gather metrics from HiveMQ Swarm, add a scrape configuration to your Prometheus configuration file (prometheus.yml). Scrape from the <ip>:<port>/metrics address of the REST service of each agent.
- Open the web address of your Prometheus application and verify that the HiveMQ Swarm metrics are visible.

Example Prometheus configuration:
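global:
  scrape_interval: 15s
  query_log_file: /prometheus/query.log

scrape_configs:
  - job_name: 'swarm'
    # Scrape the HiveMQ Swarm agents more often than the global default
    scrape_interval: 5s
    metrics_path: '/metrics'
    static_configs:
      # One <host>:<port> entry per agent REST service
      - targets: ['<agent-1>:8181', '<agent-2>:8181']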
This example is for a two-agent setup. To scrape additional agents, add their addresses to the targets list.
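With many agents, editing the targets list by hand gets error-prone. The following minimal Python sketch renders the swarm job for any number of agents. It assumes that every agent's REST service listens on the same port (8181, as in the example) and that you paste the output below the global section of your existing prometheus.yml; the function name and host names are hypothetical.

```python
from typing import Sequence


def swarm_scrape_config(agents: Sequence[str], port: int = 8181, interval: str = "5s") -> str:
    """Render a scrape_configs section that scrapes every listed agent."""
    # Quote each host:port pair the same way as the example configuration.
    targets = ", ".join(f"'{host}:{port}'" for host in agents)
    lines = [
        "scrape_configs:",
        "  - job_name: 'swarm'",
        f"    scrape_interval: {interval}",
        "    metrics_path: '/metrics'",
        "    static_configs:",
        f"      - targets: [{targets}]",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical host names for a three-agent setup.
    print(swarm_scrape_config(["agent-1", "agent-2", "agent-3"]))
```

Generating the YAML as plain text keeps the sketch free of third-party dependencies; if you already use a YAML library, building a dictionary and dumping it is equally valid.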
Display HiveMQ Swarm Metrics in Prometheus
Prometheus provides a built-in expression browser that displays metrics on the fly. This is helpful when you want an in-depth look at specific metrics that you do not monitor constantly. To open it, navigate to http://localhost:9090/.
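Besides the browser UI, Prometheus answers the same queries over its HTTP API (`GET /api/v1/query`). The following minimal Python sketch assumes that Prometheus runs at http://localhost:9090 and that each HiveMQ Swarm target carries the default `instance` label (the host:port of the scraped agent); the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL of a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def vector_values(body: str) -> dict:
    """Map the 'instance' label of each series in an instant-vector result to its value."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each result entry looks like {"metric": {...labels...}, "value": [timestamp, "value"]}.
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in payload["data"]["result"]
    }


def query(base_url: str, promql: str, timeout: float = 5.0) -> dict:
    """Run an instant query against a Prometheus server and return per-instance values."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as response:
        return vector_values(response.read().decode("utf-8"))
```

For example, `query("http://localhost:9090", "com_hivemq_messages_incoming_publish_rate_total")` returns one value per scraped agent.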
Prometheus is frequently used as a data source for monitoring dashboards such as Grafana. For a complete tutorial on how to set up a Grafana dashboard that uses Prometheus as a data source, see HiveMQ - Monitoring with Prometheus and Grafana.
HiveMQ Swarm Metrics
HiveMQ Swarm offers five types of metrics:
| Metric Type | Description |
|---|---|
| Gauge | A gauge returns a simple value at the point in time the metric is requested. |
| Counter | A counter is a simple number that can be incremented and decremented. |
| Histogram | A histogram measures the distribution of values in a stream of data. Histograms report the minimum, mean, maximum, standard deviation, and quantiles of the values. |
| Meter | A meter measures the rate at which a set of events occurs. Meters report the mean rate and the 1-, 5-, and 15-minute moving average rates of events. |
| Timer | A timer is essentially a histogram of the duration of a type of event combined with a meter of the rate at which it occurs. It captures both rate and duration information. |
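The sample /metrics output earlier on this page shows that the metrics are generated from Dropwizard metric types (`type=com.codahale.metrics.Meter`). To make a meter's moving averages less abstract, here is a simplified Python model of how a Dropwizard-style meter computes them: events are counted between 5-second ticks, and each tick folds the instant rate into an exponentially weighted moving average. This is an illustration under that assumption, not HiveMQ Swarm's code; the class names are hypothetical, and time advances through explicit `tick()` calls instead of a clock.

```python
import math

TICK_SECONDS = 5.0  # Dropwizard-style meters refresh their moving averages every 5 seconds


class MovingAverage:
    """Exponentially weighted moving average of an event rate, in events per second."""

    def __init__(self, minutes: float) -> None:
        # Smoothing factor so that old ticks fade out over roughly `minutes` minutes.
        self.alpha = 1.0 - math.exp(-TICK_SECONDS / 60.0 / minutes)
        self.rate = 0.0
        self.pending = 0  # events marked since the last tick
        self.started = False

    def tick(self) -> None:
        instant_rate = self.pending / TICK_SECONDS
        self.pending = 0
        if self.started:
            self.rate += self.alpha * (instant_rate - self.rate)
        else:
            self.rate = instant_rate  # the first tick seeds the average
            self.started = True


class Meter:
    """Counts events and tracks their mean rate and 1-, 5-, and 15-minute rates."""

    def __init__(self) -> None:
        self.count = 0
        self.elapsed_seconds = 0.0
        self.averages = {minutes: MovingAverage(minutes) for minutes in (1, 5, 15)}

    def mark(self, n: int = 1) -> None:
        self.count += n
        for average in self.averages.values():
            average.pending += n

    def tick(self) -> None:
        """Advance simulated time by one tick."""
        self.elapsed_seconds += TICK_SECONDS
        for average in self.averages.values():
            average.tick()

    def mean_rate(self) -> float:
        return self.count / self.elapsed_seconds if self.elapsed_seconds else 0.0


if __name__ == "__main__":
    meter = Meter()
    for _ in range(12):  # one minute at 10 events per second
        meter.mark(50)
        meter.tick()
    for _ in range(12):  # one idle minute
        meter.tick()
    print({m: round(a.rate, 2) for m, a in meter.averages.items()}, meter.mean_rate())
```

After the idle minute, the 1-minute rate has dropped to about 37% of its previous value, while the 15-minute rate has barely moved. This is why short-lived load spikes show up mainly in the 1-minute average.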
| Metric Name | Type | Description |
|---|---|---|
| agent_end_to_end_latency_histogram | Histogram | The time between sending a publish and its receipt by the subscribers (MQTT 5 only). |
| | | The number of connection attempts. |
| | | The number of failed connection attempts. |
| | | The number of successful connection attempts. |
| | | The number of outgoing publish messages. |
| | | The number of successful publish messages. |
| | | The number of successful publish messages. |
| | | The number of incoming publish messages. |
| | | The number of outgoing subscribes. |
| | | The number of successful subscribes. |
| | | The number of failed subscribes. |
| | | The number of outgoing unsubscribes. |
| | | The number of successful unsubscribes. |
| | | The number of failed unsubscribes. |
| | | The rate of connection attempts. |
| | | The rate of successful connection attempts. |
| | | The rate of failed connection attempts. |
| | | The rate of outgoing subscribes. |
| | | The rate of successful subscribes. |
| | | The rate of failed subscribes. |
| | | The rate of outgoing unsubscribes. |
| | | The rate of successful unsubscribes. |
| | | The rate of failed unsubscribes. |
| | | The rate of outgoing QoS 0 publishes. |
| | | The rate of successful QoS 0 publishes. |
| | | The rate of failed QoS 0 publishes. |
| | | The rate of the payloads of outgoing QoS 0 publishes. |
| | | The rate of outgoing QoS 1 publishes. |
| | | The rate of successful QoS 1 publishes. |
| | | The rate of failed QoS 1 publishes. |
| | | The rate of the payloads of outgoing QoS 1 publishes. |
| | | The rate of outgoing QoS 2 publishes. |
| | | The rate of successful QoS 2 publishes. |
| | | The rate of failed QoS 2 publishes. |
| | | The rate of the payloads of outgoing QoS 2 publishes. |
| | | The rate of outgoing publishes. |
| | | The rate of successful publishes. |
| | | The rate of failed publishes. |
| | | The rate of the payloads of outgoing publishes. |
| | | The rate of outgoing QoS 0 publishes. |
| | | The rate of the payloads of successful QoS 0 publishes. |
| | | The rate of outgoing QoS 1 publishes. |
| | | The rate of the payloads of successful QoS 1 publishes. |
| | | The rate of outgoing QoS 2 publishes. |
| | | The rate of the payloads of successful QoS 2 publishes. |
| | | The rate of total outgoing publishes. |
| | | The rate of the payloads of all successful publishes. |
| | | The rate of outgoing connects. |