HiveMQ Clusters
One of the outstanding features of HiveMQ is the ability to form resilient, highly available, and ultra-scalable MQTT broker clusters.
An MQTT broker cluster is a distributed system that acts as a single logical MQTT broker for connected MQTT clients. Typically, the nodes of an MQTT cluster are installed on separate physical or virtual machines and connected over a network.
Clustering ensures that your MQTT communication has no single point of failure and is indispensable for building a service with high availability. If one MQTT broker node in the cluster is no longer available, the remaining nodes can handle the traffic from your MQTT clients.
HiveMQ employs a cluster design that is built specifically for MQTT. HiveMQ MQTT broker clusters implement a distributed masterless cluster architecture that provides true horizontal scalability. Each HiveMQ broker node can handle hundreds of thousands to millions of concurrently connected MQTT clients.
Each HiveMQ cluster can grow and shrink elastically at runtime with no loss of data or decrease in availability.
We highly recommend HiveMQ clusters for your production IoT deployments.
Enable Clustering
When you enable a HiveMQ cluster, it is important to select the transport type (TCP or UDP) and the discovery mechanism that fit your individual use case.
For more information, see Cluster Discovery and Cluster Transport.
The following example configuration uses static discovery and TCP transport to form a HiveMQ broker cluster:
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
<cluster>
<enabled>true</enabled>
<transport>
<tcp>
<!-- replace this IP with the IP address of your interface -->
<bind-address>192.168.1.1</bind-address>
<bind-port>7800</bind-port>
</tcp>
</transport>
<discovery>
<static>
<node>
<!-- replace this IP with the IP address of your interface -->
<host>192.168.1.1</host>
<port>7800</port>
</node>
<node>
<!-- replace this IP with the IP address of another node -->
<host>192.168.1.2</host>
<port>7800</port>
</node>
</static>
</discovery>
</cluster>
...
</hivemq>
This example shows the minimal configuration to enable a HiveMQ cluster with the default values:
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
<cluster>
<enabled>true</enabled>
</cluster>
...
</hivemq>
If you do not explicitly define the transport and discovery in your cluster configuration, HiveMQ uses UDP for data transport and multicast as the discovery mechanism.
The minimal HiveMQ cluster configuration is only recommended for testing purposes in local networks.
Cluster Discovery
Discovery refers to the mechanism that individual HiveMQ broker nodes use to find each other and form a HiveMQ broker cluster.
Both static and dynamic discovery mechanisms are available.
Dynamic discovery methods are ideal for increasing the size of the HiveMQ broker cluster during runtime.
Static discovery is useful for HiveMQ deployments that maintain a fixed cluster size.
HiveMQ supports several cluster discovery mechanisms.
Not all discovery mechanisms and transport protocols are compatible.
Name | UDP transport | TCP transport | Dynamic | Description |
---|---|---|---|---|
static | no | yes | no | Checks a static list of nodes with their TCP bind ports and IP addresses. |
multicast | yes | no | yes | Finds all nodes that use the same multicast address and port. |
broadcast | no | yes | yes | Finds all nodes in the same subnet by using the IP broadcast address. |
extension | no | yes | yes | Uses information provided by a HiveMQ extension to discover cluster nodes. |
Additional custom mechanisms can be implemented with extension discovery.
Static Discovery
When you use the static cluster discovery mechanism, your configuration must list each HiveMQ broker node that is intended to form the cluster as well as the IP address and port the node uses for TCP cluster transport.
Each HiveMQ node regularly checks all nodes that are included in the static discovery list and tries to include them in the cluster.
Including the HiveMQ broker node’s own IP address and port in the cluster node list is recommended but not required.
When the focus is on providing high availability for the MQTT broker, static discovery with a fixed-size HiveMQ cluster deployment is often utilized.
Static discovery is also a viable option if multicast and broadcast are not available in the environment on which the HiveMQ cluster is deployed.
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
<cluster>
<enabled>true</enabled>
<transport>
<tcp>
<!-- replace this IP with the IP address of your interface -->
<bind-address>192.168.1.1</bind-address>
<bind-port>7800</bind-port>
</tcp>
</transport>
<discovery>
<static>
<node>
<!-- replace this IP with the IP address of your interface -->
<host>192.168.1.1</host>
<port>7800</port>
</node>
<node>
<!-- replace this IP with the IP address of another node -->
<host>192.168.1.2</host>
<port>7800</port>
</node>
<node>
<!-- replace this IP with the IP address of another node -->
<host>192.168.1.3</host>
<port>7801</port>
</node>
</static>
</discovery>
</cluster>
...
</hivemq>
For increased failure resistance and stability, we recommend that you provide a full list of all HiveMQ broker nodes to each individual node.
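As a sketch of this recommendation, the second node from the example above (192.168.1.2) could carry the same complete node list and differ only in its own bind address. The addresses and ports are the placeholder values from the example, not values that HiveMQ requires.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <cluster>
        <enabled>true</enabled>
        <transport>
            <tcp>
                <!-- only this address differs from the configuration of the first node -->
                <bind-address>192.168.1.2</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>
        <discovery>
            <static>
                <!-- the same full list on every node, including the node itself -->
                <node>
                    <host>192.168.1.1</host>
                    <port>7800</port>
                </node>
                <node>
                    <host>192.168.1.2</host>
                    <port>7800</port>
                </node>
                <node>
                    <!-- the third node uses cluster port 7801 in the example above -->
                    <host>192.168.1.3</host>
                    <port>7801</port>
                </node>
            </static>
        </discovery>
    </cluster>
</hivemq>
```

Keeping the list identical on all nodes means that each node can still find the rest of the cluster when any single peer is down.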
Multicast Discovery
Multicast discovery is a dynamic discovery mechanism that utilizes an IP multicast address configured with your UDP transport.
When you use multicast discovery, nodes on the defined IP multicast address regularly check for other nodes that listen on the same IP address.
If you have control over your network infrastructure or require an automatic discovery mechanism that detects cluster nodes when they start up in the same network, UDP multicast can work well.
Since multicast discovery is a master-master scenario, there is no single point of failure on the HiveMQ node side. With multicast discovery, you can simply add new HiveMQ nodes to quickly scale up your HiveMQ cluster.
UDP can provide a good starting point for evaluating the functionality of a HiveMQ cluster in your local test or development environment.
Before you set up a HiveMQ cluster with multicast discovery, make sure that multicast is enabled and configured correctly. Keep in mind that most cloud providers, including AWS, do not permit IP multicast (including UDP multicast).
The following configuration example enables HiveMQ clustering with UDP multicast. HiveMQ instances automatically form a cluster when the nodes discover each other:
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<listeners>
<tcp-listener>
<port>1883</port>
<bind-address>0.0.0.0</bind-address>
</tcp-listener>
</listeners>
<cluster>
<enabled>true</enabled>
<transport>
<udp>
<!-- replace this IP with the IP address of your interface -->
<bind-address>192.168.1.10</bind-address>
<bind-port>8000</bind-port>
<multicast-enabled>true</multicast-enabled>
<!-- replace this IP with the multicast IP address of your interface -->
<multicast-address>228.8.8.8</multicast-address>
<multicast-port>45588</multicast-port>
</udp>
</transport>
<discovery>
<multicast/>
</discovery>
</cluster>
</hivemq>
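To illustrate how nodes find each other, here is a sketch of what a second node in the same network might use. The bind address 192.168.1.11 is a hypothetical placeholder. The multicast address and port are the same as in the example above, since nodes look for peers on the same multicast address.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <listeners>
        <tcp-listener>
            <port>1883</port>
            <bind-address>0.0.0.0</bind-address>
        </tcp-listener>
    </listeners>
    <cluster>
        <enabled>true</enabled>
        <transport>
            <udp>
                <!-- hypothetical address of the second node's interface -->
                <bind-address>192.168.1.11</bind-address>
                <bind-port>8000</bind-port>
                <multicast-enabled>true</multicast-enabled>
                <!-- same multicast address and port as on the first node -->
                <multicast-address>228.8.8.8</multicast-address>
                <multicast-port>45588</multicast-port>
            </udp>
        </transport>
        <discovery>
            <multicast/>
        </discovery>
    </cluster>
</hivemq>
```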
Cluster configuration is a complex subject. Various factors outside of HiveMQ can cause errors.
Broadcast Discovery
Broadcast discovery is a dynamic discovery mechanism that sends discovery messages over the IP broadcast address to find other cluster nodes in the same IP subnet.
The following broadcast discovery parameters are available:
Parameter | Default value | Description |
---|---|---|
broadcast-address | 255.255.255.255 | Broadcast address to be used. This should be configured to the broadcast address of your subnet. Example: 192.168.1.255. |
port | 8555 | Port on which the nodes exchange discovery messages. This port must be in the same port-range on all nodes. |
port-range | 5 | Number of additional ports to check for other nodes. The range goes from |
When HiveMQ is deployed in an environment that allows broadcasting, broadcast discovery can be a viable option to create an elastic HiveMQ broker cluster with relative ease.
Typical use cases include testing, development, and integration environments as well as on-premises infrastructure.
Most cloud providers do not allow broadcasting.
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
<cluster>
<enabled>true</enabled>
<transport>
<tcp>
<!-- replace this IP with the IP address of your interface -->
<bind-address>192.168.1.1</bind-address>
<bind-port>7800</bind-port>
</tcp>
</transport>
<discovery>
<broadcast>
<!-- replace this IP with the broadcast IP address of your subnet -->
<broadcast-address>192.168.1.255</broadcast-address>
<port>8555</port>
<port-range>5</port-range>
</broadcast>
</discovery>
</cluster>
...
</hivemq>
Broadcast discovery only works if all your nodes are in the same IP subnet.
Extension Discovery
Extension discovery delegates the discovery of cluster nodes to a HiveMQ extension.
You can use extension discovery to extend HiveMQ with custom discovery logic that fulfills the needs of specific use cases.
For more information on creating a custom discovery extension, see our Extension Guide.
Discovery extensions for common use cases such as dynamic discovery on AWS or Docker are available from the HiveMQ Marketplace.
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
<cluster>
<enabled>true</enabled>
...
<transport>
<tcp>
<!-- replace this IP with the IP address of your interface -->
<bind-address>192.168.1.1</bind-address>
<bind-port>7800</bind-port>
</tcp>
</transport>
<discovery>
<extension/>
</discovery>
...
</cluster>
...
</hivemq>
If you want to use an extension for discovery, the discovery setting of your HiveMQ cluster configuration must be set to extension and a discovery extension must be installed.
Cluster Transport
Cluster transport determines the network protocol that is used to transfer information between the HiveMQ broker nodes in a cluster.
HiveMQ supports TCP, UDP, and TLS for cluster transport.
Transport type | Description |
---|---|
TCP | Communication within the cluster is done via TCP. |
UDP | Communication within the cluster is done via UDP. |
TLS | Communication within the cluster is done via TLS. |
TCP is the recommended transport protocol.
TCP Transport
TCP (Transmission Control Protocol) is a standard Internet protocol that provides reliable, ordered data delivery with error detection. Due to its reliability and widespread availability across all environments, TCP is the recommended transport protocol for HiveMQ broker clusters.
TCP transport can be configured with the following parameters:
Parameter | Default value | Description |
---|---|---|
bind-address | null | The network address to bind to. Example: 192.168.28.12 |
bind-port | 8000 | The network port to listen on. |
external-address | null | The external address to use if the node is behind some kind of NAT (Network Address Translation). |
external-port | 0 | The external port to use if the node is behind some kind of NAT. |
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
<cluster>
<enabled>true</enabled>
<transport>
<tcp>
<!-- replace this IP with the local IP address of your interface -->
<bind-address>192.168.1.2</bind-address>
<bind-port>8000</bind-port>
<!-- replace this IP with the external IP address of your interface -->
<external-address>10.10.10.10</external-address>
<external-port>8008</external-port>
</tcp>
</transport>
<discovery>
<broadcast/>
</discovery>
</cluster>
...
</hivemq>
TCP is the recommended transport protocol for HiveMQ clusters.
UDP Transport
UDP (User Datagram Protocol) is a network protocol that uses a simple connectionless communication model.
UDP does not have mechanisms to handle unreliable networks, guarantee message delivery, or provide ordering or duplicate protection.
Since most cloud providers prohibit the use of multicast, UDP cannot be used as the transport protocol in these environments.
Because UDP multicast is a quick way to configure and establish a HiveMQ broker cluster, UDP can be useful as the HiveMQ cluster transport method for local testing and development environments.
However, UDP transport is not suitable for production environments.
UDP transport can be configured with the following parameters:
Parameter | Default value | Description |
---|---|---|
bind-address | null | The network bind address. For example, 192.168.28.12. |
bind-port | 8000 | The network listening port. |
external-address | null | The external bind address if the node is behind Network Address Translation (NAT), for example behind a firewall. |
external-port | 0 | The external bind port if the node is behind NAT. |
multicast-enabled | true | If UDP multicast is used, multicast discovery must be enabled. |
multicast-address | 228.8.8.8 | The multicast network bind address. |
multicast-port | 45588 | The multicast listening port. |
The only cluster discovery method that can be used with UDP transport is multicast discovery.
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
<cluster>
<enabled>true</enabled>
<transport>
<udp>
<!-- replace this IP with the IP address of your interface -->
<bind-address>192.168.1.2</bind-address>
<bind-port>8000</bind-port>
<multicast-enabled>true</multicast-enabled>
<!-- replace this IP with the multicast IP address of your interface -->
<multicast-address>228.8.8.8</multicast-address>
<multicast-port>45588</multicast-port>
</udp>
</transport>
<discovery>
<multicast/>
</discovery>
</cluster>
...
</hivemq>
We do not recommend the use of UDP transport for production environments.
Secure TCP Transport with TLS
TLS is a cryptographic protocol that provides communications security. To use TLS, configure TCP as the cluster transport and add a <tls> configuration to it.
The primary purpose of using a secure TLS connection is to provide an encrypted connection between the HiveMQ broker nodes. This ensures privacy and data integrity. More details on the subject can be found in the Security chapter.
Encryption requires complex mathematical operations and calculations. This creates additional computing load on any service.
HiveMQ is no exception. The best performance can be achieved when the security layer is provided around the plain TCP cluster transport, for example, by a secure network zone for the HiveMQ cluster nodes.
Typical use cases for TLS-encrypted TCP cluster transport are driven by individual security requirements.
TLS transport can be configured using the following parameters:
Parameter | Default value | Required | Description |
---|---|---|---|
enabled | false | | Enables TLS in the cluster transport. |
protocols | All JVM enabled protocols | | Enables specific protocols. |
cipher-suites | All JVM enabled cipher suites | | Enables specific cipher suites. |
server-keystore | null | | The JKS key store configuration for the server certificate. |
server-certificate-truststore | null | | The JKS trust store configuration for trusting server certificates. |
client-authentication-mode | NONE | | The client authentication mode. Possible values are NONE, OPTIONAL (a client certificate is used if presented), and REQUIRED (a client certificate is required). |
client-authentication-keystore | null | | The JKS key store configuration for the client authentication certificate. |
client-certificate-truststore | null | | The JKS trust store configuration for trusting client certificates. |
Parameter | Default value | Required | Description |
---|---|---|---|
path | null | | The path to the JKS key store. |
password | null | | The password for the key store. |
private-key-password | null | | The password for the private key. |
Parameter | Default value | Required | Description |
---|---|---|---|
path | null | | The path for the JKS trust store that includes trusted certificates. |
password | null | | The password for the trust store. |
When you use the same certificate for all your HiveMQ cluster nodes, you can use the same JKS (Java KeyStore) as key store and trust store.
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
<cluster>
<enabled>true</enabled>
<transport>
<tcp>
<!-- replace this IP with the IP address of your interface -->
<bind-address>192.168.2.31</bind-address>
<bind-port>7800</bind-port>
<tls>
<enabled>true</enabled>
<server-keystore>
<path>/path/to/the/key/server.jks</path>
<password>password-keystore</password>
<private-key-password>password-private-key</private-key-password>
</server-keystore>
<server-certificate-truststore>
<path>/path/to/the/trust/server.jks</path>
<password>password-truststore</password>
</server-certificate-truststore>
</tls>
</tcp>
</transport>
</cluster>
...
</hivemq>
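When all nodes share one certificate, the key store and the trust store can point to the same JKS file. The following is a minimal sketch of that setup. The file path, the passwords, and the static node list are hypothetical placeholders.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <cluster>
        <enabled>true</enabled>
        <transport>
            <tcp>
                <bind-address>192.168.2.31</bind-address>
                <bind-port>7800</bind-port>
                <tls>
                    <enabled>true</enabled>
                    <server-keystore>
                        <!-- hypothetical shared JKS file -->
                        <path>/path/to/the/shared/cluster.jks</path>
                        <password>password-jks</password>
                        <private-key-password>password-private-key</private-key-password>
                    </server-keystore>
                    <server-certificate-truststore>
                        <!-- same file and password as the key store above -->
                        <path>/path/to/the/shared/cluster.jks</path>
                        <password>password-jks</password>
                    </server-certificate-truststore>
                </tls>
            </tcp>
        </transport>
        <discovery>
            <static>
                <node>
                    <host>192.168.2.31</host>
                    <port>7800</port>
                </node>
                <node>
                    <host>192.168.2.32</host>
                    <port>7800</port>
                </node>
            </static>
        </discovery>
    </cluster>
</hivemq>
```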
Only the transport can be done over TLS. Discovery always uses plain TCP.
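The client authentication parameters from the table above could be combined as in the following sketch. The element names come from the parameter table. Their nesting and child elements are an assumption that mirrors the server-keystore and server-certificate-truststore elements, and all paths, passwords, and node addresses are hypothetical placeholders. Check the structure against the documentation for your HiveMQ version before use.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <cluster>
        <enabled>true</enabled>
        <transport>
            <tcp>
                <bind-address>192.168.2.31</bind-address>
                <bind-port>7800</bind-port>
                <tls>
                    <enabled>true</enabled>
                    <server-keystore>
                        <path>/path/to/the/key/server.jks</path>
                        <password>password-keystore</password>
                        <private-key-password>password-private-key</private-key-password>
                    </server-keystore>
                    <server-certificate-truststore>
                        <path>/path/to/the/trust/server.jks</path>
                        <password>password-truststore</password>
                    </server-certificate-truststore>
                    <!-- REQUIRED: a client certificate is required -->
                    <client-authentication-mode>REQUIRED</client-authentication-mode>
                    <!-- assumed to take the same children as server-keystore -->
                    <client-authentication-keystore>
                        <path>/path/to/the/key/client.jks</path>
                        <password>password-client-keystore</password>
                        <private-key-password>password-client-private-key</private-key-password>
                    </client-authentication-keystore>
                    <!-- assumed to take the same children as server-certificate-truststore -->
                    <client-certificate-truststore>
                        <path>/path/to/the/trust/client.jks</path>
                        <password>password-client-truststore</password>
                    </client-certificate-truststore>
                </tls>
            </tcp>
        </transport>
        <discovery>
            <static>
                <node>
                    <host>192.168.2.31</host>
                    <port>7800</port>
                </node>
                <node>
                    <host>192.168.2.32</host>
                    <port>7800</port>
                </node>
            </static>
        </discovery>
    </cluster>
</hivemq>
```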
Cluster Failure Detection
For added cluster stability and fault tolerance, HiveMQ provides two failure detection mechanisms. To ensure that failover scenarios are detected quickly and efficiently, both mechanisms are enabled by default.
Mechanism | Description |
---|---|
Heartbeat | Continuously sends a heartbeat between nodes. |
TCP health check | Holds an open TCP connection between nodes. |
The default values are suited for most HiveMQ deployments. Change them only if you have a specific reason to do so.
Heartbeat
To ensure all currently connected cluster nodes are available and responding, a continuous heartbeat is sent between all nodes of the HiveMQ broker cluster.
A node that does not respond to a heartbeat within the configured time is suspected to be unavailable and is removed from the cluster by the node that sent the initial heartbeat.
You can configure the heartbeat with the following parameters:
Parameter | Default value | Description |
---|---|---|
enabled | true | Enables the heartbeat. |
interval | 3000 (TCP) / 8000 (UDP) | The interval in which a heartbeat message is sent to other nodes. |
timeout | 9000 (TCP) / 40000 (UDP) | Amount of time that is tolerated for the response to a heartbeat message before a node is temporarily removed from the cluster. |
The port used for the heartbeat cannot be configured. The transport port is used for this mechanism (default: 8000).
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
<cluster>
<enabled>true</enabled>
<transport>
<udp>
<!-- replace this IP with the IP address of your interface -->
<bind-address>192.168.1.1</bind-address>
<bind-port>7800</bind-port>
</udp>
</transport>
<discovery>
<multicast />
</discovery>
<failure-detection>
<heartbeat>
<enabled>true</enabled>
<interval>5000</interval>
<timeout>15000</timeout>
</heartbeat>
</failure-detection>
</cluster>
...
</hivemq>
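For comparison, the following sketch uses TCP transport and spells out the TCP default values from the parameter table (interval 3000, timeout 9000). Writing the defaults explicitly is optional and is shown here only to make the values visible. The node addresses are placeholders.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <cluster>
        <enabled>true</enabled>
        <transport>
            <tcp>
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>
        <discovery>
            <static>
                <node>
                    <host>192.168.1.1</host>
                    <port>7800</port>
                </node>
                <node>
                    <host>192.168.1.2</host>
                    <port>7800</port>
                </node>
            </static>
        </discovery>
        <failure-detection>
            <heartbeat>
                <enabled>true</enabled>
                <!-- documented default for TCP transport -->
                <interval>3000</interval>
                <!-- documented default for TCP transport, three times the interval -->
                <timeout>9000</timeout>
            </heartbeat>
        </failure-detection>
    </cluster>
</hivemq>
```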
TCP Health Check
The TCP health check holds an open TCP connection to other nodes for the purpose of recognizing a disconnecting node much faster than the heartbeat could. Additionally, the TCP health check enables nodes to disconnect immediately from a cluster.
You can configure the TCP health check with the following parameters:
Parameter | Default value | Description |
---|---|---|
enabled | true | Enables the TCP health check. |
bind-address | null | The network address to bind to. |
bind-port | 0 | The port to bind to. 0 uses an ephemeral port. |
external-address | null | The external address to bind to if the node is behind some kind of NAT. |
external-port | 0 | The external port to bind to if the node is behind some kind of NAT. |
port-range | 50 | Port range to check on other nodes. |
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
<cluster>
<enabled>true</enabled>
<transport>
<udp>
<!-- replace this IP with the IP address of your interface -->
<bind-address>192.168.1.1</bind-address>
<bind-port>7800</bind-port>
</udp>
</transport>
<discovery>
<multicast />
</discovery>
<failure-detection>
<tcp-health-check>
<enabled>true</enabled>
<!-- replace this IP with the local IP address of your interface -->
<bind-address>1.2.3.4</bind-address>
<bind-port>0</bind-port>
<!-- replace this IP with the external IP address of your interface -->
<external-address>10.10.2.30</external-address>
<external-port>0</external-port>
<port-range>50</port-range>
</tcp-health-check>
</failure-detection>
</cluster>
...
</hivemq>
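Since both mechanisms are enabled by default, a configuration that tunes them together might look like the following sketch. It assumes that the heartbeat and tcp-health-check elements can sit side by side in one failure-detection element, which the examples above show only separately. The heartbeat values are the ones from the heartbeat example, and the addresses are placeholders.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <cluster>
        <enabled>true</enabled>
        <transport>
            <tcp>
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>
        <discovery>
            <broadcast/>
        </discovery>
        <failure-detection>
            <!-- assumption: both mechanisms are configured in the same block -->
            <heartbeat>
                <enabled>true</enabled>
                <interval>5000</interval>
                <timeout>15000</timeout>
            </heartbeat>
            <tcp-health-check>
                <enabled>true</enabled>
                <bind-address>192.168.1.1</bind-address>
                <!-- 0 uses an ephemeral port -->
                <bind-port>0</bind-port>
                <port-range>50</port-range>
            </tcp-health-check>
        </failure-detection>
    </cluster>
</hivemq>
```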
Cluster Replicas
HiveMQ broker clusters replicate stored data across nodes dynamically. Replication guarantees that each piece of persistent data is available on more than one node.
You can configure the number of replicas that must be persisted across the nodes of a HiveMQ cluster before each node deems the replication sufficient.
Each time the HiveMQ cluster size changes, all persisted data is redistributed to ensure the configured replica count is upheld.
The default replica count for your HiveMQ cluster is 2. When the replica count value is 2, HiveMQ ensures that two copies of each persisted data set are available in the cluster at all times (one original and one replica).
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
<cluster>
...
<replication>
<replica-count>2</replica-count>
</replication>
</cluster>
...
</hivemq>
The maximum replica count for a cluster equals the number of nodes in the cluster. For example, the maximum replica count for a 3-node HiveMQ cluster is 3. When your replica count matches the size of your cluster, full replication is established and all pieces of persistent data are stored on every node. Typically, the number of nodes in the cluster is higher than the replica count that you configure. If the replica count in your configuration exceeds the number of nodes in the cluster, HiveMQ automatically reduces replication to the maximum. When you configure a replica count, keep in mind that the number of nodes in elastic HiveMQ cluster deployments can increase and decrease.
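As an illustration of full replication, the following sketch configures a fixed 3-node cluster with a replica count of 3, so that every node stores all persistent data. The node addresses are placeholders, and the static node list is only one possible way to keep the cluster at three nodes.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <cluster>
        <enabled>true</enabled>
        <transport>
            <tcp>
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>
        <discovery>
            <static>
                <node>
                    <host>192.168.1.1</host>
                    <port>7800</port>
                </node>
                <node>
                    <host>192.168.1.2</host>
                    <port>7800</port>
                </node>
                <node>
                    <host>192.168.1.3</host>
                    <port>7800</port>
                </node>
            </static>
        </discovery>
        <replication>
            <!-- equals the number of nodes: full replication -->
            <replica-count>3</replica-count>
        </replication>
    </cluster>
</hivemq>
```

Full replication trades disk space on every node for the ability to lose all but one node without losing persistent data.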
Rolling Upgrade
Before you upgrade your cluster, check our Upgrade Guide to see if additional or manual steps are required to upgrade from your current HiveMQ version to your desired HiveMQ version.
HiveMQ uses semantic versioning (major.minor.patch). Before you upgrade to a new minor version, we recommend that you upgrade your cluster to the latest version of your current minor version. For example, upgrade to the most recent version of 4.1.x before you upgrade to 4.2.x.
Rolling upgrades are supported from one minor version to the next (for example, 4.1.x to 4.2.x). If you want to upgrade across multiple minor versions, do so in iterative steps of one minor version. For example, to get from 4.1.x to 4.4.x, the steps are 4.1.x to 4.2.x, then to 4.3.x, then to 4.4.x.
Rolling Upgrade of Your Cluster
1. Add a node with the desired new version to your cluster. We recommend this optional step. For more information, see steady resource utilization.
2. Shut down one cluster node that currently runs the old version.
3. Update the node that you shut down with the desired new version.
4. Restart the updated node.
5. Repeat steps 2-4 of this procedure with each node in the cluster that runs the old version.
6. Once all nodes in the cluster are updated to the desired new version, remove the optional node that you added at the start of this procedure.
Maintain steady resource utilization during your upgrade
Before you shut down a node with an old version, we recommend that you add a node to your cluster that is configured with the target version. This method is particularly important for clusters that have high resource utilization, for example, clusters that run with over 60% CPU usage. When you remove a node from your cluster during the upgrade process, the workload of the node you remove is redistributed over the nodes that remain in the cluster. This redistributed workload increases the amount of work each remaining node must do. The addition of a new node, before you remove any of the existing nodes, helps you maintain a steady workload in your cluster as you upgrade each node to the new version. Once all previously existing nodes are upgraded, you can remove the node that you added to the cluster.
Node Synchronization
Before you shut down or start nodes in your cluster, check your HiveMQ log files to make sure that the synchronization of the node with the cluster is complete.
Wait until the log file of the node that you started or added to a cluster shows the INFO - Started HiveMQ in … log statement.
This statement indicates that the node is synchronized with the cluster.
During the synchronization process, the following log entries show you the current state of the node:
Log Entry (INFO Level) | Description |
---|---|
Starting cluster join process. This may take a while. Please do not shut down HiveMQ. | Node has started to join the cluster. |
Cluster join process is still ongoing. Please do not shut down HiveMQ. | Node is in the process of joining the cluster. |
Finished cluster join process successfully. | Node has completed the join process. |
Starting cluster merge process. This may take a while. Please do not shut down HiveMQ. | Node that is already part of the cluster but was temporarily unreachable (for example, due to a network split) is back and needs synchronization. |
Cluster merge process is still ongoing. Please do not shut down HiveMQ. | Node is merging with the cluster. |
Finished cluster merge process successfully. | Node has completed the merge process. |
Starting cluster replication process. This may take a while. Please do not shut down HiveMQ. | Node is starting to leave the cluster. Preparation of cluster replication has begun. |
Replication is still in progress. Please do not shut down HiveMQ. | Node is in the process of leaving the cluster. Cluster replication in progress. |
Finished cluster replication successfully in {}ms. | Node is ready to leave the cluster. Cluster replication is successful. |
Restart the Cluster with Persistent Data
On startup, HiveMQ automatically moves the persistent data to the hivemq/data/cluster-backup folder.
Persistent data
The persistent data includes the following items:
This automatic backup ensures that no data is lost when a node restarts. Each piece of persistent data is replicated to the remaining nodes. The replica-count setting controls the number of copies for each piece of persistent data the cluster maintains. By default the HiveMQ replica count is 2 (one original and one copy). To ensure that there is no loss of persistent data when you shut down or restart nodes in your cluster, verify that your replica count is set correctly:
Replica count for persistent data
If your cluster has persistent data, we highly recommend a
Shut Down Your Cluster
To ensure proper replication of data and avoid data loss, always shut down your cluster one node at a time.
The last node that you shut down must have enough disk space to store the data of the entire cluster.
When you restart the cluster, the last node that you shut down is the first node to start.
Restart Your Cluster
Go to the hivemq/bin folder of the last instance that you shut down. Execute the recovery.sh file.
The recovery.sh file starts HiveMQ but does not move persistent data to a backup folder.
As soon as the first instance is running, you can start the other instances with the run.sh file as usual.
We do not recommend or support starting more than one HiveMQ instance with the recovery.sh file.
Use of the recovery.sh file to start multiple HiveMQ instances can create inconsistency in your cluster.