HiveMQ MQTT Broker Clusters

One of the outstanding features of HiveMQ is the ability to form resilient, highly available, and ultra-scalable MQTT broker clusters.

An MQTT broker cluster is a distributed system that acts as a single logical MQTT broker for connected MQTT clients. Typically, the nodes of an MQTT cluster are installed on separate physical or virtual machines and connected over a network.

Clustering ensures that your MQTT communication has no single point of failure and is indispensable for building a service with high availability. If one MQTT broker node in the cluster is no longer available, the remaining nodes can handle the traffic from your MQTT clients.

HiveMQ employs a sophisticated cluster design that is built specifically for MQTT. HiveMQ MQTT broker clusters implement a distributed masterless cluster architecture that provides true horizontal scalability. Each HiveMQ broker node can handle hundreds of thousands to millions of concurrently connected MQTT clients.

Each HiveMQ cluster can grow and shrink elastically at runtime with no loss of data or decrease in availability.

We highly recommend HiveMQ clusters for your production IoT deployments.

Prerequisites

To form a HiveMQ cluster, the following features must be enabled on each broker node:

Clustering
Cluster discovery
Cluster transport

Enable Clustering

When you enable a HiveMQ cluster, it is important to select the type of transport (TCP/UDP) and discovery that is right for your individual use case.
For more information, see Cluster Discovery and Cluster Transport.

The following example configuration uses static discovery and TCP transport to form a HiveMQ broker cluster:

Example HiveMQ cluster configuration

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>
        <transport>
           <tcp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
           </tcp>
        </transport>
        <discovery>
            <static>
                <node>
                    <!-- replace this IP with the IP address of your interface -->
                    <host>192.168.1.1</host>
                    <port>7800</port>
                </node>
                <node>
                    <!-- replace this IP with the IP address of another node -->
                    <host>192.168.1.2</host>
                    <port>7800</port>
                </node>
            </static>
        </discovery>

    </cluster>
    ...
</hivemq>

This example shows the minimal configuration to enable a HiveMQ cluster with the default values:

Example minimal HiveMQ cluster configuration

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>
    </cluster>
    ...
</hivemq>

If you do not explicitly define the transport and discovery in your cluster configuration, HiveMQ uses UDP for data transport and multicast as the discovery mechanism.

The minimal HiveMQ cluster configuration is only recommended for testing purposes in local networks.
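
Written out explicitly, the defaults described above correspond roughly to the following sketch. The values are taken from the default column of the UDP transport parameters in the UDP Transport section (bind port 8000, multicast address 228.8.8.8, multicast port 45588). The bind address is left out because its documented default is null; whether your environment needs an explicit bind address is an assumption to verify.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>
        <!-- explicit form of the default transport: UDP with multicast -->
        <transport>
            <udp>
                <bind-port>8000</bind-port>
                <multicast-enabled>true</multicast-enabled>
                <multicast-address>228.8.8.8</multicast-address>
                <multicast-port>45588</multicast-port>
            </udp>
        </transport>
        <!-- explicit form of the default discovery: multicast -->
        <discovery>
            <multicast/>
        </discovery>
    </cluster>
    ...
</hivemq>
```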

Cluster Discovery

Discovery refers to the mechanism that individual HiveMQ broker nodes use to find each other and form a HiveMQ broker cluster.
Both static and dynamic discovery mechanisms are available.
Dynamic discovery methods are ideal for increasing the size of the HiveMQ broker cluster during runtime.
Static discovery is useful for HiveMQ deployments that maintain a fixed cluster size.
HiveMQ supports several cluster discovery mechanisms.

Not all discovery mechanisms and transport protocols are compatible.

Table 1. Available cluster discovery mechanisms
Name	UDP transport	TCP transport	Dynamic	Description
static	No	Yes	No	Checks a static list of nodes with their TCP bind ports and IP addresses
multicast	Yes	No	Yes	Finds all nodes that use the same multicast address and port
broadcast	No	Yes	Yes	Finds all nodes in the same subnet by using the IP broadcast address
extension	No	Yes	Yes	Uses information provided by a HiveMQ extension to discover cluster nodes

Additional custom mechanisms can be implemented with extension discovery.

Static Discovery

When you use the static cluster discovery mechanism, your configuration must list each HiveMQ broker node that is intended to form the cluster as well as the IP address and port the node uses for TCP cluster transport.
Each HiveMQ node regularly checks all nodes that are included in the static discovery list and tries to include them in the cluster.

Inclusion of the HiveMQ broker node’s own IP and port in the cluster node list is recommended but not required.

When the focus is on providing high availability for the MQTT broker, static discovery with a fixed-size HiveMQ cluster deployment is often utilized.
Static discovery is also a viable option if multicast and broadcast are not available in the environment on which the HiveMQ cluster is deployed.

Example static discovery configuration for a 3-node HiveMQ cluster

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>

        <enabled>true</enabled>

        <transport>
            <tcp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>

        <discovery>
            <static>
                <node>
                    <!-- replace this IP with the IP address of your interface -->
                    <host>192.168.1.1</host>
                    <port>7800</port>
                </node>
                <node>
                    <!-- replace this IP with the IP address of another node -->
                    <host>192.168.1.2</host>
                    <port>7800</port>
                </node>
                <node>
                    <!-- replace this IP with the IP address of another node -->
                    <host>192.168.1.3</host>
                    <port>7801</port>
                </node>
            </static>
        </discovery>

    </cluster>
    ...
</hivemq>

For increased failure resistance and stability, we recommend that you provide a full list of all HiveMQ broker nodes to each individual node.
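
To illustrate that recommendation, the configuration for the third node of the example cluster could look like the following sketch. It assumes that the node at 192.168.1.3 binds its cluster transport to port 7801, as the node list above implies. Only the transport section differs from the first node; the static node list stays identical on every node.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>

        <transport>
            <tcp>
                <!-- address and port of this node (third entry in the list below) -->
                <bind-address>192.168.1.3</bind-address>
                <bind-port>7801</bind-port>
            </tcp>
        </transport>

        <discovery>
            <static>
                <!-- same full node list as on every other node -->
                <node>
                    <host>192.168.1.1</host>
                    <port>7800</port>
                </node>
                <node>
                    <host>192.168.1.2</host>
                    <port>7800</port>
                </node>
                <node>
                    <host>192.168.1.3</host>
                    <port>7801</port>
                </node>
            </static>
        </discovery>

    </cluster>
    ...
</hivemq>
```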

Multicast Discovery

Multicast discovery is a dynamic discovery mechanism that utilizes an IP multicast address configured with your UDP transport.
When you use multicast discovery, nodes on the defined IP multicast address regularly check for other nodes that listen on the same IP address.

If you have control over your network infrastructure or require an automatic discovery mechanism that detects cluster nodes when they start up in the same network, UDP multicast can work well.
Since multicast discovery is a master-master scenario, there is no single point of failure on the HiveMQ node side. With multicast discovery, you can simply add new HiveMQ nodes to quickly scale up your HiveMQ cluster.

UDP can provide a good starting point for evaluating the functionality of a HiveMQ cluster in your local test or development environment.

Before you set up a HiveMQ cluster with multicast discovery, make sure that multicast is enabled and configured correctly.
Keep in mind that most cloud providers, including AWS, do not permit IP multicast (including UDP multicast).

The following configuration example enables HiveMQ clustering with UDP multicast. HiveMQ instances automatically form a cluster when the nodes discover each other:

Example configuration for an elastic UDP multicast cluster

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <listeners>
        <tcp-listener>
            <port>1883</port>
            <bind-address>0.0.0.0</bind-address>
        </tcp-listener>
    </listeners>

    <cluster>
        <enabled>true</enabled>
        <transport>
            <udp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.10</bind-address>
                <bind-port>8000</bind-port>
                <multicast-enabled>true</multicast-enabled>
                <!-- replace this IP with the multicast IP address of your interface -->
                <multicast-address>228.8.8.8</multicast-address>
                <multicast-port>45588</multicast-port>
            </udp>
        </transport>
        <discovery>
            <multicast/>
        </discovery>
    </cluster>

</hivemq>

Cluster configuration is a complex subject. Various factors outside of HiveMQ can cause errors.
Here are some common obstacles to HiveMQ clustering with UDP multicast:

A firewall is enabled and configured in a way that prevents clustering. (TIP: disable the firewall for testing)
The cluster nodes are not in the same network.
Multicast is not enabled on your machine.
The switch/router that is in use does not support multicast.

Broadcast Discovery

Broadcast discovery is a dynamic discovery mechanism that sends discovery messages over the IP broadcast address to find other cluster nodes in the same IP subnet.

The following broadcast discovery parameters are available:

Parameter	Default Value	Description
broadcast-address	255.255.255.255	Broadcast address to be used. This should be configured to the broadcast address of your subnet. Example: 192.168.1.255.
port	8555	Port on which the nodes exchange discovery messages. This port must be in the same port-range on all nodes.
port-range	5	Number of additional ports to check for other nodes. The range goes from `port` to `port+range`.

When HiveMQ is deployed in an environment that allows broadcasting, broadcast discovery can be a viable option to create an elastic HiveMQ broker cluster with relative ease.
Typical use cases include testing, development, and integration environments as well as on-premises infrastructure.
Most cloud providers do not allow broadcasting.

Example broadcast discovery configuration

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>

        <transport>
            <tcp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>

        <discovery>
            <broadcast>
                <!-- replace this IP with the broadcast IP address of your subnet -->
                <broadcast-address>192.168.1.255</broadcast-address>
                <port>8555</port>
                <port-range>5</port-range>
            </broadcast>
        </discovery>

    </cluster>
    ...
</hivemq>

Broadcast discovery only works if all your nodes are in the same IP subnet.
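
As a sketch of how a second node in the same subnet might be configured, the following example changes only the transport bind address. The address 192.168.1.2 is a placeholder; the broadcast address, discovery port, and port range match the first node so that both nodes exchange discovery messages within the same port range.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>

        <transport>
            <tcp>
                <!-- placeholder: IP address of the second node in the 192.168.1.0 subnet -->
                <bind-address>192.168.1.2</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>

        <discovery>
            <broadcast>
                <!-- identical discovery settings on every node of the cluster -->
                <broadcast-address>192.168.1.255</broadcast-address>
                <port>8555</port>
                <port-range>5</port-range>
            </broadcast>
        </discovery>

    </cluster>
    ...
</hivemq>
```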

Extension Discovery

Extension discovery delegates the discovery of cluster nodes to a HiveMQ extension.

You can use extension discovery to extend HiveMQ with custom discovery logic that fulfills the needs of a specific use case.
For more information on creating a custom discovery extension, see our Extension Guide.

Discovery extensions for common use cases such as dynamic discovery on AWS or Docker are available from the HiveMQ website.

Example extension discovery configuration

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>
        ...

        <transport>
            <tcp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>

        <discovery>
            <extension/>
        </discovery>

        ...
    </cluster>
    ...
</hivemq>

If you want to use an extension for discovery, the discovery setting of your HiveMQ cluster configuration must be set to extension and a discovery extension must be installed.

Cluster Transport

Cluster transport determines the network protocol that is used to transfer information between the HiveMQ broker nodes in a cluster.
HiveMQ supports TCP, UDP, and TLS for cluster transport.

Table 2. Available cluster transport types
Transport type	Description
TCP	Communication within the cluster is done via TCP.
UDP	Communication within the cluster is done via UDP.
TLS	Communication within the cluster is done via TLS.

TCP is the recommended transport protocol.

TCP Transport

TCP (Transmission Control Protocol) is a standard Internet protocol that provides reliable, ordered data delivery with error detection. Due to its reliability and widespread availability across all environments, TCP is the recommended transport protocol for HiveMQ broker clusters.

TCP transport can be configured with the following parameters:

Table 3. Available TCP transport parameters
Parameter	Default Value	Description
bind-address	null	The network address to bind to. Example: 192.168.28.12
bind-port	8000	The network port to listen on.
external-address	null	The external address to use if the node is behind some kind of NAT (Network Address Translation).
external-port	0	The external port to use if the node is behind some kind of NAT.

Example TCP transport configuration

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>

        <enabled>true</enabled>

        <transport>
            <tcp>
                <!-- replace this IP with the local IP address of your interface -->
                <bind-address>192.168.1.2</bind-address>
                <bind-port>8000</bind-port>

                <!-- replace this IP with the external IP address of your interface -->
                <external-address>10.10.10.10</external-address>
                <external-port>8008</external-port>
            </tcp>
        </transport>

        <discovery>
            <broadcast/>
        </discovery>

    </cluster>
    ...
</hivemq>

TCP is the recommended transport protocol for HiveMQ clusters.

UDP Transport

UDP (User Datagram Protocol) is a network protocol that uses a simple connectionless communication model. UDP does not have mechanisms to handle unreliable networks, guarantee message delivery, or provide ordering or duplicate protection.
Since most cloud providers prohibit the use of multicast, UDP cannot be used as the transport protocol in these environments.
Because UDP multicast is a quick way to configure and establish a HiveMQ broker cluster, UDP can be useful as the HiveMQ cluster transport method for local testing and development environments. However, UDP transport is not suitable for production environments.

UDP transport can be configured with the following parameters:

Table 4. Available UDP transport parameters
Parameter	Default Value	Description
bind-address	null	The network bind address. For example, 192.168.28.12.
bind-port	8000	The network listening port.
external-address	null	The external address to use if the node is behind Network Address Translation (NAT), for example behind a firewall.
external-port	0	The external port to use if the node is behind NAT.
multicast-enabled	true	If UDP multicast is used, multicast discovery must be enabled.
multicast-address	228.8.8.8	The multicast network bind address.
multicast-port	45588	The multicast listening port.

The only cluster discovery method that can be used with UDP transport is multicast discovery.

Example UDP transport configuration

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>

        <enabled>true</enabled>

        <transport>
            <udp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.2</bind-address>
                <bind-port>8000</bind-port>

                <multicast-enabled>true</multicast-enabled>
                <!-- replace this IP with the multicast IP address of your interface -->
                <multicast-address>228.8.8.8</multicast-address>
                <multicast-port>45588</multicast-port>
            </udp>
        </transport>

        <discovery>
            <multicast/>
        </discovery>

    </cluster>
    ...

</hivemq>

We do not recommend the use of UDP transport for production environments.

Secure TCP Transport with TLS

TLS is a cryptographic protocol that provides communications security. To use TLS for cluster transport, select TCP as the cluster transport and add a <tls> configuration to it.
The primary purpose of a secure TLS connection is to encrypt the communication between the HiveMQ broker nodes, which ensures privacy and data integrity. For more details, see the Security chapter.
Encryption requires complex mathematical operations that create additional computing load on any service, and HiveMQ is no exception.
For the best performance, provide the security layer outside of the cluster transport, for example by placing the HiveMQ cluster nodes in a secure network zone.
Use TLS-encrypted TCP as the cluster transport when your individual security requirements call for it.

TLS transport can be configured using the following parameters:

Table 5. Available TLS transport parameters
Parameter	Default Value	Description
enabled	false	Enables TLS in the cluster transport.
protocols	All JVM enabled protocols	Enables specific protocols.
cipher-suites	All JVM enabled cipher suites	Enables specific cipher suites.
server-keystore	null	The JKS key store configuration for the server certificate.
server-certificate-truststore	null	The JKS trust store configuration, for trusting server certificates.
client-authentication-mode	NONE	The client authentication mode. Possible values are NONE, OPTIONAL (the client certificate is used if presented), and REQUIRED (a client certificate is required).
client-authentication-keystore	null	The JKS key store configuration for the client authentication certificate.
client-certificate-truststore	null	The JKS trust store configuration, for trusting client certificates.

Table 6. Available TLS key store parameters
Parameter	Default Value	Description
path	null	The path to the JKS key store that contains the certificate and private key.
password	null	The password for the key store.
private-key-password	null	The password for the private key.

Table 7. Available TLS trust store parameters
Parameter	Default Value	Required	Description
path	null		The path for the JKS trust store that includes trusted certificates.
password	null		The password for the trust store.

When you use the same certificate for all your HiveMQ cluster nodes, you can use the same JKS (Java KeyStore) as key store and trust store.
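
A minimal sketch of that setup follows. Both the key store and the trust store elements point at the same file; the file name cluster.jks and the passwords are hypothetical placeholders.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    ...
    <cluster>
        <enabled>true</enabled>
        <transport>
            <tcp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.2.31</bind-address>
                <bind-port>7800</bind-port>
                <tls>
                    <enabled>true</enabled>
                    <!-- hypothetical shared JKS file used as key store -->
                    <server-keystore>
                        <path>/path/to/the/cluster.jks</path>
                        <password>password-keystore</password>
                        <private-key-password>password-private-key</private-key-password>
                    </server-keystore>
                    <!-- the same JKS file used as trust store -->
                    <server-certificate-truststore>
                        <path>/path/to/the/cluster.jks</path>
                        <password>password-keystore</password>
                    </server-certificate-truststore>
                </tls>
            </tcp>
        </transport>
    </cluster>
    ...
</hivemq>
```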

Example TLS transport configuration

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    ...
    <cluster>
        <enabled>true</enabled>
        <transport>
           <tcp>
                <!-- replace this IP with the IP address of your interface -->
               <bind-address>192.168.2.31</bind-address>
               <bind-port>7800</bind-port>
               <tls>
                 <enabled>true</enabled>
                 <server-keystore>
                    <path>/path/to/the/key/server.jks</path>
                    <password>password-keystore</password>
                    <private-key-password>password-private-key</private-key-password>
                 </server-keystore>
                 <server-certificate-truststore>
                    <path>/path/to/the/trust/server.jks</path>
                    <password>password-truststore</password>
                 </server-certificate-truststore>
               </tls>
           </tcp>
        </transport>
    </cluster>
    ...
</hivemq>
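
The example above configures only the server-side certificates. If nodes should also authenticate each other with client certificates, the client authentication parameters from the TLS transport parameter table can be added. The following sketch assumes that these elements nest inside <tls> in the same way as the server key store and trust store elements; all paths and passwords are placeholders.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    ...
    <cluster>
        <enabled>true</enabled>
        <transport>
            <tcp>
                <bind-address>192.168.2.31</bind-address>
                <bind-port>7800</bind-port>
                <tls>
                    <enabled>true</enabled>
                    <server-keystore>
                        <path>/path/to/the/key/server.jks</path>
                        <password>password-keystore</password>
                        <private-key-password>password-private-key</private-key-password>
                    </server-keystore>
                    <server-certificate-truststore>
                        <path>/path/to/the/trust/server.jks</path>
                        <password>password-truststore</password>
                    </server-certificate-truststore>
                    <!-- assumption: client authentication elements are children of tls -->
                    <client-authentication-mode>REQUIRED</client-authentication-mode>
                    <client-authentication-keystore>
                        <path>/path/to/the/key/client.jks</path>
                        <password>password-client-keystore</password>
                        <private-key-password>password-client-private-key</private-key-password>
                    </client-authentication-keystore>
                    <client-certificate-truststore>
                        <path>/path/to/the/trust/client.jks</path>
                        <password>password-client-truststore</password>
                    </client-certificate-truststore>
                </tls>
            </tcp>
        </transport>
    </cluster>
    ...
</hivemq>
```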

Only the transport can be done over TLS. Discovery always uses plain TCP.

Cluster Failure Detection

For added cluster stability and fault tolerance, HiveMQ provides two failure detection mechanisms. To ensure that failover scenarios are detected quickly and efficiently, both mechanisms are enabled by default.

Table 8. Failure detection mechanisms
Mechanism	Description
Heartbeat	Continuously sends a heartbeat between nodes.
TCP health check	Holds an open TCP connection between nodes.

The default values are suited for most HiveMQ deployments. Change them only if you have a specific reason to do so.

Heartbeat

To ensure all currently connected cluster nodes are available and responding, a continuous heartbeat is sent between all nodes of the HiveMQ broker cluster.
A node that does not respond to a heartbeat within the configured time is suspected to be unavailable and is removed from the cluster by the node that sent the heartbeat.

You can configure the heartbeat with the following parameters:

Table 9. Available heartbeat parameters
Parameter	Default Value	Description
enabled	true	Enables the heartbeat.
interval	3000 (TCP) / 8000 (UDP)	The interval in which a heartbeat message is sent to other nodes.
timeout	9000 (TCP) / 40000 (UDP)	Amount of time that is tolerated for the response to a heartbeat message before a node is temporarily removed from the cluster.

The port used for the heartbeat cannot be configured. The heartbeat uses the cluster transport port (default: 8000).

Example heartbeat configuration

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>

        <transport>
            <udp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </udp>
        </transport>

        <discovery>
            <multicast />
        </discovery>

        <failure-detection>
            <heartbeat>
                <enabled>true</enabled>
                <interval>5000</interval>
                <timeout>15000</timeout>
            </heartbeat>
        </failure-detection>
    </cluster>
    ...
</hivemq>

TCP Health Check

The TCP health check holds an open TCP connection to other nodes for the purpose of recognizing a disconnecting node much faster than the heartbeat could. Additionally, the TCP health check enables nodes to disconnect immediately from a cluster.

You can configure the TCP health check with the following parameters:

Table 10. Available TCP health check parameters
Parameter	Default Value	Description
enabled	true	Enables the TCP health check.
bind-address	null	The network address to bind to.
bind-port	0	The port to bind to. 0 uses an ephemeral port.
external-address	null	The external address to bind to if the node is behind some kind of NAT.
external-port	0	The external port to bind to if the node is behind some kind of NAT.
port-range	50	Port range to check on other nodes.

Example TCP health check configuration

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>

        <transport>
            <udp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </udp>
        </transport>

        <discovery>
            <multicast />
        </discovery>

        <failure-detection>
            <tcp-health-check>
                <enabled>true</enabled>
                <!-- replace this IP with the local IP address of your interface -->
                <bind-address>1.2.3.4</bind-address>
                <bind-port>0</bind-port>
                <!-- replace this IP with the external IP address of your interface -->
                <external-address>10.10.2.30</external-address>
                <external-port>0</external-port>
                <port-range>50</port-range>
            </tcp-health-check>
        </failure-detection>
    </cluster>
    ...
</hivemq>
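
Because both failure detection mechanisms are enabled by default, a deployment that tunes them will likely configure them together. The following sketch assumes that the heartbeat and TCP health check elements can appear side by side inside <failure-detection>. It uses TCP transport with static discovery, and the heartbeat values shown are the documented TCP defaults written out explicitly.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>

        <transport>
            <tcp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>

        <discovery>
            <static>
                <node>
                    <host>192.168.1.1</host>
                    <port>7800</port>
                </node>
                <node>
                    <host>192.168.1.2</host>
                    <port>7800</port>
                </node>
            </static>
        </discovery>

        <failure-detection>
            <!-- documented defaults for TCP transport -->
            <heartbeat>
                <enabled>true</enabled>
                <interval>3000</interval>
                <timeout>9000</timeout>
            </heartbeat>
            <!-- bind-port 0 selects an ephemeral port -->
            <tcp-health-check>
                <enabled>true</enabled>
                <bind-address>192.168.1.1</bind-address>
                <bind-port>0</bind-port>
                <port-range>50</port-range>
            </tcp-health-check>
        </failure-detection>
    </cluster>
    ...
</hivemq>
```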

Cluster Replicas

HiveMQ broker clusters replicate stored data across nodes dynamically. Replication guarantees that each piece of persistent data is available on more than one node.
You can configure the number of replicas that must be persisted across the nodes of a HiveMQ cluster before each node deems the replication sufficient.
Each time the HiveMQ cluster size changes, all persisted data is redistributed to ensure the configured replica count is upheld.

The default replica count for your HiveMQ cluster is 2. When the replica count value is 2, HiveMQ ensures that two copies of each persisted data set are available in the cluster at all times (one original and one replica).

Example cluster replication configuration

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        ...
        <replication>
            <replica-count>2</replica-count>
        </replication>
    </cluster>
    ...
</hivemq>

The maximum replica count for a cluster equals the number of nodes in the cluster. For example, the maximum replica count for a 3-node HiveMQ cluster is 3. When your replica count matches the size of your cluster, full replication is established and all pieces of persistent data are stored on every node.
Typically, the number of nodes in the cluster is higher than the replica count that you configure.
If the replica count in your configuration exceeds the number of nodes in the cluster, HiveMQ automatically reduces replication to the maximum.
When you configure a replica count, keep in mind that the number of nodes in elastic HiveMQ cluster deployments can increase and decrease.
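
As a sketch of the full-replication case described above, a fixed 3-node cluster could set the replica count to 3 so that every node stores all persistent data. The cluster size of 3 is an assumption of this example; the rest of the cluster configuration is elided.

```xml
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        ...
        <replication>
            <!-- equals the number of nodes in a 3-node cluster: full replication -->
            <replica-count>3</replica-count>
        </replication>
    </cluster>
    ...
</hivemq>
```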

Rolling Upgrade

Before you upgrade your cluster, check our Upgrade Guide to see if additional or manual steps are required to upgrade from your current HiveMQ version to your desired HiveMQ version.

HiveMQ uses semantic versioning (major.minor.patch).

Each monthly release is compatible with the next 18 monthly releases, which automatically includes the next LTS (long-term support) release. For more information, see HiveMQ Rolling Upgrade Policy.

Rolling Upgrade of Your Cluster

1. Add a node with the desired new version to your cluster. We recommend this optional step. For more information, see steady resource utilization.
2. Shut down one cluster node that currently runs the old version.
3. Update the node that you shut down with the desired new version.
4. Restart the updated node.
5. Repeat steps 2-4 of this procedure with each node in the cluster that runs the old version.
6. Once all nodes in the cluster are updated to the desired new version, remove the optional node that you added at the start of this procedure.

Maintain steady resource utilization during your upgrade

Before you shut down a node with an old version, we recommend that you add a node to your cluster that is configured with the target version. This method is particularly important for clusters that have high resource utilization, for example, a cluster that runs with over 60% CPU usage. When you remove a node from your cluster during the upgrade process, the workload of the node you remove is redistributed over the nodes that remain in the cluster. This redistributed workload increases the amount of work each remaining node must do. The addition of a new node, before you remove any of the existing nodes, helps you maintain a steady workload in your cluster as you upgrade each node to the new version. Once all previously existing nodes are upgraded, you can remove the node that you added to the cluster.

Node Synchronization

Before you shut down or start nodes in your cluster, check your HiveMQ log files to make sure that the synchronization of the node with the cluster is complete.

Wait until the log file of the node that you started or added to a cluster shows the INFO - Started HiveMQ in … log statement. This statement indicates that the node is synchronized with the cluster.

During the synchronization process, the following log entries show you the current state of the node:

Table 11. Synchronization process:
Log Entry (INFO Level)	Description
Starting cluster join process. This may take a while. Please do not shut down HiveMQ.	Node has started to join the cluster.
Cluster join process is still ongoing. Please do not shut down HiveMQ.	Node is in the process of joining the cluster.
Finished cluster join process successfully.	Node has completed the join process.
Starting cluster merge process. This may take a while. Please do not shut down HiveMQ.	A node that is already part of the cluster but was temporarily unreachable (for example, due to a network split) is back and needs synchronization.
Cluster merge process is still ongoing. Please do not shut down HiveMQ.	Node is merging with the cluster.
Finished cluster merge process successfully.	Node has completed the merge process.
Starting cluster replication process. This may take a while. Please do not shut down HiveMQ.	Node is starting to leave the cluster. Preparation of cluster replication has begun.
Replication is still in progress. Please do not shut down HiveMQ.	Node is in the process of leaving the cluster. Cluster replication in progress.
Finished cluster replication successfully in {}ms.	Node is ready to leave the cluster. Cluster replication is successful.

Cluster Disaster Recovery

In the event that a HiveMQ cluster experiences a partial or full loss of state, HiveMQ disaster recovery mechanisms ensure that you can quickly restore the availability of your cluster and prevent permanent loss of persistent data.

Disaster Detection

HiveMQ immediately logs a warning notification if all nodes or more nodes than your currently configured replication count fail in a HiveMQ cluster. The automatic warning message alerts you to the fact that not all replicas in the affected cluster are currently reachable due to the number of nodes that rapidly left the cluster. The log message also tells you the replication count that is currently configured for the cluster and the local time when the event was detected.

In most cases, a loss of state in your cluster is temporary and your HiveMQ cluster automatically recovers after a short period of time without any issues, for example, when the cluster experiences brief interruptions due to a loss of network connectivity.

When the number of nodes that fail and permanently leave a cluster exceeds the necessary replication count, persistent data can become unavailable in the cluster.

For more information on how the HiveMQ broker dynamically replicates data across cluster nodes, see Cluster Replicas.

Based on your individual use case, you can decide the level of action the warning notification requires.

Live Backup Import

Since HiveMQ 4.9.0, it is possible to import your HiveMQ backup into a running HiveMQ cluster that contains data and has active client sessions. This capability ensures that you can get your cluster up and running quickly and then restore your persistent data without any permanent loss of data from your backup files.

To learn more about the way HiveMQ resolves possible conflicts in the live import of persistent data, see Restore from backup file.

HiveMQ Recovery Tool

The HiveMQ Recovery Tool is a command line (CLI) tool that transforms the persisted data from your HiveMQ cluster into a format that you can live-import into a running HiveMQ cluster. The resulting files do not contain duplicates that can result from the replication of the data in the cluster.

The recovery tool takes the data folders from the nodes of your HiveMQ cluster as input and produces a HiveMQ backup file that is ready for live import into the running HiveMQ cluster.

Requirements

The HiveMQ Recovery Tool must run on a machine that has access to the data that you want to transform.

For the best performance, store the saved data and run the HiveMQ Recovery Tool on a machine other than the machines that host the HiveMQ cluster you want to restore.

Installation

The HiveMQ Recovery Tool is provided as a zip file. Run scripts are located in the bin folder of the zip archive.

Configuration

To run the HiveMQ Recovery Tool, execute the following command in the console and set the recovery tool parameters to match your individual use case:

hivemq-recovery-tool <parameters>

The recovery tool supports the following parameters:

Table 12. Available HiveMQ Recovery Tool parameters
Parameter	Required	Description
`-i` or `--import`		Defines the absolute paths to the data folders extracted from the HiveMQ broker nodes. Separate individual paths with a whitespace character. If the path contains a whitespace, enclose the path with quotation marks.
`-e` or `--export`		The absolute path to the directory that is used to store the exported data.
`-t` or `--threads`		Optional parameter to define the number of threads HiveMQ uses to conduct the export. The default value is the processor count. The default setting demonstrated the best performance during testing. You can adjust the threads parameter to suit your hardware.
`-so` or `--sessions-only`		Optional setting to export only sessions and subscriptions. When this parameter is set to `true`, the recovery tool does not export queued messages, retained messages, or shared subscriptions. The default setting is `false`.
`-io` or `--internal-option`		Optional setting to specify internal options that are set in your HiveMQ `config.xml` file. For example, "-io=key=value".

Example Recovery Tool Usage

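The following command reads the data folders of three HiveMQ nodes ($HOME/data1, $HOME/data2, and $HOME/data3) and writes the resulting backup file to the $HOME/export folder. Because the -t and -so parameters are not set, the tool uses the default thread count and exports all persistent data, not only sessions and subscriptions.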

Example recovery tool configuration

<recovery-tool-home>/bin/hivemq-recovery-tool -i $HOME/data1 -i $HOME/data2 -i $HOME/data3 -e $HOME/export
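When the number of saved data folders varies, the command line can be assembled by a small helper. The following is a minimal sketch, not part of the HiveMQ distribution: the function name `build_recovery_command` and the `RECOVERY_TOOL` variable are hypothetical, and the helper only prints the command for review instead of running it. It rejects relative paths because the `-i` and `-e` parameters expect absolute paths.

```shell
#!/bin/sh
# Hypothetical helper: prints a recovery tool command line for review.
# RECOVERY_TOOL is an assumed variable that points to the run script.
RECOVERY_TOOL="${RECOVERY_TOOL:-hivemq-recovery-tool}"

# Usage: build_recovery_command <export-dir> <data-dir>...
# Prints the command on stdout; returns 1 if any path is not absolute.
build_recovery_command() {
    export_dir="$1"
    shift
    case "$export_dir" in
        /*) ;;
        *) echo "export path must be absolute: $export_dir" >&2; return 1 ;;
    esac
    cmd="$RECOVERY_TOOL"
    for dir in "$@"; do
        case "$dir" in
            # Quote each path so that paths with whitespace stay intact.
            /*) cmd="$cmd -i \"$dir\"" ;;
            *) echo "import path must be absolute: $dir" >&2; return 1 ;;
        esac
    done
    printf '%s -e "%s"\n' "$cmd" "$export_dir"
}

# Example (prints the command, does not execute it):
# build_recovery_command "$HOME/export" "$HOME/data1" "$HOME/data2" "$HOME/data3"
```

The sketch assumes that paths do not contain double quotes. Review the printed command before you run it.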

Recovery Quick Start Guide

The following guide shows you the basic steps to restore the availability of your cluster, save the persistent data from all failed nodes to a backup file, and import your data back into a running HiveMQ cluster.

For mission-critical deployments that require high durability as well as high availability, best practice is to have an appropriate recovery run book on hand. You can use this recovery guide as a starting point to create a procedure that fits your needs or contact us for additional expert advice and support.

Step 1: Save persistent data and restore cluster availability

To prevent data from being overwritten, move your current data folder (<HIVEMQ-HOME>/data) to another location. For example, to a <HIVEMQ-HOME>/data-backup folder.
Restart the entire cluster or the failed nodes to restore cluster availability.

HiveMQ automatically keeps one previous copy of the data folder (in case you miss the step to move your existing data folder).
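The move in Step 1 can be scripted so that the same procedure runs on every failed node. The following is a minimal sketch under stated assumptions: the function name `save_data_folder` is hypothetical, the timestamp suffix on the target folder is an addition to the plain data-backup name suggested above, and the HiveMQ process on the node is assumed to be stopped before the folder is moved.

```shell
#!/bin/sh
# Hypothetical helper: move <HIVEMQ-HOME>/data aside before a restart.
# Assumes the HiveMQ process on this node is not running.

# Usage: save_data_folder <hivemq-home>
# Prints the new location of the data folder on success.
save_data_folder() {
    hivemq_home="$1"
    src="$hivemq_home/data"
    # Timestamp suffix (an assumption) avoids overwriting an earlier copy.
    dest="$hivemq_home/data-backup-$(date +%Y%m%d-%H%M%S)"
    if [ ! -d "$src" ]; then
        echo "no data folder found at $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "target already exists: $dest" >&2
        return 1
    fi
    mv "$src" "$dest" && printf '%s\n' "$dest"
}

# Example:
# save_data_folder /opt/hivemq
```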

Step 2: Recover persistent data

Start another machine with sufficient CPU and memory (minimum 4 CPUs and 4 GB RAM) and sufficient disk space (at least 2.5 times the size of all your saved data folders combined). The available hardware resources, especially the disks (SSDs preferred), have a direct effect on the duration of the recovery process.
Copy all the data folders you saved to the selected machine. For example, use SCP (secure copy).
Download the HiveMQ Recovery Tool from the HiveMQ website.
Run the HiveMQ Recovery Tool and specify all your saved data folders as parameters. Depending on the amount of data that has to be restored, this step can take from several minutes to several hours.
To run the recovery tool, enter the following command:
Recovery tool command pattern
```
<recovery-tool-home>/bin/hivemq-recovery-tool -i <path-to>/data1 -i <path-to>/data2 -i <path-to>/data3 -e <path-to>/your-output-folder
```
Example recovery tool command:
```
<recovery-tool-home>/bin/hivemq-recovery-tool -i /home/user1/files/data1 -i /home/user1/files/data2 -i /home/user1/files/data3 -e /home/user1/recovered
```
For more information on HiveMQ Recovery Tool options, see Recovery Tool Parameters.
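Before you start the recovery tool, it can help to check the disk space requirement from Step 2 (at least 2.5 times the combined size of the saved data folders). The following is a minimal sketch, not a HiveMQ utility: all function names are hypothetical, sizes are measured with `du -sk` and `df -Pk` in KiB, and the result is rounded up to the next KiB.

```shell
#!/bin/sh
# Hypothetical helpers for the 2.5x disk space rule from Step 2.

# Usage: scale_required_kib <total-kib>
# Prints 2.5 times the input, rounded up, using integer arithmetic.
scale_required_kib() {
    echo $(( ($1 * 5 + 1) / 2 ))
}

# Usage: required_kib <data-dir>...
# Sums the sizes of all data folders and applies the 2.5x rule.
required_kib() {
    total=0
    for dir in "$@"; do
        size=$(du -sk "$dir" | awk '{print $1}')
        total=$((total + size))
    done
    scale_required_kib "$total"
}

# Usage: available_kib <dir>
# Prints the free space of the file system that holds <dir>.
available_kib() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Usage: has_enough_space <output-dir> <data-dir>...
# Returns 0 if the output location has enough free space.
has_enough_space() {
    target="$1"
    shift
    [ "$(available_kib "$target")" -ge "$(required_kib "$@")" ]
}

# Example:
# has_enough_space /home/user1/recovered /home/user1/files/data1 \
#     /home/user1/files/data2 || echo "not enough disk space" >&2
```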

Step 3: Restore persistent data to your running cluster (HiveMQ Control Center or REST API)

Make sure the user who needs to run the restore process has the correct permissions to access the backup file and perform the required actions.

Restore from HiveMQ Control Center

To restore data to a running HiveMQ cluster through your HiveMQ Control Center, open your HiveMQ Control Center and navigate to the Admin | Backup page via your web browser.
On the Backup page, upload the backup file that you generated with the recovery tool.
After the upload, you can track the progress of your backup import on the Backup page of the Control Center.

Restore with HiveMQ REST API

To restore data to a running HiveMQ cluster with the HiveMQ REST API, copy the backup file that you generated with the recovery tool to one of the running HiveMQ nodes in your cluster.
Place the backup file at the path <HIVEMQ_HOME>/backup/<BACKUP_TIMESTAMP>/<BACKUP_FILE> on your HiveMQ instance.

Example backup file path format:
For the `/opt/hivemq/` HiveMQ home folder and the `20220827-155127.hivemq-4.9.0.backup` generated import file, the path to copy the file to is `/opt/hivemq/backup/20220827-155127/20220827-155127.hivemq-4.9.0.backup`.
Call the REST API endpoint of HiveMQ to import a backup and use <BACKUP_TIMESTAMP> as backupId for the REST call.

Example REST API call to import a backup:
Generated file: 20220827-155127.hivemq-4.9.0.backup
REST API call: POST /api/v1/management/backups/20220827-155127
You can view the progress of your backup import on the Admin | Backup page of your HiveMQ Control Center or request backup import information from the REST API endpoint.
Use <BACKUP_TIMESTAMP> as backupId for your REST call.
The state field of the REST API response shows the current state of the import.

The backup import is in progress as long as the state RESTORE_IN_PROGRESS is returned. Your backup import is finished when the state changes to RESTORE_COMPLETED.

Example REST API call to view import progress:
For the generated import file: 20220827-155127.hivemq-4.9.0.backup the API call is as follows:
GET /api/v1/management/backups/20220827-155127
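The REST API steps above can be sketched as a small script. This is an illustration under stated assumptions, not an official client: the function names are hypothetical, the `REST_BASE` address (host and port) is an assumption that you need to adapt to your deployment, and the backup ID is derived as the part of the file name before the first dot, based on the single example file name shown above. The polling loop only looks for the state names in the raw response text and does not parse the response.

```shell
#!/bin/sh
# Hypothetical helpers for the REST API restore steps.
# REST_BASE is an assumption; adjust host and port to your deployment.
REST_BASE="${REST_BASE:-http://localhost:8888}"

# Usage: backup_id_from_file <backup-file>
# Assumes the ID is the part of the file name before the first dot.
backup_id_from_file() {
    name=$(basename "$1")
    printf '%s\n' "${name%%.*}"
}

# Usage: backup_target_path <hivemq-home> <backup-file>
# Prints <hivemq-home>/backup/<BACKUP_TIMESTAMP>/<BACKUP_FILE>.
backup_target_path() {
    hivemq_home="${1%/}"
    name=$(basename "$2")
    printf '%s/backup/%s/%s\n' "$hivemq_home" "$(backup_id_from_file "$name")" "$name"
}

# Usage: start_import <backup-id>
start_import() {
    curl -fsS -X POST "$REST_BASE/api/v1/management/backups/$1"
}

# Usage: wait_for_import <backup-id>
# Polls every 5 seconds until RESTORE_COMPLETED appears in the response.
wait_for_import() {
    while :; do
        response=$(curl -fsS "$REST_BASE/api/v1/management/backups/$1") || return 1
        case "$response" in
            *RESTORE_COMPLETED*) return 0 ;;
            *RESTORE_IN_PROGRESS*) sleep 5 ;;
            *) printf 'unexpected response: %s\n' "$response" >&2; return 1 ;;
        esac
    done
}

# Example:
# file=20220827-155127.hivemq-4.9.0.backup
# id=$(backup_id_from_file "$file")
# start_import "$id" && wait_for_import "$id"
```

If your REST API requires authentication, add the matching `curl` options to both calls.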

Restart the Cluster with Persistent Data

On startup, HiveMQ automatically moves the persistent data to the hivemq/data/cluster-backup folder.

Persistent data

The persistent data includes the following items:

Retained messages
Subscriptions of persistent-session clients
Queued messages of persistent-session clients
The current state of each individual message transmission that is not yet completed
Schemas and policies

This automatic backup ensures that no data is lost when a node restarts. Each piece of persistent data is replicated to the remaining nodes. The replica-count setting controls the number of copies of each piece of persistent data that the cluster maintains. By default, the HiveMQ replica count is 2 (one original and one copy). To ensure that there is no loss of persistent data when you shut down or restart nodes in your cluster, verify that your replica count is set correctly:

Replica count for persistent data

If your cluster has persistent data, we highly recommend a replica-count of at least two (default).
If the replica-count on any node in your cluster is set to less than two, restarting a node will cause loss of persistent data.
For more information, see Cluster Replicas.
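A quick check of the configured replica count on a node can look like the following minimal sketch. It rests on assumptions: the function names are hypothetical, the value is assumed to be written on one line as a `<replica-count>` element in the HiveMQ configuration file, and the path in the usage example is a placeholder. Because the setting matters on every node, run the check on each node of the cluster.

```shell
#!/bin/sh
# Hypothetical helpers: read the replica count from a HiveMQ config file.
# Assumes a single-line <replica-count>N</replica-count> element.

# Usage: replica_count <config-file>
# Prints the first replica-count value, or nothing if none is set.
replica_count() {
    sed -n 's:.*<replica-count>[[:space:]]*\([0-9][0-9]*\)[[:space:]]*</replica-count>.*:\1:p' "$1" | head -n 1
}

# Usage: check_replica_count <config-file>
# Returns 1 if the configured value is lower than two.
check_replica_count() {
    count=$(replica_count "$1")
    if [ -z "$count" ]; then
        echo "replica-count not set in $1; the default of 2 applies"
        return 0
    fi
    if [ "$count" -lt 2 ]; then
        echo "replica-count is $count: a node restart can lose persistent data" >&2
        return 1
    fi
    echo "replica-count is $count"
}

# Example (the path is a placeholder):
# check_replica_count /opt/hivemq/conf/config.xml
```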

Shut Down Your Cluster

To ensure proper replication of data and avoid data loss, always shut down your cluster one node at a time.
The last node that you shut down must have enough disk space to store the data of the entire cluster.
When you restart the cluster, the last node that you shut down is the first node to start.

Restart Your Cluster

Go to the hivemq/bin folder of the last instance that you shut down. Execute the recovery.sh file.
The recovery.sh file starts HiveMQ but does not move persistent data to a backup folder.
As soon as the first instance is running, you can start the other instances with the run.sh file as usual.
We do not recommend or support starting more than one HiveMQ instance with the recovery.sh file.
Use of the recovery.sh file to start multiple HiveMQ instances can create inconsistency in your cluster.
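The restart order described above can be written down as a plan before you touch any node. The following is a minimal sketch: the function name `print_start_plan` and the node names are hypothetical, and the helper only prints which script to use on which node. The last node that was shut down is listed first with recovery.sh, and every other node is listed with run.sh.

```shell
#!/bin/sh
# Hypothetical helper: print the start plan for a cluster restart.

# Usage: print_start_plan <node>...
# Pass the node names in the order in which the nodes were shut down.
print_start_plan() {
    if [ "$#" -eq 0 ]; then
        echo "no nodes given" >&2
        return 1
    fi
    # The last argument is the node that was shut down last.
    for node in "$@"; do
        last="$node"
    done
    printf '%s: recovery.sh\n' "$last"
    # All other nodes start as usual once the first node is running.
    i=1
    for node in "$@"; do
        if [ "$i" -lt "$#" ]; then
            printf '%s: run.sh\n' "$node"
        fi
        i=$((i + 1))
    done
}

# Example:
# print_start_plan node1 node2 node3
```

Exactly one node appears with recovery.sh in the plan, which matches the rule that only one instance may be started with that file.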