Cluster Overload Protection
HiveMQ provides built-in cluster overload protection that allows the HiveMQ broker to restrict incoming traffic if the rate of MQTT messages in the cluster becomes too high. Overload protection gives each HiveMQ instance in your cluster the ability to temporarily prohibit traffic from individual MQTT clients that are significantly increasing cluster load. The selective application of back pressure on specific MQTT clients during periods of exceptionally high load improves the reliability and resiliency of your cluster. HiveMQ cluster overload protection mechanisms ensure that your cluster can recover from stressful situations without notable service degradation for most MQTT clients.
A HiveMQ cluster consists of multiple HiveMQ broker nodes. Various factors can cause the nodes in a cluster to experience different stress levels at any given time. Factors that increase the processing load for an individual broker node include the current number of MQTT PUBLISH, SUBSCRIBE, and UNSUBSCRIBE messages, retained messages, client connect rates, and queued messages.
Based on the current load, each HiveMQ instance (node) in the cluster determines its own overload protection level and notifies the other HiveMQ instances in the cluster about this level. Nodes recalculate their overload protection level every 100 milliseconds and broadcast changes to the other nodes in the cluster. Upon notification, all nodes in the cluster support the highest overload protection level reported in the cluster.
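As a rough illustration of this coordination, the following sketch shows how a node might track the levels its peers report and always enforce the cluster-wide maximum. This is illustrative only, not HiveMQ source code; the class and method names are hypothetical.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the level coordination described above:
// each node recalculates its own level every 100 milliseconds, broadcasts
// changes, and every node enforces the cluster-wide maximum.
final class OverloadLevelTracker {

    private final Map<String, Integer> reportedLevels = new ConcurrentHashMap<>();

    // Called when another node broadcasts a changed overload protection level.
    void onLevelReport(String nodeId, int level) {
        reportedLevels.put(nodeId, level);
    }

    // The level this node enforces: the highest level reported in the cluster.
    int effectiveClusterLevel(int ownLevel) {
        int max = ownLevel;
        for (int peerLevel : reportedLevels.values()) {
            max = Math.max(max, peerLevel);
        }
        return max;
    }
}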
HiveMQ also provides configurable dynamic connection rate throttling. For more information, see Configurable Dynamic Connection Rate Throttling.
Overload Protection Levels and Client Throttling
Client throttling is based on the number of MQTT packets that a connected client has in flight at the broker at the same time. Each such packet has a base cost of 1 credit. As the overload protection level increases, the number of credits each packet consumes increases linearly. At the maximum overload protection level, a single packet consumes all the credits allocated to the channel.
HiveMQ deducts the credits for each MQTT PUBLISH, SUBSCRIBE, or UNSUBSCRIBE packet a client sends to the broker from the total pool of credits the client currently has available. The number of credits HiveMQ deducts per packet is calculated with an internal algorithm:
- The maximum number of credits a client can accumulate is 100.
- Client credits are deducted for every incoming PUBLISH, SUBSCRIBE, or UNSUBSCRIBE MQTT packet.
- Client credits are calculated equally for each type of MQTT packet.
- When the inbound processing of the MQTT packet finishes, the client credits are returned.
When overload protection is enabled, HiveMQ internally sets overload protection throttling levels from lowest (1) to critical (10). The higher the overload protection level, the more credits each incoming packet consumes.
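HiveMQ does not publish the exact cost formula, but a simple linear interpolation that is consistent with the stated behavior (a base cost of 1 credit per packet and the entire 100-credit pool for one packet at the maximum level) could look like this hypothetical sketch:
// Hypothetical credit cost calculation; HiveMQ's real algorithm is internal.
// Assumes a linear increase from the base cost of 1 credit (no overload)
// to the full 100-credit pool at the maximum level of 10.
final class CreditCost {

    static final int BASE_COST = 1;     // cost per packet when the cluster is not overloaded
    static final int MAX_CREDITS = 100; // maximum credits a client can accumulate
    static final int MAX_LEVEL = 10;    // critical overload protection level

    static int creditsPerPacket(int level) {
        return BASE_COST + (MAX_CREDITS - BASE_COST) * level / MAX_LEVEL;
    }
}
Under this assumption, a packet costs 1 credit on an unstressed cluster, roughly 50 credits at level 5, and the full 100 credits at level 10, so a single in-flight packet would exhaust the client's pool.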
When the credit value for a client reaches zero, HiveMQ forbids all traffic from the client until the necessary credit threshold is re-established.
While a client is restrained from sending messages, HiveMQ stops reading from the client's TCP socket.
Blocking the socket creates TCP backpressure on the MQTT client.
While the client is in the restrained state, the HiveMQ broker does not accept any MQTT messages (including PINGREQ messages) from the client. The broker keeps the connection to the restrained client open even if the keep-alive interval is exceeded.
Once enough credits return to the client's credit pool, the backpressure ends and the configured keep-alive behavior resumes as usual: if the time span during which no messages arrive from the client exceeds the configured keep-alive value, the broker automatically disconnects the client.
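Because HiveMQ's network transport is built on Netty, this kind of TCP backpressure conceptually corresponds to suspending reads on the client's channel. The following minimal sketch is illustrative and not HiveMQ source code:
import io.netty.channel.Channel;

// Illustrative sketch: suspending and resuming socket reads in a Netty-based
// server. While autoRead is off, the OS receive buffer fills up and TCP flow
// control pushes back on the sending client.
final class BackpressureController {

    void restrain(Channel clientChannel) {
        clientChannel.config().setAutoRead(false); // stop reading; backpressure builds
    }

    void release(Channel clientChannel) {
        clientChannel.config().setAutoRead(true);  // resume reading once credits recover
    }
}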
Cluster topology changes can create a temporary increase in the overload protection level of individual nodes because the exchange of data between nodes temporarily increases the workload of each node.
Configuration
Cluster Overload Protection is enabled by default.
You can enable or disable the feature as the following configuration examples show. The first example explicitly enables cluster overload protection; the second example disables it.
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
<overload-protection>
<enabled>true</enabled>
</overload-protection>
...
</hivemq>
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
<overload-protection>
<enabled>false</enabled>
</overload-protection>
...
</hivemq>
We do not recommend disabling cluster overload protection. If you disable the protection mechanism, high stress on the cluster can cause unresponsive cluster nodes and JVM out-of-memory (OOM) errors.
Overriding Overload Protection
You can override HiveMQ overload protection on a per-client basis.
For more information, see Modifiable Client Settings.
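For example, an extension can exempt a specific client from overload protection throttling during authentication. The following sketch is based on the HiveMQ Extension SDK's ModifiableClientSettings; the client identifier is hypothetical, and exact package and accessor names can vary between SDK versions, so verify them against the Modifiable Client Settings documentation.
import com.hivemq.extension.sdk.api.auth.SimpleAuthenticator;
import com.hivemq.extension.sdk.api.auth.parameter.OverloadProtectionThrottlingLevel;
import com.hivemq.extension.sdk.api.auth.parameter.SimpleAuthInput;
import com.hivemq.extension.sdk.api.auth.parameter.SimpleAuthOutput;

public class NoThrottleAuthenticator implements SimpleAuthenticator {

    @Override
    public void onConnect(SimpleAuthInput input, SimpleAuthOutput output) {
        // Exempt a known high-throughput client (hypothetical ID) from throttling.
        if ("backend-bridge".equals(input.getConnectPacket().getClientId())) {
            output.getClientSettings()
                  .setOverloadProtectionThrottlingLevel(OverloadProtectionThrottlingLevel.NONE);
        }
        output.authenticateSuccessfully();
    }
}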
Overload Protection Monitoring
HiveMQ provides several overload protection metrics that offer valuable insights into the operation of your HiveMQ cluster.
The clients.backpressure-active metric shows how many clients overload protection currently throttles.
The overload.protection.level of an individual node directly impacts the number of credits HiveMQ calculates for each incoming MQTT packet.
The following overload protection metrics can be monitored:
Metric | Type | Description |
---|---|---|
com.hivemq.overload-protection.level | Gauge | The current level of overload protection. Value from 0 (lowest) to 10 (highest). |
com.hivemq.overload-protection.clients.using-credits | Gauge | The current number of clients that have less than the maximum number of credits. The default maximum is 100 credits. |
com.hivemq.overload-protection.clients.backpressure-active | Gauge | The total number of clients for which Cluster Overload Protection currently applies backpressure. |
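You can consume these metrics like any other HiveMQ metric, for example from an extension through the Dropwizard MetricRegistry that the Extension SDK exposes via Services.metricRegistry(). A minimal sketch, assuming the metric name from the table above:
import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;
import com.hivemq.extension.sdk.api.services.Services;

public final class OverloadLevelLogger {

    // Metric name as listed above; verify it against your HiveMQ version.
    private static final String LEVEL_METRIC = "com.hivemq.overload-protection.level";

    public static void logCurrentLevel() {
        final MetricRegistry registry = Services.metricRegistry();
        final Gauge<?> levelGauge = registry.getGauges().get(LEVEL_METRIC);
        if (levelGauge != null) {
            System.out.println("Current overload protection level: " + levelGauge.getValue());
        }
    }
}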