Cluster Overload Protection

HiveMQ provides built-in cluster overload protection that allows the HiveMQ broker to restrict incoming traffic if the rate of MQTT messages in the cluster becomes too high. Overload protection gives each HiveMQ instance in your cluster the ability to temporarily prohibit traffic from individual MQTT clients that are significantly increasing cluster load. The selective application of back pressure on specific MQTT clients during periods of exceptionally high load improves the resiliency of your cluster. HiveMQ cluster overload protection mechanisms ensure that your cluster can recover from stressful situations without notable service degradation for most MQTT clients.

A HiveMQ cluster consists of several individual HiveMQ broker nodes. Various factors can cause the nodes in a cluster to experience different stress levels at any given time. Factors that increase the processing load for an individual broker node include the current number of MQTT PUBLISH messages, retained messages, client connect rates, and queued messages.

Based on the current load, each HiveMQ instance (node) in the cluster determines its own overload protection level and notifies the other HiveMQ instances in the cluster about this level. Nodes recalculate their overload protection level every 100 milliseconds and broadcast changes to the other nodes in the cluster. Upon notification, all nodes in the cluster apply the highest overload protection level reported in the cluster.
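The level selection described above can be illustrated with a minimal sketch. It is not HiveMQ code: the node names, the reported levels, and the function name cluster_level are assumptions made only for this example.

```python
def cluster_level(reported_levels):
    """Return the level that every node applies: the highest level reported.

    reported_levels maps a (hypothetical) node name to the overload
    protection level that node last broadcast.
    """
    if not reported_levels:
        raise ValueError("at least one node must report a level")
    return max(reported_levels.values())


# Hypothetical levels broadcast by three nodes.
levels = {"node-1": 2, "node-2": 7, "node-3": 1}
effective = cluster_level(levels)
print(effective)  # 7
```

In this sketch a single stressed node (node-2) raises the level that all three nodes apply, which matches the behavior described in the paragraph above.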

Overload Protection Levels and Client Throttling

Client throttling is realized with a credit system based on the current Overload Protection Level and the calculated cost of the MQTT PUBLISH messages that are sent per topic. HiveMQ deducts credits for each MQTT PUBLISH packet a client sends to the broker from the total pool of credits the client currently has available. The number of credits HiveMQ deducts for each message packet is calculated with an internal algorithm. The credit pool of each client has the following limits:

  • The maximum number of credits a client can accumulate is 50,000.

  • Client credits regenerate at 200-millisecond intervals.

When overload protection is enabled, HiveMQ internally sets overload protection throttling levels from lowest (1) to critical (10). The higher the Overload Protection Level, the fewer credits a client receives per interval and the slower the credit pool refills. When the credit value for a client reaches zero, HiveMQ blocks all traffic from the client until the necessary credit threshold is re-established.
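The credit mechanism can be pictured as a capped pool that is drained by PUBLISH packets and refilled on each tick. The following sketch is an illustration only, not the internal HiveMQ algorithm: the per-message cost, the credits-per-tick value, and the resume_threshold parameter are hypothetical, because the text above does not specify them. Only the 50,000 cap and the 200-millisecond interval come from the description above.

```python
MAX_CREDITS = 50_000  # maximum credits per client (from the text above)
TICK_MS = 200         # regeneration interval in milliseconds (from the text above)


class CreditPool:
    """Toy model of one client's credit pool."""

    def __init__(self, credits=MAX_CREDITS, resume_threshold=1):
        self.credits = credits
        # Hypothetical: credits required before a throttled client may send again.
        self.resume_threshold = resume_threshold
        self.throttled = False

    def on_publish(self, cost):
        """Deduct the cost of one PUBLISH; return False if the client is blocked."""
        if self.throttled:
            return False
        self.credits = max(0, self.credits - cost)
        if self.credits == 0:
            self.throttled = True  # no traffic is read until the pool recovers
        return True

    def on_tick(self, credits_per_tick):
        """Refill the pool once per tick, capped at MAX_CREDITS."""
        self.credits = min(MAX_CREDITS, self.credits + credits_per_tick)
        if self.throttled and self.credits >= self.resume_threshold:
            self.throttled = False


# Hypothetical costs and refill rate, chosen only to show the state changes.
pool = CreditPool(credits=1_000, resume_threshold=500)
pool.on_publish(600)   # accepted, 400 credits left
pool.on_publish(600)   # accepted, pool drops to 0 and the client is throttled
pool.on_publish(10)    # rejected while throttled
for _ in range(5):
    pool.on_tick(100)  # after 5 ticks the pool reaches 500 and throttling ends
```

A higher Overload Protection Level corresponds to passing a smaller credits_per_tick value to on_tick, so the pool in this model refills more slowly.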

While a client is restrained from sending messages, HiveMQ stops reading from the TCP socket of the client. This blocking of the TCP socket creates TCP back pressure on the MQTT client.
While in the restrained state, the MQTT client cannot send any MQTT messages (including PINGREQ messages). If the time span during which no messages are read from the socket exceeds the configured keep-alive value of the client, the broker automatically disconnects the client. Loss of connection typically only occurs for clients that cause unusually expensive operations on the broker or if the broker cluster is under very high load for an extended period of time.
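The keep-alive condition above reduces to a simple comparison. The sketch below assumes the broker compares the unread time directly with the keep-alive value, as the paragraph states. The function name is hypothetical, and the special case for a keep-alive of 0 follows the MQTT specification, where 0 turns the keep-alive mechanism off.

```python
def keep_alive_exceeded(seconds_without_read, keep_alive_seconds):
    """Return True when the socket has gone unread longer than the keep-alive.

    A keep-alive of 0 disables the check, as defined by the MQTT specification.
    """
    if keep_alive_seconds == 0:
        return False
    return seconds_without_read > keep_alive_seconds


# A client with a 60-second keep-alive that stays throttled for 61 seconds
# would be disconnected in this model; at exactly 60 seconds it would not.
print(keep_alive_exceeded(61, 60))  # True
print(keep_alive_exceeded(60, 60))  # False
```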

Cluster topology changes can create a temporary increase in the Overload Protection Level of individual nodes.

Configuration

Cluster Overload Protection is enabled by default.
The feature can be enabled and disabled as shown in the following configuration examples.

Example: overload protection enabled (default)

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <overload-protection>
        <enabled>true</enabled>
    </overload-protection>
    ...
</hivemq>

Example: overload protection disabled

<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <overload-protection>
        <enabled>false</enabled>
    </overload-protection>
    ...
</hivemq>

We do not recommend disabling cluster overload protection. If you disable the protection mechanism, high stress on the cluster can cause unresponsive cluster nodes and JVM out-of-memory (OOM) errors.

Overriding Overload Protection

You can override HiveMQ overload protection on a per-client basis.
For more information, see Modifiable Client Settings.

Overload Protection Monitoring

HiveMQ provides several overload protection metrics that offer valuable insights into the operation of your HiveMQ cluster.

Some of the most important information about overload protection is provided in the clients.backpressure-active metric. This metric shows how many clients the overload protection mechanism currently throttles.

How quickly back pressure is released depends directly on the credits.per-tick value.
The higher the credits.per-tick value, the more credits a client receives per tick interval and the faster back pressure on the MQTT client decreases.

The overload-protection.level of an individual node directly impacts the credits.per-tick value.
The higher the overload protection level, the lower the credits.per-tick value. A lower credits.per-tick value gives the client fewer credits per tick interval and releases the back pressure on the MQTT client more slowly.
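The effect of the credits.per-tick value on recovery time can be estimated with simple arithmetic. The sketch below assumes the default 200-millisecond tick interval and a hypothetical number of credits that a client needs to regain. The function name and the sample numbers are illustrative and are not taken from HiveMQ.

```python
import math

TICK_MS = 200  # default tick interval in milliseconds


def recovery_time_ms(credits_needed, credits_per_tick, tick_ms=TICK_MS):
    """Estimate how long a client waits to regain credits_needed credits."""
    if credits_per_tick <= 0:
        raise ValueError("credits_per_tick must be positive")
    ticks = math.ceil(credits_needed / credits_per_tick)
    return ticks * tick_ms


# Hypothetical values: the same deficit at two different refill rates.
print(recovery_time_ms(10_000, 1_000))  # 2000
print(recovery_time_ms(10_000, 250))    # 8000
```

With a quarter of the credits per tick, the same client waits four times as long, which is how a higher overload protection level prolongs back pressure in this model.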

The following overload protection metrics can be monitored:

Table 1. Available overload protection metrics

| Metric | Type | Description |
|---|---|---|
| com.hivemq.overload-protection.level | Gauge | The current level of overload protection. Values range from 0 (lowest) to 10 (highest). |
| com.hivemq.overload-protection.credits.per-tick | Gauge | The current number of credits a client receives per tick interval. The default tick interval is 200 milliseconds. |
| com.hivemq.overload-protection.clients.average-credits | Gauge | The average number of available credits for all clients. |
| com.hivemq.overload-protection.clients.using-credits | Gauge | The current number of clients that have less than the maximum number of credits. The default maximum is 50,000 credits. |
| com.hivemq.overload-protection.clients.backpressure-active | Gauge | The total number of clients to which Cluster Overload Protection currently applies back pressure. |
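
The gauges listed above can be turned into a simple health check. The sketch below assumes the metrics are exported in a Prometheus-style text format in which dots and hyphens in metric names are replaced with underscores. That naming, the sample values, and the function names are assumptions for illustration. Verify the actual names in your monitoring setup.

```python
# Hypothetical scrape output; names and values are assumptions for this example.
SAMPLE = """\
# Made-up values for illustration only.
com_hivemq_overload_protection_level 3.0
com_hivemq_overload_protection_credits_per_tick 5000.0
com_hivemq_overload_protection_clients_backpressure_active 12.0
"""


def parse_gauges(text):
    """Parse 'name value' lines into a dict, skipping comments and blank lines."""
    gauges = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        gauges[name] = float(value)
    return gauges


def throttling_summary(gauges):
    """Summarize the overload state; alert when any client is being throttled."""
    level = gauges.get("com_hivemq_overload_protection_level", 0.0)
    throttled = gauges.get(
        "com_hivemq_overload_protection_clients_backpressure_active", 0.0
    )
    return {
        "level": int(level),
        "throttled_clients": int(throttled),
        "alert": throttled > 0,
    }


summary = throttling_summary(parse_gauges(SAMPLE))
print(summary)
```

Alerting on the backpressure-active gauge rather than on the level alone follows the text above, which singles out that metric as showing how many clients are currently throttled.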