Cluster Overload Protection
This documentation is for the HiveMQ 3.4 legacy version. For up-to-date information on the current version of HiveMQ, please switch to the latest version of our HiveMQ Platform documentation and update your bookmarks as needed.
HiveMQ provides built-in cluster overload protection. Each HiveMQ cluster node can reduce the rate of incoming messages from message-producing MQTT clients that contribute significantly to the overload of the cluster. This mechanism considerably improves the resiliency of a HiveMQ cluster, because individual MQTT clients can be throttled when the cluster experiences an overload. With this mechanism, HiveMQ can recover from stress situations without notable service degradation for most MQTT clients.
A HiveMQ MQTT broker cluster consists of several individual HiveMQ nodes. Each node may experience a different stress level at any given time due to the number of MQTT PUBLISH messages to process, retained messages, client connect rates, queued messages and other operations that may cause overload for an individual broker.
Each MQTT broker calculates its own Overload Protection Level and notifies the other cluster members about this level.
Client throttling is realized with a credit-based system based on the Overload Protection Level and the calculated cost of an MQTT PUBLISH message per topic. For each MQTT PUBLISH message that a client sends to the broker, the calculated cost of the message is deducted from the client’s credits. Credits regenerate over time. The higher the Overload Protection Level, the more slowly credits regenerate. As soon as a client’s credit value reaches zero, the client is prevented from publishing additional MQTT packets until its credits have regenerated to a threshold.
When a client is blocked from sending messages, HiveMQ no longer reads from the client’s TCP socket, which results in TCP back pressure for the MQTT client. In this state the client cannot send any MQTT messages, including PINGREQ messages. This can result in a connection loss if the time during which no messages are read from the socket exceeds the configured keepAlive value of the client. This typically happens only to clients that cause expensive operations on the broker, and only if the broker cluster is under very high stress for a long period of time.
Cluster topology changes may also increase the Overload Protection Level of individual nodes for short periods of time.
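The credit mechanism described above can be sketched as a small simulation. This is only an illustrative model, not HiveMQ's implementation: the maximum of 50,000 credits and the 200 ms tick come from the Monitoring section, while the `credits_per_tick` mapping, the `resume_threshold` value and the per-message cost are assumptions made up for the example.

```python
from dataclasses import dataclass

MAX_CREDITS = 50_000   # documented maximum credits per client
TICK_SECONDS = 0.2     # documented regeneration interval (200 ms)


def credits_per_tick(level: int) -> int:
    """Hypothetical mapping: fewer credits per tick as the level rises (0..10)."""
    if not 0 <= level <= 10:
        raise ValueError("level must be between 0 and 10")
    return 5_000 - 450 * level  # assumed numbers, not HiveMQ's real formula


@dataclass
class ClientCredits:
    credits: int = MAX_CREDITS
    resume_threshold: int = 10_000  # assumed; the real threshold is not documented here
    blocked: bool = False

    def on_publish(self, cost: int) -> None:
        """Deduct the cost of one PUBLISH and block the client at zero credits."""
        self.credits = max(0, self.credits - cost)
        if self.credits == 0:
            self.blocked = True

    def on_tick(self, level: int) -> None:
        """Regenerate credits for one tick and unblock once the threshold is reached."""
        self.credits = min(MAX_CREDITS, self.credits + credits_per_tick(level))
        if self.blocked and self.credits >= self.resume_threshold:
            self.blocked = False


client = ClientCredits()
for _ in range(50):          # 50 publishes with an assumed cost of 1,000 credits each
    client.on_publish(1_000)
print(client.blocked)        # True: credits are exhausted
client.on_tick(level=10)     # high level: only 500 credits come back
print(client.blocked)        # True: still below the resume threshold
```

In this model a blocked client stays blocked until enough ticks have passed, and a higher level stretches that period because each tick returns fewer credits.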
Configuration
Cluster Overload Protection is enabled by default. The feature can be enabled or disabled in the configuration: the first example below enables it explicitly, the second disables it.
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="../../hivemq-config.xsd">
    ...
    <overload-protection>
        <enabled>true</enabled>
    </overload-protection>
    ...
</hivemq>
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="../../hivemq-config.xsd">
    ...
    <overload-protection>
        <enabled>false</enabled>
    </overload-protection>
    ...
</hivemq>
Disabling Cluster Overload Protection
It’s not recommended to disable the Cluster Overload Protection. If you disable the mechanism, high stress on the cluster may result in unresponsive cluster nodes or JVM OutOfMemoryErrors.
Monitoring
The following overload protection metrics can be monitored, for example with JMX.
One of the most important indicators for overload protection is the clients.backpressure-active metric. It shows how many clients are currently slowed down by the overload protection mechanism.
The speed at which back pressure is relieved depends directly on the credits.per-tick value. The higher this value, the more quickly back pressure on the clients is released.
credits.per-tick is directly influenced by the overload.protection.level of an individual node. The higher the level, the more slowly credits regenerate.
A client can accumulate at most 50,000 credits.
A client regenerates credits every 200ms.
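As a rough illustration of how these numbers interact, the following sketch estimates how long a client needs to regenerate a given number of credits at a constant credits.per-tick value. The function name and the sample per-tick values (1,000 and 250) are assumptions chosen for the example; only the 50,000 maximum and the 200 ms tick come from the text above.

```python
import math

MAX_CREDITS = 50_000   # maximum credits per client, as stated above
TICK_MS = 200          # regeneration interval in milliseconds, as stated above


def seconds_to_regenerate(missing_credits: int, credits_per_tick: int) -> float:
    """Time until `missing_credits` are regenerated at a constant per-tick rate."""
    if credits_per_tick <= 0:
        raise ValueError("credits_per_tick must be positive")
    ticks = math.ceil(max(0, missing_credits) / credits_per_tick)
    return ticks * TICK_MS / 1000


# Refilling from zero to the maximum with two assumed per-tick values:
print(seconds_to_regenerate(MAX_CREDITS, 1_000))  # 50 ticks  -> 10.0
print(seconds_to_regenerate(MAX_CREDITS, 250))    # 200 ticks -> 40.0
```

Because a blocked client cannot send PINGREQ messages, a regeneration time that approaches the client's keepAlive value makes the connection loss described earlier more likely.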
Metric Name | Description |
---|---|
overload.protection.level | Current overload protection level. Value from 0 (lowest) to 10 (highest) |
credits.per-tick | Current number of credits a client receives per tick (every 200ms) |
 | Average number of available credits across all clients |
 | Current number of clients that have less than the full amount (50,000) of credits |
clients.backpressure-active | Current number of clients to which back pressure is applied by the Cluster Overload Protection |
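One possible way to act on these metrics is to reduce a snapshot of readings to a coarse status. The sketch below assumes the readings have already been collected (for example via JMX) into a dictionary keyed by the short metric names used in the text above; the function name, the `warn_level` and `critical_share` thresholds, and the status strings are hypothetical and should be tuned for the deployment.

```python
from typing import Mapping


def overload_status(
    metrics: Mapping[str, float],
    connected_clients: int,
    warn_level: int = 5,           # assumed threshold, not a HiveMQ default
    critical_share: float = 0.10,  # assumed: 10% of clients under back pressure
) -> str:
    """Classify a metrics snapshot of one node as 'ok', 'warning' or 'critical'."""
    level = metrics["overload.protection.level"]
    throttled = metrics["clients.backpressure-active"]
    share = throttled / connected_clients if connected_clients else 0.0
    if share >= critical_share:
        return "critical"
    if level >= warn_level or throttled > 0:
        return "warning"
    return "ok"


snapshot = {"overload.protection.level": 6, "clients.backpressure-active": 0}
print(overload_status(snapshot, connected_clients=1_000))  # warning
```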