Health API

The endpoints of the HiveMQ Health API provide operational information about your HiveMQ broker components and extensions.

With the Health API, you can capture snapshots that show the current state of health for each node in your HiveMQ cluster. The well-structured information the API provides helps you quickly identify potential issues and maintain the smooth operation of your HiveMQ platform deployment.

Configuration

Example Health API configuration
<hivemq>
  <!-- ... -->
  <health-api>
    <enabled>true</enabled>
    <listeners>
      <http>
        <port>8889</port>
        <name>health-api-listener</name>
        <bind-address>127.0.0.1</bind-address>
      </http>
    </listeners>
  </health-api>
</hivemq>
Table 1. Health API configuration parameters
Parameter Default Value Required Description

enabled

false

Enables or disables the use of the HiveMQ Health API. The Health API is disabled by default. To allow access to the Health API, set the enabled tag to true.

listeners

Configures one or more HTTP listeners to provide access to the HiveMQ Health API.

  • port: The port on the local machine that listens for HiveMQ Health API requests.
    The default port for Health API HTTP listeners is 8889.
    The port can be changed.

  • name: Optional setting to define a name for the listener. Custom-defined listener names can be helpful when multiple listeners are in use.
    If no name is specified, HiveMQ uses the type of listener plus the port. For example, http-listener-8889.

  • bind-address: The address on the local machine that accepts HiveMQ Health API requests.
    The default bind address is 127.0.0.1. The bind address can be changed.

When the Health API is enabled, multiple endpoints are exposed on each configured HTTP listener.

The Health API also provides two health group endpoints, liveness and readiness, that assemble data from sets of components.

Health API endpoints are HTTP only and do not support TLS.
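
As a quick check that a configured listener responds, the following is a minimal sketch of a plain-HTTP request against a health endpoint. It assumes the default listener shown above (127.0.0.1:8889) and uses the /api/v1/health/liveness path that also appears in the Kubernetes probes configuration later in this section.

Example Python request to a Health API endpoint
import json
import urllib.error
import urllib.request

# Default Health API listener from the configuration example above; the
# liveness path is taken from the Kubernetes probes configuration below.
url = "http://127.0.0.1:8889/api/v1/health/liveness"

try:
    with urllib.request.urlopen(url, timeout=5) as response:
        print("HTTP status code:", response.status)   # 200 for a live broker
        print("Response body:", json.load(response))   # for example, {'status': 'UP'}
except urllib.error.HTTPError as error:
    # Unhealthy states are also reported through the HTTP status code, for example 503.
    print("HTTP status code:", error.code)
    print("Response body:", json.load(error))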

HTTP response

When called, each endpoint of the Health API returns an HTTP status code and a human-readable JSON response body with additional information.

The HTTP status code reflects the overall health of the selected health component or health group.
The response body contains a JSON payload with structured information.

Example response body JSON
{
  "status": "<status>",
  "details": {
    "<key>": "<value>"
  },
  "components": {
    "<component-name>": {
      "status": "<status>",
      "components": {
        "<component-name>": {
          "status": "<status>",
          "details": {
            "<key>": "value"
          }
        }
      }
    }
  }
}
Based on user feedback and continued development, upcoming versions of the HiveMQ Health API JSON payload are expected to include additional information about the health of individual components. These future additions can change the JSON payload.

Every health response includes a mandatory status and can optionally include additional details.

Health components can have subcomponents that follow the same structure as the component.
The status of a component is the aggregated status of all associated subcomponents.
This aggregated status of the top (root) component determines the HTTP status code of the response.
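
As an illustration of this nested structure, the following Python sketch flattens a parsed response body into component paths with their statuses, which can be useful for logging. The input dictionary is an illustrative, abbreviated example and not output from a live broker.

Example Python sketch that flattens the nested component structure
# Recursively flatten the nested "components" structure described above into
# ("component.subcomponent", status) pairs, for example for logging.
def flatten_statuses(node, path=""):
    yield (path or "root", node.get("status", "UNKNOWN"))
    for name, child in node.get("components", {}).items():
        yield from flatten_statuses(child, f"{path}.{name}" if path else name)

# Illustrative, abbreviated response body:
body = {
    "status": "UP",
    "components": {
        "mqtt": {
            "status": "UP",
            "components": {"tcp-listener-1883": {"status": "UP"}},
        }
    },
}

for component, status in flatten_statuses(body):
    print(f"{component}: {status}")
# root: UP
# mqtt: UP
# mqtt.tcp-listener-1883: UP
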
Table 2. Status values and HTTP status code mapping
Status HTTP status code Description

UP

200

The component is healthy.

UNKNOWN

200

The health status of the component is unknown.

DEGRADED

200

The component is in a degraded state.
The DEGRADED state needs to be resolved before the next restart of the affected component to ensure that the component does not progress into a DOWN state. For detailed information on how a DEGRADED state impacts different components, see Health Components.
Nodes in a DEGRADED state remain ready to accept traffic. We strongly discourage use of the status field for automated monitoring. For best practices, see Health Monitoring.

DEGRADED_SERVICE

200

The service is degraded.
The DEGRADED_SERVICE state needs to be resolved before the next restart of the affected component to ensure that the component does not progress into an OUT_OF_SERVICE state. For detailed information on how a DEGRADED state impacts different components, see Health Components.
Nodes in a DEGRADED_SERVICE state remain ready to accept traffic. We strongly discourage use of the status field for automated monitoring. For best practices, see Health Monitoring.

DOWN

503

The component is not healthy.

OUT_OF_SERVICE

503

The service is not available.
For more information, see Readiness Check.

When you set up automated monitoring and operations, we recommend the use of the HTTP status code only.
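
One possible way to follow this recommendation is sketched below: the check treats any 2xx response as healthy and everything else, including connection failures, as unhealthy. The endpoint URL is an assumption based on the default listener and the readiness path shown in this document.

Example Python health check that uses the HTTP status code only
import urllib.error
import urllib.request

def is_healthy(url: str, timeout: float = 5.0) -> bool:
    # Only the HTTP status code decides the outcome; the JSON body is ignored.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300   # UP, UNKNOWN, DEGRADED, DEGRADED_SERVICE
    except urllib.error.HTTPError:
        return False                              # DOWN and OUT_OF_SERVICE are reported as 503
    except OSError:
        return False                              # node unreachable

print(is_healthy("http://127.0.0.1:8889/api/v1/health/readiness"))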

Health Monitoring

Our flexible Health API provides several ways to set up automated monitoring of your HiveMQ Platform.

Kubernetes Probes

Kubernetes has a strict semantic definition of liveness and readiness for Kubernetes containers. The Health API provides ready-to-use endpoints that return the expected HTTP status codes for Kubernetes. For more information, see Liveness Check and Readiness Check.

To manage HiveMQ Platforms in Kubernetes, we recommend using our HiveMQ Platform Operator. Use of the HiveMQ Platform Operator is a convenient way to ensure the correct configuration of your liveness and readiness probes.

If you need to deploy your HiveMQ Platform cluster manually, please use the following Kubernetes probes configuration.

Kubernetes Probes configuration
livenessProbe:
  httpGet:
    path: /api/v1/health/liveness
    port: 8889
    scheme: HTTP
  initialDelaySeconds: 15
  periodSeconds: 30
  successThreshold: 1
  failureThreshold: 240
readinessProbe:
  httpGet:
    path: /api/v1/health/readiness
    port: 8889
    scheme: HTTP
  initialDelaySeconds: 3
  periodSeconds: 5
Incorrect configuration of liveness or readiness probes can lead to cascading failures. Do not build custom queries using the status or details fields of health components for Kubernetes probes. For example, a DEGRADED component does not indicate that a container must be restarted (liveness check) or is not ready to accept traffic (readiness check).

HAProxy

HAProxy is a simple and easy-to-install load balancer. An HAProxy load balancer can distribute incoming MQTT connections to different nodes in your HiveMQ cluster.

HAProxy Load Balancing

You can configure HAProxy to use the Health API readiness endpoint of your HiveMQ cluster as follows:

Example haproxy.cfg file using the Health API
global
  stats socket /var/run/api.sock user haproxy group haproxy mode 660 level admin expose-fd listeners
  log stdout format raw local0 info

defaults
    log global
    mode tcp
    option tcplog
    maxconn 1024000
    timeout connect 30000
    timeout client 600s
    timeout server 600s
    default-server init-addr last,libc,none

frontend stats
    mode http
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST

frontend health_frontend
   mode tcp
   option tcplog
   bind *:8889
   default_backend health_backend

backend health_backend
   mode http
   server HMQ1 HMQ-node1:8889
   server HMQ2 HMQ-node2:8889
   server HMQ3 HMQ-node3:8889

frontend mqtt_frontend
   mode tcp
   option tcplog
   bind *:1883
   default_backend mqtt_backend

backend mqtt_backend
    mode tcp
    stick-table type string len 32 size 100k expire 30m
    stick on req.payload(0,0),mqtt_field_value(connect,client_identifier)
    option httpchk
    http-check send meth GET uri /api/v1/health/readiness
    server HMQ1 HMQ-node1:1883 check port 8889
    server HMQ2 HMQ-node2:1883 check port 8889
    server HMQ3 HMQ-node3:1883 check port 8889

The example configures HAProxy to route all incoming MQTT traffic on port 1883 to one of three HiveMQ cluster nodes. Health checks are enabled for the nodes by adding check port 8889 after each server line in the backend mqtt_backend section. The lines option httpchk and http-check send meth GET uri /api/v1/health/readiness specify that the health check is performed by accessing the readiness endpoint.

HAProxy periodically sends requests to the defined URI on each node. HAProxy treats a response with a 2xx or 3xx HTTP status code as healthy, a convention the Health API readiness endpoint follows. All other status codes are considered unhealthy and can indicate that the selected node is not yet operational. HAProxy stops routing traffic to an unhealthy node until the node returns to a healthy status.

The example configuration is not intended for production use. For more information, see Using HAProxy to Load Balance HiveMQ with the New Health API.

System Health

The system health endpoint provides an aggregated status of all available health components and health groups.

Example JSON response for a healthy System component
{
  "status": "UP",
  "components": {
    "<component-name>": {
      "status": "UP"
    },
    "liveness-state": {
      "status": "UP"
    },
    "readiness-state": {
      "status": "UP"
    }
  },
  "groups": [ "liveness", "readiness" ]
}
Example JSON response for an unhealthy System component
{
  "status": "DOWN",
  "components": {
    "<component-name>": {
      "status": "DOWN"
    },
    "liveness-state": {
      "status": "UP"
    },
    "readiness-state": {
      "status": "UP"
    }
  },
  "groups": [ "liveness", "readiness" ]
}

Health Components

Health API components help you assess the health and status of various aspects of your HiveMQ deployment.

Info

The Info health component provides general information about the HiveMQ platform.
For example, the HiveMQ version, the current log level, and the epoch timestamp of the node start.

Example JSON response body for a healthy Info component
{
  "status": "UP",
  "details": {
    "cpu-count": 10,
    "log-level": "INFO",
    "started-at": 1713945889156,
    "version": "0.0.0"
  }
}
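
The started-at value in the example corresponds to a Unix epoch timestamp in milliseconds. The following Python sketch converts it into a readable UTC time.

Example Python conversion of the started-at timestamp
from datetime import datetime, timezone

# The started-at detail of the Info component is a Unix epoch timestamp in milliseconds.
started_at_ms = 1713945889156
print(datetime.fromtimestamp(started_at_ms / 1000, tz=timezone.utc))
# 2024-04-24 08:04:49.156000+00:00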

Cluster Service

The Cluster health component provides information about the connection state of your HiveMQ Cluster.

Once inter-broker communication is fully established, the node has successfully joined the cluster, and no leave replications are in progress, the cluster status reports UP. When the cluster is in an UP state, changes to the cluster topology are safe.
While a node leave replication is in progress, the cluster reports a DEGRADED health status. DEGRADED health status automatically sets the readiness status of the cluster to DEGRADED_SERVICE. To prevent potential data loss, do not change the cluster topology while the cluster is in a DEGRADED state.
Nodes in a DEGRADED state remain ready to accept traffic. We strongly discourage use of the status field for automated monitoring. For best practices, see Health Monitoring.

The information the Cluster health component provides can help you debug node synchronization. For example, to detect a node that is stuck in the join process or to detect a network split.

Example JSON response for a healthy Cluster component
{
  "status": "UP",
  "details": {
    "cluster-nodes": [
      "jS3bb",
      "C3P0X",
      "R2D2Y"
    ],
    "cluster-size": 3,
    "is-leave-replication-in-progress": false,
    "node-id": "jS3bb",
    "node-state": "RUNNING"
  }
}
Example JSON response for a degraded Cluster component
{
  "status": "DEGRADED",
  "details": {
    "cluster-nodes": [
      "jS3bb",
      "C3P0X",
      "R2D2Y"
    ],
    "cluster-size": 3,
    "is-leave-replication-in-progress": true,
    "node-id": "jS3bb",
    "node-state": "RUNNING"
  }
}
Example JSON response for an unhealthy Cluster component
{
  "status": "DOWN",
  "details": {
    "cluster-nodes": [
      "jS3bb",
      "C3P0X",
      "R2D2Y"
    ],
    "cluster-size": 3,
    "is-leave-replication-in-progress": false,
    "node-id": "jS3bb",
    "node-state": "JOINING"
  }
}
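
The following Python sketch illustrates how the details keys from the example responses above (is-leave-replication-in-progress, node-state, node-id, cluster-size) could be inspected to flag the conditions described in this section. It is an illustrative client-side check, not an exhaustive interpretation of all cluster states.

Example Python sketch that inspects the Cluster component details
def check_cluster(cluster: dict) -> None:
    details = cluster.get("details", {})
    if details.get("is-leave-replication-in-progress"):
        print("Leave replication in progress: do not change the cluster topology.")
    if details.get("node-state") != "RUNNING":
        print(f"Node {details.get('node-id')} is in state {details.get('node-state')}.")
    print(f"Cluster status {cluster.get('status')} with {details.get('cluster-size')} known nodes.")

# Usage with an abbreviated version of the degraded example response above:
check_cluster({
    "status": "DEGRADED",
    "details": {
        "cluster-size": 3,
        "is-leave-replication-in-progress": True,
        "node-id": "jS3bb",
        "node-state": "RUNNING",
    },
})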

MQTT

The MQTT health component provides information about the MQTT listeners and their connection state.
The information the MQTT component provides is useful to ensure that all configured listeners are correctly started and ready to accept traffic.
Failure reasons are provided in the details of each listener component.
If a TLS-related failure occurs, the listener reports a DEGRADED health status. The failure also sets the readiness status to DEGRADED_SERVICE. In this state, TLS connections on the listener can still work because the listener continues to use the previously loaded keystore and truststore. As long as the old certificates remain valid, the TLS listener can continue to function.
However, if the old certificates become invalid or the node is restarted, the listener can fail. Immediate action is recommended to ensure the stable operation of your HiveMQ cluster.
Nodes in a DEGRADED state remain ready to accept traffic. We strongly discourage use of the status field for automated monitoring. For best practices, see Health Monitoring.

Example JSON response for a healthy MQTT component
{
  "status": "UP",
  "components": {
    "tcp-listener-1883": {
      "status": "UP",
      "details": {
        "bind-address": "0.0.0.0",
        "is-proxy-protocol-supported": false,
        "is-running": true,
        "port": 1883,
        "type": "TCP Listener"
      }
    },
    "tls-tcp-listener-8883": {
      "status": "UP",
      "details": {
        "bind-address": "0.0.0.0",
        "is-proxy-protocol-supported": false,
        "is-running": true,
        "port": 8883,
        "type": "TCP Listener with TLS"
      }
    }
  }
}
Example JSON response for a degraded MQTT component
{
  "status": "DEGRADED",
  "components": {
    "tcp-listener-1883": {
      "status": "UP",
      "details": {
        "bind-address": "0.0.0.0",
        "is-proxy-protocol-supported": false,
        "is-running": true,
        "port": 1883,
        "type": "TCP Listener"
      }
    },
    "tls-tcp-listener-8883": {
      "status": "DEGRADED",
      "details": {
        "bind-address": "0.0.0.0",
        "is-proxy-protocol-supported": false,
        "is-running": true,
        "last-tls-failure": "com.hivemq.security.exception.SslException: Not able to open or read KeyStore '/usr/lib/jvm/11/jre/lib/security/cacerts/keystore.jks' with type 'JKS'",
        "port": 8883,
        "type": "TCP Listener with TLS"
      }
    }
  }
}
Example JSON response for an unhealthy MQTT component
{
  "status": "DOWN",
  "components": {
    "tcp-listener-1883": {
      "status": "UP",
      "details": {
        "bind-address": "0.0.0.0",
        "is-proxy-protocol-supported": false,
        "is-running": true,
        "port": 1883,
        "type": "TCP Listener"
      }
    },
    "tls-tcp-listener-8883": {
      "status": "DOWN",
      "details": {
        "bind-address": "0.0.0.0",
        "is-proxy-protocol-supported": false,
        "is-running": false,
        "last-failure": "java.io.IOException: Failed to bind to /0.0.0.0:8883",
        "port": 8883,
        "type": "TCP Listener with TLS"
      }
    }
  }
}
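
The following Python sketch is one way to surface the failure reasons described above: it prints the last-failure or last-tls-failure detail for every listener subcomponent that does not report UP. The input dictionary is an abbreviated version of the degraded example response.

Example Python sketch that reports listener failures
def report_listener_failures(mqtt_component: dict) -> None:
    for name, listener in mqtt_component.get("components", {}).items():
        if listener.get("status") == "UP":
            continue
        details = listener.get("details", {})
        reason = details.get("last-failure") or details.get("last-tls-failure") or "unknown"
        print(f"{name} ({listener.get('status')}): {reason}")

# Abbreviated version of the degraded example response above:
report_listener_failures({
    "status": "DEGRADED",
    "components": {
        "tls-tcp-listener-8883": {
            "status": "DEGRADED",
            "details": {"last-tls-failure": "SslException: Not able to open or read KeyStore"},
        }
    },
})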

Control Center

The Control Center health component provides information about the Control Center. The details show the current Control Center configuration, the state of the Jetty server connector, and failure reasons in the details of each listener component (if applicable).

Example JSON response for a healthy Control Center component
{
  "status": "UP",
  "details": {
    "default-login-mechanism-enabled": true,
    "enabled": true,
    "max-session-idle-time": 14400
  },
  "components": {
    "control-center-http-listener-8080": {
      "status": "UP",
      "details": {
        "bind-address": "0.0.0.0",
        "is-connector-failed": false,
        "is-connector-open": true,
        "is-connector-running": true,
        "port": 8080
      }
    }
  }
}
Example JSON response for an unhealthy Control Center
{
  "status": "DOWN",
  "details": {
    "default-login-mechanism-enabled": true,
    "enabled": true,
    "max-session-idle-time": 14400
  },
  "components": {
    "control-center-https-listener-8443": {
      "status": "DOWN",
      "details": {
        "bind-address": "0.0.0.0",
        "is-connector-failed": false,
        "is-connector-open": true,
        "is-connector-running": true,
        "last-tls-failure": "java.io.IOException: keystore password was incorrect",
        "port": 8443
      }
    }
  }
}
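
As an illustration, the following Python sketch evaluates the Jetty connector flags (is-connector-failed, is-connector-open, is-connector-running) that appear in the listener details above. The input data is illustrative.

Example Python sketch that evaluates the connector flags
def connector_ok(listener: dict) -> bool:
    details = listener.get("details", {})
    return (not details.get("is-connector-failed", False)
            and details.get("is-connector-open", False)
            and details.get("is-connector-running", False))

# Abbreviated version of the healthy example response above:
control_center = {
    "components": {
        "control-center-http-listener-8080": {
            "status": "UP",
            "details": {
                "is-connector-failed": False,
                "is-connector-open": True,
                "is-connector-running": True,
            },
        }
    }
}

for name, listener in control_center["components"].items():
    print(name, "->", "connector ok" if connector_ok(listener) else "connector problem")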

REST API

The REST API health component provides information about the HiveMQ REST API.
The details show your current REST API configuration, the state of the Jetty server connector, and failure reasons in the details of each listener component (if applicable).

Example JSON response for a healthy REST API component
{
  "status": "UP",
  "details": {
    "authentication-enabled": false,
    "enabled": true
  },
  "components": {
    "http-listener-8888": {
      "status": "UP",
      "details": {
        "bind-address": "127.0.0.1",
        "is-connector-failed": false,
        "is-connector-open": true,
        "is-connector-running": true,
        "port": 8888
      }
    },
    "https-listener-8889": {
      "status": "UP",
      "details": {
        "bind-address": "127.0.0.1",
        "is-connector-failed": false,
        "is-connector-open": true,
        "is-connector-running": true,
        "port": 8889
      }
    }
  }
}
Example JSON response for an unhealthy REST API component
{
  "status": "DOWN",
  "details": {
    "authentication-enabled": false,
    "enabled": true
  },
  "components": {
    "http-listener-8888": {
      "status": "DOWN",
      "details": {
        "bind-address": "127.0.0.1",
        "is-connector-failed": false,
        "is-connector-open": true,
        "is-connector-running": true,
        "last-failure": "java.io.IOException: Failed to bind to /0.0.0.0:8888",
        "port": 8888
      }
    },
    "https-listener-8889": {
      "status": "UP",
      "details": {
        "bind-address": "127.0.0.1",
        "is-connector-failed": false,
        "is-connector-open": true,
        "is-connector-running": true,
        "port": 8889
      }
    }
  }
}

Extensions

The Extensions health component provides information about your configured custom and HiveMQ Enterprise extensions.

The details provide the extension metadata. Components provide general insights about an extension and detailed information about runtime and startup failures. For example, specific extension setup details, or the reason string when an extension fails to start or when the trial mode of a HiveMQ Enterprise Extension expires.

Extensions can report a DEGRADED health status in two scenarios.

When a HiveMQ extension throws an uncaught exception, the HiveMQ broker captures the exception and reports the extension status DEGRADED. The Health API Extension component provides details about the captured exception. To ensure the reliable processing of your data, review the details and address any issues as soon as possible.

HiveMQ Enterprise Extensions that support configuration hot-reload functionality can report a DEGRADED state when an invalid configuration is provided at runtime. In this case, you must resolve the cause of the DEGRADED state before the next restart of the extension, or the restart will fail and the extension health will progress into a DOWN state. Immediate action is recommended to ensure the stable operation of your HiveMQ cluster.

Nodes in a DEGRADED state remain ready to accept traffic. We strongly discourage use of the status field for automated monitoring. For best practices, see Health Monitoring.

Example JSON response for a healthy Extension component
{
  "status": "UP",
  "details": {
    "author": "HiveMQ",
    "enabled": true,
    "name": "HiveMQ Enterprise Extension for Kafka",
    "priority": 1000,
    "start-priority": 1000,
    "startedAt": 1713949817551,
    "version": "4.28.0"
  },
  "components": {
    "application": {
      "status": "UP",
      "components": {
        "configuration": {
          "status": "UP",
          "details": {
            "kafka-to-mqtt-mappings-count": 1,
            "kafka-to-mqtt-transformers-count": 0,
            "mqtt-to-kafka-mappings-count": 1,
            "mqtt-to-kafka-transformers-count": 0,
            "reloaded-at": 1713949818293
          }
        }
      }
    },
    "internals": {
      "status": "UP",
      "components": {
        "entrypoint": {
          "status": "UP",
          "details": {
            "started-at": 1713949817551
          }
        },
        "license": {
          "status": "UP",
          "details": {
            "is-enterprise": true,
            "is-trial": true,
            "is-trial-expired": false
          }
        },
        "services": {
          "status": "UP"
        }
      }
    }
  }
}
Example JSON response for a degraded Extension component
{
  "status": "DEGRADED",
  "details": {
    "author": "HiveMQ",
    "enabled": true,
    "name": "HiveMQ Enterprise Extension for Kafka",
    "priority": 1000,
    "start-priority": 1000,
    "startedAt": 1713949817551,
    "version": "4.28.0"
  },
  "components": {
    "application": {
      "status": "DEGRADED",
      "components": {
        "configuration": {
          "status": "DEGRADED",
          "details": {
            "kafka-to-mqtt-mappings-count": 1,
            "kafka-to-mqtt-transformers-count": 0,
            "last-failure": "Invalid configuration",
            "mqtt-to-kafka-mappings-count": 1,
            "mqtt-to-kafka-transformers-count": 0,
            "reloaded-at": 1713949818293
          }
        }
      }
    },
    "internals": {
      "status": "UP",
      "components": {
        "entrypoint": {
          "status": "UP",
          "details": {
            "started-at": 1713949817551
          }
        },
        "license": {
          "status": "UP",
          "details": {
            "is-enterprise": true,
            "is-trial": true,
            "is-trial-expired": false
          }
        },
        "services": {
          "status": "UP"
        }
      }
    }
  }
}
Example JSON response for an unhealthy Extension component
{
  "status": "DOWN",
  "details": {
    "author": "HiveMQ",
    "enabled": false,
    "name": "HiveMQ Enterprise Extension for Kafka",
    "priority": 1000,
    "start-priority": 1000,
    "startedAt": 1713949817551,
    "version": "4.28.0"
  },
  "components": {
    "internals": {
      "status": "DOWN",
      "components": {
        "entrypoint": {
          "status": "DOWN",
          "details": {
            "last-startup-failure": "Extension startup prevented because of the following error: java.nio.file.NoSuchFileException: /opt/hivemq/extensions/hivemq-kafka-extension/conf/config.xml",
            "started-at": 1713949817551
          }
        },
        "license": {
          "status": "UP",
          "details": {
            "is-enterprise": true,
            "is-trial": true,
            "is-trial-expired": false
          }
        },
        "services": {
          "status": "UP"
        }
      }
    }
  }
}
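
The following Python sketch shows one way to collect the startup and runtime failure details described in this section from an Extension component tree. The last-failure and last-startup-failure keys are taken from the example responses above; the input dictionary is an abbreviated illustration.

Example Python sketch that collects extension failure details
def collect_extension_failures(node: dict, path: str = "extension"):
    details = node.get("details", {})
    for key in ("last-failure", "last-startup-failure"):
        if key in details:
            yield (path, key, details[key])
    for name, child in node.get("components", {}).items():
        yield from collect_extension_failures(child, f"{path}.{name}")

# Abbreviated version of the unhealthy example response above:
extension = {
    "status": "DOWN",
    "components": {
        "internals": {
            "components": {
                "entrypoint": {
                    "status": "DOWN",
                    "details": {"last-startup-failure": "Extension startup prevented ..."},
                }
            }
        }
    },
}

for path, key, message in collect_extension_failures(extension):
    print(f"{path}: {key} = {message}")
# extension.internals.entrypoint: last-startup-failure = Extension startup prevented ...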

Health Groups

Health groups organize selected health components so that their state can be evaluated together for a specific purpose.
A health group shows the aggregated state of its health components.
The HiveMQ liveness and readiness health groups address common use cases such as liveness and readiness probes for Kubernetes containers.
For more information, see Kubernetes Probes.

Liveness Check

The liveness health group checks whether the deployed HiveMQ broker is currently operational, responsive, and reachable.

Example liveness check JSON response for a running and responsive HiveMQ broker
{ "status": "UP" }

Readiness Check

The Readiness health group checks whether the HiveMQ node is currently available to receive and process MQTT messages.

The Readiness health group aggregates the state of the Cluster and MQTT health components.

If the Cluster component and the MQTT component are both healthy, the readiness check returns the status UP and the node can accept traffic.

If one of the components in the Readiness health group is degraded, the readiness check returns the status DEGRADED_SERVICE.
The degraded service status indicates that the HiveMQ node is still operational, but might fail over time or on the next restart. Immediate action is recommended to ensure the stable operation of your HiveMQ cluster.
Nodes in a DEGRADED_SERVICE state remain ready to accept traffic. We strongly discourage use of the status field for automated monitoring. For best practices, see Health Monitoring.

If one of the components in the Readiness health group is not healthy, the readiness check returns the status OUT_OF_SERVICE.
The out-of-service status indicates that the HiveMQ node is currently unable to accept traffic.

Example readiness check JSON response for a node that is ready to accept MQTT traffic
{
  "status": "UP",
  "components": {
    "cluster": {
      "status": "UP",
      "details": {
      }
    },
    "mqtt": {
      "status": "UP",
      "components": {
      }
    }
  }
}
Example readiness check JSON response for a node that is currently unable to accept MQTT traffic
{
  "status": "OUT_OF_SERVICE",
  "components": {
    "cluster": {
      "status": "UP",
      "details": {
      }
    },
    "mqtt": {
      "status": "DOWN",
      "components": {
      }
    }
  }
}
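
In line with the Health Monitoring recommendation, the following Python sketch decides readiness from the HTTP status code alone and only uses a DEGRADED_SERVICE status in the body to emit an operator warning, because such a node still accepts traffic but needs attention. The URL and timeout are assumptions.

Example Python readiness check
import json
import urllib.error
import urllib.request

def check_readiness(url: str = "http://127.0.0.1:8889/api/v1/health/readiness") -> bool:
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            body = json.load(response)
            if body.get("status") == "DEGRADED_SERVICE":
                # Still ready, but resolve the degradation before the next restart.
                print("warning: node is DEGRADED_SERVICE, investigate soon")
            return True                            # 2xx: UP or DEGRADED_SERVICE
    except urllib.error.HTTPError:
        return False                               # 503: OUT_OF_SERVICE
    except OSError:
        return False                               # node unreachable

print("ready" if check_readiness() else "not ready")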