Monitor Agent Health
After you deploy an agent, you want to know more than whether it is running. You want to know it is doing useful work. An agent can run yet fail silently. For example, it cannot reach its broker, its LLM (large language model) provider returns errors, or it receives no data. This page shows how to check an agent’s health in the HiveMQ Platform.
Before You Start
- You have a deployed agent. If you do not, follow the Quick Start first.
- You are logged in to the HiveMQ Platform, with the Act tab open.
Scan the Deployed Agents Tab
Open the Deployed Agents tab under Act. It opens on a fleet view designed to help you find problems quickly.
Across the top, four cards summarize the state of every agent you deployed:
| Card | What it tells you |
|---|---|
| Awaiting Your Decision | How many agents hold an action for human review. See Respond to a Feedback Request. |
| Agent Health | A warning summary across the fleet. For example, "2 down · 1 elevated error rate". |
| Running | How many agents are running, out of the total deployed. |
| Total Agents | The total number of agents deployed, with a breakdown by role. |
Below the cards, four filter dropdowns narrow the table:
- Network: All networks, or one network.
- Orchestrator: All orchestrators, or one orchestrator.
- Status: Any of the agent status values listed below.
- Role: Monitor, Analyst, Responder, Processor, Publisher, Alerter, Replayer, Orchestrator, Gateway, Collector, Technician, or Tester.
The agents table lists every matching agent with these columns: NAME, ORCHESTRATOR, NETWORK, STARTED, and STATUS. Select any row to open that agent’s detail page.
Agent Status Values
The STATUS column and the Status filter use the full lifecycle vocabulary. An agent passes through several of these statuses as it starts, runs, and stops:
| Status | What it means |
|---|---|
| Pending | Queued for the orchestrator to pick up. |
| Deploying | The orchestrator provisions the agent. |
| Starting | The agent process starts. |
| Running | Active. Cycles through Sense → Reason → Actuate → Reflect. |
| Idle | Up but between cycles. Waits on its trigger. |
| Paused | The agent does not execute cycles because a user deliberately suspended it. |
| Stopping | The agent received a stop request and winds down. |
| Stopped | Halted and not running. You can start it again. |
| Stale | No recent check-in. The orchestrator for this agent has gone silent, so the last-known status can be out of date. |
| Removing | Marked for deletion and removed from the orchestrator. |
| Failed | The agent reported an error or did not run. Check the red banner and the Logs tab. |
| Error | The agent reported an error during a cycle. Check the Logs tab. |
|
Running and unhealthy can both be true at once. An agent can show |
Open an Agent and Read the Overview
Select an agent to open its detail page. The header shows the agent name, its role badge, and four status fields (Status, State, Started, and Last seen) plus Refresh and a Stop / Start action.
The detail page has three tabs: Overview, Logs, and Configuration.
Overview Tab
The Overview tab summarizes the agent through four metric cards:
| Metric | What it Tells You |
|---|---|
| Uptime | How long the agent has run. |
| Total Cycles | How many Sense → Reason → Actuate → Reflect cycles the agent has completed since it started. |
| Success Rate | The percentage of cycles that completed without error. A falling success rate is the clearest sign of a degraded agent. |
| Avg Cycle | The average time one cycle takes. |
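The arithmetic behind Total Cycles, Success Rate, and Avg Cycle can be sketched as follows. This is an illustration only, not the platform's implementation: the `CycleRecord` type, its fields, and the `summarize` function are hypothetical names introduced for the example, and reporting 0 for an agent with no cycles is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CycleRecord:
    """One completed cycle (hypothetical record shape, for illustration)."""
    duration_seconds: float
    succeeded: bool


def summarize(cycles: list[CycleRecord]) -> dict[str, float]:
    """Return total cycles, success rate in percent, and average cycle time."""
    total = len(cycles)
    if total == 0:
        # Assumption: an agent with no cycles reports zeros rather than an error.
        return {"total_cycles": 0, "success_rate_pct": 0.0, "avg_cycle_seconds": 0.0}
    successes = sum(1 for c in cycles if c.succeeded)
    return {
        "total_cycles": total,
        "success_rate_pct": 100.0 * successes / total,
        "avg_cycle_seconds": sum(c.duration_seconds for c in cycles) / total,
    }


# Four cycles, one of which failed.
cycles = [
    CycleRecord(2.0, True),
    CycleRecord(4.0, True),
    CycleRecord(3.0, False),
    CycleRecord(3.0, True),
]
summary = summarize(cycles)
print(summary)
```

With three of four cycles succeeding, the success rate is 75 percent, and the average cycle time is the total duration (12 seconds) divided by the cycle count.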
The Overview tab also links out to the agent’s Template, Network, and Orchestrator, and lists the agent’s Connections.
Diagnose a Degraded or Failed Agent
When an agent is not doing useful work, check these items in the following order:
- Check the status and the header. A Failed or Error status, or a red "This agent reported issues" banner on the detail header, tells you the agent itself reported a problem. The banner shows the failing log line. A common one is "Missing required environment variables", which means the template needed values it never received. See Troubleshoot Agent Behavior.
- Read the Success Rate metric on the Overview tab. A Running agent with a low success rate is failing cycles even though the process is alive.
- Open the Logs tab to see the error thrown each cycle. The log line names the stage that failed. That stage is sense (a connection unreachable), reason (an LLM error), or actuate (a downstream call rejected).
- Confirm the configuration on the Configuration tab if the agent behaves differently than you expect. For example, a topic filter or threshold might not be set the way you intended.
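The order of these checks can be expressed as a small decision function. This is a hedged sketch for reasoning about the sequence, not a platform API: `AgentView`, `next_check`, and the 90 percent threshold are hypothetical, and the values are assumed to be read by hand from the detail page.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; the platform does not document a specific value here.
LOW_SUCCESS_RATE_PCT = 90.0

# Stage names and their typical causes, as described in the checklist.
STAGE_HINTS = {
    "sense": "a connection is unreachable",
    "reason": "the LLM returned an error",
    "actuate": "a downstream call was rejected",
}


@dataclass
class AgentView:
    """Values read from the agent detail page (hypothetical shape)."""
    status: str
    banner_line: Optional[str]
    success_rate_pct: float
    failing_stage: Optional[str]  # stage named in the Logs tab, if any


def next_check(agent: AgentView) -> str:
    """Walk the checks in order and return the first finding."""
    # 1. Status and header banner.
    if agent.status in ("Failed", "Error") or agent.banner_line:
        return f"Agent reported a problem: {agent.banner_line or 'see the Logs tab'}"
    # 2 and 3. Low success rate on a running agent, then the failing stage.
    if agent.status == "Running" and agent.success_rate_pct < LOW_SUCCESS_RATE_PCT:
        hint = STAGE_HINTS.get(agent.failing_stage or "", "open the Logs tab to find the failing stage")
        return f"Cycles are failing: {hint}"
    # 4. Nothing reported, so compare the configuration with what you intended.
    return "No reported failure: confirm the Configuration tab"


failed = AgentView("Failed", "Missing required environment variables", 0.0, None)
degraded = AgentView("Running", None, 62.5, "reason")
quiet = AgentView("Running", None, 99.0, None)
print(next_check(failed))
print(next_check(degraded))
print(next_check(quiet))
```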
The following table lists common patterns:
| Symptom | Likely Cause and Where to Look |
|---|---|
| A connection shows as failed on Overview | Broker credentials, database location, or firewall rules changed. The connection’s last error names the failure. |
| Success rate dropping over many cycles | Open Logs to see the repeated error. Often an LLM key expired or a model ID is wrong, or a sense source stopped publishing. |
| Status | A required template parameter or |
| Status Stale | The orchestrator stopped its check-ins. The displayed status can be out of date. Check the orchestrator. See Stale Orchestrators. |
Manage a Running Agent
From the detail header:
- Stop halts the agent immediately, with no confirmation dialog. The status changes to Stopped and the button becomes Start. Start sets the status back to Pending.
- Stop & delete agent (while running) or Delete agent (while stopped) removes the agent from its orchestrator. This action confirms first, and you cannot undo it.
Next Steps
- Respond to a Feedback Request: Act on the decisions your agent holds for review.
- Troubleshoot Agent Behavior: Fix a failed, stale, or stuck agent.
- Set Human Oversight: Decide which actions an agent must check with you before it runs.
Reference: How the Platform Assesses Agent Health
The platform distinguishes lifecycle status (is the process running?) from health (is it doing useful work?). Agent health appears as a colored badge on the agent. It also appears in the Agent Health summary card, the Success Rate metric, and (when an agent reports a problem) the "This agent reported issues" banner on its detail header. Health degrades when an agent is alive but does not make progress. The clearest signs are a falling Success Rate and connections that report as failed.
The underlying model considers the following, after each cycle:
- Connection health: Whether the agent’s configured connections are reachable. All reachable is healthy; some failing is degraded; all failing is critical.
- Cycle success rate: The fraction of recent cycles that completed without error. The Success Rate metric on the Overview tab shows this fraction directly.
- LLM availability: For agents with an LLM configured, whether recent calls to the provider succeeded. Repeated failures (expired key, invalid model ID, provider outage, rate limiting) mark the agent as impaired.
- Sense data quality: Whether the agent receives data from its sources. Several consecutive empty cycles indicate the source stopped publishing, or a query returns nothing.
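The four signals above could be combined roughly as in the following sketch. Only the connection rule (all reachable is healthy, some failing is degraded, all failing is critical) comes from this page. The thresholds, the `CycleSnapshot` shape, and the rule that any other signal downgrades a healthy agent to degraded are assumptions made for illustration, not the platform's actual model.

```python
from dataclasses import dataclass
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    CRITICAL = "critical"


# Hypothetical thresholds; the real values are not documented here.
MIN_SUCCESS_RATE = 0.9
MAX_LLM_FAILURES = 3
MAX_EMPTY_CYCLES = 3


@dataclass
class CycleSnapshot:
    """Signals gathered after one cycle (hypothetical shape)."""
    connections_total: int
    connections_failing: int
    recent_success_rate: float  # fraction between 0.0 and 1.0
    llm_configured: bool
    consecutive_llm_failures: int
    consecutive_empty_cycles: int


def connection_health(total: int, failing: int) -> Health:
    """All reachable is healthy, some failing is degraded, all failing is critical."""
    if failing == 0:
        return Health.HEALTHY
    if failing >= total:
        return Health.CRITICAL
    return Health.DEGRADED


def assess(s: CycleSnapshot) -> tuple[Health, list[str]]:
    """Return an overall health level and the reasons behind it."""
    reasons: list[str] = []
    health = connection_health(s.connections_total, s.connections_failing)
    if health is Health.DEGRADED:
        reasons.append("some connections failing")
    elif health is Health.CRITICAL:
        reasons.append("all connections failing")
    if s.recent_success_rate < MIN_SUCCESS_RATE:
        reasons.append("low cycle success rate")
    if s.llm_configured and s.consecutive_llm_failures >= MAX_LLM_FAILURES:
        reasons.append("LLM provider impaired")
    if s.consecutive_empty_cycles >= MAX_EMPTY_CYCLES:
        reasons.append("no sense data")
    # Assumption: any non-connection signal downgrades a healthy agent to degraded.
    if reasons and health is Health.HEALTHY:
        health = Health.DEGRADED
    return health, reasons


good = CycleSnapshot(2, 0, 0.98, True, 0, 0)
partial = CycleSnapshot(2, 1, 0.98, True, 0, 0)
llm_down = CycleSnapshot(2, 0, 0.98, True, 3, 0)
print(assess(good))
print(assess(partial))
print(assess(llm_down))
```

Returning the reasons alongside the level mirrors how the fleet summary describes problems in words rather than as a single score.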
Stale Orchestrators
An agent’s status depends on regular check-ins from its orchestrator. When an orchestrator goes silent (its process is down, or the network between it and the Control Plane fails), the Control Plane marks both the orchestrator and the agents it runs as Stale.
A stale orchestrator does not automatically mean its agents have failed. The agents can still run normally while the orchestrator is briefly unreachable. The platform preserves the last-known agent status and labels it as potentially outdated until the next successful check-in. When the orchestrator reconnects, the stale indicator clears on its own.
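The check-in logic described above amounts to a timeout on the last check-in. The sketch below shows the idea under stated assumptions: the 90-second `STALE_AFTER` window, the `display_status` function, and the label format are hypothetical and do not reflect documented platform values.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical timeout; the platform's real check-in window is not documented here.
STALE_AFTER = timedelta(seconds=90)


def display_status(last_known_status: str, last_check_in: datetime, now: datetime) -> str:
    """Keep the last-known status, but label it once check-ins stop arriving."""
    if now - last_check_in > STALE_AFTER:
        return f"Stale (last known: {last_known_status})"
    # A fresh check-in clears the stale label on its own.
    return last_known_status


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
fresh = display_status("Running", now - timedelta(seconds=30), now)
silent = display_status("Running", now - timedelta(minutes=10), now)
print(fresh)
print(silent)
```

Because the last-known status is preserved rather than overwritten, a reconnecting orchestrator only needs to check in again for the label to disappear.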
If an orchestrator stays stale, check the orchestrator host and its logs. See Troubleshoot Agent Behavior and Deploy an Orchestrator.