Monitor Agent Health

After you deploy an agent, you want to know more than whether it is running. You want to know it is doing useful work. An agent can run yet fail silently, for example when it cannot reach its broker, when its LLM (large language model) provider returns errors, or when it receives no data. This page shows how to check an agent’s health in the HiveMQ Platform.

Before You Start

  • You have a deployed agent. If you do not, follow the Quick Start first.

  • You are logged in to the HiveMQ Platform, with the Act tab open.

Scan the Deployed Agents Tab

Open the Deployed Agents tab under Act. It opens on a fleet view designed to help you find problems quickly.

Across the top, four cards summarize the state of every agent you deployed:

  • Awaiting Your Decision: How many agents hold an action for human review. See Respond to a Feedback Request.

  • Agent Health: A warning summary across the fleet. For example, "2 down · 1 elevated error rate".

  • Running: How many agents are running, out of the total deployed.

  • Total Agents: The total number of agents deployed, with a breakdown by role.

Below the cards, four filter dropdowns narrow the table:

  • Network: All networks, or one network.

  • Orchestrator: All orchestrators, or one orchestrator.

  • Status: Any of the agent status values listed below.

  • Role: Monitor, Analyst, Responder, Processor, Publisher, Alerter, Replayer, Orchestrator, Gateway, Collector, Technician, or Tester.

The agents table lists every matching agent with these columns: NAME, ORCHESTRATOR, NETWORK, STARTED, and STATUS. Select any row to open that agent’s detail page.
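As a rough illustration of how the four filters narrow the table, the sketch below treats each filter as an optional exact match and combines them with a logical AND. That combination rule, the `AgentRow` record, and the sample names are assumptions made for the example; none of this is the platform's API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class AgentRow:
    """Hypothetical stand-in for one row of the agents table."""
    name: str
    orchestrator: str
    network: str
    status: str
    role: str


def filter_agents(
    rows: Iterable[AgentRow],
    network: Optional[str] = None,
    orchestrator: Optional[str] = None,
    status: Optional[str] = None,
    role: Optional[str] = None,
) -> List[AgentRow]:
    """Keep rows that match every filter that is set; None stands for 'All'."""
    wanted = {
        "network": network,
        "orchestrator": orchestrator,
        "status": status,
        "role": role,
    }
    return [
        row
        for row in rows
        if all(v is None or getattr(row, k) == v for k, v in wanted.items())
    ]


# Invented sample data, for illustration only.
rows = [
    AgentRow("temp-monitor", "orch-a", "plant-1", "Running", "Monitor"),
    AgentRow("line-alerter", "orch-a", "plant-1", "Failed", "Alerter"),
    AgentRow("qa-tester", "orch-b", "plant-2", "Running", "Tester"),
]
running_on_plant_1 = filter_agents(rows, network="plant-1", status="Running")
print([r.name for r in running_on_plant_1])
```

Leaving a filter unset corresponds to the "All" choice in its dropdown.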

Agent Status Values

The STATUS column and the Status filter use the full lifecycle vocabulary. An agent passes through several of these statuses as it starts, runs, and stops:

  • Pending: Queued for the orchestrator to pick up.

  • Deploying: The orchestrator provisions the agent.

  • Starting: The agent process starts.

  • Running: Active. Cycles through Sense → Reason → Actuate → Reflect.

  • Idle: Up but between cycles. Waits on its trigger.

  • Paused: The agent does not execute cycles because a user deliberately suspended it.

  • Stopping: The agent received a stop request and winds down.

  • Stopped: Halted and not running. You can start it again.

  • Stale: No recent check-in. The orchestrator for this agent has gone silent, so the last-known status can be out of date.

  • Removing: Marked for deletion and removed from the orchestrator.

  • Failed: The agent reported an error or did not run. Check the red banner and the Logs tab.

  • Error: The agent reported an error during a cycle. Check the Logs tab.

Running and unhealthy can both be true at once. An agent can show Running yet still be impaired, for example when it cannot reach a connection, its LLM returns errors, or it receives no data. The Agent Health card and the detail page exist to reveal exactly that case.

Open an Agent and Read the Overview

Select an agent to open its detail page. The header shows the agent name, its role badge, and four status fields (Status, State, Started, and Last seen) plus Refresh and a Stop / Start action.

The detail page has three tabs.

Overview Tab

The Overview tab summarizes the agent through four metric cards:

  • Uptime: How long the agent has run.

  • Total Cycles: How many Sense → Reason → Actuate → Reflect cycles the agent has completed since it started.

  • Success Rate: The percentage of cycles that completed without error. A falling success rate is the clearest sign of a degraded agent.

  • Avg Cycle: The average time one cycle takes.
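To make the cycle metrics concrete, here is a minimal sketch of how Total Cycles, Success Rate, and Avg Cycle could be derived from a list of cycle records. The `CycleRecord` shape is hypothetical, and the sketch assumes the rate covers every cycle since start; the platform may use a different window.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CycleRecord:
    """Hypothetical record of one completed cycle."""
    succeeded: bool
    duration_seconds: float


def success_rate_percent(cycles: List[CycleRecord]) -> Optional[float]:
    """Percentage of cycles that completed without error; None before the first cycle."""
    if not cycles:
        return None
    ok = sum(1 for c in cycles if c.succeeded)
    return 100.0 * ok / len(cycles)


def average_cycle_seconds(cycles: List[CycleRecord]) -> Optional[float]:
    """Mean duration of one cycle in seconds; None before the first cycle."""
    if not cycles:
        return None
    return sum(c.duration_seconds for c in cycles) / len(cycles)


# Invented sample data: four cycles, one of which failed.
cycles = [
    CycleRecord(True, 2.0),
    CycleRecord(True, 4.0),
    CycleRecord(False, 6.0),
    CycleRecord(True, 4.0),
]
print(len(cycles), success_rate_percent(cycles), average_cycle_seconds(cycles))
# prints: 4 75.0 4.0
```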

The Overview tab also links out to the agent’s Template, Network, and Orchestrator, and lists the agent’s Connections.

Configuration Tab

The Configuration tab shows the full, deployed configuration of the agent as read-only JSON. Use it to confirm exactly what the running agent received, including connection keys, sense topics, reason rules, trigger, and any supplied parameter values.
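If you copy the JSON out of the Configuration tab, a short script can help you review it. The sketch below walks a parsed configuration and reports any literal `${NAME}` placeholders. It assumes, which this page does not confirm, that a value that was never supplied shows up as an unresolved placeholder. The key names in the sample are invented for the example.

```python
import json
import re
from typing import Any, List, Tuple

# Matches ${NAME} where NAME looks like an environment variable name.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def find_placeholders(node: Any, path: str = "$") -> List[Tuple[str, str]]:
    """Walk parsed JSON and return (path, variable name) for each ${NAME} found."""
    found: List[Tuple[str, str]] = []
    if isinstance(node, dict):
        for key, value in node.items():
            found.extend(find_placeholders(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(find_placeholders(value, f"{path}[{index}]"))
    elif isinstance(node, str):
        found.extend((path, name) for name in PLACEHOLDER.findall(node))
    return found


# Hypothetical configuration; the real schema may differ.
config_text = """
{
  "connections": {"broker": {"host": "broker.example.com", "password": "${BROKER_PASSWORD}"}},
  "sense": {"topics": ["plant/+/temperature"]},
  "trigger": {"type": "interval", "seconds": 30}
}
"""
unresolved = find_placeholders(json.loads(config_text))
print(unresolved)
```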

Logs Tab

The Logs tab shows a live log stream from the agent. When the agent is between cycles it reads "Waiting for next cycle event…". New entries appear as each cycle runs. The Logs tab is the first place to look when an agent fails or its success rate drops.

Diagnose a Degraded or Failed Agent

When an agent is not doing useful work, check these items in order:

  1. Check the status and the header. A Failed or Error status, or a red "This agent reported issues" banner on the detail header, tells you the agent itself reported a problem. The banner shows the failing log line. A common one is Missing required environment variables, which means the template needed values it never received. See Troubleshoot Agent Behavior.

  2. Read the Success Rate metric on the Overview tab. A Running agent with a low success rate is failing cycles even though the process is alive.

  3. Open the Logs tab to see the error thrown each cycle. The log line names the stage that failed: sense (a connection is unreachable), reason (an LLM error), or actuate (a downstream call was rejected).

  4. Confirm the configuration on the Configuration tab if the agent behaves differently than you expect. For example, a topic filter or threshold might not be set the way you intended.
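The order of these checks can be sketched as a small triage function. This is an illustration only: `AgentSnapshot` is a hypothetical summary of what you read off the detail page, and the 90% cutoff for a "low" success rate is an arbitrary assumption, not a platform threshold.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Stage names and their typical causes, as described in step 3.
STAGE_HINTS = {
    "sense": "a connection is unreachable",
    "reason": "an LLM error",
    "actuate": "a downstream call was rejected",
}


@dataclass
class AgentSnapshot:
    """Hypothetical summary of what the detail page shows for one agent."""
    status: str
    banner: Optional[str] = None          # text of the red banner, if any
    success_rate: Optional[float] = None  # percent, from the Overview tab
    failed_stages: List[str] = field(default_factory=list)  # read from the Logs tab


def triage(agent: AgentSnapshot, low_rate: float = 90.0) -> List[str]:
    """Return findings in the order of the four checks."""
    findings: List[str] = []
    # 1. Status and header: the agent itself reported a problem.
    if agent.status in ("Failed", "Error") or agent.banner:
        findings.append(f"reported problem: {agent.banner or agent.status}")
    # 2. Running but failing cycles (cutoff is an assumption).
    if (
        agent.status == "Running"
        and agent.success_rate is not None
        and agent.success_rate < low_rate
    ):
        findings.append(f"low success rate: {agent.success_rate:.0f}%")
    # 3. Stage named by the failing log lines.
    for stage in agent.failed_stages:
        findings.append(f"{stage} failed: {STAGE_HINTS.get(stage, 'unrecognized stage')}")
    # 4. Nothing reported: compare the configuration with what was intended.
    if not findings:
        findings.append("no reported failure: review the Configuration tab")
    return findings


snapshot = AgentSnapshot(status="Running", success_rate=40.0, failed_stages=["reason"])
for finding in triage(snapshot):
    print(finding)
```

For this snapshot the function reports the low success rate first and the failed reason stage second, which mirrors steps 2 and 3.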

The following list pairs common symptoms with the likely cause and where to look:

  • A connection shows as failed on Overview: Broker credentials, database location, or firewall rules changed. The connection’s last error names the failure.

  • Success rate dropping over many cycles: Open Logs to see the repeated error. Often an LLM key expired or a model ID is wrong, or a sense source stopped publishing.

  • Status Failed right after deploy: A required template parameter or ${ENV_VAR} value was not supplied. See Troubleshoot Agent Behavior.

  • Status Stale: The orchestrator stopped its check-ins. The displayed status can be out of date. Check the orchestrator. See Stale Orchestrators.

Manage a Running Agent

From the detail header:

  • Stop halts the agent immediately, with no confirmation dialog. The status changes to Stopped and the button becomes Start. Start sets the status back to Pending.

  • Stop & delete agent (while running) or Delete agent (while stopped) removes the agent from its orchestrator. The platform asks you to confirm first, and you cannot undo the removal.


Reference: How the Platform Assesses Agent Health

The platform distinguishes lifecycle status (is the process running?) from health (is it doing useful work?). Agent health appears as a colored badge on the agent. It also appears in the Agent Health summary card, the Success Rate metric, and (when an agent reports a problem) the "This agent reported issues" banner on its detail header. Health degrades when an agent is alive but does not make progress. The clearest signs are a falling Success Rate and connections that report as failed.

After each cycle, the underlying model considers the following:

  • Connection health: Whether the agent’s configured connections are reachable. All reachable is healthy; some failing is degraded; all failing is critical.

  • Cycle success rate: The fraction of recent cycles that completed without error. The Success Rate metric on the Overview tab shows this fraction directly.

  • LLM availability: For agents with an LLM configured, whether recent calls to the provider succeeded. Repeated failures (expired key, invalid model ID, provider outage, rate limiting) mark the agent as impaired.

  • Sense data quality: Whether the agent receives data from its sources. Several consecutive empty cycles indicate the source stopped publishing, or a query returns nothing.
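The four signals above can be combined in many ways. The sketch below is one possible reading, not the platform's actual model. Only the connection rule (all reachable, some failing, all failing) comes from this page. The thresholds, the mapping of "impaired" to a degraded level, and the choice to report the worst signal are assumptions.

```python
from enum import IntEnum
from typing import Dict, List


class Health(IntEnum):
    HEALTHY = 0
    DEGRADED = 1
    CRITICAL = 2


def connection_health(reachable: Dict[str, bool]) -> Health:
    """All reachable is healthy, some failing is degraded, all failing is critical."""
    if not reachable or all(reachable.values()):
        return Health.HEALTHY  # assumption: an agent with no connections is healthy
    if any(reachable.values()):
        return Health.DEGRADED
    return Health.CRITICAL


def trailing_run(flags: List[bool]) -> int:
    """Length of the run of True values at the end of the list."""
    count = 0
    for flag in reversed(flags):
        if not flag:
            break
        count += 1
    return count


def assess(
    reachable: Dict[str, bool],
    cycle_ok: List[bool],
    llm_call_failed: List[bool],
    cycle_empty: List[bool],
    min_success: float = 0.9,   # assumed threshold
    max_llm_failures: int = 3,  # assumed threshold
    max_empty: int = 3,         # assumed threshold
) -> Health:
    """Report the worst of the four signals (an assumed combination rule)."""
    signals = [connection_health(reachable)]
    if cycle_ok and sum(cycle_ok) / len(cycle_ok) < min_success:
        signals.append(Health.DEGRADED)
    if trailing_run(llm_call_failed) >= max_llm_failures:
        signals.append(Health.DEGRADED)
    if trailing_run(cycle_empty) >= max_empty:
        signals.append(Health.DEGRADED)
    return max(signals)


overall = assess(
    reachable={"broker": True, "historian-db": False},
    cycle_ok=[True, True, False, True],
    llm_call_failed=[False, False, False],
    cycle_empty=[False, False],
)
print(overall.name)
```

In this example one of two connections is failing and one of four recent cycles failed, so both the connection signal and the success-rate signal land at the degraded level.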

Stale Orchestrators

An agent’s status depends on regular check-ins from its orchestrator. When an orchestrator goes silent (its process is down, or the network between it and the Control Plane fails), the Control Plane marks it and the agents it runs as Stale.

A stale orchestrator does not automatically mean its agents have failed. The agents can still run normally while the orchestrator is briefly unreachable. The platform preserves the last-known agent status and labels it as potentially outdated until the next successful check-in. When the orchestrator reconnects, the stale indicator clears on its own.
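The stale behavior can be pictured as a timeout on the orchestrator's last check-in. The sketch below is a minimal model under stated assumptions: the 90-second `STALE_AFTER` window and the `DisplayedStatus` record are invented for illustration, and the real check-in interval is not given on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict

STALE_AFTER = timedelta(seconds=90)  # assumed window, not a documented value


@dataclass(frozen=True)
class DisplayedStatus:
    """Last-known status plus a flag saying it may be out of date."""
    last_known: str
    stale: bool


def displayed_statuses(
    last_check_in: datetime,
    agents: Dict[str, str],
    now: datetime,
) -> Dict[str, DisplayedStatus]:
    """Keep each agent's last-known status and flag it when the orchestrator is silent."""
    stale = now - last_check_in > STALE_AFTER
    return {name: DisplayedStatus(status, stale) for name, status in agents.items()}


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
silent_since = now - timedelta(minutes=5)
view = displayed_statuses(silent_since, {"temp-monitor": "Running"}, now)
print(view["temp-monitor"])

# After a successful check-in the flag clears without any other change.
view = displayed_statuses(now, {"temp-monitor": "Running"}, now)
print(view["temp-monitor"])
```

The last-known status is never overwritten by the stale flag, which matches the point that a stale orchestrator does not by itself mean its agents have failed.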

If an orchestrator stays stale, check the orchestrator host and its logs. See Troubleshoot Agent Behavior and Deploy an Orchestrator.