Use these runbooks when an agent workflow is slow, stuck, noisy, or blocked by infrastructure. Start from the workspace, agent, session, turn, and operation IDs visible in the dashboard or API response.
Stuck session
Symptoms:
- A session stays active but no new events arrive.
- A submitted turn is queued longer than expected.
- The dashboard reconnects but shows no state change.
Steps:
- Get the session and its latest turns with `GET /v1/agent_sessions/{id}` and `GET /v1/agent_turns?session=...`.
- Stream events with `GET /v1/agent_events:stream?session=...` to confirm whether the event cursor is advancing.
- Check for pending approvals with `GET /v1/approval_requests?session=...&state=PENDING`.
- If the turn is blocked by quota or runtime policy, surface the admission error to the user instead of retrying blindly.
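The first three steps can be scripted. The sketch below is a minimal illustration, not an Akua client: it only builds the request paths listed above, assuming the `session` query filter takes the bare session ID, and compares the event cursor seen at the start and end of an observation window, treating cursors as opaque strings. `stuck_session_requests` and `cursor_advanced` are hypothetical helper names.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def stuck_session_requests(session_id: str) -> Dict[str, str]:
    """Build the diagnostic request paths for one session (hypothetical helper).

    Assumes the `session` query filter takes the bare session ID.
    """
    query = {"session": session_id}
    return {
        "session": f"/v1/agent_sessions/{quote(session_id, safe='')}",
        "turns": "/v1/agent_turns?" + urlencode(query),
        "events": "/v1/agent_events:stream?" + urlencode(query),
        "approvals": "/v1/approval_requests?"
        + urlencode({**query, "state": "PENDING"}),
    }


def cursor_advanced(before: Optional[str], after: Optional[str]) -> bool:
    """True if the event cursor changed during the observation window.

    Cursors are treated as opaque strings: only equality is compared.
    """
    return after is not None and after != before
```

Authentication and the API base URL are not shown; send these paths with whatever HTTP client you already use.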
Stuck sandbox
Symptoms:
- A turn requires a retained runtime, but the runtime does not become active.
- A retained filesystem exists, but resume does not complete.
- A sandbox stays in `CREATING`, `STARTING`, `STOPPING`, or `DELETING` past its expected deadline.
Steps:
- Inspect the turn’s `runtime_decision` and `resolved_execution_mode`.
- Check the sandbox state and last update time in the internal runtime view or operations dashboard.
- Confirm that the workspace sandbox quota and retained PVC quota are not exhausted.
- If a sandbox is stuck in `DELETING`, run the cleanup workflow rather than removing PVCs manually, unless the retention policy explicitly allows manual removal.
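The "past its expected deadline" symptom can be expressed as a small check over sandbox snapshots. In this sketch only the four state names come from this runbook; `SandboxView`, its fields, and the per-state deadlines in `TRANSITION_DEADLINES` are hypothetical placeholders for whatever the runtime view and your runtime actually expose.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical per-state deadlines; substitute the deadlines your runtime
# actually enforces. Only the state names come from the runbook.
TRANSITION_DEADLINES = {
    "CREATING": timedelta(minutes=10),
    "STARTING": timedelta(minutes=5),
    "STOPPING": timedelta(minutes=5),
    "DELETING": timedelta(minutes=15),
}


@dataclass
class SandboxView:
    """Hypothetical snapshot of one sandbox from the runtime view."""
    sandbox_id: str
    state: str
    last_updated: datetime


def is_stuck(sandbox: SandboxView, now: datetime) -> bool:
    """True if the sandbox sits in a transitional state past its deadline."""
    deadline = TRANSITION_DEADLINES.get(sandbox.state)
    if deadline is None:
        return False  # states without a deadline are not checked here
    return now - sandbox.last_updated > deadline


def stuck_sandboxes(
    sandboxes: List[SandboxView], now: Optional[datetime] = None
) -> List[str]:
    """Return the IDs of all sandboxes that are past their state deadline."""
    now = now or datetime.now(timezone.utc)
    return [s.sandbox_id for s in sandboxes if is_stuck(s, now)]
```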
Provider outage
Symptoms:
- Provider requests fail with repeated upstream errors.
- Turns fail before producing useful assistant output.
- Provider spend or token counters stop updating while requests continue.
Steps:
- Check provider exchange metadata for upstream error codes and response timing.
- Confirm whether the agent uses Akua-managed billing or workspace BYOK.
- For BYOK, verify the referenced secret exists and has an enabled version.
- Switch the model or provider only through agent policy so the change remains auditable.
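As a rough triage aid, the sketch below flags a likely outage when the most recent exchanges all failed upstream, and checks the BYOK condition from the steps above. The field names (`upstream_status`, `versions`, `enabled`), the choice of 429 and 5xx as failure codes, and the five-failure threshold are all assumptions, not the shape of Akua's exchange metadata or secrets API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProviderExchange:
    """Hypothetical subset of provider exchange metadata."""
    upstream_status: int  # HTTP status code returned by the provider


@dataclass
class SecretVersion:
    version: int
    enabled: bool


@dataclass
class Secret:
    """Hypothetical view of a workspace secret referenced for BYOK."""
    name: str
    versions: List[SecretVersion] = field(default_factory=list)


def is_upstream_failure(status: int) -> bool:
    # Assumption: rate limiting (429) and server errors (5xx) count as failures.
    return status == 429 or status >= 500


def looks_like_outage(exchanges: List[ProviderExchange], min_failures: int = 5) -> bool:
    """True if the last `min_failures` exchanges all failed upstream."""
    recent = exchanges[-min_failures:]
    return len(recent) == min_failures and all(
        is_upstream_failure(e.upstream_status) for e in recent
    )


def byok_secret_usable(secret: Optional[Secret]) -> bool:
    """The referenced secret exists and has at least one enabled version."""
    return secret is not None and any(v.enabled for v in secret.versions)
```

Requiring consecutive failures, rather than a failure rate, keeps a single transient error from looking like an outage.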
Runaway ambient trigger
Symptoms:
- Many sessions start from the same signal.
- Ambient trigger usage spikes.
- Users see duplicate investigations for one resource.
Steps:
- Check the agent’s trigger severity threshold and cooldown.
- Search active sessions for the same resource reference before starting new work.
- Disable the trigger or raise the minimum severity while investigating.
- Review the trigger source for repeated identical events.
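The guard rails in the steps above (severity threshold, cooldown, and one active session per resource) can be combined into a single admission check. This is a hypothetical sketch: `Signal`, `TriggerPolicy`, and `should_start_session` are illustrative names, and severity is assumed to be an integer where higher means more severe.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Set


@dataclass
class Signal:
    """Hypothetical ambient signal: the resource it concerns and a severity."""
    resource_ref: str
    severity: int  # assumption: higher is more severe
    received_at: datetime


@dataclass
class TriggerPolicy:
    min_severity: int
    cooldown: timedelta


def should_start_session(
    signal: Signal,
    policy: TriggerPolicy,
    active_resources: Set[str],
    last_started: Dict[str, datetime],
) -> bool:
    """Decide whether an ambient signal should start a new session.

    active_resources: resource refs that already have an active session.
    last_started: resource ref -> when the last session started for it;
    updated in place when a new session is admitted.
    """
    if signal.severity < policy.min_severity:
        return False
    if signal.resource_ref in active_resources:
        return False  # an investigation for this resource is already running
    previous = last_started.get(signal.resource_ref)
    if previous is not None and signal.received_at - previous < policy.cooldown:
        return False  # still inside the cooldown window
    last_started[signal.resource_ref] = signal.received_at
    return True
```

Checking the active-session set before the cooldown means a long-running investigation keeps suppressing duplicates even after the cooldown expires.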
Retained filesystem cleanup
Symptoms:
- Retained storage grows unexpectedly.
- Old sessions still have resumable filesystems.
- PVC cleanup reports failures.
Steps:
- Check whether the session is pinned.
- Confirm the workspace tier and retention policy.
- Prefer the retained cleanup workflow so state transitions, audit events, and quota counters stay consistent.
- Preserve summaries, repository change requests, and git history before deleting expensive runtime state.
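To scope a cleanup, a candidate list can be derived from pin status and the retention window before anything is handed to the retained cleanup workflow. The sketch assumes hypothetical tier names and retention windows in `RETENTION_BY_TIER`; real values come from the workspace tier and retention policy. It only selects candidates and deletes nothing.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Hypothetical tiers and retention windows; use your actual retention policy.
RETENTION_BY_TIER = {
    "free": timedelta(days=1),
    "team": timedelta(days=7),
    "enterprise": timedelta(days=30),
}


@dataclass
class RetainedFilesystem:
    """Hypothetical view of one session's retained filesystem."""
    session_id: str
    pinned: bool
    last_active: datetime


def cleanup_candidates(
    filesystems: List[RetainedFilesystem], tier: str, now: datetime
) -> List[str]:
    """Return session IDs whose retained filesystems are eligible for cleanup.

    Pinned sessions are always skipped. The result is meant as input to the
    retained cleanup workflow, not as a list to delete directly.
    """
    retention = RETENTION_BY_TIER[tier]
    return [
        fs.session_id
        for fs in filesystems
        if not fs.pinned and now - fs.last_active > retention
    ]
```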
Related topics
- Agent limits: understand quotas and usage dimensions.
- Sessions and turns: understand runtime decisions and retained filesystem behavior.