Rate limits

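The platform enforces three different limits. The two server-side limits each return a 429 with a distinct code; the third is enforced inside the agent runtime and ends the task rather than returning a 429.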

The per-caller rate limit applies across all your /v1/* endpoints combined.

Setting | Default | Env var
Requests per minute | 60 | RATE_LIMIT_CALLER_RPM
Burst (consecutive requests over the smooth rate) | 10 | RATE_LIMIT_CALLER_BURST

Implemented as a token bucket per caller key.
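To make the token bucket behaviour concrete, here is a minimal Python sketch. It is not the platform's implementation: the class name, the `allow_request` helper, and the reading of "burst" as the bucket capacity are assumptions made for illustration. Only the two default values come from the table above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TokenBucket:
    rate_per_minute: float = 60.0  # default of RATE_LIMIT_CALLER_RPM
    burst: float = 10.0            # default of RATE_LIMIT_CALLER_BURST (assumed bucket capacity)
    tokens: float = field(init=False, default=0.0)
    updated_at: Optional[float] = field(init=False, default=None)

    def __post_init__(self) -> None:
        self.tokens = self.burst  # start with a full bucket

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then try to spend one token."""
        if self.updated_at is not None:
            elapsed = max(0.0, now - self.updated_at)
            refill = elapsed * self.rate_per_minute / 60.0
            self.tokens = min(self.burst, self.tokens + refill)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a server would answer 429 RATE_LIMITED here


# One bucket per caller key (hypothetical in-memory store).
_buckets: Dict[str, TokenBucket] = {}


def allow_request(caller_key: str, now: float) -> bool:
    return _buckets.setdefault(caller_key, TokenBucket()).allow(now)
```

Under these assumptions a caller can send 10 requests back to back, after which requests are admitted at the smooth rate of one per second.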

Hitting it returns:

{ "code": "RATE_LIMITED", "error": "Too many requests" }

Headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1700000000
Retry-After: 12
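A client can turn these headers into a wait time. The sketch below assumes that Retry-After is a number of seconds and that X-RateLimit-Reset is a Unix timestamp in seconds, which is what the sample values suggest but is not stated explicitly. The function name is hypothetical.

```python
from typing import Mapping


def seconds_until_retry(headers: Mapping[str, str], now: float) -> float:
    """Return how long to wait before retrying, based on the 429 headers."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        # Assumed to be delta-seconds, e.g. "12".
        return max(0.0, float(retry_after))
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        # Assumed to be a Unix timestamp in seconds.
        return max(0.0, float(reset) - now)
    return 0.0  # no hint available; the caller should fall back to backoff
```

Note that the lookup is case-sensitive on a plain dict; most HTTP client libraries expose a case-insensitive header mapping, which avoids that problem.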

The per-profile concurrency limit applies across simultaneously running workspaces for a given profile.

Setting | Default | Env var
Concurrency | 5 | RATE_LIMIT_PROFILE_CONCURRENCY

Hitting it returns 429 with code: "CONCURRENCY_LIMIT". The task is rejected, not queued: resubmit it later or raise the limit (RATE_LIMIT_PROFILE_CONCURRENCY).
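Because both server-side limits use status 429, a client has to branch on the code field. The sketch below assumes the CONCURRENCY_LIMIT body has the same JSON shape as the RATE_LIMITED example; the function name and the returned action labels are hypothetical.

```python
import json


def classify_429(body: str) -> str:
    """Map a 429 response body to a suggested client action."""
    code = json.loads(body).get("code")
    if code == "RATE_LIMITED":
        return "wait_and_retry"   # honour Retry-After, then repeat the request
    if code == "CONCURRENCY_LIMIT":
        return "resubmit_later"   # the task was rejected; submit it again later
    return "unknown"
```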

The third kind of limit is not server-throttled; it is enforced inside the agent runtime.

  • max_turns — max iterations the agent does in a single task. Default 200 (config: MAX_TURNS).
  • max_budget_usd — total spend across model + tool costs. Set per-task or per-profile.

Hitting either ends the task with status=completed and a termination_reason (agent_finished_in_limit for turns, budget_cap for budget).
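Since a limit-terminated task still reports status=completed, callers that care need to inspect termination_reason. The sketch below assumes the task result is available as a dict with status and termination_reason keys; that shape and the function name are assumptions, while the reason strings come from the text above.

```python
from typing import Mapping, Optional

# termination_reason values from the text above, mapped to the limit that fired.
LIMIT_REASONS = {
    "agent_finished_in_limit": "max_turns",
    "budget_cap": "max_budget_usd",
}


def limit_hit(task: Mapping[str, str]) -> Optional[str]:
    """Return the name of the limit that ended the task, or None."""
    if task.get("status") != "completed":
        return None
    return LIMIT_REASONS.get(task.get("termination_reason", ""))
```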

Vonzio routes traffic through your own Anthropic API key (or your local Ollama instance). When Anthropic returns a 429, Vonzio surfaces it as MODEL_ERROR in your task result. The retry handler does not auto-retry rate limit errors — you’d just compound the problem.

If you’re hitting Anthropic’s limits frequently, either upgrade your tier or stagger your playbook schedules.

For polling-style clients:

  1. On 429 with Retry-After, sleep that many seconds and retry.
  2. Otherwise, use exponential backoff starting at 1s, capped at 60s, with jitter.
  3. After 5 consecutive 429s, switch to the WebSocket — long-running task watches are essentially free over WS.
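The three steps above can be sketched as a single decision function. This is one possible reading, not a reference client: the function name and constants are hypothetical, "full jitter" (a uniform draw between 0 and the current ceiling) is just one common jitter scheme, and checking the 5-strikes rule before Retry-After is a choice made here.

```python
import random
from typing import Callable, Optional

BASE_DELAY = 1.0          # backoff starts at 1s
MAX_DELAY = 60.0          # backoff is capped at 60s
MAX_CONSECUTIVE_429 = 5   # after this many, stop polling


def next_delay(
    consecutive_429s: int,
    retry_after: Optional[str],
    rng: Callable[[], float] = random.random,
) -> Optional[float]:
    """Seconds to sleep before the next poll, or None to switch to the WebSocket.

    consecutive_429s counts the 429 just received, so the first one is 1.
    """
    if consecutive_429s >= MAX_CONSECUTIVE_429:
        return None
    if retry_after is not None:
        return float(retry_after)  # assumed to be a number of seconds
    ceiling = min(MAX_DELAY, BASE_DELAY * 2 ** (consecutive_429s - 1))
    return rng() * ceiling  # full jitter: uniform in [0, ceiling)
```

A polling loop would call next_delay after each 429, sleep for the returned value, and reset the counter to zero on any non-429 response.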

For real-time clients (the dashboard), the WebSocket already handles this — no caller-side backoff needed.