Tier-based API quotas and circuit breaker policies
Rate limits protect system stability and ensure fair usage across all agents. Limits are applied per-agent based on tier level.
All limits are enforced in real-time and returned in response headers.
Each tier has distinct rate limits, daily spend caps, and task cost maximums:
| Tier | Requests/Minute | Daily Spend | Max Task Cost |
|---|---|---|---|
| Tier 0 Unverified | 10 | $0 | $0 (validation only) |
| Tier 1 Deposited | 60 | $10 | $1 |
| Tier 2 Established | 300 | $100 | $5 |
| Tier 3 Trusted | 1000 | $1000 | $50 |
Note: Video generation requires Tier 2+ (minimum $100 deposit + 50 completed tasks).
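The tier caps above can be mirrored client-side as a pre-flight check before submitting a task. A minimal sketch; the `TIER_LIMITS` table and `can_submit` helper are illustrative names, not part of the VAP API:

```python
# Tier limits from the table above, expressed as a lookup table.
# Structure and names are illustrative, not part of the VAP SDK.
TIER_LIMITS = {
    0: {"rpm": 10, "daily_spend": 0, "max_task_cost": 0},
    1: {"rpm": 60, "daily_spend": 10, "max_task_cost": 1},
    2: {"rpm": 300, "daily_spend": 100, "max_task_cost": 5},
    3: {"rpm": 1000, "daily_spend": 1000, "max_task_cost": 50},
}

def can_submit(tier: int, task_cost: float, spent_today: float) -> bool:
    """Pre-flight check: would this task pass the tier's cost caps?"""
    limits = TIER_LIMITS[tier]
    return (task_cost <= limits["max_task_cost"]
            and spent_today + task_cost <= limits["daily_spend"])
```

Checking locally avoids burning a request (and a rate-limit slot) on a task the server would reject anyway.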
Every API response includes rate limit information:
```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1704844800
```
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the current window (per minute) |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the rate limit window resets |
When the rate limit is exceeded, you'll receive a 429 response:
```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded",
    "details": {
      "retry_after": 30,
      "limit": 60,
      "window": "1 minute"
    }
  }
}
```
Use exponential backoff starting at retry_after seconds:
```python
import time

import requests

class APIError(Exception):
    """Raised for unexpected non-2xx responses."""

class MaxRetriesExceeded(Exception):
    """Raised when all retry attempts are exhausted."""

def make_request_with_backoff(url, max_retries=3):
    for attempt in range(max_retries):
        response = requests.post(url, ...)
        if response.status_code == 200:
            return response.json()
        if response.status_code == 429:
            error = response.json()["error"]
            retry_after = error["details"].get("retry_after", 1)
            # Exponential backoff: retry_after, 2x, 4x, ...
            wait = retry_after * (2 ** attempt)
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
        else:
            raise APIError(response)
    raise MaxRetriesExceeded()
```
Short bursts above the per-minute limit are permitted to accommodate spiky workloads:
Example: Tier 1 (60 req/min) can burst to 120 req/min for 10 seconds.
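Burst allowances like this are commonly modeled as a token bucket: a sustained refill rate plus a capacity that absorbs short spikes. A client-side sketch; the exact server-side parameters are not documented, so the rate and capacity here are assumptions:

```python
import time

class TokenBucket:
    """Token bucket rate limiter: sustained rate plus burst capacity.

    For Tier 1 (60 req/min sustained, 2x burst for ~10s), roughly
    rate_per_sec=1.0 with capacity around 10-20. These parameters
    are a client-side approximation, not documented server values.
    """
    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity          # start full: burst available
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        # Refill for elapsed time, capped at burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Starting with a full bucket lets the first requests burst immediately; once drained, throughput falls back to the sustained refill rate.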
VAP automatically pauses agents exhibiting anomalous behavior to protect against runaway costs:
| Condition | Threshold | Action |
|---|---|---|
| Spend spike | Hourly spend >3x average | Pause agent |
| High failure rate | >50% (last 10 tasks) | Pause agent |
| Consecutive failures | 5 in a row | Pause agent |
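The same conditions can be tracked client-side to anticipate a pause before it happens. A sketch using the table's thresholds; the `AnomalyMonitor` class is illustrative, and VAP's server-side detection may differ in implementation detail:

```python
from collections import deque

class AnomalyMonitor:
    """Mirrors the pause conditions from the table above."""
    def __init__(self, avg_hourly_spend: float):
        self.avg_hourly_spend = avg_hourly_spend
        self.recent = deque(maxlen=10)  # outcomes of the last 10 tasks

    def record(self, success: bool):
        self.recent.append(success)

    def should_pause(self, hourly_spend: float) -> bool:
        # Spend spike: hourly spend >3x average
        if hourly_spend > 3 * self.avg_hourly_spend:
            return True
        # High failure rate: >50% of the last 10 tasks failed
        if len(self.recent) == 10 and sum(not ok for ok in self.recent) > 5:
            return True
        # Consecutive failures: 5 in a row
        tail = list(self.recent)[-5:]
        if len(tail) == 5 and not any(tail):
            return True
        return False
```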
When the circuit breaker triggers, all requests return a 403 response:
```json
{
  "error": {
    "code": "agent_paused",
    "message": "Agent is paused due to anomaly detection",
    "details": {
      "reason": "Spend spike detected (3x average)",
      "paused_at": "2026-01-09T12:00:00Z",
      "contact": "support@vapagent.com"
    }
  }
}
```
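A client can detect this case by inspecting the error body of a 403. A small helper, assuming the body shape shown above; `explain_pause` is a hypothetical name, not an SDK function:

```python
from typing import Optional

def explain_pause(body: dict) -> Optional[str]:
    """Return a human-readable explanation from a 403 error body,
    or None if the error is not an agent pause."""
    error = body.get("error", {})
    if error.get("code") != "agent_paused":
        return None
    d = error.get("details", {})
    return (f"Paused at {d.get('paused_at')}: {d.get('reason')} "
            f"(contact {d.get('contact')})")
```

Keeping the parsing pure (dict in, string out) makes it easy to unit-test without a live API call.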
To unpause an agent, contact support@vapagent.com (included in the error's contact field).
Separate from rate limits, daily spend caps prevent unexpected bills:
| Tier | Daily Spend Cap | Behavior |
|---|---|---|
| 0 | $0 | No execution allowed |
| 1 | $10 | Requests rejected after limit |
| 2 | $100 | Requests rejected after limit |
| 3 | $1000 | Requests rejected after limit |
When the daily spend limit is reached:
```json
{
  "error": {
    "code": "daily_spend_limit_exceeded",
    "message": "Daily spend limit reached",
    "details": {
      "limit": "10.00",
      "spent_today": "10.02",
      "resets_at": "2026-01-10T00:00:00Z"
    }
  }
}
```
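Since `resets_at` is an ISO-8601 timestamp, a client can compute how long to sleep before resuming at the next window. A minimal sketch; `seconds_until_reset` is an illustrative helper, not an SDK function:

```python
from datetime import datetime, timezone

def seconds_until_reset(resets_at: str, now: datetime = None) -> float:
    """Seconds to wait after daily_spend_limit_exceeded.

    `resets_at` is the ISO-8601 timestamp from the error details;
    the "Z" suffix is normalized for datetime.fromisoformat.
    """
    reset = datetime.fromisoformat(resets_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset - now).total_seconds())
```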
Monitor the rate limit headers on every response so you can slow down before hitting the limit:

```python
import requests

response = requests.post(url, ...)
# Check remaining quota from the response headers
remaining = int(response.headers.get("X-RateLimit-Remaining", 0))
if remaining < 10:
    print(f"Warning: Only {remaining} requests left")
    # Slow down request rate
```
Don't rely solely on server-side limits. Implement your own rate limiting:
```python
import time
from collections import deque

import requests

class RateLimiter:
    """Sliding-window client-side limiter: at most max_per_minute
    requests in any 60-second window."""
    def __init__(self, max_per_minute):
        self.max_per_minute = max_per_minute
        self.requests = deque()

    def wait_if_needed(self):
        now = time.time()
        # Remove requests older than 1 minute
        while self.requests and self.requests[0] < now - 60:
            self.requests.popleft()
        if len(self.requests) >= self.max_per_minute:
            # Sleep until the oldest request ages out of the window
            sleep_time = max(0.0, 60 - (now - self.requests[0]))
            time.sleep(sleep_time)
            self.requests.popleft()
        self.requests.append(time.time())

# Usage
limiter = RateLimiter(60)  # Tier 1 limit
limiter.wait_if_needed()
response = requests.post(...)
```
If you consistently hit rate limits, consider upgrading to a higher tier.
If you need higher limits or have questions, contact support@vapagent.com.