Rate Limits

Tier-based API quotas and circuit breaker policies

Overview

Rate limits protect system stability and ensure fair usage across all agents. Limits are applied per-agent based on tier level.

Limits are enforced in real time, and current rate limit usage is returned in response headers.

Tier-Based Limits

Each tier has distinct rate limits, daily spend caps, and task cost maximums:

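| Tier | Requests/Minute | Daily Spend | Max Task Cost |
|------|-----------------|-------------|---------------|
| Tier 0 Unverified | 10 | $0 | $0 (validation only) |
| Tier 1 Deposited | 60 | $10 | $1 |
| Tier 2 Established | 300 | $100 | $5 |
| Tier 3 Trusted | 1000 | $1000 | $50 |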

Note: Video generation requires Tier 2+ (minimum $100 deposit + 50 completed tasks).
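
The tier table can be mirrored client-side to reject a task before it is sent. The following is a minimal sketch, not part of the API: `TierLimits`, `TIER_LIMITS`, and `can_submit` are hypothetical names, the values are copied from the table, and the daily cap check is a conservative assumption (the server may account for spend differently).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    requests_per_minute: int
    daily_spend_usd: float
    max_task_cost_usd: float


# Values copied from the tier table; the names are illustrative only.
TIER_LIMITS = {
    0: TierLimits(10, 0.0, 0.0),
    1: TierLimits(60, 10.0, 1.0),
    2: TierLimits(300, 100.0, 5.0),
    3: TierLimits(1000, 1000.0, 50.0),
}


def can_submit(tier: int, task_cost_usd: float, spent_today_usd: float) -> bool:
    """Return True if a task fits both the per-task maximum and the daily cap."""
    limits = TIER_LIMITS[tier]
    if task_cost_usd > limits.max_task_cost_usd:
        return False
    # Assumption: refuse any task that would push today's spend over the cap.
    return spent_today_usd + task_cost_usd <= limits.daily_spend_usd
```

For example, `can_submit(1, 0.50, 9.00)` passes both checks for Tier 1, while `can_submit(1, 1.50, 0.0)` fails the $1 task maximum.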

Rate Limit Headers

Every API response includes rate limit information:

HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1704844800

| Header | Description |
|--------|-------------|
| X-RateLimit-Limit | Maximum requests allowed in current window (per minute) |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when the rate limit window resets |
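
The three headers can be read into a small structure so the rest of a client does not have to parse strings. This is a sketch under stated assumptions: `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical helper names, and all three headers are assumed to be present on the response.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitStatus:
    limit: int
    remaining: int
    reset_at: int  # Unix timestamp, as sent in X-RateLimit-Reset

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds left in the current window, never negative."""
        current = time.time() if now is None else now
        return max(0.0, self.reset_at - current)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Build a RateLimitStatus from the three X-RateLimit-* headers."""
    return RateLimitStatus(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset_at=int(headers["X-RateLimit-Reset"]),
    )
```

With the `requests` library, `parse_rate_limit_headers(response.headers)` works directly because its header mapping is case-insensitive; a plain `dict` must use the exact header names shown.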

Handling 429 Responses

When the rate limit is exceeded, the API returns a 429 response:

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded",
    "details": {
      "retry_after": 30,
      "limit": 60,
      "window": "1 minute"
    }
  }
}

Recommended Retry Strategy

Use exponential backoff starting at retry_after seconds:

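import time

import requests


class APIError(Exception):
    """Raised for any response that is neither 200 nor 429."""


class MaxRetriesExceeded(Exception):
    """Raised when every attempt was rate limited."""


def make_request_with_backoff(url, max_retries=3, **kwargs):
    # Extra keyword arguments (json=, headers=, ...) are passed to requests.post
    for attempt in range(max_retries):
        response = requests.post(url, **kwargs)

        if response.status_code == 200:
            return response.json()

        if response.status_code == 429:
            error = response.json()["error"]
            retry_after = error["details"].get("retry_after", 1)

            # Exponential backoff: retry_after, then 2x, then 4x, ...
            wait = retry_after * (2 ** attempt)
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
        else:
            raise APIError(response)

    raise MaxRetriesExceeded()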
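
To see what this strategy costs in wall-clock time, the waits can be computed without making any requests. A small sketch, where `backoff_schedule` is a hypothetical helper that only reproduces the `retry_after * (2 ** attempt)` formula from the function above:

```python
def backoff_schedule(retry_after: float, max_retries: int = 3) -> list:
    """Waits produced by retry_after * 2 ** attempt, one per attempt."""
    return [retry_after * (2 ** attempt) for attempt in range(max_retries)]


waits = backoff_schedule(30)
print(waits)       # [30, 60, 120]
print(sum(waits))  # 210
```

With the `retry_after` of 30 shown in the sample 429 body and the default of three attempts, a client that stays rate limited spends 210 seconds sleeping before giving up, because the function also sleeps after its final failed attempt.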

Burst Limits

Short bursts above the rate limit are allowed to accommodate spiky workloads:

  • Up to 2x the rate limit for 10 seconds
  • After burst window, rate is throttled to normal limit
  • Burst allowance resets every minute

Example: Tier 1 (60 req/min) can burst to 120 req/min for 10 seconds.
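
In absolute terms, the burst rate translates into a fixed number of requests per burst window. The sketch below only does that arithmetic; `burst_window_requests` is a hypothetical name, and it assumes "2x for 10 seconds" means twice the per-minute rate sustained for the whole window. How the server counts burst requests against the rest of the minute is not specified here.

```python
def burst_window_requests(requests_per_minute: int,
                          burst_multiplier: float = 2.0,
                          burst_seconds: float = 10.0) -> int:
    """Requests that fit in one burst window at the boosted rate."""
    boosted_per_second = requests_per_minute * burst_multiplier / 60.0
    return int(boosted_per_second * burst_seconds)


print(burst_window_requests(60))   # 20  (Tier 1: 120 req/min for 10 s)
print(burst_window_requests(300))  # 100 (Tier 2: 600 req/min for 10 s)
```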

Circuit Breaker

VAP automatically pauses agents exhibiting anomalous behavior to protect against runaway costs.

Trigger Conditions

| Condition | Threshold | Action |
|-----------|-----------|--------|
| Spend spike | Hourly spend >3x average | Pause agent |
| High failure rate | >50% (last 10 tasks) | Pause agent |
| Consecutive failures | 5 in a row | Pause agent |
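
An agent can track the same thresholds locally and slow down before the circuit breaker trips. The following is a client-side approximation, not VAP's actual detection logic: `pause_reason` and its return labels are hypothetical, and it assumes the failure-rate check applies only once 10 task results are available.

```python
from typing import Optional, Sequence


def pause_reason(hourly_spend: float,
                 average_hourly_spend: float,
                 recent_results: Sequence[bool]) -> Optional[str]:
    """Return the first matching trigger label, or None.

    recent_results holds task outcomes, oldest first (True = success).
    """
    # Spend spike: hourly spend strictly above 3x the average.
    if average_hourly_spend > 0 and hourly_spend > 3 * average_hourly_spend:
        return "spend_spike"

    # High failure rate: more than 50% failures in the last 10 tasks.
    last_ten = list(recent_results[-10:])
    if len(last_ten) == 10 and last_ten.count(False) / 10 > 0.5:
        return "high_failure_rate"

    # Consecutive failures: the 5 most recent tasks all failed.
    last_five = list(recent_results[-5:])
    if len(last_five) == 5 and not any(last_five):
        return "consecutive_failures"

    return None
```

Calling `pause_reason` after each task gives an early warning; for instance, five successes followed by five failures is exactly 50% and so does not hit the failure-rate rule, but it does hit the consecutive-failures rule.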

Paused Agent Response

When the circuit breaker triggers, all requests return 403:

{
  "error": {
    "code": "agent_paused",
    "message": "Agent is paused due to anomaly detection",
    "details": {
      "reason": "Spend spike detected (3x average)",
      "paused_at": "2026-01-09T12:00:00Z",
      "contact": "support@vapagent.com"
    }
  }
}

Recovery

To unpause an agent:

  • Contact support@vapagent.com with your agent ID
  • Explain the cause of the anomaly
  • Support will review and manually unpause

Daily Spend Limits

Separate from rate limits, daily spend caps prevent unexpected bills:

| Tier | Daily Spend Cap | Behavior |
|------|-----------------|----------|
| 0 | $0 | No execution allowed |
| 1 | $10 | Requests rejected after limit |
| 2 | $100 | Requests rejected after limit |
| 3 | $1000 | Requests rejected after limit |

When the daily spend limit is reached, the API returns:

{
  "error": {
    "code": "daily_spend_limit_exceeded",
    "message": "Daily spend limit reached",
    "details": {
      "limit": "10.00",
      "spent_today": "10.02",
      "resets_at": "2026-01-10T00:00:00Z"
    }
  }
}
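
The three error codes shown on this page call for different reactions: wait briefly, wait until the daily reset, or stop. A minimal sketch of that decision follows; `seconds_to_wait` is a hypothetical helper, and because the HTTP status for `daily_spend_limit_exceeded` is not stated here, that case is matched on the error code alone.

```python
from datetime import datetime, timezone
from typing import Optional


def seconds_to_wait(status_code: int, body: dict,
                    now: Optional[datetime] = None) -> Optional[float]:
    """Return how long to wait before retrying, or None if retrying will not help."""
    error = body.get("error", {})
    code = error.get("code")
    details = error.get("details", {})

    if status_code == 429 and code == "rate_limit_exceeded":
        return float(details.get("retry_after", 1))

    if code == "daily_spend_limit_exceeded":
        # Replace the trailing "Z" so older Python versions can parse the timestamp.
        resets_at = datetime.fromisoformat(details["resets_at"].replace("Z", "+00:00"))
        current = now or datetime.now(timezone.utc)
        return max(0.0, (resets_at - current).total_seconds())

    # agent_paused (403) requires a manual review by support, so do not retry.
    return None
```

Returning `None` for `agent_paused` keeps a retry loop from hammering an endpoint that will keep answering 403 until support unpauses the agent.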

Best Practices

1. Monitor Rate Limit Headers

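import requests

response = requests.post(url, json=payload)  # url and payload are your own request details

# Default to 0 so a missing header is treated as "no requests left"
remaining = int(response.headers.get("X-RateLimit-Remaining", 0))
if remaining < 10:
    print(f"Warning: Only {remaining} requests left")
    # Slow down request rate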
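
One way to act on a low remaining count is to spread the leftover quota evenly across the time until the window resets. The sketch below is one possible policy, not a documented recommendation; `pacing_delay` is a hypothetical helper that reads the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers.

```python
import time
from typing import Mapping, Optional


def pacing_delay(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to pause before the next request so the remaining quota lasts until reset."""
    remaining = int(headers.get("X-RateLimit-Remaining", 0))
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    current = time.time() if now is None else now
    until_reset = max(0.0, reset_at - current)

    if remaining <= 0:
        # Quota exhausted: wait for the window to reset.
        return until_reset
    return until_reset / remaining
```

Calling `time.sleep(pacing_delay(response.headers))` after each response keeps the client just under the limit instead of running into a 429.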

2. Implement Client-Side Rate Limiting

Don't rely solely on server-side limits. Implement your own rate limiting:

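import time
from collections import deque

import requests


class RateLimiter:
    """Sliding-window limiter: at most max_per_minute calls in any 60 seconds."""

    def __init__(self, max_per_minute):
        self.max_per_minute = max_per_minute
        self.requests = deque()  # monotonic timestamps of recent requests

    def _drop_expired(self, now):
        # Remove requests older than 1 minute
        while self.requests and self.requests[0] <= now - 60:
            self.requests.popleft()

    def wait_if_needed(self):
        now = time.monotonic()
        self._drop_expired(now)

        if len(self.requests) >= self.max_per_minute:
            # Sleep until the oldest request leaves the window
            time.sleep(max(0.0, 60 - (now - self.requests[0])))
            self._drop_expired(time.monotonic())

        self.requests.append(time.monotonic())


# Usage
limiter = RateLimiter(60)  # Tier 1 limit
limiter.wait_if_needed()
response = requests.post(url, json=payload)  # url and payload are your own request details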

3. Upgrade Tier for Higher Limits

If you consistently hit rate limits, consider upgrading:

  • Tier 1 → Tier 2: Deposit $100, complete 50 tasks (5x rate limit)
  • Tier 2 → Tier 3: Deposit $1000, complete 500 tasks (3.3x rate limit)

Need Help?

If you need higher limits or have questions: