Understanding rate limits and pagination helps you build robust integrations with the Avala API.

Rate Limits

Default Limits

Scope | Limit | Description
Authenticated requests | 100/min | Per-user rate for all standard API endpoints
Burst | 20/sec | Per-user burst protection to prevent request spikes
Anonymous requests | 30/min | For unauthenticated requests (e.g., health checks)
Inference | 10/min | Per-user rate for AI inference endpoints (/inference/invoke/)
Upload requests | 10 | Concurrent upload connections
Export requests | 5 | Concurrent export jobs
Rate limits are configurable per deployment. The values above are defaults. Always check the X-RateLimit-* response headers for your current limits.

Rate Limit Headers

All responses include rate limit headers so you can track your usage programmatically.
Header | Description
X-RateLimit-Limit | Maximum requests allowed in the current window
X-RateLimit-Remaining | Requests remaining in the current window
X-RateLimit-Reset | Unix timestamp when the window resets
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1705312260
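For example, a client can check the remaining quota before issuing another batch of requests. A minimal sketch (the list endpoint is reused from the pagination example below; the threshold of 5 is illustrative):
import requests

headers = {"X-Avala-Api-Key": "YOUR_API_KEY"}

# Any authenticated response carries the rate limit headers.
response = requests.get(
    "https://api.avala.ai/api/v1/datasets/johndoe/list/",
    headers=headers,
)

remaining = int(response.headers["X-RateLimit-Remaining"])
reset_at = int(response.headers["X-RateLimit-Reset"])

if remaining < 5:
    print(f"Close to the limit; window resets at Unix time {reset_at}")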

Handling Rate Limits

When rate limited, the API returns 429 Too Many Requests with a Retry-After header:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{
  "detail": "Request was throttled. Expected available in 30 seconds."
}
Implement exponential backoff to handle rate limits gracefully:
import time
import requests

def fetch_with_retry(url, headers, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)

        if response.status_code == 429:
            # Honor the server's Retry-After value when present; otherwise back off exponentially.
            retry_after = response.headers.get("Retry-After")
            wait_time = int(retry_after) if retry_after else 2 ** attempt
            print(f"Rate limited. Retrying in {wait_time}s...")
            time.sleep(wait_time)
            continue

        response.raise_for_status()
        return response

    raise Exception("Max retries exceeded")
Always respect Retry-After headers when present. Ignoring rate limits may result in your API key being temporarily suspended.

Pagination

List endpoints use cursor-based pagination. Each response includes a next URL that you follow to retrieve the next page of results.

Query Parameters

Parameter | Type | Description
cursor | string | Pagination cursor from a previous response
limit | integer | Number of results per page (default varies by endpoint)
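For example, to request a specific page size, pass limit as a query parameter on the first request; subsequent pages are retrieved through the next URL, which already encodes the cursor. A minimal sketch (the page size of 50 is illustrative, and the maximum accepted value may vary by endpoint):
import requests

BASE_URL = "https://api.avala.ai/api/v1"
headers = {"X-Avala-Api-Key": "YOUR_API_KEY"}

# First page with an explicit page size; the cursor is only needed when
# following a "next" URL from a previous response.
response = requests.get(
    f"{BASE_URL}/datasets/johndoe/list/",
    headers=headers,
    params={"limit": 50},
)
response.raise_for_status()
page = response.json()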

Response Format

{
  "next": "https://api.avala.ai/api/v1/datasets/johndoe/list/?cursor=cD0yMDI0...",
  "previous": null,
  "results": [...]
}
Field | Type | Description
next | string or null | URL for the next page, or null if this is the last page
previous | string or null | URL for the previous page, or null if this is the first page
results | array | Array of resource objects for the current page

Pagination Example

import requests

BASE_URL = "https://api.avala.ai/api/v1"
headers = {"X-Avala-Api-Key": "YOUR_API_KEY"}

def fetch_all_datasets(owner):
    all_datasets = []
    url = f"{BASE_URL}/datasets/{owner}/list/"

    # Follow the "next" URL until the API returns null (last page).
    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json()

        all_datasets.extend(data["results"])
        url = data.get("next")

    return all_datasets

Best Practices

Respect Rate Limits

  • Check X-RateLimit-Remaining before making bursts of requests
  • Implement exponential backoff with jitter when limits are reached (see the sketch after this list)
  • Spread requests over time for bulk operations instead of sending them all at once
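
A minimal sketch of backoff with jitter; the base delay and cap are illustrative, and this would replace the bare 2 ** attempt fallback in the fetch_with_retry example above:
import random

def backoff_with_jitter(attempt, base=1.0, cap=60.0):
    # "Full jitter": pick a random delay between 0 and an exponentially
    # growing bound, so many clients do not retry in lockstep after a 429.
    return random.uniform(0, min(cap, base * (2 ** attempt)))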

Efficient Pagination

  • Use the next URL directly rather than constructing cursor values manually
  • Process results as you paginate instead of loading everything into memory (see the generator sketch below)
  • Set a reasonable limit parameter to balance between fewer requests and smaller payloads
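
One way to process results as you paginate is a generator that yields items page by page instead of accumulating them; a minimal sketch reusing the list endpoint from the pagination example above:
import requests

BASE_URL = "https://api.avala.ai/api/v1"
headers = {"X-Avala-Api-Key": "YOUR_API_KEY"}

def iter_datasets(owner):
    # Yield one result at a time so callers never hold more than a single page in memory.
    url = f"{BASE_URL}/datasets/{owner}/list/"
    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json()
        yield from data["results"]
        url = data.get("next")

for dataset in iter_datasets("johndoe"):
    ...  # process each dataset as it arrives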

Caching

  • Cache responses for resources that change infrequently (e.g., dataset metadata, project configurations)
  • Use the updated_at timestamp to determine when cached data is stale (see the sketch after this list)
  • Avoid caching paginated list responses since the underlying data may change between requests
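
A minimal sketch of updated_at-based invalidation, assuming the resource payload includes an updated_at field; the in-memory dict and the fetch_full helper are illustrative, not part of the API client:
# Cache full payloads keyed by resource id, refetching only when the
# updated_at seen elsewhere (e.g. in a list response) has changed.
cache = {}

def get_resource(resource_id, current_updated_at, fetch_full):
    cached = cache.get(resource_id)
    if cached is None or cached.get("updated_at") != current_updated_at:
        cache[resource_id] = fetch_full(resource_id)  # hypothetical helper performing the GET
    return cache[resource_id]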

Concurrent Requests

  • Stay within concurrent connection limits for uploads (10) and exports (5)
  • Use a semaphore or connection pool to manage concurrent requests in your application (see the sketch below)
  • Queue requests that exceed concurrency limits rather than dropping them
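
A minimal sketch that bounds concurrency with a thread pool capped at the default upload limit of 10; upload_file and file_paths are illustrative placeholders, not part of the API client. A threading.Semaphore around each request achieves the same bound in non-pooled code:
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_UPLOADS = 10  # default concurrent upload connection limit

def upload_file(path):
    ...  # hypothetical helper that performs a single upload request

file_paths = ["batch-001.zip", "batch-002.zip", "batch-003.zip"]  # illustrative

# A pool capped at the limit both bounds concurrency and queues excess work
# instead of dropping it.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_UPLOADS) as pool:
    results = list(pool.map(upload_file, file_paths))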