Understanding rate limits and pagination helps you build robust integrations with the Avala API.
## Rate Limits

### Default Limits
| Scope | Limit | Description |
|---|---|---|
| Authenticated requests | 100/min | Per-user rate for all standard API endpoints |
| Burst | 20/sec | Per-user burst protection to prevent request spikes |
| Anonymous requests | 30/min | For unauthenticated requests (e.g., health checks) |
| Inference | 10/min | Per-user rate for AI inference endpoints (`/inference/invoke/`) |
| Upload requests | 10 concurrent | Concurrent upload connections |
| Export requests | 5 concurrent | Concurrent export jobs |
Rate limits are configurable per deployment. The values above are defaults. Always check the `X-RateLimit-*` response headers for your current limits.
### Rate Limit Headers

All responses include rate limit headers so you can track your usage programmatically.
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed in the current window |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
```http
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1705312260
```
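As an illustration, a client can read these headers before launching a batch of work. This is a minimal sketch using the `requests` library; the endpoint path and API key are placeholders:

```python
import requests

BASE_URL = "https://api.avala.ai/api/v1"
headers = {"X-Avala-Api-Key": "YOUR_API_KEY"}

response = requests.get(f"{BASE_URL}/datasets/johndoe/list/", headers=headers)

# Header values arrive as strings; convert before comparing.
remaining = int(response.headers.get("X-RateLimit-Remaining", 0))
reset_at = int(response.headers.get("X-RateLimit-Reset", 0))

if remaining < 10:
    print(f"Only {remaining} requests left; window resets at Unix time {reset_at}")
```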
### Handling Rate Limits

When rate limited, the API returns `429 Too Many Requests` with a `Retry-After` header:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{
  "detail": "Request was throttled. Expected available in 30 seconds."
}
```
Implement exponential backoff to handle rate limits gracefully:
```python
import time

import requests

def fetch_with_retry(url, headers, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code == 429:
            # Honor Retry-After when the server provides it; otherwise
            # fall back to exponential backoff (1s, 2s, 4s, ...).
            retry_after = response.headers.get("Retry-After")
            wait_time = int(retry_after) if retry_after else 2 ** attempt
            print(f"Rate limited. Retrying in {wait_time}s...")
            time.sleep(wait_time)
            continue
        response.raise_for_status()
        return response
    raise Exception("Max retries exceeded")
```
Always respect `Retry-After` headers when present. Ignoring rate limits may result in your API key being temporarily suspended.
## Pagination

List endpoints use cursor-based pagination. Each response includes a `next` URL that you follow to retrieve the next page of results.
### Query Parameters
| Parameter | Type | Description |
|---|---|---|
| `cursor` | string | Pagination cursor from a previous response |
| `limit` | integer | Number of results per page (default varies by endpoint) |
### Response Format

```json
{
  "next": "https://api.avala.ai/api/v1/datasets/johndoe/list/?cursor=cD0yMDI0...",
  "previous": null,
  "results": [...]
}
```
| Field | Type | Description |
|---|---|---|
| `next` | string or null | URL for the next page, or `null` if this is the last page |
| `previous` | string or null | URL for the previous page, or `null` if this is the first page |
| `results` | array | Array of resource objects for the current page |
Follow the `next` URL until it is `null` to retrieve every page:

```python
import requests

BASE_URL = "https://api.avala.ai/api/v1"
headers = {"X-Avala-Api-Key": "YOUR_API_KEY"}

def fetch_all_datasets(owner):
    all_datasets = []
    url = f"{BASE_URL}/datasets/{owner}/list/"
    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json()
        all_datasets.extend(data["results"])
        url = data.get("next")  # None on the last page ends the loop
    return all_datasets
```
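If a result set is large, accumulating every page in memory can be wasteful. A generator variant of the same loop (a sketch reusing the `BASE_URL` and `headers` defined above) yields items as each page arrives:

```python
def iter_datasets(owner):
    """Yield dataset objects one at a time, fetching pages lazily."""
    url = f"{BASE_URL}/datasets/{owner}/list/"
    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json()
        yield from data["results"]
        url = data.get("next")

# Fetching stops as soon as the consumer stops iterating.
for dataset in iter_datasets("johndoe"):
    print(dataset)
```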
## Best Practices

### Respect Rate Limits
- Check `X-RateLimit-Remaining` before making bursts of requests
- Implement exponential backoff with jitter when limits are reached (see the sketch after this list)
- Spread requests over time for bulk operations instead of sending them all at once
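The retry example earlier uses a plain exponential delay; adding jitter spreads retries from many clients so they don't synchronize. A minimal "full jitter" sketch, with illustrative delay bounds:

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Pick a random delay between 0 and min(cap, base * 2**attempt) seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

# Drop-in replacement for the fixed wait in fetch_with_retry:
#     time.sleep(backoff_delay(attempt))
```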
### Paginate Efficiently

- Use the `next` URL directly rather than constructing cursor values manually
- Process results as you paginate instead of loading everything into memory
- Set a reasonable `limit` parameter to balance fewer requests against smaller payloads (see the example after this list)
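For instance, the first request can set `limit` explicitly, after which the `next` URL carries the cursor and page size forward. The endpoint and value here are illustrative:

```python
response = requests.get(
    f"{BASE_URL}/datasets/johndoe/list/",
    headers=headers,
    params={"limit": 100},  # later pages come from the "next" URL
)
```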
### Caching
- Cache responses for resources that change infrequently (e.g., dataset metadata, project configurations)
- Use the `updated_at` timestamp to determine when cached data is stale (see the sketch after this list)
- Avoid caching paginated list responses since the underlying data may change between requests
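A minimal in-memory sketch, reusing the `headers` from earlier examples. The TTL is an assumption you'd tune per resource, and the `updated_at` comparison simply avoids replacing a cached object that hasn't actually changed:

```python
import time

_cache = {}  # url -> (fetched_at, resource)
TTL_SECONDS = 300  # assumed refresh interval for slow-changing resources

def get_resource(url):
    """Serve from cache inside the TTL; otherwise refetch and compare."""
    entry = _cache.get(url)
    if entry and time.time() - entry[0] < TTL_SECONDS:
        return entry[1]
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    fresh = response.json()
    # If updated_at is unchanged, the cached copy was still accurate.
    if entry and entry[1].get("updated_at") == fresh.get("updated_at"):
        fresh = entry[1]
    _cache[url] = (time.time(), fresh)
    return fresh
```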
### Concurrent Requests
- Stay within concurrent connection limits for uploads (10) and exports (5)
- Use a semaphore or connection pool to manage concurrent requests in your application (see the sketch after this list)
- Queue requests that exceed concurrency limits rather than dropping them
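As an illustration, a thread pool plus a semaphore caps in-flight uploads at the documented limit while excess tasks wait in the queue rather than being dropped. The upload body is a placeholder, since the upload endpoint is not shown here:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_UPLOADS = 10  # documented concurrent-upload limit

# The semaphore caps in-flight uploads; extra tasks queue instead of dropping.
upload_slots = threading.Semaphore(MAX_UPLOADS)

def upload(path):
    with upload_slots:
        time.sleep(0.1)  # placeholder for the real upload request
        print(f"uploaded {path}")

with ThreadPoolExecutor(max_workers=32) as pool:
    for path in [f"file_{i}.bin" for i in range(50)]:
        pool.submit(upload, path)
```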