Understanding rate limits and pagination helps you build robust integrations with the Avala API.
## Rate Limits

### Default Limits
| Scope | Limit | Description |
|---|---|---|
| Authenticated requests | 100/min | Per-user rate for all standard API endpoints |
| Burst | 20/sec | Per-user burst protection to prevent request spikes |
| Anonymous requests | 30/min | For unauthenticated requests (e.g., health checks) |
| Inference | 10/min | Per-user rate for AI inference endpoints (`/inference/invoke/`) |
| Upload requests | 10 concurrent | Concurrent upload connections |
| Export requests | 5 concurrent | Concurrent export jobs |
Rate limits are configurable per deployment. The values above are defaults. Always check the `X-RateLimit-*` response headers for your current limits.
### Rate Limit Headers

All responses include rate limit headers so you can track your usage programmatically.
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed in the current window |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
```http
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1705312260
```
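As an illustration, a client can read these headers before launching a batch of work. This is a minimal sketch using the `requests` library; the endpoint path and API key are placeholders:

```python
import requests

BASE_URL = "https://api.avala.ai/api/v1"
headers = {"X-Avala-Api-Key": "YOUR_API_KEY"}

response = requests.get(f"{BASE_URL}/datasets/johndoe/list/", headers=headers)

# Header values arrive as strings; convert before comparing.
remaining = int(response.headers.get("X-RateLimit-Remaining", 0))
reset_at = int(response.headers.get("X-RateLimit-Reset", 0))

if remaining < 10:
    print(f"Only {remaining} requests left; window resets at Unix time {reset_at}")
```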
### Handling Rate Limits

When rate limited, the API returns `429 Too Many Requests` with a `Retry-After` header:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/json

{
  "detail": "Request was throttled. Expected available in 30 seconds."
}
```
Implement exponential backoff to handle rate limits gracefully:
```python
import time

import requests

def fetch_with_retry(url, headers, max_retries=5):
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code == 429:
            # Honor Retry-After when the server provides it; otherwise
            # fall back to exponential backoff (1s, 2s, 4s, ...).
            retry_after = response.headers.get("Retry-After")
            wait_time = int(retry_after) if retry_after else 2 ** attempt
            print(f"Rate limited. Retrying in {wait_time}s...")
            time.sleep(wait_time)
            continue
        response.raise_for_status()
        return response
    raise Exception("Max retries exceeded")
```
Always respect `Retry-After` headers when present. Ignoring rate limits may result in your API key being temporarily suspended.
## Pagination

List endpoints use cursor-based pagination. Each response includes a `next` URL that you follow to retrieve the next page of results.
### Query Parameters
| Parameter | Type | Description |
|---|---|---|
| `cursor` | string | Pagination cursor from a previous response |
| `limit` | integer | Number of results per page (default varies by endpoint) |
### Response Format

```json
{
  "next": "https://api.avala.ai/api/v1/datasets/johndoe/list/?cursor=cD0yMDI0...",
  "previous": null,
  "results": [...]
}
```
| Field | Type | Description |
|---|---|---|
| `next` | string or null | URL for the next page, or `null` if this is the last page |
| `previous` | string or null | URL for the previous page, or `null` if this is the first page |
| `results` | array | Array of resource objects for the current page |
Follow the `next` URL until it is `null` to retrieve every page:

```python
import requests

BASE_URL = "https://api.avala.ai/api/v1"
headers = {"X-Avala-Api-Key": "YOUR_API_KEY"}

def fetch_all_datasets(owner):
    all_datasets = []
    url = f"{BASE_URL}/datasets/{owner}/list/"
    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json()
        all_datasets.extend(data["results"])
        url = data.get("next")  # None on the last page ends the loop
    return all_datasets
```
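If a result set is large, accumulating every page in memory can be wasteful. A generator variant of the same loop (a sketch reusing the `BASE_URL` and `headers` defined above) yields items as each page arrives:

```python
def iter_datasets(owner):
    """Yield dataset objects one at a time, fetching pages lazily."""
    url = f"{BASE_URL}/datasets/{owner}/list/"
    while url:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        data = response.json()
        yield from data["results"]
        url = data.get("next")

# Fetching stops as soon as the consumer stops iterating.
for dataset in iter_datasets("johndoe"):
    print(dataset)
```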
## Best Practices

### Respect Rate Limits
- Check `X-RateLimit-Remaining` before making bursts of requests
- Implement exponential backoff with jitter when limits are reached (see the sketch after this list)
- Spread requests over time for bulk operations instead of sending them all at once
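The retry example earlier uses a plain exponential delay; adding jitter spreads retries from many clients so they don't synchronize. A minimal "full jitter" sketch, with illustrative delay bounds:

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Pick a random delay between 0 and min(cap, base * 2**attempt) seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

# Drop-in replacement for the fixed wait in fetch_with_retry:
#     time.sleep(backoff_delay(attempt))
```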
### Paginate Efficiently

- Use the `next` URL directly rather than constructing cursor values manually
- Process results as you paginate instead of loading everything into memory
- Set a reasonable `limit` parameter to balance fewer requests against smaller payloads (see the example after this list)
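For instance, the first request can set `limit` explicitly, after which the `next` URL carries the cursor and page size forward. The endpoint and value here are illustrative:

```python
response = requests.get(
    f"{BASE_URL}/datasets/johndoe/list/",
    headers=headers,
    params={"limit": 100},  # later pages come from the "next" URL
)
```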
### Caching
- Cache responses for resources that change infrequently (e.g., dataset metadata, project configurations)
- Use the `updated_at` timestamp to determine when cached data is stale (see the sketch after this list)
- Avoid caching paginated list responses since the underlying data may change between requests
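A minimal in-memory sketch, reusing the `headers` from earlier examples. The TTL is an assumption you'd tune per resource, and the `updated_at` comparison simply avoids replacing a cached object that hasn't actually changed:

```python
import time

_cache = {}  # url -> (fetched_at, resource)
TTL_SECONDS = 300  # assumed refresh interval for slow-changing resources

def get_resource(url):
    """Serve from cache inside the TTL; otherwise refetch and compare."""
    entry = _cache.get(url)
    if entry and time.time() - entry[0] < TTL_SECONDS:
        return entry[1]
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    fresh = response.json()
    # If updated_at is unchanged, the cached copy was still accurate.
    if entry and entry[1].get("updated_at") == fresh.get("updated_at"):
        fresh = entry[1]
    _cache[url] = (time.time(), fresh)
    return fresh
```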
### Concurrent Requests
- Stay within concurrent connection limits for uploads (10) and exports (5)
- Use a semaphore or connection pool to manage concurrent requests in your application (see the sketch after this list)
- Queue requests that exceed concurrency limits rather than dropping them
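As an illustration, a thread pool plus a semaphore caps in-flight uploads at the documented limit while excess tasks wait in the queue rather than being dropped. The upload body is a placeholder, since the upload endpoint is not shown here:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_UPLOADS = 10  # documented concurrent-upload limit

# The semaphore caps in-flight uploads; extra tasks queue instead of dropping.
upload_slots = threading.Semaphore(MAX_UPLOADS)

def upload(path):
    with upload_slots:
        time.sleep(0.1)  # placeholder for the real upload request
        print(f"uploaded {path}")

with ThreadPoolExecutor(max_workers=32) as pool:
    for path in [f"file_{i}.bin" for i in range(50)]:
        pool.submit(upload, path)
```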