Best practices for getting the most out of the Avala platform — covering data management, API usage patterns, annotation workflow design, and cost optimization.
## Data Management
### Organize Datasets by Purpose
Structure your datasets around how they will be used, not just how the raw data is organized.
| Pattern | When to Use | Example |
|---|---|---|
| By task type | Different annotation workflows | `pedestrian-detection`, `lane-segmentation` |
| By data source | Multiple cameras or collection runs | `front-camera-2026-02`, `lidar-top-2026-02` |
| By model version | Training successive model iterations | `training-v1`, `training-v2`, `validation` |
| By priority | Triage incoming data | `urgent-review`, `standard-queue`, `backlog` |
Use slices to create virtual subsets of a dataset without duplicating data. This is cheaper and more flexible than creating separate datasets.
### Optimize File Sizes
Large files slow down uploads, viewer loading, and annotator productivity.
| Data Type | Recommended Max | Format Tips |
|---|---|---|
| Images | 20 MB | Use JPEG at 85-95% quality for photos; PNG only for diagrams or screenshots |
| Video | 2 GB | H.264 codec; 1080p resolution is sufficient for most annotation tasks |
| Point clouds | 500 MB per frame | Downsample to relevant density; remove ground points if not needed |
| MCAP bags | 5 GB | Split long recordings into shorter segments (2-5 minutes) |
| Gaussian splats | 500 MB | Use compressed PLY format |
### Use Cloud Storage for Large Datasets
For datasets over 10,000 items or 100 GB total, use cloud storage integration instead of direct uploads. Benefits:
- No data transfer: Avala reads directly from your S3 or GCS bucket
- Your encryption: Data stays encrypted with your KMS keys
- Your retention: Control lifecycle policies independently
- Faster onboarding: No upload step — just point Avala to your bucket
## API Usage
### Paginate Large Result Sets
Never fetch all records in a single request. Use cursor-based pagination to iterate through results efficiently.
```python
from avala import Client

client = Client()

# Iterate through all datasets automatically; the SDK handles
# pagination internally via CursorPage
page = client.datasets.list()
for dataset in page:
    print(dataset.name)
```
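Under the hood, cursor pagination works by following an opaque cursor until the server reports no next page. A minimal self-contained sketch with a stubbed `fetch_page` (not the real Avala client) illustrates the pattern:

```python
def fetch_page(cursor=None, page_size=2):
    """Stub standing in for one paginated API call."""
    data = ["ds-a", "ds-b", "ds-c", "ds-d", "ds-e"]
    start = cursor or 0
    items = data[start:start + page_size]
    next_cursor = start + page_size if start + page_size < len(data) else None
    return items, next_cursor

def iterate_all():
    """Follow the cursor until the API reports no next page."""
    cursor = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break

print(list(iterate_all()))  # ['ds-a', 'ds-b', 'ds-c', 'ds-d', 'ds-e']
```

This is the loop the SDK runs for you; you only need it when calling the HTTP API directly.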
### Respect Rate Limits
Avala enforces per-endpoint rate limits. Build retry logic into your integration from the start — don’t wait for production traffic to hit limits.
```python
import time

from avala import Client
from avala.errors import RateLimitError

client = Client()

def fetch_with_retry(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError as e:
            # Honor the server's Retry-After hint if present,
            # otherwise fall back to exponential backoff
            wait = e.retry_after or (2 ** attempt)
            time.sleep(wait)
    raise Exception("Max retries exceeded")

# Usage
dataset = fetch_with_retry(lambda: client.datasets.get("dataset-uid"))
```
See Rate Limits for per-endpoint limits and detailed retry examples.
### Use Exports for Bulk Data Retrieval
Don’t loop through individual items to download annotations. Use the export API to generate a single export file containing all annotations for a dataset or project.
```python
from avala import Client

client = Client()

# Create an export (much faster than fetching items individually)
export = client.exports.create(dataset="your-dataset-uid")
print(f"Export status: {export.status}")
print(f"Download URL: {export.download_url}")
```
## Annotation Workflows
### Design Projects with Clear Instructions
Well-defined annotation guidelines reduce rework and improve consistency.
Effective project setup checklist:
- Label taxonomy: Define all labels before annotating. Adding labels mid-project creates inconsistency.
- Examples: Provide 5-10 annotated examples for each label class, covering edge cases.
- Edge case rules: Document what to do with partially occluded objects, truncated objects at image boundaries, and ambiguous cases.
- Quality bar: Define what “good enough” looks like — perfect pixel-level accuracy is not always necessary.
### Use Multi-Stage Review Pipelines
For production annotation workflows, use a multi-stage review pipeline:
Annotation → Spot Check (10-20%) → Targeted Review → Approved
- Spot check: Randomly review 10-20% of submissions to identify systemic issues
- Targeted review: Focus reviews on annotations flagged by AutoTag or low-confidence predictions
- Full review: Reserve for high-value or safety-critical datasets
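Spot checking can be implemented client-side by pulling a random sample of submitted items; a minimal sketch (the item IDs and the 10% fraction are illustrative):

```python
import random

def spot_check_sample(item_ids, fraction=0.15, seed=None):
    """Pick a random fraction of submissions for review (at least one)."""
    rng = random.Random(seed)
    k = max(1, round(len(item_ids) * fraction))
    return rng.sample(item_ids, k)

item_ids = [f"item-{i}" for i in range(200)]
to_review = spot_check_sample(item_ids, fraction=0.10, seed=42)
print(len(to_review))  # 20
```

Passing a seed makes the sample reproducible, which helps when auditing why a given item was selected for review.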
See Quality Control for detailed setup.
### Leverage Consensus for Validation
For critical datasets, have multiple annotators label the same items independently. Consensus scoring identifies:
- Items where annotators disagree (review these first)
- Annotators who consistently deviate from the group
- Label classes that are ambiguously defined
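One simple consensus score is per-item agreement with the majority label. A sketch (labels and item IDs are made up; this is not the platform's scoring formula):

```python
from collections import Counter

def agreement(labels):
    """Fraction of annotators who agree with the majority label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Three annotators labeled each item independently
item_labels = {
    "item-1": ["car", "car", "car"],    # full agreement
    "item-2": ["car", "truck", "car"],  # 2/3 agree
    "item-3": ["bus", "truck", "car"],  # no consensus: review first
}
scores = {uid: agreement(lbls) for uid, lbls in item_labels.items()}
for uid in sorted(scores, key=scores.get):  # lowest agreement first
    print(uid, round(scores[uid], 2))
```

Sorting by ascending agreement surfaces the most contentious items first, which is exactly the review order suggested above.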
### Batch Work Effectively
Group work into batches of 100-500 items for optimal throughput:
| Batch Size | Pros | Cons |
|---|---|---|
| < 50 items | Quick turnaround | High overhead per item |
| 100-500 items | Good balance of throughput and review cycles | — |
| > 1000 items | Fewest batches to manage | Long wait for review; hard to catch errors early |
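Splitting a backlog into fixed-size batches is a one-liner with slicing; a minimal sketch:

```python
def make_batches(items, batch_size=250):
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

batches = make_batches(list(range(1100)), batch_size=250)
print([len(b) for b in batches])  # [250, 250, 250, 250, 100]
```

The final partial batch is kept rather than dropped, so every item is assigned exactly once.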
See Work Batches for batch management.
### Optimize Upload Throughput
For large dataset uploads, parallelize your upload requests. See Performance Tuning for detailed concurrency recommendations and code examples.
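Parallel uploads can be sketched with a thread pool; here `upload_one` is a stand-in for whatever single-file upload call you use (not a real Avala SDK method), and the worker count is a starting point, not a recommendation:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_one(path):
    """Stand-in for a single-file upload request."""
    return f"uploaded:{path}"

paths = [f"frames/{i:04d}.jpg" for i in range(100)]

# 8 concurrent uploads is a reasonable starting point; tune per the
# Performance Tuning guide and your rate limits
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(upload_one, paths))

print(len(results))  # 100
```

`pool.map` preserves input order, which makes it easy to match results back to source files.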
Similar tuning applies to exports:
- Export by dataset, not by individual items
- Use COCO format for the fastest export generation
- For very large datasets (100K+ items), exports run asynchronously — poll the export status instead of waiting synchronously
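Polling with exponential backoff can be sketched like this; `poll` is a stub standing in for re-fetching the export, and the status values are assumptions:

```python
import time

def wait_for_export(poll, sleep=time.sleep, base=1.0, cap=8.0):
    """Poll until the export is done, doubling the delay each round."""
    delay = base
    while True:
        status = poll()
        if status in ("completed", "failed"):
            return status
        sleep(min(delay, cap))
        delay *= 2

# Stub: pretend the export finishes on the third check
statuses = iter(["pending", "running", "completed"])
result = wait_for_export(poll=lambda: next(statuses), sleep=lambda s: None)
print(result)  # completed
```

Injecting `sleep` as a parameter keeps the backoff logic testable without actually waiting.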
### Monitor with the MCP Server
Use the MCP server to monitor your workflows from your IDE or AI assistant:
- "List my datasets and their item counts"
- "Show recent exports for project X"
- "What's the task completion rate for project Y?"
## Cost Management
### Right-Size Your Data
Not all data needs annotation. Filter before annotating:
- Remove duplicates: Deduplicate images/frames before uploading
- Sample strategically: For video, annotate every Nth frame instead of every frame (common: every 5th or 10th frame)
- Use active learning: Prioritize items where the model is least confident, not random sampling
- Pre-filter with models: Use Batch Auto-Labeling to auto-label easy cases and focus human annotation on hard cases
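Sampling every Nth frame is a stride over the frame list; a minimal sketch:

```python
def sample_frames(frame_ids, every_nth=5):
    """Keep every Nth frame, starting from the first."""
    return frame_ids[::every_nth]

frames = list(range(100))  # frame indices 0..99
kept = sample_frames(frames, every_nth=10)
print(kept)  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

At every 10th frame, a 100-frame clip needs only 10 annotations, a 90% cost reduction before any model-assisted filtering.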
### Minimize API Calls
| Instead of | Do this |
|---|---|
| Fetching items one at a time | Use list endpoints with pagination |
| Polling export status in a tight loop | Use exponential backoff (1s, 2s, 4s, 8s) |
| Re-fetching unchanged data | Cache responses with ETags or timestamps |
| Downloading all annotations | Use export API for bulk retrieval |
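ETag caching amounts to storing the last ETag per URL and sending it back on the next request; a self-contained sketch where `http_get` is a stub server that returns 304 when the ETag matches (not a real HTTP client):

```python
cache = {}  # url -> (etag, body)

def http_get(url, etag=None):
    """Stub server: the body never changes, so a matching ETag yields 304."""
    server_etag, body = "v1", "dataset listing"
    if etag == server_etag:
        return 304, server_etag, None
    return 200, server_etag, body

def cached_get(url):
    etag, body = cache.get(url, (None, None))
    status, new_etag, new_body = http_get(url, etag=etag)
    if status == 304:
        return body  # reuse the cached copy; no payload was transferred
    cache[url] = (new_etag, new_body)
    return new_body

print(cached_get("/datasets"))  # first call: 200, fetched and cached
print(cached_get("/datasets"))  # second call: 304, served from cache
```

With a real client you would send the stored ETag in an `If-None-Match` header and skip re-downloading the body on 304 responses.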
## Security
### Protect Your API Keys
- Store keys in environment variables, never in source code
- Rotate keys periodically (generate new key, update integrations, delete old key)
- Use separate keys for development and production
- Keys are only displayed once at creation — store them securely
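Reading the key from an environment variable keeps it out of source control; a sketch (the variable name `AVALA_API_KEY` is an assumed convention, use whatever your deployment defines):

```python
import os

def load_api_key(env_var="AVALA_API_KEY"):
    """Read the key from the environment, failing fast if it is missing."""
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(f"Set {env_var} before running this script")
    return key

os.environ.setdefault("AVALA_API_KEY", "demo-key")  # for illustration only
print(load_api_key())
```

Failing fast with a clear message beats a confusing authentication error deep inside an integration.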
See Authentication for key management details.
### Use Cloud Storage with Least Privilege
When connecting S3 or GCS buckets, grant only the permissions Avala needs:
- Read-only for datasets: `s3:GetObject`, `s3:ListBucket`
- Read-write for exports: add `s3:PutObject`
- Never grant `s3:DeleteObject` unless absolutely necessary
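A read-only S3 policy following this pattern might look like the sketch below (the bucket name is a placeholder; see Cloud Storage for the official IAM policy templates):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-dataset-bucket",
        "arn:aws:s3:::your-dataset-bucket/*"
      ]
    }
  ]
}
```

Note that `s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to the object ARNs, so both resource forms are needed.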
See Cloud Storage for IAM policy templates.
## Next Steps