Best practices for getting the most out of the Avala platform — covering data management, API usage patterns, annotation workflow design, and cost optimization.
## Data Management
### Organize Datasets by Purpose
Structure your datasets around how they will be used, not just how the raw data is organized.
| Pattern | When to Use | Example |
|---|---|---|
| By task type | Different annotation workflows | `pedestrian-detection`, `lane-segmentation` |
| By data source | Multiple cameras or collection runs | `front-camera-2026-02`, `lidar-top-2026-02` |
| By model version | Training successive model iterations | `training-v1`, `training-v2`, `validation` |
| By priority | Triage incoming data | `urgent-review`, `standard-queue`, `backlog` |
Use slices to create virtual subsets of a dataset without duplicating data. This is cheaper and more flexible than creating separate datasets.
### Optimize File Sizes
Large files slow down uploads, viewer loading, and annotator productivity.
| Data Type | Recommended Max | Format Tips |
|---|---|---|
| Images | 20 MB | Use JPEG at 85-95% quality for photos; PNG only for diagrams or screenshots |
| Video | 2 GB | H.264 codec; 1080p resolution is sufficient for most annotation tasks |
| Point clouds | 500 MB per frame | Downsample to relevant density; remove ground points if not needed |
| MCAP bags | 5 GB | Split long recordings into shorter segments (2-5 minutes) |
| Gaussian splats | 500 MB | Use compressed PLY format |
### Use Cloud Storage for Large Datasets
For datasets over 10,000 items or 100 GB total, use cloud storage integration instead of direct uploads. Benefits:
- No data transfer: Avala reads directly from your S3 or GCS bucket
- Your encryption: Data stays encrypted with your KMS keys
- Your retention: Control lifecycle policies independently
- Faster onboarding: No upload step — just point Avala to your bucket
## API Usage
### Paginate Large Result Sets
Never fetch all records in a single request. Use cursor-based pagination to iterate through results efficiently.
```python
from avala import Client

client = Client()

# Iterate through all datasets automatically; the SDK handles
# pagination internally via CursorPage
page = client.datasets.list()
for dataset in page:
    print(dataset.name)
```
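Under the hood, cursor pagination works by following an opaque cursor until the server reports no next page. A minimal self-contained sketch with a stubbed `fetch_page` (not the real Avala client) illustrates the pattern:

```python
def fetch_page(cursor=None, page_size=2):
    """Stub standing in for one paginated API call."""
    data = ["ds-a", "ds-b", "ds-c", "ds-d", "ds-e"]
    start = cursor or 0
    items = data[start:start + page_size]
    next_cursor = start + page_size if start + page_size < len(data) else None
    return items, next_cursor

def iterate_all():
    """Follow the cursor until the API reports no next page."""
    cursor = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break

print(list(iterate_all()))  # ['ds-a', 'ds-b', 'ds-c', 'ds-d', 'ds-e']
```

This is the loop the SDK runs for you; you only need it when calling the HTTP API directly.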
### Respect Rate Limits
Avala enforces per-endpoint rate limits. Build retry logic into your integration from the start — don’t wait for production traffic to hit limits.
```python
import time

from avala import Client
from avala.errors import RateLimitError

client = Client()

def fetch_with_retry(func, max_retries=3):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError as e:
            # Honor the server's Retry-After hint if present,
            # otherwise fall back to exponential backoff
            wait = e.retry_after or (2 ** attempt)
            time.sleep(wait)
    raise Exception("Max retries exceeded")

# Usage
dataset = fetch_with_retry(lambda: client.datasets.get("dataset-uid"))
```
See Rate Limits for per-endpoint limits and detailed retry examples.
### Use Exports for Bulk Data Retrieval
Don’t loop through individual items to download annotations. Use the export API to generate a single export file containing all annotations for a dataset or project.
```python
from avala import Client

client = Client()

# Create an export (much faster than fetching items individually)
export = client.exports.create(dataset="your-dataset-uid")
print(f"Export status: {export.status}")
print(f"Download URL: {export.download_url}")
```
## Annotation Workflows
### Design Projects with Clear Instructions
Well-defined annotation guidelines reduce rework and improve consistency.
Effective project setup checklist:
- Label taxonomy: Define all labels before annotating. Adding labels mid-project creates inconsistency.
- Examples: Provide 5-10 annotated examples for each label class, covering edge cases.
- Edge case rules: Document what to do with partially occluded objects, truncated objects at image boundaries, and ambiguous cases.
- Quality bar: Define what “good enough” looks like — perfect pixel-level accuracy is not always necessary.
### Use Multi-Stage Review Pipelines
For production annotation workflows, use a multi-stage review pipeline:
Annotation → Spot Check (10-20%) → Targeted Review → Approved
- Spot check: Randomly review 10-20% of submissions to identify systemic issues
- Targeted review: Focus reviews on annotations flagged by AutoTag or low-confidence predictions
- Full review: Reserve for high-value or safety-critical datasets
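Spot checking can be implemented client-side by pulling a random sample of submitted items; a minimal sketch (the item IDs and the 10% fraction are illustrative):

```python
import random

def spot_check_sample(item_ids, fraction=0.15, seed=None):
    """Pick a random fraction of submissions for review (at least one)."""
    rng = random.Random(seed)
    k = max(1, round(len(item_ids) * fraction))
    return rng.sample(item_ids, k)

item_ids = [f"item-{i}" for i in range(200)]
to_review = spot_check_sample(item_ids, fraction=0.10, seed=42)
print(len(to_review))  # 20
```

Passing a seed makes the sample reproducible, which helps when auditing why a given item was selected for review.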
See Quality Control for detailed setup.
### Leverage Consensus for Validation
For critical datasets, have multiple annotators label the same items independently. Consensus scoring identifies:
- Items where annotators disagree (review these first)
- Annotators who consistently deviate from the group
- Label classes that are ambiguously defined
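One simple consensus score is per-item agreement with the majority label. A sketch (labels and item IDs are made up; this is not the platform's scoring formula):

```python
from collections import Counter

def agreement(labels):
    """Fraction of annotators who agree with the majority label."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Three annotators labeled each item independently
item_labels = {
    "item-1": ["car", "car", "car"],    # full agreement
    "item-2": ["car", "truck", "car"],  # 2/3 agree
    "item-3": ["bus", "truck", "car"],  # no consensus: review first
}
scores = {uid: agreement(lbls) for uid, lbls in item_labels.items()}
for uid in sorted(scores, key=scores.get):  # lowest agreement first
    print(uid, round(scores[uid], 2))
```

Sorting by ascending agreement surfaces the most contentious items first, which is exactly the review order suggested above.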
### Batch Work Effectively
Group work into batches of 100-500 items for optimal throughput:
| Batch Size | Pros | Cons |
|---|---|---|
| < 50 items | Quick turnaround | High overhead per item |
| 100-500 items | Good balance of throughput and review cycles | — |
| > 1000 items | Fewest batches to manage | Long wait for review; hard to catch errors early |
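Splitting a backlog into fixed-size batches is a one-liner with slicing; a minimal sketch:

```python
def make_batches(items, batch_size=250):
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

batches = make_batches(list(range(1100)), batch_size=250)
print([len(b) for b in batches])  # [250, 250, 250, 250, 100]
```

The final partial batch is kept rather than dropped, so every item is assigned exactly once.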
See Work Batches for batch management.
### Optimize Upload Throughput
For large dataset uploads, parallelize your upload requests. See Performance Tuning for detailed concurrency recommendations and code examples.
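Parallel uploads can be sketched with a thread pool; here `upload_one` is a stand-in for whatever single-file upload call you use (not a real Avala SDK method), and the worker count is a starting point, not a recommendation:

```python
from concurrent.futures import ThreadPoolExecutor

def upload_one(path):
    """Stand-in for a single-file upload request."""
    return f"uploaded:{path}"

paths = [f"frames/{i:04d}.jpg" for i in range(100)]

# 8 concurrent uploads is a reasonable starting point; tune per the
# Performance Tuning guide and your rate limits
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(upload_one, paths))

print(len(results))  # 100
```

`pool.map` preserves input order, which makes it easy to match results back to source files.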
Similar tuning applies to exports:
- Export by dataset, not by individual items
- Use COCO format for the fastest export generation
- For very large datasets (100K+ items), exports run asynchronously — poll the export status instead of waiting synchronously
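Polling with exponential backoff can be sketched like this; `poll` is a stub standing in for re-fetching the export, and the status values are assumptions:

```python
import time

def wait_for_export(poll, sleep=time.sleep, base=1.0, cap=8.0):
    """Poll until the export is done, doubling the delay each round."""
    delay = base
    while True:
        status = poll()
        if status in ("completed", "failed"):
            return status
        sleep(min(delay, cap))
        delay *= 2

# Stub: pretend the export finishes on the third check
statuses = iter(["pending", "running", "completed"])
result = wait_for_export(poll=lambda: next(statuses), sleep=lambda s: None)
print(result)  # completed
```

Injecting `sleep` as a parameter keeps the backoff logic testable without actually waiting.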
### Monitor with the MCP Server
Use the MCP server to monitor your workflows from your IDE or AI assistant:
- "List my datasets and their item counts"
- "Show recent exports for project X"
- "What's the task completion rate for project Y?"
## Cost Management
### Right-Size Your Data
Not all data needs annotation. Filter before annotating:
- Remove duplicates: Deduplicate images/frames before uploading
- Sample strategically: For video, annotate every Nth frame instead of every frame (common: every 5th or 10th frame)
- Use active learning: Prioritize items where the model is least confident, not random sampling
- Pre-filter with models: Use Batch Auto-Labeling to auto-label easy cases and focus human annotation on hard cases
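Sampling every Nth frame is a stride over the frame list; a minimal sketch:

```python
def sample_frames(frame_ids, every_nth=5):
    """Keep every Nth frame, starting from the first."""
    return frame_ids[::every_nth]

frames = list(range(100))  # frame indices 0..99
kept = sample_frames(frames, every_nth=10)
print(kept)  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
```

At every 10th frame, a 100-frame clip needs only 10 annotations, a 90% cost reduction before any model-assisted filtering.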
### Minimize API Calls
| Instead of | Do this |
|---|---|
| Fetching items one at a time | Use list endpoints with pagination |
| Polling export status in a tight loop | Use exponential backoff (1s, 2s, 4s, 8s) |
| Re-fetching unchanged data | Cache responses with ETags or timestamps |
| Downloading all annotations | Use export API for bulk retrieval |
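ETag caching amounts to storing the last ETag per URL and sending it back on the next request; a self-contained sketch where `http_get` is a stub server that returns 304 when the ETag matches (not a real HTTP client):

```python
cache = {}  # url -> (etag, body)

def http_get(url, etag=None):
    """Stub server: the body never changes, so a matching ETag yields 304."""
    server_etag, body = "v1", "dataset listing"
    if etag == server_etag:
        return 304, server_etag, None
    return 200, server_etag, body

def cached_get(url):
    etag, body = cache.get(url, (None, None))
    status, new_etag, new_body = http_get(url, etag=etag)
    if status == 304:
        return body  # reuse the cached copy; no payload was transferred
    cache[url] = (new_etag, new_body)
    return new_body

print(cached_get("/datasets"))  # first call: 200, fetched and cached
print(cached_get("/datasets"))  # second call: 304, served from cache
```

With a real client you would send the stored ETag in an `If-None-Match` header and skip re-downloading the body on 304 responses.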
## Security
### Protect Your API Keys
- Store keys in environment variables, never in source code
- Rotate keys periodically (generate new key, update integrations, delete old key)
- Use separate keys for development and production
- Keys are only displayed once at creation — store them securely
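Reading the key from an environment variable keeps it out of source control; a sketch (the variable name `AVALA_API_KEY` is an assumed convention, use whatever your deployment defines):

```python
import os

def load_api_key(env_var="AVALA_API_KEY"):
    """Read the key from the environment, failing fast if it is missing."""
    key = os.environ.get(env_var)
    if key is None:
        raise RuntimeError(f"Set {env_var} before running this script")
    return key

os.environ.setdefault("AVALA_API_KEY", "demo-key")  # for illustration only
print(load_api_key())
```

Failing fast with a clear message beats a confusing authentication error deep inside an integration.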
See Authentication for key management details.
### Use Cloud Storage with Least Privilege
When connecting S3 or GCS buckets, grant only the permissions Avala needs:
- Read-only for datasets: `s3:GetObject`, `s3:ListBucket`
- Read-write for exports: add `s3:PutObject`
- Never grant `s3:DeleteObject` unless absolutely necessary
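A read-only S3 policy following this pattern might look like the sketch below (the bucket name is a placeholder; see Cloud Storage for the official IAM policy templates):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-dataset-bucket",
        "arn:aws:s3:::your-dataset-bucket/*"
      ]
    }
  ]
}
```

Note that `s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to the object ARNs, so both resource forms are needed.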
See Cloud Storage for IAM policy templates.
## Next Steps