Avala produces labeled datasets. Your training pipeline consumes them. This page covers how to connect the two — from exporting annotations in the right format to building automated training loops that re-import model predictions for active learning.
Avala supports exporting annotations in standard ML formats. Choose the format that matches your training framework.
| Format | Annotation Types | Frameworks |
|---|---|---|
| Avala JSON | All types | Custom pipelines, Avala SDK |
| COCO | Bounding boxes, polygons, keypoints, segmentation | Detectron2, MMDetection, PyTorch |
| YOLO | Bounding boxes | Ultralytics, YOLOv5/v8 |
| Pascal VOC | Bounding boxes | TensorFlow, older pipelines |
| Segmentation masks | Semantic/instance segmentation | Any framework (PNG masks) |
| KITTI | 3D cuboids, 2D boxes | Autonomous driving pipelines |
Creating an Export
Python SDK
```python
from avala import Client

client = Client()

export = client.exports.create(
    project="prj_abc123",
    format="coco",
    include_approved_only=True,
)

# Check the export status and grab the download URL once it completes
# (see "Scheduling Periodic Exports" below for a full polling loop)
export = client.exports.get(export.uid)
print(f"Status: {export.status}")
print(f"Download: {export.download_url}")
```
Export Filtering
Control exactly which annotations are included in your export:
| Filter | Description |
|---|---|
| `include_approved_only` | Only include annotations that passed QC review |
| `dataset_uids` | Limit the export to specific datasets within the project |
| `slice_uids` | Export only items in specific slices |
| `label_filter` | Include only specific object classes |
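For example, here is a sketch combining several filters. The uid values are placeholders, and the value format for `label_filter` assumed here (a list of class names) may differ; check the Exports API reference:

```python
from avala import Client

client = Client()

# Export only approved annotations for two classes from a single dataset.
# "prj_abc123" and "ds_abc123" are placeholder uids.
export = client.exports.create(
    project="prj_abc123",
    format="coco",
    include_approved_only=True,
    dataset_uids=["ds_abc123"],
    label_filter=["car", "pedestrian"],  # assumed: a list of class names
)
```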
PyTorch Integration
Loading COCO Exports with torchvision
```python
import torch
from torchvision.datasets import CocoDetection
from torchvision import transforms

# Download and extract your Avala COCO export first
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

dataset = CocoDetection(
    root="exports/avala-coco/images",
    annFile="exports/avala-coco/annotations.json",
    transform=transform,
)

dataloader = torch.utils.data.DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=4,
    collate_fn=lambda x: tuple(zip(*x)),  # keep variable-size targets as tuples
)

for images, targets in dataloader:
    # Train your model
    pass
```
Loading with Detectron2
```python
from detectron2.data.datasets import register_coco_instances
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

# Register your Avala export as a Detectron2 dataset
register_coco_instances(
    "avala_train",
    {},
    "exports/avala-coco/annotations.json",
    "exports/avala-coco/images",
)

cfg = get_cfg()
cfg.DATASETS.TRAIN = ("avala_train",)
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 5  # Match your Avala label count

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```
For a complete PyTorch training example, see the PyTorch framework guide.
Hugging Face Integration
Loading with Hugging Face Datasets
```python
from datasets import load_dataset

# Load a COCO-format export as raw JSON
dataset = load_dataset(
    "json",
    data_files="exports/avala-coco/annotations.json",
)

# Or create the export from Avala first, then load it
from avala import Client

client = Client()
export = client.exports.create(
    project="prj_abc123",
    format="coco",
    include_approved_only=True,
)
# Wait for the export to complete, then download and load it
# (see the Hugging Face guide for a full example)
```
For detailed Hugging Face integration, see the Hugging Face framework guide.
Training Loop Automation
End-to-End Pipeline
Combine Avala exports with webhooks to trigger training automatically when new annotations are approved.
Annotators submit work
→ QC review approves annotations
→ Webhook fires: export.completed
→ Your pipeline downloads the export
→ Model trains on new data
→ Model predictions imported back to Avala
→ Annotators review and correct predictions
→ Repeat
Webhook-Triggered Training
```python
# webhook_handler.py
from flask import Flask, request
from avala import Client
import subprocess

app = Flask(__name__)
client = Client()

@app.route("/webhook", methods=["POST"])
def handle_webhook():
    event = request.json
    if event["event_type"] == "export.completed":
        export_uid = event["data"]["export_uid"]
        export = client.exports.get(export_uid)

        # Download the export
        subprocess.run([
            "wget", "-O", "latest_export.zip",
            export.download_url,
        ])

        # Trigger training
        subprocess.run([
            "python", "train.py",
            "--data", "latest_export.zip",
        ])

    return {"status": "ok"}
```
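Run the handler with any WSGI server and expose it at a public URL, then register that URL as your webhook endpoint in Avala so `export.completed` events can reach it.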
Scheduling Periodic Exports
For pipelines that do not need real-time triggers, schedule periodic exports:
```python
# scheduled_export.py
from avala import Client
import time

client = Client()

def export_and_download(project_uid: str, output_path: str) -> str:
    """Create an export, wait for it to complete, and return its download URL."""
    export = client.exports.create(
        project=project_uid,
        format="coco",
        include_approved_only=True,
    )

    # Poll until complete
    while export.status != "completed":
        time.sleep(10)
        export = client.exports.get(export.uid)
        if export.status == "failed":
            raise RuntimeError(f"Export failed: {export.uid}")

    return export.download_url
```
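The function above creates the export and waits for it, but does not itself schedule anything. A minimal sketch of the scheduling side, assuming a long-running process is acceptable (cron or an orchestrator such as Airflow works just as well):

```python
# run_nightly.py -- hypothetical wrapper around export_and_download
import time

from scheduled_export import export_and_download

EXPORT_INTERVAL_SECONDS = 24 * 60 * 60  # once a day

while True:
    url = export_and_download(
        project_uid="prj_abc123",        # placeholder project uid
        output_path="latest_export.zip",
    )
    print(f"New export ready: {url}")
    time.sleep(EXPORT_INTERVAL_SECONDS)
```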
Active Learning Loop
Use model predictions to prioritize which data gets annotated next, creating a feedback loop between your model and your annotation team.
How It Works
1. Train an initial model on a small labeled dataset
2. Run inference on unlabeled data
3. Score uncertainty: identify items where the model is least confident (see the sketch after this list)
4. Import predictions into Avala as pre-annotations
5. Prioritize uncertain items for human annotation using work batches
6. Annotators review and correct the model predictions (faster than labeling from scratch)
7. Export the corrected annotations and retrain
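Step 3 is the one piece your own code has to supply. A minimal sketch of entropy-based uncertainty scoring, assuming your model emits a list of detection confidences per item (the `predictions` structure below is illustrative, not an Avala format):

```python
import math

def item_uncertainty(confidences: list[float]) -> float:
    """Mean binary entropy of one item's detection confidences.

    Values near 1.0 mean the model is unsure, so the item is a good
    candidate for human annotation.
    """
    if not confidences:
        return 1.0  # no detections at all: treat as maximally uncertain
    entropies = [
        -p * math.log2(p) - (1 - p) * math.log2(1 - p) if 0 < p < 1 else 0.0
        for p in confidences
    ]
    return sum(entropies) / len(entropies)

# Rank unlabeled items so the least confident ones get annotated first.
# `predictions` maps item uid -> model confidence scores (illustrative).
predictions = {"itm_001": [0.51, 0.62], "itm_002": [0.98, 0.99]}
ranked = sorted(predictions, key=lambda u: item_uncertainty(predictions[u]), reverse=True)
print(ranked)  # ['itm_001', 'itm_002']: itm_001's scores sit near 0.5
```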
Importing Model Predictions
Use batch auto-labeling to import model predictions as pre-annotations:
```python
from avala import Client

client = Client()

# After running inference, import your predictions as pre-annotations.
# See the Batch Auto-Labeling guide for the payload format.
```
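Pending the format details in that guide, the general shape of the step looks something like this; every field name below is an assumption for illustration, not the actual Avala schema:

```python
# Illustrative only: the real payload schema and import call are defined
# in the Batch Auto-Labeling guide. All field names here are assumptions.
predictions = [
    {
        "item_uid": "itm_001",       # item the prediction belongs to
        "label": "car",              # class name from your project ontology
        "bbox": [34, 120, 56, 40],   # x, y, width, height in pixels (assumed)
        "confidence": 0.87,          # model score, shown to reviewers
    },
]
# Submit `predictions` via the batch auto-labeling endpoint from the guide.
```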
Measuring Improvement
Track these metrics across active learning iterations:
| Metric | Description | Goal |
|---|---|---|
| Model mAP | Mean average precision on a held-out test set | Increasing each iteration |
| Annotation time per item | Average time annotators spend per item | Decreasing (pre-annotations save time) |
| Correction rate | % of pre-annotations that need human correction | Decreasing each iteration |
| Items labeled per iteration | Number of new items added to the training set | Depends on budget |
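A lightweight way to track these across iterations, sketched as plain dicts (swap in your experiment tracker of choice; all numbers shown are placeholders):

```python
# One record per active learning iteration; keys mirror the table above.
history: list[dict] = []

def record_iteration(mAP: float, seconds_per_item: float,
                     corrected: int, pre_annotated: int, new_items: int) -> None:
    history.append({
        "model_mAP": mAP,
        "annotation_time_per_item_s": seconds_per_item,
        "correction_rate": corrected / pre_annotated,  # share needing fixes
        "items_labeled": new_items,
    })

# Placeholder numbers: mAP up, time and correction rate down is the goal.
record_iteration(mAP=0.61, seconds_per_item=42.0, corrected=310, pre_annotated=1000, new_items=1000)
record_iteration(mAP=0.68, seconds_per_item=35.5, corrected=220, pre_annotated=1000, new_items=1000)
```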
Dataset Versioning
Keep track of which data was used to train which model.
Using Slices for Versioning
Slices let you create named subsets of a dataset without duplicating data:
```python
from avala import Client

client = Client()

# Create a slice for training data v1
slice_v1 = client.slices.create(
    name="training-v1",
    dataset_uid="ds_abc123",
)

# Add items to the slice
client.slices.add_items(
    slice_uid=slice_v1.uid,
    item_uids=["itm_001", "itm_002", "itm_003"],
)

# Export only this slice for training
export = client.exports.create(
    project="prj_abc123",
    format="coco",
    slice_uids=[slice_v1.uid],
)
```
Versioning Best Practices
| Practice | Benefit |
|---|---|
| Create a new slice for each training run | Reproducible experiments |
| Include the model version in the slice name | Easy cross-reference |
| Export with `include_approved_only=True` | Only train on reviewed data |
| Keep a held-out test slice | Consistent evaluation across versions |
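For instance, a slice name that carries the model version (the naming scheme itself is just a convention sketch):

```python
from avala import Client

client = Client()

# Bake the model version into the slice name so every export can be
# traced back to the training run that consumed it.
slice_v2 = client.slices.create(
    name="training-v2-yolov8n",  # "yolov8n" is an example model tag
    dataset_uid="ds_abc123",
)
```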
Next Steps
- PyTorch Guide: Complete integration guide for PyTorch and Detectron2.
- Hugging Face Guide: Load Avala exports into Hugging Face Datasets and Transformers.
- Batch Auto-Labeling: Import model predictions as pre-annotations for review.
- Exports API: Full API reference for creating and managing exports.