
Avala provides multiple ways to ingest data depending on your dataset size, infrastructure, and automation needs. This page covers each import method, when to use it, and how to build automated data pipelines.

Import Methods Overview

| Method | Best For | Max Size | Automation | Setup |
|---|---|---|---|---|
| Mission Control upload | Small datasets, one-off imports | 10 GB per user | Manual | None |
| Presigned URL upload | Programmatic uploads from any language | 10 GB per user | Full | API key |
| Cloud storage (S3/GCS) | Large datasets, zero-copy access | Unlimited | Full | Bucket config |
| MCAP import | Multi-sensor robotics data | 10 GB per file | Full | API key |
| SDK bulk upload | Medium datasets with progress tracking | 10 GB per user | Full | SDK installed |

Mission Control Upload

The simplest way to get data into Avala. Drag and drop files directly in the web interface.

Steps

  1. Go to Mission Control > Datasets > Create Dataset
  2. Name your dataset and select the data type
  3. Drag files into the upload area or click Browse
  4. Wait for processing to complete

Limitations

  • Browser-based upload is limited by your connection speed and browser memory
  • Not suitable for datasets with more than 1,000 files
  • No resumable uploads — interrupted uploads must restart

For datasets larger than a few hundred files, use the SDK or presigned URL approach instead.

Presigned URL Upload

Presigned URLs let you upload files directly to Avala’s storage from any HTTP client. This is the most flexible programmatic upload method and works from any language or tool that can make HTTP requests.

How It Works

  1. Request a presigned upload URL from the Avala API
  2. Upload your file directly to the presigned POST URL
  3. Create the dataset from the uploaded files

Example: Upload with cURL

# Step 1: Get a presigned upload URL
curl -X POST https://api.avala.ai/api/v1/datasets/manual-upload/file-upload-url/ \
  -H "X-Avala-Api-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_name": "robot-run-001",
    "file_path_in_dataset": "frame_001.jpg",
    "content_length": 1024000
  }'

# Response:
# { "method": "POST", "url": "https://s3.amazonaws.com/...", "fields": { ... } }

# Step 2: Upload the file with the returned POST fields
curl -X POST "https://s3.amazonaws.com/..." \
  -F "key=..." \
  -F "policy=..." \
  -F "file=@frame_001.jpg"

# Step 3: Create the dataset from uploaded files
curl -X POST https://api.avala.ai/api/v1/datasets/manual-upload/ \
  -H "X-Avala-Api-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "robot-run-001",
    "slug": "robot-run-001",
    "data_type": "image",
    "visibility": "private",
    "industry": 123,
    "license": 456
  }'

Example: Upload with the CLI

# Handles presigned URLs, direct storage upload, and dataset creation.
# Limit: 10 GiB total per local-upload dataset.
avala datasets upload \
  --source data/images \
  --name robot-run-001 \
  --slug robot-run-001 \
  --data-type image \
  --industry 123 \
  --license 456
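The same three-step flow can be scripted in Python. The sketch below assumes the `requests` library is available; the endpoint path, header name, and request fields come from the cURL example above, while the helper names are our own and error handling is minimal.

```python
# upload_presigned.py - the three-step cURL flow above, sketched in Python.
API_BASE = "https://api.avala.ai/api/v1"

def build_url_request(dataset_name: str, path_in_dataset: str, size: int) -> dict:
    """Body for the presigned-URL request (fields from the cURL example)."""
    return {
        "dataset_name": dataset_name,
        "file_path_in_dataset": path_in_dataset,
        "content_length": size,
    }

def upload_file(api_key: str, dataset_name: str,
                local_path: str, path_in_dataset: str) -> None:
    import requests  # imported here so the payload helper stays dependency-free

    with open(local_path, "rb") as f:
        data = f.read()

    # Step 1: request a presigned POST URL
    resp = requests.post(
        f"{API_BASE}/datasets/manual-upload/file-upload-url/",
        headers={"X-Avala-Api-Key": api_key},
        json=build_url_request(dataset_name, path_in_dataset, len(data)),
    )
    resp.raise_for_status()
    presigned = resp.json()

    # Step 2: POST the file to storage with the returned form fields
    upload = requests.post(
        presigned["url"],
        data=presigned["fields"],
        files={"file": (path_in_dataset, data)},
    )
    upload.raise_for_status()
```

Step 3, creating the dataset, is the same JSON POST shown at the end of the cURL example and is omitted here.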

Cloud Storage Integration

For large-scale datasets, connect your own S3 or GCS bucket so Avala reads data directly from your storage — no file transfers, no copies.

When to Use Cloud Storage

| Scenario | Use Cloud Storage? |
|---|---|
| Dataset > 10,000 items | Yes |
| Dataset > 100 GB total | Yes |
| Data must stay in your infrastructure | Yes |
| Quick prototype with < 100 items | No — direct upload is faster |
| Data is spread across multiple buckets | Yes — connect multiple storage configs |

Setup

  1. Configure your bucket with the appropriate IAM policy (see Cloud Storage guide)
  2. Add the storage configuration in Mission Control > Settings > Storage
  3. Create a dataset and select your connected storage as the data source
  4. Reference items by their storage paths

Example: Create Dataset from S3

from avala import Client

client = Client()

# Create a dataset backed by cloud storage
dataset = client.datasets.create(
    name="driving-data-2026-02",
    data_type="image",
    storage_config_uid="stg_your_config_uid"
)

# Register items by their S3 paths
items = [
    {"path": "s3://your-bucket/captures/frame_001.jpg"},
    {"path": "s3://your-bucket/captures/frame_002.jpg"},
    {"path": "s3://your-bucket/captures/frame_003.jpg"},
]

for item in items:
    client.datasets.create_item(
        dataset_uid=dataset.uid,
        source_url=item["path"]
    )

Cloud storage datasets load faster in the annotation editor because images are served directly from your bucket’s region, avoiding cross-region transfers.

MCAP Import

MCAP files contain synchronized multi-sensor data (cameras, LiDAR, IMU). Avala parses MCAP files to extract and align sensor streams for annotation.

Supported Message Types

| Message Type | Description |
|---|---|
| sensor_msgs/Image | Camera images |
| sensor_msgs/CompressedImage | Compressed camera images |
| sensor_msgs/PointCloud2 | LiDAR point clouds |
| sensor_msgs/Imu | IMU readings |
| geometry_msgs/TransformStamped | Sensor transforms (TF) |
| sensor_msgs/NavSatFix | GPS coordinates |

Import Workflow

  1. Upload MCAP files via the SDK or presigned URLs
  2. Avala processes the file, extracting camera frames and point cloud scans
  3. Sensor streams are synchronized by timestamp
  4. Camera images and projected LiDAR data appear together in the annotation editor

For detailed MCAP setup, see the MCAP / ROS integration guide.
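Before uploading a large recording, it can help to check which of its channels Avala will actually extract. The helper below is illustrative, not part of the Avala SDK; it partitions a `{topic: message_type}` mapping against the supported-message-type table above.

```python
# check_mcap_topics.py - pre-flight check (illustrative, not an Avala API):
# split a recording's channels into types Avala extracts vs. ones it skips.

# Message types from the "Supported Message Types" table
SUPPORTED_TYPES = {
    "sensor_msgs/Image",
    "sensor_msgs/CompressedImage",
    "sensor_msgs/PointCloud2",
    "sensor_msgs/Imu",
    "geometry_msgs/TransformStamped",
    "sensor_msgs/NavSatFix",
}

def partition_channels(channels: dict) -> tuple:
    """Split {topic: message_type} into (supported, unsupported) dicts."""
    supported = {t: m for t, m in channels.items() if m in SUPPORTED_TYPES}
    unsupported = {t: m for t, m in channels.items() if m not in SUPPORTED_TYPES}
    return supported, unsupported
```

For example, a recording with `/camera/front` (`sensor_msgs/CompressedImage`) and `/diagnostics` (`diagnostic_msgs/DiagnosticArray`) would have the camera stream extracted and the diagnostics channel skipped.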

Building Import Pipelines

For production workflows, automate data ingestion so new data flows into Avala as it is collected.

Pipeline Architecture

Data Source                  Avala
┌──────────────┐            ┌──────────────────┐
│ Collection   │            │ Dataset          │
│ System       │──upload──→ │ (items created)  │
│ (cameras,    │            │                  │
│  sensors)    │            │ Project          │
└──────────────┘            │ (tasks assigned) │
                            └────────┬─────────┘
                                     │
                                     │ webhook
                                     ▼
                            ┌──────────────────┐
                            │ Your Pipeline    │
                            │ (export, train)  │
                            └──────────────────┘

Example: Automated Ingestion with Webhooks

Combine the CLI upload with webhooks to build a fully automated pipeline:

# upload_pipeline.py
import os
import subprocess
from datetime import datetime, timezone

INDUSTRY_ID = os.environ["AVALA_INDUSTRY_ID"]
LICENSE_ID = os.environ["AVALA_LICENSE_ID"]

def ingest_batch(data_directory: str) -> str:
    """Upload a directory snapshot and create a new Avala dataset."""
    batch_name = f"camera-batch-{datetime.now(timezone.utc):%Y%m%d-%H%M%S}"

    subprocess.run(
        [
            "avala", "datasets", "upload",
            "--source", data_directory,
            "--name", batch_name,
            "--slug", batch_name,
            "--data-type", "image",
            "--industry", INDUSTRY_ID,
            "--license", LICENSE_ID,
        ],
        check=True,
    )
    return batch_name

if __name__ == "__main__":
    dataset_name = ingest_batch("/data/incoming")
    print(f"Created dataset {dataset_name}")

Schedule this script with cron, Airflow, or any task scheduler to periodically ingest new data.
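As one scheduling option, a crontab entry can run the script hourly. The paths and log location below are placeholders for your own environment; the ID values mirror the examples used throughout this page.

```shell
# crontab entry (illustrative paths): run the ingestion script at the top of every hour
0 * * * * AVALA_INDUSTRY_ID=123 AVALA_LICENSE_ID=456 /usr/bin/python3 /opt/pipelines/upload_pipeline.py >> /var/log/avala_ingest.log 2>&1
```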

Example: Watch Directory and Upload

#!/bin/bash
# watch_and_upload.sh - Upload new files as they appear

WATCH_DIR="/data/incoming"
INDUSTRY_ID="123"
LICENSE_ID="456"

inotifywait -m -e create "$WATCH_DIR" --format '%f' | while read -r filename; do
    if [[ "$filename" == *.jpg || "$filename" == *.png ]]; then
        dataset_name="camera-file-$(date +%Y%m%d-%H%M%S)"
        avala datasets upload \
          --source "$WATCH_DIR/$filename" \
          --name "$dataset_name" \
          --slug "$dataset_name" \
          --data-type image \
          --industry "$INDUSTRY_ID" \
          --license "$LICENSE_ID"
        echo "Created dataset from: $filename"
    fi
done
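The "webhook" step in the pipeline diagram needs a receiver on your side. The sketch below is a minimal stdlib-only listener; the event name and payload fields (`event`, `dataset_uid`) are placeholders, so consult the Webhooks guide for the actual schema before relying on them.

```python
# webhook_listener.py - minimal receiver for the webhook step in the
# pipeline diagram, using only the Python standard library.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_event(payload: dict) -> str:
    """Route an event to a pipeline action; returns the action taken."""
    event = payload.get("event")
    if event == "dataset.completed":  # hypothetical event name
        return f"export {payload.get('dataset_uid')}"
    return "ignored"

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body and dispatch it to the handler above
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        action = handle_event(payload)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(action.encode())

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), WebhookHandler).serve_forever()
```

Keeping the routing logic in a plain function (`handle_event`) separate from the HTTP plumbing makes it easy to unit-test without starting a server.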

Choosing an Import Method

Use this decision tree to select the right approach:

| Question | If Yes | If No |
|---|---|---|
| Fewer than 100 files? | Mission Control upload | Continue |
| Data already in S3/GCS? | Cloud storage integration | Continue |
| MCAP or ROS bag files? | MCAP import | Continue |
| Need automation? | SDK bulk upload or presigned URLs | Mission Control upload |
| Using Python or TypeScript? | SDK bulk upload | Presigned URL (any language) |

Next Steps

Cloud Storage

Detailed S3 and GCS configuration for bring-your-own-storage.

MCAP / ROS

Import multi-sensor recordings with camera, LiDAR, and IMU data.

Python SDK

Install the Python SDK and start uploading data programmatically.

Webhooks

Set up event notifications to trigger downstream pipelines.