Import Methods Overview
| Method | Best For | Max Size | Automation | Setup |
|---|---|---|---|---|
| Mission Control upload | Small datasets, one-off imports | 5 GB | Manual | None |
| Presigned URL upload | Programmatic uploads from any language | 5 GB per file | Full | API key |
| Cloud storage (S3/GCS) | Large datasets, zero-copy access | Unlimited | Full | Bucket config |
| MCAP import | Multi-sensor robotics data | 10 GB per file | Full | API key |
| SDK bulk upload | Medium datasets with progress tracking | 5 GB per file | Full | SDK installed |
Mission Control Upload
The simplest way to get data into Avala. Drag and drop files directly in the web interface.

Steps
- Go to Mission Control > Datasets > Create Dataset
- Name your dataset and select the data type
- Drag files into the upload area or click Browse
- Wait for processing to complete
Limitations
- Browser-based upload is limited by your connection speed and browser memory
- Not suitable for datasets with more than 1,000 files
- No resumable uploads — interrupted uploads must restart
Presigned URL Upload
Presigned URLs let you upload files directly to Avala’s storage from any HTTP client. This is the most flexible programmatic upload method and works from any language or tool that can make HTTP requests.

How It Works
- Request a presigned upload URL from the Avala API
- Upload your file directly to the presigned URL using an HTTP PUT request
- Confirm the upload to register the item in the dataset
Example: Upload with cURL
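The three steps above can be sketched with cURL. The endpoint paths (`/uploads/presign`, `/uploads/confirm`) and JSON field names below are assumptions for illustration, not Avala's documented API; adapt them to the API reference before use.

```shell
#!/bin/sh
# Hypothetical presigned-URL upload flow. Endpoints and field names
# are assumptions -- check the Avala API reference for the real ones.
AVALA_API="${AVALA_API:-https://api.avala.ai/v1}"

# Pure helper: JSON body shared by the presign and confirm calls.
confirm_payload() {
  printf '{"dataset_id":"%s","file_name":"%s"}' "$1" "$2"
}

upload_one() {
  file="$1"; dataset="$2"
  # 1. Request a presigned upload URL from the API.
  url=$(curl -sf -X POST "$AVALA_API/uploads/presign" \
    -H "Authorization: Bearer $AVALA_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(confirm_payload "$dataset" "$(basename "$file")")" |
    sed -n 's/.*"upload_url":"\([^"]*\)".*/\1/p')
  # 2. PUT the file bytes directly to the presigned URL.
  curl -sf -X PUT --upload-file "$file" "$url"
  # 3. Confirm the upload to register the item in the dataset.
  curl -sf -X POST "$AVALA_API/uploads/confirm" \
    -H "Authorization: Bearer $AVALA_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(confirm_payload "$dataset" "$(basename "$file")")"
}

# Only hit the API when a key is actually configured.
if [ -n "${AVALA_API_KEY:-}" ]; then
  upload_one ./frame_0001.png my-dataset
fi
```

Because the PUT in step 2 goes straight to storage, the file bytes never pass through the Avala API servers.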
Example: Upload with Python SDK
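The SDK's exact call signatures are not reproduced here; this Python sketch performs the same three HTTP steps directly, which the SDK presumably wraps in a single call. The endpoint paths and field names are assumptions.

```python
"""Illustrative Python version of the presigned-URL flow.

The endpoint paths ("/uploads/presign", "/uploads/confirm") and JSON
field names are assumptions -- consult the Avala SDK reference for the
real client API.
"""
import json
import os
import urllib.request

AVALA_API = os.environ.get("AVALA_API", "https://api.avala.ai/v1")


def confirm_payload(dataset_id: str, file_name: str) -> bytes:
    """JSON body shared by the presign and confirm calls (pure helper)."""
    return json.dumps({"dataset_id": dataset_id, "file_name": file_name}).encode()


def _post(path: str, body: bytes, api_key: str) -> dict:
    req = urllib.request.Request(
        AVALA_API + path,
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def upload(file_path: str, dataset_id: str, api_key: str) -> None:
    name = os.path.basename(file_path)
    # 1. Request a presigned upload URL.
    presigned = _post("/uploads/presign", confirm_payload(dataset_id, name), api_key)
    # 2. PUT the file bytes to the presigned URL.
    with open(file_path, "rb") as f:
        put = urllib.request.Request(
            presigned["upload_url"], data=f.read(), method="PUT")
        urllib.request.urlopen(put)
    # 3. Confirm the upload to register the item in the dataset.
    _post("/uploads/confirm", confirm_payload(dataset_id, name), api_key)


if __name__ == "__main__" and os.environ.get("AVALA_API_KEY"):
    upload("./frame_0001.png", "my-dataset", os.environ["AVALA_API_KEY"])
```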
Cloud Storage Integration
For large-scale datasets, connect your own S3 or GCS bucket so Avala reads data directly from your storage — no file transfers, no copies.

When to Use Cloud Storage
| Scenario | Use Cloud Storage? |
|---|---|
| Dataset > 10,000 items | Yes |
| Dataset > 100 GB total | Yes |
| Data must stay in your infrastructure | Yes |
| Quick prototype with < 100 items | No — direct upload is faster |
| Data is spread across multiple buckets | Yes — connect multiple storage configs |
Setup
- Configure your bucket with the appropriate IAM policy (see Cloud Storage guide)
- Add the storage configuration in Mission Control > Settings > Storage
- Create a dataset and select your connected storage as the data source
- Reference items by their storage paths
Example: Create Dataset from S3
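A sketch of step 3 as an API call, assuming a `/datasets` endpoint and a payload with `storage_config_id` and `prefix` fields. These names are illustrative; match them to the Cloud Storage guide.

```python
"""Hypothetical sketch of creating a dataset backed by a connected S3
bucket. The endpoint and field names ("storage_config_id", "prefix")
are assumptions, not Avala's documented API."""
import json
import os
import urllib.request


def dataset_request(name: str, storage_config_id: str, prefix: str) -> dict:
    """Payload that references items by storage path (pure helper)."""
    return {
        "name": name,
        "data_source": {
            "type": "cloud_storage",
            "storage_config_id": storage_config_id,
            # Every object under s3://<bucket>/<prefix> becomes an item.
            "prefix": prefix,
        },
    }


def create_dataset(api_key: str, payload: dict) -> dict:
    req = urllib.request.Request(
        "https://api.avala.ai/v1/datasets",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if os.environ.get("AVALA_API_KEY"):
    create_dataset(
        os.environ["AVALA_API_KEY"],
        dataset_request("drives-2024", "my-s3-config", "drives/2024/"),
    )
```

Because the dataset only stores paths, no bytes are copied: Avala reads each object from your bucket on demand.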
MCAP Import
MCAP files contain synchronized multi-sensor data (cameras, LiDAR, IMU). Avala parses MCAP files to extract and align sensor streams for annotation.

Supported Message Types
| Message Type | Description |
|---|---|
| sensor_msgs/Image | Camera images |
| sensor_msgs/CompressedImage | Compressed camera images |
| sensor_msgs/PointCloud2 | LiDAR point clouds |
| sensor_msgs/Imu | IMU readings |
| geometry_msgs/TransformStamped | Sensor transforms (TF) |
| sensor_msgs/NavSatFix | GPS coordinates |
Import Workflow
- Upload MCAP files via the SDK or presigned URLs
- Avala processes the file, extracting camera frames and point cloud scans
- Sensor streams are synchronized by timestamp
- Camera images and projected LiDAR data appear together in the annotation editor
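The timestamp synchronization in step 3 can be pictured as nearest-neighbor matching between streams. Avala's actual alignment logic is not public; this toy matcher only illustrates the idea.

```python
"""Toy sketch of timestamp-based stream alignment (step 3 above).
This nearest-neighbor matcher is an illustration, not Avala's
actual synchronization algorithm."""
from bisect import bisect_left


def align(camera_stamps, lidar_stamps, tolerance_ns=50_000_000):
    """Pair each camera timestamp with the nearest LiDAR timestamp
    within `tolerance_ns`; frames with no close scan are dropped.
    Both input lists must be sorted ascending (nanoseconds)."""
    pairs = []
    for cam in camera_stamps:
        i = bisect_left(lidar_stamps, cam)
        # Only the neighbors on either side of the insertion point
        # can be the nearest timestamp.
        candidates = lidar_stamps[max(i - 1, 0):i + 1]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda t: abs(t - cam))
        if abs(nearest - cam) <= tolerance_ns:
            pairs.append((cam, nearest))
    return pairs
```

With a 50 ms default tolerance, a 10 Hz LiDAR and a 30 Hz camera still pair every scan with a frame, while dropouts in either stream simply produce unmatched (and skipped) timestamps.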
Building Import Pipelines
For production workflows, automate data ingestion so new data flows into Avala as it is collected.

Pipeline Architecture
Example: Automated Ingestion with Webhooks
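A minimal receiver for Avala webhook events might look like the sketch below. The event names (`dataset.processed`, `export.completed`) and payload shape are assumptions; match them to the event catalog in the Webhooks guide.

```python
"""Minimal webhook receiver sketch. Event names and payload fields
are assumptions -- see the Webhooks guide for the real event catalog."""
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def route_event(event: dict) -> str:
    """Decide the downstream action for an incoming event (pure helper)."""
    kind = event.get("type", "")
    if kind == "dataset.processed":
        return "start-annotation"
    if kind == "export.completed":
        return "download-export"
    return "ignore"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        action = route_event(json.loads(body or b"{}"))
        self.log_message("event -> %s", action)
        # Acknowledge quickly; hand real work to a queue or worker.
        self.send_response(204)
        self.end_headers()


if os.environ.get("RUN_WEBHOOK_SERVER"):
    HTTPServer(("", 8080), WebhookHandler).serve_forever()
```

Returning 204 immediately and deferring heavy work keeps the webhook endpoint responsive, so delivery retries are not triggered by slow processing.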
Combine the SDK upload with webhooks to build a fully automated pipeline.

Example: Watch Directory and Upload
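A sketch of the directory watcher: it polls a folder and uploads any file it has not seen before. The `upload_file` body is a placeholder; plug in whichever upload method you chose above (SDK or presigned URL).

```python
"""Directory-watcher sketch: polls a folder and uploads new files.
`upload_file` is a placeholder for your chosen upload method."""
import time
from pathlib import Path


def new_files(seen, current):
    """Return paths in `current` not yet in `seen` (pure helper)."""
    return sorted(p for p in current if p not in seen)


def upload_file(path):
    # Placeholder: swap in the SDK or presigned-URL upload here.
    print(f"uploading {path}")


def watch(directory, interval_s=5.0, max_polls=None):
    """Poll `directory` forever (or for `max_polls` iterations)."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        current = [str(p) for p in Path(directory).glob("*") if p.is_file()]
        for path in new_files(seen, current):
            upload_file(Path(path))
            seen.add(path)
        polls += 1
        time.sleep(interval_s)


# Example: watch("/data/incoming", interval_s=10)
```

Polling is the simplest portable approach; on a single platform, an inotify- or FSEvents-based watcher would pick up files with less latency.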
Choosing an Import Method
Use this decision tree to select the right approach:

| Question | If Yes | If No |
|---|---|---|
| Fewer than 100 files? | Mission Control upload | Continue |
| Data already in S3/GCS? | Cloud storage integration | Continue |
| MCAP or ROS bag files? | MCAP import | Continue |
| Need automation? | SDK bulk upload or presigned URLs | Mission Control upload |
| Using Python or TypeScript? | SDK bulk upload | Presigned URL (any language) |
Next Steps
Cloud Storage
Detailed S3 and GCS configuration for bring-your-own-storage.
MCAP / ROS
Import multi-sensor recordings with camera, LiDAR, and IMU data.
Python SDK
Install the Python SDK and start uploading data programmatically.
Webhooks
Set up event notifications to trigger downstream pipelines.