What is Avala?
A robot did something wrong yesterday, and the labels that would explain it don't exist yet. This is the data gap Avala was built to close.
Here is a situation every robotics team knows. A robot did something wrong yesterday. The evidence arrives as a folder: video from three cameras, a LiDAR sweep, joint states in a custom schema, a calibration file, GPS — none of it time-aligned, none of it labeled. Someone spends the afternoon writing a script to line the streams up. By the time the plots render, it's tomorrow.
Most tools stop there. They help you look at the folder. But looking at raw data is the easy ten percent. The hard ninety percent is turning it into something a model can learn from: consistent, verified, four-dimensional ground truth. That's the part that decides whether your next training run actually improves the robot — and it's the part that doesn't exist until someone creates it.
Avala is the data engine that creates it. It is Physical AI Infrastructure-as-a-Service: a single platform that ingests raw fleet data, reconstructs it in 4D, auto-labels it with models trained on your own data, has human experts verify the hard cases, versions the result, and feeds production-ready datasets straight into training — then watches what the deployed model gets wrong and routes those failures back to the start. One loop, one API.
Why "ground truth" is the whole game
A vision-language-action model is only as good as the labels it trains on. Internet text was free and roughly uniform; physical-interaction data is neither. It has to be collected one interaction at a time, fused across sensors that each run at different rates, and labeled in three spatial dimensions across time. Get the labels wrong — a cuboid that drifts a few centimeters across frames, a segmentation mask that flickers — and the model learns the error.
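To make "fused across sensors that each run at different rates" concrete, here is a minimal sketch of nearest-timestamp matching between two streams. It is not how Avala does it internally; the function names, the sample timestamps, and the `max_gap` tolerance are all hypothetical, chosen only to illustrate the idea.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Return the index of the value in sorted `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align(reference, other, max_gap):
    """Pair each reference timestamp with the nearest `other` timestamp.

    Returns (ref_t, other_t) pairs; other_t is None when no sample lies
    within `max_gap` seconds of the reference timestamp.
    """
    if not other:
        return [(t, None) for t in reference]
    pairs = []
    for t in reference:
        j = nearest_index(other, t)
        match = other[j] if abs(other[j] - t) <= max_gap else None
        pairs.append((t, match))
    return pairs


# Hypothetical timestamps in seconds: a 10 Hz LiDAR and a roughly 30 Hz camera.
lidar = [0.00, 0.10, 0.20, 0.30]
camera = [0.000, 0.033, 0.066, 0.100, 0.133, 0.166, 0.200]

pairs = align(lidar, camera, max_gap=0.02)
for lidar_t, camera_t in pairs:
    print(lidar_t, camera_t)
```

The last LiDAR sweep has no camera frame within tolerance, so it comes back unmatched rather than silently paired with a stale image. Gaps like that are where labels start to drift.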
Avala treats ground truth as deterministic infrastructure, not a best-effort afterthought. Annotations are made against a unified 4D reconstruction of the scene, so a label is consistent across every camera and every timestamp at once. The result is data you can actually trust to train a safety-critical system.
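Why does annotating in 3D make a label consistent across cameras? A small sketch helps: one 3D point, projected into two cameras, lands at a predictable pixel in each. The code below assumes an idealized pinhole camera with no rotation and no lens distortion; the camera positions, focal length, and principal point are made-up values, and none of this represents Avala's actual reconstruction.

```python
def project(point_world, cam_position, focal_px, principal_point):
    """Project a 3D world point into an axis-aligned pinhole camera.

    Simplifying assumption: the camera looks down +Z with axes aligned to
    the world frame. Real rigs need a full rotation and distortion model.
    Returns (u, v) in pixels, or None if the point is behind the camera.
    """
    x = point_world[0] - cam_position[0]
    y = point_world[1] - cam_position[1]
    z = point_world[2] - cam_position[2]
    if z <= 0:
        return None
    cx, cy = principal_point
    return (focal_px * x / z + cx, focal_px * y / z + cy)


# One 3D label (a cuboid centre, in metres) and two hypothetical cameras
# mounted 0.5 m apart along the X axis.
cuboid_center = (1.0, 0.0, 10.0)
cameras = {
    "left": (0.0, 0.0, 0.0),
    "right": (0.5, 0.0, 0.0),
}

pixels = {
    name: project(cuboid_center, position, focal_px=1000.0,
                  principal_point=(640.0, 360.0))
    for name, position in cameras.items()
}
print(pixels)
```

Because both pixel positions are derived from the same 3D point, they cannot disagree with each other. Labeling each camera image separately gives you no such guarantee.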
What you'll build in this course
This course follows one dataset through the entire loop. You'll start with a raw multi-sensor recording and finish with a verified, training-ready dataset and a clear picture of how the loop closes after deployment. The stages:
- Ingest & visualize — turn an MCAP/ROS recording into a synchronized 4D scene in the browser.
- Annotate — auto-label at scale, then verify the hard cases with human experts. This is the chapter no visualization tool can write.
- Curate — organize, query, slice, and trace your data with full lineage.
- Train — move a verified dataset into your training pipeline without fighting your tooling.
- Deploy — monitor the fleet, catch failures, and route them back into the loop.
You don't need to memorize the vocabulary yet — datasets, projects, tasks, slices, consensus. We'll introduce each concept where it earns its place, and link to the reference docs when you want to go deeper.
Try it yourself as you go
Every lesson ends with something you can actually run. To follow along in code, you need only two things, the SDK and an API key; a few lines of Python then confirm that both work by listing your datasets:
pip install avala
export AVALA_API_KEY="avk_your_api_key"

from avala import Client

client = Client()
for ds in client.datasets.list():
    print(ds.name)
That's the entire on-ramp. From here, the rest of the course is about everything that happens between a raw recording and a model you can trust.
Next: The Ground-Truth Loop →