Avala supports five data modalities, each with GPU-accelerated visualization and purpose-built annotation workflows. This page covers what you can see, what you can label, and the supported formats for each type.

Image

Visualization: Images are displayed in the image viewer with pan, zoom, and pixel-level inspection. When part of an MCAP recording, images appear in dedicated Image panels synchronized with other sensor streams.

Annotation: Each image is annotated independently as a single frame. All seven 2D annotation tools are available: bounding boxes, polygons, segmentation masks, polylines, keypoints, classification, and object tracking (in video sequences).

Supported formats: JPEG, PNG, WebP, BMP

Use cases: Object detection, instance segmentation, semantic segmentation, image classification, keypoint detection, pose estimation.
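As a concrete illustration of working with box labels, detection boxes are commonly stored as `[x, y, width, height]` and compared with intersection-over-union (IoU), a standard overlap metric for detection labels. The helpers below are generic sketches, not Avala's annotation schema:

```python
def xywh_to_xyxy(box):
    """Convert an [x, y, width, height] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]

def iou(a, b):
    """Intersection-over-union of two [x_min, y_min, x_max, y_max] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

a = xywh_to_xyxy([10, 10, 20, 20])  # [10, 10, 30, 30]
b = xywh_to_xyxy([20, 20, 20, 20])  # [20, 20, 40, 40]
print(round(iou(a, b), 4))          # 0.1429
```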

Video

Visualization: Videos are automatically converted to frame sequences on upload, enabling frame-by-frame playback with timeline scrubbing. Navigate forward and backward through frames, jump to specific timestamps, or play back at configurable speeds.

Annotation: Annotators work frame-by-frame with object tracking across the timeline. Object IDs persist across frames for consistent identity assignment. All 2D annotation tools are available on each frame.

Supported formats: MP4, MOV

Use cases: Object tracking, action recognition, temporal event detection, driving scene labeling, behavior analysis.
Video processing happens in the background after upload. Large videos may take several minutes to convert. You can monitor sequence status in Mission Control or via the API.
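Because a video becomes a frame sequence, jumping to a timestamp reduces to a timestamp-to-frame-index mapping at the sequence's frame rate. A minimal sketch (illustrative helper names, not an Avala API):

```python
def time_to_frame(t_seconds: float, fps: float) -> int:
    """Map a playback timestamp to the nearest frame index."""
    return round(t_seconds * fps)

def frame_to_time(index: int, fps: float) -> float:
    """Map a frame index back to its timestamp in seconds."""
    return index / fps

# In a 30 fps sequence, scrubbing to t = 2.5 s lands on frame 75.
print(time_to_frame(2.5, 30.0))  # 75
print(frame_to_time(75, 30.0))   # 2.5
```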

LiDAR / Point Cloud

Visualization: Point clouds are rendered in a 3D viewer with bird’s-eye view, perspective view, and side views. Six visualization modes let you color points by different properties:
| Mode | Description |
| --- | --- |
| Neutral | Single uniform color for structural overview |
| Intensity | Return strength — highlights reflective surfaces |
| Rainbow | Temporal or sequential coloring |
| Label | Semantic class coloring from annotations |
| Panoptic | Instance-level coloring for individual objects |
| Image Projection | Camera imagery projected onto the point cloud |
The viewer uses WebGPU compute shaders for frustum culling and level-of-detail rendering, keeping frame rates high even on dense scans.

Annotation: Annotators place 3D cuboids with full position, dimension, and rotation control. Cuboid projections are displayed in synchronized camera views when camera calibration data is available.

Supported formats: PCD, PLY

Use cases: 3D object detection, autonomous driving perception, robotics navigation, scene reconstruction, HD map creation.
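PCD files open with a plain-text header (FIELDS, WIDTH, POINTS, DATA, and so on) followed by the point data. A minimal stdlib-only sketch of reading an ASCII-encoded PCD file — it handles `DATA ascii` only; real files are often binary or compressed:

```python
def parse_pcd_ascii(text: str):
    """Parse the header and points of an ASCII-encoded PCD file.

    Minimal sketch: no support for binary or binary_compressed payloads.
    """
    header, points = {}, []
    lines = iter(text.splitlines())
    for line in lines:
        key, _, value = line.partition(" ")
        header[key] = value.split()
        if key == "DATA":  # header ends at the DATA line
            break
    for line in lines:
        points.append([float(v) for v in line.split()])
    return header, points

sample = """VERSION .7
FIELDS x y z intensity
SIZE 4 4 4 4
TYPE F F F F
COUNT 1 1 1 1
WIDTH 2
HEIGHT 1
VIEWPOINT 0 0 0 1 0 0 0
POINTS 2
DATA ascii
0.0 0.0 0.0 0.2
1.0 2.0 3.0 0.9"""

header, points = parse_pcd_ascii(sample)
print(header["FIELDS"])  # ['x', 'y', 'z', 'intensity']
print(points[1])         # [1.0, 2.0, 3.0, 0.9]
```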

MCAP

Visualization: MCAP recordings are displayed in the multi-sensor viewer with up to eight panel types: Image, 3D / Point Cloud, Plot, Raw Messages, Log, Map, Gauge, and State Transitions. Avala automatically detects message types in the recording and assigns topics to the appropriate panel type. All panels share a synchronized timeline for coordinated playback of camera, LiDAR, radar, IMU, and other sensor streams. The layout composer automatically builds an optimized panel arrangement based on the topics in your recording, or you can customize the layout manually. Navigate frame-by-frame, scrub to any timestamp, or play back the full recording with configurable speed.

Annotation: Avala parses MCAP files to extract and synchronize sensor streams. Camera images are displayed alongside projected LiDAR data, enabling multi-camera annotation with 3D context. Annotators place 3D cuboids that project consistently across all camera views.

Supported formats: MCAP (with ROS message support)

Use cases: Multi-sensor fusion, surround-view perception, autonomous vehicle data labeling, robotics sensor calibration, fleet data review.
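At its core, a synchronized timeline is a nearest-timestamp lookup per topic: when you scrub to time t, each panel shows the message on its topic closest to t. A sketch using Python's standard `bisect` (illustrative, not Avala's implementation):

```python
import bisect

def nearest_message(timestamps, t):
    """Return the index of the message whose timestamp is closest to t.

    timestamps must be sorted ascending, as message logs are.
    """
    i = bisect.bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbor is closer to t.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

# A 30 Hz camera topic and a 10 Hz LiDAR topic; scrub to t = 0.066 s.
camera = [0.000, 0.033, 0.066, 0.100]
lidar = [0.000, 0.100]
print(nearest_message(camera, 0.066))  # 2
print(nearest_message(lidar, 0.066))   # 1
```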
MCAP support includes automatic extraction of camera intrinsics and extrinsics for accurate LiDAR-to-camera projection. Both pinhole and double-sphere (fisheye) camera models are supported. See the MCAP / ROS integration guide for setup details.
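For the pinhole model, LiDAR-to-camera projection is an extrinsic rigid transform followed by perspective division with the intrinsics. A minimal sketch with illustrative calibration values (not extracted from a real recording):

```python
def lidar_to_camera(p, R, t):
    """Apply the extrinsic rigid transform: p_cam = R @ p_lidar + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def project_pinhole(p_cam, fx, fy, cx, cy):
    """Perspective-project a camera-frame point to pixels; None if behind the camera."""
    X, Y, Z = p_cam
    if Z <= 0:
        return None
    return (fx * X / Z + cx, fy * Y / Z + cy)

# Illustrative calibration: identity rotation, zero translation,
# 500 px focal length, principal point at (320, 240).
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0.0, 0.0, 0.0]
p_cam = lidar_to_camera([1.0, 0.5, 2.0], R, t)
print(project_pinhole(p_cam, 500, 500, 320, 240))  # (570.0, 365.0)
```

The double-sphere (fisheye) model replaces the simple division by Z with a two-sphere unprojection, which this sketch does not cover.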

Splat

Visualization: Gaussian Splat scenes are rendered in a WebGPU-accelerated 3D viewer. Navigate freely through photorealistic 3D scene reconstructions with smooth camera controls. The renderer uses GPU radix sorting, buffer pooling, and pipeline precompilation for real-time performance.

Annotation: Annotators navigate the reconstructed environment and place 3D annotations directly in the scene. Classification labels can be applied to the full scene or individual regions.

Supported formats: Gaussian Splat

Use cases: 3D scene understanding, novel view synthesis annotation, spatial AI training data, environment mapping.
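The GPU radix sort exists because Gaussian splats are alpha-blended: they must be composited back-to-front relative to the camera every frame. A CPU sketch of that ordering (illustrative, not the production renderer):

```python
def sort_back_to_front(splats, cam_pos):
    """Order splat centers by decreasing squared distance from the camera.

    Alpha blending is order-dependent, so farther splats composite first.
    """
    def sq_dist(s):
        return sum((s[i] - cam_pos[i]) ** 2 for i in range(3))
    return sorted(splats, key=sq_dist, reverse=True)

splats = [(0, 0, 5), (0, 0, 1), (0, 0, 3)]
print(sort_back_to_front(splats, (0, 0, 0)))  # [(0, 0, 5), (0, 0, 3), (0, 0, 1)]
```

A radix sort achieves the same ordering in linear time on quantized depth keys, which is why it suits per-frame GPU execution.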

Capabilities Comparison

The following table shows visualization and annotation capabilities for each data type:
| Capability | Image | Video | Point Cloud | MCAP | Splat |
| --- | --- | --- | --- | --- | --- |
| Visualization | | | | | |
| 2D Image Viewer | Yes | Yes | | Yes | |
| 3D Point Cloud Viewer | | | Yes | Yes | |
| 3D Splat Viewer | | | | | Yes |
| Multi-Panel Layout | | | | Yes | |
| Timeline Playback | | Yes | Yes | Yes | |
| Visualization Modes (6) | | | Yes | Yes | |
| Annotation | | | | | |
| Bounding Box | Yes | Yes | | | |
| Polygon | Yes | Yes | | | |
| 3D Cuboid | | | Yes | Yes | Yes |
| Segmentation | Yes | Yes | | | |
| Polyline | Yes | Yes | | | |
| Keypoints | Yes | Yes | | | |
| Classification | Yes | Yes | Yes | Yes | Yes |
| Object Tracking | | Yes | Yes | Yes | |

Upload Requirements

| Property | Limit |
| --- | --- |
| Max file size (images) | 20 MB per file |
| Max file size (video) | 2 GB per file |
| Max file size (point cloud) | 500 MB per file |
| Max file size (MCAP) | 5 GB per file |
| Supported image formats | JPEG, PNG, WebP, BMP |
| Supported video formats | MP4, MOV |
| Supported point cloud formats | PCD, PLY |
| Supported multi-sensor formats | MCAP |
Upload limits may vary depending on your plan. Contact support@avala.ai if you need to upload files that exceed these limits.
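A local pre-check against the documented limits can save a failed transfer on a large file. A sketch that treats MB/GB as binary units; the exact byte thresholds enforced server-side may differ:

```python
# Documented per-file upload limits, in bytes (binary-unit assumption).
LIMITS = {
    "image": 20 * 1024**2,        # 20 MB
    "video": 2 * 1024**3,         # 2 GB
    "point_cloud": 500 * 1024**2, # 500 MB
    "mcap": 5 * 1024**3,          # 5 GB
}

def within_limit(kind: str, size_bytes: int) -> bool:
    """Check a file's size against the documented per-file limit before uploading."""
    return size_bytes <= LIMITS[kind]

print(within_limit("image", 15 * 1024**2))  # True
print(within_limit("video", 3 * 1024**3))   # False
```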

Next Steps

Managing Datasets

Upload, organize, and manage your data in Mission Control.

MCAP / ROS Integration

Set up multi-sensor data pipelines with MCAP and ROS.

Core Concepts

Understand viewers, panels, layouts, and other platform concepts.

Architecture

Learn how the visualization engine and backend services work together.