Data Collection

Overview

Collecting high-quality training data is crucial for building effective AI models. This guide covers best practices for data collection in Qualia using the LeRobotDataset v3.0 format.

Qualia supports various data collection methods for training vision-language-action (VLA) models:

Manual data collection - Record demonstrations manually
Automated collection - Use scripts to gather data at scale
Import existing datasets - Bring your own data
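
Automated collection can start as simply as a script that queues several recording sessions. The sketch below is hypothetical. The task list, the slug-based repo naming, the `DRY_RUN` switch, and the `your-hf-username` fallback are illustrative assumptions. It reuses only the `lerobot-record` flags from the example command in the Record a dataset section. Each session still uses the leader arm, so an operator performs the demonstrations. The script only removes the need to retype the command for every task.

```bash
#!/usr/bin/env bash
# Hypothetical batch-recording sketch: one lerobot-record session per task.
# By default (DRY_RUN=1) it only prints the commands; set DRY_RUN=0 to run them.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
HF_USER="${HF_USER:-your-hf-username}"   # placeholder fallback (assumption)

# Illustrative task descriptions; each becomes its own dataset.
TASKS=(
  "Grab the black cube"
  "Grab the red cube"
)

# slugify TEXT: lowercase and replace non-alphanumeric runs with "-",
# e.g. "Grab the black cube" -> "grab-the-black-cube".
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | sed 's/^-//; s/-$//'
}

# record_task TASK: build (and optionally run) the lerobot-record command,
# reusing the flags from the example command in this guide.
record_task() {
  local task="$1"
  local repo_id
  repo_id="${HF_USER}/$(slugify "$task")"
  local cmd=(
    lerobot-record
    --robot.type=so101_follower
    --robot.port=/dev/tty.usbmodem585A0076841
    --robot.id=my_awesome_follower_arm
    "--robot.cameras={ front: {type: opencv, index_or_path: 0, width: 1920, height: 1080, fps: 30}}"
    --teleop.type=so101_leader
    --teleop.port=/dev/tty.usbmodem58760431551
    --teleop.id=my_awesome_leader_arm
    --dataset.repo_id="$repo_id"
    --dataset.num_episodes=5
    --dataset.single_task="$task"
  )
  if [ "$DRY_RUN" = "1" ]; then
    # Print a copy-pasteable, shell-quoted version of the command.
    printf '%q ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}

for task in "${TASKS[@]}"; do
  record_task "$task"
done
```

Giving each task its own repo keeps datasets separate on the Hub. If you would rather append all tasks to one dataset, keep a single `--dataset.repo_id` instead.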

Installation

LeRobotDataset v3.0 will be included in lerobot >= 0.4.0. Until that stable release is available, install lerobot from the main branch by following the build-from-source instructions.
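
You may want to confirm which lerobot version you have before recording. Here is a minimal sketch, assuming `python3` is on your PATH and your `sort` supports `-V`. It reads the installed version through `importlib.metadata`. If the version is older than 0.4.0, or lerobot is missing, it prints the build-from-source steps from the lerobot GitHub repository rather than running them. The helper names are hypothetical. `sort -V` is not PEP 440-aware, so a pre-release tag such as `0.4.0.dev0` may compare as newer than `0.4.0`.

```bash
#!/usr/bin/env bash
# Sketch: check whether the installed lerobot is >= 0.4.0, the release
# expected to include LeRobotDataset v3.0. Prints build-from-source steps
# instead of running them when the check fails.
set -euo pipefail

REQUIRED_VERSION="0.4.0"

# version_ge A B: succeeds when version A >= version B.
# Relies on `sort -V` (natural version ordering, not PEP 440).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the installed lerobot version, or nothing if it is not installed.
installed_lerobot_version() {
  python3 -c 'import importlib.metadata as m; print(m.version("lerobot"))' 2>/dev/null || true
}

current="$(installed_lerobot_version)"
if [ -n "$current" ] && version_ge "$current" "$REQUIRED_VERSION"; then
  echo "lerobot $current is new enough for LeRobotDataset v3.0"
else
  echo "lerobot ${current:-is not installed} ($REQUIRED_VERSION or newer needed)."
  echo "Build from source:"
  echo "  git clone https://github.com/huggingface/lerobot.git"
  echo "  cd lerobot"
  echo "  pip install -e ."
fi
```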

LeRobotDataset v3.0 is a standardized format for robot learning data. It provides unified access to multi-modal time-series data, sensorimotor signals and multi‑camera video, as well as rich metadata for indexing, search, and visualization on the Hugging Face Hub.

Record a dataset

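Run the following command to record a dataset with the SO-101 arm and push it to the Hugging Face Hub. Replace the serial ports, arm IDs, and camera settings with the values for your own setup, and set the HF_USER environment variable to your Hugging Face username: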

lerobot-record \
  --robot.type=so101_follower \
  --robot.port=/dev/tty.usbmodem585A0076841 \
  --robot.id=my_awesome_follower_arm \
  --robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 1920, height: 1080, fps: 30}}" \
  --teleop.type=so101_leader \
  --teleop.port=/dev/tty.usbmodem58760431551 \
  --teleop.id=my_awesome_leader_arm \
  --display_data=true \
  --dataset.repo_id=${HF_USER}/record-test \
  --dataset.num_episodes=5 \
  --dataset.single_task="Grab the black cube"
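
The `--robot.cameras` value is a small dictionary keyed by camera name. Here is a minimal sketch for building it from per-camera settings instead of editing the string by hand. It assumes that lerobot-record accepts several cameras as comma-separated entries in the same dictionary. The `wrist` camera, its index and resolution, and the helper names `camera_entry`, `cameras_arg`, and `require_hf_user` are hypothetical placeholders. With only the front camera, the helper reproduces the exact string used in the command above.

```bash
#!/usr/bin/env bash
# Sketch: build the --robot.cameras value for lerobot-record from per-camera
# settings, reproducing the dictionary syntax of the example command.
# Helper names and the "wrist" camera below are hypothetical placeholders.
set -euo pipefail

# camera_entry NAME INDEX WIDTH HEIGHT FPS
# Prints one OpenCV camera entry, e.g. "front: {type: opencv, ...}".
camera_entry() {
  printf '%s: {type: opencv, index_or_path: %s, width: %s, height: %s, fps: %s}' \
    "$1" "$2" "$3" "$4" "$5"
}

# cameras_arg ENTRY...
# Joins entries with ", " and wraps them as "{ entry1, entry2}".
cameras_arg() {
  local joined="" entry
  for entry in "$@"; do
    joined="${joined:+$joined, }$entry"
  done
  printf '{ %s}' "$joined"
}

# require_hf_user: fail early if HF_USER (used in --dataset.repo_id) is unset.
require_hf_user() {
  if [ -z "${HF_USER:-}" ]; then
    echo "HF_USER is not set; export it with your Hugging Face username" >&2
    return 1
  fi
}

# Example: the front camera from the command above plus a second camera.
CAMERAS="$(cameras_arg \
  "$(camera_entry front 0 1920 1080 30)" \
  "$(camera_entry wrist 1 640 480 30)")"
echo "--robot.cameras=\"$CAMERAS\""

if require_hf_user; then
  echo "Dataset will be pushed to ${HF_USER}/record-test"
fi
```

Pass the result as `--robot.cameras="$CAMERAS"` in place of the literal string in the recording command.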

More information about LeRobotDataset v3.0

For a more detailed explanation of the characteristics of LeRobotDataset v3.0, we encourage you to read the official LeRobotDataset v3.0 page in the Hugging Face documentation.