Data Collection

Collecting high-quality training data is crucial for building effective AI models. This guide covers best practices for data collection in Qualia based on LeRobotDataset v3.0.

Qualia supports various data collection methods for training vision-language-action (VLA) models:

  • Manual data collection - Record demonstrations manually
  • Automated collection - Use scripts to gather data at scale
  • Import existing datasets - Bring your own data

LeRobotDataset v3.0 will be included in lerobot >= 0.4.0. Until that stable release is available, you can install lerobot from the main branch by following the build-from-source instructions.

LeRobotDataset v3.0 is a standardized format for robot learning data. It provides unified access to multi-modal time-series data, sensorimotor signals and multi‑camera video, as well as rich metadata for indexing, search, and visualization on the Hugging Face Hub.

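Run the command below to record a dataset with the SO-101 and push it to the Hugging Face Hub. The command assumes the HF_USER environment variable holds your Hugging Face username; replace the port values with the ports of your own follower and leader arms: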

lerobot-record \
--robot.type=so101_follower \
--robot.port=/dev/tty.usbmodem585A0076841 \
--robot.id=my_awesome_follower_arm \
--robot.cameras="{ front: {type: opencv, index_or_path: 0, width: 1920, height: 1080, fps: 30}}" \
--teleop.type=so101_leader \
--teleop.port=/dev/tty.usbmodem58760431551 \
--teleop.id=my_awesome_leader_arm \
--display_data=true \
--dataset.repo_id=${HF_USER}/record-test \
--dataset.num_episodes=5 \
--dataset.single_task="Grab the black cube"
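For automated collection, the command above can be assembled from a script rather than typed by hand. The following is a minimal sketch in Python, not part of Qualia or lerobot: the helper name build_record_command and its parameters are hypothetical, and it only reuses the flags shown in the command above. It builds the argument list and prints it; actually launching a recording is left as a commented-out step.

```python
import shlex


def build_record_command(repo_id, task, num_episodes, follower_port, leader_port):
    """Return the lerobot-record argument list for one recording session.

    Hypothetical helper: it only mirrors the flags used in the command above.
    """
    # Camera configuration copied verbatim from the command above.
    cameras = (
        "{ front: {type: opencv, index_or_path: 0, "
        "width: 1920, height: 1080, fps: 30}}"
    )
    return [
        "lerobot-record",
        "--robot.type=so101_follower",
        f"--robot.port={follower_port}",
        "--robot.id=my_awesome_follower_arm",
        f"--robot.cameras={cameras}",
        "--teleop.type=so101_leader",
        f"--teleop.port={leader_port}",
        "--teleop.id=my_awesome_leader_arm",
        "--display_data=true",
        f"--dataset.repo_id={repo_id}",
        f"--dataset.num_episodes={num_episodes}",
        f"--dataset.single_task={task}",
    ]


if __name__ == "__main__":
    cmd = build_record_command(
        repo_id="your-hf-username/record-test",  # placeholder repository id
        task="Grab the black cube",
        num_episodes=5,
        follower_port="/dev/tty.usbmodem585A0076841",
        leader_port="/dev/tty.usbmodem58760431551",
    )
    # Print a shell-quoted version for inspection before running anything.
    print(shlex.join(cmd))
    # To start recording, pass the list to subprocess.run(cmd, check=True).
```

Passing the arguments as a list avoids shell quoting problems with the camera configuration and with task descriptions that contain spaces.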

More information about LeRobotDataset v3.0

For a more detailed explanation of the characteristics of LeRobotDataset v3.0, we encourage you to read its official Hugging Face page.