LeRobot SO-ARM100 Train Guide
Data Collection and Local Training with Hugging Face and LeRobot
1. Registration and Login
If you haven't created a HUGGINGFACE_TOKEN yet, register or log in on the Hugging Face website:
https://huggingface.co/login?next=%2Fsettings%2Ftokens

After successful registration and login:
1. Create a new dataset token
2. Check all available options
3. Click to create the token
4. Save your token securely for future use

2. Data Collection
Start collecting data:
Replace ${HUGGINGFACE_TOKEN} below with your own token:
huggingface-cli login --token ${HUGGINGFACE_TOKEN} --add-to-git-credential

Possible error:
FileNotFoundError: [Errno 2] No such file or directory: 'git'

Solution:
sudo apt install git
After installation, re-run the previous huggingface-cli command. Red warning text alongside a successful login is normal.

The login may hang at this step. This is usually caused by network issues (solution: use a VPN or proxy).
Store your Hugging Face repository name in a variable to run these commands:
HF_USER=$(huggingface-cli whoami | head -n 1)
echo $HF_USER
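The two lines above capture the first line of `huggingface-cli whoami` output, which is your username. A minimal sketch of the same pattern, simulated with `printf` so it is visible without a live login ("your-username" is a placeholder, not a real account):

```shell
# `huggingface-cli whoami` prints the username on its first line;
# `head -n 1` keeps only that line. Simulated here with printf:
HF_USER=$(printf 'your-username\norgs: none\n' | head -n 1)
echo "$HF_USER"
```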
3. Data Collection Process
The preparation phase begins with a prompt tone. Recording of the first episode starts after the tone; press the right arrow key ("→") to end the recording. A rest phase then begins (the camera image freezes): restore the object scene and press "→" to skip the rest period. Data compression and saving start automatically, and once saving completes the second recording begins.
Record 2 episodes and upload your dataset to the hub:
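The exact recording command is not reproduced in this guide. As a sketch, assuming the draccus-style `--control.*` CLI referenced later in this document (the robot type, repo id `so100_test`, fps, and episode count are illustrative assumptions, not values from the original):

```shell
# Illustrative sketch only -- flag values are assumptions; adapt them
# to your own robot configuration and dataset name.
python lerobot/scripts/control_robot.py \
  --robot.type=so100 \
  --control.type=record \
  --control.fps=30 \
  --control.repo_id=${HF_USER}/so100_test \
  --control.num_episodes=2 \
  --control.push_to_hub=true
```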
Possible error:

Solution:
When using miniconda, if ffmpeg is missing from your environment:
conda install ffmpeg

Then check that the H.264 codec is available in your conda environment:
ffmpeg -codecs | grep 264
Locate the file and modify as shown:

Re-running data recording may raise:
FileExistsError: [Errno 17] File exists: '/home/skl/.cache/huggingface/lerobot/pdd46465/so100_test'

Solution: locate and delete that directory, then re-run the process.
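The deletion can be done in one step. A sketch, assuming your dataset lives under the default Hugging Face cache path (the user and dataset names below are placeholders; substitute your own):

```shell
# Remove the stale local dataset cache so recording can start fresh.
# The repo path is illustrative -- use your own user/dataset names.
DATASET_DIR="$HOME/.cache/huggingface/lerobot/${HF_USER}/so100_test"
rm -rf "$DATASET_DIR"
```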

As shown above, the dataset has been successfully uploaded. To add more data to an existing dataset, use the following command:
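The append command itself is not shown in this guide. As a sketch, recent LeRobot versions expose a resume flag in the same `--control.*` family (the flag names and values below are assumptions to illustrate the idea):

```shell
# Illustrative: re-run the record command with resume enabled to append
# new episodes to the existing dataset instead of starting over.
python lerobot/scripts/control_robot.py \
  --robot.type=so100 \
  --control.type=record \
  --control.repo_id=${HF_USER}/so100_test \
  --control.num_episodes=2 \
  --control.resume=true
```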
To avoid uploading to the Hugging Face Hub, change --control.push_to_hub=true to --control.push_to_hub=false.
Begin local dataset training with the following command:
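The training command itself is not reproduced here. As a sketch, assuming LeRobot's training script and its dotted CLI overrides (the policy type and output directory are illustrative assumptions):

```shell
# Illustrative training invocation -- match the repo id to your recorded
# dataset and the policy type to whatever you intend to train.
python lerobot/scripts/train.py \
  --dataset.repo_id=${HF_USER}/so100_test \
  --policy.type=act \
  --output_dir=outputs/train/act_so100_test
```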
The following images indicate training has started:


The following error (CUDA out of memory) indicates insufficient GPU memory:

Solution: modify the train.py configuration in lerobot_main/lerobot/config/, around line 54, as shown:

num_workers: controls data-preloading parallelism (number of loader worker processes)
batch_size: number of samples per training iteration (powers of two such as 2, 4, 16, or 32 work well)
Reducing either value can resolve out-of-memory errors.
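Instead of editing the file, the same reduction can often be passed on the command line. A sketch, assuming `batch_size` and `num_workers` are exposed as top-level overrides in the training CLI (the values and other flags are illustrative):

```shell
# Illustrative: smaller batch size and fewer loader workers to fit
# limited GPU memory; halve the values again if OOM persists.
python lerobot/scripts/train.py \
  --dataset.repo_id=${HF_USER}/so100_test \
  --policy.type=act \
  --batch_size=4 \
  --num_workers=2
```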
