Pi0 & Pi0.5 On Robocasa

Infrastructure

Robocasa Dataset to Lerobot Dataset Converter

The Robocasa dataset is distributed in HDF5 format and must be converted to the LeRobot dataset format before the Pi0 and Pi0.5 models can use it.
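The converter itself is not shown here. As a minimal sketch, assuming the robomimic-style layout `data/demo_N/actions` and `data/demo_N/obs/<key>`, the first step is to walk every demo in numeric order and emit one record per timestep. The observation key names below are assumptions, not confirmed Robocasa keys. The function accepts an open `h5py.File` or a plain nested dict with the same layout.

```python
from typing import Any, Dict, Iterator, Mapping, Sequence, Tuple

# Assumed observation keys; check them against your own Robocasa HDF5 files.
DEFAULT_OBS_KEYS = (
    "robot0_agentview_left_image",
    "robot0_eye_in_hand_image",
    "robot0_eef_pos",
    "robot0_gripper_qpos",
)


def sorted_demo_names(data_group: Mapping[str, Any]) -> list:
    """Return demo group names in numeric order (demo_2 before demo_10)."""
    return sorted(data_group.keys(), key=lambda name: int(name.split("_")[-1]))


def iter_frames(
    root: Mapping[str, Any], obs_keys: Sequence[str] = DEFAULT_OBS_KEYS
) -> Iterator[Tuple[str, int, Dict[str, Any]]]:
    """Yield (demo_name, step_index, frame) for every timestep of every demo.

    `root` can be an open h5py.File or a nested dict with the same layout:
    root["data"][demo]["actions"] and root["data"][demo]["obs"][key].
    """
    data = root["data"]
    for demo in sorted_demo_names(data):
        group = data[demo]
        actions = group["actions"]
        obs = group["obs"]
        for t in range(len(actions)):
            # One flat record per timestep: selected observations plus the action.
            frame = {key: obs[key][t] for key in obs_keys}
            frame["actions"] = actions[t]
            yield demo, t, frame
```

Each frame could then be written to a LeRobot dataset. openpi's LIBERO converter does this with `LeRobotDataset.add_frame` and `save_episode`, but that API has changed between LeRobot versions, so check the one you have installed.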

Pi0 & Pi0.5 Inference on Robocasa

On top of the openpi repository, we added src/policies/robocasa_policy.py to adapt the policies to the Robocasa environment and built an inference environment under /examples/robocasa.
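The contents of robocasa_policy.py are not reproduced here. The sketch below shows the kind of input mapping such a file typically contains. It assumes the dict layout used by openpi's LIBERO example: a flat `state`, images under `base_0_rgb` / `left_wrist_0_rgb` / `right_wrist_0_rgb`, a matching `image_mask`, and a `prompt`. The Robocasa observation keys and the choice of camera for each slot are assumptions, as is the function name `robocasa_inputs`.

```python
from typing import Any, Dict

# Assumed Robocasa proprioception keys, concatenated in this order.
STATE_KEYS = ("robot0_eef_pos", "robot0_eef_quat", "robot0_gripper_qpos")


def robocasa_inputs(obs: Dict[str, Any], prompt: str) -> Dict[str, Any]:
    """Hypothetical mapping of one Robocasa observation to an openpi-style input dict."""
    # Flatten the proprioceptive pieces into a single state vector.
    state = [float(x) for key in STATE_KEYS for x in obs[key]]
    return {
        "state": state,
        "image": {
            # Assumed camera-to-slot assignment; adjust to match training data.
            "base_0_rgb": obs["robot0_agentview_left_image"],
            "left_wrist_0_rgb": obs["robot0_eye_in_hand_image"],
            "right_wrist_0_rgb": obs["robot0_agentview_right_image"],
        },
        # All three slots hold real camera views here, so none are masked out.
        "image_mask": {
            "base_0_rgb": True,
            "left_wrist_0_rgb": True,
            "right_wrist_0_rgb": True,
        },
        "prompt": prompt,
    }
```

The same mapping must be applied at training time (when reading the converted LeRobot data) and at inference time (when reading live simulator observations). Otherwise the cameras and state layout will not match what the model saw during training.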

Robocasa Results

We fine-tuned three models on the Robocasa dataset: dp (Diffusion Policy), pi0 and pi0.5.

Baseline Fine-tuning Results

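To study how data scale affects performance, we fine-tuned with 30, 100 and 300 demos as training data. Training lengths were as follows:
dp: 80k steps.
pi0: 300k steps with 100 demos and 150k steps with 300 demos.
pi0.5: 60k steps with 300 demos.
In each run, training was stopped once the training loss had plateaued.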
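The stopping criterion above was judged by monitoring the loss. One way to make "the loss has plateaued" explicit is to compare the mean loss over the latest window with the mean over the window before it. The sketch below does this. The window size and the 1% relative-improvement threshold are illustrative assumptions, not values used in these runs.

```python
from statistics import mean
from typing import Sequence


def loss_has_plateaued(
    losses: Sequence[float], window: int = 1000, min_rel_improvement: float = 0.01
) -> bool:
    """True when the mean loss over the latest `window` steps improved by less than
    `min_rel_improvement` (relative) compared with the preceding window."""
    if len(losses) < 2 * window:
        return False  # not enough history to compare two full windows
    previous = mean(losses[-2 * window : -window])
    latest = mean(losses[-window:])
    return (previous - latest) < min_rel_improvement * previous
```

Comparing window means rather than single values keeps the per-batch noise of a diffusion or flow-matching loss from triggering an early stop.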

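During evaluation we found that the success rate on the OpenDoubleDoors task was 0%. The main cause was not that the model failed to complete the task. Instead, the evaluation time limit was too short for the simulator to register the task as completed. We therefore raised the action horizon used during evaluation (the maximum number of environment steps per rollout) from the default of 600 to 1000, giving the simulator more time to register success.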
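The toy rollout below shows why a longer horizon can change a 0% result without any change to the policy. A success only counts if the simulator registers it before the step budget runs out. `DelayedSuccessEnv` and its success step of 700 are hypothetical stand-ins for a task like OpenDoubleDoors, not Robocasa code.

```python
from typing import Callable


class DelayedSuccessEnv:
    """Toy simulator that only reports success once `success_step` steps have run."""

    def __init__(self, success_step: int) -> None:
        self.success_step = success_step
        self.t = 0

    def reset(self) -> None:
        self.t = 0

    def step(self, action) -> bool:
        self.t += 1
        return self.t >= self.success_step  # True once completion is registered


def rollout(env, policy: Callable[[int], object], horizon: int) -> bool:
    """Run one episode; it counts as a success only if reached within `horizon` steps."""
    env.reset()
    for t in range(horizon):
        if env.step(policy(t)):
            return True
    return False


env = DelayedSuccessEnv(success_step=700)
print(rollout(env, lambda t: None, horizon=600))   # False: budget ends first
print(rollout(env, lambda t: None, horizon=1000))  # True
```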

For fairness, we also evaluated the dp models trained on 100 and 300 demos with the adjusted horizon. Their success rates were not significantly higher than in the earlier evaluations at the unadjusted horizon.

Per-task success rates (%):

Task | dp (30 demos, horizon 600) | dp (100 demos, horizon 1000) | dp (300 demos, horizon 1000) | pi0 (30 demos, horizon 600) | pi0 (100 demos, horizon 600) | pi0 (300 demos, horizon 600) | pi05 (300 demos, horizon 1000)
Close Double Door | 18% | 42% | 46% | 58.42% | 88.00% | 62.00% | 78.18%
Close Drawer | 84% | 96% | 88% | 64.36% | 68.80% | 67.00% | 100.00%
Close Single Door | 48% | 68% | 78% | 85.15% | 97.60% | 86.00% | 100.00%
Coffee Press Button | 38% | 38% | 24% | 70.30% | 72.80% | 65.00% | 76.36%
Coffee Serve Mug | 8% | 14% | 14% | 45.54% | 66.40% | 48.00% | 69.09%
Coffee Setup Mug | 0% | 0% | 2% | 13.86% | 17.60% | 20.00% | 29.09%
Open Double Door | 8% | 4% | 6% | 0.00% | 0.80% | 0.00% | 82.00%
Open Drawer | 12% | 28% | 32% | 7.89% | 35.20% | 12.00% | 65.45%
Open Single Door | 24% | 16% | 44% | 87.13% | 80.80% | 77.00% | 92.73%
PnP from Cab to Counter | 4% | 8% | 10% | 12.87% | 10.40% | 12.00% | 43.64%
PnP from Counter to Cab | 6% | 4% | 10% | 20.79% | 26.40% | 34.00% | 45.45%
PnP from Counter to Microwave | 0% | 2% | 4% | 15.84% | 25.60% | 19.00% | 29.09%
PnP from Counter to Sink | 2% | 2% | 8% | 12.87% | 30.40% | 37.00% | 60.00%
PnP from Counter to Stove | 0% | 0% | 2% | 9.90% | 12.00% | 23.00% | 49.09%
PnP from Microwave to Counter | 0% | 2% | 4% | 6.93% | 12.00% | 17.00% | 29.09%
PnP from Sink to Counter | 2% | 4% | 4% | 19.80% | 16.80% | 26.00% | 49.09%
PnP from Stove to Counter | 4% | 4% | 8% | 19.80% | 20.00% | 31.00% | 45.45%
Turn Off Microwave | 48% | 68% | 34% | 38.61% | 88.80% | 71.00% | 89.09%
Turn Off Sink Faucet | 62% | 68% | 72% | 64.36% | 74.40% | 81.00% | 92.73%
Turn Off Stove | 6% | 12% | 15% | 9.90% | 20.80% | 20.00% | 34.55%
Turn On Microwave | 30% | 26% | 35% | 62.38% | 53.60% | 36.00% | 58.18%
Turn On Sink Faucet | 26% | 38% | 52% | 74.26% | 71.20% | 69.00% | 83.64%
Turn On Stove | 10% | 28% | 27% | 45.54% | 46.40% | 41.00% | 76.36%
Turn Sink Spout | 28% | 40% | 46% | 82.18% | 90.40% | 93.00% | 96.36%
RoboCasa Average | 19.50% | 25.50% | 27.71% | 38.70% | 46.97% | 43.63% | 65.61%
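The RoboCasa Average row is an unweighted mean over the 24 tasks, and the dp columns are on the same percentage scale as the pi0 and pi05 columns. The check below recomputes two of the averages from the per-task values in the table.

```python
from statistics import mean

# Per-task success rates (%) in table row order.
dp_30 = [18, 84, 48, 38, 8, 0, 8, 12, 24, 4, 6, 0,
         2, 0, 0, 2, 4, 48, 62, 6, 30, 26, 10, 28]
pi05_300 = [78.18, 100.00, 100.00, 76.36, 69.09, 29.09, 82.00, 65.45,
            92.73, 43.64, 45.45, 29.09, 60.00, 49.09, 29.09, 49.09,
            45.45, 89.09, 92.73, 34.55, 58.18, 83.64, 76.36, 96.36]


def task_average(rates):
    """Unweighted mean over tasks: every task counts equally, whatever its difficulty."""
    return mean(rates)


print(round(task_average(dp_30), 2))     # 19.5
print(round(task_average(pi05_300), 2))  # 65.61
```

Because every task is weighted equally, a large gain on one task, such as Open Double Door at horizon 1000, moves the average by only 1/24 of that gain.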

Training with Adjusted Datasets

Method

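Based on the fine-tuning results above, we adjusted the training datasets to address the tasks where performance was weak. Following the Atomic Tasks structure of the RoboCasa dataset, we grouped the task actions into seven mixed-task categories: Pick and Place Tasks, Door Tasks, Drawer Tasks, Turning Lever Tasks, Twisting Knob Tasks, Insert Tasks, and Pressing Button Tasks.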
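The exact task-to-category assignment is not listed here except for Pressing Button Tasks. The mapping below is an assumed grouping of the 24 evaluated tasks, following RoboCasa's atomic-skill families. The sanity check confirms that each task lands in exactly one category.

```python
from collections import Counter

# Assumed grouping; only "Pressing Button Tasks" is confirmed by the text.
TASK_CATEGORIES = {
    "Pick and Place Tasks": [
        "PnP from Cab to Counter", "PnP from Counter to Cab",
        "PnP from Counter to Microwave", "PnP from Counter to Sink",
        "PnP from Counter to Stove", "PnP from Microwave to Counter",
        "PnP from Sink to Counter", "PnP from Stove to Counter",
    ],
    "Door Tasks": ["Open Single Door", "Close Single Door",
                   "Open Double Door", "Close Double Door"],
    "Drawer Tasks": ["Open Drawer", "Close Drawer"],
    "Turning Lever Tasks": ["Turn On Sink Faucet", "Turn Off Sink Faucet", "Turn Sink Spout"],
    "Twisting Knob Tasks": ["Turn On Stove", "Turn Off Stove"],
    "Insert Tasks": ["Coffee Setup Mug", "Coffee Serve Mug"],
    "Pressing Button Tasks": ["Coffee Press Button", "Turn On Microwave", "Turn Off Microwave"],
}


def category_of(task: str) -> str:
    """Return the category a task belongs to, or raise KeyError if it is unknown."""
    for category, tasks in TASK_CATEGORIES.items():
        if task in tasks:
            return category
    raise KeyError(task)


# Sanity check: all 24 tasks appear, each in exactly one category.
counts = Counter(t for tasks in TASK_CATEGORIES.values() for t in tasks)
assert len(counts) == 24 and all(n == 1 for n in counts.values())
```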

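Each mixed-task dataset was built by weighting its sources. For example, in the 300-demo pi05 training, the Pressing Button Tasks dataset contains 300 demos each from Coffee Press Button, Turn On Microwave and Turn Off Microwave, plus 50 demos from each of the other datasets. The 50-demo share is meant to reduce forgetting of the other tasks' actions during training.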
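The weighting scheme can be written as a per-task demo budget. The sketch below reproduces the Pressing Button example: 300 demos for each primary task and 50 for every other task. The function name `build_mixture` is hypothetical, and how the 50 demos are chosen from each task (first N, random, and so on) is not specified in the text.

```python
from typing import Dict, Sequence


def build_mixture(
    primary_tasks: Sequence[str],
    all_tasks: Sequence[str],
    primary_demos: int = 300,
    other_demos: int = 50,
) -> Dict[str, int]:
    """Return how many demos to draw from each task for one mixed-task dataset."""
    unknown = set(primary_tasks) - set(all_tasks)
    if unknown:
        raise ValueError(f"primary tasks not in task list: {sorted(unknown)}")
    return {
        task: primary_demos if task in primary_tasks else other_demos
        for task in all_tasks
    }
```

With the 24 evaluated tasks, the Pressing Button mixture has 3 × 300 + 21 × 50 = 1950 demos. The three primary tasks make up 900 of those, about 46% of the data.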

All adjusted mixed-task datasets use the same norm_state.json, so that states and actions are normalized consistently across them.
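Sharing one statistics file matters because each mixture has a different task balance. Statistics recomputed per mixture would map the same physical state to different normalized values in different training phases. The sketch below computes per-dimension mean/std once, serializes the result, and reuses it unchanged. Plain z-score normalization is an illustrative assumption here; openpi's own statistics files also carry quantiles, which some configs use instead.

```python
import json
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def compute_norm_stats(states: Sequence[Sequence[float]]) -> Dict[str, List[float]]:
    """Per-dimension mean and standard deviation over the full dataset."""
    dims = list(zip(*states))
    return {"mean": [mean(d) for d in dims], "std": [pstdev(d) for d in dims]}


def normalize(state: Sequence[float], stats: Dict[str, List[float]],
              eps: float = 1e-6) -> List[float]:
    """Apply the shared statistics; eps guards against zero-variance dimensions."""
    return [(x - m) / (s + eps) for x, m, s in zip(state, stats["mean"], stats["std"])]


# Compute once over all tasks, write to disk as JSON, then load the same stats for every mixture.
all_states = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
shared = json.loads(json.dumps(compute_norm_stats(all_states)))
print(normalize([2.0, 3.0], shared))  # [0.0, 0.0]
```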

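Training then proceeds in stages. First, the model is trained on each mixed-task dataset in turn for 10k epochs, to reinforce task-specific motions such as insertion or turning levers. It is then trained on the combined set of all tasks.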
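The staged schedule can be expressed as a list of phases with start and end boundaries, followed by a lookup from the current training step to the active dataset. The sketch below assumes one 10k phase per mixed task followed by a final "all_tasks" phase. The length of that final phase is not given in the text, so `final_length` is a hypothetical parameter, as are the function names.

```python
from typing import List, Sequence, Tuple

Phase = Tuple[str, int, int]  # (dataset name, start, end), end exclusive


def build_schedule(categories: Sequence[str], per_category: int = 10_000,
                   final_length: int = 10_000) -> List[Phase]:
    """One phase per mixed-task dataset, then one phase on the full task set."""
    phases: List[Phase] = []
    start = 0
    for name in categories:
        phases.append((name, start, start + per_category))
        start += per_category
    phases.append(("all_tasks", start, start + final_length))
    return phases


def dataset_for_step(phases: Sequence[Phase], step: int) -> str:
    """Return the dataset that should be used at a given training step."""
    for name, start, end in phases:
        if start <= step < end:
            return name
    raise ValueError(f"step {step} is past the end of the schedule")
```

Because the phases run sequentially, the earlier categories are furthest from the end of training. The 50-demo share of other tasks in each mixture and the final all-tasks phase are what counteract forgetting of those early categories.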

Results

Success rates of pi05 on the 300-demo dataset with base training and with weighted training:

Task | pi05 base | pi05 weighted
Close Double Door | 78.18% | 75.00%
Close Drawer | 100.00% | 100.00%
Close Single Door | 100.00% | 92.00%
Coffee Press Button | 76.36% | 80.00%
Coffee Serve Mug | 69.09% | 78.00%
Coffee Setup Mug | 29.09% | 34.00%
Open Double Door | 82.00% | 82.00%
Open Drawer | 65.45% | 66.00%
Open Single Door | 92.73% | 92.00%
PnP from Cab to Counter | 43.64% | 48.00%
PnP from Counter to Cab | 45.45% | 68.00%
PnP from Counter to Microwave | 29.09% | 25.00%
PnP from Counter to Sink | 60.00% | 72.00%
PnP from Counter to Stove | 49.09% | 60.00%
PnP from Microwave to Counter | 29.09% | 22.00%
PnP from Sink to Counter | 49.09% | 44.00%
PnP from Stove to Counter | 45.45% | 66.00%
Turn Off Microwave | 89.09% | 98.00%
Turn Off Sink Faucet | 92.73% | 84.00%
Turn Off Stove | 34.55% | 46.00%
Turn On Microwave | 58.18% | 80.00%
Turn On Sink Faucet | 83.64% | 88.00%
Turn On Stove | 76.36% | 84.00%
Turn Sink Spout | 96.36% | 92.00%
RoboCasa Average | 65.61% | 69.83%
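The per-task comparison below quantifies the change from the table values. Weighted training raises the task average by about 4.2 points. The gain is not uniform: 14 tasks improve, 8 regress and 2 are unchanged.

```python
# (task, base %, weighted %) in table row order.
RESULTS = [
    ("Close Double Door", 78.18, 75), ("Close Drawer", 100.00, 100),
    ("Close Single Door", 100.00, 92), ("Coffee Press Button", 76.36, 80),
    ("Coffee Serve Mug", 69.09, 78), ("Coffee Setup Mug", 29.09, 34),
    ("Open Double Door", 82.00, 82), ("Open Drawer", 65.45, 66),
    ("Open Single Door", 92.73, 92), ("PnP from Cab to Counter", 43.64, 48),
    ("PnP from Counter to Cab", 45.45, 68), ("PnP from Counter to Microwave", 29.09, 25),
    ("PnP from Counter to Sink", 60.00, 72), ("PnP from Counter to Stove", 49.09, 60),
    ("PnP from Microwave to Counter", 29.09, 22), ("PnP from Sink to Counter", 49.09, 44),
    ("PnP from Stove to Counter", 45.45, 66), ("Turn Off Microwave", 89.09, 98),
    ("Turn Off Sink Faucet", 92.73, 84), ("Turn Off Stove", 34.55, 46),
    ("Turn On Microwave", 58.18, 80), ("Turn On Sink Faucet", 83.64, 88),
    ("Turn On Stove", 76.36, 84), ("Turn Sink Spout", 96.36, 92),
]

base_avg = sum(b for _, b, _ in RESULTS) / len(RESULTS)
weighted_avg = sum(w for _, _, w in RESULTS) / len(RESULTS)
improved = [t for t, b, w in RESULTS if w > b]
regressed = [t for t, b, w in RESULTS if w < b]

print(f"base {base_avg:.2f}%, weighted {weighted_avg:.2f}%, gain {weighted_avg - base_avg:.2f} pts")
# base 65.61%, weighted 69.83%, gain 4.22 pts
print(len(improved), "improved,", len(regressed), "regressed")
# 14 improved, 8 regressed
```

The largest gains are on PnP from Counter to Cab (+22.55 points) and Turn On Microwave (+21.82 points). The regressions are concentrated in tasks that were already strong (Close Single Door, Turn Off Sink Faucet) and in some pick-and-place directions.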