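The RoboCasa dataset is stored in HDF5 format and must be converted to the LeRobot dataset format before it can be used with the Pi0 and Pi0.5 models.
On top of the Openpi repository, we added src/policies/robocasa_policy.py to adapt the policies to the RoboCasa environment, and built an inference environment under /examples/robocasa.
We fine-tuned three models on the RoboCasa dataset: dp, pi0, and pi0.5.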
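In the fine-tuning experiments, we used 30, 100, and 300 demos as training samples to study how data scale affects model performance. The number of training iterations for each model was as follows: dp, 80k; pi0, 300k with 100 demos and 150k with 300 demos; pi0.5, 60k with 300 demos. Training was stopped once the training loss was observed to have stabilized.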
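
The stopping rule above (train until the loss stabilizes) can be written down explicitly. The following is only a minimal sketch, assuming the training loss is logged as a list of floats. The function name `loss_has_plateaued`, the window size, and the tolerance are hypothetical choices for illustration, not values used in these experiments.

```python
from statistics import mean


def loss_has_plateaued(losses, window=1000, rel_tol=0.01):
    """Return True when the mean loss over the last `window` steps differs from
    the mean over the preceding `window` steps by less than `rel_tol` (relative).

    `window` and `rel_tol` are illustrative defaults (assumptions).
    """
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    previous = mean(losses[-2 * window:-window])
    recent = mean(losses[-window:])
    if previous == 0:
        return recent == 0
    return abs(previous - recent) / abs(previous) < rel_tol


# Usage: a loss that is still decaying, and one that has flattened out.
decaying = [1.0 / (step + 1) for step in range(50)]
flat = decaying + [0.02] * 200
print(loss_has_plateaued(decaying, window=10))
print(loss_has_plateaued(flat, window=10))
```

Comparing two adjacent windows rather than single values makes the check less sensitive to step-to-step noise in the loss curve.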
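During evaluation, we found that the success rate on the OpenDoubleDoors task was 0%. The main cause was not that the model could not complete the task, but that the evaluation time window was too short for the simulator to register task completion. We therefore raised the action horizon used in evaluation from its default value to 1000, which extends the window in which task completion can be detected.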
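
To illustrate why a short horizon can produce a 0% success rate even for a competent policy, here is a toy sketch. `ToyDoorEnv`, its `steps_to_open=800` threshold, and the `always_push` policy are hypothetical stand-ins, not the RoboCasa simulator or its API.

```python
class ToyDoorEnv:
    """Hypothetical stand-in for a simulator task: success is only reported
    once the door has been pushed for `steps_to_open` steps."""

    def __init__(self, steps_to_open=800):
        self.steps_to_open = steps_to_open
        self.progress = 0

    def reset(self):
        self.progress = 0

    def step(self, action):
        self.progress += action  # action is 1 (push) or 0 (idle)
        return self.progress >= self.steps_to_open  # success flag


def success_rate(env, policy, episodes, horizon):
    """Fraction of episodes in which success is reported within `horizon` steps."""
    successes = 0
    for _ in range(episodes):
        env.reset()
        for _ in range(horizon):
            if env.step(policy()):
                successes += 1
                break
    return successes / episodes


def always_push():
    return 1


env = ToyDoorEnv(steps_to_open=800)  # 800 is an arbitrary illustrative value
rate_600 = success_rate(env, always_push, episodes=5, horizon=600)
rate_1000 = success_rate(env, always_push, episodes=5, horizon=1000)
print(rate_600, rate_1000)
```

In this toy setting the same policy fails every episode at horizon 600 and succeeds every episode at horizon 1000, so the measured success rate reflects the evaluation budget rather than the policy.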
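For fairness, the dp models trained on 100 and 300 demos were also evaluated with the adjusted action horizon. Their success rates did not improve significantly compared with the earlier runs that used the unadjusted horizon.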

| Tasks | dp (30 demos, horizon 600) | dp (100 demos, horizon 1000) | dp (300 demos, horizon 1000) | pi0 (30 demos, horizon 600) | pi0 (100 demos, horizon 600) | pi0 (300 demos, horizon 600) | pi0.5 (300 demos, horizon 1000) |
|---|---|---|---|---|---|---|---|
| Close Double Door | 18% | 42% | 46% | 58.42% | 88.00% | 62.00% | 78.18% |
| Close Drawer | 84% | 96% | 88% | 64.36% | 68.80% | 67.00% | 100.00% |
| Close Single Door | 48% | 68% | 78% | 85.15% | 97.60% | 86.00% | 100.00% |
| Coffee Press Button | 38% | 38% | 24% | 70.30% | 72.80% | 65.00% | 76.36% |
| Coffee Serve Mug | 8% | 14% | 14% | 45.54% | 66.40% | 48.00% | 69.09% |
| Coffee Setup Mug | 0% | 0% | 2% | 13.86% | 17.60% | 20.00% | 29.09% |
| Open Double Door | 8% | 4% | 6% | 0.00% | 0.80% | 0.00% | 82.00% |
| Open Drawer | 12% | 28% | 32% | 7.89% | 35.20% | 12.00% | 65.45% |
| Open Single Door | 24% | 16% | 44% | 87.13% | 80.80% | 77.00% | 92.73% |
| PnP from Cab to Counter | 4% | 8% | 10% | 12.87% | 10.40% | 12.00% | 43.64% |
| PnP from Counter to Cab | 6% | 4% | 10% | 20.79% | 26.40% | 34.00% | 45.45% |
| PnP from Counter to Microwave | 0% | 2% | 4% | 15.84% | 25.60% | 19.00% | 29.09% |
| PnP from Counter to Sink | 2% | 2% | 8% | 12.87% | 30.40% | 37.00% | 60.00% |
| PnP from Counter to Stove | 0% | 0% | 2% | 9.90% | 12.00% | 23.00% | 49.09% |
| PnP from Microwave to Counter | 0% | 2% | 4% | 6.93% | 12.00% | 17.00% | 29.09% |
| PnP from Sink to Counter | 2% | 4% | 4% | 19.80% | 16.80% | 26.00% | 49.09% |
| PnP from Stove to Counter | 4% | 4% | 8% | 19.80% | 20.00% | 31.00% | 45.45% |
| Turn Off Microwave | 48% | 68% | 34% | 38.61% | 88.80% | 71.00% | 89.09% |
| Turn Off Sink Faucet | 62% | 68% | 72% | 64.36% | 74.40% | 81.00% | 92.73% |
| Turn Off Stove | 6% | 12% | 15% | 9.90% | 20.80% | 20.00% | 34.55% |
| Turn On Microwave | 30% | 26% | 35% | 62.38% | 53.60% | 36.00% | 58.18% |
| Turn On Sink Faucet | 26% | 38% | 52% | 74.26% | 71.20% | 69.00% | 83.64% |
| Turn On Stove | 10% | 28% | 27% | 45.54% | 46.40% | 41.00% | 76.36% |
| Turn Sink Spout | 28% | 40% | 46% | 82.18% | 90.40% | 93.00% | 96.36% |
| RoboCasa Average | 19.50% | 25.50% | 27.71% | 38.70% | 46.97% | 43.63% | 65.61% |
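
The RoboCasa Average row is the unweighted mean of the 24 per-task success rates. As a quick arithmetic check, the sketch below recomputes it for the three dp columns, with values copied from the table. The names `dp_success` and `macro_average` are illustrative only.

```python
# Per-task success rates (%) for the dp columns, in table row order.
dp_success = {
    "dp_30": [18, 84, 48, 38, 8, 0, 8, 12, 24, 4, 6, 0,
              2, 0, 0, 2, 4, 48, 62, 6, 30, 26, 10, 28],
    "dp_100": [42, 96, 68, 38, 14, 0, 4, 28, 16, 8, 4, 2,
               2, 0, 2, 4, 4, 68, 68, 12, 26, 38, 28, 40],
    "dp_300": [46, 88, 78, 24, 14, 2, 6, 32, 44, 10, 10, 4,
               8, 2, 4, 4, 8, 34, 72, 15, 35, 52, 27, 46],
}


def macro_average(rates):
    """Unweighted mean over tasks: every task counts equally."""
    return sum(rates) / len(rates)


averages = {name: round(macro_average(rates), 2) for name, rates in dp_success.items()}
print(averages)  # {'dp_30': 19.5, 'dp_100': 25.5, 'dp_300': 27.71}
```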
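Based on the fine-tuning results above, we addressed the remaining performance gaps by restructuring the dataset for further training. Following the Atomic Tasks defined in the RoboCasa dataset, we grouped the tasks by motion type into seven mixed-task categories: Pick and Place Tasks, Door Tasks, Drawer Tasks, Turning Lever Tasks, Twisting Knob Tasks, Insert Tasks, and Pressing Button Tasks.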
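Each mixed-task dataset is built by weighting. For example, in the 300-demo training of pi0.5, the Pressing Button Tasks dataset consists of 300 demos each from Coffee Press Button, Turn On Microwave, and Turn Off Microwave, plus 50 demos from each of the other datasets, to limit forgetting of the other motions during training.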
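
The weighting scheme for one category can be sketched as a per-task demo budget. This is an illustration only: `build_weighted_mixture` is a hypothetical helper, and it assumes that "the other datasets" means the remaining tasks from the results table.

```python
ALL_TASKS = [
    "Close Double Door", "Close Drawer", "Close Single Door",
    "Coffee Press Button", "Coffee Serve Mug", "Coffee Setup Mug",
    "Open Double Door", "Open Drawer", "Open Single Door",
    "PnP from Cab to Counter", "PnP from Counter to Cab",
    "PnP from Counter to Microwave", "PnP from Counter to Sink",
    "PnP from Counter to Stove", "PnP from Microwave to Counter",
    "PnP from Sink to Counter", "PnP from Stove to Counter",
    "Turn Off Microwave", "Turn Off Sink Faucet", "Turn Off Stove",
    "Turn On Microwave", "Turn On Sink Faucet", "Turn On Stove",
    "Turn Sink Spout",
]


def build_weighted_mixture(all_tasks, focus_tasks, focus_demos=300, other_demos=50):
    """Map each task to the number of demos it contributes to the mixture.

    Tasks in `focus_tasks` contribute `focus_demos`; all others contribute
    `other_demos` so that their motions are still seen during training.
    """
    unknown = set(focus_tasks) - set(all_tasks)
    if unknown:
        raise ValueError(f"unknown focus tasks: {sorted(unknown)}")
    focus = set(focus_tasks)
    return {task: focus_demos if task in focus else other_demos for task in all_tasks}


# The Pressing Button Tasks example from the text.
pressing_button = build_weighted_mixture(
    ALL_TASKS,
    focus_tasks=["Coffee Press Button", "Turn On Microwave", "Turn Off Microwave"],
)
print(sum(pressing_button.values()))  # 3 * 300 + 21 * 50 = 1950
```

Under this assumption, the three focus tasks supply 900 of the 1950 demos in the mixture.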
The adjusted mixed-task datasets all reuse the same norm_stats.json so that inputs are normalized correctly.
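Training proceeds in stages: the model is trained on each mixed-task dataset in turn for 10k epochs, with the aim of reinforcing the task-specific motions it has learned (for example inserting or turning a lever), and is then trained jointly on the full task set.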
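
The staged procedure above can be laid out as an explicit schedule. This is a sketch under assumptions: the category order simply follows the order in which the categories are listed, the length of the final joint stage is not stated in the text and is therefore left as a parameter, and `build_schedule` is a hypothetical helper.

```python
CATEGORIES = [
    "Pick and Place Tasks", "Door Tasks", "Drawer Tasks",
    "Turning Lever Tasks", "Twisting Knob Tasks", "Insert Tasks",
    "Pressing Button Tasks",
]


def build_schedule(categories, epochs_per_stage=10_000, joint_epochs=None):
    """Return a list of training stages: one per category, then a joint stage.

    `joint_epochs` stays None unless the caller supplies it, because the
    source does not specify how long the joint stage runs.
    """
    stages = [{"dataset": name, "epochs": epochs_per_stage} for name in categories]
    stages.append({"dataset": "all tasks", "epochs": joint_epochs})
    return stages


schedule = build_schedule(CATEGORIES)
for index, stage in enumerate(schedule, start=1):
    print(index, stage["dataset"], stage["epochs"])
```

Each stage would resume from the checkpoint produced by the previous one, so the joint stage starts from a model that has already seen every category.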
Results for pi0.5 on the 300-demo dataset under base training and weighted training:

| Tasks | base | weighted |
|---|---|---|
| Close Double Door | 78.18% | 75% |
| Close Drawer | 100.00% | 100% |
| Close Single Door | 100.00% | 92% |
| Coffee Press Button | 76.36% | 80% |
| Coffee Serve Mug | 69.09% | 78% |
| Coffee Setup Mug | 29.09% | 34% |
| Open Double Door | 82.00% | 82% |
| Open Drawer | 65.45% | 66% |
| Open Single Door | 92.73% | 92% |
| PnP from Cab to Counter | 43.64% | 48% |
| PnP from Counter to Cab | 45.45% | 68% |
| PnP from Counter to Microwave | 29.09% | 25% |
| PnP from Counter to Sink | 60.00% | 72% |
| PnP from Counter to Stove | 49.09% | 60% |
| PnP from Microwave to Counter | 29.09% | 22% |
| PnP from Sink to Counter | 49.09% | 44% |
| PnP from Stove to Counter | 45.45% | 66% |
| Turn Off Microwave | 89.09% | 98% |
| Turn Off Sink Faucet | 92.73% | 84% |
| Turn Off Stove | 34.55% | 46% |
| Turn On Microwave | 58.18% | 80% |
| Turn On Sink Faucet | 83.64% | 88% |
| Turn On Stove | 76.36% | 84% |
| Turn Sink Spout | 96.36% | 92% |
| RoboCasa Average | 65.61% | 69.83% |
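
To read the comparison per task rather than only through the average, the sketch below computes the change from base to weighted training for each task. The values are copied from the table. The variable names are illustrative.

```python
# (base %, weighted %) per task, copied from the table above.
results = {
    "Close Double Door": (78.18, 75), "Close Drawer": (100.00, 100),
    "Close Single Door": (100.00, 92), "Coffee Press Button": (76.36, 80),
    "Coffee Serve Mug": (69.09, 78), "Coffee Setup Mug": (29.09, 34),
    "Open Double Door": (82.00, 82), "Open Drawer": (65.45, 66),
    "Open Single Door": (92.73, 92), "PnP from Cab to Counter": (43.64, 48),
    "PnP from Counter to Cab": (45.45, 68),
    "PnP from Counter to Microwave": (29.09, 25),
    "PnP from Counter to Sink": (60.00, 72),
    "PnP from Counter to Stove": (49.09, 60),
    "PnP from Microwave to Counter": (29.09, 22),
    "PnP from Sink to Counter": (49.09, 44),
    "PnP from Stove to Counter": (45.45, 66),
    "Turn Off Microwave": (89.09, 98), "Turn Off Sink Faucet": (92.73, 84),
    "Turn Off Stove": (34.55, 46), "Turn On Microwave": (58.18, 80),
    "Turn On Sink Faucet": (83.64, 88), "Turn On Stove": (76.36, 84),
    "Turn Sink Spout": (96.36, 92),
}

# Change in percentage points from base to weighted training.
deltas = {task: round(weighted - base, 2) for task, (base, weighted) in results.items()}

improved = [task for task, delta in deltas.items() if delta > 0]
unchanged = [task for task, delta in deltas.items() if delta == 0]
worse = [task for task, delta in deltas.items() if delta < 0]
mean_delta = sum(deltas.values()) / len(deltas)
largest_gain = max(deltas, key=deltas.get)

print(len(improved), len(unchanged), len(worse))
print(largest_gain, deltas[largest_gain])
print(round(mean_delta, 2))
```

On these numbers, 14 tasks improve, 2 are unchanged, and 8 get worse under weighted training, with the largest gain on PnP from Counter to Cab.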