
[Academic Paper] Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation

Posted on 2025-04-02 14:11:23


Learning visuomotor policies for multi-task robotic manipulation has been a long-standing challenge for the robotics community. The difficulty lies in the diversity of the action space: typically, a goal can be accomplished in multiple ways, resulting in a multimodal action distribution for a single task. The complexity of the action distribution escalates as the number of tasks increases.

In this work, we propose Discrete Policy, a robot learning method for training universal agents capable of multi-task manipulation skills. Discrete Policy employs vector quantization to map action sequences into a discrete latent space, facilitating the learning of task-specific codes. These codes are then reconstructed into the action space, conditioned on observations and language instructions. We evaluate our method both in simulation and on multiple real-world embodiments, including single-arm and bimanual robot settings.
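
The paper's implementation is not included in this post, but the vector-quantization step described above follows the standard VQ-VAE recipe. The sketch below is a minimal PyTorch illustration of that quantization bottleneck only; the codebook size, latent dimension, and loss weight are illustrative assumptions rather than the paper's settings, and the action-sequence encoder and the observation/language-conditioned decoder are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionVectorQuantizer(nn.Module):
    """Illustrative VQ bottleneck: snaps encoded action-sequence features to the
    nearest entry of a learned discrete codebook (straight-through estimator).
    Hyperparameters here are assumptions, not the paper's configuration."""
    def __init__(self, num_codes=64, code_dim=128, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment loss weight

    def forward(self, z_e):
        # z_e: (batch, code_dim) latent produced by an action-sequence encoder
        # Squared distance from each latent to every codebook entry.
        d = (z_e.pow(2).sum(1, keepdim=True)
             - 2 * z_e @ self.codebook.weight.t()
             + self.codebook.weight.pow(2).sum(1))
        idx = d.argmin(dim=1)          # discrete code index per sample
        z_q = self.codebook(idx)       # quantized latent
        # Codebook loss + commitment loss (standard VQ-VAE objective).
        vq_loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        # Straight-through estimator: pass decoder gradients back to the encoder.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, idx, vq_loss
```

In the full pipeline sketched by the abstract, an encoder would map each demonstrated action sequence to z_e, and a decoder conditioned on the current observation and the language instruction would reconstruct the action sequence from the quantized code z_q.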

We demonstrate that our proposed Discrete Policy outperforms a well-established Diffusion Policy baseline and many state-of-the-art approaches, including ACT, Octo, and OpenVLA. For example, in a real-world multi-task training setting with five tasks, Discrete Policy achieves an average success rate that is 26% higher than Diffusion Policy and 15% higher than OpenVLA.

As the number of tasks increases to 12, the performance gap between Discrete Policy and Diffusion Policy widens to 32.5%, further showcasing the advantages of our approach. Our work empirically demonstrates that learning multi-task policies within the latent space is a vital step toward achieving general-purpose agents.

arXiv: https://arxiv.org/abs/2409.18707


