


Without a lifetime of experience to build on like humans have (and totally take for granted), robots that want to learn a new skill often have to start from scratch. Reinforcement learning lets robots learn new skills through trial and error but, especially in the case of end-to-end vision-based control policies, it takes a lot of time: The real world is a weirdly lit, friction-filled, obstacle-y mess that robots can’t understand without a frequently impractical amount of effort.


Roboticists at the University of California at Berkeley have vastly sped up this process by doing the same kind of cheating that humans do—instead of starting from scratch, you start with some previous experience that helps get you going. By leveraging a “foundation model” that was pretrained on robots driving themselves around, the researchers were able to get a small-scale robotic rally car to teach itself to race around indoor and outdoor tracks, matching human performance after just 20 minutes of practice.



That first pretraining stage happens at your leisure, by manually driving a robot (not necessarily the one that will be doing the task you care about) around different environments. The goal isn’t to teach the robot to drive fast around a course but rather to teach it the basics of not running into stuff.


With that pretrained foundation model in place, when you then move over to the little robotic rally car, it no longer has to start from scratch. Instead, you can plop it onto the course you want it to learn, drive it around once slowly to show it where you want it to go, and then let it go fully autonomous, training itself to drive faster and faster. With a low-resolution, front-facing camera and some basic state estimation, the robot attempts to reach the next checkpoint on the course as quickly as possible, leading to some interesting emergent behaviors:

The system learns the concept of a “racing line,” finding a smooth path through the lap and maximizing its speed through tight corners and chicanes. The robot learns to carry its speed into the apex, then brakes sharply to turn and accelerates out of the corner, to minimize the driving duration. With a low-friction surface, the policy learns to oversteer slightly when turning, drifting into the corner to achieve fast rotation without braking during the turn. In outdoor environments, the learned policy is also able to distinguish ground characteristics, preferring smooth, high-traction areas on and around concrete paths over areas with tall grass that impedes the robot’s motion.
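To make the checkpoint objective concrete, here is a minimal sketch of one way "reach the next checkpoint as quickly as possible" could be turned into a reward signal: score the robot by how fast it is moving toward the next checkpoint, and advance to the following checkpoint once it gets close enough. This is an illustrative assumption rather than the researchers' actual reward. The function names, the 2D coordinates, and the 1-meter arrival radius are all hypothetical.

```python
import math


def progress_reward(position, velocity, checkpoint):
    """Speed (m/s) along the straight line toward the checkpoint.

    Hypothetical reward: positive when closing in on the checkpoint,
    negative when driving away from it, zero when moving sideways.
    """
    dx = checkpoint[0] - position[0]
    dy = checkpoint[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0  # already on the checkpoint, so direction is undefined
    # Project the velocity vector onto the unit vector toward the checkpoint.
    return (velocity[0] * dx + velocity[1] * dy) / dist


def next_checkpoint_index(position, checkpoints, index, radius=1.0):
    """Advance to the next checkpoint (wrapping around the lap) once the
    robot is within `radius` meters of the current one. The radius is an
    assumed value, not one taken from the article."""
    cx, cy = checkpoints[index]
    if math.hypot(cx - position[0], cy - position[1]) <= radius:
        return (index + 1) % len(checkpoints)
    return index
```

Under a reward like this, driving faster along the course earns more per step, which is at least consistent with the racing-line behavior described above, though the article does not spell out the exact formulation.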


The other clever bit here is the reset feature, which is necessary in real-world training. When training in simulation, it’s super easy to reset a robot that fails, but outside of simulation, a failure can (by definition) end the training if the robot gets itself stuck. That’s not a big deal if you want to spend all your time minding the robot while it learns, but if you have something better to do, the robot needs to be able to train autonomously from start to finish. In this case, if the robot hasn’t moved at least 0.5 meters in the previous 3 seconds, it knows that it’s stuck, and it will execute the simple behaviors of turning randomly, backing up, and then trying to drive forward again, which gets it unstuck eventually.
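The stuck check is simple enough to sketch in a few lines. The version below is a rough illustration under stated assumptions, not the researchers' code: it keeps a short history of timestamped positions and flags the robot as stuck when its net displacement over the last 3 seconds is under 0.5 meters. Only those two thresholds come from the description above. The class and function names, the use of net displacement (rather than distance traveled), and the steering, throttle, and duration values in the recovery plan are hypothetical.

```python
import math
import random
from collections import deque


class StuckDetector:
    """Flags the robot as stuck if it has moved less than `min_distance_m`
    (net displacement, an assumption) over the last `window_s` seconds."""

    def __init__(self, min_distance_m=0.5, window_s=3.0):
        self.min_distance_m = min_distance_m
        self.window_s = window_s
        self._history = deque()  # (timestamp_s, x_m, y_m), oldest first

    def update(self, t, x, y):
        """Record a pose sample and return True if the robot looks stuck."""
        self._history.append((t, x, y))
        # Drop old samples, but keep one at or before the window start
        # so the remaining history still spans the full window.
        while len(self._history) >= 2 and self._history[1][0] <= t - self.window_s:
            self._history.popleft()
        t0, x0, y0 = self._history[0]
        if t - t0 < self.window_s:
            return False  # not enough history yet to judge
        return math.hypot(x - x0, y - y0) < self.min_distance_m


def recovery_plan(rng):
    """Return (label, steering, throttle, duration_s) steps: pick a random
    turn, back up, then drive forward. All numeric values are placeholders."""
    steer = rng.uniform(-1.0, 1.0)
    return [
        ("turn", steer, 0.0, 0.5),
        ("reverse", steer, -0.5, 1.0),
        ("forward", 0.0, 0.5, 1.0),
    ]
```

A training loop would call `update` on every state estimate and, whenever it returns `True`, run the steps from `recovery_plan(random.Random())` before handing control back to the learned policy. Because the turn is random, repeated attempts eventually try different escape directions, which matches the "gets it unstuck eventually" behavior described above.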


During indoor and outdoor experiments, the robot was able to learn aggressive driving comparable to that of a human expert after just 20 minutes of autonomous practice, which the researchers say “provides strong validation that deep reinforcement learning can indeed be a viable tool for learning real-world policies even from raw images, when combined with appropriate pretraining and implemented in the context of an autonomous training framework.” It’s going to take a lot more work to implement this sort of thing safely on a larger platform, but this little car is taking the first few laps in the right direction just as quickly as it possibly can.


“FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing,” by Kyle Stachowicz, Arjun Bhorkar, Dhruv Shah, Ilya Kostrikov, and Sergey Levine from UC Berkeley, is available on arXiv.















