    Gero
    Mar 31, 2021
    Edited: Apr 01, 2021

    Reinforcement Learning - OpenCat Gym

    In Software

    Hi there,


    In recent years there has been a lot of progress in deep reinforcement learning, and many publications show that machine learning can create stable gaits for robots. Particularly interesting and relatively understandable papers include https://arxiv.org/abs/1804.10332, where the authors created a walking and also a galloping gait by training in simulation and later applying the result on the robot. In a later publication they went further and made it possible to learn new gaits via reinforcement learning directly on the robot, without simulation, in less than 2 hours: https://arxiv.org/abs/1812.11103. There are many more examples and different approaches to this.


    This is very remarkable, and it made me wonder whether we can get there with Nybble and Bittle as well.


    But let's slow down a little. What is reinforcement learning exactly? The graph below gives a basic understanding of how it works: the Agent, in our case Nybble/Bittle, is placed in an Environment (e.g. a flat floor). There it performs Actions, such as moving its limbs, and tries to get a Reward from the programmer. The Reward is only given when the State is what we actually want, e.g. moving forward. Trapped in this loop, our robot tries to maximize the Reward in every iteration, becoming better and better at the movement.



    Source: https://en.wikipedia.org/wiki/Reinforcement_learning#/media/File:Reinforcement_learning_diagram.svg
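
    To make the loop above a bit more concrete, below is a minimal sketch of the agent-environment interaction using the standard Gym API. The environment name is only a placeholder for illustration (the real Nybble/Bittle environment lives in the repository linked at the end of this post), and the agent just samples random actions instead of a learned policy:

        import gym

        env = gym.make("Pendulum-v0")           # placeholder Environment; the real one simulates Nybble/Bittle
        state = env.reset()                     # initial State

        for step in range(1000):
            action = env.action_space.sample()  # the Agent picks an Action (random here, a trained policy later)
            state, reward, done, info = env.step(action)  # Environment returns the new State and the Reward
            if done:
                state = env.reset()             # start a new episode when the current one ends

        env.close()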



    So what I tried to do is use a simulation environment with flat ground, together with a simulation model of Nybble, and make it move forward. I implemented this as a Gym training environment in PyBullet with the reinforcement learning library Stable-Baselines3 (https://stable-baselines3.readthedocs.io/en/master/index.html#). There are many learning algorithms one can use for reinforcement learning. For training I tried an algorithm called SAC (Soft Actor-Critic), which seems to be the current state-of-the-art algorithm for reinforcement learning, and applied it to Nybble to see how it performs. The result is definitely still more of a crawl than a walking gait, but it shows the potential.
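
    Roughly, the training setup with SAC in Stable-Baselines3 looks like the following sketch. The environment here is again a generic Gym placeholder; the actual custom PyBullet environment, hyperparameters and training budget are those in my repository linked below:

        import gym
        from stable_baselines3 import SAC

        # Placeholder continuous-control environment; the real setup uses a
        # custom PyBullet environment that simulates Nybble on flat ground.
        env = gym.make("Pendulum-v0")

        model = SAC("MlpPolicy", env, verbose=1)  # Soft Actor-Critic with a standard MLP policy
        model.learn(total_timesteps=100_000)      # train the policy (the real training budget may differ)
        model.save("sac_nybble")                  # save the learned policy for later use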


    This is a result of reinforcement training alone, without any intervention from my side:

    The next steps are to improve the training and the resulting gaits. Once the gaits are good in simulation, there are two ways: either get the learned policy running on Nybble/Bittle, or learn it directly on them. I think I will have to use an additional set of hardware to make it run.


    If you want to train a walking gait, you can find the link to my repository below, where I will provide further updates. Make sure to install all the necessary Python libraries listed in the import section of the code.

    https://github.com/ger01d/opencat-gym

    12 comments
    Gero
    May 09, 2021

    I've made a first attempt to run the policy from my last post on Bittle. The so-called reality gap mentioned in the literature (the difference between simulation and real life) is very obvious. And one might ask why I use such a short cable. It looks like Bittle is on a chain...

    Nevertheless, it's a start.


    For the application I used @Alex Young's OpenCat modification (Post: gleefully stealing the best ideas), so I could easily send the motor positions generated by the neural network controller via the legpose command. My next steps will be to close the reality gap somehow, or to train Bittle directly on the hardware.
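
    For anyone who wants to try something similar, here is a rough sketch of how the policy output can be sent over USB serial. The exact syntax of the legpose command comes from Alex Young's modification and is only assumed here; port, baud rate and joint order are illustrative as well:

        import serial

        ser = serial.Serial("/dev/ttyUSB0", 115200, timeout=0.1)  # port and baud rate are assumptions

        def send_leg_pose(joint_angles_deg):
            # joint_angles_deg: target angles (degrees) for the leg servos,
            # produced by the neural network controller at every control step
            cmd = "legpose " + " ".join(str(int(a)) for a in joint_angles_deg) + "\n"
            ser.write(cmd.encode())

        send_leg_pose([0, 30, -15, 10, 0, 30, -15, 10])  # example pose, values are arbitrary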



    Gero
    May 11, 2021
    Reply

    @Rongzhong Li Thanks for the idea with the mapping. There was indeed a mistake, and I corrected it in the code. Unfortunately, this did not make it work. I guess that because latency isn't part of the simulation, the controller doesn't react properly. Also, the simulation model is very simple, and the masses and inertias aren't correct (I will try to increase the model accuracy).
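
    For context, the mapping refers to converting the simulated joint angles (radians, with the simulation's sign conventions) into servo commands (degrees). A sketch of such a conversion is below; the direction signs and offsets are purely illustrative and depend on the URDF model and the servo calibration:

        import numpy as np

        # Illustrative per-joint direction and zero-offset values only;
        # the real values depend on the simulation model and the robot.
        JOINT_DIRECTION = np.array([1, -1, 1, -1, 1, -1, 1, -1])
        JOINT_OFFSET_DEG = np.zeros(8)

        def sim_to_servo(sim_angles_rad):
            # map simulated joint angles (radians) to servo commands (degrees)
            return JOINT_DIRECTION * np.degrees(sim_angles_rad) + JOINT_OFFSET_DEG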


    The reason I'm using the USB serial connection is that it is much faster in terms of latency. I have to send the motor commands and, in the next step, read the MPU sensor data. Requesting and receiving the sensor data can take up to 60 ms over Bluetooth and 30 ms over the USB serial connection. With the higher latency the movement is less fluent and the motors stutter.
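
    The latency numbers above can be checked with a simple round-trip measurement like the sketch below. The request string and the reply format are assumptions; only the pattern (send a command, wait for the sensor reply, time the difference) reflects the setup described here:

        import time
        import serial

        ser = serial.Serial("/dev/ttyUSB0", 115200, timeout=0.2)  # USB serial; the Bluetooth port works the same way

        t0 = time.time()
        ser.write(b"v\n")       # hypothetical request string for the MPU sensor data
        reply = ser.readline()  # wait for the reply line from the board
        print("round trip: %.1f ms" % ((time.time() - t0) * 1000.0))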


    Unfortunately, uploading the embedded video doesn't work at the moment; it stops at 99 %. So I attached the videos for download.


    Bittle_bluetooth_serial.MOV (Download MOV • 26.51 MB)

    Bittle_USB_serial.MOV (Download MOV • 29.15 MB)


    Rongzhong Li
    May 12, 2021
    Reply