    Gero
    Mar 31, 2021
    Edited: Apr 01, 2021

    Reinforcement Learning - OpenCat Gym

    In Software

    Hi there,


    In recent years there has been a lot of progress in deep reinforcement learning, and many publications show that machine learning can create stable gaits for robots. Particularly interesting and relatively understandable papers include https://arxiv.org/abs/1804.10332, where the authors created a walking and also a galloping gait by training in simulation and later applying the result on the robot. In a later publication they went further and made it possible to learn new gaits via reinforcement learning directly on the robot, without simulation, in less than 2 hours: https://arxiv.org/abs/1812.11103. There are many more examples and different approaches to this.


    This is very remarkable, and it made me wonder whether we can get there with Nybble and Bittle as well.


    But let's slow down a little. What is reinforcement learning exactly? The graph below gives a basic understanding of how it works: the Agent, in our case Nybble/Bittle, is placed in an Environment (e.g. a flat floor). There it performs Actions, such as moving its limbs, and tries to get a Reward from the programmer. The Reward is only given when the State is what we actually want, e.g. moving forward. Trapped in this loop, our robot tries to maximize the Reward in every iteration, becoming better and better at the movement.



    Source: https://en.wikipedia.org/wiki/Reinforcement_learning#/media/File:Reinforcement_learning_diagram.svg
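
    To make the loop above a bit more concrete, below is a minimal sketch of the agent-environment interaction using the standard Gym API. The environment name is only a placeholder for illustration (the real Nybble/Bittle environment lives in the repository linked at the end of this post), and the agent just samples random actions instead of a learned policy:

        import gym

        env = gym.make("Pendulum-v0")           # placeholder Environment; the real one simulates Nybble/Bittle
        state = env.reset()                     # initial State

        for step in range(1000):
            action = env.action_space.sample()  # the Agent picks an Action (random here, a trained policy later)
            state, reward, done, info = env.step(action)  # Environment returns the new State and the Reward
            if done:
                state = env.reset()             # start a new episode when the current one ends

        env.close()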



    So what I tried to do is use a simulation environment with flat ground, together with a simulation model of Nybble, and make it move forward. I implemented this as a Gym training environment in PyBullet with the reinforcement learning library Stable-Baselines3 (https://stable-baselines3.readthedocs.io/en/master/index.html#). There are many learning algorithms one can use for reinforcement learning. For training I tried an algorithm called SAC (Soft Actor-Critic), which seems to be the current state-of-the-art algorithm for reinforcement learning, and applied it to Nybble to see how it performs. The result is definitely still more of a crawl than a walking gait, but it shows the potential.
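
    Roughly, the training setup with SAC in Stable-Baselines3 looks like the following sketch. The environment here is again a generic Gym placeholder; the actual custom PyBullet environment, hyperparameters and training budget are those in my repository linked below:

        import gym
        from stable_baselines3 import SAC

        # Placeholder continuous-control environment; the real setup uses a
        # custom PyBullet environment that simulates Nybble on flat ground.
        env = gym.make("Pendulum-v0")

        model = SAC("MlpPolicy", env, verbose=1)  # Soft Actor-Critic with a standard MLP policy
        model.learn(total_timesteps=100_000)      # train the policy (the real training budget may differ)
        model.save("sac_nybble")                  # save the learned policy for later use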


    This is a result of reinforcement training alone, without any intervention from my side:

    The next steps are to improve the training and the resulting gaits. Once the gaits are good in simulation, there are two ways: either get the learned policy running on Nybble/Bittle, or learn it directly on them. I think I will have to use an additional set of hardware to make it run.


    If you want to train a walking gait, you can find the link to my repository below, where I will provide further updates. Make sure to install all the necessary Python libraries listed in the import section of the code.

    https://github.com/ger01d/opencat-gym

    12 comments
    Gero
    May 09, 2021

    I've made a first attempt to run the policy from my last post on Bittle. The so-called reality gap mentioned in the literature (the difference between simulation and real life) is very obvious. And one might ask why I use such a short cable. It looks like Bittle is on a chain...

    Nevertheless, it's a start.


    For the application I used @Alex Young's OpenCat modification (Post: gleefully stealing the best ideas), so I could easily send the motor positions generated by the neural network controller via the legpose command. My next steps will be to close the reality gap somehow, or to train Bittle directly on the hardware.
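
    For anyone who wants to try something similar, here is a rough sketch of how the policy output can be sent over USB serial. The exact syntax of the legpose command comes from Alex Young's modification and is only assumed here; port, baud rate and joint order are illustrative as well:

        import serial

        ser = serial.Serial("/dev/ttyUSB0", 115200, timeout=0.1)  # port and baud rate are assumptions

        def send_leg_pose(joint_angles_deg):
            # joint_angles_deg: target angles (degrees) for the leg servos,
            # produced by the neural network controller at every control step
            cmd = "legpose " + " ".join(str(int(a)) for a in joint_angles_deg) + "\n"
            ser.write(cmd.encode())

        send_leg_pose([0, 30, -15, 10, 0, 30, -15, 10])  # example pose, values are arbitrary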



    Gero
    May 11, 2021
    Reply

    @Rongzhong Li Thanks for the idea with the mapping. There was indeed a mistake, and I corrected it in the code. Unfortunately, this did not make it work. I guess that because latency isn't part of the simulation, the controller doesn't react properly. Also, the simulation model is very simple, and the masses and inertias aren't correct (I will try to increase the model accuracy).
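
    For context, the mapping refers to converting the simulated joint angles (radians, with the simulation's sign conventions) into servo commands (degrees). A sketch of such a conversion is below; the direction signs and offsets are purely illustrative and depend on the URDF model and the servo calibration:

        import numpy as np

        # Illustrative per-joint direction and zero-offset values only;
        # the real values depend on the simulation model and the robot.
        JOINT_DIRECTION = np.array([1, -1, 1, -1, 1, -1, 1, -1])
        JOINT_OFFSET_DEG = np.zeros(8)

        def sim_to_servo(sim_angles_rad):
            # map simulated joint angles (radians) to servo commands (degrees)
            return JOINT_DIRECTION * np.degrees(sim_angles_rad) + JOINT_OFFSET_DEG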


    The reason I'm using the USB serial connection is that it is much faster in terms of latency. I have to send the motor commands and, in the next step, read the MPU sensor data. Requesting and receiving the sensor data can take up to 60 ms over Bluetooth and 30 ms over the USB serial connection. With the higher latency the movement is less fluent and the motors stutter.
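
    The latency numbers above can be checked with a simple round-trip measurement like the sketch below. The request string and the reply format are assumptions; only the pattern (send a command, wait for the sensor reply, time the difference) reflects the setup described here:

        import time
        import serial

        ser = serial.Serial("/dev/ttyUSB0", 115200, timeout=0.2)  # USB serial; the Bluetooth port works the same way

        t0 = time.time()
        ser.write(b"v\n")       # hypothetical request string for the MPU sensor data
        reply = ser.readline()  # wait for the reply line from the board
        print("round trip: %.1f ms" % ((time.time() - t0) * 1000.0))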


    Unfortunately, uploading the embedded video doesn't work at the moment; it stops at 99 %. So I attached the videos for download.


    Bittle_bluetooth_serial.MOV (Download MOV • 26.51 MB)

    Bittle_USB_serial.MOV (Download MOV • 29.15 MB)


    Rongzhong Li
    May 12, 2021
    Reply