Timothy McCarthy | Forum Comments

SOLVED: In Search of Reliable Bidirectional Bluetooth Serial Port Profile (SPP) Communication

In Clinic

Timothy McCarthy

May 13, 2024

I get that you're having difficulty making a reliable connection. I want to understand how Bluetooth is supposed to work and how to test that it actually works that way. I'd then want to know why your connection is different. I don't think people are modifying the code to get this to work.

SOLVED: In Search of Reliable Bidirectional Bluetooth Serial Port Profile (SPP) Communication

In Clinic

Timothy McCarthy

May 13, 2024

"BTconnected is only true within the BiBoard boot session that successful pairing occurs between the laptop and the BiBoard. In all other boot sessions, BTconnected is false so it cannot be used as is written in line 117 of io.h."

What makes you say this? Do you have evidence that BTconnected becomes true before a connection is made? The log you give doesn't show that. I assume you've modified the code to output BTconnected at discrete points.

The log doesn't show a successful pairing during the boot session; it shows a successful initialization of the Bluetooth object, i.e.,

    // io.h
    void blueSspSetup() {
      PTH("SSP: ", strcat(readLongByBytes(EEPROM_BLE_NAME), "_SSP"));
      SerialBT.enableSSP();
      SerialBT.onConfirmRequest(BTConfirmRequestCallback);
      SerialBT.onAuthComplete(BTAuthCompleteCallback);
      SerialBT.begin(strcat(readLongByBytes(EEPROM_BLE_NAME), "_SSP"));  //Bluetooth device name
      Serial.println("The SSP device is started, now you can pair it with Bluetooth!");
    }

That's not the same as the message that should appear when a successful pairing occurs, i.e.,

    // io.h
    void BTAuthCompleteCallback(boolean success) {
      confirmRequestPending = false;
      if (success) {
        BTconnected = true;
        Serial.println("SSP Pairing success!!");
      } else {
        BTconnected = false;
        Serial.println("SSP Pairing failed, rejected by user!!");
      }
    }

"In all other boot sessions, BTconnected is false so it cannot be used as is written in line 117 of io.h."

I'm not sure what you mean by "other boot sessions"; there's only one, to my knowledge. The bot only boots once per session.

I don't have a working Bluetooth setup (my phone never sees the BittleX or any other device), so I can't run the test, but the test should be very easy to perform if you have a working Bluetooth setup. Open the Arduino Serial Monitor and then connect the USB cable to the bot. You should see the boot messages appear in the Serial Monitor; specifically, you should see "The SSP device is started, now you can pair it with Bluetooth!".

For completeness, turn the bot battery on. Now, go to your mobile device and connect via Bluetooth to the bot. When you have successfully connected, you should see the message "SSP Pairing success!!" in the Serial Monitor. At that point, you know the value of BTconnected is true. I don't know how you disconnect Bluetooth from the box, but if you do, I expect some message in the Serial Monitor to indicate that.
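The difference between "device started" and "pairing succeeded" can be modelled off the board. This is a minimal sketch only: `FakeBluetooth`, `simulatePairing`, `startDevice`, `lastMessage` and the device name are hypothetical stand-ins, not the ESP32 Bluetooth API. The only thing taken from io.h is the idea that BTconnected changes inside the auth-complete callback and nowhere else.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for the Bluetooth serial object (NOT the real ESP32 class).
struct FakeBluetooth {
    bool started = false;
    std::function<void(bool)> authComplete;

    void onAuthComplete(std::function<void(bool)> cb) { authComplete = std::move(cb); }

    // Initialization only: the device becomes visible, nothing is paired yet.
    void begin(const std::string& /*name*/) { started = true; }

    // Simulates the stack reporting the outcome of a pairing attempt.
    void simulatePairing(bool success) {
        assert(started);  // pairing can only be attempted after begin()
        if (authComplete) authComplete(success);
    }
};

// Mirrors the flag in io.h: it is written only by the callback below.
static bool BTconnected = false;
static std::string lastMessage;

void BTAuthCompleteCallback(bool success) {
    BTconnected = success;
    lastMessage = success ? "SSP Pairing success!!"
                          : "SSP Pairing failed, rejected by user!!";
}

// Registers the callback and starts the fake device, in the same order as blueSspSetup().
FakeBluetooth startDevice() {
    FakeBluetooth bt;
    bt.onAuthComplete(BTAuthCompleteCallback);
    bt.begin("Bittle_SSP");  // placeholder name, an assumption
    return bt;
}
```

After `startDevice()` the flag is still false; it only flips once `simulatePairing(true)` triggers the callback, which is the behaviour the Serial Monitor test is meant to confirm on real hardware.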

SOLVED: In Search of Reliable Bidirectional Bluetooth Serial Port Profile (SPP) Communication

In Clinic

Timothy McCarthy

May 13, 2024

Again, I haven't looked at the Bluetooth API yet but the reasoning here doesn't add up to me.

"Line 117 is a conditional statement that guards the Bluetooth SPP object ..."

This isn't how I read this code. It prevents the call to the member function println when there's no valid, authenticated connection to the object. The purpose of the function printToAllPorts is to output an informational message to every device that is currently connected. Removing BTconnected would result in a call to an unconnected Bluetooth object. The only thing I can say with certainty about making such a call is that it's a waste of time. That's what BTconnected prevents.

Aren't Bluetooth connections asynchronous by nature? A device constantly advertises itself as available to talk to, but you have to make a connection to do so. After a Bluetooth device has been created, it's ready ... but not connected. When you do connect, the Bluetooth code makes the callback to inform you that a connection has been established to your object. And if you fail to connect, the Bluetooth code will make the same callback telling you there's no connection to your object.

I don't see anything in the log after the message "The SSP device is started, now you can pair it with Bluetooth!" that indicates you made a successful connection. Did you make a connection? If so, then the callback should have occurred. It would be disturbing if that didn't happen.
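The guard described above can be illustrated with a small stand-alone model. This is a sketch under assumptions: `FakePort` and `Ports` are hypothetical types, and this `printToAllPorts` only imitates the shape of the firmware function, not its actual body or signature.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical port that records every line written to it.
struct FakePort {
    std::vector<std::string> lines;
    void println(const std::string& s) { lines.push_back(s); }
};

struct Ports {
    FakePort usbSerial;        // treated as always available in this sketch
    FakePort bluetooth;
    bool BTconnected = false;  // in the real code this is set by the pairing callback
};

// Writes the message to every port that currently has a listener.
// Returns the number of ports written to.
int printToAllPorts(Ports& p, const std::string& text) {
    int written = 0;
    p.usbSerial.println(text);
    ++written;
    if (p.BTconnected) {  // the guard under discussion: skip an unconnected object
        p.bluetooth.println(text);
        ++written;
    }
    return written;
}
```

With the flag false, only the USB port receives the message; once the flag is true, both ports do. Dropping the guard would make the second write unconditional, which is the wasted call described above.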

SOLVED: In Search of Reliable Bidirectional Bluetooth Serial Port Profile (SPP) Communication

In Clinic

Timothy McCarthy

May 11, 2024

Eventually I will get to Bluetooth but I'm not there yet. I don't know the Bluetooth API anymore so I can't comment intelligently. However, I find it disturbing that the connection fluctuates dynamically between bidirectional and unidirectional. I can understand how that would make communication a real challenge. But I can't offer any advice until I tackle the API.

Time to see the Vet

In Clinic

Timothy McCarthy

May 11, 2024

Thanks. I want to release the code on GitHub (if I can learn how to do it) but I'm concerned that the testing code may be the cause of the failure. I don't want that. But I do feel that a test tool like this is very helpful. I will send an email off ASAP to get that ball rolling.

Time to see the Vet

In Clinic

Timothy McCarthy

May 10, 2024

Prof. Farnsworth: "Good news, everyone!"
Bender: "Uh-oh! I don't like the sound of that."
Prof. Farnsworth: "I have two resolutions and one clue ..."
Bender: "Here it comes ..."
Prof. Farnsworth: "... and two apologies"
Bender: "And I'm outta here!"

Apology #1: The shoemaker's wife has no shoes.

Dr. Petoi recently posted a video of a test of a T_SKILL_DATA command with the maximum number of elements that would fill up the newCmd buffer to demonstrate it could handle the data. In my reply I asked if he could increase the data amount in order to force the buffer overflow code to execute. I wasn't thinking. I could have done that myself with the test software I wrote. My bot wasn't working and had a hardware problem, but I have been testing it with the battery off for some commands for a while. I could have easily written the stress test to overflow the newCmd buffer. But I wasn't thinking. So my apologies for asking you to do something I could have done myself.

Apology #2: Do the whole job, not just half.

The test is fairly trivial but the results ... well, that's getting ahead of things. Here's my first version of the test:

    TEST_F(ftfBittleXProtocol, newCmd_overflow)
    {
        const unsigned BUFF_LEN{ 2507 };  // 1524 =125*20+7=2507
        vector < int8_t > symphony(BUFF_LEN, 32);
        cmd_def_t command{ BEEP, "b", "BEEP" };
        command += symphony;
        EXPECT_TRUE(on_command(command));
    }

When I ran the test I got a long beep and the output (trimmed for space):

    1>[ RUN ] ftfBittleXProtocol.newCmd_overflow
    1>ftBittleX::ftfBittleX::on_send
    1>TX command : b32 32 32 32 32 ... 32 32 32 32
    1>write count : 7522
    1>G:\AppDev\Robotics\Petoi\test\ftBittleX\ftBittleX.cpp(423): error : Value of: result
    1> Actual: false
    1>Expected: true
    1>ftSerial.write FAILED expected: 7522 actual: 2304
    1>expected : 7522
    1>actual : 2304
    1>G:\AppDev\Robotics\Petoi\test\ftBittleX\ftBittleX.cpp(654): error : Value of: result
    1> Actual: false
    1>Expected: true
    1>G:\AppDev\Robotics\Petoi\test\ftBittleX\ftprotocol.cpp(565): error : Value of: on_command(command)
    1> Actual: false
    1>Expected: true
    1>[ FAILED ] ftfBittleXProtocol.newCmd_overflow (308 ms)

I was surprised by the number of bytes sent, and we had seen the write failure message before in a previous test. First I cut down the data size to get the byte count closer to the buffer size (I'd figure out why the size was large later). But once I got the byte count down to just below 2507 I was still getting the write failure. So I needed to look more closely into that.

I recalled the posts about the problem of BittleX reading the Bluetooth input and how Bluetooth would break up the data into chunks that caused a BittleX read timeout error. Dr. Petoi said the solution was to increase the timeout value for those commands and asked if I could try that and play with the value. Well, I did try it: I matched the BittleX TIMEOUT_LONG value in the OpenCat_serial code, tested it, and didn't see any effect. But I didn't "play" with the value. This time I did, by increasing the timeout value by 10 ms. And the result ... no more write failure! So, my apologies for slacking off and not doing as I'm told.

Resolutions

From my POV, this issue is on the BittleX side where the UART microcode needs to manage the input queue during read. The write timeout is the time to wait for the receiving device to signal ready to receive the next byte. But it could be on the Windows side where it manages the output queue during write. With that resolved I could explain the size of the data being sent.
For an ASCII command like 'b', each data element is a number. An ASCII number has one byte for each digit in the number. I used a 2-digit number (32) for all elements. There is a space between each number. Thus I need 3 bytes for each element. The maximum number of elements I can have is then BUFF_LEN/3. That test gives

    1>[ RUN ] ftfBittleXProtocol.newCmd_overflow
    1>ftBittleX::ftfBittleX::on_send
    1>TX command : b32 32 32 32 ... 32 32 32 32
    1>write count : 2506
    1>expected : 2506
    1>actual : 2506
    1>ftBittleX::ftfBittleX::on_response
    1>Command completed (normal)
    1>description : BEEP
    1>cmd : b
    1>id : 1
    1>data : 32 32 32 32 ... 32 32 32 32
    1> 12984 ms : RX_latency
    1> 748 ms : RX_elapsed
    1>response : 2 lines
    1>Changing volume to 10/10
    1>
    1>b
    1>
    1>response : end
    //...
    1>[ OK ] ftfBittleXProtocol.newCmd_overflow (14792 ms)

To force the overflow condition I add 1 to the number of elements,

    vector < int8_t > symphony(BUFF_LEN / 3 + 1, 32);  // overflow

to get the output I'm looking for:

    1>[ RUN ] ftfBittleXProtocol.newCmd_overflow
    1>ftBittleX::ftfBittleX::on_send
    1>TX command : b32 32 32 32 ... 32 32 32 32
    1>write count : 2509
    1>expected : 2509
    1>actual : 2509
    1>ftBittleX::ftfBittleX::on_response
    1>Command completed (normal)
    1>description : BEEP
    1>cmd : b
    1>id : 1
    1>data : 32 32 32 32 ... 32 32 32 32
    1> 46 ms : RX_latency
    1> 750 ms : RX_elapsed
    1>response : 2 lines
    1>OVFb
    1>
    1>response : end

This is progress. I see the overflow response and I hear the error beep. I also see confirmation of my calculation of the response time (750 ms). The RX_latency value is an indicator of how long it takes the code to detect the overflow.

But what about the rest of it? The overflow code replaces the command in error with the "standup" command and lets it get executed. In order to get that, I need to act as if I sent the "standup" command and look for the response.
    if (!HasFailure()) {
        // process overflow standup command
        cmd_def_t cmd_standup{ SKILL, "k", "SKILL" };
        cmd_standup += string("up");
        EXPECT_TRUE(on_response(cmd_standup));
    }

to get the output:

    1>ftBittleX::ftfBittleX::on_response
    1>Command completed (normal)
    1>description : SKILL
    1>cmd : k
    1>id : 9
    1>data : up
    1> 0 ms : RX_latency
    1> 23 ms : RX_elapsed
    1>response : 3 lines
    1>up
    1>
    1>k
    1>
    1>k
    1>
    1>response : end

We're not done yet. This demonstrates that an ASCII command overflows properly. It should be identical for a binary command. If so, then the binary test should be nearly identical to the ASCII test, trivial to write, and show identical behavior:

    TEST_F(ftfBittleXProtocol, BIN_newCmd_overflow)
    {
        const unsigned BUFF_LEN{ 2507 };  // 1524 =125*20+7=2507
        vector < int8_t > symphony(BUFF_LEN + 1, 32);  // overflow
        cmd_def_t command{ BEEP_BIN, "B", "BEEP" };
        command += symphony;
        unsigned latency{ 14000 };
        EXPECT_TRUE(on_command(command, latency));
        if (!HasFailure()) {
            // process overflow standup command
            cmd_def_t cmd_standup{ SKILL, "k", "SKILL" };
            cmd_standup += string("up");
            EXPECT_TRUE(on_response(cmd_standup));
        }
    }

The results:

    1>ftBittleX::ftfBittleX::on_send
    1>TX command : 4220202020... 202020207e
    1>write count : 2510
    1>expected : 2510
    1>actual : 2510
    1>ftBittleX::ftfBittleX::on_response
    1>Command completed (normal)
    1>description : BEEP_BIN
    1>cmd : B
    1>id : 3
    1>data : 32323232 ... 32323232
    1> 46 ms : RX_latency
    1> 749 ms : RX_elapsed
    1>response : 2 lines
    1>OVFB
    1>
    1>response : end
    1>ftBittleX::ftfBittleX::on_response
    1>Command completed (normal)
    1>description : SKILL
    1>cmd : k
    1>id : 9
    1>data : up
    1> 0 ms : RX_latency
    1> 228 ms : RX_elapsed
    1>response : 3 lines
    1>up
    1>
    1>k
    1>
    1>k
    1>
    1>response : end

More progress. I was also able to expose a known issue in the binary data stream. A binary command is terminated by the tilde character ('~'). I believe the documentation cautions you to avoid using a tilde as a data value.
The obvious reason is that the tilde will be interpreted as the end of the command, but the command processing will see it as a data value, and "strange" things may happen.

As it turns out, my first version of the binary beep test used the same data length and values as the ASCII version:

    TEST_F(ftfBittleXProtocol, ASC_newCmd_overflow)
    {
        const unsigned BUFF_LEN{ 2507 };  // 1524 =125*20+7=2507
        vector < int8_t > symphony(BUFF_LEN / 3 + 1, 32);  // no overflow
        cmd_def_t command{ BEEP_BIN, "B", "BEEP_BIN" };
        command += symphony;
        //...

This won't overflow the newCmd buffer, but it is a malformed command. The buzzer tone never stops. The data for the beep command comes in pairs, "note duration". We can calculate whether the test data is a properly formed data stream by subtracting the command suffix ('~') from the length of the data stream we provide, (BUFF_LEN / 3 + 1). The result must be an even value for a properly formed data stream: (BUFF_LEN / 3 + 1) - 1 = 836 - 1 = 835, which is odd, so this stream is malformed. This is more directly shown by simply using BUFF_LEN - 1 for the data:

    vector < int8_t > symphony(BUFF_LEN - 1, 32);  // malformed

This yields 3 data points for testing:

    vector < int8_t > malformed_symphony(BUFF_LEN - 1, 32);
    vector < int8_t > no_overflow_symphony(BUFF_LEN, 32);
    vector < int8_t > overflow_symphony(BUFF_LEN + 1, 32);

I don't recommend running the malformed test. The buzzer never ends and "annoys the pig."

Here's the exciting footage of the T_BEEP buffer overflow test: https://youtu.be/vbEehKzfEt4

These results resolve two issues:

1. Why does the Windows WriteFile call fail? (Thanks to Doc Petoi)
2. What happens when you do overflow the newCmd buffer?

The Last Test

We need to do one last test before we put this issue to rest. We need to test that when the T_SKILL_DATA command overflows the newCmd buffer, the program doesn't crash. Logically, this should be no different than the binary BEEP command, but it is the issue under test so we want to explicitly test it.
We just have to change the command we send; the data or its format doesn't matter. All that matters is the size of the data. The response should be the same as that for the binary BEEP command. There's an interesting twist to this test:

    TEST_F(ftfBittleXProtocol, SKILL_DATA_newCmd_overflow)
    {
        const unsigned BUFF_LEN{ 2507 };  // 1524 =125*20+7=2507
        vector < int8_t > nonsense(BUFF_LEN + 1, 32);  // overflow
        cmd_def_t command{ SKILL_DATA, "K", "SKILL_DATA" };
        command += nonsense;
        unsigned latency{ 14000 };
        EXPECT_TRUE(on_command(command, latency));
        if (!HasFailure()) {
            // process overflow standup command
            cmd_def_t cmd_standup{ SKILL, "k", "SKILL" };
            cmd_standup += string("up");
            EXPECT_TRUE(on_response(cmd_standup));
        }
    }

The results confirm our analysis but uncover a problem in processing some commands:

    1>[ RUN ] ftfBittleXProtocol.SKILL_DATA_newCmd_overflow
    1>ftBittleX::ftfBittleX::on_send
    1>TX command : 4b20202020 ... 202020207e
    1>write count : 2510
    1>expected : 2510
    1>actual : 2510
    1>ftBittleX::ftfBittleX::on_response
    1>Command completed (toggle)
    1>description : SKILL_DATA
    1>cmd : K
    1>id : 10
    1>data : 32323232 ... 32323232
    1> 45 ms : RX_latency
    1> 1027 ms : RX_elapsed
    1>response : 4 lines
    1>OVFK
    1>
    1>up
    1>
    1>k
    1>
    1>response : end
    1>ftBittleX::ftfBittleX::on_response
    1>ftBittleX::ftfBittleX::on_response elapse timeout!
    1>description : SKILL
    1>cmd : k
    1>id : 9
    1>data : up
    1> 0 ms : RX_latency
    1> 2028 ms : RX_elapsed
    1>response : 1 lines
    1>k
    1>
    1>response : end

When most commands complete, they echo the command to the serial monitor. However, some commands do some software "sleight-of-hand" and echo back one or more lowercase values of the command. T_SKILL_DATA is one of these and normally responds with two 'k' values. But it doesn't do that here. For overflow it responds with "OVFK". It then issues the "kup" command, which does respond with two 'k' values when it completes. So the test harness becomes confused about the responses.
This is part of why I'm not fond of behavior I didn't ask for. It's inconsistent. I think an argument can be made for the error handling to be elsewhere. But that's a design issue, not a testing issue. The tests here confirm the expected behavior and not a crash.

Here's the dramatic and astounding video of the T_SKILL_DATA buffer overflow test: https://youtu.be/DIS5Efdx258

A Possible Clue

As a result of the diagnosis of a possible hardware failure, I started to think about how to test the hardware to confirm the diagnosis. This brought back memories of my early days working in a computer manufacturing plant, when I worked with various hardware components and microprocessors. It's been so long since then (40 years) that I've forgotten the day-to-day, simple tasks you do when a hardware issue comes up. But I suddenly recalled one of them, a task so simple you do it by reflex: visual inspection. You look at the component to see if there's a visible flaw. It takes a trained eye to see some flaws, but the folks who make them know where to look and can spot them at a glance.

The component under suspicion here is the power supply circuit. The documentation identifies where that circuit is located on the board. So I took a picture of Malfunctioning Eddie's board:

Here's an enlarged view of the power circuit (if I've located it correctly):

My eye isn't educated enough to know if there's a visible flaw. My eye is drawn to one of the two circular soldered connections, the one on the right; its upper right-hand corner seems odd to me, as if a connection point has been uncovered due to solder fatigue or something. But Idunno. But I do think there's someone who could quickly tell if there's an obvious flaw.
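The write counts reported in the overflow tests earlier in this post can be cross-checked with a little arithmetic. This is a sketch under assumptions inferred from the logs, not from the firmware or the test harness source: an ASCII command is taken to cost one token byte plus three bytes per two-digit element, and a binary command one token byte, one byte per element, and the '~' terminator. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t BUFF_LEN = 2507;  // value quoted from the firmware

// ASCII command: 1 token byte + 3 bytes per two-digit element
// (two digits plus one separator byte; an assumption inferred from the logs).
constexpr std::size_t asciiWriteCount(std::size_t elements) {
    return 1 + 3 * elements;
}

// Binary command: 1 token byte + 1 byte per element + 1 byte for the '~' terminator.
constexpr std::size_t binaryWriteCount(std::size_t elements) {
    return 1 + elements + 1;
}

// Each value below is the "write count" shown in the corresponding test log.
static_assert(asciiWriteCount(BUFF_LEN) == 7522, "first attempt, BUFF_LEN elements");
static_assert(asciiWriteCount(BUFF_LEN / 3) == 2506, "ASCII, fits in newCmd");
static_assert(asciiWriteCount(BUFF_LEN / 3 + 1) == 2509, "ASCII, overflows newCmd");
static_assert(binaryWriteCount(BUFF_LEN + 1) == 2510, "binary, overflows newCmd");
```

The first line explains the surprise in the initial run: BUFF_LEN elements of ASCII data come to roughly three times BUFF_LEN bytes on the wire.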

Time to see the Vet

In Clinic

Timothy McCarthy

May 09, 2024

If that's the case, then I'd expect the servo voltage to reflect that. I can test that the servos are working properly and measure the voltage on the servo pins. First, I need to document the battery test. Aside: I hope that others here will chime in with ideas and corrections.

Time to see the Vet

In Clinic

Timothy McCarthy

May 09, 2024

USB only boot test

Link here: https://youtu.be/8KEaZYYn72Y

    Initializing MPU... OK
    - Testing MPU connections...attempt 0
    - MPU6050 connection successful
    - Initializing DMP...
    - Enabling DMP...
    - Enabling interrupt detection (Arduino external interrupt 26)...
    - DMP ready! Waiting for the first interrupt...
    Bluetooth name: Bittle66
    Waiting for a client connection to notify...
    Bluetooth name: Bittle66
    The device is started, now you can pair it with bluetooth!
    Setup ESP32 PWM servo driver...
    Calibrated Zero Position
    135 120 135 135 190 80 190 80 190 80 80 190
    Build skill list...62
    TaskQ
    Init voice
    Number of customized voice commands on the main board: 10
    Ready!
    rest
    g
    d

Time to see the Vet

In Clinic

Timothy McCarthy

May 09, 2024

I may end up asking for a replacement board but ...

... we're fearless here and this is the place for diagnosing issues, both hardware and software. I want to help improve the product and help others to do so as well. So, if there's a problem with the bot, let's address it and try to fix it. I'm willing to take the time and do as much of the work as I can. This is an opportunity to learn, but I will need your guidance on what needs to be done. Sound good?

That said, if we suspect a hardware issue, how should I go about further diagnostic testing? I would, and based upon this assessment will, do a teardown and then start testing components, starting with the servos: verify that the servos are working correctly. I would also test the battery (I have, but I'll document it here). After that I'm in a gray area and will need your advice. We should be able to do some testing of the BiBoard and hat, individually and combined.

I have some basic equipment: a multimeter, a breadboard, a spare Arduino board and ESP32 chips, and a power supply for 3 V and 5 V.

In other words, if you had my bot on your workbench, how would you go about testing it? I'll try to do that.

Addendum: My engineering sense tells me I should have a spare bot, so I'll probably order one in a week or two.

Time to see the Vet

In Clinic

Timothy McCarthy

May 09, 2024

Can we reassess the patient again? Here's the state of affairs: https://youtu.be/cIR9SLpXLNw

Unsuccessful correctives:
- I've upgraded the firmware using the Desktop app (1.1.7 and 1.1.9).
- I've done a factory reset for both versions.
- The IMU calibration has been erratic, but I did get it to complete.

I need to ...
... get the bot into a stable working state
... get a valid, reliable build of the codebase

Time to see the Vet

In Clinic

Timothy McCarthy

May 09, 2024

Can you do the longer test and see the result? Find out if it crashes or if the code that is supposed to handle the condition works. You want to trigger that code.

Time to see the Vet

In Clinic

Timothy McCarthy

May 09, 2024

"The T_SKILL_DATA overflow is already handled by cmdLen >= BUFF_LEN alone because it's an "or" condition."

That condition handles just the raw overflow of the newCmd buffer. No command may exceed the BUFF_LEN limit. By definition, read_serial would directly overflow the newCmd buffer in that case. But that's not the test case we're interested in. We're interested in the case of spaceAfterStoringData <= cmdLen: the command length exceeds the threshold spaceAfterStoringData of the move operation but not the size of the command buffer newCmd. I provided a test of that with T_SKILL_DATA that demonstrates the condition isn't detected:

    1>G:\AppDev\Robotics\Petoi\test\utOpenCatEsp32\readoverflow.cpp(120): error : Value of: is_overflowex()
    1> Actual: false
    1>Expected: true
    1>Overflow undetected for:
    1> K : token
    1> 17 : cmdLen
    1> 16 : spaceAfterStoringData

I'm focused on the value of spaceAfterStoringData when the expression is evaluated. In read_serial, during each iteration of the do-while loop, what is the value of spaceAfterStoringData? What skill command does it pertain to?

Get Out of Jail Free: Fortunately, or unfortunately, this analysis is moot due to the sheer size of newCmd (2507 bytes). I don't have a test case to demonstrate the error state. We can calculate where the threshold value is for a given skill command, and I can't find a pair that would trigger the error.

In addition, I've been mischaracterizing the error as a buffer overflow. It's not a buffer overflow but a data overlap. The duty angle data is inserted into the newCmd buffer by moving the skill command frame list to the end of the buffer. If the duty angle data overlaps the frame, it will overwrite the frame data.
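The distinction between the two conditions can be written out as separate predicates. This is only a sketch: the function names are hypothetical, and the combined expression is an assumption about what a complete check might look like, not a copy of the firmware's test in read_serial.

```cpp
#include <cassert>

constexpr int BUFF_LEN = 2507;  // value quoted from the firmware

// Raw overflow: the command no longer fits in newCmd at all.
constexpr bool isRawOverflow(int cmdLen) {
    return cmdLen >= BUFF_LEN;
}

// Overlap: the incoming command reaches the region where the skill frames were moved.
constexpr bool isOverlap(int cmdLen, int spaceAfterStoringData) {
    return spaceAfterStoringData <= cmdLen;
}

// Hypothetical combined check; the firmware's real expression may differ,
// for example by applying the overlap test only to certain tokens.
constexpr bool isUnsafe(int cmdLen, int spaceAfterStoringData) {
    return isRawOverflow(cmdLen) || isOverlap(cmdLen, spaceAfterStoringData);
}

// The values from the failing unit test: cmdLen 17, spaceAfterStoringData 16.
static_assert(!isRawOverflow(17), "far below BUFF_LEN, so no raw overflow");
static_assert(isOverlap(17, 16), "but the command reaches the stored data");
```

For the logged pair, the raw-overflow predicate is false while the overlap predicate is true, which is the case the unit test reports as undetected.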

Time to see the Vet

In Clinic

Timothy McCarthy

May 08, 2024

see Skill::inplaceShift

Time to see the Vet

In Clinic

Timothy McCarthy

May 08, 2024

I'm having trouble replying to this due to forum issues. Apparently, I look like some sort of bovine. Will try again later.

Time to see the Vet

In Clinic

Timothy McCarthy

May 08, 2024

The size of the newCmd buffer is

    #define BUFF_LEN 2507  // 1524 =125*20+7=2507
    char *newCmd = new char[BUFF_LEN + 1];

The stress test for commands such as the 'b' command needs to have 2508 or more values or the equivalent. They may still run into a false-positive overflow depending upon a prior skill command. I want a test to force the overflow conditions.

"Do you have a test case of long skill data in text form?"

Say again? The skill data command is binary, so no. I'm sure you mean an example of an ASCII command with parameters that is very long, like 'b'. I haven't done stress testing yet. I was stopped when doing the protocol characteristic testing, i.e., testing valid, nominal commands and responses.

I do have a test "waiting in the wings". The tests for INDEXED_SEQUENTIAL_ASC include one with a list of repeated servo indexes. Right now, it uses 28 servo+angle elements concatenated together to form the command, but it could easily be extended to any length. The concatenation operator can be applied to any command. So, "Yes, I can", but I haven't gotten to that yet.
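Extending a repeated servo+angle list to an arbitrary length could look roughly like the sketch below. It is illustrative only: `ServoAngle`, `appendPairs` and `buildStressCommand` are hypothetical names rather than part of the test harness, and the "index angle " layout with a trailing space is an assumption about the ASCII wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using ServoAngle = std::pair<int, int>;  // (servo index, angle)

// Appends "index angle " for each pair; the trailing-space layout is an assumption.
std::string appendPairs(std::string command, const std::vector<ServoAngle>& pairs) {
    for (const ServoAngle& p : pairs) {
        command += std::to_string(p.first) + ' ' + std::to_string(p.second) + ' ';
    }
    return command;
}

// Repeats the pair list until the command is at least minLength bytes long.
std::string buildStressCommand(char token, const std::vector<ServoAngle>& pairs,
                               std::size_t minLength) {
    assert(!pairs.empty());  // an empty list would never reach minLength
    std::string command(1, token);
    while (command.size() < minLength) {
        command = appendPairs(std::move(command), pairs);
    }
    return command;
}
```

Calling `buildStressCommand` with a token character, a short pair list, and a minimum length of 2508 would produce a command just past the newCmd limit; the token passed in is the caller's choice and nothing here checks it against the protocol.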

Time to see the Vet

In Clinic

Timothy McCarthy

May 08, 2024

That seems reasonable and I thought of it as well. See the first test in part 3. The code is cleaner, but the logic is still flawed. If the token is not in the list, then it is allowed to overflow. Since spaceAfterStoringData <= BUFF_LEN, it dominates the second expression. Its value is unrelated to the current token, so you can get a false positive or, worse, the current token may require more free buffer than the previous skill token did.

Time to see the Vet

In Clinic

Timothy McCarthy

May 07, 2024

No soap. I even tried the Factory reset. All the SW reported success. The bot still jerks slowly and never gets to rest pose. Sad face.