possible long sleep on H743?

ChibiOS public support forum for topics related to the STMicroelectronics STM32 family of micro-controllers.

Moderators: RoccoMarco, barthess

tridge
Posts: 141
Joined: Mon Sep 25, 2017 8:27 am
Location: Canberra, Australia
Has thanked: 10 times
Been thanked: 20 times
Contact:

possible long sleep on H743?

Postby tridge » Sat Aug 21, 2021 10:41 am

I'm chasing a bug that happens occasionally in ArduPilot on STM32H743. We're using 20.03 release.
The bug manifests itself as a long loop in flight. The normal loop time of a copter is 2500us, and we occasionally see a loop that is 65536us longer (to within a few us). This only has been seen on one particular type of board that uses the 16 bit TIM12 as the system timer. ArduPilot runs on lots of different STM32H743 based boards, the others use a 32 bit system timer. This particular board uses the 16 bit TIM12 as all the 32 bit timers are used by various peripherals.

Key information:
- system timer on TIM12
- TIM8 is being used as well for DShot DMA output (TIM8 shares an interrupt vector with TIM12)
- the bug typically happens every few minutes when flying
- we run CH_CFG_ST_FREQUENCY at 1000000, so each system tick is 1us
- we have CH_CFG_INTERVALS_SIZE=32
- we run tickless with CH_CFG_ST_TIMEDELTA=2
- we use CH_CFG_TIME_QUANTUM=0
- we're using gcc 10.2.1

I know this is a vague bug report, and my apologies for that. I am trying to find a way to reliably reproduce on the bench.
What I'm wondering is if anyone knows of a reason why a chThdSleep() would sometimes sleep by an extra 65536us when a sleep of a few hundred us is requested with the above config. Note that I haven't actually proven that a long chThdSleep() is what is happening, it is just that this would be consistent with the data that we have.

User avatar
Giovanni
Site Admin
Posts: 14444
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 1074 times
Been thanked: 921 times
Contact:

Re: possible long sleep on H743?

Postby Giovanni » Sat Aug 21, 2021 12:27 pm

Hi Tridge,

This looks like a missed deadline, timers resynchronize after a full cycle of 2^16.

Could you try to increase your delta? this is one of the possible causes.

Which RT version are you using? we made important improvements to VTs recently.

Giovanni

tridge
Posts: 141
Joined: Mon Sep 25, 2017 8:27 am
Location: Canberra, Australia
Has thanked: 10 times
Been thanked: 20 times
Contact:

Re: possible long sleep on H743?

Postby tridge » Sat Aug 21, 2021 11:48 pm

> Could you try to increase your delta? this is one of the possible causes.

yes, will try that, although if this is the problem then it leaves me concerned for the boards that have a 32 bit timer. If we can miss a deadline in a chThdSleep() with a 16 bit counter (causing a 65ms scheduling error) then could we also miss a deadline for the majority of our boards with a 32 bit timer and end up with a 4294 second scheduling error?
Is this problem related to having a high interrupt latency for some ISR that is taking longer than it should?
Another thing I'd really like to work out is some way to make this bug happen more often to help with testing possible fixes. It is frustrating that I've only once seen the bug in a bench test, but we get it quite often in flight. Apart from higher EMI levels I can't really see what is different from the RTOS point of view from being on a bench and flying.

> Which RT version are you using? we made important improvements to VTs recently.

we're based on the 20.3 branch. The last commit we synced with on that branch was:
"Improved OpenOCD debug configuration generator. Now the script updates the project containing the elf file after genera
ting the configuration"
which means we do have this "Back-ported the latest VT algorithm and VT_Storm test application."

tridge
Posts: 141
Joined: Mon Sep 25, 2017 8:27 am
Location: Canberra, Australia
Has thanked: 10 times
Been thanked: 20 times
Contact:

Re: possible long sleep on H743?

Postby tridge » Sun Aug 22, 2021 12:31 am

I've now got a way to make the bug happen more often in bench tests. I replaced the call to chThdSleep() in our delay_microseconds() call with something that loops calling chThdSleep(2) until the delay condition is met. That means it is doing lots of delays of 2 microseconds, which makes it much more likely we hit the problem that the free running timer wraps while setting up the sleep.
The bit that puzzles me now is that we have lots of threads and several of them look like this:

while (true) {
delay_microseconds(1000);
do_some_work();
}

so why do we end up with an extra delay of 65536? Why doesn't the wakeup of one of the other threads within 1000us trigger the code in chSysTimerHandlerI() that walks the task list and wakes up all the threads that are due? So I'd still expect an issue, but it should cause a scheduling miss of at most 1000us (or whatever the shortest sleep going on at the time is).
On this board we have 25 threads in normal operation, most of which are sleeping for 1ms at any moment.

User avatar
FXCoder
Posts: 384
Joined: Sun Jun 12, 2016 4:10 am
Location: Sydney, Australia
Has thanked: 180 times
Been thanked: 130 times

Re: possible long sleep on H743?

Postby FXCoder » Sun Aug 22, 2021 1:38 am

Hi Tridge,
The overhaul of VT started when the VT Continuous mode capability was implemented.
This uncovered some issues that caused us to rework the logic of the DoTick function.
Rather than go into detail of the issues I'd suggest going to the revised VT in trunk.

Also note there is now a Runtime Faults Collection Unit that records Runtime Faults which is used to report if a skipped deadline happened.
This is active all the time (not a debug option) in VT.
The VT reports skipped deadline at line 468... chRFCUCollectFaultsI(CH_RFCU_VT_SKIPPED_DEADLINE);
See rt/chrfcu.h/c for further information.

Regards
--
Bob

tridge
Posts: 141
Joined: Mon Sep 25, 2017 8:27 am
Location: Canberra, Australia
Has thanked: 10 times
Been thanked: 20 times
Contact:

Re: possible long sleep on H743?

Postby tridge » Sun Aug 22, 2021 1:57 am

Hi Bob,

FXCoder wrote:Hi Tridge,
The overhaul of VT started when the VT Continuous mode capability was implemented.
This uncovered some issues that caused us to rework the logic of the DoTick function.
Rather than go into detail of the issues I'd suggest going to the revised VT in trunk.

copying such critical code from ChibiOS trunk to use in a production release autopilot doesn't really appeal. We're in the process of finalising the 4.1.0 stable release of ArduPilot and are are trying to find the safest fix.
Also note there is now a Runtime Faults Collection Unit that records Runtime Faults which is used to report if a skipped deadline happened.
This is active all the time (not a debug option) in VT.

thanks
My biggest concern is if the same issue can happen on boards with 32 bit system timers, and if so then would that result in a missed deadline giving a 4294s sleep? I know the chances of this are low as the wrap is 65k times less frequent, but the consequences would also be a lot worse.

tridge
Posts: 141
Joined: Mon Sep 25, 2017 8:27 am
Location: Canberra, Australia
Has thanked: 10 times
Been thanked: 20 times
Contact:

Re: possible long sleep on H743?

Postby tridge » Sun Aug 22, 2021 2:46 am

a bit more info ... I tried changing all sleeps in ArduPilot to be in increments of 40us, and I still sometimes see a 65ms sleep.
https://github.com/tridge/ardupilot/blo ... r.cpp#L157
if I raise this to 50us then I have yet to see the problem.
So this would seem to imply that we'd need CH_CFG_ST_TIMEDELTA at least 50 to avoid the issue, which seems awfully high. I doubt we really have an ISR that runs for that long (although I haven't tried to prove that).
Does the IRQ priority of the system timer matter for this issue? Currently TIM12 priority is 7.

User avatar
FXCoder
Posts: 384
Joined: Sun Jun 12, 2016 4:10 am
Location: Sydney, Australia
Has thanked: 180 times
Been thanked: 130 times

Re: possible long sleep on H743?

Postby FXCoder » Sun Aug 22, 2021 5:32 am

Hi Tridge,
Seems 20.3 needs a further update to VT to bring to trunk level functionality.
This may explain your issue.
Will discuss with Giovanni when he is online.

Cheers
--
Bob
Last edited by FXCoder on Sun Aug 22, 2021 5:39 am, edited 1 time in total.

tridge
Posts: 141
Joined: Mon Sep 25, 2017 8:27 am
Location: Canberra, Australia
Has thanked: 10 times
Been thanked: 20 times
Contact:

Re: possible long sleep on H743?

Postby tridge » Sun Aug 22, 2021 5:35 am

thanks Bob! I tried a quick hack to bring trunk vt into 20.3, but no luck getting it to boot yet. I suspect I stuffed up the init of the vtlist somehow

User avatar
Giovanni
Site Admin
Posts: 14444
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 1074 times
Been thanked: 921 times
Contact:

Re: possible long sleep on H743?

Postby Giovanni » Sun Aug 22, 2021 6:04 am

Hi Tridge,

I will update 20.3 with the latest changes and make those "official", anyway you should look into your delta setting, increasing delays is not the same thing. You may try a delta of 10 for example. With 32 bits timers you could get 2^32 delays like you suggested if you miss a deadline.

If you enable assertions those 2^16 delays should result in a "missed deadline" assertion making the problem much more evident, in RT7, when assertions are disabled, problems are still caught in the new RFCU (Runtime Faults Collection Unit) making an handling possible.

New VTs have an important difference: Previously "delta" was also influenced by duration of callbacks, you needed to trim it on your application, now it is just function of CPU execution speed making it more reliable, in general much lower deltas are possible reducing jitter. i tested on a 170MHz G4 with a systick at 16MHz and a delta of 15, very reliable.

A renewed "VT Storm" application allows to find the correct delta for each setup.

Giovanni


Return to “STM32 Support”

Who is online

Users browsing this forum: No registered users and 13 guests