invalid kernel stats Topic is solved

Thargon
Posts: 135
Joined: Wed Feb 04, 2015 5:03 pm
Location: CITEC, Bielefeld University, germany
Has thanked: 15 times
Been thanked: 24 times

Postby Thargon » Fri Dec 03, 2021 3:57 pm

Hi,

I noticed that the kernel statistics are broken somehow - let me try to explain.

Note: I used the RT-STM32F103RB-NUCLEO64 demo of the stable_21.11.x version to investigate the issue in-depth. However, I could reproduce the same issue with other demos as well.

When I read chVTGetSystemTimeX() right after system initialization, I get a correct value of 0. When I read currcore->kernel_stats.m_crit_thd.worst right after, I get an arbitrary value. At first glance, the values seemed random, but when resetting the board, the value would increase every time. Actually, there seems to be some correlation to time, as I found that the value would increment at ~16 MHz (every ~1 second the value increases by 0x01000000). When reading currcore->kernel_stats.m_crit_thd.worst multiple times within a loop afterwards, the value would not change, though. It is "updated" only by reset (and probably whenever the last measurement would exceed it).
Interestingly, the issue does not affect any of the other statistics. Neither the other members of m_crit_thd are affected, nor any member of m_crit_isr. All of those seem to be correct.

I had a glimpse at the code, but I did not find any issues there. I also wonder what kind of error would result in such odd behaviour. Looks like some addresses are VERY off.

Maybe the issue is actually related to the compiler, but I tested with GCC 10.3.1 as well as GCC 9.3.1 with no effect.

I am sorry that I can only describe the issue (rather vaguely) and cannot provide any sort of fix.

Regards,
Thomas

Giovanni
Site Admin
Posts: 14444
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 1074 times
Been thanked: 921 times

Re: invalid kernel stats

Postby Giovanni » Sun Dec 05, 2021 9:42 am

Hi,

Not sure I fully understand :-)

Perhaps it is measuring the critical zone when the system is started on chSysInit().

Giovanni


Re: invalid kernel stats

Postby Thargon » Mon Dec 06, 2021 9:54 am

I am fairly sure that there is no invalid measurement during startup, because the reported values are much greater than the total uptime of the system.

For instance, I just got the value 2141351942 (0x7FA27006) right after startup (uptime < 100 ms). @ 72 MHz, however, that reported value would account for almost 30s of CPU time.
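The arithmetic behind that estimate can be checked with a small helper. This is only a sketch: `cycles_to_seconds` is a hypothetical name, not a ChibiOS function, and it assumes the statistic counts cycles of a counter clocked at the 72 MHz stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of ChibiOS): converts a raw cycle count
   into seconds for a counter running at `hz` ticks per second. */
double cycles_to_seconds(uint32_t cycles, uint32_t hz) {
  assert(hz != 0u);  /* guard against division by zero */
  return (double)cycles / (double)hz;
}
```

Calling `cycles_to_seconds(2141351942u, 72000000u)` gives roughly 29.74, i.e. the "almost 30s" mentioned above, far more than an uptime below 100 ms.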

As mentioned before, this "reset value" seems to be correlated with some internal 16 MHz timer (for that particular demo, at least). If I manually reset the value of currcore->kernel_stats.m_crit_thd.worst to 0 right after chSysInit(), consecutive readings seem to be correct.

Since I could not find any issues with the dbg-statistics code, my guess is that somewhere in the code, data is written to an invalid pointer.

- Thomas


Re: invalid kernel stats

Postby Thargon » Mon Dec 06, 2021 1:25 pm

I think I may have found the issue - and the solution is very simple.

When the TM object is initialized, all members (including 'last') are set to 0. This initialization happens within chSysInit(), which ends with a call to chSysUnlock(). That function in turn eventually calls chTMStopMeasurementX(), which computes the best and worst values from the current result of chSysGetRealtimeCounterX() and the member variable 'last' of the TM object. Since 'last' was only initialized to 0 and never set to a valid start time, the first result is not a valid duration and must be considered "undefined", which is what I originally observed.

I solved this issue by simply initializing the value of 'last' not with 0 but with chSysGetRealtimeCounterX(). This fixes the initial measurement during startup and does not interfere with any subsequent measurements, as 'last' will be overwritten by every call of chTMStartMeasurementX().
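To illustrate the mechanism, here is a minimal, self-contained model of the behaviour described above. It is a sketch and not the ChibiOS source: `tm_model_t`, `fake_counter`, `model_init`, `model_start`, `model_stop` and `run_startup` are hypothetical names, the counter is a plain variable advanced by hand, and the counter values are chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the free-running counter that
   chSysGetRealtimeCounterX() would return; advanced by hand here. */
static uint32_t fake_counter;

/* Simplified model of a time measurement object (assumption, not ChibiOS). */
typedef struct {
  uint32_t last;   /* start timestamp while measuring, then last duration */
  uint32_t worst;  /* longest duration seen so far */
} tm_model_t;

/* use_fix == 0: 'last' starts at 0 (behaviour described in the report).
   use_fix != 0: 'last' starts at the current counter value (proposed fix). */
void model_init(tm_model_t *tm, int use_fix) {
  assert(tm != NULL);
  tm->worst = 0u;
  tm->last = use_fix ? fake_counter : 0u;
}

void model_start(tm_model_t *tm) {
  assert(tm != NULL);
  tm->last = fake_counter;
}

void model_stop(tm_model_t *tm) {
  assert(tm != NULL);
  tm->last = fake_counter - tm->last;  /* duration = now - start */
  if (tm->last > tm->worst) {
    tm->worst = tm->last;
  }
}

/* Models startup: init, an unmatched stop (as at the end of system
   initialization), then one regular start/stop pair of 100 ticks. */
uint32_t run_startup(int use_fix) {
  tm_model_t tm;
  fake_counter = 0x7FA27000u;  /* counter is already far from zero */
  model_init(&tm, use_fix);
  fake_counter += 6u;          /* a few ticks pass before the first stop */
  model_stop(&tm);             /* stop without a preceding start */
  model_start(&tm);
  fake_counter += 100u;
  model_stop(&tm);
  return tm.worst;
}
```

In this model `run_startup(0)` returns 0x7FA27006 (2141351942), because the unmatched stop treats the raw counter value as a duration, while `run_startup(1)` returns 100, the length of the only real measurement.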

Regards
Thomas


Re: invalid kernel stats

Postby Thargon » Tue Feb 08, 2022 11:00 am

I just noticed that subsequent readings of the kernel statistics are corrupted when compiling the code with GCC 10.3, even though it works fine with GCC 9.3.

Furthermore, there seems to be more to this, as I could reproduce the issue on a NUCLEO-F767ZI with both compilers. I would rather suspect broken code somewhere in ChibiOS (perhaps a write to an invalid address) than multiple compilers producing bogus output on multiple (but not all!) devices.


Re: invalid kernel stats

Postby Giovanni » Tue Feb 08, 2022 11:06 am

Could it be an effect of the cache? Try disabling it as a test.

Giovanni


Re: invalid kernel stats

Postby Thargon » Tue Feb 08, 2022 11:42 am

I am not sure how to do that. I just tried disabling the calls to SCB_EnableXCache() in crt1.c, but with no effect. Can you please give me some hints on how to disable the caches for the M7 platform?


Re: invalid kernel stats

Postby Giovanni » Tue Feb 08, 2022 1:48 pm

Thargon wrote: I am not sure how to do that. I just tried disabling the calls to SCB_EnableXCache() in crt1.c, but with no effect. Can you please give me some hints on how to disable the caches for the M7 platform?


That was the correct way to do it, so the problem is not related to the cache.

Could you explain how to reproduce and verify the problem?

Giovanni


Re: invalid kernel stats

Postby Giovanni » Mon Mar 14, 2022 9:22 am

Hi,

Fixed as bug #1222. (initialization problem)

Any news about the GCC 10.3 problem? I am trying to look into it.

Giovanni


Re: invalid kernel stats

Postby Thargon » Mon Mar 14, 2022 12:24 pm

Giovanni wrote: Fixed as bug #1222. (initialization problem)

Thanks!

Giovanni wrote: Any news about the GCC 10.3 problem? I am trying to look into it.

Unfortunately, I could not track down the issue.
GCC 9.3 seems to work fine, though. After wiping all temporary data, a fresh build from scratch did not show that behaviour (usually GCC throws errors when it picks up old temp files *shrug*).

Actually, GCC 10 has so many issues (linker warnings, bigger code size etc.) that I decided to stick to 9.3 for the time being. If I can be of any help, please let me know.

