Greetings,
I am experiencing severe performance issues when running my code* on STM32F1-based microcontrollers, losing about two orders of magnitude in many test suite benchmarks (e.g. context switch performance). My configuration files mcuconf.h and halconf.h are almost identical to the RT-STM32F103RB-NUCLEO64 demo (I only enabled RTC and SPI), but my chconf.h file differs significantly. However, even when I modified the demo to match my configuration, I was unable to reproduce the issue.
I assume there must be something odd going on in my code, but I have no idea what that could be. Moreover, when running the same code on STM32F4/L4/G0 platforms, I see no such issues at all. Do you have any ideas how or where I should investigate further? I am quite puzzled and rather lost. I figured the issue could be related to the Cortex-M3, since all other microcontrollers I tested are based on M4 or M0 cores, but that is no more than a hypothesis at this point.
I am grateful for any advice!
Thank you in advance
Thomas
* I invite you to check my code yourself: https://gitlab.ub.uni-bielefeld.de/AMiRo/AMiRo-OS
Just clone the main or ChibiOS_21.11.x branch, run the setup.sh script (to initialize the project), and build any module via the Makefile right from the root directory. Access the shell via a serial terminal (baud rate 115200) and run the "kernel:test" command.
performance issues on STM32F1 / Cortex-M3
Re: performance issues on STM32F1 / Cortex-M3
Hi,
The F1 code has not been touched for a while. What are your clock settings? Could it simply be a configuration problem?
BTW, which board am I supposed to use? I only have an old Discovery F100.
Giovanni
Re: performance issues on STM32F1 / Cortex-M3
Hi,
I guess you are right and I was too quick to file a complaint.
The issue seems to be caused by my ST frequency settings in combination with a 16-bit hardware timer. I have configured the ST to tick at 1 µs resolution (1 MHz frequency). With the 16-bit timer of the F1 microcontrollers, the systime_t value then overflows every ~65 ms. All benchmarks, however, run for 1000 ms, and performance is calculated using systime_t variables instead of sysinterval_t, which renders the calculated results invalid. I guess performance is actually okay, but obviously I cannot validate this right now.
The takeaway message of this "issue" is probably that the kernel benchmarks should be modified to use sysinterval_t for performance calculations instead of systime_t. I'll leave this for you to consider for now, because I cannot fully estimate the complexity of such a change.
Interestingly, I could not reproduce the issue with the demos because a 1 MHz ST frequency would not run at all there (even though I multiplied the CH_CFG_ST_TIMEDELTA value accordingly). Unfortunately, I don't have the time to investigate the topic much further.
Regards,
Thomas
PS: I assumed you were well stocked with hardware, because ChibiOS supports so many MCUs. For my project, however, I can only support the few NUCLEO boards (as well as our custom hardware) I have at hand. Even the F4-Discovery got lost and is no longer maintained, unfortunately.
Re: performance issues on STM32F1 / Cortex-M3
I found the time to write a patch that modifies the RT test suite: https://gitlab.ub.uni-bielefeld.de/AMiRo/AMiRo-OS/-/blob/main/kernel/patches/0005_tests-with-fast-16bit-ST.patch
At its core, it limits all intervals to TIME_MAX_SYSTIME. As a result, however, some tests and benchmarks may behave slightly differently. For instance:
- Step 3 of rt_test_005_001_execute() (thread sleeps for 1 second) may be skipped completely if systime_t cannot represent a full second.
- Benchmarks are designed to run for a whole second and are thus chopped into multiple iterations of shorter runs. Alternatively, I could have shortened the overall benchmark time and extrapolated the result to a full second, but I wanted any side effects (like I/O interrupts) to be captured as well as possible (it's not perfect, though).
- Only the RT test suite was modified, but NIL should probably be adapted accordingly.
- Not all RT tests have been adapted. Basically, any tests that require CH_DBG_THREADS_PROFILING to be TRUE are not patched, and I might have missed some more.
#define CH_CFG_ST_RESOLUTION 16
#define CH_CFG_ST_FREQUENCY 100000UL
#define CH_CFG_ST_TIMEDELTA 100
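For reference, the systime_t wrap period follows directly from these settings: 2^16 / CH_CFG_ST_FREQUENCY = 65536 / 100000 Hz ≈ 655 ms. That is ten times longer than at the 1 MHz mentioned earlier (≈ 65.5 ms), but still shorter than a 1000 ms benchmark, so the overflow problem persists at this frequency as well.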
As a side note: the reason I could not reliably reproduce the issue with the demos before was another patch of mine, which modified the chTimeAddX() function (cf. this discussion). With the new patch, those changes became obsolete, so this is somewhat of a replacement for the old patch.
Regards
Thomas