Greetings,
I am experiencing severe performance issues when running my code* on STM32F1-based microcontrollers, losing about two orders of magnitude in many test suite benchmarks (e.g. context switch performance). My configuration files mcuconf.h and halconf.h are almost identical to the RT-STM32F103RB-NUCLEO64 demo (I only enabled RTC and SPI), but my chconf.h file differs significantly. However, even when I modified the demo to match my configuration, I was unable to reproduce the issue.
I assume there must be something odd going on in my code, but I have no idea what that could be. Moreover, when running the same code on STM32F4/L4/G0 platforms, I see no such issues at all. Do you have any ideas how or where I should investigate further? I am quite puzzled and rather lost. I figured the issue could be related to the Cortex-M3, since all other microcontrollers I tested are based on M4 or M0 cores, but that is no more than a hypothesis at this point.
I am grateful for any advice!
Thank you in advance
Thomas
* I invite you to check my code yourself: https://gitlab.ub.uni-bielefeld.de/AMiRo/AMiRo-OS
Just clone the main or ChibiOS_21.11.x branch, run the setup.sh script (to initialize the project), and build any module via the Makefile right from the root directory. Access the shell via a serial terminal (baud rate 115200) and run the "kernel:test" command.
performance issues on STM32F1 / Cortex-M3
Re: performance issues on STM32F1 / Cortex-M3
Hi,
The F1 code has not been touched for a while. What are your clock settings? Could it simply be a configuration problem?
BTW, which board am I supposed to use? I only have an old Discovery F100.
Giovanni
Re: performance issues on STM32F1 / Cortex-M3
Hi,
I guess you are right and I was too quick to file a complaint.
The issue seems to be caused by my ST frequency settings in combination with a 16-bit hardware timer. I have configured the ST to tick at 1 µs resolution (1 MHz frequency). With the 16-bit timer of the F1 microcontrollers, the systime_t value then overflows every ~65 ms. All benchmarks, however, run for 1000 ms, and performance is calculated using systime_t variables instead of sysinterval_t, which renders the calculated results invalid. I guess performance is actually okay, but obviously I cannot validate this right now.
The takeaway message of this "issue" is probably that the kernel benchmarks should be modified to use sysinterval_t for performance calculations instead of systime_t. I'll leave this for you to consider for now, because I cannot fully estimate the complexity of such a change.
Interestingly, I could not reproduce the issue with the demos because a 1 MHz ST frequency would not run at all there (even though I multiplied the CH_CFG_ST_TIMEDELTA value accordingly). Unfortunately, I don't have the time to investigate the topic much further.
Regards,
Thomas
PS: I assumed you were well stocked with hardware, because ChibiOS supports so many MCUs. For my project, however, I can only support the few NUCLEO boards (as well as our custom hardware) I have at hand. Even the F4-Discovery got lost and is no longer maintained, unfortunately.
Re: performance issues on STM32F1 / Cortex-M3
I found the time to write a patch that modifies the RT test suite: https://gitlab.ub.uni-bielefeld.de/AMiRo/AMiRo-OS/-/blob/main/kernel/patches/0005_tests-with-fast-16bit-ST.patch
At its core, it limits all intervals to TIME_MAX_SYSTIME. As a result, however, some tests and benchmarks may behave slightly differently. For instance:
- Step 3 of rt_test_005_001_execute() (thread sleeps for 1 second) may be skipped completely if systime_t cannot represent a full second.
- Benchmarks are designed to run for a whole second and are thus chopped into multiple iterations of shorter runs. Alternatively, I could have shortened the overall benchmark time and extrapolated the result to a full second, but I wanted any side effects (like I/O interrupts) to be captured as well as possible (it's not perfect, though).
- Only the RT test suite was modified, but NIL should probably be adapted accordingly.
- Not all RT tests have been adapted. Basically, any tests that require CH_DBG_THREADS_PROFILING to be TRUE are not patched, and I might have missed some more.
#define CH_CFG_ST_RESOLUTION 16
#define CH_CFG_ST_FREQUENCY 100000UL
#define CH_CFG_ST_TIMEDELTA 100
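For reference, the systime_t wrap period follows directly from these settings: 2^16 / CH_CFG_ST_FREQUENCY = 65536 / 100000 Hz ≈ 655 ms. That is ten times longer than at the 1 MHz mentioned earlier (≈ 65.5 ms), but still shorter than a 1000 ms benchmark, so the overflow problem persists at this frequency as well.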
As a side note: the reason I could not reliably reproduce the issue with the demos before was another patch of mine, which modified the chTimeAddX() function (cf. this discussion). With the new patch, those changes became obsolete, so this is somewhat of a replacement for the old patch.
Regards
Thomas