Stack issues with fast interrupts Topic is solved

geebee
Posts: 33
Joined: Thu Dec 06, 2018 10:22 pm
Has thanked: 4 times
Been thanked: 15 times

Postby geebee » Tue Dec 01, 2020 4:49 pm

Hi,

I'm running into some weird issues where I see the idle thread's stack overflowing, but I can't quite catch it in the act because watchpoints don't work on my setup. I'm on an STM32H7, but I think this applies to all ARMv7-M parts.

After testing out everything, it seems that if I switch a fast interrupt to be a regular one (plus some changes to handle it not keeping up) it stops happening, but the two configurations are not exactly equivalent.
Is that possible?

I'm using a debug build with all the various ChibiOS checks and tracing enabled, FILL_THREADS, and no custom hooks. I increased the idle stack size from the default of 16 to 256, and when the fast interrupt is enabled I see the untouched stack space going down to 192 bytes after letting it run for a while, whereas using a regular interrupt I get 364 bytes.

Reading the code, I think the only possibility is that in _port_switch_from_isr, the fast interrupt is triggered before chSchDoReschedule can complete the switch and unlock. So its stack plus the fast interrupt stored registers make it go over the limit.

Does that seem possible? The part that still doesn't make sense to me is that if that was the case I think it would take at most an additional sizeof(port_intctx), which for my build settings should be 100 bytes, but enabling the fast interrupt results in 172 additional bytes of stack space used, and I can't figure out where those other 72 bytes come from.

GB

Giovanni
Site Admin
Posts: 14444
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 1074 times
Been thanked: 921 times

Postby Giovanni » Tue Dec 01, 2020 6:28 pm

Hi,

If you are mixing interrupts that use the OS macros for ISRs and "naked" ISRs you need to make sure that all "naked" ISRs have higher priority than all OS ISRs.

http://www.chibios.org/dokuwiki/doku.ph ... ort_armv7m See "IRQ Priority Ordering".

Is that the case in your application?

Giovanni

Postby geebee » Tue Dec 01, 2020 8:55 pm

I'm doing something like this (I cut out and adapted some parts, sorry if there are accidental errors here, the actual code compiles and mostly works other than the stack issue):

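Code:

/* Fast ("naked") ISR for SAI2: no OS prologue/epilogue, so no OS calls here. */
void Vector1AC(void)
{
  //...
  if (condition)
    SCB->ICSR = SCB_ICSR_PENDSVSET_Msk;  /* defer the OS work to PendSV */
}

/* Regular OS-aware ISR, triggered by software from the fast ISR above. */
void PendSV_Handler(void)
{
  OSAL_IRQ_PROLOGUE();
  chSysLockFromISR();
  chThdResumeI(&threadPtr, (msg_t)0);  /* wake the waiting thread */
  chSysUnlockFromISR();
  OSAL_IRQ_EPILOGUE();
}

int main(void)
{
  //...
  nvicEnableVector(SAI2_IRQn, 1);
  //...
}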


The firmware also uses a number of regular interrupts via ChibiOS (Serial, USB, I2C, etc).

If I understand correctly, the following should be happening:

1) a thread is running
2) a regular interrupt fires (stack: extctx, created by the interrupt)
3) the interrupt completes, and as a result another thread is ready to run (stack: extctx + fake extctx to call _port_switch_from_isr)
4) execution returns to the thread in locked state from the ISR, now running _port_switch_from_isr (stack: extctx)
5) as part of switching out, chSchDoReschedule() calls _port_switch, which effectively pushes the remaining registers on the original thread's stack before switching (stack: extctx, intctx)
6) the stack is switched to the newly running thread's (new thread's stack: extctx, intctx)
7) the new thread's intctx is restored before returning and unlocking

But I suspect one thing that can go wrong is this: if the fast interrupt is serviced after intctx is pushed in step #5 but before sp is changed in step #6, the first thread's stack will be: extctx, intctx, extctx (from the fast interrupt).
Likewise, if it's serviced after sp is changed in #6 but before intctx is restored in #7, the second thread can also end up with extctx, intctx, extctx on its stack.
The locking prevents regular interrupts from doing that, but fast interrupts would bypass that mechanism, and so PORT_WA_SIZE should account for that if fast interrupts are used.

Unfortunately without watchpoints it's hard to know for sure, and my theory above does not seem to explain everything, since applying these changes

Code:

--- a/os/common/ports/ARMCMx/compilers/GCC/chcoreasm_v7m.S
+++ b/os/common/ports/ARMCMx/compilers/GCC/chcoreasm_v7m.S
@@ -76,6 +76,7 @@
                 .thumb_func
                 .globl  _port_switch
 _port_switch:
+                cpsid i
                 push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}
 #if CORTEX_USE_FPU
                 /* Saving FPU context.*/
@@ -170,7 +171,9 @@ _port_switch:
                 /* Restoring FPU context.*/
                 vpop    {s16-s31}
 #endif
-                pop     {r4, r5, r6, r7, r8, r9, r10, r11, pc}
+                pop     {r4, r5, r6, r7, r8, r9, r10, r11}
+                cpsie i
+                pop     {pc}

only reduces the maximum stack usage by 56 bytes, still well above where it should be.

GB

Postby Giovanni » Tue Dec 01, 2020 11:10 pm

Hi,

I think you nailed the problem: the context switch does not prevent fast interrupts from occurring, so an extra extctx can be pushed onto the current thread's stack, and that is hard to catch.

I cannot put extra space in PORT_WA_SIZE by default because fast interrupts are not a common use case and that would take a lot of space for each thread in the system.

You may allocate enough space for each thread by increasing PORT_INT_REQUIRED_STACK to accommodate an extra extctx.

The problem is that in this port the context switch is performed outside of exception state, so the extctx of a fast interrupt is pushed on the PSP (the thread's stack); this is unavoidable with this context switch scheme. Blocking fast interrupts as you tried would make them no longer "fast", because it would add jitter.

A possible solution would be to introduce an alternative port performing an in-exception context switch, as we already have in trunk code for ARMv8-M-ML-TZ, but it would take PendSV for itself for the IRQ tail context switch (you may use the NVIC for more SW-triggered IRQs anyway).

Giovanni

Postby geebee » Mon Dec 07, 2020 4:28 pm

Agreed, that was definitely not a proposed solution, just a quick test to check my theory about what could be happening.

I think it would be great to find a way for ChibiOS to automatically set the constants to the right value if possible, since figuring out these kinds of issues is quite complicated, at least for me. One way around that would be to make fast interrupts a feature that needs to be explicitly enabled, so for example you would need something like

Code:

#define CH_CFG_USE_FAST_INTERRUPTS   TRUE


And make the definition of CH_FAST_IRQ_HANDLER depend on that configuration parameter, as well as modify PORT_WA_SIZE to allocate space for an additional extctx if fast interrupts are used.

The easiest way would be to just not define the fast irq macros if the feature is not enabled, but if you are worried about breaking people's projects you could do something like this:

Code:

#if CH_CFG_USE_FAST_INTERRUPTS
#define CH_FAST_IRQ_HANDLER(id) PORT_FAST_IRQ_HANDLER(id)
#else
/* Any call to this function that survives optimization fails the build. */
void fast_isr_error(void) __attribute__((error("must enable CH_CFG_USE_FAST_INTERRUPTS"),noinline));
#define CH_FAST_IRQ_HANDLER(id) \
  void id##_trigger(void) __attribute__((used)); \
  void id##_trigger(void) {fast_isr_error();}    \
  PORT_FAST_IRQ_HANDLER(id)
#endif


and then in some C file define fast_isr_error as anything nonempty

Code:

#if !CH_CFG_USE_FAST_INTERRUPTS
void fast_isr_error(void)
{
  chSysHalt("");
}
#endif


which will result in an error like this when building with gcc or clang/llvm:

Code:

Linking build/ch.elf
src/main.cpp: In function 'STM32_SAI2_HANDLER_trigger':
src/main.cpp:268:1: error: call to 'fast_isr_error' declared with attribute error: must enable CH_CFG_USE_FAST_INTERRUPTS


(note: for clarity I left out the whole part checking for C++ and defining fast_isr_error as extern "C" if so).


With that said, I'm still having some trouble fully understanding what's going on with the idle thread. I should say that in practice all my problems are with the idle thread, because we monitor stack usage and size the work areas accordingly, so whatever is not accounted for in PORT_WA_SIZE gets added to the number we pass as the WA size.

Anyway, when inspecting the bottom of the stack for the idle thread (top of the working area), I see this (sorry for the lack of fixed width, I wasn't sure how to have fixed width and highlighting):

0x200025d8 <ch_idle_thread_wa+400>: 0x55555555 0x080023a1 0x00000000 0x0800041f
0x200025e8 <ch_idle_thread_wa+416>: 0x00000000 0x2000311c 0x00000000 0x00000000
0x200025f8 <ch_idle_thread_wa+432>: 0xffffffff 0x0800040f 0x080023a0 0x61030000
0x20002608 <ch_idle_thread_wa+448>: 0x00000000 0x00000000 0x00000000 0x00000000
0x20002618 <ch_idle_thread_wa+464>: 0x00000000 0x00000000 0x00000000 0x00000000
0x20002628 <ch_idle_thread_wa+480>: 0x00000000 0x00000000 0x00000000 0x00000000
0x20002638 <ch_idle_thread_wa+496>: 0x00000000 0x00000000 0x00000000 0xffffffff
0x20002648 <ch_idle_thread_wa+512>: 0x00000000 0x08000405 0x200028dc

where

Code:

(gdb) info line *0x08000405
Line 198 of "deps/tl-chibi/ChibiOS/os/common/ports/ARMCMx/compilers/GCC/chcoreasm_v7m.S"
   starts at address 0x8000404 <_port_thread_start+4> and ends at 0x8000406 <_port_thread_start+6>.


and 0x0800040f is actually the infinite loop. So if I understand correctly there is a whole additional intctx left over from PORT_SETUP_CONTEXT, which in this case is useless since we never return from the infinite loop. If there is no good way to get rid of that, it should probably be accounted for in PORT_IDLE_THREAD_STACK_SIZE.

Also I realized I never said it earlier, I'm working with ChibiOS version 20.3 stable.

Thanks!

GB

Postby Giovanni » Mon Dec 07, 2020 8:24 pm

Hi,

The intctx in the idle thread cannot be avoided: it is its own context, so when "idle" is not the current thread there is an intctx pushed on its stack.

I will look into that fast interrupts thing.

Giovanni

Postby geebee » Mon Dec 07, 2020 11:24 pm

I think I see what's going on. I was thrown off by the address of _port_thread_start at the very beginning of the stack, so I assumed there was some unwanted padding at the beginning. It seems that it is just a value left over from the initial PORT_SETUP_CONTEXT, simply because it ends up in the space of port_extctx's "reserved" field, and since the idle thread doesn't touch the stack, that value just stays there indefinitely.

I also think I didn't count the usage correctly in my initial message. To confirm: with fast interrupts I see at most 328 additional bytes used, corresponding to 2x port_extctx, 1x port_intctx, and an extra 20 bytes. I'm not sure exactly where those 20 bytes come from, but they should be covered by the other parameters once an extra sizeof(port_extctx) is added to PORT_WA_SIZE when fast interrupts are enabled.

Thanks for looking into this!

GB

Postby Giovanni » Sat Feb 13, 2021 10:18 am

bump

Postby geebee » Mon Mar 01, 2021 4:17 am

This is what I ended up changing in ChibiOS:

Code:

--- a/os/common/ports/ARMCMx/chcore_v7m.h
+++ b/os/common/ports/ARMCMx/chcore_v7m.h
@@ -502,9 +502,18 @@ struct port_context {
  * @brief   Computes the thread working area global size.
  * @note    There is no need to perform alignments in this macro.
  */
+
+#if CH_CFG_USE_FAST_IRQS
+#define PORT_WA_CTX_SIZE (sizeof (struct port_intctx) +                     \
+                          sizeof (struct port_extctx) +                     \
+                          sizeof (struct port_extctx))
+#else
+#define PORT_WA_CTX_SIZE (sizeof (struct port_intctx) +                     \
+                          sizeof (struct port_extctx))
+#endif
+
 #define PORT_WA_SIZE(n) ((size_t)PORT_GUARD_PAGE_SIZE +                     \
-                         sizeof (struct port_intctx) +                      \
-                         sizeof (struct port_extctx) +                      \
+                         PORT_WA_CTX_SIZE +                                 \
                          (size_t)(n) +                                      \
                          (size_t)PORT_INT_REQUIRED_STACK)
 
diff --git a/os/rt/include/chsys.h b/os/rt/include/chsys.h
index 74cbf04e..39f62261 100644
--- a/os/rt/include/chsys.h
+++ b/os/rt/include/chsys.h
@@ -149,7 +149,9 @@
  *
  * @special
  */
+#if CH_CFG_USE_FAST_IRQS == TRUE
 #define CH_FAST_IRQ_HANDLER(id) PORT_FAST_IRQ_HANDLER(id)
+#endif
 /** @} */
 
 /**


And with this by default in chconf

Code:

#if !defined(CH_CFG_USE_FAST_IRQS)
#define CH_CFG_USE_FAST_IRQS                FALSE
#endif


which can then be changed, or defined elsewhere, to TRUE to enable fast interrupts (and the consequently larger work areas). That's not a complete solution; for example, the other files in os/common/ports definitely need to change too. I'm not sure if this is the way you plan on doing it, but I hope it's useful at least as a starting point.


While you look at this, one thing I found very useful when debugging this issue is adding a fixed number of bytes to each work area, which can then be used later to check whether the work area overflowed. I still use it for tests, because it gives a good idea of the proper work area size, e.g. by increasing all work areas by 256 bytes with filling enabled, and then periodically checking that at least 256 bytes are still at the initial fill value.

Without extra padding during testing, and with "tight" work areas, I often ran into the problem that a number of registers (usually floating point) still held the initial fill value, and those were saved right at the beginning of the work area, making it seem like there was no overflow when actually some values before the work area were getting corrupted.

Thank you!

GB

Postby Giovanni » Mon Mar 01, 2021 7:51 am

Hi,

You can increase PORT_INT_REQUIRED_STACK, it is added to all working areas.

About the fix, it looks OK, but the option switch should not be CH_CFG_ because this is a Cortex-M port detail, not something needed for all architectures; it should become PORT_CFG_ etc.

Giovanni

