QSPI register corruption

steved
Posts: 823
Joined: Fri Nov 09, 2012 2:22 pm
Has thanked: 12 times
Been thanked: 135 times

Postby steved » Fri Jul 24, 2020 5:42 pm

I have a strange QSPI problem where the symptom is that the QSPI address register occasionally gets set to zero - sometimes immediately after my code had set it to something else.

Sorry, it's a long post, but there's lots of information!

Using F767, initially with compiler V7.2.1, latterly with V9.3.1, Chibi 19.1.3 with a selection of updates from SVN, plus the QSPI routines from trunk/V20. No non-Chibi interrupts.

There's more explanation at the end; the key point is that the corruption seems to be happening in a context switch, usually when a "double switch" (preemption) is required.

Probably the clearest example is shown in QSPI_Trace_60_short.png.
Steps -12, -4 are the start of a QSPI transaction; set it going, then wait in the idle thread
Steps -11, -3 are the QSPI interrupt which occurs on completion of the transaction
Steps -10, -2 show AR, SR early on in the ISR
Steps -9, -1 show AR, SR after executing the QSPI "end of transaction" macro
Step 0 is an abnormal completion - AR is OK in the ISR exit, but zero in the idle->main context switch

Looking at the code flow from step 0, it's as follows:

Code:

   OSAL_IRQ_EPILOGUE();      // Checks AR; non-zero
   _port_irq_epilogue();
      _port_switch_from_isr()
         chSchDoReschedule()
              thread_t *otp = currp;

              /* Picks the first thread from the ready queue and makes it current.*/
              currp = queue_fifo_remove(&ch.rlist.queue);
              currp->state = CH_STATE_CURRENT;

              /* Handling idle-leave hook.*/
              if (otp->prio == IDLEPRIO) {
               CH_CFG_IDLE_LEAVE_HOOK();      <--- Corruption detected here
              }


I have other examples showing the problem arising where two interrupts occur end to end, without an intervening thread switch. Here the corruption occurs between the start of the QSPI interrupt, and the chSysLockFromISR() immediately before the next trace write.
In these, the corruption is consistently picked up in the OSAL_IRQ_EPILOGUE() macro. On occasion it has been any ISR, not just the QSPI one!

My QSPI usage means that it never sets the QSPI address register to zero after initialisation, and as far as I can tell nor should any on-chip mechanism.

And the puzzling thing is that the corruption appears when Chibi is in control.

Has anyone else encountered something like this? Or any suggestions on how to debug further? Or am I missing something very obvious?


Further explanation and notes
=============================
The example code is in the startup sequence (After the normal halInit() and chSysInit()), with very little other activity (as can be seen from the trace).
I disable caching on all RAM.

The QSPI address register can only be written to when the QSPI is busy, which limits the time when this can happen to a short period between transfer start and transfer complete. So according to the logged status, it shouldn't be possible to update AR.

I have corruption checks in CH_CFG_IDLE_ENTER_HOOK(), CH_CFG_IDLE_LEAVE_HOOK(), CH_CFG_CONTEXT_SWITCH_HOOK(), CH_CFG_IRQ_PROLOGUE_HOOK() and CH_CFG_IRQ_EPILOGUE_HOOK(), as well as immediately after writing to the register.
In the example, CH_CFG_IDLE_LEAVE_HOOK() was triggered.
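
For anyone wanting to reproduce this kind of check, here is a minimal sketch of the idea, not the actual debug code from the attachment. It assumes a shadow copy of the last value written to AR is kept and compared from each hook. The names qspi_regs_t, ar_monitor_t, ar_write() and ar_check() are hypothetical, and the register block is a plain struct standing in for the real peripheral so the logic can be tried on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the QUADSPI register block. On target this
   would be the peripheral from the device header; only AR is modelled. */
typedef struct {
  volatile uint32_t AR;            /* address register */
} qspi_regs_t;

/* Hypothetical bookkeeping for the corruption checks. */
typedef struct {
  uint32_t    expected_ar;         /* last value deliberately written */
  uint32_t    corruption_count;    /* number of mismatches seen so far */
  const char *first_site;          /* hook that first saw a mismatch */
} ar_monitor_t;

/* Write AR and remember what was written. */
static void ar_write(qspi_regs_t *q, ar_monitor_t *m, uint32_t addr) {
  assert(q != NULL && m != NULL);
  q->AR = addr;
  m->expected_ar = addr;
}

/* Intended to be called from each hook (idle enter/leave, context switch,
   IRQ prologue/epilogue). Returns true while AR still matches. */
static bool ar_check(const qspi_regs_t *q, ar_monitor_t *m, const char *site) {
  assert(q != NULL && m != NULL);
  if (q->AR == m->expected_ar) {
    return true;
  }
  if (m->corruption_count == 0u) {
    m->first_site = site;          /* keep only the earliest detection point */
  }
  m->corruption_count++;
  return false;
}
```

Recording only the first detection site keeps the check cheap enough to call from every hook, and narrows the window in which the register changed to the interval between two consecutive checks.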

All interrupts which might be enabled are from normal Chibi drivers.

The detail and frequency of the problem varies as I add and subtract code, and also as I swap between -O0 and -Og. But I can usually trigger the problem.

There's plenty of stack space, and all Chibi debug options are enabled.
Statistics enabled (also tried disabled; no change).
FPU disabled.
The "ready list" threads look good (just main, idle)

No relevant errata on the QSPI from ST (although there's one for other F7 family devices; doesn't change anything).


There is a slight possibility that CAN-related code plays a part; if I strip out all my CAN code, leaving the Chibi-level drivers enabled, the problem still occurs. If I disable the Chibi Drivers, the problem goes away.

I've checked the DMA registers, and there's nothing to suggest that DMA is responsible. (QSPI is the only active user of DMA).


(I have relatively briefly tried both GCC V5.4.1 and GCC V8.3.1 - no crashes at the time, but have changed things a bit since then.)

Above tests done with STM32_WSPI_QUADSPI1_PRESCALER_VALUE 5 (43MHz I think).
I have also tried a few runs with prescaler values of 8 and 11, all of which failed in the same way.

The same problem occurs on two different sets of hardware (essentially an F767 Nucleo plugged into a carrier board which buffers up all the ports).

File hal_wspi_lld_extract.c shows the relevant parts of the LLD, including my debug checks
Attachment: hal_wspi_lld_extract.7z (1.01 KiB)

Giovanni
Site Admin
Posts: 14444
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 1074 times
Been thanked: 921 times

Postby Giovanni » Fri Jul 24, 2020 9:06 pm

I don't see how the RTOS can change the AR register. It is possible that the QSPI itself is clearing it after entering a strange state; I don't think the CPU is really writing it.

If you want to rule out the RTOS then you could try doing an RTOS-less test.

Giovanni

Postby steved » Sat Jul 25, 2020 11:39 am

Giovanni wrote: I don't see how the RTOS can change the AR register. It is possible that the QSPI itself is clearing it after entering a strange state; I don't think the CPU is really writing it.

I agree with the premise. Especially as another scenario I have often seen involves the QSPI starting another transaction on its own; so that it's already busy when my code next tries to start a transfer. Probably that's also what happens here.
However, as always with these strange things, I wonder why no one else appears to have seen this; people have definitely been using QSPI. I'm basically using standard ChibiOS in this area (added debug code excepted). I am using a different flash chip, so a slightly different driver, modelled on the one in ChibiOS, and that appears solid (not that much to change; just enough to be annoying).

Postby Giovanni » Sat Jul 25, 2020 3:54 pm

QSPI is a pure master; I don't see how the flash chip type can affect its operations.

It could be something electrical in nature causing glitches somehow, have you tried lowering QSPI clock frequency?

Giovanni

Postby steved » Sat Jul 25, 2020 4:16 pm

Giovanni wrote: QSPI is a pure master; I don't see how the flash chip type can affect its operations.
I agree
Giovanni wrote:It could be something electrical in nature causing glitches somehow, have you tried lowering QSPI clock frequency?
Tried several lower clock frequencies; also two sets of hardware, bench PSU instead of "normal" dc-dc converter....
I suppose it might not like being connected to my PC via USB (embedded ST-link) and serial, but then difficult to see what's going on!

Postby steved » Tue Mar 22, 2022 4:28 pm

Finally revisiting this project, to discover that ST have recently released an updated Errata (https://www.st.com/resource/en/errata_s ... ronics.pdf). Quite a lot of new stuff, with 2.4.4 having similarities to my problem:
Memory-mapped access in indirect mode clearing QUADSPI_AR register
Description
Memory-mapped accesses to the QUADSPI peripheral operating in indirect mode unduly clear the QUADSPI_AR register to 0x00.
Workaround
Adopt one of the following measures:
• Avoid memory-mapped accesses to the QUADSPI peripheral operating in indirect mode.
• After each memory-mapped access to the QUADSPI operating in indirect mode, write the QUADSPI_AR register with a desired value

I've assumed that 'Memory-mapped accesses to the QUADSPI peripheral' just means 'normal QSPI peripheral register access'.

Shouldn't be the problem (the Chibi wspi driver writes the address register last of all), and I suspect I'm chasing an out of bounds write elsewhere which just happens to affect QSPI. But wondered if anyone else had encountered this erratum in the wild.

Postby Giovanni » Tue Mar 22, 2022 6:47 pm

Hi,

"Memory-mapped access in indirect mode" does not make much sense; my understanding is that indirect mode is what you do when you are not in memory-mapped mode.

Perhaps a memory access to the QSPI memory area while it is not in memory-mapped mode has that register clearing as a side effect? If this is the case then our driver should not be impacted by this.

Giovanni

rew
Posts: 380
Joined: Sat Jul 19, 2014 12:59 pm
Has thanked: 2 times
Been thanked: 13 times

Postby rew » Tue Apr 05, 2022 10:34 am

The way I read it, you can also map the chip connected through QSPI as a memory region, i.e. you just access memory at 0x...something and the hardware will push out a QSPI read cycle, wait for the results and send the CPU on its way with the result when it's done. Loads of wait states, but for scattered memory locations accessed only once it may be a good way to operate.

The way I read it, you cannot access the memory-mapped region if you're also using the register access version. Now, as you're obviously not consciously using the memory-mapped region, it is probably disabled... The erratum hints that this doesn't matter. So a random pointer into the memory-mapped QSPI region may trigger the bug.

I know my lowly M0 already has an MPU. Surely yours has one too. Can you use the MPU to map that memory as "off limits"? Then you should get an exception the moment anybody accesses that memory region.

Worst case, there is another erratum that triggers the clearing of the AR register in some other way. But the actual trigger is still unknown....
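
As a rough illustration of the MPU idea above, the sketch below only computes the ARMv7-M MPU register values for a no-access region; it does not touch hardware. It assumes the QUADSPI memory-mapped window on the F7 is the 256 MiB bank at 0x90000000 (check the reference manual for your part). The helpers mpu_rbar() and mpu_rasr_no_access() are hypothetical names, not part of CMSIS or ChibiOS.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU field positions for RBAR and RASR. */
#define MPU_RBAR_VALID      (1u << 4)    /* use REGION field of this write */
#define MPU_RASR_ENABLE     (1u << 0)
#define MPU_RASR_SIZE_POS   1u           /* SIZE field, bits 5:1 */
#define MPU_RASR_AP_POS     24u          /* AP field, bits 26:24 */
#define MPU_RASR_XN         (1u << 28)   /* execute never */
#define MPU_AP_NO_ACCESS    0u           /* any CPU access faults */

/* Assumption: QUADSPI memory-mapped window, 256 MiB at 0x90000000. */
#define QSPI_MAP_BASE       0x90000000u
#define QSPI_MAP_SIZE_LOG2  28u

/* Value for MPU->RBAR: base address, VALID bit and region number. */
static uint32_t mpu_rbar(uint32_t base, uint32_t region) {
  assert(region < 16u);            /* REGION field is 4 bits wide */
  assert((base & 0x1Fu) == 0u);    /* low bits are VALID/REGION, not address */
  return base | MPU_RBAR_VALID | region;
}

/* Value for MPU->RASR: enabled region of 2^size_log2 bytes, no access,
   execute never. The SIZE field encodes log2(size) - 1. */
static uint32_t mpu_rasr_no_access(uint32_t size_log2) {
  assert(size_log2 >= 5u && size_log2 <= 32u);
  return MPU_RASR_XN
       | (MPU_AP_NO_ACCESS << MPU_RASR_AP_POS)
       | ((size_log2 - 1u) << MPU_RASR_SIZE_POS)
       | MPU_RASR_ENABLE;
}
```

On target the two values would presumably be written to MPU->RBAR and MPU->RASR, with the MPU then enabled and the default memory map kept for privileged code. A stray CPU access into the window should then raise a MemManage fault (or a HardFault if MemManage is not enabled). Note that the MPU only polices CPU accesses, so a stray DMA transfer into the region would not be caught this way.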

Postby Giovanni » Tue Apr 05, 2022 12:25 pm

The driver does not touch registers while in memory mapped mode, so this thing should have no effect.

Giovanni

Postby rew » Tue Apr 12, 2022 10:22 am

He is not in memory mapped mode. Or at least trying not to use it.

Another way to debug this might be with data breakpoints. Not sure if you can trigger on a region... A quick research session hints at: no ranges allowed.

