SDIOv1/SDMMCv1 hangs system forever on transfer error

tpw_rules
Posts: 5
Joined: Wed Nov 12, 2025 4:35 am
Been thanked: 3 times

Postby tpw_rules » Sat Nov 15, 2025 8:42 pm

Using the ArduPilot fork of ChibiOS on an STM32F765-based Pixhawk 4 Mini flight controller with an SD card prone to errors (in this case caused by mechanical vibration), the SDMMCv1 peripheral can signal a CRC error in the middle of a transfer.

When this happens, the peripheral stops transferring bytes, so the DMA can never complete and the processor hangs forever here: https://github.com/ChibiOS/ChibiOS/blob ... lld.c#L289 . Only a watchdog reset can recover the system.

I did some additional investigation:
* SDIOv1 is vulnerable to the same problem (as far as I can tell, it is the only other driver that uses the dmaWaitCompletion function).
* SDMMCv2 is not vulnerable because it stops DMA unconditionally. SDMMCv1 cannot do that: its DMA is external, so we can't know the DMA is complete by the time the DATAEND interrupt is taken. On SDMMCv2 the reference manual says this is known.

I therefore propose the following patches (attached in one file):
* Unlock the system as soon as possible, so that a stuck wait hangs only the calling thread instead of the whole system.
* Only wait for DMA completion in the success case defined in the reference manual (note the special check the reference manual requires for a late error). This should only be needed on the read path anyway, but I did not want to complicate the patch too much. We already shut down the DMA properly in sdc_lld_error_cleanup, so no additional code is needed.
* Also enable interrupts for an undocumented status bit, to avoid cases where no other status bit is asserted after a read and the driver sleeps forever.
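
The second item above can be sketched as a small decision function. This is a host-side model under stated assumptions, not the patch itself: the `XFER_*` flags, `xfer_action_t`, `xfer_end_action` and `xfer_end_action_demo` are hypothetical names invented for illustration, the flags are not the real SDMMC status bits, and the reference manual's late-error check is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status flags for this sketch; NOT the real SDMMC status bits. */
#define XFER_DATAEND   (1u << 0)  /* peripheral reports end of data */
#define XFER_CRC_FAIL  (1u << 1)  /* data CRC error */
#define XFER_TIMEOUT   (1u << 2)  /* data timeout */
#define XFER_OVERRUN   (1u << 3)  /* FIFO overrun or underrun */
#define XFER_ERRORS    (XFER_CRC_FAIL | XFER_TIMEOUT | XFER_OVERRUN)

typedef enum {
  XFER_WAIT_FOR_DMA,  /* success: the DMA will drain, waiting is safe */
  XFER_ABORT_DMA      /* failure: the DMA may never finish, stop it instead */
} xfer_action_t;

/* Decide what to do with the DMA stream once the transfer interrupt fires. */
xfer_action_t xfer_end_action(uint32_t status) {
  if ((status & XFER_ERRORS) != 0u) {
    /* Any error wins, even if the end-of-data flag is also set. */
    return XFER_ABORT_DMA;
  }
  if ((status & XFER_DATAEND) != 0u) {
    return XFER_WAIT_FOR_DMA;
  }
  /* Neither success nor a known error: do not risk waiting forever. */
  return XFER_ABORT_DMA;
}

/* Usage example: a clean transfer waits, a CRC error mid-transfer aborts. */
void xfer_end_action_demo(void) {
  assert(xfer_end_action(XFER_DATAEND) == XFER_WAIT_FOR_DMA);
  assert(xfer_end_action(XFER_CRC_FAIL) == XFER_ABORT_DMA);
}
```

The point of the model is only that the wait is reached on one path; every other path goes to the cleanup that disables the DMA.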

With these patches I can now repeatedly vibrate the SD card and cause dozens of disconnections/re-connections without hanging the thread or the system. If you are happy with the proposed patches, I can also port them and test them on the SDIOv1 driver.
Attachments
chibios-sdmmcv1-fixes-2025-11-15.patch.zip
(1.98 KiB) Downloaded 5 times

Postby tpw_rules » Sun Nov 30, 2025 6:11 pm

I have updated the SDMMCv1 patches to take the start bit error detection logic from SDIOv1. It turns out that is the mysterious reserved status bit I saw asserted sometimes. I don't know why the reference manual documents it as reserved.

The new patches are attached (and replace the old ones). I also verified that they apply to current ChibiOS master.
Attachments
chibios-sdmmcv1-fixes-2025-11-30.zip
(2.41 KiB) Not downloaded yet

Postby tpw_rules » Sun Nov 30, 2025 6:15 pm

Additionally, attached are the corresponding patches for SDIOv1. The driver already performs the start bit error detection on some chips (but not all; this may be another fib on ST's part to correct later). I also incorporated a patch from ChibiOS, already applied to SDMMCv1, that moves transfer preparation before transfer start. I did not include the special error check because the reference manual does not mention it.

This has been tested in the same situations on an STM32F427-based flight controller and again brings it from hanging to working properly. It also applies to ChibiOS master.
Attachments
chibios-sdiov1-fixes-2025-11-30.zip
(1.82 KiB) Not downloaded yet

Giovanni
Site Admin
Posts: 14733
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 1156 times
Been thanked: 965 times

Postby Giovanni » Mon Dec 01, 2025 6:19 am

Hi,

It is "in queue". I first need to finish some important changes in the HAL, among them waiting loops with timeout capability, which could be relevant to this problem.

Giovanni

Postby tpw_rules » Tue Dec 02, 2025 2:58 am

Thanks for the information.

I think adding a timeout could be useful for the sleeps in these drivers. I'm not sure it's the best idea for the DMA waits, but it could be useful insurance after the proposed adjustments. I am also unsure why only this driver needs to do DMA waits.

Postby Giovanni » Tue Dec 02, 2025 5:12 am

Loops with timeout will be gradually introduced for all drivers (not many drivers have such loops anyway). The first use case is clock initialization; this has been implemented for U0, U3 and H5, with others to follow.

The idea is to move the whole HAL and RT toward a concept of functional safety.

About DMA, it is probably a good idea to add timeouts to the waiting loops. My understanding is that the peripheral stops triggering the DMA and the driver is stuck waiting for it.

Giovanni
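
For readers following the thread, a waiting loop with a timeout might look roughly like the sketch below. It is a host-side model only: `fake_dma_t`, `fake_dma_is_enabled` and `dma_wait_completion_bounded` are hypothetical names, the "hardware" is simulated, and the budget is counted in poll iterations purely as an assumption (a real driver would more likely measure time). It does not show the actual ChibiOS HAL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMA stream; real code would poll hardware. */
typedef struct {
  bool     enabled;        /* models the "stream still enabled" state */
  uint32_t polls_to_done;  /* polls until the fake hardware finishes;
                              UINT32_MAX models a stalled peripheral */
} fake_dma_t;

/* One poll of the simulated hardware. */
bool fake_dma_is_enabled(fake_dma_t *dma) {
  if (dma->enabled && dma->polls_to_done != UINT32_MAX) {
    if (dma->polls_to_done == 0u) {
      dma->enabled = false;      /* transfer finished */
    } else {
      dma->polls_to_done--;      /* still moving data */
    }
  }
  return dma->enabled;
}

/* Wait for completion, giving up once the poll budget is used up.
   Returns true on completion, false on timeout. */
bool dma_wait_completion_bounded(fake_dma_t *dma, uint32_t max_polls) {
  assert(dma != NULL);
  while (fake_dma_is_enabled(dma)) {
    if (max_polls == 0u) {
      dma->enabled = false;      /* force-stop, as an error cleanup would */
      return false;
    }
    max_polls--;
  }
  return true;
}
```

With a stalled stream the unbounded version of this loop never returns, which matches the hang described at the top of the thread; the bounded version returns false and leaves the stream disabled so the caller can report an error.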

