Hi,
I think there is an issue with the FatFS bindings on the STM32H7, and in general on systems with a data cache.
I can't get things working without adding the cache invalidate/flush suggested for DMA, but even with those calls it does not get past initialization.
Debugging in depth showed that FatFS's read/write buffer is not aligned to a cache line and that important internal state variables share its cache lines, so when the buffer is invalidated before reading the second or third sector that state gets corrupted.
I attached a suggestion for a small fix for it:
- add cacheBufferInvalidateUnaligned/cacheBufferFlushUnaligned
- use them in disk_read/disk_write
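As a sketch of the logic such helpers could implement (host-testable, with the real cacheBufferFlush()/cacheBufferInvalidate() replaced by recording stubs; the 32-byte line size and all names here are illustrative, not the actual patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32U

/* Host-side stand-ins for cacheBufferFlush()/cacheBufferInvalidate();
   they only record the ranges they were asked to operate on. */
static uintptr_t flushed[4];
static int nflush;
static uintptr_t inv_addr;
static size_t inv_size;

static void cache_flush(uintptr_t addr, size_t size) {
  (void)size;
  flushed[nflush++] = addr;
}

static void cache_invalidate(uintptr_t addr, size_t size) {
  inv_addr = addr;
  inv_size = size;
}

/* Invalidate an arbitrary (possibly unaligned) buffer: first flush the
   partially covered boundary lines so adjacent variables are not lost,
   then invalidate the whole aligned range covering the buffer. */
static void cache_invalidate_unaligned(uintptr_t start, size_t len) {
  uintptr_t stop   = start + len;
  uintptr_t astart = start & ~(uintptr_t)(CACHE_LINE - 1U);                    /* round down */
  uintptr_t astop  = (stop + CACHE_LINE - 1U) & ~(uintptr_t)(CACHE_LINE - 1U); /* round up   */

  if (astart != start)                     /* leading partial line     */
    cache_flush(astart, CACHE_LINE);
  if ((astop != stop) &&                   /* trailing partial line,   */
      ((astop - CACHE_LINE != astart) ||   /* unless it is the leading */
       (astart == start)))                 /* line already flushed     */
    cache_flush(astop - CACHE_LINE, CACHE_LINE);

  cache_invalidate(astart, astop - astart);
}
```

For a 32-byte buffer at address 16 this flushes lines 0 and 32, then invalidates bytes 0-63.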
Notes:
- I developed and tested this thoroughly on ChibiOS 20.3, and briefly checked that it still works on the master branch
- The patch has no comments yet and some formatting mismatches
Thanks!
GB
FatFS interface on H7
- alex31
- Posts: 379
- Joined: Fri May 25, 2012 10:23 am
- Location: toulouse, france
- Has thanked: 38 times
- Been thanked: 62 times
- Contact:
Re: FatFS interface on H7
Hello,
I can confirm that we have unexplained filesystem corruption using SD/FATFS/ChibiOS 20.3 on L1 cache fitted MCU.
I don't know if our problem is related to the bug you mention; I should give it a try.
Alexandre
- Giovanni
- Site Admin
- Posts: 14455
- Joined: Wed May 27, 2009 8:48 am
- Location: Salerno, Italy
- Has thanked: 1076 times
- Been thanked: 922 times
- Contact:
Re: FatFS interface on H7
alex31 wrote:Hello,
I can confirm that we have unexplained filesystem corruption using SD/FATFS/ChibiOS 20.3 on L1 cache fitted MCU.
I don't know if our problem is related to the bug you mention; I should give it a try.
Alexandre
I think it is related to that, FatFS support is not cache-aware yet. Those unaligned functions are a good idea.
Giovanni
- alex31
Re: FatFS interface on H7
Hello geebee,
I have a question about your implementation: you do up to two cacheBufferFlush calls in your invalidate function; can you explain the rationale behind that?
I'm not saying it's a problem, I just want to understand.
I have another concern, about the size calculation: in my opinion, if you want to flush a buffer that crosses a line boundary, you have to flush all the touched lines.
Example: a 32-byte buffer at address 16 touches two cache lines, 0-31 and 32-63, so a flush(0x0, 64) should be emitted, but your macro will emit flush(0x0, 32):
compiler explorer snip
(I have replaced uint32_t* with uint64_t* in the snippet because Compiler Explorer runs on a 64-bit machine, but of course uint32_t is correct in your initial code for ChibiOS)
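The outward rounding behind that example can be sketched as follows (32-byte line size assumed; the helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 32U

/* Round a range [start, start + len) outward to cache-line boundaries:
   the start goes down to its own line, the end goes up to the next one. */
static uintptr_t align_down(uintptr_t addr) {
  return addr & ~(uintptr_t)(CACHE_LINE - 1U);
}

static uintptr_t align_up(uintptr_t addr) {
  return (addr + CACHE_LINE - 1U) & ~(uintptr_t)(CACHE_LINE - 1U);
}
```

For the 32-byte buffer at address 16, align_down(16) gives 0 and align_up(16 + 32) gives 64, so the flush must cover 64 bytes starting at 0.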
Thanks
Alexandre
edited 10/02/21 -- 14h25
Re: FatFS interface on H7
alex31 wrote:Hello geebee,
I have a question about your implementation: you do up to two cacheBufferFlush calls in your invalidate function; can you explain the rationale behind that?
I'm not saying it's a problem, I just want to understand.
FatFS has something like this (conceptually):
Code:
struct __attribute__((packed)) state_and_buffer {
  uint32_t internal_state;
  uint8_t  buffer[32];
};
If you do not flush: when you f_mount(), two cache lines are invalidated, one holding internal_state plus 28 bytes of buffer, the other holding the remaining 4 bytes of buffer plus whatever follows. Then buffer is filled in with data from DMA, and FatFS uses that data to compute some value for internal_state.
Now say you want to open a file and need to read in another sector. You invalidate the same two lines and start the DMA. Once the operation completes and the cache line is re-read, internal_state is also re-read from SRAM, so the state computed in the previous step is discarded.
So flushing before invalidating is needed to ensure that the values written to the variables sharing cache lines with the buffer are not lost.
The reason why it's done up to two times is that you can have that situation at the beginning and at the end of the buffer.
Looking back at my code now, I see a bug when start is aligned but stop is not and the buffer fits inside a cache line. I was trying to be clever with not flushing twice if it's the same cache line, but either we should flush every time anyway, or change the condition to
Code:
...
if ((astop != astart) || (astart == start))                   \
  cacheBufferFlush(astop, CACHE_LINE_SIZE);                   \
...
alex31 wrote: I have another concern, about the size calculation: in my opinion, if you want to flush a buffer that crosses a line boundary, you have to flush all the touched lines.
Example: a 32-byte buffer at address 16 touches two cache lines, 0-31 and 32-63, so a flush(0x0, 64) should be emitted, but your macro will emit flush(0x0, 32):
compiler explorer snip
(I have replaced uint32_t* with uint64_t* in the snippet because Compiler Explorer runs on a 64-bit machine, but of course uint32_t is correct in your initial code for ChibiOS)
That is correct, but I think that is what my code does. You did not include this piece of the macro in the Compiler Explorer snippet; it is what increases astop to the next cache line in that situation:
Code:
if (astop != stop)                                            \
  astop += CACHE_LINE_SIZE;                                   \
Thanks for looking things over! The more testing the better.
I'm re-attaching the changes with the bug above fixed. Again, this is really just a starting point: it lacks comments and proper formatting, and could probably use clearer variable names.
GB
- Attachments
- fatfs_patch.zip
- (3.54 KiB) Downloaded 150 times
- Giovanni
Re: FatFS interface on H7
I think you only need to flush the two cache lines at the boundaries, then invalidate all the cache lines covering the whole buffer. The reason is that variables adjacent to the buffer would be corrupted if only the invalidation were performed: unrelated variables would be invalidated while their content is still in cache and not yet in RAM.
I see another problem: imagine that the adjacent variables are modified by other threads or ISRs; their values could change after the "flush" and before the "invalidate". Because of this, the whole procedure should be performed within a critical section. This could lead to very sporadic and hard-to-debug errors.
Giovanni
Re: FatFS interface on H7
True. The extra data at the beginning of the buffer is safe in this specific case, but not in general. And even in this case the linker could place data accessed by other threads right after the end of the buffer.
The ideal fix would be for FatFS itself to move the buffer to the beginning of struct FATFS (or add extra fields to align it, but that gets complicated since the fields present depend on various configuration macros), and then we can just use the regular flush/invalidate. I don't see any downsides to that, so we could propose it and eventually it will make its way to ChibiOS.
Until then, we could use an additional aligned buffer for DMA in fatfs_diskio when the MCU has a cache. Not ideal, but maybe the MCUs affected by this have enough memory that it doesn't matter much? At least for our application we can sacrifice 512 bytes for this; I'm not sure how much it impacts other users.
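A minimal host-testable sketch of that bounce-buffer idea (low_level_read() stands in for the real driver read, e.g. sdcRead() plus the aligned cache invalidate; all names and sizes here are hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE  32U
#define SECTOR_SIZE 512U

/* Cache-line-aligned bounce buffer; DMA always targets this buffer,
   never the caller's (possibly unaligned) one. */
static __attribute__((aligned(32))) uint8_t bounce[SECTOR_SIZE];

/* Stand-in for the real driver read; it just fills the buffer with a
   recognizable pattern so the copy path can be verified on a host. */
static int low_level_read(uint32_t sector, uint8_t *buf) {
  for (unsigned i = 0; i < SECTOR_SIZE; i++)
    buf[i] = (uint8_t)(sector + i);
  return 0;
}

/* disk_read-style wrapper: read each sector into the aligned bounce
   buffer, then copy it into the caller's buffer. */
static int disk_read_bounced(uint8_t *dst, uint32_t sector, unsigned count) {
  while (count--) {
    if (low_level_read(sector++, bounce) != 0)
      return 1;
    memcpy(dst, bounce, SECTOR_SIZE);
    dst += SECTOR_SIZE;
  }
  return 0;
}
```

The extra memcpy per sector is the cost traded for never exposing an unaligned buffer to the cache maintenance operations.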
GB
Re: FatFS interface on H7
Actually, I was experimenting a bit, and at least for our use the following seems to work well: it requires no changes in FatFS, only adding the (regular, aligned) flush/invalidate to fatfs_diskio.
Basically, manually allocate a larger buffer and encapsulate it in a function that returns a pointer to a FATFS, adjusting the offset:
Code:
FATFS *fatfs(void)
{
  static __attribute__((aligned(CACHE_LINE_SIZE))) uint8_t _fatfs_buf[sizeof(FATFS) + CACHE_LINE_SIZE];
  const size_t offset = offsetof(FATFS, win) % CACHE_LINE_SIZE;
  return (FATFS *)(_fatfs_buf + (offset ? CACHE_LINE_SIZE - offset : 0));
}
and then use it like:
Code:
f_mount(fatfs(), "", 1);
For me, this gets compiled down to a constant function when building for debug, and probably gets fully inlined with optimizations. In any case for most uses the FATFS object is only needed during mounting.
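The alignment arithmetic can be sanity-checked on a host against a mock struct (the real FATFS layout depends on configuration macros, so the fields below are purely illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32U

/* Mock of struct FATFS: 'win' deliberately lands at an odd offset,
   like the real sector buffer does. */
typedef struct {
  uint8_t  fs_type;
  uint8_t  pdrv;
  uint16_t id;
  uint32_t other_state;
  uint8_t  win[512];   /* the DMA sector buffer */
} mock_fatfs;

/* Same trick as the fatfs() helper above: over-allocate by one cache
   line and shift the object so that 'win' lands on a line boundary. */
static __attribute__((aligned(32))) uint8_t pool[sizeof(mock_fatfs) + CACHE_LINE];

static mock_fatfs *get_fatfs(void) {
  const size_t off = offsetof(mock_fatfs, win) % CACHE_LINE;
  return (mock_fatfs *)(pool + (off ? CACHE_LINE - off : 0));
}
```

Here win sits at offset 8, so the object is shifted by 24 bytes and win ends up exactly one cache line into the aligned pool.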
On top of adding the (aligned) flush/invalidate, I would also check in disk_read and disk_write that the buffer is aligned and chSysHalt() otherwise, with a comment near the halt explaining how to properly allocate the buffer/object.
GB
Re: FatFS interface on H7
Thinking about it some more, this problem can potentially affect anything else that uses DMA. For example, if you allocate a small buffer for an SPI transfer that ends up on the same cache line as global data used by other threads, you could see the same behavior you outlined.
So maybe the correct thing to do would be to check in flush/invalidate that the address is aligned and the size is a multiple of the cache line size. The check could be performed only when CH_DBG_ENABLE_CHECKS is true to not add overhead to release builds.
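Such a check could look like this sketch (a plain predicate for illustration; in ChibiOS it would presumably sit behind CH_DBG_ENABLE_CHECKS and fail via chDbgCheck() or chSysHalt(), and the 32-byte line size is an assumption):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32U

/* Debug-time precondition for cacheBufferFlush()/cacheBufferInvalidate():
   both the address and the size must be cache-line granular, otherwise
   the operation would also touch adjacent, unrelated data. */
static int cache_op_args_ok(const void *addr, size_t size) {
  return (((uintptr_t)addr & (uintptr_t)(CACHE_LINE - 1U)) == 0U) &&
         ((size & (size_t)(CACHE_LINE - 1U)) == 0U);
}
```

Being compiled out in release builds, the predicate costs nothing where it matters and still catches misaligned DMA buffers during development.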
GB
- Giovanni
Re: FatFS interface on H7
The easiest solution is to add a critical section to the unaligned invalidate function. The best solution for performance is aligned buffers.
Giovanni