tutorial : place and run code in ram

ChibiOS public support forum for topics related to the STMicroelectronics STM32 family of micro-controllers.

Moderators: RoccoMarco, barthess

alex31
Posts: 374
Joined: Fri May 25, 2012 10:23 am
Location: toulouse, france
Has thanked: 38 times
Been thanked: 61 times

tutorial : place and run code in ram

Postby alex31 » Tue Aug 10, 2021 3:10 pm

Hello,

I experimented with placing functions in RAM using GCC and eventually got it working, so here is what I learned.

I used a NucleoH743 board, but the examples can be adapted to any MCU that is able to execute code from RAM.

I have only just got this working and I'm not a linker script specialist, so do not hesitate to point out errors or inaccuracies; I'll edit the post accordingly.

First, why would one want to run code in RAM?

Among other reasons:

1/ performance, since TCM RAM has faster access than flash, especially if the buses are heavily used by DMA.
2/ on some STM32 families that have plenty of RAM but little flash, it can be interesting to load code from external memory into RAM and execute it there.
3/ running a bootloader that writes to the flash


We will do 3 things :

a/ place and run a function in ITCM ram

b/ place the ISR vector table in ITCM ram

c/ place all the HAL functions that are called from ISRs in ITCM RAM

I will explain steps b and c in the next post.

Note that, to keep the examples simple, constant data used by functions in ITCM will stay in flash, but it’s possible to relocate these constants to DTCM RAM as well if needed.

A/ place and run a function in ITCM ram

To do that, there are 3 steps :

a1 : declare a section in the linker script that locates the object in ITCM, but stores it in flash
a2 : at startup, copy the code from flash to ITCM RAM
a3 : in the source file, use an attribute to specify that the code must go in ITCM



A1 : declare a section in the linker script that will locate the object in ITCM, but store it in flash

You must modify the linker script, so first copy os/common/startup/ARMCMx/compilers/GCC/ld/STM32H743xI.ld to your cfg directory, and modify your makefile to use your own linker script file instead of the one from the distribution.



in your STM32H743xI.ld file:

after REGION_ALIAS("PROCESS_STACK_RAM", ram5), add :

Code: Select all

/* RAM region to be used for hot code executed with zero wait state access */
REGION_ALIAS("ITCM_RAM", ram6);


at the end, after INCLUDE rules_memory.ld, add :


Code: Select all

SECTIONS
{
  __itcm_flash_base__ = LOADADDR(.itcm_text);
  .itcm_text : ALIGN_WITH_INPUT
  {
    . = ALIGN(4);
    PROVIDE(__itcm_text_base_ram__ = .);

    /* *(.ramfunc.$ITCM_RAM) */
    KEEP(*(.itcm_text*))
    KEEP(*(.bss.__itcm_text_*))
    KEEP(*lld.o (.text*))
    . = ALIGN(4);
    PROVIDE(__itcm_text_end_ram__ = .);
  } >ITCM_RAM AT> TEXT_FLASH
}

A2 : at startup, copy the code from flash to ITCM ram

The copy must be done early. I chose to put the code in __early_init, in the file board.c (which must be part of your project; don’t modify the one from the distribution), but perhaps there is a better place to do it?

So I call a function in __early_init :

Code: Select all

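/* Defined at the end of board.c; declared here so it can be called below. */
void copy_flash_to_itcm(void);

void __early_init(void) {
  copy_flash_to_itcm();
  stm32_gpio_init();
  stm32_clock_init();
}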


and at the end of board.c, add the function :

Code: Select all

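void copy_flash_to_itcm(void) {
  /* Symbols defined in the linker script; only their addresses matter. */
  extern const unsigned char __itcm_flash_base__;
  extern unsigned char __itcm_text_base_ram__;
  extern unsigned char __itcm_text_end_ram__;

  /* Size of the section, computed from its run addresses in ITCM. */
  size_t itcm_text_size = (size_t) (&__itcm_text_end_ram__ -
                                    &__itcm_text_base_ram__);

  /* Copy from the load address (flash) to the run address (ITCM). */
  memcpy(&__itcm_text_base_ram__, &__itcm_flash_base__, itcm_text_size);
}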

You’ll have to #include <string.h> at the start of board.c to use memcpy.

A3 : in the source file, use an attribute to specify that the code must go in ITCM

The easiest part comes last: where you define your “hot function”, add the section attribute:

Code: Select all

__attribute__((section(".itcm_text")))
void testITCMfun2(void) 
{
  chprintf(chp, "addr of fun2 is 0x%x", (uint32_t) &testITCMfun2);
}

When called, the function prints its own address, which should be in the ITCM range :-)
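
To avoid repeating the attribute on every hot function, it can be wrapped in a macro. What follows is only a sketch: ITCM_FUNC, addr_in_itcm and hot_sum are names made up for this example, and the ITCM range (64 KB starting at address 0) is the one of the STM32H743, so adjust it for other MCUs. noinline is added on the assumption that you do not want the compiler to inline the hot function into a caller that lives in flash. On a non-ARM host the macro expands to nothing, so the snippet compiles anywhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper macro: place a function in the .itcm_text section
   declared in the linker script. Expands to nothing on non-ARM hosts. */
#if defined(__arm__)
#define ITCM_FUNC __attribute__((section(".itcm_text"), noinline))
#else
#define ITCM_FUNC
#endif

/* Assumed ITCM range of the STM32H743: 64 KB starting at address 0. */
#define ITCM_BASE 0x00000000u
#define ITCM_SIZE 0x00010000u

/* True if a code address lies in ITCM. Bit 0 (the Thumb bit) is masked off. */
static bool addr_in_itcm(uint32_t addr) {
  addr &= ~1u;
  return (addr - ITCM_BASE) < ITCM_SIZE;
}

/* Example hot function: sums n words. */
ITCM_FUNC uint32_t hot_sum(const uint32_t *p, uint32_t n) {
  uint32_t s = 0;
  while (n--) {
    s += *p++;
  }
  return s;
}
```

On the target, addr_in_itcm((uint32_t)&hot_sum) should return true once the linker script section and the startup copy are in place.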

Alexandre


tutorial : place ISR in ram

Postby alex31 » Tue Aug 10, 2021 6:14 pm

In this second part, we’ll see how to reduce ISR latency by putting the vector table and the ChibiOS ISR functions in ITCM. (Q: the vector table is not code, so would it fit better in DTCM?)

On the STM32, the ISR vector table is not at a fixed address: a register (SCB->VTOR)
indicates the start of the table, so the table can be in flash or in RAM. The table must be aligned to its size rounded up to a power of two; the linker script below uses 1024 bytes.
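
The alignment requirement can be worked out from the number of interrupt lines. The sketch below follows the ARMv7-M rule as I understand it: the table must be aligned to its size rounded up to a power of two, with a minimum of 128 bytes. vtor_alignment is a made-up helper name, and the 150 interrupt lines mentioned afterwards is the figure for the STM32H743; check the reference manual for your part.

```c
#include <stdint.h>

/* Number of system entries (initial SP plus system exceptions) that
   precede the IRQ vectors on Cortex-M. */
#define CORTEX_M_SYSTEM_VECTORS 16u

/* Hypothetical helper: required vector table alignment in bytes for a
   device with n_irqs external interrupt lines. */
static uint32_t vtor_alignment(uint32_t n_irqs) {
  uint32_t table_bytes = (CORTEX_M_SYSTEM_VECTORS + n_irqs) * 4u;
  uint32_t align = 128u;            /* assumed architectural minimum */
  while (align < table_bytes) {
    align <<= 1;                    /* round up to the next power of two */
  }
  return align;
}
```

For 150 interrupt lines the table is 664 bytes long, so the helper returns 1024, which matches the ALIGN(1024) used for the .vectors section.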

There is already a section for the vector table in the standard ChibiOS linker script files, and ChibiOS takes care of setting the SCB->VTOR value accordingly, so relocating the vector table is simple: just a minor change in the linker script, plus the copy from flash to RAM at startup.

The vectors section is in a file named rules_code.ld, so again, you’ll have to copy it to your project cfg directory, and the top-level linker script will have to include this local rules_code.ld instead of the one from the distribution.

B/ place the ISR vector table in ITCM ram


Code: Select all

cp os/common/startup/ARMCMx/compilers/GCC/ld/rules_code.ld your_project_path/cfg/rules_code_alt.ld

Edit the local top-level linker script (STM32H743xI.ld in my case) and change

Code: Select all

INCLUDE rules_code.ld

to

Code: Select all

INCLUDE cfg/rules_code_alt.ld


Then edit rules_code_alt.ld and replace the vectors section with this one:

Code: Select all

    __itcm_vectors_base_flash__ = LOADADDR(.vectors);
    .vectors : ALIGN(1024)
    {
        PROVIDE(__itcm_vectors_base_ram__ = .);
        KEEP(*(.vectors))
        PROVIDE(__itcm_vectors_end_ram__ = .);
    } > ITCM_RAM AT > VECTORS_FLASH_LMA


Then extend copy_flash_to_itcm in board.c so that it also copies the table from flash to RAM:

Code: Select all

void copy_flash_to_itcm(void) {
  /* Hot code: .itcm_text section. */
  extern const unsigned char __itcm_flash_base__;
  extern unsigned char __itcm_text_base_ram__;
  extern unsigned char __itcm_text_end_ram__;

  size_t itcm_text_size = (size_t) (&__itcm_text_end_ram__ -
                                    &__itcm_text_base_ram__);

  memcpy(&__itcm_text_base_ram__, &__itcm_flash_base__, itcm_text_size);

  /* Vector table: .vectors section. */
  extern const unsigned char __itcm_vectors_base_flash__;
  extern unsigned char __itcm_vectors_base_ram__;
  extern unsigned char __itcm_vectors_end_ram__;

  size_t itcm_vectors_size = (size_t) (&__itcm_vectors_end_ram__ -
                                       &__itcm_vectors_base_ram__);

  memcpy(&__itcm_vectors_base_ram__, &__itcm_vectors_base_flash__,
         itcm_vectors_size);
}
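
As more sections get relocated, the copy code starts to repeat itself. One possible way to factor it, shown as a sketch only: describe each region by its load address, run address and run end address, and loop over a table. ram_region_t and copy_regions are names invented here, not something ChibiOS provides.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor of one region to copy at startup. */
typedef struct {
  const unsigned char *load;   /* where the bytes are stored (flash) */
  unsigned char *run;          /* where they must run from (ITCM) */
  unsigned char *run_end;      /* one past the last byte in ITCM */
} ram_region_t;

/* Copy every region from its load address to its run address. */
static void copy_regions(const ram_region_t *regions, size_t count) {
  for (size_t i = 0; i < count; i++) {
    size_t size = (size_t)(regions[i].run_end - regions[i].run);
    memcpy(regions[i].run, regions[i].load, size);
  }
}
```

On the target, the table would hold the triple &__itcm_flash_base__, &__itcm_text_base_ram__, &__itcm_text_end_ram__ for the code, and the equivalent triple of __itcm_vectors_* symbols for the vector table.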

Now the interrupt vector table is in ITCM, but all the vectors still point to ISRs in flash.


C/ place all the HAL functions that are called from ISRs in ITCM RAM

One can place a function in ITCM with an attribute in the source code, as seen in the first post, but one can also put the entire content of an object file in ITCM at the linker script level.

The ChibiOS ISRs are all in the lld files, so in this example we relocate all the *lld* files to ITCM, but you can use a finer granularity and relocate only some of them.

First edit STM32H743xI.ld and replace the last section with this one:

Code: Select all

SECTIONS
{
   __itcm_flash_base__ = LOADADDR(.itcm_text);
  .itcm_text : ALIGN_WITH_INPUT
  {
        . = ALIGN(4);
        PROVIDE(__itcm_text_base_ram__ = .);

        /* *(.ramfunc.$ITCM_RAM) */
        KEEP(*(.itcm_text*))
        KEEP(*(.bss.__itcm_text_*))
        KEEP(*lld.o (.text*))
        . = ALIGN(4);
        PROVIDE(__itcm_text_end_ram__ = .);
  } >ITCM_RAM AT> TEXT_FLASH
}


The line KEEP(*lld.o (.text*)) says that the .text* sections of all *lld.o files belong to this section. Unfortunately, this is not sufficient, because the .text* sections of all object files have already been claimed by the .text section earlier in the script.
We’ll have to edit rules_code_alt.ld to exclude the *lld.o files from the .text section that runs from flash:

Replace the .text section with this one:

Code: Select all

    .text : ALIGN_WITH_INPUT
    {
        __text_base__ = .;
        *(EXCLUDE_FILE(*lld.o) .text*)
        *(.glue_7t)
        *(.glue_7)
        *(.gcc*)
        __text_end__ = .;
    } > TEXT_FLASH AT > TEXT_FLASH_LMA

the EXCLUDE_FILE line will do the job.

You can now recompile, flash, and verify with gdb that all the functions in the lld files are now located in ITCM.
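
Besides gdb, the check can also be done at run time by walking the vector table and counting the handlers that are still outside ITCM. This is only a sketch: count_vectors_outside_itcm is a hypothetical helper, the ITCM range is the STM32H743 one (64 KB at address 0), and it assumes that entry 0 is the initial stack pointer, which is skipped. Vectors whose handler was not relocated (for example unused vectors pointing to a default handler in flash) will still be counted.

```c
#include <stddef.h>
#include <stdint.h>

#define ITCM_BASE 0x00000000u   /* assumed: STM32H743 ITCM start */
#define ITCM_SIZE 0x00010000u   /* assumed: 64 KB */

/* Hypothetical helper: number of vector table entries whose handler
   address is outside the ITCM range. */
static size_t count_vectors_outside_itcm(const uint32_t *table,
                                         size_t entries) {
  size_t outside = 0;
  for (size_t i = 1; i < entries; i++) {     /* entry 0 is the initial SP */
    uint32_t addr = table[i] & ~1u;          /* drop the Thumb bit */
    if ((addr - ITCM_BASE) >= ITCM_SIZE) {
      outside++;
    }
  }
  return outside;
}
```

On the target, table could be (const uint32_t *)SCB->VTOR, with entries set to 16 plus the number of interrupt lines of the device.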

I will attach a complete, simple example archive to this post in the near future.

Any suggestions and corrections are welcome!

Alexandre

electronic_eel
Posts: 77
Joined: Sat Mar 19, 2016 8:07 pm
Been thanked: 17 times

Re: tutorial : place and run code in ram

Postby electronic_eel » Wed Aug 11, 2021 9:09 pm

Thank you for posting this.

I plan to use the RP2040 in an upcoming project. Since it has no internal flash but lots of RAM, copying code to RAM is even more important there. So your tutorial will most probably help me implement it there.

When I did modifications to the ChibiOS linker scripts and initialization code in the past, I found it not really geared towards customization. The level at which it could be customized was coarse, so you had to copy big chunks of code and could not just customize exactly the part you want and keep the rest in the original ChibiOS repository. I hope that this could be improved in the future. I plan to post patches once I have used it a few times and get a better idea how this could be achieved.


Re: tutorial : place and run code in ram

Postby alex31 » Tue Aug 17, 2021 9:10 pm

Hi,

Unfortunately, it seems that a GCC limitation breaks the placement of object files from the linker script (the KEEP and EXCLUDE_FILE directives) when using LTO.

It is not exactly the same thing as https://stackoverflow.com/questions/34709001/force-gcc-to-keep-section-when-using-link-time-optimization but closely related, I think ...

Alexandre

Giovanni
Site Admin
Posts: 14444
Joined: Wed May 27, 2009 8:48 am
Location: Salerno, Italy
Has thanked: 1074 times
Been thanked: 921 times

Re: tutorial : place and run code in ram

Postby Giovanni » Tue Aug 17, 2021 9:24 pm

It is because of global inlining: code can migrate across modules. You need to disable LTO for this kind of thing.

Giovanni


Re: tutorial : place and run code in ram

Postby electronic_eel » Tue Aug 17, 2021 10:19 pm

From my understanding, the problem is that compiling with LTO prevents the compiler from emitting code with the right section attributes. So while you can still have the rest of the code using LTO, the functions that should get special treatment (like copy-to-ram) must be compiled without it. Implementing that means you want just the copy-to-ram functions in a separate .c file, and some way in the build system to leave out the LTO compile option for these files.

Seems like a lot of extra work ahead...

rew
Posts: 380
Joined: Sat Jul 19, 2014 12:59 pm
Has thanked: 2 times
Been thanked: 13 times

Re: tutorial : place and run code in ram

Postby rew » Mon Aug 23, 2021 5:38 pm

When I enable "LTO", I see the compiler emitting complete gibberish that the linker somehow seems to understand and is able to process. I'm not sure if you can mix LTO and non-LTO functions.

