[DONE]inter procedural optimisation

Postby szmodz » Mon Sep 09, 2013 7:15 pm

This doesn't look right though:

Code: Select all

/*
 * Area fill code, it is a macro because here functions cannot be called
 * until stacks are initialized.
 */
static void fill32(void *start, void *end, uint32_t filler) {
  uint32_t *p1 = start;
  uint32_t *p2 = end;
  while (p1 < p2)
    *p1++ = filler;
}


The comment doesn't match the actual code. It's a function, not a macro.

And even if it were a macro, I think that clearing the stack currently in use from a naked function is ugly and just plain wrong, even if it happens to work most of the time. The compiler's documentation explicitly discourages such use.

The crt0 code also violates strict aliasing rules. The compiler doesn't need to care that the memory between &_data and &_edata happens to overlap some global variable, and is permitted to do unexpected things because of that (and it often does). It may happen to work in this case, but...

http://dbp-consulting.com/StrictAliasing.pdf

If you still don't believe me:

Code: Select all

#include <stdio.h>

/* these are provided by default linker scripts under Linux */
extern int __data_start;
extern int _edata;
 
int a = 7;
short b = 1;
int c = 6;

#define barrier() asm volatile("" ::: "memory")

int main(void) {
  short s1 = b;
 
  /*barrier();*/
  int *pa = &__data_start;
  int *pc = &_edata;
  while (pa < pc)
    *pa++ = 0;
  /*barrier();*/
 
  short s2 = b;
 
  printf("s1: %hd, s2: %hd\n", s1, s2); 
 
  return 0;
}



Compile using:

Code: Select all

gcc -O2 test.c -otest


And see for yourself.

Then, uncomment the two barrier() calls before and after the while loop, and see the change.

In the first case, the program should print:
s1: 1, s2: 1
while in the second:
s1: 1, s2: 0

The compiler's behavior in both cases is correct, even if it's not what you'd expect.
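
If the fill has to stay in C, the barrier idea from the test program can be folded into the fill routine itself. The following is only a minimal sketch, assuming GCC or Clang inline asm syntax. `zero_region_sketch` is a hypothetical name, not ChibiOS code, and walking across unrelated objects this way is still not something the C standard guarantees; it merely stops the compiler from caching values across the loop.

```c
#include <stdint.h>

/* GCC/Clang compiler barrier, same definition as in the test program. */
#define barrier() __asm__ volatile("" ::: "memory")

/*
 * Hypothetical helper: zero the words in [start, end).
 * The volatile pointer keeps the stores from being elided, and the
 * barriers tell the compiler that any memory may have changed, so
 * values read before the call are reloaded afterwards.
 */
static void zero_region_sketch(uint32_t *start, uint32_t *end) {
  barrier();
  for (volatile uint32_t *p = start; p < end; p++)
    *p = 0;
  barrier();
}
```

Used in place of the bare while loop in the test program (with suitable casts), this should behave like the variant with both barrier() calls enabled.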

Postby Giovanni » Mon Sep 09, 2013 8:18 pm

Hi,

I agree it is a mistake. It was a macro exactly for that reason; I carelessly applied a series of changes for LTO support. The funny thing is that it became a macro because of a tracked bug.

About the stack clearing, the Reset_Handler function is assumed not to access any variable in RAM. Assuming that all automatic variables are located in registers is a bit of a stretch, but reasonable given the compiler and the architecture.

The alternative would be to put the code into an assembler module (and split it for v6m and v7m) or enforce the registers for the various automatic variables (possible with GCC).

Giovanni

Postby szmodz » Mon Sep 09, 2013 9:42 pm

This would suffice. It doesn't need to be split for v6m and v7m (same source will compile for both), and doesn't rely on undocumented compiler internals (which are not enforced by any contract and therefore subject to change from version to version). _init_process_stack and _start can use the stack freely, and shouldn't be naked.

Code: Select all

.global ResetHandler
.type ResetHandler, %function
.thumb_func
ResetHandler:
               /* Running on main stack.*/
               cpsid   i

#if CRT0_INIT_STACKS
               /* Init process stack while running on the main stack.*/
               bl     _init_process_stack /* C function */
#endif
               /* Switch stacks.*/
               movw    r0, #:lower16:__process_stack_end__
               movt    r0, #:upper16:__process_stack_end__
               msr     PSP, r0
               
               mov     r1, #CRT0_CONTROL_INIT
               msr     CONTROL, r1
               isb
               
               /* Running on process stack.*/
               b       _start /* C function does the rest.*/



Doesn't touch the strict aliasing issue though.
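
For illustration, the C side of the stub above could be as small as the following. This is a sketch under assumptions: the function name, the explicit base/end parameters and the fill pattern are all made up here so the loop can be tried on a host. The real `_init_process_stack` is called without arguments and would presumably take its bounds from linker symbols such as `__process_stack_end__`.

```c
#include <stdint.h>

/* Assumed fill pattern, chosen for illustration only. */
#define STACK_FILL_PATTERN_SKETCH 0x55555555u

/*
 * Hypothetical, ordinary (non-naked) C function. It is meant to run
 * while the CPU is still on the main stack, so it may use that stack
 * freely while it fills the process stack area [base, end).
 */
void init_process_stack_sketch(uint32_t *base, uint32_t *end) {
  for (uint32_t *p = base; p < end; p++)
    *p = STACK_FILL_PATTERN_SKETCH;
}
```

Since the function never writes to the stack it is currently running on, it does not depend on where the compiler places its automatic variables.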

Postby ulikoehler » Mon Mar 23, 2015 10:19 pm

@szmodz:

I know this is an old thread, but I have encountered a similar issue with LTO enabled and -O0, which is documented here:
https://sourceforge.net/p/chibios/bugs/571/

@Giovanni:

In theory, LTO provides a lot of advantages; in my experience, however, it is unclear whether they are significant in practice. This paper shows a comparison with some real-world-ish applications: http://arxiv.org/pdf/1010.2196.pdf
The results for both size and performance vary; on average they seem to be negative.

Also, the Linux kernel developers discussed the same issue for several years. In 2012, Phoronix also got mixed results:
http://www.phoronix.com/scan.php?page=a ... _lto&num=3

Therefore I believe it is worth pursuing, but performance- or size-critical applications should always benchmark before enabling or disabling LTO. Maybe the documentation should reflect that. This leaves the question of whether to enable it by default or not.

Best regards, Uli

Postby Giovanni » Tue Mar 24, 2015 8:22 am

Hi,

Most of this discussion was about version 2.6.x, where LTO was not supported and there were issues in various bits of code. Everything should be OK now in 3.0; in my tests there is a nice size/speed advantage when using LTO.

Anyway, it is now just a switch in the makefile, so measuring the advantages and disadvantages is very simple and can be done case by case.

Giovanni

