STM32 Personal Notes – Embedded C Language Optimization

I no longer remember where these notes were excerpted from, so this is a summary.

[Embedded] C language optimization

Choose a matching variable type

The machine code generated for different data types varies considerably in length. As a rule, the smaller the range of the variable type selected, the less memory it occupies, and on 8-bit and 16-bit processors the faster it runs. On 32-bit cores such as the Cortex-M parts used in STM32, narrow types mainly save memory in arrays and structures; a local variable narrower than 32 bits can cost extra sign- or zero-extension instructions.

For example, if a char variable is enough, do not define it as int; if an int variable is enough, do not define it as long int; if you can avoid float, do not use float; and if an unsigned int can be used, do not define the variable as int.

The reason is that some processors handle unsigned values more efficiently than signed ones, and the float type needs an FPU (floating-point unit) or a software floating-point library, while the int type is operated on directly by the processor.

Note: After defining a variable, do not assign values outside the range of its type. If an assignment exceeds that range, the C compiler generally does not report an error (at most a warning when the value is a constant), but the result the program computes is already wrong.
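As a small sketch of the range problem in the note above, the helpers below store a sum in a uint8_t from <stdint.h>. The names add_u8 and add_u8_sat are made up for this illustration and are not from the original notes. The first compiles cleanly yet silently wraps modulo 256; the second shows one possible way to clamp instead.

```c
#include <stdint.h>

/* Hypothetical helper: the sum is computed as int, then truncated to 8 bits. */
uint8_t add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);   /* 250 + 10 = 260, stored as 260 - 256 = 4 */
}

/* Hypothetical saturating variant: clamp to the largest value uint8_t can hold. */
uint8_t add_u8_sat(uint8_t a, uint8_t b)
{
    uint16_t sum = (uint16_t)((uint16_t)a + (uint16_t)b);   /* wide enough for 255 + 255 */
    return (sum > UINT8_MAX) ? (uint8_t)UINT8_MAX : (uint8_t)sum;
}
```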

Algorithm optimization – short and efficient

Algorithmic optimization refers to the optimization of program time and space complexity.

When programming on a PC, it is generally not necessary to pay much attention to the length of the program code; getting the function implemented is enough. On embedded systems, however, the hardware resources of the system must be taken into account.

When designing a program, prefer the algorithm with the shortest code, provided the program's function is not affected.
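A minimal illustration of trading a loop for a cheaper algorithm (both function names are invented for this sketch): summing 1..n in O(n) steps versus using the closed form n(n+1)/2 in constant time. The closed form as written assumes n <= 65535 so that the intermediate product fits in 32 bits.

```c
#include <stdint.h>

/* O(n): one addition per term. */
uint32_t sum_loop(uint32_t n)
{
    uint32_t total = 0u;
    for (uint32_t i = 1u; i <= n; i++) {
        total += i;
    }
    return total;
}

/* O(1): closed form; assumes n <= 65535 so n * (n + 1) does not overflow. */
uint32_t sum_formula(uint32_t n)
{
    return (n * (n + 1u)) / 2u;
}
```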

Macros

A function call uses the stack to save data: the CPU has to save and restore the current context around the call and push and pop values on the stack, so every function call costs CPU time.

A macro, by contrast, is expanded in place as pre-written code and generates no function call. It costs only some code space and avoids pushing parameters on the stack, issuing the assembly-level call, passing back the return value, and executing the return, which speeds up the program.
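A hedged sketch of the idea, using the illustrative names SQUARE_U32 and square_u32 (neither is from the original notes). The macro is expanded in place by the preprocessor. The static inline function is an alternative that most compilers can also expand in place when optimizing, while keeping type checking and evaluating its argument only once.

```c
#include <stdint.h>

/* Parenthesize the parameter and the whole body so the expansion
   stays correct inside larger expressions. */
#define SQUARE_U32(x) ((uint32_t)(x) * (uint32_t)(x))

/* Function form: the argument is evaluated exactly once. */
static inline uint32_t square_u32(uint32_t x)
{
    return x * x;
}
```

Note that SQUARE_U32(i++) would expand i++ twice, which the function form avoids.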

Inline assembly

Time-critical parts of a program can be rewritten in inline assembly to improve execution speed. Assembly is hard to write, debug, and port, so use it with caution.
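As a rough sketch only, assuming a GCC-compatible compiler and an ARMv6 or later core, the hypothetical helper swap_bytes32 below uses the ARM REV instruction through GCC-style extended asm, with a plain C fallback for every other compiler or target. The preprocessor guard and the function name are choices made for this example.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
static inline uint32_t swap_bytes32(uint32_t x)
{
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 6)
    uint32_t result;
    /* REV reverses the four bytes of a register in one instruction. */
    __asm__ ("rev %0, %1" : "=l" (result) : "l" (x));
    return result;
#else
    /* Portable fallback for other compilers and targets. */
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}
```

In practice a compiler intrinsic such as GCC's __builtin_bswap32 is usually preferable to hand-written assembly for an operation like this.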

Loop statements

Try not to nest loops more than three levels deep.

In nested loops, place the longest loop innermost and the shortest loop outermost. This reduces the number of times the CPU has to switch between loop levels.

int i, j;

for (i = 0; i < 10; i++)         // More efficient: the inner loop is entered 10 times
{
    for (j = 0; j < 30; j++)
    {
    }
}

for (j = 0; j < 30; j++)         // Less efficient: the inner loop is entered 30 times
{
    for (i = 0; i < 10; i++)
    {
    }
}

Infinite loop

while(1) and for(;;) have the same effect: both loop forever.

while(1) after compilation (unoptimized build)

mov     eax,1
test    eax,eax
je      foo+23h
jmp     foo+18h

In the same kind of build, for(;;) typically compiles to a single unconditional jmp back to the top of the loop. It needs fewer instructions, occupies no register, and has no test or conditional jump, so it is slightly cheaper than while(1). With optimization enabled, mainstream compilers such as GCC and Clang emit the same unconditional branch for both forms.

SWITCH statement

When a switch statement has many case labels, the number of comparisons can be reduced by putting the most frequent conditions first, or by converting the whole statement into a nested switch statement. Place the frequently occurring case labels in the outermost switch statement and the less frequent ones in a second switch statement. This matters when the compiler implements the switch as a chain of comparisons; a dense set of case values is often compiled to a jump table, where the order has little effect.

For example, the high-frequency case labels go in the outer switch statement, and the low-frequency case labels go in an inner switch statement under the default label:

switch (expression)
{
    case value1:
        statement1;
        break;
    case value2:
        statement2;
        break;
    ……
    /* Put low-frequency cases in the inner switch statement */
    default:
        switch (expression)
        {
            case valueN:
                statementN;
                break;
            case valueM:
                statementM;
                break;
            ……
        }
}
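The nested switch described in this section can be sketched as a small command dispatcher. The command codes, the return values, and the assumption that reads and writes are the frequent commands are all invented for this example.

```c
#include <stdint.h>

/* Hypothetical command codes: the first two are assumed to arrive most often. */
enum { CMD_READ = 1, CMD_WRITE = 2, CMD_RESET = 3, CMD_SELF_TEST = 4 };

/* Returns a made-up handler id for the command, or -1 if it is unknown. */
int dispatch_command(uint8_t cmd)
{
    switch (cmd) {
    case CMD_READ:              /* frequent: tested in the outer switch */
        return 10;
    case CMD_WRITE:
        return 20;
    default:
        switch (cmd) {          /* rare: reached only when the outer switch misses */
        case CMD_RESET:
            return 30;
        case CMD_SELF_TEST:
            return 40;
        default:
            return -1;
        }
    }
}
```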

Standard library – avoid using

Using the C standard library speeds up development, but because the library has to handle every situation a user might encounter, many of its functions are large.

For example, the standard sprintf function is very large, and much of it deals with floating-point numbers. If the program never formats floating-point values, the programmer can implement just the formatting it needs with a small amount of code.
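As one possible illustration of replacing sprintf for a narrow job, the hypothetical function u32_to_dec below formats an unsigned 32-bit integer as decimal text. It assumes the caller supplies a buffer of at least 11 bytes and does nothing else that sprintf can do.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes value as decimal text into buf and NUL-terminates it.
   buf must hold at least 11 bytes (10 digits for UINT32_MAX plus the terminator).
   Returns the number of digits written. */
size_t u32_to_dec(uint32_t value, char *buf)
{
    char tmp[10];
    size_t n = 0;

    do {                                /* digits come out least significant first */
        tmp[n++] = (char)('0' + (value % 10u));
        value /= 10u;
    } while (value != 0u);

    for (size_t i = 0; i < n; i++) {    /* copy back in reverse order */
        buf[i] = tmp[n - 1u - i];
    }
    buf[n] = '\0';
    return n;
}
```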

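Operator evaluation – efficient

In general, multiplication is cheaper than division, and shifting is cheaper than multiplication. The substitutions below hold for unsigned operands; for negative signed values, >> and & do not give the same results as / and %. Optimizing compilers usually apply them automatically when the operand is a constant.

a=b*2 -> a=b<<1;
a=b/4 -> a=b>>2;
a=b%8 -> a=b&7;
a=b/8*8+b%4 -> a=((b>>3)<<3)+(b&3);
a=b*15 -> a=(b<<4)-b;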
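A short sketch to check a few of these substitutions on unsigned values. The function names are invented for the example; each pair is meant to return the same result for any uint32_t input.

```c
#include <stdint.h>

uint32_t times15_mul(uint32_t b)   { return b * 15u; }
uint32_t times15_shift(uint32_t b) { return (b << 4) - b; }   /* 16*b - b */

uint32_t div4_div(uint32_t b)      { return b / 4u; }
uint32_t div4_shift(uint32_t b)    { return b >> 2; }         /* drop the low 2 bits */

uint32_t mod8_div(uint32_t b)      { return b % 8u; }
uint32_t mod8_mask(uint32_t b)     { return b & 7u; }         /* keep the low 3 bits */
```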

Memory allocation

Because of cost constraints, embedded systems have limited memory. Everything in the program, including its variables, the library functions it uses, and the stack, has to fit in that memory.

Global and static variables are created before the main function is entered. Their lifetime is the entire run of the program, and they are stored in the global/static storage area.

Local variables are allocated on the stack. They are created when the function is called, and their lifetime ends when the function returns.

Therefore, use local variables wherever practical to make memory use more efficient.

The heap size in a program is limited to whatever remains after all global data and stack space have been allocated. If the heap is too small, the program cannot allocate memory when it needs to.

Therefore, after obtaining memory with the malloc function, always release it with the free function to prevent memory leaks. Frequent allocation and release of differently sized blocks can also fragment the heap.
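A minimal sketch of the malloc/free pairing described above. The function sum_of_squares is a made-up example; the point is the pattern of checking the returned pointer and freeing the buffer on the way out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical example: computes 1*1 + 2*2 + ... + n*n using a temporary heap buffer.
   Returns 0 on success, -1 if n is 0, result is NULL, or the allocation fails. */
int sum_of_squares(size_t n, uint32_t *result)
{
    if (n == 0u || result == NULL) {
        return -1;
    }

    uint32_t *squares = malloc(n * sizeof *squares);
    if (squares == NULL) {
        return -1;              /* heap exhausted: report it instead of crashing */
    }

    uint32_t total = 0u;
    for (size_t i = 0; i < n; i++) {
        squares[i] = (uint32_t)((i + 1u) * (i + 1u));
        total += squares[i];
    }

    free(squares);              /* every successful malloc is paired with a free */
    *result = total;
    return 0;
}
```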

Use Memoization to avoid recursive recalculation

Consider the Fibonacci problem, which can be solved by a simple recursive method.

int fib(int n)
{
    if (n == 0 || n == 1)
    {
        return 1;
    }
    else
    {
        return fib(n - 2) + fib(n - 1);
    }
}

Note: The Fibonacci series here starts at 1, so it looks like: 1, 1, 2, 3, 5, 8…

In the recursion tree for fib(5), fib(3) is computed 2 times and fib(2) is computed 3 times. The same function is being recalculated with the same argument. When n is large, the fib function becomes very inefficient.

Memoization is a simple technique that can be combined with recursion to speed up the computation: each result is stored the first time it is computed and reused afterwards. The memoized version of the Fibonacci function is as follows:

int fib(int n, int *value)
{
    if (value[n] != -1)
    {
        return value[n];      /* already computed: reuse the stored result */
    }
    value[n] = fib(n - 2, value) + fib(n - 1, value);
    return value[n];
}

int calc_fib(int n)           /* n must be >= 0 */
{
    int val[n + 2], i;        /* n + 2 entries so val[1] exists even when n is 0 */
    for (i = 0; i <= n; i++)
    {
        val[i] = -1;          /* -1 marks "not computed yet" */
    }
    val[0] = 1;
    val[1] = 1;
    return fib(n, val);
}
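On a small microcontroller a variable-length array on the stack may be undesirable. As an alternative sketch (not the version from the original notes), the hypothetical function fib_cached below keeps the memo table in a fixed static array, so results persist across calls. It assumes the same series 1, 1, 2, 3, 5, 8… and limits n to 46, the largest index whose value fits in a uint32_t.

```c
#include <stdint.h>

#define FIB_MAX_N 46u   /* fib(46) = 2971215073 is the largest term that fits in uint32_t */

static uint32_t fib_cache[FIB_MAX_N + 1u];   /* 0 means "not computed yet" */

/* Returns the n-th term, or 0 if n is out of range (0 is never a valid term). */
uint32_t fib_cached(uint32_t n)
{
    if (n > FIB_MAX_N) {
        return 0u;
    }
    if (n < 2u) {
        return 1u;
    }
    if (fib_cache[n] == 0u) {
        fib_cache[n] = fib_cached(n - 2u) + fib_cached(n - 1u);
    }
    return fib_cache[n];
}
```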

Hardware

Make use of the hardware where it helps. For example, use DMA transfers to reduce the number of interrupts: a DMA transfer moves data without occupying the CPU, whereas every interrupt has to be serviced by the CPU.
