SoC architectures and FPGA prototyping
Lab 7 (optional) – Determining efficient C coding methods for the STM32H7 SoC
This lab’s assignments will measure the execution times required for
- adding differently defined constants;
- the same loop defined using various C language options;
- executing two assembly instructions and their combinations,
in order to find the most efficient ways to program typical tasks on the STM32H7 SoCs.
Please compile the test code with the compiler option O0 (no optimisation); the only exception will be mentioned specifically.
For all the assignments, please use the hardware acceleration options (cache enable and FPU settings) according to the last digit of your student ID number.
Throughout this lab you will measure the execution time of various code snippets using the following hardware and software mechanism:
- ARM provides a 24-bit timer (the system timer, SysTick) with each of its Cortex-M cores; this timer counts system clock cycles and can trigger an interrupt when the count reaches a set value;
- in the template project, the system timer is set to trigger the interrupt every 1 ms; the interrupt service routine increments a global variable whose value can be read by calling the subroutine HAL_GetTick(), for example
sta=HAL_GetTick();
- therefore, calling this subroutine tells the time in ms elapsed since the CPU started executing the code, and it can be used for measuring time intervals;
- as the clock cycle at the 400 MHz system frequency lasts only 2.5 ns, 400,000 clock cycles elapse for every increment of the tick count.
Because a single pass through a code snippet takes far less than one tick, every snippet must be looped a substantial number of times. Please use your student ID number as the loop count throughout this lab.
The template project is available from BB; please copy it for every lab assignment and add your code strictly between the placeholders
// YOUR CODE SHOULD START AFTER THIS LINE...
// YOUR CODE SHOULD END BEFORE THIS LINE
Assignment 1. Measuring execution time for adding differently defined constants
Copy the template project into a directory Lab7a1, and work in this new directory.
In many programs we would like to define parameters that are easy to modify in order to adapt the code to different requirements.
This can be done by using #define statements
#define CONST1 17      // day of the month on which you were born
#define CONST2 1234567 // your student ID number
or, alternatively, by using variables for these values
uint32_t c1=CONST1, c2=CONST2, tmp1;
These constants can be added to a variable in the following ways, which are the same from the programmer's point of view but may take the microcontroller different times to process:
tmp1 += CONST1; // for CONST1
tmp1 += c1;     // for CONST1
tmp1 += CONST2; // for CONST2
tmp1 += c2;     // for CONST2
You will measure the execution time for these four statements in a loop that uses your student ID number as the loop count, similarly to the code snippet below
#define CONST1 17      // day of the month on which you were born
#define CONST2 1234567 // your student ID number
uint32_t c1=CONST1, c2=CONST2, tmp1=0, ctr, sta, fin, looptime;
//snippet for measuring time for tmp1 += CONST1
// (you will need to add three more)
sta=HAL_GetTick();
for (ctr=0; ctr<CONST2; ctr++) {
  tmp1 += CONST1;
}
fin=HAL_GetTick();
looptime=fin-sta;
printf("Loop time = %lu ms\n", (unsigned long)looptime);
// please add three more snippets for the
// three other options
After running the code, write down the execution times in the table.
Assignment 2. Measuring execution time for differently coded loops
Copy the template project into a directory Lab7a2, and work in this new directory.
We use computers because they can do repetitive tasks many times, very fast, without any complaints or errors. To tell a computer that we want it to repeat something, we use loops.
It is possible to use integer or floating-point loop counters, and to define loops using either constants or variables.
Here we are going to measure the time required to execute FOR loops coded using various options, in order to find out which option is best for the STM32H7 MCU.
The two types of FOR loops you will use, each with integer and float variables, are shown below
for (ctr=0; ctr<10000000; ctr++);                 // constants, int
for (ctr=ctrsta; ctr<ctrfin; ctr+=ctrstep);       // variables, int
for (fctr=0.; fctr<10000000.; fctr+=1.);          // constants, float
for (fctr=fctrsta; fctr<fctrfin; fctr+=fctrstep); // variables, float
You will need to run these loops
- for integer ctr, ctrsta, ctrfin, ctrstep (uint32_t);
- for floating-point fctr, fctrsta, fctrfin, fctrstep (float; the f at the start of each name keeps these variables distinct from the integer ones, so the code compiles without errors).
Here is the snippet of code that runs the loop for integer constants
#define CONST1 1234567 // your student ID number
uint32_t ctr, ctrsta=0, ctrfin=CONST1, ctrstep=1, sta, fin, looptime;
float fctr, fctrsta=0., fctrfin=(float)CONST1, fctrstep=1.;
//snippet for measuring time for integer constants
// (you will need to add three more)
sta=HAL_GetTick();
for (ctr=0; ctr<CONST1; ctr++) {
  // empty loop
}
fin=HAL_GetTick();
looptime=fin-sta;
printf("Loop time = %lu ms\n", (unsigned long)looptime);
// please add three more snippets for the
// three other options
Run the code and note the execution times.
Change the compiler optimisation option to O3, compile the code again, and note the execution times.
Assignment 3. Measuring execution times for assembly instructions and their combinations
Copy the template project into a directory Lab7a3, and work in this new directory.
Sometimes the use of assembly instructions is beneficial, and the compiler we use allows mixing C and assembly instructions (but one needs to be careful not to break the compiled C code by inserting inappropriate assembly instructions).
This assignment is dedicated to measuring the execution time of some assembly instructions, a task complicated by the pipelined and superscalar microarchitecture of the Cortex-M7 core in the STM32H7 SoC.
To eliminate the influence of these microarchitectural features, we will measure the execution time within a loop that includes many NOP (no operation) instructions to flush the pipeline before and after executing the instructions of interest, as follows:
uint32_t tmp1=0, tmp2=0, tmp3=0, *tmp4=&tmp3;
uint32_t ctr, sta, fin, looptime;
//snippet for measuring time for two NOPs
// (you will need to add four more snippets)
sta=HAL_GetTick();
for (ctr=0; ctr<1234567; ctr++) { // your ID number
  // flush the pipeline before the instruction to be timed
  __nop(); __nop(); __nop(); __nop(); __nop(); __nop();
  __nop(); __nop(); __nop(); __nop(); __nop(); __nop();
  // here will come the instruction(s) to be timed
  __nop(); __nop(); // two NOPs
  // flush the pipeline after the instructions to be timed
  __nop(); __nop(); __nop(); __nop(); __nop(); __nop();
  __nop(); __nop(); __nop(); __nop(); __nop(); __nop();
}
fin=HAL_GetTick();
looptime=fin-sta;
printf("Loop time = %lu ms\n", (unsigned long)looptime);
// please add four more snippets for the
// four other options
(__nop() is an intrinsic, i.e. an instruction that is present in the assembly language but not defined in standard C; it is made accessible through a call to a fictional subroutine, and the compiler simply places this instruction in the code instead of calling anything. Note the DOUBLE underscore __ at the start of this subroutine's name.)
You will need to run this code with two NOPs (as above), with two identical memory access instructions, with two identical arithmetic instructions, with the arithmetic instruction followed by the memory access instruction, and with the memory access instruction followed by the arithmetic instruction (5 code runs).
Select your arithmetic instruction based on the last digit of your student ID number from the following table.
The instruction you will need to use in the code will be, respectively,
SUB1: asm("sub tmp1,tmp2,tmp3");
SUB2: asm("sub tmp3,tmp2,tmp1");
ADD1: asm("add tmp1,tmp2,tmp3");
ADD2: asm("add tmp3,tmp2,tmp1");
Select your memory access instruction based on the penultimate digit of your student ID number from the following table.
The instruction you will need to use in the code will be, respectively,
LDR1: asm("ldr tmp1,[tmp4]");
LDR2: asm("ldr tmp3,[tmp4]");
STR1: asm("str tmp1,[tmp4]");
STR2: asm("str tmp3,[tmp4]");
Note the execution times in the following table.
Lab 7
Throughout the lab, the following hardware acceleration settings were used:
FPU – DP / none
D-cache – enabled / disabled
I-cache – enabled / disabled
This loop count was
used throughout the lab:
Assignment 1.
Screenshot of the code snippet presented
in the lab sheet modified with your data
…
Measured execution times in the table
below
Explain the reasons behind the results
you have obtained
…
Assignment 2.
Screenshot of the code snippet presented
in the lab sheet modified with your data
…
Execution times for differently defined
loops
Comment on the measured figures (option O0) and explain why they differ, if they do
…
Comment on the differences between the
execution times obtained using the O0 and O3 compiler options
…
Assignment 3.
Screenshot of the code snippet presented
in the lab sheet modified with your data for arithmetic then memory access
instruction
…
Lab 8 (optional) – Adding a HEX5..HEX0 peripheral to the Lab 4 SoC
In Lab 4 you developed a peripheral that allowed controlling LEDR[7]..LEDR[0] and reading switches SW[9]..SW[0] on the DE10-Lite FPGA board by amending the LED2AHB.v module. In previous labs we also used the HEX5..HEX0 displays.
This lab is unstructured: you need to create another module that enables controlling all the segments of these displays. You will need to map this peripheral to some unused address space and write C code that displays the Christmas date on it as 24.12.20.