Data blocking in the C2000 MCU compiler explained

At some point, C2000™ microcontroller (MCU) compiler users might have encountered scenarios where holes appear within memory sections, or where the linker is unable to place a section even though the available memory is larger than the section. These issues are likely a result of data blocking[1], which is a side effect of data page pointer (DP) load optimization, an optimization that the compiler performs automatically. Before explaining DP-load optimization and data blocking, it helps to explain how the compiler supports direct accesses to scalar-type variables and to members of aggregate-type variables.

In direct addressing mode, an instruction can directly access any word within the current 64-word data page: the DP register holds the page number, and the instruction supplies only the offset within that page. The compiler generally prefers direct addressing over other addressing modes because direct accesses do not tie up other CPU registers (such as the auxiliary registers used for indirect addressing), leaving those registers free for other instructions.
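To make the page/offset split concrete, here is a small host-side C sketch. It assumes, for illustration, that a direct-mode data address is formed from the page number held in DP combined with a 6-bit offset encoded in the instruction, so each page covers 64 consecutive words. The helper names and the example address 0x8A40 are hypothetical.

```c
#include <stdint.h>

/* Assumption: one data page = 64 words, selected by a 6-bit offset. */
#define WORDS_PER_PAGE 64u

/* Page number the DP register must hold to reach word address addr. */
uint32_t dp_for_address(uint32_t addr)
{
    return addr / WORDS_PER_PAGE;          /* equivalent to addr >> 6 */
}

/* Offset within the page that a direct-mode instruction would encode. */
uint32_t offset_in_page(uint32_t addr)
{
    return addr % WORDS_PER_PAGE;          /* equivalent to addr & 0x3F */
}

/* Full word address formed from the DP value and the 6-bit offset. */
uint32_t direct_address(uint32_t dp, uint32_t offset)
{
    return dp * WORDS_PER_PAGE + (offset % WORDS_PER_PAGE);
}

/* Two words are reachable with the same DP value only if they share a page. */
int same_page(uint32_t a, uint32_t b)
{
    return dp_for_address(a) == dp_for_address(b);
}
```

For example, if a hypothetical array a of 16-bit elements starts at word address 0x8A40, then a[10] lives at 0x8A4A, which is page 0x229, offset 10. The element a[63] (0x8A7F) is still on that page, but a[64] (0x8A80) is not, so reaching it directly would require a different DP value.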

Figure 1: Direct and indirect addressing modes for accessing a[10]

Consider the following sample C source that includes two accesses to the global “array1”.

Figure 2: Sample source with constant accesses to array "array1"

 

Without DP-load optimization, the compiler will conservatively issue a DP-load instruction before each direct access to (members of) a global variable.

Figure 3: Assembly file generated after compiling Figure 2 with DP-load optimization disabled (using compiler option --disable_dp_load_opt)

However, a DP load is unnecessary when the DP register already holds the page that contains the data being accessed, and unnecessary DP loads hurt both code size and performance. Consider the (simplistic and probably unlikely) case of consecutive direct accesses to 64 words that are allocated on the same data page. Only the first access needs a DP load; the other 63 DP-load instructions are redundant, occupying space in the .text section and costing 63 unnecessary cycles to execute. These costs grow in proportion to the number of direct accesses.
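The arithmetic above can be checked with a small counting sketch. It is an idealized model, not the compiler's implementation: it sees final word addresses, whereas the real compiler reasons about blocked variables at compile time. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_PAGE 64u

/* Without DP-load optimization: a DP load before every direct access. */
size_t dp_loads_unoptimized(const uint32_t *addrs, size_t n)
{
    (void)addrs;                /* every access pays for a DP load */
    return n;
}

/* Idealized DP-load optimization: reload DP only when the next access is
 * on a different page from the one DP currently holds. */
size_t dp_loads_optimized(const uint32_t *addrs, size_t n)
{
    size_t loads = 0;
    uint32_t current_page = 0;
    int dp_known = 0;           /* DP contents are unknown on entry */

    for (size_t i = 0; i < n; ++i) {
        uint32_t page = addrs[i] / WORDS_PER_PAGE;
        if (!dp_known || page != current_page) {
            current_page = page;
            dp_known = 1;
            ++loads;
        }
    }
    return loads;
}
```

For 64 consecutive words on one page, the unoptimized count is 64 and the optimized count is 1, leaving the 63 redundant loads mentioned above. Note that returning to a previously used page after visiting another one still forces a reload in this model.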

Figure 4: Assembly file generated after compiling Figure 2 with DP-load optimization enabled (default behavior of the compiler)

Our compiler mitigates these costs by blocking global variables. A blocked variable must either fit entirely within a data page or be page-aligned (that is, start on a page boundary). Given this restriction, the compiler knows that direct accesses to the first 64 words of a blocked variable are all valid using the same DP value. Combining data blocking with knowledge of the relative placement of variables allows the compiler to determine whether the current DP is still valid for a new direct access and to issue DP loads only when necessary. Reducing the number of DP loads saves code size and improves performance.

However, the compiler must adjust the placement of variables to satisfy blocking requirements. In particular, it must move a variable that would otherwise cross a page boundary to the start of a new page. This restriction can introduce holes within data sections, and choosing the order in which to allocate a set of variables so that holes are minimized is a difficult problem. Currently, the C2000™ MCU compiler attempts to limit holes by sorting variables by increasing size; if you examine the map file of a program with global variables, you will observe that smaller variables are allocated before larger ones. Note that the assembler fixes the layout of variables within a section, so these holes are internal to the section. The linker is unaware of them and therefore cannot fill them with other data.
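The placement rule and the effect of allocation order can be modeled with a short sketch. This is a hypothetical model, not the compiler's actual algorithm: it assumes the section starts on a page boundary, measures sizes in 16-bit words, and ignores per-type alignment.

```c
#include <stddef.h>
#include <stdlib.h>

#define WORDS_PER_PAGE 64u

/* Lay out variables in the given order under the blocking rule: a variable
 * stays at the current offset if it fits in the rest of the current page or
 * already starts on a page boundary; otherwise it moves to the next page.
 * Offsets are relative to a page-aligned section start. Returns the total
 * hole size in words and stores the section size in *section_size. */
size_t layout_blocked(const size_t *sizes, size_t n,
                      size_t *offsets, size_t *section_size)
{
    size_t off = 0, holes = 0;

    for (size_t i = 0; i < n; ++i) {
        size_t in_page = off % WORDS_PER_PAGE;
        if (in_page != 0 && in_page + sizes[i] > WORDS_PER_PAGE) {
            size_t pad = WORDS_PER_PAGE - in_page;   /* hole up to next page */
            holes += pad;
            off += pad;
        }
        if (offsets != NULL)
            offsets[i] = off;
        off += sizes[i];
    }
    if (section_size != NULL)
        *section_size = off;
    return holes;
}

static int compare_sizes(const void *a, const void *b)
{
    size_t x = *(const size_t *)a;
    size_t y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/* The heuristic described in the text: allocate smaller variables first. */
void sort_by_increasing_size(size_t *sizes, size_t n)
{
    qsort(sizes, n, sizeof sizes[0], compare_sizes);
}
```

With sizes {30, 2, 60, 2, 2}, sorting reduces the holes from 32 to 28 words. With {60, 4, 60, 4}, however, the original order packs with no holes while the sorted order leaves 60 words of holes. Increasing-size order is a heuristic, not an optimum, which is why choosing the allocation order is hard.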

The linker must also respect blocking requirements when placing blocked sections in memory. Any section that contains blocked variables is itself blocked; that is, the section must either fit entirely within a data page or be page-aligned. Blocking these sections preserves the assumptions the compiler made while optimizing DP loads: the variables within a blocked section will not span a page boundary unless the variable is larger than a page. As a consequence, blocked sections can cause a linker placement failure even when the total free memory is larger than the section.
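The following sketch models why the linker has to honor section blocking. The compiler's DP-load decisions depend on the offsets it chose inside a section; if the whole section were moved to an arbitrary address, a variable could end up straddling a page. The function names and addresses are hypothetical, and sizes are in 16-bit words.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_PAGE 64u

/* True if the words [start, start + size) all lie on one data page. */
bool fits_in_one_page(uint32_t start, uint32_t size)
{
    if (size == 0)
        return true;
    return start / WORDS_PER_PAGE == (start + size - 1) / WORDS_PER_PAGE;
}

/* Blocking requirement: fit within a page, or start on a page boundary. */
bool satisfies_blocking(uint32_t start, uint32_t size)
{
    return fits_in_one_page(start, size) || start % WORDS_PER_PAGE == 0;
}

/* Check whether every variable of a section, laid out at the given
 * offsets, still satisfies blocking when the section is placed at base. */
bool layout_valid_at(uint32_t base, const uint32_t *offsets,
                     const uint32_t *sizes, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (!satisfies_blocking(base + offsets[i], sizes[i]))
            return false;
    return true;
}
```

A section whose variables sit at offsets {0, 2, 4, 6, 64} with sizes {2, 2, 2, 30, 60} is safe at the page-aligned bases 0x8000 and 0x8040, but not at 0x8010, where the 60-word variable would cross a page boundary. The first four variables alone (36 words) fit within one page, so that smaller section stays valid at 0x8010 but not at 0x8030.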

The C2000 MCU compiler performs data blocking and DP-load optimization by default. You can verify that a variable or section is blocked by examining the .bss and .usect directives in the assembly listing file and checking that the blocking flag is set. The blocking flag is documented in the TMS320C28x Assembly Language Tools User's Guide under "Uninitialized Sections".

While blocking offers opportunities for optimization of DP-loads, the data memory penalties might be too costly for your application. In such cases, we offer the following options for limiting the degree of data blocking or disabling it altogether:

  1. Use #pragma DATA_SECTION or #pragma SET_DATA_SECTION to place one or more global variables in separate user-defined sections. Variables in the same section are still blocked and can benefit from DP-load optimization. The pragmas must appear at every definition and declaration of the variables that they apply to. Keep in mind, however, that DP-load optimization is not possible across sections. This approach lets you decide how much optimization and blocking is performed, and gives the linker better opportunities to reduce fragmentation. For examples of using the DATA_SECTION and SET_DATA_SECTION pragmas, see the TMS320C28x Optimizing C/C++ Compiler User’s Guide.
  2. Group global variables in a structure. While the structure will still be blocked, the compiler is not allowed to introduce additional holes within the structure and DP-load optimization will still be performed for accesses to elements of the structure.
  3. Use the --disable_dp_load_opt compiler option to disable data blocking and DP-load optimization for all of your files, or for a subset of them containing data that will likely not benefit from DP-load optimization. For example, most array accesses are of the form “array[i]”, which are indirect accesses that do not use the DP and will not benefit from DP-load optimization. Accesses to constant locations such as “array[16]” use the DP, but such accesses are rare in practice. Therefore, placing arrays in a separate file that is built with --disable_dp_load_opt can reduce holes with little or no performance impact. Accesses to scalars and structures in the remaining files continue to be optimized.
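Options 1 and 2 might look like the following sketch. The variable and section names are hypothetical, and the pragma form shown is the C syntax (the C++ form of DATA_SECTION differs, so check the compiler user's guide). A host compiler such as GCC typically ignores these unknown pragmas, possibly with a warning, so the file still builds for a quick functional test.

```c
#include <stdint.h>

/* Option 1: DATA_SECTION names one variable per pragma (C syntax:
 * symbol first, then section name). Both scalars go into the
 * hypothetical section "motorVars", so they are blocked together and
 * can share DP loads. */
#pragma DATA_SECTION(motor_speed, "motorVars")
#pragma DATA_SECTION(motor_torque, "motorVars")
int16_t motor_speed;
int16_t motor_torque;

/* Option 1, continued: SET_DATA_SECTION applies to every data definition
 * that follows, until it is reset with an empty SET_DATA_SECTION(). */
#pragma SET_DATA_SECTION("logBuffers")
int16_t log_samples[256];
int16_t log_errors[128];
#pragma SET_DATA_SECTION()

/* Option 2: group related scalars in one structure. The structure is
 * still blocked, but the compiler cannot insert additional holes
 * between its members. */
typedef struct {
    int16_t setpoint;
    int16_t feedback;
    int32_t integrator;
} ControlState;

ControlState control_state;

int32_t control_step(int16_t setpoint, int16_t feedback)
{
    control_state.setpoint = setpoint;
    control_state.feedback = feedback;
    control_state.integrator += (int32_t)setpoint - (int32_t)feedback;
    return control_state.integrator;
}

/* Variable-index loop: as noted for option 3, array[i] is an indirect
 * access that does not use DP, so arrays accessed like this gain little
 * from DP-load optimization. */
int32_t sum_log_samples(void)
{
    int32_t sum = 0;
    for (int i = 0; i < 256; ++i)
        sum += log_samples[i];
    return sum;
}
```

Under option 3, arrays such as log_samples and log_errors could instead live in their own source file compiled with --disable_dp_load_opt, leaving the scalars and the structure in files that keep the optimization.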

Linker placement failures that occur when placing blocked sections may be resolved by examining the placement of memory regions and sections in the application’s linker command file. Recall that a blocked section must either fit within a page or be page-aligned. Therefore, placement errors can occur when the start address and size of a memory region do not account for holes that blocked sections might introduce. In such cases, try adjusting the start address of the relevant memory regions (for example, aligning a region to a page boundary) to account for blocked sections.
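The situation in Figure 5 can be reproduced with a small first-fit model. The region origins, lengths, and section size below are hypothetical numbers chosen for illustration; a real linker also weighs other sections, alignment, and its own allocation strategy.

```c
#include <stdbool.h>
#include <stdint.h>

#define WORDS_PER_PAGE 64u

/* Try to place a blocked section of `size` words at the lowest address in
 * the memory region [origin, origin + length). The section must either fit
 * within one page or start on a page boundary. On success, returns true
 * and stores the chosen start address in *start_out. */
bool place_blocked_section(uint32_t origin, uint32_t length,
                           uint32_t size, uint32_t *start_out)
{
    uint32_t end = origin + length;            /* one past the region */
    uint32_t start = origin;
    uint32_t in_page = start % WORDS_PER_PAGE;

    /* The section would straddle a page boundary: skip to the next page
     * boundary, leaving a hole at the start of the region. */
    if (in_page != 0 && in_page + size > WORDS_PER_PAGE)
        start += WORDS_PER_PAGE - in_page;

    if (start > end || size > end - start)
        return false;                          /* not enough room left */
    *start_out = start;
    return true;
}
```

In this model, a 144-word (0x90) region starting at 0x8010 cannot hold a 128-word (0x80) blocked section: the section must start at the page boundary 0x8040, which leaves only 96 words. Moving the region's origin to the page-aligned address 0x8000 lets the same section fit. A 32-word section still fits at 0x8010 because it stays within one page.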

Figure 5a: .dataBufferA must be page-aligned and will not fit in RegionA
Figure 5b: RegionA is page-aligned, so .dataBufferA fits in RegionA

Please post questions about data blocking in the TI E2E C/C++ Compiler Forum



[1] Alignment specifications for variables and sections could also cause these issues.