AM2754-Q1: C7x Optimized library alignment requirements

SREEKANTH SREEKUMAR

Part Number: AM2754-Q1
Other Parts Discussed in Thread: MATHLIB, FFTLIB

Hello Team,

I am creating wrappers on top of FFTLIB, DSPLIB, and MATHLIB kernels for a C7x-optimized application that will run on the AM275 EVM. Following the test drivers in the respective libraries, I can't help noticing that the processing buffers are explicitly aligned to 128 bytes or 64 bytes during allocation/setup, as described below.

FFTLIB

In the test driver source freertos_sdk_am275x_11_00_00_16/source/fftlib/test/fft_c7x/FFTLIB_fft1d_i32f_c32fc_o32fc/FFTLIB_fft1d_i32f_c32fc_o32fc_d.c, for example, I see

pX = (FFTLIB_F32 *) TI_memalign(128, numPoints * 2 * sizeof(FFTLIB_F32));

and also a comment that says:

/* pX is required to be 16-byte aligned for streaming engine use in kernel */

which is a bit confusing, since the allocation requests 128-byte alignment while the comment mentions only 16 bytes.
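For what it's worth, the two numbers do not contradict each other: any address that is a multiple of 128 is also a multiple of 16, so a 128-byte-aligned allocation always satisfies a 16-byte requirement. A minimal host-side sketch of that relationship follows. It assumes a C11 toolchain and uses the standard aligned_alloc as a stand-in for TI_memalign, whose implementation is not shown in this thread; is_aligned and alloc_fft_buffer are hypothetical helper names, not FFTLIB API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero if ptr is a multiple of align (a power of two). */
static int is_aligned(const void *ptr, size_t align)
{
    return ((uintptr_t) ptr & (align - 1u)) == 0u;
}

/* Hypothetical stand-in for TI_memalign(128, ...), built on C11 aligned_alloc.
 * aligned_alloc expects the size to be a multiple of the alignment, so the
 * request is rounded up to the next multiple of 128 bytes. */
static float *alloc_fft_buffer(size_t numPoints)
{
    size_t bytes  = numPoints * 2u * sizeof(float); /* interleaved real/imag */
    size_t padded = (bytes + 127u) & ~(size_t) 127u;
    return (float *) aligned_alloc(128u, padded);
}
```

A buffer returned by alloc_fft_buffer passes both is_aligned(p, 128) and is_aligned(p, 16); release it with free when done.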

DSPLIB

In the test driver source freertos_sdk_am275x_11_00_00_16/source/dsplib/test/DSPLIB_fir/DSPLIB_fir_d.c, for example, I see

void *pIn = (void *) TI_memalign(DSPLIB_L2DATA_ALIGNMENT, bufParamsIn.stride_y * bufParamsIn.dim_y);

where DSPLIB_L2DATA_ALIGNMENT is defined in DSPLIB_types.h as:

#define DSPLIB_ALIGN_SHIFT_64BYTES 6  //!< Number of bits to shift for 64-byte memory alignment
#define DSPLIB_ALIGN_SHIFT_128BYTES 7 //!< Number of bits to shift for 128-byte memory alignment
#define DSPLIB_ALIGN_SHIFT_256BYTES 8 //!< Number of bits to shift for 256-byte memory alignment

#define DSPLIB_ALIGN_64BYTES (1 << DSPLIB_ALIGN_SHIFT_64BYTES)   //!< Align by 64-byte memory alignment
#define DSPLIB_ALIGN_128BYTES (1 << DSPLIB_ALIGN_SHIFT_128BYTES) //!< Align by 128-byte memory alignment
#define DSPLIB_ALIGN_256BYTES (1 << DSPLIB_ALIGN_SHIFT_256BYTES) //!< Align by 256-byte memory alignment

#define DSPLIB_L2DATA_ALIGN_SHIFT DSPLIB_ALIGN_SHIFT_64BYTES //!< Set the default L2 data alignment

/** @brief Macro that specifies the alignment of data buffers in L2 memory for
 * optimal performance */
#define DSPLIB_L2DATA_ALIGNMENT (((uint32_t) 1) << ((uint32_t) DSPLIB_L2DATA_ALIGN_SHIFT))

which means DSPLIB_L2DATA_ALIGNMENT equates to 64.
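The arithmetic can be checked at compile time. The sketch below copies only the three macros needed from the excerpt above (comments dropped) and asserts that the result is 64. The helper round_up_to_l2_alignment is hypothetical, added only to show how the macro might be used to pad a buffer size; it is not part of DSPLIB.

```c
#include <assert.h>
#include <stdint.h>

/* Macros copied from the DSPLIB_types.h excerpt quoted above. */
#define DSPLIB_ALIGN_SHIFT_64BYTES 6
#define DSPLIB_L2DATA_ALIGN_SHIFT DSPLIB_ALIGN_SHIFT_64BYTES
#define DSPLIB_L2DATA_ALIGNMENT (((uint32_t) 1) << ((uint32_t) DSPLIB_L2DATA_ALIGN_SHIFT))

/* Compile-time check: 1 << 6 == 64. */
_Static_assert(DSPLIB_L2DATA_ALIGNMENT == 64u, "default L2 alignment is 64 bytes");

/* Hypothetical helper: round a byte count up to the next multiple of the
 * default L2 alignment (valid because the alignment is a power of two). */
static uint32_t round_up_to_l2_alignment(uint32_t bytes)
{
    return (bytes + DSPLIB_L2DATA_ALIGNMENT - 1u) & ~(DSPLIB_L2DATA_ALIGNMENT - 1u);
}
```

The same check applies unchanged to the MATHLIB macros, which use the same shift value.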

MATHLIB

In the test driver source freertos_sdk_am275x_11_00_00_16/source/mathlib/test/MATHLIB_cos/MATHLIB_cos_test.cpp, for example, I see:

pSrc = (T *) MATHLIB_memalign(MATHLIB_L2DATA_ALIGNMENT, length * sizeof(T));

where MATHLIB_L2DATA_ALIGNMENT is defined in MATHLIB_memory.h as:

#define MATHLIB_ALIGN_SHIFT_64BYTES 6  //!< Number of bits to shift for 64-byte memory alignment
#define MATHLIB_ALIGN_SHIFT_128BYTES 7 //!< Number of bits to shift for 128-byte memory alignment
#define MATHLIB_ALIGN_SHIFT_256BYTES 8 //!< Number of bits to shift for 256-byte memory alignment

#define MATHLIB_ALIGN_64BYTES (1 << MATHLIB_ALIGN_SHIFT_64BYTES)   //!< Align by 64-byte memory alignment
#define MATHLIB_ALIGN_128BYTES (1 << MATHLIB_ALIGN_SHIFT_128BYTES) //!< Align by 128-byte memory alignment
#define MATHLIB_ALIGN_256BYTES (1 << MATHLIB_ALIGN_SHIFT_256BYTES) //!< Align by 256-byte memory alignment

#define MATHLIB_L2DATA_ALIGN_SHIFT MATHLIB_ALIGN_SHIFT_64BYTES //!< Set the default L2 data alignment

/*! @brief Macro that specifies the alignment of data buffers in L2 memory for
 * optimal performance */
#define MATHLIB_L2DATA_ALIGNMENT (((uint32_t) 1) << ((uint32_t) MATHLIB_L2DATA_ALIGN_SHIFT))

which means MATHLIB_L2DATA_ALIGNMENT equates to 64.

Now, I want to understand:

1. Are the explicit alignments seen in the test drivers above hard requirements on the processing buffers for the optimized kernels to run efficiently? Can you confirm this? If yes, could you specify the alignment requirement for the FFTLIB, DSPLIB, and MATHLIB kernels?
2. What is the general platform data alignment requirement? Isn't it 16 bytes? Is there a loss of efficiency if I use the general data alignment for the processing buffers that I pass to the kernels?

I couldn't find this information in the documentation provided. I would really appreciate answers to the above, along with any references to where this is covered in the library documentation.

Regards,
Sreekanth

2 months ago

0 Ming Wei 1 month ago

TI__Guru 55155 points

Hi Sreekanth,

According to the C71x DSP Corepac TRM:

The L1D can sustain 512 bits of data (64 Bytes) to the CPU every cycle, while the L2 can provide 1024 bits of data (128 Bytes) to the streaming engine every cycle.

DSPLIB, FFTLIB, and MATHLIB set the default L2 data alignment to 64 bytes. My guess is that 64 bytes is the minimum requirement for L2 data alignment. To achieve optimal C7x performance, you will need to set the default L2 data alignment to 128 bytes.
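For statically allocated buffers, one way to get 128-byte alignment is the C11 _Alignas specifier. This is only a sketch under assumptions: it presumes a C11-capable compiler (a given toolchain may offer its own alignment attribute or pragma instead), and the names WRAPPER_BUF_ALIGN, WRAPPER_NUM_SAMPLES, wrapperInput, wrapperOutput, and wrapper_buffers_are_aligned are hypothetical, not taken from any of the libraries.

```c
#include <assert.h>
#include <stdint.h>

#define WRAPPER_BUF_ALIGN   128u  /* hypothetical: alignment suggested above */
#define WRAPPER_NUM_SAMPLES 1024u /* hypothetical buffer length */

/* C11 _Alignas places each array on a 128-byte boundary. */
static _Alignas(WRAPPER_BUF_ALIGN) float wrapperInput[WRAPPER_NUM_SAMPLES];
static _Alignas(WRAPPER_BUF_ALIGN) float wrapperOutput[WRAPPER_NUM_SAMPLES];

/* Hypothetical sanity check: returns 1 if both buffers start on a
 * 128-byte boundary, 0 otherwise. */
static int wrapper_buffers_are_aligned(void)
{
    return (((uintptr_t) wrapperInput  % WRAPPER_BUF_ALIGN) == 0u) &&
           (((uintptr_t) wrapperOutput % WRAPPER_BUF_ALIGN) == 0u);
}
```

Which memory region the buffers land in (L2 or elsewhere) is decided by the linker configuration, which this sketch does not address.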

Best regards,

Ming

0 SREEKANTH SREEKUMAR 1 month ago in reply to Ming Wei

Prodigy 80 points

Hello Ming,

Thanks for the response. I see that in DSPLIB the test drivers for FIR and cascaded biquads use 64-byte-aligned buffers, while in FFTLIB the test driver explicitly uses 128-byte-aligned buffers. So, according to you, to optimize streaming engine execution it is recommended to use 128-byte alignment for all buffers passed as input to any kernel in FFTLIB, DSPLIB, and MATHLIB. Did I understand that correctly?
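To compare the two cases in practice, I could log the actual alignment of each buffer before it reaches a kernel. A minimal sketch, where pointer_alignment is a hypothetical helper of my own and not part of any of the libraries:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: largest power-of-two alignment of ptr, capped at
 * maxAlign (itself a power of two). For example, with maxAlign = 128 an
 * address of 64 reports 64 and an address of 256 reports 128. */
static size_t pointer_alignment(const void *ptr, size_t maxAlign)
{
    uintptr_t addr  = (uintptr_t) ptr;
    size_t    align = 1u;

    /* Double the candidate alignment while the corresponding address bit is clear. */
    while (align < maxAlign && (addr & align) == 0u) {
        align <<= 1u;
    }
    return align;
}
```

A wrapper could call pointer_alignment(pIn, 128) and warn when the result is below 128, which would flag buffers that only meet the 64-byte library default.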

Regards,
Sreekanth

0 Shreyansh Anand 1 month ago in reply to SREEKANTH SREEKUMAR

TI__Expert 5830 points

Hi Sreekanth,
Yes, your understanding is correct.

Thanks,

Shreyansh
