Avoiding register spill

Hi,

I am trying to optimize some array-to-array loops that do unusual indexing operations, such as interleaving or permuting the data. There is no obvious way to work with only aligned reads and writes: either the reads or the writes tend to require unaligned operations, unless the loop body operates on enough data that it can always produce whole words or double words of output while reading aligned words or double words from the input array. If that is confusing, consider implementing an interleaving function such as:

i = 18 * (k mod 16) + floor(k/16)

where k is the input index that must map to output index i. Array sizes in consideration are about 300 bytes.
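For reference, the mapping above can be written as a plain byte-wise loop. This is only a minimal sketch: the block size of 18 * 16 = 288 bytes is an assumption chosen to fit the formula and the "about 300 bytes" figure, and the names `interleave_index` and `interleave_ref` are hypothetical. It makes no attempt at aligned word access; it is the kind of baseline an optimized version could be checked against.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed dimensions: 18 * 16 = 288 bytes per block (not stated in the post). */
enum { ROWS = 18, COLS = 16, N = ROWS * COLS };

/* Output index for input index k: i = 18 * (k mod 16) + floor(k / 16). */
static size_t interleave_index(size_t k)
{
    return (size_t)ROWS * (k % COLS) + k / COLS;
}

/* Byte-wise reference version: out[i(k)] = in[k] for every k in [0, N).
 * Each write is a single byte, so alignment never matters here. */
static void interleave_ref(const uint8_t *in, uint8_t *out)
{
    for (size_t k = 0; k < N; ++k)
        out[interleave_index(k)] = in[k];
}
```

Consecutive input bytes land 18 bytes apart in the output, and consecutive output bytes come from inputs 16 bytes apart, which is why neither side of the loop is naturally word-aligned.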

It turns out that to write the aligned-access loop I want, I need to bring in at least 48 bytes of data, which takes twelve 32-bit registers. On top of that, I would need to allocate some working variables for assembling output words. Is there a way to ensure that the loop body (in C) is compiled without registers spilling into memory? If that is not possible, can the compiler be made to throw an error when a spill occurs? And if not, is there any alternative to looking at the assembly output to assess spillage?

Thanks,
Manu 

  • Manu,

    Your questions are about the compiler, and so they will get better answers on the compiler forum. Let me know if you would like this moved there, and I will make that request for you.

    "Optimization" has a huge class of solutions, depending on the goals and constraints and specifics of the application. I think it is safe to assume that you have a pure C version of the algorithm that is running correctly to prove the functionality. Then you have tried to use the optimizer settings to improve the code performance. How far are you from your goal, or is there a specific goal?

    My usual recommendation for what I understand of your situation is to use the EDMA3 module to move data into the alignment that most favorably fits your algorithm. Then use the EDMA3 module to move data into the alignment that most favorably fits the output target. Hopefully, one of those is a simple copy. But the EDMA3 can do the data movement more efficiently than the DSP, and it can operate on one part of the problem at the same time that the DSP is operating on another part of the problem.

    To explain any more about this idea, I would need to know a lot more about your application. You may not need any more explanation, or you may not want anything other than answers to your questions about compiler switches (I doubt they exist), or you may have other ideas to try first.

    Regards,
    RandyP