Hi,
I am trying to optimize some array-to-array loops that do unusual indexing operations, such as interleaving or permuting the data. There is no obvious way to do this with only aligned reads and writes: either the reads or the writes tend to be unaligned, unless the loop body operates on "enough" data that it can always produce whole words or double words of output while reading aligned words or double words from the input array. If that is confusing, consider implementing an interleaving function such as:
i = 18 * (k mod 16) + floor(k/16)
where k is the input index and i is the output index it maps to. The arrays in question are about 300 bytes long.
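To make the mapping concrete, here is a minimal byte-at-a-time sketch of it in C. The names interleave_index, interleave_bytes and INTERLEAVE_LEN are made up for illustration, and the length of exactly 288 (16 * 18) bytes is an assumption chosen so that the mapping is a permutation of the array. This is only the naive reference version, not the aligned-access loop described below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed length: 16 * 18 = 288 bytes, so that the mapping is a permutation. */
#define INTERLEAVE_LEN 288u

/* Output index i for input index k: i = 18 * (k mod 16) + floor(k / 16). */
static size_t interleave_index(size_t k)
{
    assert(k < INTERLEAVE_LEN);
    return 18u * (k % 16u) + k / 16u;
}

/* Byte-wise reference version: reads are sequential, writes are scattered. */
static void interleave_bytes(const uint8_t *in, uint8_t *out)
{
    for (size_t k = 0; k < INTERLEAVE_LEN; ++k)
        out[interleave_index(k)] = in[k];
}
```

Consecutive input bytes land 18 bytes apart in the output, so a single aligned input word never fills a single aligned output word on its own.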
It turns out that to write the aligned-access loop I want, I need to bring in at least 48 bytes of data, which takes twelve 32-bit registers. On top of that, I need some working variables for assembling the output words. Is there a way to ensure that the loop body (in C) is compiled without spilling registers to memory? Failing that, can the compiler be made to report an error when it has to spill? And if neither is possible, is there any way to check for spills other than reading the assembly output?
Thanks,
Manu