Hi,
I ran some experiments to get a better understanding of how the compiler handles delay slots.
To that end, I compiled the following C function with optimizations enabled:
int ptrsum(int* a, int* b) {
    return *a + *b;
}
The generated assembly code was:
        RETNOP  .S2     B3,4            ; |12|
||      LDW     .D2T2   *B4,B4          ; |11|
||      LDW     .D1T1   *A4,A3          ; |11|
        ADD     .L1X    B4,A3,A4        ; |11|
What I do not understand is how the values in B4 and A3 can already be accessed in the next cycle, given that the C66x does not feature pipeline interlocks.
According to the C66x instruction manual, LDW requires 4 delay slots before the loaded value becomes available.
Any pointers would be much appreciated.
Thx