I am working on some code for a C6678 that does some linear filters, using cl6x 7.4.1 and plain C.
The basic filters actually compile quite fast (several seconds), even with -O3 --opt_for_speed=5, and I'm happy with the results.
One specific file with some decimation filters takes a ludicrous amount of time to compile (434 seconds) for four little functions.
The annoying part of this is that most of the time seems to be spent exhaustively trying to find loop schedules for loops on which it eventually gives up. I wouldn't mind the long compile times if it were actually succeeding... I've tried forcing #pragma UNROLL(1) but this doesn't help.
If I build it with -O3 --disable_software_pipeline [1] it takes just 3.5 seconds to compile. That's perhaps enough to speed up my edit-compile-debug cycle and get me through for now, while I work on optimising other files in the library that are actually more performance-critical. It does disable pipelining for a handful of loops in the file that can be successfully pipelined, so it's not ideal.
Is there something else I could do here?
The underlying issue is that the filter coefficients in question almost, but not quite, fit in the register file, which is obviously very frustrating for the compiler. I suspect that I could fix it by (for example) exploiting the symmetry in the filter coefficients, but given this is well tested and working code I feel like that's optimisation work I can't justify just now.
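To make the symmetry idea concrete, here is a minimal sketch, not the actual code from the file. It assumes a linear-phase FIR with an even number of taps, so that h[k] == h[NUM_TAPS-1-k] and only half the coefficients need to be kept live. The names (decimate_symmetric, half_coeffs, NUM_TAPS, DECIM) and the coefficient values are hypothetical placeholders.

```c
#include <assert.h>

#define NUM_TAPS  8                 /* hypothetical tap count, must be even */
#define HALF_TAPS (NUM_TAPS / 2)
#define DECIM     2                 /* hypothetical decimation factor */

/* Only the first half of the taps is stored; the second half is the
 * mirror image, h[k] == h[NUM_TAPS - 1 - k]. Placeholder values. */
static const float half_coeffs[HALF_TAPS] = { 0.125f, 0.25f, 0.5f, 1.0f };

/* Decimating FIR that folds the symmetric taps together.
 * 'in' must hold at least (n_out - 1) * DECIM + NUM_TAPS samples. */
void decimate_symmetric(const float *in, float *out, int n_out)
{
    int i, k;

    assert(n_out >= 0);

    for (i = 0; i < n_out; i++) {
        const float *x = in + i * DECIM;   /* window for this output */
        float acc = 0.0f;

        /* Add the two samples that share a coefficient first, then do
         * one multiply per pair instead of one multiply per tap. */
        for (k = 0; k < HALF_TAPS; k++)
            acc += half_coeffs[k] * (x[k] + x[NUM_TAPS - 1 - k]);

        out[i] = acc;
    }
}
```

With this form the inner loop needs HALF_TAPS coefficients rather than NUM_TAPS, which is where I'd hope the register pressure eases. Whether that is enough for the compiler to find a schedule quickly is something I haven't tested.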
Thanks for any tips,
Gordon