I have an inner loop that determines a pointer offset into a rolling circular buffer.
Pseudo code:
pointer = current_pixel + offset;
value = *(char *)(circular_buffer_base + (pointer % circular_buffer_size));
This loop fails to software pipeline due to a function call that could not be inlined. If I take the mod (%) operator out, it pipelines successfully, so the call appears to come from the mod. I would like to pipeline this loop and would appreciate any advice on how to do so, whether that means inlining the mod function call or setting up the circular buffer addressing another way. I am fast-paging memory into L1D SRAM to implement the circular buffer.
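
To make "another way to set up the circular buffer addressing" concrete, here is a minimal sketch of two call-free forms of the wrap. Both rest on assumptions that are not stated above: the first assumes the buffer size can be made a power of two, so a bitwise AND gives the same result as %; the second assumes pointer never exceeds twice the buffer size, so one conditional subtract is enough. The names CIRC_BUF_SIZE, circ_read_masked and circ_read_wrapped are hypothetical, and whether either form actually pipelines on a given compiler would need to be checked.

```c
#include <stdint.h>

/* Hypothetical size. It must be a power of two for the mask to match %. */
#define CIRC_BUF_SIZE 1024u
#define CIRC_BUF_MASK (CIRC_BUF_SIZE - 1u)

/* Power-of-two case: (pointer & mask) equals (pointer % size) for unsigned
 * values, and it is a single AND instead of a division call. */
static inline uint8_t circ_read_masked(const uint8_t *base, uint32_t pointer)
{
    return base[pointer & CIRC_BUF_MASK];
}

/* Arbitrary size: valid only while pointer < 2 * size, i.e. the offset
 * never pushes the index more than one full wrap past the end. */
static inline uint8_t circ_read_wrapped(const uint8_t *base,
                                        uint32_t pointer,
                                        uint32_t size)
{
    uint32_t idx = (pointer >= size) ? (pointer - size) : pointer;
    return base[idx];
}
```

In the loop, the read would then become value = circ_read_masked(circular_buffer_base, pointer); or the conditional-subtract variant, depending on which assumption holds for the buffer.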