I am running a "bare metal" DSP application from shared memory (program text and data are linked around $8001000). I have been timing portions of my software using the internal time-stamp counter register, TSCL.
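For reference, one common way to turn two TSCL samples into a cycle count is shown below. It is a minimal, portable sketch that assumes only that the counter is a free-running 32-bit cycle counter. The elapsed count is an unsigned 32-bit subtraction, which stays correct even if the counter wraps between samples. The cost of the two reads themselves is measured and removed. All names here (`cycle_reader`, `cycles_between`, `timing_overhead`, `time_region`, `fake_read`, `fake_work`) are hypothetical. On the DSP, the reader would return TSCL; the fake counter exists only so the arithmetic can be checked off-target.

```c
#include <stdint.h>

/* Hypothetical hook returning the current 32-bit cycle count.
   On a C64x+/C66x target it would return TSCL; on a host it can be
   a fake counter, which is what makes this sketch testable. */
typedef uint32_t (*cycle_reader)(void);

/* Unsigned 32-bit subtraction is exact as long as fewer than 2^32
   cycles elapse, even if the counter wraps between the two samples. */
static uint32_t cycles_between(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Cycles consumed by two back-to-back reads with nothing in between. */
static uint32_t timing_overhead(cycle_reader read)
{
    uint32_t t0 = read();
    uint32_t t1 = read();
    return cycles_between(t0, t1);
}

/* Cycles spent in fn(), with the measured read overhead subtracted.
   Clamps at zero so noise cannot produce a huge unsigned result. */
static uint32_t time_region(cycle_reader read, void (*fn)(void))
{
    uint32_t overhead = timing_overhead(read);
    uint32_t t0 = read();
    fn();
    uint32_t t1 = read();
    uint32_t raw = cycles_between(t0, t1);
    return raw > overhead ? raw - overhead : 0u;
}

/* Host-only fake counter (hypothetical, for off-target checks):
   each read advances it by 2 "cycles", and fake_work() by 50.
   It starts just below the wrap point to exercise wraparound. */
static uint32_t fake_now = 0xFFFFFFF0u;

static uint32_t fake_read(void)
{
    uint32_t t = fake_now;
    fake_now += 2u;
    return t;
}

static void fake_work(void)
{
    fake_now += 50u;
}
```

With the fake counter, `time_region(fake_read, fake_work)` reports 50 even though the counter wraps past zero during the measurement. On the target, only the reader changes. Note that the overhead correction removes the cost of the timing reads, not any stalls inside the measured region.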
When I inspect the assembler listings, I see that a small number of instructions can take an inordinately long time to execute. The code snippet below was measured at 144 processor cycles. Processor interrupts are disabled, and there is no DMA or other activity going on.
379 00000158 022803e2 MVC .S2 TSCL,B4 ; |255|
380 0000015c 03800d6e! LDW .D2T2 *+DP(_az_phase_accumulator+4),B7 ; |256|
381 00000160 02800b6e! LDW .D2T2 *+DP(_az_ref_accumulator+4),B5 ; |256|
382 00000164 04000c6e! LDW .D2T2 *+DP(_az_phase_accumulator),B8 ; |256|
383 00000168 03000a6e! LDW .D2T2 *+DP(_az_ref_accumulator),B6 ; |256|
384 0000016c 9dc5 STW .D2T2 B4,*+SP(48) ; |255|
385 0000016e 4c6e NOP 3
387 00000174 eadb SUB .S2 B7,B5,B5 ; |256|
388 00000170 031905fb || SUBU .L2 B8,B6,B7:B6 ; |256|
389
390 00000178 022803e2 MVC .S2 TSCL,B4 ; |258|
I suspect that memory caching is to blame and that the processor is stalling while the cache is filled. However, I have disabled the L2 cache and frozen the L1 program and data caches.
Is there any way to completely prevent the caching hardware from running? Or am I completely mistaken about the cache, and is something else causing the stalls?