Hello,
I am using the simulator to study the effect of the L1D cache and memory. I found that even after the data has been loaded into memory, it takes 4 instruction time slots before the data is ready: the value only appears in the destination register 4 instruction time slots after the LDW instruction executes. In assembly, this means 4 NOPs are needed after the LDW instruction. Any ideas about this?
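To illustrate what I mean, here is a minimal sketch of the pattern. The registers and functional units are placeholders chosen only for illustration, not my actual code:

```
        LDW   .D1   *A4, A5      ; issue the load from the address held in A4
        NOP   4                  ; 4 time slots pass; A5 still holds its old value
        ADD   .L1   A5, A6, A7   ; first instruction that sees the loaded value in A5
```

If the ADD is placed any earlier, it reads the old contents of A5 instead of the loaded word.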
I also wonder whether the real hardware behaves the same way. I am currently using the C6472 little-endian simulator in CCS 4. Any comments are appreciated. Thank you!