Hello,
We have an AM5728-based system running a code base ported from a previous design that used the TI 6482. The C66x core on the AM5728 runs under SYS/BIOS at 750 MHz; the C64x core on the 6482 runs under DSP/BIOS at 1 GHz. In both systems we get data from an FPGA. We are seeing that the AM5728 implementation is 2.5 to 3.0 times slower than the 6482, and I am wondering where to start looking for the cause.
More information: The AM5728 has DDR3 in a design that we basically lifted from the EVM. The DSP has as much L1/L2 cache enabled as I can manage. The DSP is connected to the FPGA via PCIe at 5G x 1 lane. Data gets from the FPGA to the DSP via PCIe writes issued by the FPGA. The data is written into L2SRAM, in part of the 32 KB left over after cache; the DSP then gets an interrupt and copies the data to DDR3 for later processing. This was done to avoid having to cache-invalidate the DDR3 memory each time the DSP wants to access the data from the FPGA. As mentioned above, the code that then processes the data from DDR3 is 2.5 to 3.0 times slower on the AM5728 DSP. I would guess that, of our cycle time of ~70 usec per interrupt, data is arriving over PCIe into L2SRAM for about 50% of that time.
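For a rough sense of scale, here is a back-of-the-envelope sketch in C of how much data could be arriving per interrupt. It assumes that "5G x 1 lane" means PCIe Gen2 (5 GT/s, 8b/10b encoding) on a single lane, and it ignores packet headers and other protocol overhead, so the result is only an upper bound. The function names are made up for illustration and are not part of our code base.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: "5G x 1 lane" means PCIe Gen2, 5 GT/s on a single lane. */
#define LINK_TRANSFERS_PER_SEC 5000000000ULL

/* Gen2 uses 8b/10b encoding: 8 payload bits per 10 bits on the wire. */
#define ENCODING_NUM 8ULL
#define ENCODING_DEN 10ULL

/* Portion of the interrupt cycle (in microseconds) during which data is
 * arriving, given the cycle time and an estimated busy percentage. */
uint64_t busy_window_usec(uint64_t cycle_usec, uint64_t busy_percent)
{
    assert(busy_percent <= 100ULL);
    return cycle_usec * busy_percent / 100ULL;
}

/* Upper bound on the bytes the link can deliver in busy_usec microseconds.
 * Ignores TLP headers, flow control and any other protocol overhead. */
uint64_t max_bytes_in_window(uint64_t busy_usec)
{
    uint64_t bits_per_sec  = LINK_TRANSFERS_PER_SEC * ENCODING_NUM / ENCODING_DEN;
    uint64_t bytes_per_sec = bits_per_sec / 8ULL;
    return bytes_per_sec * busy_usec / 1000000ULL;
}
```

With my estimates (70 usec cycle, about 50% busy), busy_window_usec(70, 50) gives 35 usec and max_bytes_in_window(35) gives 17500 bytes, so at most roughly 17 KB lands in L2SRAM per interrupt, which fits inside the 32 KB left over after cache.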
I am wondering whether the large volume of data arriving in L2SRAM is preventing the cache from working effectively, or simply blocking the DSP core from accessing memory efficiently. In other words, is the PCIe transfer of data into L2SRAM killing our processing?
Any suggestions or insight into this would be much appreciated.
Thanks,
Chris