Hello,
I have a custom AM3352 board running our own bare-metal software. While developing the LAN driver, I found that reading buffer descriptors from the Ethernet Subsystem CPPI RAM (0x4a102000 - 0x4a103FFF) is quite slow.
The general driver structure (LAN RX interrupt handler) is:
read_next_buffer_descriptor();
process_buffer_data();
take_care_of_buffer_descriptors_queue();
When I measured the duration of the descriptor reads (using the ARM CP15 cycle counter), I found that a single read can take around 200 CPU cycles. Two consecutive reads sometimes take a little over 200 cycles and sometimes around 400. All reads are to registers.
The CPU runs at 1 GHz. We use the MMU, and the 0x4a102000 - 0x4a103FFF region is mapped as device (non-cached) memory.
The entire interrupt handler takes around 3.5 us, of which 1.7 us is taken by our buffer processing. This limits overall LAN RX performance to around 300 Kpkts/s. So descriptor access is the real bottleneck for a heavily loaded network application.
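For reference, a measurement like this can be sketched as follows. This is a minimal illustration, not our exact driver code: the helper names (cycle_counter_enable, cycle_counter_read, time_one_read) are made up for the example, the DSB before the second counter read is my assumption about how to make sure the load has completed, and the non-ARM branch is only a stand-in so the sketch compiles off-target.

```c
#include <stdint.h>

#if defined(__arm__) && !defined(__linux__)
/* Bare-metal ARMv7-A path: use the PMU cycle counter (PMCCNTR) via CP15. */
static inline void cycle_counter_enable(void)
{
    uint32_t pmcr;
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(pmcr)); /* read PMCR */
    pmcr |= (1u << 0) | (1u << 2);  /* E: enable counters, C: reset cycle counter */
    pmcr &= ~(1u << 3);             /* D = 0: count every cycle, not every 64th */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(pmcr)); /* write PMCR */
    /* PMCNTENSET bit 31 enables the cycle counter itself. */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(1u << 31));
}

static inline uint32_t cycle_counter_read(void)
{
    uint32_t cycles;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(cycles)); /* PMCCNTR */
    return cycles;
}

/* Assumption: wait for the load to complete before sampling the counter. */
#define READ_COMPLETE_BARRIER() __asm__ volatile("dsb" ::: "memory")
#else
/* Off-target stand-in (hypothetical): a fake counter that advances by one
 * per read, so the code below can be compiled and exercised on a host. */
static uint32_t fake_cycles;
static inline void cycle_counter_enable(void) { fake_cycles = 0; }
static inline uint32_t cycle_counter_read(void) { return fake_cycles++; }
#define READ_COMPLETE_BARRIER() __asm__ volatile("" ::: "memory")
#endif

/* Time a single 32-bit read from 'word' and return the elapsed counter ticks.
 * The value read is stored in *value_out so the load cannot be optimized away. */
static uint32_t time_one_read(const volatile uint32_t *word, uint32_t *value_out)
{
    uint32_t start = cycle_counter_read();
    *value_out = *word;
    READ_COMPLETE_BARRIER();
    uint32_t end = cycle_counter_read();
    return end - start; /* unsigned subtraction tolerates counter wrap-around */
}
```

On the target, 'word' would point at one of the descriptor words in CPPI RAM. The measured number includes the overhead of the two counter reads and the barrier, so it is worth timing an empty pair of reads as a baseline and subtracting it.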
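The arithmetic behind these numbers can be written out as a small sketch. It assumes one packet per interrupt and handlers running back to back with no other overhead, which is an idealization; the function names are made up for the example.

```c
#include <stdint.h>

/* Upper bound on packets per second if every packet costs 'handler_ns'
 * nanoseconds of handler time and nothing else runs in between. */
static uint32_t packets_per_second(uint32_t handler_ns)
{
    return 1000000000u / handler_ns;
}

/* Convert a duration in nanoseconds to CPU cycles at 'cpu_mhz' MHz. */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t cpu_mhz)
{
    return (uint32_t)(((uint64_t)ns * cpu_mhz) / 1000u);
}
```

With a 3.5 us handler, packets_per_second(3500) gives 285714, which is the "around 300 Kpkts/s" limit. The 3.5 us - 1.7 us = 1.8 us that is not buffer processing is ns_to_cycles(1800, 1000) = 1800 cycles at 1 GHz, which is the equivalent of about nine descriptor reads at 200 cycles each.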
The questions are:
Is the read time of 200 cycles normal for Ethernet CPPI RAM?
Is there any way to decrease it?
I suppose the problem is that CPSW_DMA uses this memory at the same time as I'm reading it - am I right?
Are there any ways/tricks to design the LAN RX driver so that reading the descriptors is faster?
Thanks in advance