Part Number: AM5K2E04
Can you review the topic below and comment on the six questions at the end of the thread?
We are developing a new board with Keystone2 processor. Attached to the EMIF16 of the Keystone2 processor there are several analog and digital
input and output devices. To reach our system performance requirements the read and write accesses to these devices must have a very short latency.
That means that the duration of read and write accesses must be as short as possible. Since the accesses on the EMIF are mostly random or
read-modify-write, we cannot use burst transfers by using DMA. Additionally some of our digital I/O devices need to use the wait signal.
In the past we made EMIF16 performance measurements on the K2EVM-HK development board with the XTCI6638K2KXAAW processor.
We observed the following EMIF performance values (16-bit read) with extended wait mode enabled:
Length of Chip Enable (CE) low pulse: ~26 ns
Distance between rising edge of CE and falling edge of next CE access: ~30 ns
=> Access duration = ~56 ns
Note: On the evaluation board we made measurements on the expansion header without a device on slave side.
The wait signal was hard-wired on the expansion header to be inactive. Clocking: SYSCLK1 = 1167.36 MHz, ARMCPUCLK = 1375 MHz.
We used Linux-3.8.4-g42865b7. The test application only generates accesses to the EMIF Interface in a loop.
Based on these results we decided to use the Keystone2 processor for the new board design. We chose the AM5K2E04XABDA4 because we only
need the ARM cores; the DSPs are not necessary.
Now we are investigating the performance of the EMIF on our board, and we see that it is well below the results we obtained with the development board.
Length of CE low pulse: ~22 ns
Distance between rising edge of CE and falling edge of next CE access: ~111 ns
=> Access duration = 133 ns
Note: On our board we made the measurements under the same conditions as previously with the evaluation board.
The slave device on the EMIF16 bus is an FPGA that acknowledges the access immediately. The wait signal is always inactive.
Clocking: SYSCLK1 = ARMCPUCLK = 1400 MHz.
We use RT-Linux. The test application only generates accesses to EMIF Interface in a loop.
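The two sets of measurements above can be compared with a little arithmetic. The following is only a sketch: the struct and function names are hypothetical, and the only inputs are the scope values quoted in this post.

```c
#include <assert.h>

/* One measured EMIF16 access (hypothetical helper type, values from the scope). */
struct emif_access {
    double ce_low_ns;  /* length of the CE low pulse */
    double ce_gap_ns;  /* rising edge of CE to falling edge of the next CE */
};

/* Time from one CE falling edge to the next, in nanoseconds. */
double access_period_ns(struct emif_access a)
{
    assert(a.ce_low_ns >= 0.0 && a.ce_gap_ns >= 0.0);
    return a.ce_low_ns + a.ce_gap_ns;
}

/* Back-to-back accesses per second, in millions (1000 ns per microsecond). */
double maccesses_per_second(struct emif_access a)
{
    return 1000.0 / access_period_ns(a);
}

/* How many times slower access b is compared with access a. */
double slowdown_factor(struct emif_access a, struct emif_access b)
{
    return access_period_ns(b) / access_period_ns(a);
}
```

With the evaluation board values (26 ns, 30 ns) and our board values (22 ns, 111 ns) this gives periods of 56 ns and 133 ns, so our board is roughly 2.4 times slower per access. The CE low pulse itself is slightly shorter on our board; the whole difference comes from the gap between accesses.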
We are wondering whether there are differences between the two processors, or whether the problem lies in our register settings.
Below are the EMIF register settings we used on both processors:
EMIF settings on the development board:
AWCCR= 0xf0040080 (WP1=1, WP0=1, CS5_WAIT=0, CS4_WAIT=0, CS3_WAIT=1, CS2_WAIT=0, MAX_EXT_WAIT=128)
A2CR= 0x40100081 (SS=0, EW=1, W_SETUP=0, W_STROBE=1, W_HOLD=0, R_SETUP=0, R_STROBE=1, R_HOLD=0, TA=0, ASIZE=1)
EMIF settings on our own design:
AWCCR= 0xf0040080 (WP1=1 WP0=1 CS5_WAIT=0 CS4_WAIT=0 CS3_WAIT=1 CS2_WAIT=0 MAX_EXT_WAIT=128)
A2CR= 0x40100081 (SS=0 EW=1 W_SETUP=0 W_STROBE=1 W_HOLD=0 R_SETUP=0 R_STROBE=1 R_HOLD=0 TA=0 ASIZE=1)
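To double-check that the raw register values really correspond to the field values listed above, they can be decoded as sketched below. The bit positions are an assumption taken from the usual EMIF16 AWCCR/AnCR register layout and should be verified against the EMIF16 user guide; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `width` bits starting at bit `lsb` (width must be 1..31). */
uint32_t emif_field(uint32_t reg, unsigned lsb, unsigned width)
{
    assert(width >= 1 && width < 32);
    return (reg >> lsb) & ((1u << width) - 1u);
}

struct ancr_fields {
    uint32_t ss, ew, w_setup, w_strobe, w_hold;
    uint32_t r_setup, r_strobe, r_hold, ta, asize;
};

struct awccr_fields {
    uint32_t wp1, wp0, cs5_wait, cs4_wait, cs3_wait, cs2_wait, max_ext_wait;
};

/* Assumed AnCR layout: SS[31] EW[30] W_SETUP[29:26] W_STROBE[25:20]
 * W_HOLD[19:17] R_SETUP[16:13] R_STROBE[12:7] R_HOLD[6:4] TA[3:2] ASIZE[1:0] */
struct ancr_fields decode_ancr(uint32_t reg)
{
    struct ancr_fields f;
    f.ss       = emif_field(reg, 31, 1);
    f.ew       = emif_field(reg, 30, 1);
    f.w_setup  = emif_field(reg, 26, 4);
    f.w_strobe = emif_field(reg, 20, 6);
    f.w_hold   = emif_field(reg, 17, 3);
    f.r_setup  = emif_field(reg, 13, 4);
    f.r_strobe = emif_field(reg, 7, 6);
    f.r_hold   = emif_field(reg, 4, 3);
    f.ta       = emif_field(reg, 2, 2);
    f.asize    = emif_field(reg, 0, 2);
    return f;
}

/* Assumed AWCCR layout: WP1[29] WP0[28] CS5_WAIT[23:22] CS4_WAIT[21:20]
 * CS3_WAIT[19:18] CS2_WAIT[17:16] MAX_EXT_WAIT[7:0] */
struct awccr_fields decode_awccr(uint32_t reg)
{
    struct awccr_fields f;
    f.wp1          = emif_field(reg, 29, 1);
    f.wp0          = emif_field(reg, 28, 1);
    f.cs5_wait     = emif_field(reg, 22, 2);
    f.cs4_wait     = emif_field(reg, 20, 2);
    f.cs3_wait     = emif_field(reg, 18, 2);
    f.cs2_wait     = emif_field(reg, 16, 2);
    f.max_ext_wait = emif_field(reg, 0, 8);
    return f;
}
```

Under these assumed bit positions, decode_ancr(0x40100081) and decode_awccr(0xf0040080) reproduce the field values we listed for both boards, so the two configurations are identical at the register level.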
Can you please help us reach the EMIF performance of the development board on our own design?
Here are our detailed questions:
1) The timing of the access to the external device is controlled by the EMIF16 interface IP. The timing between accesses is determined by delays in accessing that IP and by the software running on the device. Accesses to EMIF16 were not prioritized for streaming data into the device, so we didn't optimize that path. You mentioned that you were measuring the performance with different boards and devices using different software builds. There are too many variables to point to any one thing that is affecting the delays observed.
2) Access to the CEs should be the same, but CE3 does have one different tie-off inside the part. CE3 has the ability to generate byte write enable signals for wider parts. This feature wasn't found to be useful, so we didn't document it, but the different tie-off may affect the start or end of the state machine, causing a different delay time.
3) The external wait signal is an asynchronous input. The internal state machine must wait for the signal to be latched before it can detect it. The system clock on the development board is operating at a lower frequency than on your board. The higher frequency means that the wait will be detected more quickly on your board. This does have an impact on how quickly the access is completed but doesn't affect the delay between accesses.
4) I have no explanation for this behavior. Did you observe this was consistent for every setting of the strobe length?
5) I will have to do some research on this question. All the waveforms I have seen captured using the wait signal have the wait going low to end the cycle and staying low until the cycle has ended. I agree that this doesn't appear to match what is in the data manual. The internal documentation that I have matches the timing in the data manual. Can you provide a scope capture of the access with the wait signal? How were you generating the timing for the wait signal?
6) That looks like a carryover from earlier documents. The K2E doesn't have a separate ARM PLL. The main PLL is used for both the ARMs and the system.
In reply to Bill Taboada:
1) It seems we have found out why the EMIF accesses on the development board were faster than on our own board. Our software colleagues investigated the old U-Boot of the development board and found the following code:
tmp = __raw_readl(CONFIG_AEMIF_CNTRL_BASE+8);
tmp |= 0x80000000;
After adding this code to the U-Boot of our own board, the EMIF accesses seem to be as fast as on the development board. It seems to be an undocumented feature. We found the following case in the TI E2E forum: http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/550371 . This case relates to a different processor (C6678), but seems to describe the solution to our problem. The code above appears to put the EMIF16 into an asynchronous mode with lower latency. But why is that mode not documented? Are there any problems with this mode on the Keystone2? We would like to use this mode; do you have any concerns?
2) Question is solved, thanks.
3) Maybe this question will be resolved with the asynchronous mode; we will check that.
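The two lines quoted above only read the register and set bit 31 in a local variable; for the change to take effect, the value presumably has to be written back (in U-Boot most likely with __raw_writel, which was not part of what we quoted). A minimal sketch of the complete read-modify-write, written against a plain pointer so it can be tried off-target, could look like this. The macro and function names are hypothetical, and the meaning of bit 31 at offset 8 is undocumented as far as we know.

```c
#include <stdint.h>

#define AEMIF_UNDOC_OFFSET 0x8u         /* the "+8" from the quoted snippet */
#define AEMIF_UNDOC_BIT31  0x80000000u  /* the mask from the quoted snippet */

/* Set bit 31 of the register at base + 8 and write it back.
 * Returns the value that was written. */
uint32_t aemif_set_bit31(volatile uint32_t *aemif_base)
{
    volatile uint32_t *reg = aemif_base + AEMIF_UNDOC_OFFSET / sizeof(uint32_t);
    uint32_t tmp = *reg;       /* read the current register value */
    tmp |= AEMIF_UNDOC_BIT31;  /* set bit 31, keep all other bits */
    *reg = tmp;                /* write back (assumed to be required) */
    return tmp;
}
```

On the target, aemif_base would be the address that CONFIG_AEMIF_CNTRL_BASE stands for in the board configuration; on a host it can simply point at a small array standing in for the register block.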
4) Yes, it is consistent for every setting of the strobe length.
5) Ok, we will do a scope capture and come back to you afterwards.
6) Question is solved, thanks.
In reply to Stefan Merten: