Fast GIO

Other Parts Discussed in Thread: TMS570LS3137

Hello. On the TMS570LS3137 I need to implement fast GIO switching. Using the existing ports, the best I could achieve is a pin change at least 280 nanoseconds after the instruction begins executing. Most of that time is taken by the execution of the STR instruction. Is it possible to use a different instruction, or is some pipeline setup required? Would using the TC RAM module speed up operations on GIO? What would you advise? Ideally, the response would arrive within 1 machine clock period.

  • Sergey

    The time frames you are talking about seem about right and expected. Unfortunately, there are few opportunities I see to improve these times.

    I am running some experiments to see which ones can prove to be useful. I will let you know by tomorrow.

    Regards

    Abhishek

  • Sergey

    I tried a few options but could not improve this any further.

    There is one slightly convoluted option, if you really want to pursue it: use DMA to transfer the data to the GIO. You would need some interval between two transfers, which you can achieve with chained dummy transfers. But this of course means you lose control over exactly when the pin is asserted or deasserted.

  • The TMS570 is a microcontroller with a 160 MHz clock rate, capable of executing an instruction in 6-12 nanoseconds, and it has well-developed fast peripherals. Yet these peripherals are controlled with a delay of hundreds of nanoseconds, and this applies not only to GIO. This makes real-time operation of a controller with the Cortex-R4F core very restricted. Why perform complex mathematical computations quickly when the result is then delayed on its way to the outside world? Are there really no methods to avoid this overhead?

  • When designing a CPU, there is an inherent trade-off between the maximum computational performance which can be achieved and the I/O latency.  In the world of ARM CPUs, the Cortex A platform optimizes computational performance, with I/O latency on the order of 100+ CPU clocks.  The Cortex R platform balances the two design goals, with I/O latency on the order of 10+ CPU clocks.  The Cortex M platform optimizes I/O latency, at <10 CPU clocks, but at the cost of performance.

    When using the Cortex R4 CPU, achieving fast I/O access can be challenging due to the design decisions made by ARM in the CPU. You have very few options to improve the default performance.  The main improvement which could be made is to use the device MPU to change the memory type of the region including your GIO to "Device" from the default "Strongly Ordered".  This will enable write buffering on the writes to the peripheral; with write buffering enabled, the CPU will not wait to receive the "ok" write response from the peripheral before issuing subsequent bus transactions after a write.  In a code base where performance was bound by I/O access, we have seen up to 60% performance improvement.  The downside is that any error returned from the peripheral can no longer be tied precisely to the bus transaction that caused it.
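
    As a rough illustration of the MPU change described above, the following sketch computes the register values for a "Device" region using the ARMv7-R PMSA field layout. The region base `GIO_REGION_BASE`, the 4 KB size, and all helper names are assumptions made for this example, not values taken from this thread or from HALCoGen; check the device reference manual before using them.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Region Access Control Register fields (ARMv7-R PMSA layout). */
    #define MPU_XN        (1u << 12)              /* execute never */
    #define MPU_AP_FULL   (3u << 8)               /* AP=0b011: full read/write access */
    #define MPU_TEX(x)    ((uint32_t)(x) << 3)    /* TEX field, bits [5:3] */
    #define MPU_B         (1u << 0)               /* bufferable bit */

    /* Memory type encodings (TEX, C, B); C and S are left at 0 here. */
    #define MPU_STRONGLY_ORDERED  (MPU_TEX(0))          /* TEX=000 C=0 B=0 */
    #define MPU_DEVICE_SHARED     (MPU_TEX(0) | MPU_B)  /* TEX=000 C=0 B=1 */
    #define MPU_DEVICE_NONSHARED  (MPU_TEX(2))          /* TEX=010 C=0 B=0 */

    /* ASSUMPTION: hypothetical 4 KB region meant to contain the GIO registers. */
    #define GIO_REGION_BASE       0xFFF7B000u
    #define GIO_REGION_SIZE_LOG2  12u

    /* Access control value: no execute, full access, given memory type. */
    static uint32_t mpu_access_control(uint32_t mem_type)
    {
        return MPU_XN | MPU_AP_FULL | mem_type;
    }

    /* Size and enable value: size field N in bits [5:1] encodes 2^(N+1) bytes,
       bit 0 enables the region. */
    static uint32_t mpu_size_enable(unsigned size_log2)
    {
        assert(size_log2 >= 5u && size_log2 <= 32u);
        return ((uint32_t)(size_log2 - 1u) << 1) | 1u;
    }

    /* The region base must be aligned to the region size. */
    static int mpu_base_is_aligned(uint32_t base, unsigned size_log2)
    {
        assert(size_log2 >= 5u && size_log2 < 32u);
        return (base & ((1u << size_log2) - 1u)) == 0u;
    }

    /* On the target, privileged code would select a region and write these
       values through CP15 (c6,c2,0 region number; c6,c1,0 base; c6,c1,2 size
       and enable; c6,c1,4 access control). That part is omitted here because
       it depends on the toolchain and on the rest of the MPU setup. */
    ```

    Only the memory type field differs between the Strongly Ordered and Device settings, so the change itself is small; whether it is safe depends on how your code relies on write completion and error reporting.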

    Regards,

    Karl

  • Could you advise ways to reduce the delay between the calculation itself and the delivery of the result to the target port, using the available peripherals?