Instruction Execution time is 100ns even when CPU Clock & GCLK is 300MHz

Karthikeyan Kasi Vishwanathan

Other Parts Discussed in Thread: TMS570LC4357, HALCOGEN

Hi Texas,

I am working with the TMS570LC4357 and use HALCoGen to set up FreeRTOS on this controller. I have configured GCLK to 300 MHz and the RTI clock to 300 MHz as well.

I want to know how fast the controller is, so my test code is very simple.

void main(void)
{
    while (1)
    {
        gioPORTA->DSET = 1;  /* set bit 0 of GIO port A (pin high) */
        gioPORTA->DCLR = 1;  /* clear bit 0 of GIO port A (pin low) */
    }
}

That's all.

I watched the pin on an oscilloscope. Just toggling this pin takes 128 ns.

I am sure the CPU clock is configured as 300 MHz, yet this still happens.

Can anybody explain why this is happening, and how to achieve the full speed of my TMS570 controller?

Are there any FreeRTOS-related or CPU-related (pipeline, etc.) settings I have to make?

Here are screenshots of my HALCoGen settings:

Since my CPU clock is 300 MHz, please explain how I can get an execution time of about 3 ns per assembly instruction.

Thanks in advance.

Regards,

Karthikeyan.K

over 10 years ago

0 Jean-Marc Mifsud over 10 years ago

TI__Mastermind 22375 points

Karthikeyan,

I'm not sure what you are trying to achieve here, but your configuration is outside the specification.
The RTI is not supposed to run at 300 MHz.

Anyway, the peripheral region is configured as Strongly Ordered by default.
That means every access is done without using the CPU write buffer, and the CPU waits for the current access to complete before issuing the next one.
Try changing the peripheral region to Device. This can be done in HALCoGen in the MPU setup.

Also, you did not mention whether the caches (instruction and data) are enabled in your experiment.
Be sure to enable the caches.

0 Karthikeyan Kasi Vishwanathan over 10 years ago in reply to Jean-Marc Mifsud

Expert 1265 points

Hi Jean,

Thanks for your reply.

I had my L2SRAM region configured as Strongly Ordered and shareable.

As you suggested, I have changed my RTI clock to 100 MHz (= VCLK).

I also set the tick rate to 1000 Hz in the HALCoGen OS window.

Even after making the above changes, my controller still takes 128 ns just to toggle the GPIO pin.

OK, let me state my question more clearly.

I am using a controller configured to run at 300 MHz, i.e. 300,000,000 clock pulses per second, so one pulse takes about 3.3 ns. Suppose a C statement that toggles a pin compiles to 4 assembly instructions and each assembly instruction takes 4 pulses to execute. Then the controller needs 16 pulses to toggle the pin, which is about 53 ns. (I am not considering the pipeline here; I assume it would take even less time.)

This is only an assumption. The actual scenario may be different; I don't know, and I want to know.

So, to be clear: does just toggling a pin on a 300 MHz controller really take 130 ns?

Please explain. (Of course this is very basic, but I need the explanation for clarity.)

Thanks in advance.

Regards,

Karthikeyan.K

0 Christian Herget over 10 years ago in reply to Karthikeyan Kasi Vishwanathan

TI__Expert 6985 points

Hi Karthikeyan,

It takes some time for the CPU writes to go through the bus matrix to the GIO module and on to the actual pin.
I remember that a peripheral access takes 12 cycles on the non-cached TMS570LS parts, but I'm not sure what the figure is on the TMS570LC4357.
However, this would mean that a "simple" GIO toggle takes at least 2*12/300MHz = 80ns.
Adding the 16 cycles for the instructions brings us to 40/300MHz = 133ns, which is very close to what you measured.
But please keep in mind that this is a cached architecture with a long pipeline, so it is very hard to predict how long such accesses will take, even though this is a fully deterministic design.
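
The arithmetic above can be written as a small helper. This is only a sketch: the function names are made up for illustration, and the 12-cycle access latency and 16-cycle instruction overhead are the rough figures from this thread, not values from a datasheet.

```c
#include <assert.h>

/* Convert a cycle count to nanoseconds for a given core clock in Hz. */
static double cycles_to_ns(double cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return cycles * 1e9 / clock_hz;
}

/*
 * Estimate the time for one set/clear pair on a GIO pin:
 * two peripheral writes plus the instruction overhead of the loop.
 * access_cycles and instr_cycles are assumptions (rule-of-thumb values).
 */
static double toggle_estimate_ns(double access_cycles,
                                 double instr_cycles,
                                 double clock_hz)
{
    return cycles_to_ns(2.0 * access_cycles + instr_cycles, clock_hz);
}
```

Calling toggle_estimate_ns(12.0, 16.0, 300e6) gives roughly 133 ns, and cycles_to_ns(24.0, 300e6) gives the 80 ns for the two writes alone.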

Best Regards,
Christian

0 Jean-Marc Mifsud over 10 years ago in reply to Karthikeyan Kasi Vishwanathan

TI__Mastermind 22375 points

Karthikeyan,

You want to configure the peripheral region as DEVICE instead of STRONGLY ORDERED and keep the SRAM as DEVICE.

0 Karthikeyan Kasi Vishwanathan over 10 years ago in reply to Christian Herget

Expert 1265 points

Hi Christian Herget,
Thanks for your valuable information.
It really helped me a lot.
By the way, can you point me to exactly where this information is documented? I have searched many documents multiple times but cannot find it.
Please tell me which document contains this information about the cycle time taken for execution.
(Information for the TMS570LS controllers would also be fine.)

Thanks in advance.

0 Christian Herget over 10 years ago in reply to Karthikeyan Kasi Vishwanathan

TI__Expert 6985 points

Hi Karthikeyan,

This isn't specified; the delay depends on system settings such as the clock dividers and the bus to which the peripheral is connected.
The 12 cycles is more of a rule of thumb (for the R4-based devices); I would expect a few more cycles on most peripherals and a few less on EMIF, eCAP, eQEP and ePWM.
The R4 and R5 cores are optimized for high throughput rather than low latency. However, the R5 has an AXI peripheral interface with lower latency than the AXI master interface; the R4 had only an AXI master interface.

So there is no easy answer to your question; please just keep in mind that the latency might be around 12 cycles for most peripheral accesses.
Furthermore, it does not make much sense to bit-bang serial interfaces or PWMs on such an architecture; the Hercules devices all have one or two N2HETs integrated for such operations.
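
As a rough cross-check of the rule of thumb, the per-access latency can be backed out of a scope measurement. This is a sketch only: the function names are hypothetical, the 16-cycle instruction overhead is the figure assumed earlier in this thread rather than a measured one, and it assumes the 128 ns covers exactly one set/clear pair.

```c
#include <assert.h>

/* Number of core clock cycles that fit in a measured interval. */
static double ns_to_cycles(double ns, double clock_hz)
{
    assert(clock_hz > 0.0);
    return ns * clock_hz / 1e9;
}

/*
 * Cycles per peripheral write implied by a measured loop period,
 * after subtracting an assumed instruction overhead and dividing
 * by the number of peripheral writes in one period.
 */
static double implied_access_cycles(double period_ns,
                                    double instr_cycles,
                                    int writes,
                                    double clock_hz)
{
    assert(writes > 0);
    return (ns_to_cycles(period_ns, clock_hz) - instr_cycles) / writes;
}
```

With the numbers from this thread, implied_access_cycles(128.0, 16.0, 2, 300e6) comes out at about 11 cycles per write, which is in line with the 12-cycle rule of thumb.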

Best Regards,
Christian
