Performance LS1227 vs LC4357

Other Parts Discussed in Thread: TMS570LC4357, TMS570LS1227, HALCOGEN, RM57L843

Hi

I ported my application from TMS570LS1227 to TMS570LC4357 (main reason: 300 MHz vs 180 MHz). I expected the code to run faster, but it actually runs slower (!).

Basically I replaced the HALCoGen part, the rest is the same.

  • Code is running from RAM (and flash).
  • LS1227 at 180 MHz (HCLK too), LC4357 at 300 MHz (HCLK 150 MHz)
  • in LS1227 ECC not enabled, in LC4357 ECC is always on.
  • Compiler settings the same for both targets: -O3, opt_for_speed=5, fp_mode=5, --float_support=VFPv3D16

What could cause this performance reduction?

Wrong clock configs? Did I make mistakes in HALCoGen? Or is this ECC related? MPU? Or RTS? I selected libc.a / automatic

Thank you for every hint!

Roger

  • Hi Roger,

    Sorry for the basic question, but have you enabled the caches?  The datapath to SRAM and flash is much longer on the LC4357 as compared to the LS1227.  If the caches are not enabled, you should expect lower performance.

    Regards,
    Karl

  • Thank you Karl

    Is this done just by setting the "enable cache" on the R5-MPU-PMU tab in HALCoGen? If yes, this checkbox is set.

    As you see, any basic question is ok.

    Regards,

    Roger

  • Roger,

    I think you mentioned before that your code is CPU bound.  So you're certain it's not IO that is slowing the code down.

    How much slower is the code running on the 330MHz LC4357 compared to the 180MHz 1227? 

    I think the next steps would entail trying to break down your algo and find the area where the performance is different.

    You could either do this by inserting PMU benchmarks around particular segments of your algo or you could use a tool like the XDS560v2 PROTRACE emulator. In this case the LC4357 is supported. For the 1227 you would run the code on the 3137 (which is the trace-enabled superset of the 1227). This latter tool will give you profiling information in graphical form and also give insight into how long each instruction takes to execute.
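
A minimal sketch of the "PMU benchmark around a segment" idea suggested above. It is not code from this thread: the names `cycle_reader_fn`, `benchmark_segment`, `fake_read_cycles` and `fake_segment` are hypothetical. The counter source is passed in as a function pointer, on the assumption that on the target it would wrap a read of the Cortex-R5 PMU cycle counter (for example through the PMU helpers a HALCoGen project provides); the fake counter only exists so the sketch can be exercised on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: any function returning a free-running 32-bit cycle count.
 * On the target this would wrap a read of the PMU cycle counter. */
typedef uint32_t (*cycle_reader_fn)(void);

/* Elapsed cycles between two samples. Unsigned subtraction stays correct
 * across a single 32-bit wraparound of the counter. */
uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Measure one call of `segment` in CPU cycles. */
uint32_t benchmark_segment(cycle_reader_fn read_cycles, void (*segment)(void))
{
    uint32_t start = read_cycles();
    segment();
    uint32_t end = read_cycles();
    return elapsed_cycles(start, end);
}

/* Convert a cycle count to microseconds for a CPU clock given in MHz. */
double cycles_to_us(uint32_t cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return (double)cycles / cpu_mhz;
}

/* Host-side stand-ins (assumptions for illustration only): a fake counter
 * and a fake code segment that "costs" an arbitrary 1000 cycles. */
static uint32_t fake_now;
uint32_t fake_read_cycles(void) { return fake_now; }
void fake_segment(void) { fake_now += 1000u; }
```

Bracketing several segments this way and comparing the cycle counts between the two devices should point at the part of the algorithm where the behaviour differs. The measurement overhead of the two counter reads is included in the result, so very short segments are better measured in a loop.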

  • Hi Anthony

    Thank you for your hints. I'm pretty sure that my code is not slowed down by IO, but I will check this too...

    I will do some measurements with the PMU and will then report.

    Regards,

    Roger

    Edit: Anthony, you mentioned a 330 MHz CPU clock, which is what HALCoGen also notes, but the datasheet says 300 MHz, so I configured 300 MHz. Could I go up to 330 MHz? Where can I find this information?

  • Anthony,


    I work with Roger and can give you some answers to your questions:

    The (sample) code takes 2.417us on the TMS570LS1227 (GCLK = 180MHz, HCLK = 180MHz, VCLK = 90MHz). The same code needs 3.413us on the TMS570LC4357 (GCLK = 300MHz, HCLK = 150MHz, VCLK = 75MHz). If I reduce the GCLK to 150MHz (GCLK = 150MHz, HCLK = 150MHz, VCLK = 75MHz) on the TMS570LC4357, the code takes 4.580us to run. In the case of the TMS570LC4357, cache is enabled (HALCoGen R5-MPU-PMU tab, checkbox "Enable Cache" is checked).

    The sample code consists mainly of multiplications and additions, plus some branches (if, switch) and a look-up table access.

    I noticed that disabling cache has no influence on the execution time of this sample code! But regarding the real application, enabling/disabling cache has great influence.


    Best regards, Peter

  • Hi all,

    Thanks to Peter we found the wrongly configured part! The MPU!

    What Peter did: Create a new HALCoGen project with 4.01 and copy the MPU settings from this project to our project.

    Voilà.

    The sample code mentioned above now runs in 1.603 us (instead of 3.413 us). That is about 1.5 times faster than on the LS1227, which is what we expected.

    Thanks to all!

    Roger

    @Anthony: What about the 330MHz / 300 MHz?


    And as soon as I know why I totally mis-configured the MPU, I will post it.
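
The timings reported in this thread can be converted to CPU cycles and speedup factors to make the comparison clock-independent. The following is a small sketch of that arithmetic; the helper names `us_to_cycles` and `speedup` are hypothetical, and the only inputs used in the note below are the figures quoted by Peter and Roger.

```c
#include <assert.h>

/* Cycles consumed = execution time in microseconds * CPU clock in MHz. */
double us_to_cycles(double time_us, double cpu_mhz)
{
    assert(time_us >= 0.0 && cpu_mhz > 0.0);
    return time_us * cpu_mhz;
}

/* Speedup of the new time relative to the old one (> 1 means faster). */
double speedup(double old_us, double new_us)
{
    assert(old_us > 0.0 && new_us > 0.0);
    return old_us / new_us;
}
```

With the numbers above, `us_to_cycles(2.417, 180.0)` is roughly 435 cycles on the LS1227, `us_to_cycles(3.413, 300.0)` roughly 1024 cycles on the LC4357 with the wrong MPU settings, and `us_to_cycles(1.603, 300.0)` roughly 481 cycles after the fix. `speedup(2.417, 1.603)` comes out at about 1.51, matching the "about 1.5 times faster" figure; the cycle count per run is in the same range on both parts, so the gain comes mostly from the higher clock.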

  • Hi Roger,

    Glad to see you got it working.  Even when the cache is enabled, you must mark memory regions as cacheable with the MPU.  "Normal" memory type should be cacheable and bufferable.  

    Regards,

    Karl
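
To illustrate what "cacheable and bufferable" means at the register level, here is a hedged sketch of how the memory-attribute bits of an MPU region could be composed. The bit positions follow the ARMv7-R MPU Region Access Control Register layout as I understand it (B bit 0, C bit 1, S bit 2, TEX bits 5:3, AP bits 10:8, XN bit 12) and should be checked against the Cortex-R5 Technical Reference Manual. All macro and function names are hypothetical; HALCoGen generates its own values, which are not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout of the MPU Region Access Control Register. */
#define MPU_RACR_B       (1u << 0)                        /* bufferable      */
#define MPU_RACR_C       (1u << 1)                        /* cacheable       */
#define MPU_RACR_S       (1u << 2)                        /* shareable       */
#define MPU_RACR_TEX(x)  (((uint32_t)(x) & 0x7u) << 3)    /* type extension  */
#define MPU_RACR_AP(x)   (((uint32_t)(x) & 0x7u) << 8)    /* access perms    */
#define MPU_RACR_XN      (1u << 12)                       /* execute never   */

#define MPU_AP_FULL_ACCESS 0x3u  /* read/write in privileged and user mode */

/* Compose an attribute word from its fields (c and b are 0 or 1). */
uint32_t mpu_racr(uint32_t tex, int c, int b, uint32_t ap)
{
    assert(tex <= 7u && ap <= 7u);
    return MPU_RACR_TEX(tex) | MPU_RACR_AP(ap) |
           (c ? MPU_RACR_C : 0u) | (b ? MPU_RACR_B : 0u);
}

/* Normal memory, write-back, write-allocate: TEX=001, C=1, B=1. */
uint32_t mpu_attr_normal_cached(void)
{
    return mpu_racr(1u, 1, 1, MPU_AP_FULL_ACCESS);
}

/* Normal memory, non-cacheable: TEX=001, C=0, B=0. */
uint32_t mpu_attr_normal_uncached(void)
{
    return mpu_racr(1u, 0, 0, MPU_AP_FULL_ACCESS);
}

/* True when both the C and B bits are set, i.e. the region is marked
 * "cacheable and bufferable" as described above. */
int mpu_attr_is_cacheable_bufferable(uint32_t racr)
{
    return (racr & (MPU_RACR_C | MPU_RACR_B)) == (MPU_RACR_C | MPU_RACR_B);
}
```

Under these assumptions, a RAM or flash region whose attribute word fails `mpu_attr_is_cacheable_bufferable` would bypass the cache even with "Enable Cache" checked, which is consistent with the slowdown seen before the MPU settings were replaced. Region base address, size and enable bits are separate registers and are not covered here.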

  • Roger,

    My mistake. The RM57L843 is the 330MHz part; I got it confused with the TMS570LC4357.