TMS570LS1227: CPU processing performance difference on LS1227 (no cache) vs LC43 (with cache OFF)?

Part Number: TMS570LS1227
Other Parts Discussed in Thread: AM2612

Team,
Can you please help with the below customer question?
Thanks in advance,
Anber

This is a performance analysis of LUT-based 32-bit CRC processing for Ethernet packets.
We run with the cache deactivated to get higher determinism.

The TMS570 seems to have relatively long calculation times for this use case,
and apparently the LS1227 at 150 MHz (no cache) is faster than the LC43 at 300 MHz (cache deactivated).

Can you confirm those numbers?
Do you have an idea of what we could have missed to improve results?

We already varied the location of LUT and instructions between Flash and RAM.

Ethernet CRC, 32 bit
Ethernet polynomial: 0x04C11DB7
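
For reference, a minimal sketch of the kind of byte-wise LUT CRC-32 that the "Lookup-Table (normal)" rows refer to. The measured code is not shown in this thread, so the function names (`crc32_init_table`, `crc32_lut`) and the bit-reflected formulation are assumptions. The constant 0xEDB88320 is the bit-reflected form of 0x04C11DB7, which is how the Ethernet FCS is usually computed in software.

```c
#include <stddef.h>
#include <stdint.h>

/* 256-entry table: one 32-bit word per possible input byte (1 KiB). */
static uint32_t crc_table[256];

/* Build the table once at startup (hypothetical helper name). */
void crc32_init_table(void)
{
    for (uint32_t i = 0; i < 256u; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++) {
            /* 0xEDB88320 is 0x04C11DB7 with its bits reversed. */
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        }
        crc_table[i] = c;
    }
}

/* One table read (data access) per input byte. */
uint32_t crc32_lut(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;              /* standard initial value */
    while (len--) {
        crc = (crc >> 8) ^ crc_table[(crc ^ *data++) & 0xFFu];
    }
    return crc ^ 0xFFFFFFFFu;                /* standard final XOR */
}
```

Each input byte costs one table read plus a handful of ALU instructions, so without a cache the loop time depends mainly on the latency of the memory that holds the table and the code.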

   
     

 

TMS570LC43 - CRC Calculation
Frequency: Core 300 MHz | Buses 150 MHz (2:1)

Lookup-Table (normal)                      | 1514 bytes (µs) | 42 bytes (µs)
from ROM with Cache                        | 214             | 9
from ROM without Cache                     | 927             | 35
from RAM without Cache (Table + calcCRC)   | 977             | 35
from RAM without Cache (calcCRC)           | 977             | 35
from RAM without Cache (Table)             | 927             | 35

Slicing-by-8                               | 1514 bytes (µs) | 42 bytes (µs)
from ROM with Cache                        | 51              | 2
from ROM without Cache                     | 537             | 21
from RAM without Cache (Table + calcCRC)   | 430             | 17
from RAM without Cache (calcCRC)           | 480             | 19
from RAM without Cache (Table)             | 473             | 19
     
     

TMS570LS1227 - CRC Calculation (device does not have Cache)
Frequency: Core 150 MHz | Buses 75 MHz (2:1)

Lookup-Table (normal)                      | 1514 bytes (µs) | 42 bytes (µs)
from ROM                                   | 485             | 19
from RAM (Table + calcCRC)                 | 484             | 19
from RAM (calcCRC)                         | 484             | 19
from RAM (Table)                           | 485             | 19

Slicing-by-8                               | 1514 bytes (µs) | 42 bytes (µs)
from ROM                                   | 139             | 6
from RAM (Table + calcCRC)                 | 112             | 5
from RAM (calcCRC)                         | 130             | 6
from RAM (Table)                           | 105             | 5

  • Hi Anber,

    I was initially surprised by your data. However, we have an internal TI AI tool that can analyze our old E2E threads along with the related material. When I fed your question to this tool, I got some useful information. Please go through it once:

    Based on your observation that the TMS570LS1227 @150MHz (without cache) is faster than the TMS570LC43 @300MHz (with cache disabled), this is actually not entirely surprising when you understand the memory architecture differences between these devices. Here are the key factors that could explain your results:

    Possible Explanations:

    1. Flash Wait States and Memory Architecture:

      • The information I retrieved shows that Flash wait states have a significant impact on performance
      • When cache is disabled, every access to code or data located in Flash (both instruction fetch and data access) must go through the Flash memory with its associated wait states
      • The TMS570LC43 running at 300MHz likely requires more Flash wait states than the LS1227 at 150MHz due to the higher clock frequency
      • Flash access time doesn't scale linearly with CPU frequency - the Flash itself has a fixed access time
    2. Memory Access Patterns for LUT-based CRC:

      • LUT-based CRC calculations involve frequent table lookups (data reads)
      • With cache disabled, every LUT access becomes a Flash or RAM read with wait states
      • The performance becomes memory-bound rather than CPU-bound
      • At 300MHz with cache disabled, you may be spending most cycles waiting for memory rather than computing
    3. Effective Performance vs. Clock Speed:

      • From the data I found, Flash wait states can reduce effective performance significantly
      • For example, devices running at higher frequencies with more wait states can have lower effective performance than slower devices with fewer wait states
      • The formula: Effective Performance = CPU MHz × Flash Access Efficiency
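
    To compare the two devices independently of clock speed, the posted timings can be converted to CPU cycles per byte. The sketch below is only arithmetic on the figures from the question; the helper name `cycles_per_byte` is hypothetical.

```c
/* Convert a measured time into CPU cycles spent per processed byte.
 * time_us  : measured time in microseconds
 * core_mhz : CPU core clock in MHz (cycles per microsecond)
 * bytes    : number of bytes processed in that time
 */
double cycles_per_byte(double time_us, double core_mhz, unsigned bytes)
{
    return (time_us * core_mhz) / (double)bytes;
}

/* Figures from the tables in the question (1514-byte frame, LUT from ROM):
 *   cycles_per_byte(927.0, 300.0, 1514)  LC43, cache off -> about 183.7
 *   cycles_per_byte(214.0, 300.0, 1514)  LC43, cache on  -> about 42.4
 *   cycles_per_byte(485.0, 150.0, 1514)  LS1227          -> about 48.1
 */
```

    In cycle terms, the LC43 with cache off needs roughly 3.8 times as many cycles per byte as the LS1227 for the same loop, while with cache on it needs slightly fewer. That pattern is what a memory-bound loop would be expected to show, although it does not by itself identify which access path causes the stalls.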

    Recommendations to Improve Performance:

    1. Place LUT in RAM:

      • Move your lookup table to SRAM (if not already done)
      • RAM typically has zero or minimal wait states compared to Flash
      • Ensure you're using the fastest RAM region available
    2. Place CRC Function in RAM:

      • Copy the CRC calculation function itself to RAM for execution
      • This eliminates instruction fetch wait states
      • Use compiler directives or linker scripts to place code in RAM sections
    3. Optimize Flash Wait States:

      • Review your Flash wait state configuration (RWAIT field in the FRDCNTL register)
      • Ensure it's set to the minimum safe value for your clock frequency
      • Check the device datasheet for optimal settings
    4. Consider Enabling Cache (with caveats for determinism):

      • Modern safety-critical systems can use cache with proper analysis
      • Use cache locking mechanisms to make critical sections deterministic
      • Implement worst-case execution time (WCET) analysis with cache effects
      • Some safety standards (ISO 26262, IEC 61508) allow cache usage with proper qualification
    5. Memory Bus Configuration:

      • Check if there are any bus arbitration or priority settings
      • Ensure the CPU has optimal access to the memory containing your LUT and code
    6. Compiler Optimizations:

      • Try different optimization levels (-O2, -O3, -Ofast)
      • Use inline functions to reduce function call overhead
      • Consider loop unrolling for the CRC calculation
    7. Alternative CRC Implementations:

      • Consider using hardware CRC if available on your device
      • Evaluate slice-by-4 or slice-by-8 CRC algorithms which may have better cache/memory behavior
      • Test bit-wise CRC (though typically slower, it has more predictable performance)
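
    As an illustration of recommendations 1, 2 and 7, here is a minimal slicing-by-8 sketch with optional section placement. It is not the customer's code. The macro `CRC_PLACE_IN_RAM` and the section names `.crc_ramcode` and `.crc_ramdata` are hypothetical, and the attribute syntax assumes a GCC-compatible compiler. The linker command file would still have to map those sections to RAM and copy them at startup. Input bytes are combined explicitly, so the result does not depend on CPU byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Optional placement hooks (hypothetical names, GCC-style syntax assumed). */
#ifdef CRC_PLACE_IN_RAM
#define CRC_RAM_CODE __attribute__((section(".crc_ramcode")))
#define CRC_RAM_DATA __attribute__((section(".crc_ramdata")))
#else
#define CRC_RAM_CODE
#define CRC_RAM_DATA
#endif

/* Eight 256-entry tables (8 KiB in total). */
static uint32_t crc_slice[8][256] CRC_RAM_DATA;

void crc32_slice8_init(void)
{
    for (uint32_t i = 0; i < 256u; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++) {
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        }
        crc_slice[0][i] = c;             /* same as the byte-wise table */
    }
    for (uint32_t i = 0; i < 256u; i++) {
        for (int s = 1; s < 8; s++) {
            uint32_t p = crc_slice[s - 1][i];
            /* Table s advances table s-1 by one more zero byte. */
            crc_slice[s][i] = (p >> 8) ^ crc_slice[0][p & 0xFFu];
        }
    }
}

CRC_RAM_CODE uint32_t crc32_slice8(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (len >= 8u) {                  /* eight bytes per iteration */
        uint32_t lo = crc ^ ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                             ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
        uint32_t hi = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
                      ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
        crc = crc_slice[7][lo & 0xFFu] ^ crc_slice[6][(lo >> 8) & 0xFFu] ^
              crc_slice[5][(lo >> 16) & 0xFFu] ^ crc_slice[4][lo >> 24] ^
              crc_slice[3][hi & 0xFFu] ^ crc_slice[2][(hi >> 8) & 0xFFu] ^
              crc_slice[1][(hi >> 16) & 0xFFu] ^ crc_slice[0][hi >> 24];
        p += 8;
        len -= 8u;
    }
    while (len--) {                      /* remaining 0..7 bytes, byte-wise */
        crc = (crc >> 8) ^ crc_slice[0][(crc ^ *p++) & 0xFFu];
    }
    return crc ^ 0xFFFFFFFFu;
}
```

    The eight table reads per iteration are independent of each other, which is one plausible reason slicing-by-8 is much faster than the byte-wise loop in the posted measurements. Whether RAM placement helps on a given device still has to be measured.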

    Regarding AM2612 LDRA Support:

    From the e2e.ti.com search results I obtained, I found references to AM2612 functional safety and certification discussions, but I did not find specific confirmation of LDRA tool support in the available data.

    What I recommend:

    1. Contact TI directly through your FAE or the e2e forum specifically asking about LDRA qualification for AM2612
    2. Check the AM2612 Safety Manual - this document typically lists qualified tools
    3. Review the MCU-PLUS-SDK documentation for AM261x - it may list supported development tools
    4. Contact LDRA directly - they maintain a list of supported devices and can confirm AM2612 support

    The AM2612 is part of the AM26x family, which is designed for functional safety applications, so tool support is likely available or in development, but I cannot confirm LDRA specifically without access to the qualification documentation.

    --
    Thanks & regards,
    Jagadish.