TMS320C6657: L1P as data memory

Part Number: TMS320C6657

Hello,

I understand that L1P normally serves as cache/SRAM for program code, but I'd like to use L1P as well as L1D as data memory. I wrote a test program to verify this; please see the attached file. I saw a strange phenomenon in the L1P verification: the verify_data function for L1P passes, but the verify_data2 function fails. The only difference is that verify_data2 has a dummy printf inserted between reading src (L2) and reading dst (L1P), at line 109. Could you explain the mechanism behind this behavior?

Regards,
Kazu

l1p_test.zip

  • Hi Kazu,

    I've forwarded this to the software experts. Their feedback should be posted here.

    BR
    Tsvetolin Shulev
  • Hello,

    Cvetolin Shulev-XID said:
    I've forwarded this to the software experts. Their feedback should be posted here.

    I'm waiting for an answer from the experts. Please give me some advice.

    Regards,

    Kazu

  • Kazu:

    Please refer to the C66 CorePac reference manual at:

        www.ti.com/.../sprugw0c.pdf

    for the internal architecture of the C66x. Specifically, please pay attention to Sec. 2.2.1.1 regarding how to write to L1P when it is configured as SRAM. Given that section, I am not sure how the set_data() function behaved in your test, as it uses master mode instead of DMA.

    So I don't think you can practically use L1P as data RAM in the same way as L1D, quite apart from the inefficiency of executing the program without cache.

    Let us know if this helps.

    Jian

     

  • Kazu,

    What are you trying to accomplish by using L1P as data memory? There are a variety of reasons you might be seeking this, and knowing your reason will let us address your actual requirements.

    Regards,
    RandyP
  • Hello Jian and RandyP,

    Thank you for your reply.

    RandyP said:
     What are you trying to accomplish by using L1P as data memory?

    In our application, the C66x needs fast access to data at random addresses. To avoid cache-miss penalties, I configure L1 as SRAM and use DMA to update the contents of L1 from L2 periodically. Since L1D alone is not large enough, I would also like to use L1P as SRAM. Even if we run our program from L2 rather than from L1P, our application software appears to have sufficient performance. What matters to us is being able to place the data in L1P SRAM.

    jian35385 said:
    Please refer to the C66 CorePac reference manual at: www.ti.com/.../sprugw0c.pdf

    According to "C66x CorePac UG (SPRUGW0C): 2.2.1 L1P Memory", the C66x can read data from L1P SRAM just as it can from L1D SRAM, can't it? That is good information for me. However, since the C66x can't write data to L1P SRAM, I may need to customize the current software, which was not written for the C665x.

    Regards,
    Kazu

  • Kazu,

    Thank you for the clear explanation of your intention for the use of L1P as data memory. This is not possible when the goal is 0 wait-state accesses by the DSP.

    The C6000 architecture is technically a modified Harvard architecture in which the program and data buses are physically separated. This allows both buses to access different memories at the exact same time, and this doubles the available memory bandwidth for the device. At the L1P/L1D level, this separation is physically enforced because the program fetch bus does not connect to the L1D memory and the data read/write buses do not connect to the L1P memory. This separation is vital to achieving the 0 wait-state access for the L1 memories.

    A C66x CorePac cannot access its L1P memory as data. This is not possible to do through DSP data reads or writes.

    On a multicore C66x device like the C6657, CorePac0 can read and write to CorePac1's L1P memory using the global address. But this access speed is not 0 wait state, and can be very slow compared to L2 or MSMC or even DDR3 accesses.
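
    As a rough illustration of the global addressing mentioned above, the sketch below converts a CorePac-local address into the global alias for a given core. The base addresses and the 0x01000000 per-core stride are assumptions taken from the usual KeyStone memory map and should be checked against the C6657 data manual; the function name c66x_local_to_global is hypothetical.

```c
#include <stdint.h>

/* Assumed CorePac-local window (verify against the C6657 data manual):
 * L2 SRAM at 0x00800000, L1P SRAM at 0x00E00000, L1D SRAM at 0x00F00000. */
#define C66X_LOCAL_START    0x00800000u
#define C66X_LOCAL_END      0x01000000u  /* exclusive upper bound */
#define C66X_GLOBAL_BASE    0x10000000u  /* assumed global alias base for CorePac0 */
#define C66X_GLOBAL_STRIDE  0x01000000u  /* assumed offset between consecutive CorePacs */

/* Hypothetical helper: map a local L2/L1P/L1D address to the global alias
 * of the given core. Addresses outside the local window (MSMC, DDR3, ...)
 * are already global and are returned unchanged. */
static uint32_t c66x_local_to_global(uint32_t local_addr, uint32_t core)
{
    if (local_addr < C66X_LOCAL_START || local_addr >= C66X_LOCAL_END) {
        return local_addr;
    }
    return C66X_GLOBAL_BASE + core * C66X_GLOBAL_STRIDE + local_addr;
}
```

    Under these assumptions, the L1P SRAM base of CorePac1 (local 0x00E00000) maps to 0x11E00000. An access through such an alias leaves the CorePac and comes back in through its slave port, which is why it is not a 0 wait-state access.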

    You will need to work out other ways to optimize the performance to get what you need. L1P SRAM cannot be used as you describe.

    You mentioned that you use DMA to move data from L2 to L1D. You will get best performance using IDMA1 for intra-CorePac transfers from L2 to L1D. If you are using EDMA3, the speed of the transfer will be much slower than with IDMA1.
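
    For readers who want to try the IDMA1 route, here is a minimal register-level sketch of a block copy from L2 to L1D. It is not taken from the attached test program. The register block layout, the base address 0x01820100, and the STAT/COUNT bit positions are assumptions based on the IDMA chapter of the CorePac user guide and should be verified there; the names idma1_regs_t, idma1_copy and idma1_wait are hypothetical. The register block is passed in as a pointer so the logic can also be exercised off-target.

```c
#include <stdint.h>

/* Assumed IDMA channel 1 register layout (verify against SPRUGW0). */
typedef struct {
    volatile uint32_t stat;    /* +0x00: bit 0 = ACTV, bit 1 = PEND */
    volatile uint32_t rsvd;    /* +0x04: reserved for channel 1 */
    volatile uint32_t source;  /* +0x08: source address, word aligned */
    volatile uint32_t dest;    /* +0x0C: destination address, word aligned */
    volatile uint32_t count;   /* +0x10: byte count; writing it starts the transfer */
} idma1_regs_t;

#define IDMA1_BASE_ADDR  0x01820100u  /* assumed base address on the target */
#define IDMA1_STAT_ACTV  0x1u
#define IDMA1_STAT_PEND  0x2u
#define IDMA1_MAX_BYTES  0xFFFCu      /* assumed 16-bit count field, multiple of 4 */

/* Submit one copy. Returns 0 on success, -1 if the arguments are not
 * word aligned or the size is out of range. */
static int idma1_copy(idma1_regs_t *regs, uint32_t src, uint32_t dst, uint32_t nbytes)
{
    if (((src | dst | nbytes) & 0x3u) != 0u) {
        return -1;
    }
    if (nbytes == 0u || nbytes > IDMA1_MAX_BYTES) {
        return -1;
    }
    /* Wait until the pending slot is free before overwriting the parameters. */
    while ((regs->stat & IDMA1_STAT_PEND) != 0u) {
    }
    regs->source = src;
    regs->dest   = dst;
    regs->count  = nbytes;  /* upper bits left 0: assumed lowest-effort defaults, no interrupt, no fill */
    return 0;
}

/* Block until no transfer is active or pending. */
static void idma1_wait(const idma1_regs_t *regs)
{
    while ((regs->stat & (IDMA1_STAT_ACTV | IDMA1_STAT_PEND)) != 0u) {
    }
}
```

    On the target, this would be called along the lines of idma1_copy((idma1_regs_t *)IDMA1_BASE_ADDR, l2_addr, l1d_addr, size) followed by idma1_wait() before the DSP reads the L1D buffer, where l2_addr and l1d_addr stand for the local addresses of your own buffers.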

    If you want to describe your application's requirements and algorithm for help with optimization, please post that as a new thread but post a link here to the new thread so we can find it.

    Regards,

    RandyP

  • Hello RandyP,

    I appreciate your support. It helps a lot.

    Regards,

    Kazu