
Sending monolithic buffers from DDR fails due to an (apparent) caching problem

Hello,
I have successfully run the SRIO multi-core loopback example on the Keystone II TCI6638K2K (ti\pdk_keystone2_3_01_01_04\packages\ti\drv\srio\example\SRIOMulticoreLoopback)
I then converted this from Type 11 host packets to Type 9 Monolithic buffers, as will be required by our application. This also worked fine.
We require the buffers to be in DDR (as there will be a lot of data being transferred). This presented a number of problems, some of which were overcome by using CACHE_wbL1d and CACHE_invL1d in the correct places.

By splitting the receive and transmit buffers so that I can place either one in DDR independently, I can successfully receive into DDR (by calling CACHE_invL1d in Srio_rxCompletionIsr, just before the Srio_processReceivedBD call).


When trying to send, I call CACHE_wbL1d in Srio_sockSend_TYPE9 just before pushing to the Tx queue with Qmss_queuePushDescSize.
1. If the Tx buffers are NOT in DDR, this works fine.
2. If the Tx buffers ARE in DDR, I get no fail message, but the packet is not received by the next core.
3. If (based on https://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/364723) I set the MAR flags to NOT cache (any of) the DDR, the packet is received.
This suggests it is a problem with the caching of the DDR.

Are there any other points in the code that I need to write back the cache?
Do you have any other suggestions?

Thank you for your time.

  • Hi Philip,

    I've contacted the TCI6638K2K design team. They should respond directly here.

    Best Regards,
    Yordan
  • Hi,

    Welcome to the TI E2E forum. I hope you will find many good answers here, in the TI.com documents, and in the TI Wiki Pages (for processor issues). Be sure to search those for helpful information and to browse the questions others have asked on similar topics (e2e.ti.com). Please read all the links below my signature.

    We will get back to you on the above query shortly. Thank you for your patience.

    Note: We strongly recommend that you create a new E2E thread for your queries instead of following up on an old/closed thread. New threads get more attention than old ones; you can include a link to the old thread, or the relevant information, in the new post for clarity and a faster response.

  • Philip

    1. When you say that it works if the buffer is not in DDR, did you try to put the buffer in MSMC memory?

    2. Have you changed the MPAX or the SES/SMS registers at all?

    3. Can you try a write-back of the L2 cache as well (CACHE_wbL2)? Please try it and report your results.

    Regards

    Ran

  • Hello Ran,

    My answers to your questions are inline:

    1. When you say that it works if the buffer is not in DDR, did you try to put the buffer in MSMC memory?
    A. I have just tried that, but it failed in the same way: it reported the data as sent, but the waiting core never received it.

    2. Have you changed the MPAX or the SES/SMS registers at all?
    A. I don't think I have changed any of these.

    3. Can you try a write-back of the L2 cache as well (CACHE_wbL2)? Please try it and report your results.
    A. I have added the L2 write-back as well, immediately after my L1 write-back, just before the descriptor is pushed to the Tx queue (shown below).
    This did not have the desired effect, as the next core still fails to receive the SRIO message.

    //Flush the memory region to ensure it is written to DDR
    CACHE_wbL1d ((void *) &mono_region[0], NUM_Rx_MONO_DESC * NUM_CORES * SIZE_MONO_DESC, CACHE_FENCE_WAIT); // CACHE_WAIT or CACHE_FENCE_WAIT?
    CACHE_wbL2 ((void *) &mono_region[0], NUM_Rx_MONO_DESC * NUM_CORES * SIZE_MONO_DESC, CACHE_FENCE_WAIT); // CACHE_WAIT or CACHE_FENCE_WAIT?
    /* OSAL Hook: Once the descriptors have been populated; let the OSAL know that we are done. */
    Srio_osalEndDescriptorAccess ((Srio_DrvHandle)ptr_srioDrvInst, (void *)ptrMonoDesc,
              ptr_srioDrvInst->txDescSize);
    /* Push the transmit buffer descriptor into the Transmit Queue. */
    Qmss_queuePushDescSize (ptr_srioDrvInst->txQueue, (uint32_t*)hDrvBuffer, ptr_srioDrvInst->txDescSize);
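    As a side note, the two calls above flush the entire mono_region on every send. As I read sprugw0c, the L2 block coherence operations also operate on L1D, so CACHE_wbL2 alone may be sufficient; and the write-back could be limited to the one descriptor being pushed. A sketch of the span computation (SIZE_MONO_DESC below is a hypothetical stand-in, not my actual define):

```c
#include <stdint.h>

/* Hypothetical stand-ins for the real defines (assumptions, not actual values). */
#define SIZE_MONO_DESC  192u
#define CACHE_LINE_SIZE 128u   /* C66x L2 line size */

/* Compute the cache-line-aligned span covering one descriptor, so the
 * write-back can be limited to the descriptor being pushed instead of
 * flushing the whole mono_region on every send. */
static void desc_wb_span(uint32_t region_base, uint32_t desc_index,
                         uint32_t *wb_addr, uint32_t *wb_size)
{
    uint32_t start = region_base + desc_index * SIZE_MONO_DESC;
    uint32_t end   = start + SIZE_MONO_DESC;
    *wb_addr = start & ~(CACHE_LINE_SIZE - 1u);
    *wb_size = ((end + CACHE_LINE_SIZE - 1u) & ~(CACHE_LINE_SIZE - 1u)) - *wb_addr;
    /* then e.g.: CACHE_wbL2((void *)(uintptr_t)*wb_addr, *wb_size, CACHE_FENCE_WAIT); */
}
```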

    Thank you for your time.

    Philip Taylor

  • One more suggestion

    I wonder what happens if you disable cache on the receive core and try again. Let's see whether this is a send issue or a receive issue.

    (This will not explain why everything worked when you disabled caching on the send side, but let's try.)

    Thanks a lot


    Ran
  • Any update?

  • Hi Ran,

    Sorry for the delay in replying, I was on leave.

    I was originally disabling caching in the DDR globally using the following line in the app.cfg file:
    Cache.setMarMeta(0xA0000000, 0x40000000, 0);

    To implement your requested change, I moved the setting of the MAR value from the cfg file to the C code. At the start of the multicoreTestTask function I added the following lines:
    extern "C" Void multicoreTestTask(UArg arg0, UArg arg1)
    {
        coreNum = CSL_chipReadReg (CSL_CHIP_DNUM);

        //Disable the caching JUST on the (first) receiving core
        if ( coreNum == (FIRST_TX_CORE+1) )
        {
         //From sprugw0c
         //0184 8280h, MAR160, Memory Attribute Register 160, A000 0000h - A0FF FFFFh
         volatile uint32_t* MAR160 = ( volatile uint32_t*)0x01848280; //  This is MAR 160 => A000 0000h - A0FF FFFFh
         TRACE(info,"About to disable caching on core " << coreNum << " with MAR160 = " << *MAR160);
         CACHE_disableCaching (160);
         TRACE(info,"Finished disabling caching on core " << coreNum << " with MAR160 = " << *MAR160);
        }
    The output of this code suggested that the bottom bit was changing:
    [C2] ../main.cpp(1050) : About to disable caching on core 2 with MAR160 = 13
    [C2] ../main.cpp(1052) : Finished disabling caching on core 2 with MAR160 = 12

    However, the packet still wasn't received. I tried applying this change to all cores, but it still failed.

    Is this the correct way to disable the caching on a per core basis?

    I only set MAR 160 because it covers more memory than I am using for my small buffer in this simple test, and the MAP file shows the buffer allocated at the start of that region (i.e. at 0xA000 0000).

    Thank you for your help.

    Philip Taylor