
EVM 6472: transferring large amounts of data

Platform: EVM6472
Version: 3.20.04.68
IPC: 1.21.02.23
BIOS: 6.30.03.46
XDAIS: 6.25.01.08

Are there any examples of transferring large amounts of data? The MessageQ example explains how to transfer a message from one core to another, but it doesn't tell us how to put our own content into the message. For example, core0 gets an image from a camera, puts it into array[10000], and notifies core1; core1 then processes array[10000] further. Right now I use #pragma DATA_SECTION(array, ".mydata"), where .mydata is placed in DDR2, but I don't know whether that is right. How should large amounts of data be handled? Does anyone have a good idea?

  • Mary,

    One option would be to use MessageQ to pass a pointer to the data you want to process between cores.  The data would have to be in shared memory like DDR2.  In order to add the pointer to the MessageQ message you need to define your own message structure.  Following is a quick example of how to do this.

     

    Define new message structure:

    typedef struct TstMsg
    {
      MessageQ_MsgHeader header;      /* required first field, 32 bytes */
      UInt32 *dataPtr;                /* points to data in shared memory (e.g., DDR2) */
    } TstMsg;

     

    Allocate, populate, and send the message to another core:

    /* send core */
      {
        TstMsg *sendMsg;

        /* allocate the message from a heap registered with MessageQ */
        sendMsg = (TstMsg *)MessageQ_alloc(heapId, sizeof(TstMsg));

        /* attach a pointer to the array in shared memory; only this
           32-bit pointer travels in the message, not the array itself */
        sendMsg->dataPtr = &array[0];

        /* remoteQueueId is the MessageQ_QueueId of the receiving core's queue */
        MessageQ_put(remoteQueueId, (MessageQ_Msg)sendMsg);
      }

      /* receive core */
      {
        TstMsg *rxMsg;
        UInt32 *rxData;

        /* block until a message arrives */
        MessageQ_get(coreRcvMsgQ, (MessageQ_Msg *)&rxMsg, MessageQ_FOREVER);

        rxData = rxMsg->dataPtr;

        /* the message can be freed now; rxData still points at the shared array */
        MessageQ_free((MessageQ_Msg)rxMsg);

        /* process data using rxData pointer */
      }

     

    This example is extremely simple, so you'll need to handle things like cache writeback (send core) and cache invalidate (receive core) to make sure your data is coherent across cores.  You'll also need to add some sort of arbitration to make sure the sender and receiver aren't modifying/viewing the data at the same time.

    Hope this helps.

     

    Justin

  • Regarding the arbitration... you could use the message as the token: simply pass the same message back and forth. Whoever has the message has access to the shared memory it points to. So you never free the message, you just ping-pong it around. You can use MessageQ_setReplyQueue() to specify who to return the message to.

    Also, I'd probably put the size of the data pointed to by dataPtr into the message. That way the receiving side can do the cache coherency calls based on that size value instead of a hard-coded constant.

    Regarding the cache coherency... remember to write back the memory behind dataPtr before you send the message; you lose ownership once you send it. Similarly, invalidate it as soon as you receive it. MessageQ takes care of the cache coherency of the message itself, but not of any memory pointed to by a user-defined field in the message. A sketch of both suggestions follows.
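    Putting both suggestions together, a rough sketch (localQueue is a placeholder for each core's own MessageQ_Handle, remoteQueueId for the other core's MessageQ_QueueId; the Cache calls come from ti/sysbios/hal/Cache.h):

    #include <ti/sysbios/hal/Cache.h>

    typedef struct TstMsg
    {
      MessageQ_MsgHeader header;   /* required first field */
      UInt32 *dataPtr;             /* buffer in shared memory */
      UInt32 dataSize;             /* buffer size in bytes, for the cache calls */
    } TstMsg;

    /* sender: write back the buffer, tag the reply queue, pass the token */
    Cache_wb(sendMsg->dataPtr, sendMsg->dataSize, Cache_Type_ALL, TRUE);
    MessageQ_setReplyQueue(localQueue, (MessageQ_Msg)sendMsg);
    MessageQ_put(remoteQueueId, (MessageQ_Msg)sendMsg);

    /* receiver: take the token, invalidate the buffer, then process it */
    MessageQ_get(localQueue, (MessageQ_Msg *)&rxMsg, MessageQ_FOREVER);
    Cache_inv(rxMsg->dataPtr, rxMsg->dataSize, Cache_Type_ALL, TRUE);
    /* ... process rxMsg->dataPtr ... */

    /* done: send the token back instead of freeing it */
    MessageQ_put(MessageQ_getReplyQueue((MessageQ_Msg)rxMsg), (MessageQ_Msg)rxMsg);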

    Todd

  • Thanks, Justin. I just wanted to know how to transfer my own message, and this example shows me how. But when the message itself is very large, the puts and gets waste a lot of time. So I define array as a global variable placed in DDRII; core0 and core1 can then work on array directly instead of transferring it.

    Moreover, can you tell me whether array running on core0 and array running on core1 have the same value? In other words, if I define a global variable named array which is placed in DDRII, does the value of array change simultaneously on core1 when core0 changes it? In the watch window of CCS, I saw that array on core0 and array on core1 have the same address.

     

  • Thanks for your suggestion, Todd. I'll take it.

  • Mary,

    If you look at my example code, I'm only appending a pointer to the MessageQ message.  This pointer will point to your array in DDRII.  Your _puts and _gets will only be transferring one extra 32-bit word (or two if you add the array size as suggested by Todd), no matter how large you define your array to be.

    If you define an array in DDRII, the data contained in the array will be the same when viewed from the perspective of both cores only if the caching operations are handled correctly.  For example, if core 0 writes to the array, it is writing to a copy of the array stored in its cache.  Core 0 must perform a cache writeback prior to sending a message with the array pointer to core 1.  This will push all the values of the array stored in core 0's cache back to the array in DDRII.  You then send the array pointer to core 1.  Core 1 has a version of the array stored in its cache as well, but at this point it hasn't been updated with the new values written by core 0.  As a result, core 1 must perform a cache invalidate.  This will simply pull the DDRII version of the array into core 1's cache, updating all the values written by core 0.

    In short, when caching is enabled, if one core writes to global data the remote cores will not immediately see the updates to the data.  In order for remote cores to see the new data, the writing core must perform a cache writeback of the data to the global memory area.  After this operation, the remote cores must perform a cache invalidate in order to see the new data residing in global memory.
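    In code, that sequence looks roughly like this (a sketch using the SYS/BIOS Cache module; array is the global from your example):

    #include <ti/sysbios/hal/Cache.h>

    /* core 0: after writing to array, push the cached copy out to DDRII */
    Cache_wb(array, sizeof(array), Cache_Type_ALL, TRUE);
    /* ... now notify core 1, e.g. with the MessageQ example above ... */

    /* core 1: on notification, discard its stale cached copy before reading */
    Cache_inv(array, sizeof(array), Cache_Type_ALL, TRUE);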

    Justin

  • Mary --

    Not sure if anyone said it already, but if your buffer is in DDR and you are using the Cache_wb() and Cache_inv() APIs to write back and invalidate, you need to make sure that your data buffer has 128-byte alignment and is a multiple of 128 bytes in size.  This is a hardware restriction: you can only write back or invalidate on cache line boundaries (128 bytes).  If you don't have good alignment you will be invalidating adjacent live data, and it will take a long time to debug this.

    You can use #pragma DATA_ALIGN to get an aligned buffer, or you can allocate the buffer using Memory_alloc() and specify 128 for the align parameter.
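    A sketch of both options (the section name comes from the original post; the sizes are just illustrations):

    /* static: align the buffer and round its size up to a multiple of 128.
       Note 10000 * 4 = 40000 bytes is NOT a multiple of 128, but
       10240 * 4 = 40960 bytes is. */
    #pragma DATA_ALIGN(array, 128)
    #pragma DATA_SECTION(array, ".mydata")
    UInt32 array[10240];

    /* or dynamic, using xdc.runtime.Memory with align = 128 */
    #include <xdc/runtime/Memory.h>

    UInt32 *buf;
    buf = (UInt32 *)Memory_alloc(NULL, 40960, 128, NULL);  /* NULL heap = default heap */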

    -Karl-

  • Justin,

    Thanks!  Your explanation is so detailed that I now know how to transfer my own message and how to keep the caches coherent with Cache_inv() and Cache_wb() (as suggested by Todd and Karl).  I have verified your method by modifying the program message_single.c (<ipc_install>/ipc_1_21_xx_xx/packages/ti/sdo/ipc/examples/multicore/c6472).

  • Justin,

    In the example message_single.c, core1 must wait to get a message until core0 puts one, and core0 can't put another message until array is free.  To improve the efficiency of core0 and core1, I want to use three buffers, array, array1, and array2, to store the data.  Core0 stores data into array, array1, and array2 in turn.  While core1 processes the data in array, core0 can store data into array1 and array2; while core1 processes the data in array1, core0 can store data into array2 and array.  In short, I want to process the data in array, array1, and array2 with a ping-pong operation.  Can you give me some suggestions?

  • Mary,

    Your idea is good and will allow you to get better throughput.  I'd advise using Todd's suggestion of messages as tokens: whoever has a token can manipulate the corresponding data.  You'd have three token messages, one for each ping-pong buffer, and whichever core holds a buffer's token message can read/write that buffer.  A rough sketch follows.
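    Something like the following (a sketch only: the buffer names are from your post; BUF_BYTES, fillFromCamera(), processImage(), and the queue handles/IDs are placeholders, not APIs; TstMsg is the token message structure with dataPtr/dataSize fields from earlier in this thread):

    /* core 0, one-time setup: one token message per buffer, seeded into
       core 0's own queue so core 0 starts out owning all three buffers */
    UInt32 *buffers[3] = { array, array1, array2 };
    TstMsg *tok;
    Int i;

    for (i = 0; i < 3; i++) {
        tok = (TstMsg *)MessageQ_alloc(heapId, sizeof(TstMsg));
        tok->dataPtr  = buffers[i];
        tok->dataSize = BUF_BYTES;
        MessageQ_put(core0QueueId, (MessageQ_Msg)tok);
    }

    /* core 0 loop: owning a token means owning its buffer */
    while (TRUE) {
        MessageQ_get(core0Queue, (MessageQ_Msg *)&tok, MessageQ_FOREVER);
        fillFromCamera(tok->dataPtr, tok->dataSize);
        Cache_wb(tok->dataPtr, tok->dataSize, Cache_Type_ALL, TRUE);
        MessageQ_setReplyQueue(core0Queue, (MessageQ_Msg)tok);
        MessageQ_put(core1QueueId, (MessageQ_Msg)tok);
    }

    /* core 1 loop: process the buffer, then return the token to core 0 */
    while (TRUE) {
        MessageQ_get(core1Queue, (MessageQ_Msg *)&tok, MessageQ_FOREVER);
        Cache_inv(tok->dataPtr, tok->dataSize, Cache_Type_ALL, TRUE);
        processImage(tok->dataPtr, tok->dataSize);
        MessageQ_put(MessageQ_getReplyQueue((MessageQ_Msg)tok), (MessageQ_Msg)tok);
    }

    With three tokens in flight, core0 can be filling one buffer while core1 processes another, which is exactly the overlap you described.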

    Justin