IPC MessageQ and cache coherency problem

Other Parts Discussed in Thread: SYSBIOS

Hi,

Does the MessageQ module take care of cache coherency for the whole message, or just the message header?

The reason I am asking is that I encountered a problem that is probably due to cache coherency. I pass data (a few hundred bytes) between cores by attaching it to the MessageQ header. That is, I reserve header size + data size using MessageQ_alloc, write the data, and send the whole thing. On the receiving side I just start using the message and the data in it; I don't do any cache syncing, because supposedly MessageQ should do cache syncing for the body of a message. Everything works fine in debug mode, but sometimes the received data contains zeros if the program is compiled in release mode. Adding a cache invalidate on the receiving side seems to fix the problem.
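
Roughly, the pattern looks like this (just a sketch; the struct, sizes, and helper names are illustrative, not my exact code):

    /* message = MessageQ header followed immediately by the payload */
    typedef struct {
        MessageQ_MsgHeader header;          /* must be the first field */
        UInt8              data[DATA_SIZE];
    } MyMsg;

    /* sender */
    MyMsg *msg = (MyMsg *)MessageQ_alloc(HEAP_ID, sizeof(MyMsg));
    fillPayload(msg->data, DATA_SIZE);                  /* write the payload */
    MessageQ_put(remoteQueueId, (MessageQ_Msg)msg);

    /* receiver */
    MessageQ_Msg rxMsg;
    MessageQ_get(messageQ, &rxMsg, MessageQ_FOREVER);
    processPayload(((MyMsg *)rxMsg)->data);             /* used without any Cache_inv */
    MessageQ_free(rxMsg);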

So my understanding is that MessageQ is not syncing the data attached to the message. If so, could you please add a note to the documentation, or, if it is already documented, point me to the section where it is mentioned.

Thank you,

Alexey

  • Hi Alexey,

    I'm not sure which IPC version you are actually using, but I briefly checked the source code
    of 1_25_02_12 for your reference. I think cache coherency is being managed (writeback and invalidate)
    in MessageQ_put() if the following conditions are met:

    - TransportShm is used as the transport for MessageQ
    - the message allocated by MessageQ_alloc is in the range of the SharedRegion memory you have configured
    - caching is enabled for that SharedRegion

    I'm wondering whether you were using your own heap for MessageQ_registerHeap() that does not belong to a SharedRegion.

    If you are using another transport for MessageQ, please check the code.

    Hope this helps.
    kawada 

  • Hi Alexey,

    Oh, sorry, you are asking about the receiver side.
    As I mentioned earlier, according to my understanding, cache coherency is managed by the MessageQ sender-side implementation under the specific conditions mentioned above. The MessageQ receiver-side implementation, however, does not appear to handle cache coherency.

    ...But as for the reader side, I think the reader should manage the cache intentionally, i.e., the reader core should invalidate the cache for the received message before touching it.
    For the receiver, the message sent from the writer can be regarded as a data stream from a peripheral. So invalidating the cache before reading the message seems natural to me.
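
    For example, something roughly like this on the reader core (just a sketch; MAX_MSG_SIZE stands for an application-known upper bound on the message size):

        #include <ti/ipc/MessageQ.h>
        #include <ti/sysbios/hal/Cache.h>

        MessageQ_Msg msg;
        MessageQ_get(messageQ, &msg, MessageQ_FOREVER);

        /* invalidate header + payload before touching any field of the message */
        Cache_inv(msg, MAX_MSG_SIZE, Cache_Type_ALL, TRUE);

        /* now it should be safe to read the payload */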

    Maybe a comment from TI is needed to clarify this.

    Best Regards,
    Kawada 



  • Sorry, I forgot to mention: I use the latest IPC on the C6678, the shared memory transport (default settings for MessageQ), and the default instance of HeapMemMP.

    The reason I thought that MessageQ does syncing for the whole message (not just the header) is that, for example, here http://e2e.ti.com/support/embedded/bios/f/355/p/141701/511949.aspx#511949 a TI employee said: " Yes, MessageQ is most appropriate method for sharing data between cores.  If using MessageQ, there's no need to do Cache_wb or Cache_wbInv since it does it for you." And I think it was mentioned in a few other places that MessageQ does syncing.

    I am pretty sure syncing happens for the header because MessageQ uses ListMP to pass headers around.

    The other thing I am worried about, in light of manual syncing, is that I cannot use HeapMemMP. I don't think that this heap does cache line alignment. What will happen if two messages (one I am writing into and the other just arrived) happen to be in the same cache line and I do a manual cache invalidate?

    Alexey

  • Alexey,

    >The reason I thought that MessageQ does syncing for the whole message (not just the header) is that, for example, here http://e2e.ti.com/support/embedded/bios/f/355/p/141701/511949.aspx#511949 a TI employee said: " Yes, MessageQ is most appropriate method for sharing data between cores.  If using MessageQ, there's no need to do Cache_wb or Cache_wbInv since it does it for you." And I think it was mentioned in a few other places that MessageQ does syncing.

    Cache_wbInv and Cache_wb write the cache back to memory, so I believe this TI comment is about the writer side, not the reader side. Do you agree?

    >I am pretty sure syncing happens for the header because MessageQ uses ListMP to pass headers around.

    Yes. I think so.

    >The other thing I am worried about, in light of manual syncing, is that I cannot use HeapMemMP. I don't think that this heap does cache line alignment. What will happen if two messages (one I am writing into and the other just arrived) happen to be in the same cache line and I do a manual cache invalidate?

    HeapMemMP_alloc has a parameter for alignment. Please take a look below:

    Ptr HeapMemMP_alloc(HeapMemMP_Handle handle, SizeT size, SizeT align)

    I'm wondering whether you call MessageQ_free when the message receiver is done with the message.
    How about invalidating the cache for the completed message before MessageQ_free?

    Best Regards,
    Kawada 

  • Hi,

    Naoki Kawada said:

    Cache_wbInv and Cache_wb write the cache back to memory, so I believe this TI comment is about the writer side, not the reader side. Do you agree?

    True, I missed this part. Anyway, syncing is not mentioned in the MessageQ examples and I didn't find anything in the documentation; that is why I wanted to get a reply from a TI employee. If I sync, everything seems fine.

    Naoki Kawada said:

    HeapMemMP_alloc has a parameter for alignment. Please take a look below:

    Ptr HeapMemMP_alloc(HeapMemMP_Handle handle, SizeT size, SizeT align)

    The problem is that you have to use MessageQ_alloc and it does:

    msg = Memory_alloc(MessageQ_module->heaps[heapId], size, 0, &eb);

    That is, no alignment is requested. Moreover, HeapMemMP does not have a default align parameter, though you can set a default alignment for HeapBufMP.

    Naoki Kawada said:

    I'm wondering whether you call MessageQ_free when the message receiver is done with the message.
    How about invalidating the cache for the completed message before MessageQ_free?

    I do free messages; at least, I didn't find any memory leaks (the free heap size does not change over a long time span). The problem occurs before freeing, because without invalidation I sometimes read zeros.

    Alexey

  • Hi Alexey,

    Alexey said:

    The problem is that you have to use MessageQ_alloc and it does:

    msg = Memory_alloc(MessageQ_module->heaps[heapId], size, 0, &eb);

    True. I understood your point. I missed that.

    Alexey said:

    I do free messages; at least, I didn't find any memory leaks (the free heap size does not change over a long time span). The problem occurs before freeing, because without invalidation I sometimes read zeros.

    Yes, you said the message receiver could face a cache coherency problem for the pointer to the message returned from MessageQ_get, and that you get the correct result by invalidating the cache before touching it -- so why had the data related to the message already been cached before getting it via MessageQ?
    I think you are recycling the memory between the cores for the messages by using MessageQ_alloc/free: MessageQ_alloc is done by the writer core and MessageQ_free is done by the reader core. So if the reader core does not invalidate the message before calling MessageQ_free, I wonder whether the reader could get wrong data at the next MessageQ_get, because the writer core could allocate the same memory for the next message. That's why I suggested invalidating the cache before MessageQ_free.

    Best Regards,
    Kawada 

  • Hi, Kawada

    Naoki Kawada said:

    Yes, you said the message receiver could face a cache coherency problem for the pointer to the message returned from MessageQ_get, and that you get the correct result by invalidating the cache before touching it -- so why had the data related to the message already been cached before getting it via MessageQ?

    I think memory can be cached on, say, core A before being used for MessageQ on core B if, for example, a previous message ends in the same cache line where the next message starts. Also, I think the processor can do read-ahead for the cache, i.e., prefetch the next cache line.

    Alexey

  • Alexey said:

    I think memory can be cached on, say, core A before being used for MessageQ on core B if, for example, a previous message ends in the same cache line where the next message starts. Also, I think the processor can do read-ahead for the cache, i.e., prefetch the next cache line.

    Are you talking about the cache coherency issue on the writer side (core B)?
    As you know, each core has dedicated cache hardware; it is not shared among the cores.
    So even if the writer side (core B) has some cache coherency issue (such as the scenario you mentioned), it will not directly affect the reader-side cache (core A). From the cache hardware point of view, core B's cache state is none of core A's business. And when you invalidate the cache before touching the received message on the reader side (core A), the received message is correct. This means the writer side (core B) did not have any cache coherency problem. Am I missing your point?

    Anyway, I think it's time to get TI's comments on this issue. To get back to your first question -- where are the usage notes for cache management when using MessageQ? This is also a point I'm interested in.

    Best Regards,
    Kawada 

  • Naoki Kawada said:

    Anyway, I think it's time to get TI's comments on this issue. To get back to your first question -- where are the usage notes for cache management when using MessageQ? This is also a point I'm interested in.

    Hi,

    I don't think anybody from TI will answer. It has been quite some time since the first post and nobody has shown any interest.

    Alexey

  • MessageQ will perform cache management for the entire message (header + payload).

    The success of this depends on correct program configuration. You must configure the SharedRegion with the correct cache properties on both sides, and the program must correctly declare the memory's cacheability. If you plan to allocate the message on one side and free the message on the remote side, then the heap used by MessageQ must be an IPC heap.

    What version of IPC are you using?
    What device are you running on?
    What Operating Systems are you using on each core?

    ~Ramsey

  • Hi Ramsey,

    Thank you for helping. I use the C6678, MCSDK 2.1.2.6, SYS/BIOS 6.35.4.50, and IPC 1.25.3.15 on all cores. I don't change any cache settings, so my understanding is that everything is cached in L1D (the debugger often marks regions as cached in L1). The shared region is created in the BIOS config:

    var SHAREDMEM = 0x0C000000;
    var SHAREDMEMSIZE = 0x00100000;
    SharedRegion.setEntryMeta(0,
        { base: SHAREDMEM,
          len: SHAREDMEMSIZE,
          ownerProcId: 0,
          isValid: true,
          createHeap: true,
          name: "sharedmem",
        });

    I create the shared heap on core 0 (HeapMemMP_create) and open it on the other cores (HeapMemMP_open), roughly as sketched below. Not sure if this could cause a problem, but I call HeapMemMP_open twice on each core for the same heap. All operations are successful and the returned pointers appear to point to the right data; I compared ti_sdo_ipc_heaps_HeapMemMP_Object with the memory contents. I also noticed that HeapMemMP has its minAlign parameter set to 128. Messages are allocated on one core and freed on the receiving side (I use one MessageQ to exchange messages between cores and between tasks on the same core).
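
    The create/open pattern, roughly (a sketch; HEAP_SIZE, HEAP_ID, and the heap name are placeholders, not my exact values):

        /* core 0: create the shared heap in SharedRegion 0 and register it with MessageQ */
        HeapMemMP_Params  heapParams;
        HeapMemMP_Handle  heapHandle;

        HeapMemMP_Params_init(&heapParams);
        heapParams.regionId      = 0;            /* SR_0 */
        heapParams.sharedBufSize = HEAP_SIZE;
        heapParams.name          = "msgHeap";
        heapHandle = HeapMemMP_create(&heapParams);
        MessageQ_registerHeap((Ptr)heapHandle, HEAP_ID);

        /* other cores: open the same heap by name and register it under the same heap id */
        while (HeapMemMP_open("msgHeap", &heapHandle) < 0) {
            Task_sleep(1);
        }
        MessageQ_registerHeap((Ptr)heapHandle, HEAP_ID);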

    Thank you,

    Alexey

  • Alexey,

    When configuring your shared region entry, I would explicitly set the cacheEnable property instead of relying on the default (it might be incorrect). The cacheEnable property should reflect the true cacheability of the memory; it does not set the cacheability. Add the following to your shared region entry configuration:

    cacheEnable: true

    I also noticed that you are placing SharedRegion #0 (SR_0) in MSMC SRAM but not using the entire size. Your SR_0 is 1 MB; the remaining 3 MB cannot be used. You better not be placing anything else there. If you want to split MSMC SRAM into two parts, say SR_0 (1 MB) and program data, then you need to redefine your internal memory map and replace the MSMCSRAM memory segment with two new ones. In other words, the shared region must be the only user of a memory segment. Keep in mind, if you want to make a change to the internal memory map, you must redefine the entire memory map.

    It is not clear where you are getting the backing store for your HeapMemMP instance. You should be getting it from a shared region because that is how IPC knows if cache management is needed. If you want to use memory from SR_0, use the SR_0 heap to acquire the backing store, and then pass that address to your HeapMemMP create. You are probably already doing this.

    I've attached a simple MessageQ example for C6678. By default it builds for all cores, but I suggest building it for just two cores and testing it out. See the readme file for details. Maybe this example will help you start with something that works and build from there. Unfortunately, the example does not create a heap, but that should be easy to add. It simply allocates a message from the SR_0 heap and statically initializes it.

    ~Ramsey

    ex11_ping.zip
  • Ramsey,

    Thank you for your help. My code is essentially a copy of the example that comes with IPC. I tried adding cacheEnable, defining the cache line size to be 128, setting SharedRegion.numEntries = 1, and using the default shared region via SharedRegion_getHeap(0), but this did not resolve my problem (everything works if I call invalidate but fails after a while if I don't). I did not try to split the shared memory yet.

    One thing I noticed is that you use manual memory allocation, while I do MessageQ_registerHeap and then use MessageQ_alloc. Do you think this can make a difference?

    In any case, knowing that MessageQ should sync whole messages, I will look more closely at my config and program.

    Alexey

  • Sorry to hijack this topic, but your answer really confused me.

    I also noticed that you are placing SharedRegion #0 (SR_0) in MSMC SRAM but not using the entire size. Your SR_0 is 1 MB; the remaining 3 MB cannot be used. You better not be placing anything else there. If you want to split MSMC SRAM into two parts, say SR_0 (1 MB) and program data, then you need to redefine your internal memory map and replace the MSMCSRAM memory segment with two new ones.


    Could you clarify further? We have exactly the setup you mention (SharedRegion in MSMC SRAM, but also .text and other stuff). I see in the linker map file that the SharedRegion and other sections do not overlap, so I wonder how this could cause any problems.

    Thx

  • Hi,

    I split the shared memory into two parts: the first 1 MB is shared region 0 for exchanging data between cores, and the remaining 3 MB holds the master core's code and heap (this stuff does not fit in L2). Apparently such a setup might cause syncing or other issues; why that is so I have no idea, maybe Ramsey will explain. Ramsey said it is better to split the shared memory into two segments, that is, create a new platform with two memory regions: 0x0c000000 - 0x0c100000 MSMCSRAM1 and 0x0c100000 - 0x0c400000 MSMCSRAM2. Somehow this is going to help the drivers and SYS/BIOS work correctly.

    Just my 2 cents,

    Alexey

    Apparently such a setup might cause syncing or other issues; why that is so I have no idea, maybe Ramsey will explain.

    I hope so ... still waiting for an explanation for this suggestion.

    Thx, Heinrich

  • It's okay to have MSMCSRAM be 4 MB and SR_0 be only 1 MB of the 4 MB. A section is generated for SR_0, so it won't overlap anything else that is placed in MSMCSRAM. I don't think splitting MSMCSRAM will make much difference.

    MessageQ takes care of cache coherency for the whole message. Note: on the receive side, the shared addresses must not already be in the cache. If they are, then your app code needs to invalidate those cached addresses before calling MessageQ_get().

  • Thanks for the clarification, I was already worried ;)

    Thx, Heinrich

  • Hi,

    judahvang said:

    MessageQ takes care of cache coherency for the whole message. Note: on the receive side, the shared addresses must not already be in the cache. If they are, then your app code needs to invalidate those cached addresses before calling MessageQ_get().

    OK, this explains why I was having problems. I allocate messages on one core and free them on a different one. The senders process data relatively slowly, while the receiver just waits for messages doing nothing else, so the same shared memory block is constantly reused and thus ends up cached on the receiving side.

    Thank you for your help,

    Alexey

  • Alexey,

    One clarification: you only need to do the Cache_inv once for an address on the receive side, not every time you do a MessageQ_get.

    Let's say you have a really big heap, 1 MB, and you split this into 4 messages of 256 KB each. Assuming that you've used this heap for other stuff besides MessageQ and the heap might be cached on the receiver side, I would Cache_inv the whole heap (1 MB) once before doing any MessageQ_get operation. After this one-time Cache_inv of the whole heap, no more Cache_inv calls are required, as from this point on MessageQ keeps the messages cache coherent.

    There are only two things a processor can do with a message: send it to someone else or free it, and both MessageQ_put and MessageQ_free do cache coherency operations on the whole message. MessageQ_put does a Cache_wbInv so the message is coherent and out of the sender's cache. MessageQ_free does a Cache_inv so it's out of the processor's cache.
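
    In other words, roughly (a sketch; the heap base/size, queue, and process() are placeholders):

        /* once, before any MessageQ_alloc/MessageQ_get on this core: make sure nothing
           from the shared heap region is sitting in this core's cache */
        Cache_inv(sharedHeapBase, sharedHeapSize, Cache_Type_ALL, TRUE);

        /* from here on, MessageQ_put (Cache_wbInv on send) and MessageQ_free
           (Cache_inv on free) are expected to keep the messages coherent */
        for (;;) {
            MessageQ_Msg msg;
            if (MessageQ_get(messageQ, &msg, MessageQ_FOREVER) == MessageQ_S_SUCCESS) {
                process(msg);                /* no per-message Cache_inv needed */
                MessageQ_free(msg);
            }
        }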

    Judah

  • Judah,

    I am reluctant to invalidate the whole shared heap because I think that might cause problems if I use several tasks on the same core that send and receive messages. Say Task 1 allocated memory via MessageQ_alloc and is writing data, and Task 2 decides that it is ready to receive another message, so it does a cache invalidate on the whole heap. My understanding is that Task 1's data will be corrupted because it was in the cache that got invalidated. Though I think calling Cache_wbInv on the whole cache won't cause such a problem.

    Thank you,

    Alexey

  • Alexey,

    Good point, but if your cache is already dirty with the shared heap, you should be doing a Cache_inv of the shared heap before any MessageQ_alloc or MessageQ_get. When you do a MessageQ_get, you don't know what address of the heap you are going to get, so how would you know what to invalidate?

    In the scenario above you are correct: the data in the message allocated by Task 1 would be corrupted. That's why the invalidate of the heap should be done before the alloc or get. Doing Cache_wbInv won't solve your problem either, as it can also corrupt your message, because it might write back something that you did not intend it to.

    Again, this is only necessary if somehow the shared heap was already in your cache before any MessageQ usage. Really, whatever code was using this shared heap before MessageQ should clean up after itself and invalidate the cache lines containing the shared heap.

    Judah

  • Judah,

    I don't think that invalidating the cache before alloc will help in the case I described in the previous post. If Task 1 invalidates before alloc, then at that point the cache is fine, but once it starts to write data the memory is cached again. If, while the data is being written, I decide to call invalidate + get, then the data will be corrupted. It seems like I need a lock to prevent this, so that the cache invalidate is not called between alloc and put.

    You are right, Cache_wbInv is not much safer; it might destroy newly arrived messages.

    Probably I can get away with calling invalidate after MessageQ_get. It seems that MessageQ uses pointers aligned to a 128-byte boundary, so it is safe to invalidate the region (message_pointer, message_pointer + 128). After that invalidate call one can read the correct size field from the message header and then invalidate the memory block (message_pointer, message_pointer + size).

    Alexey

  • Alexey, Judah,


    We had similar problems with MQ/shared heap/multitasking a month ago and I was forced to investigate almost the whole IPC source code to fix it. Here are my findings:

    The only cache coherency operation in MQ is to wbInv() the whole message (header + payload) before passing it to the receiver. MQ transport implementations (including shm) don't inv() received messages. Additionally, when you use HeapMemMP (and MQ_alloc()/MQ_free() use it under the hood), the Memory_free() operation invalidates the freed block.

    Here is an example of how it works:

    1) CORE0 executes Task0A and Task0B; CORE1 executes Task1A and Task1B

    2) somehow we've invalidated both CORE0's and CORE1's caches within the shared region;

    3) CORE0 Task0A allocates a message (M1) from the shared heap and fills it; M1 is now cached in CORE0 (but it is not cached in CORE1, which is of vital importance here);

    4) Task0A sends M1 to Task1A. MQ wbInv()'s M1; now the M1 data is written to shared memory and is cached in neither CORE0 nor CORE1;

    5) CORE1 receives a pointer to M1, typically in a Hwi context. M1 is not cached in CORE1, so the Hwi simply reads the destination queueId from M1's header, finds the queue, enqueues M1, and ISync_signal()'s the queue's sync.

    6) Task1A receives the message with MQ_get(), processes the M1 data, and sends it to Task1B. In this case we don't even need wbInv().

    7) Task1B sends M1 to Task0B; repeat steps 4-5. Now M1 is not cached in CORE1 and is cached in CORE0.

    8) Task0B calls MQ_free(M1). HeapMemMP_free() returns M1 to free memory and invalidates the cache within M1. Now M1 does not exist and is cached in neither CORE0 nor CORE1.

    9) Task0A allocates a message from the shared heap and gets M2. Quite often &M1 == &M2, and it's OK (you can easily verify it, just take a look at steps 3-5).

    You can allocate temporary buffers from the shared heap, share data between tasks working on the same core, etc. But you should never, ever read/write outside the allocated buffers or received messages.

    Unfortunately it does not work out of the box because of bugs in the IPC implementation. Specifically, the HeapMemMP alloc() and free() operations don't invalidate the cache within free block headers, so when you receive a message allocated by another core you can have some stale data in your cache. Oops!

    We've fixed it but I can't attach the code right now (I'm in the hospital and the code is in the repo).

    I should say there are plenty of other bugs in the vanilla IPC, so it's almost completely unusable out of the box. Think twice before designing your application around it.

    Regards,

    Dmitry

  • Hi Dmitry,

    I think you are right: if HeapMemMP free were invalidating the cache, then this IPC problem might be solved. There are a couple of concerns, though. First, I am not sure that HeapMemMP always aligns blocks on a cache line boundary; the alloc function has an align parameter. So if you allocate memory from the heap directly (not through MessageQ), you might get an unaligned block which can be "damaged" by an invalidate when a neighbor is freed. Even if HeapMemMP aligns to 128 bytes now, there is no guarantee it will in the future or on another TI architecture. The other thing is that it is possible a core will do some type of cache prefetch; I did not check whether TI does this, but again there is no guarantee it won't start doing it in the future, and without bugs. In any case, patching and testing IPC yourself is not fun; it would be nice if TI decided to implement and support this solution.

    So while your approach seems correct and more elegant, for the time being I am going to stick with invalidating on receive, which does not rely on the cache state (sketched in code after the list below). That is:

    1. Before calling put, write back the message header + message body on the sender side (just in case).

    2. After receiving the pointer from MessageQ_get, invalidate the 128 bytes starting at it.

    3. Read the size field from the header.

    4. Invalidate the whole message.
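
    In code, roughly (a sketch; CACHE_LINE_SIZE is 128 on this device, and the queue names are placeholders):

        /* sender: msg was filled after MessageQ_alloc(); write it back just before put */
        Cache_wb(msg, MessageQ_getMsgSize(msg), Cache_Type_ALL, TRUE);
        MessageQ_put(remoteQueueId, msg);

        /* receiver: right after get, invalidate the header cache line, then the whole message */
        MessageQ_Msg msg;
        MessageQ_get(messageQ, &msg, MessageQ_FOREVER);
        Cache_inv(msg, CACHE_LINE_SIZE, Cache_Type_ALL, TRUE);
        Cache_inv(msg, MessageQ_getMsgSize(msg), Cache_Type_ALL, TRUE);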

    This seems to solve my crashes, and I have run my program for quite a while, with millions of message sends/receives.

    I agree that IPC has its problems. If I were to rewrite my code I would probably switch to Open Event Machine or OpenMP (they are not so low level), though I am not sure how buggy those are.

    Alexey

  • Hi Alexey,

    Indeed, HeapMemMP aligns on a cache line boundary:

    In instance_init():

        obj->minAlign = sizeof(ti_sdo_ipc_heaps_HeapMemMP_Header);
        if (SharedRegion_getCacheLineSize(obj->regionId) > obj->minAlign) {
            obj->minAlign = SharedRegion_getCacheLineSize(obj->regionId);
        }
    In alloc():

        /*
         *  Make sure the alignment is at least as large as obj->minAlign
         *  Note: adjAlign must be a power of 2 (by function constraint) and
         *  obj->minAlign is also a power of 2,
         */
        adjAlign = reqAlign;
        if (adjAlign & (obj->minAlign - 1)) {
            /* adjAlign is less than obj->minAlign */
            adjAlign = obj->minAlign;
        }
    I believe it's safe to rely on this behaviour, because it's just impossible to use shared memory in a multicore scenario if the alignment is not guaranteed. (OK, that's an overstatement: a single-writer/multiple-readers scheme would still work, but it would be too restrictive.)

    Prefetching is already implemented in C66x, and the SYS/BIOS Cache_inv and Cache_wbInv invalidate the prefetch buffer as well (in older versions they forgot to invalidate the prefetch buffer in Cache_wbInvAll(); I'm not sure if it is fixed now). Again, I believe an implementation of prefetching should not be an unpredictable black box, or multicore shared memory would be impossible.

    Regarding your recipe:

    1) ok but is already done in MQ;

    2) does not solve the real problem (see below)

    3, 4) ok (to be on the safe side...)

    It's too late to invalidate the message header after MQ_get(). The MQ transport driver has already read it (without inv()) to get the destination queueId and priority values. Thus, if the message header was incorrectly cached, the message goes into an invalid queue, or the A_invalidQueueId assertion is raised (and the system dies).

    MessageQ_put():

        if (dstProcId != MultiProc_self()) {
            /* put msg to remote processor using transport */
        }
        else {
            /* Assert queueId is valid */
            Assert_isTrue((UInt16)queueId < MessageQ_module->numQueues,
                          ti_sdo_ipc_MessageQ_A_invalidQueueId);

            /* It is a local MessageQ */
            obj = MessageQ_module->queues[(UInt16)(queueId)];

            /* Assert object is not NULL */
            Assert_isTrue(obj != NULL, ti_sdo_ipc_MessageQ_A_invalidObj);

            //...
        }

    Cheers,

    Dmitry

  • You are right about the shared heap: alignment and prefetch handling must be enforced and implemented by the library for the heap to work at all.

    The code you quoted is for put, and I don't think there are any problems on the send side. The messages I was sending were arriving at the correct core and queue. I think the problem I had was that the pointer I received from MessageQ_get was pointing to a cached memory block. So while the shared memory contained the correct data from the other core, the stuff I read from the cache was invalid (old data from the same core). When I tried to parse the data or reply to the id from the reply queue field, my program crashed. Basically, most crashes were due to an invalid reply queue id.

    Alexey

  • Alexey,

    On the receiver side the message is received by the transport driver (in most driver implementations the driver gets only a single pointer), and the driver enqueues it into a local queue via MessageQ_put(). That's why I quoted MQ_put(). Take a look at TransportShm_swiFxn(), TransportShmCirc_swiFxn(), TransportShmNotify_notifyFxn(), etc.

    The invalid replyId is just an example of the problem I'm talking about. The reply id is in the message header (and the whole header is in a single cache line). It's just luck that the messages went into the right queue.

    Note that the MessageQ_MsgHeader::dstProc field is not used on the receiver side. Transport drivers use the MessageQ_getDstQueue() macro:

    #define MessageQ_getDstQueue(msg)                                             \
            ((msg)->dstId == (MessageQ_QueueIndex)MessageQ_INVALIDMESSAGEQ) ?     \
                (MessageQ_QueueId)MessageQ_INVALIDMESSAGEQ :                      \
                (MessageQ_QueueId)(((MessageQ_QueueId)MultiProc_self() << 16u)    \
                | (((MessageQ_Msg)(msg))->dstId))

    So the received message will stay on the same core, but its destination queue would be unpredictable.

    Dmitry

  • Alexey,

    This is what I'm talking about: it's too late to Cache_inv() after MQ_get(). If extra coherency operations don't hurt your throughput, patch the transport driver you are using or use your own driver.

    It's actually quite simple to fix the driver:

    Void TransportShmNotify_notifyFxn(UInt16 procId,
                                UInt16 lineId,
                                UInt32 eventId,
                                UArg arg,
                                UInt32 payload)
    {
        UInt32 queueId;
        MessageQ_Msg msg;

        msg = SharedRegion_getPtr((SharedRegion_SRPtr)payload);

        // Added cache coherency operation
        {
            const Bool wait = TRUE;
            Cache_inv( msg, sizeof(* msg), Cache_Type_ALLD, wait );
        }

        queueId = MessageQ_getDstQueue(msg);

        MessageQ_put(queueId, msg);
    }

    Dmitry

  • Oh, I see. Thanks for pointing this out. It would be interesting to hear something from TI employees on this matter.

    Alexey

  • Hello,

    We did recently fix a cache related bug in HeapMemMP_free. This fix is available in our latest release:

    IPC 3.21.00.07
    http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/index.html

    I'm not sure if this is the same bug described in this post. The bug we fixed had to do with cache management when returning a buffer to the heap. When a buffer used by MessageQ was received, some or all of the payload may have been pulled into the cache (depending on application use). When the buffer was returned to the heap, it was placed back onto the free list. This required updating the first few bytes of the buffer; these bytes were then written to memory and invalidated from the cache.

    The bug is that we did not invalidate the entire payload address range from the cache (i.e. some of the buffer was still in the cache). On a remote processor, this same memory was allocated for a new message and data was written into the buffer. At some point in time, the cached buffer on the local processor was evicted from the cache due to a capacity miss. If the application had written to this payload, then the data in the cache would have been marked dirty. During the eviction, the dirty lines were written out to memory. This would overwrite the new data written into memory by the remote processor (assuming the remote processor had already written its data to memory). This corrupted the message payload.

    We fixed this by invalidating the entire message payload in the HeapMemMP_free function.

    We are actively developing and improving the IPC 3.xx stream. Unfortunately, older releases (i.e. DSPLink, SysLink, IPC 1.xx) have been frozen. If you have a test case using IPC 3.xx which illustrates a bug, please post it on the forums. We will have a look and make our best attempt to resolve any issues.

    Thank you,
    ~Ramsey

  • Ramsey,

    The whole buffer has already been invalidated at the top of HeapMemMP_free(), before entering the GateMP. From IPC 3.21.00.07 HeapMemMP.c:

    line 772:

        /*
         *  Invalidate entire buffer being freed to ensure that stale cache
         *  data in block isn't evicted later
         */
        if (obj->cacheEnabled) {
            Cache_inv(newHeader, size, Cache_Type_ALL, FALSE);
        }

    line 870 (added in 3.21.00.07):

           /*
            *  Invalidate entire buffer being freed to ensure that stale cache
            *  data in block isn't evicted later
            */
            Cache_inv(newHeader, size, Cache_Type_ALL, TRUE);  /* B2 */

    Now you invalidate the whole buffer twice.

    I'm not sure what you were trying to fix, but I believe the modifications made are wrong.

    Please take a look into HeapMemMP.c from our repo (attached). It contains the fixes described in my earlier post.

    Dmitry

    8463.HeapMemMP.c

  • Hi Ramsey,

    I didn't follow this thread very closely, but looking at the new code it seems that the changes are related to an earlier problem (SDOCM00096671), which was fixed in 1.25.01 and is now back again in HeapMemMP_free():

    http://e2e.ti.com/support/embedded/tirtos/f/355/t/220583.aspx

    Ralf

  • Dmitry,

    Thanks for the feedback. You are correct, the original buffer is invalidated at the beginning of HeapMemMP_free(). The invalidate at the end of the function might be redundant, but the header and size might have changed (due to merging of neighboring buffers), so I think we still need to manage the cache there. We are looking into this and studying your changes.

    Ralf,

    Yes, this bug was fixed in IPC 1.25. We recently discovered that in our IPC 3.xx stream, this fix was lost. Therefore, we applied the fix again.

    ~Ramsey

  • Hi Ramsey,

    it's the recent change in IPC 3.21.00.07 that is causing the problem mentioned in SDOCM00096671 again.
    Cache operations on large buffers should be avoided within the GateMP_enter() / GateMP_leave() block, because this limits cache operations to one core at a time.

    Ralf