
DCAN FIFO reading while receiving

Other Parts Discussed in Thread: TMS570LS3137, HALCOGEN

Hey guys,

we're currently having trouble reading a DCAN Rx FIFO on a TMS570LS3137 while burst transfers are on the bus. It seems as if messages are getting out of order or even getting lost, so the protocol running on top of it does not work reliably.

I already had a very careful look at the TRM for the TMS570 but there are still some details I'd need more information about. From my point of view, the TRM is not detailed enough regarding the FIFO operation mode.

The main question is: what does the CAN core do when the software is reading out a FIFO while new frames targeting the same FIFO are arriving from the bus?

  • Does the CAN core place any new message at the starting position of the FIFO?
    This would mean that we need to read out the FIFO as fast as possible, without allowing anything to interrupt us, to be sure the reading happens faster than the writing.
  • Does the CAN core place any new message in the next free slot above the highest used one as long as the FIFO has not been completely emptied (no active NewDat flags)?
    Would a loop-based reading mechanism that runs until no "NewDat" flag is found automatically prevent this?

To make it even more difficult, we'd like to use the DCAN in two different ways (application dependent):

  • Having Rx interrupts enabled (for every single message object in the FIFO).
  • Using polling mode only

Is any different software handling needed to guarantee in-order reception?

Regardless of whether we use interrupt or polling mode: can there be any situation where we need to start reading from a position other than 1?

The only partly useful information I could find in the TRM is the flowchart on page 1187. But it comes without any explanatory text (on polling mode, sequences, usage of multiple FIFOs, etc.).

Thank you very much in advance.

Regards,
Michael

  • Michael,

    The user does not have any control over the FIFO operation. It seems that you are receiving a burst of messages with the same message ID and your setup cannot keep up with them. I would suggest setting up multiple mailboxes (10, for example) for receiving the messages with this ID. If the first mailbox has not been freed when a new message comes in, the DCAN core will automatically place the new message into the second mailbox you configured. Polling and interrupts are the only methods available for handling the messages. For polling, you can poll only the status of the mailboxes you plan to use. With the interrupt method, you may want to check whether any new messages have arrived before you exit the interrupt service routine.

    Thanks and regards,

    Zhaohong
  • Zhaohong,

    we configured the FIFO to receive everything. So no filtering is actually done.

    I've rephrased my questions because I think that the issue was not really understood correctly. In my opinion, the TRM does not describe how the FIFO mechanism is working in certain situations.

    So once again:

    • What does the CAN core do when new frames are received from the bus while the software is reading out the FIFO and is only halfway through? Where are those new messages placed?
    • Is a different reception handling for interrupt based mode and polling mode needed or do they work the same?
    • Can I assume that I can start reading always from the first FIFO position?

    Michael

  • (1) Assume that the software triggers a transfer between mailbox 1 and the interface registers and all other mailboxes are free. In the middle of the transfer, the DCAN module receives a new message. If mailbox 1 has not been released, the new message will be saved to mailbox 2.

    (2) In my view, they are the same. Polling constantly checks the status flags. In interrupt mode, the software only checks the status when a new message is received.

    (3) Page 1186 provides information about how to set up a "FIFO" buffer with multiple mailboxes and how to read and write data. Be careful: this is not a real FIFO. The DCAN module always saves data to the mailboxes in descending priority order, that is, to the free mailbox with the lowest number first. For example, assume that you have mailboxes 1-6 in the FIFO buffer and there are messages in mailboxes 1, 2 and 3. If you read the message in mailbox 1 and release the mailbox, the next new message will go to mailbox 1 instead of mailbox 4. You always need to keep track of which mailbox you just read. In this example, you need to check mailboxes 2-6 before you go back to check mailbox 1.

    See if the above makes sense to you.

    Thanks and regards,

    Zhaohong
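The re-fill rule described in point (3) can be illustrated with a small host-side model. This is only a sketch of the behaviour as stated in the reply, not TI driver code: the names `model_store` and `model_release` are hypothetical, and bit (n - 1) of the mask is assumed to stand for the NewDat flag of mailbox n.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model, not TI driver code.
 * Bit (n - 1) of *newdat represents the NewDat flag of mailbox n. */

/* Store one incoming message the way the reply describes: into the
 * lowest-numbered mailbox of the FIFO range whose NewDat flag is clear.
 * Returns the mailbox number used, or 0 if every mailbox is occupied. */
int model_store(uint32_t *newdat, int first, int last)
{
    assert(first >= 1 && last <= 32 && first <= last);
    for (int n = first; n <= last; n++) {
        uint32_t bit = UINT32_C(1) << (n - 1);
        if ((*newdat & bit) == 0) {
            *newdat |= bit;
            return n;
        }
    }
    return 0;
}

/* Release mailbox n, as a read through the interface registers would. */
void model_release(uint32_t *newdat, int n)
{
    assert(n >= 1 && n <= 32);
    *newdat &= ~(UINT32_C(1) << (n - 1));
}
```

With mailboxes 1 to 3 occupied in a FIFO of 1 to 6, releasing mailbox 1 and then storing another message returns mailbox 1 again rather than mailbox 4, which is why reading order and arrival order can diverge.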
  • Zhaohong,

    this was still not all the information I was looking for.

    Here's how our FIFO reading routine is currently implemented:
    We always iterate via the interfaces from the first mailbox until either the EoB is hit (overflow) or a mailbox is reached which doesn't have the NewDat flag set. The interface command for reading a mailbox automatically clears the NewDat and IntPnd flags. According to your description this should be fine because the re-filling should start from the beginning again, right?

    99% of the time this works fine. But sometimes mailbox #1 is empty and there's a message only in mailbox #2, sometimes also in #3.
    How can this happen?

    And also: Does this indicate that the FIFO reading process shall NEVER EVER be interrupted by something with higher priority to avoid overtaking of messages?
    e.g.: 4 messages in the FIFO; 2 of them are read; the operation is interrupted and 3 more messages are received...

    Please also take a careful look at my questions in the first post.

    Thanks in advance.

    Michael

  • Michael,

    First, let's see if I understand your use case correctly.

    A big data table is broken into multiple CAN messages. The CAN messages come in bursts. You want to make sure that you receive the data in order. If you use the "FIFO buffer" idea, the FIFO needs to be big enough for the entire burst. If your data is big, I would propose the following method.

    (1) Allocate multiple consecutive mailboxes for this message ID (mailboxes 1 to 10, for example).
    (2) Enable the interrupt for mailbox 1 or for one in the middle (mailbox 5, for example).
    (3) In the interrupt service routine, starting from mailbox 1, read every mailbox which has data. To keep the data in order, this process cannot be interrupted. In the ISR, you can just move the data from the DCAN mailboxes to system RAM and do the processing later. Assume that an interrupt is generated by mailbox 5. In the ISR, you need to read from mailbox 1 to the last mailbox which has data (this may be past mailbox 5). If new data comes in during this period of time, it will be saved to the mailboxes just freed (starting from mailbox 1).
    (4) The mailbox reading has to be done in the ISR so that it is not interrupted.

    Thanks and regards,

    Zhaohong
  • Zhaohong,

    our FIFO is configured to have 43 entries (plus one for the EoB, so 44 in total) and starts at mailbox #1.

    We never know how much will be received, because the whole protocol on top is somewhat action-triggered and also depends on the number of devices on the bus (which can vary). What we do know is that there might be several messages in sequence rather close to each other. Still, the FIFO should be more than large enough to cover any peak loads. So far, so good.

    For application dependent reasons, we want to handle the FIFO reading in two different ways:

    • interrupt based (having RxIE enabled for all mailboxes, but starting to read ALWAYS from the first position onwards until an empty mailbox or EoB was hit)
    • polling based (starting to read ALWAYS from the first position until an empty mailbox or EoB was hit)

    Now here's what I've observed:
    There's only one message in the FIFO and we start reading (by writing 0x3F and the mailbox number to IFCMD and waiting for the BUSY flag to clear). After the first mailbox has been read, we read and check the second one. The second mailbox doesn't have the NewDat flag set, so we stop the reading operation, thinking that one frame was received and the FIFO is empty now. Afterwards a frame suddenly shows up in the second mailbox while the first one is left empty.

    To me it looks as if there's a race condition between the way we're reading the FIFO and the way the CAN core fills its content.

    Please tell me how I can avoid any possible race conditions.

    Thanks in advance.

    Regards,
    Michael

  • Michael,

    Since the data transfer to the mailbox and the setting of the NewDat flag are done by the CAN core, there is going to be a race condition with the code reading/clearing the flag. As you and Zhaohong mentioned before, the best way to handle this is to read the flags on entry into the (interrupt/poll) routine and service all mailboxes that have the NewDat flag set in a loop before exiting. Any new data transferred by the CAN core in the meantime can be handled later, after you have emptied out the previous "FIFO". Avoid reading the NewDat register while in the loop, to avoid race conditions. If you can't keep up with the data, that is a different problem. Could you enable a filter or use DMA?

    Thanks,
    Joe
  • Maybe I should also mention that we're not reading the NEWDAT register.

    Instead we're using only the interfaces to get the mailboxes' data and status information. This also automatically clears the flags set by the core. From my perspective this should behave somewhat like a transaction, so it should always be consistent.

    What should the loop look like in this case?

    After I read fresh data from mailbox #1 and I get no NewDat from mailbox #2, where should I start to try again? How can I guarantee the right order in all possible scenarios?

  • Zhaohong,

    Can you please tell me the IP designer you licensed the DCAN IP core from? Maybe it's better if I'm contacting them directly.

    Michael
  • I would suggest the following to check and read the mailboxes in an ISR or polling.

    (1) Read the NEWDAT registers to find mailboxes with newly received message.
    (2) Read the mailboxes with new message one by one using the interface registers starting from mailbox 1.
    (3) Check the NEWDAT status for the mailboxes with higher numbers. Read them if there are any with new message. Assume that mailboxes 1-5 have data in Step 1. You will need to check from mailbox 6. If there is new message coming in while reading mailbox 1, the message is saved to mailbox 6. After mailbox 1 is read and released, the new message will be saved to mailbox 1.
    (4) Quit.

    In order for the above process to work, the speed for reading the mailboxes has to be faster than the incoming speed of CAN messages. That is why the above process cannot be interrupted.

    Thanks and regards,

    Zhaohong
  • Dear Zhaohong,

    I'm working with Michael and I've taken over the analysis of this issue as we are yet to find a solution.

    I prepared the following setup:
    - Vector CAN box sending 13 CAN messages every 5ms; the CAN message ID is incremented with each message, from 0 to 7FF
    - Application on TMS570 checks the CAN FIFO for new messages every 2.5ms; the FIFO is set to be 24+1 elements long
    - upon reception the application checks that the CAN ID is correct (previous + 1)

    Now what we see is that in 99% of the cases the reception is correct,
    but in some cases there is one message that gets wrongly placed in FIFO.

    An application variable dump e.g.:
    (data = (1 = 0x1, 2 = 0x2, 3 = 0x3, 4 = 0x4, 5 = 0x5, 6 = 0x6, 7 = 0x7, 8 = 0x8), length = 8 = 0x8, id_format = 0 = 0x0, id = 666 = 0x029A),
    (data = (1 = 0x1, 2 = 0x2, 3 = 0x3, 4 = 0x4, 5 = 0x5, 6 = 0x6, 7 = 0x7, 8 = 0x8), length = 8 = 0x8, id_format = 0 = 0x0, id = 667 = 0x029B),
    (data = (1 = 0x1, 2 = 0x2, 3 = 0x3, 4 = 0x4, 5 = 0x5, 6 = 0x6, 7 = 0x7, 8 = 0x8), length = 8 = 0x8, id_format = 0 = 0x0, id = 668 = 0x029C),
    (data = (1 = 0x1, 2 = 0x2, 3 = 0x3, 4 = 0x4, 5 = 0x5, 6 = 0x6, 7 = 0x7, 8 = 0x8), length = 8 = 0x8, id_format = 0 = 0x0, id = 669 = 0x029D),
    (data = (1 = 0x1, 2 = 0x2, 3 = 0x3, 4 = 0x4, 5 = 0x5, 6 = 0x6, 7 = 0x7, 8 = 0x8), length = 8 = 0x8, id_format = 0 = 0x0, id = 670 = 0x029E),
    (data = (1 = 0x1, 2 = 0x2, 3 = 0x3, 4 = 0x4, 5 = 0x5, 6 = 0x6, 7 = 0x7, 8 = 0x8), length = 8 = 0x8, id_format = 0 = 0x0, id = 671 = 0x029F),
    (data = (1 = 0x1, 2 = 0x2, 3 = 0x3, 4 = 0x4, 5 = 0x5, 6 = 0x6, 7 = 0x7, 8 = 0x8), length = 8 = 0x8, id_format = 0 = 0x0, id = 672 = 0x02A0),
    (data = (1 = 0x1, 2 = 0x2, 3 = 0x3, 4 = 0x4, 5 = 0x5, 6 = 0x6, 7 = 0x7, 8 = 0x8), length = 8 = 0x8, id_format = 0 = 0x0, id = 673 = 0x02A1),
    (data = (1 = 0x1, 2 = 0x2, 3 = 0x3, 4 = 0x4, 5 = 0x5, 6 = 0x6, 7 = 0x7, 8 = 0x8), length = 8 = 0x8, id_format = 0 = 0x0, id = 674 = 0x02A2),
    (data = (1 = 0x1, 2 = 0x2, 3 = 0x3, 4 = 0x4, 5 = 0x5, 6 = 0x6, 7 = 0x7, 8 = 0x8), length = 8 = 0x8, id_format = 0 = 0x0, id = 660 = 0x0294),
    You can see here that id 660 got inserted somehow as last element in the sw FIFO, which is a copy of the TMS570 mailbox FIFO.
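A minimal sketch of the kind of "previous + 1" check described in the setup, applied to the IDs from the dump above. The function name `first_out_of_order` is hypothetical, and the wrap from 0x7FF back to 0 is an assumption based on the stated ID range; the actual application code is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAN_STD_ID_MAX 0x7FFu

/* Return the index of the first frame whose ID is not "previous + 1"
 * (assumed to wrap from 0x7FF to 0), or count if the whole run is in
 * order. prev_id is the last ID accepted in the previous polling cycle. */
size_t first_out_of_order(uint16_t prev_id, const uint16_t *ids, size_t count)
{
    assert(ids != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint16_t expected = (uint16_t)((prev_id + 1u) & CAN_STD_ID_MAX);
        if (ids[i] != expected) {
            return i;   /* sequence broken at this entry */
        }
        prev_id = ids[i];
    }
    return count;
}
```

Fed with the ten IDs from the dump (666 to 674, then 660) and a previous ID of 665, the check stops at the last entry, which is the misplaced message with ID 660.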

    In other cases, a message is inserted at a completely wrong place, a position where it should never have been inserted,
    since the FIFO is too large for that position ever to be reached with this cycle time for checking for new messages.

    Example given below (trace32 dump of CAN RAM) where what we see is that a message got inserted around at the end of the FIFO,
    long after the "empty" mailboxes (we do not clear the mailboxes at all, we only clear the new data flag and that is it).

    Trace32 dump of CAN RAM when our application stopped on error breakpoint:
    - TEST flag is set in CTRL register and
    - in TEST register RDA field set to "Enabled"

    ________address|_0________4________8________C________0________4________8________C________0123456789ABCDEF0123456789ABCDEF
    SD:FF1BFF40| 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................
    SD:FF1BFF60| 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................
    SD:FF1BFF80| 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................
    SD:FF1BFFA0| 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................
    SD:FF1BFFC0| 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................
    SD:FF1BFFE0| 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C0000|>00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C0020| 05060708 01020304 50400008 00000001 00000043 00000000 00000000 00000000 ........P@.........C............
    SD:FF1C0040| 05060708 01020304 50C00008 00000001 00000043 00000000 00000000 00000000 ........P..........C............
    SD:FF1C0060| 05060708 01020304 51000008 00000001 00000043 00000000 00000000 00000000 ........Q..........C............
    SD:FF1C0080| 05060708 01020304 51400008 00000001 00000043 00000000 00000000 00000000 ........Q@.........C............
    SD:FF1C00A0| 05060708 01020304 51800008 00000001 00000043 00000000 00000000 00000000 ........Q..........C............
    SD:FF1C00C0| 05060708 01020304 51C00008 00000001 00000043 00000000 00000000 00000000 ........Q..........C............
    SD:FF1C00E0| 05060708 01020304 4E400008 00000001 00000043 00000000 00000000 00000000 ........N@.........C............
    SD:FF1C0100| 05060708 01020304 4E800008 00000001 00000043 00000000 00000000 00000000 ........N..........C............
    SD:FF1C0120| 05060708 01020304 4B400008 00000001 00000043 00000000 00000000 00000000 ........K@.........C............
    SD:FF1C0140| 05060708 01020304 44C00008 00000001 00000043 00000000 00000000 00000000 ........D..........C............
    SD:FF1C0160| 05060708 01020304 3E000008 00000001 00000043 00000000 00000000 00000000 ........>..........C............
    SD:FF1C0180| 00000000 00000000 00000000 00000000 00000043 00000000 00000000 00000000 ...................C............
    SD:FF1C01A0| 00000000 00000000 00000000 00000000 00000043 00000000 00000000 00000000 ...................C............
    SD:FF1C01C0| 00000000 00000000 00000000 00000000 00000043 00000000 00000000 00000000 ...................C............
    SD:FF1C01E0| 00000000 00000000 00000000 00000000 00000043 00000000 00000000 00000000 ...................C............
    SD:FF1C0200| 00000000 00000000 00000000 00000000 00000043 00000000 00000000 00000000 ...................C............
    SD:FF1C0220| 00000000 00000000 00000000 00000000 00000043 00000000 00000000 00000000 ...................C............
    SD:FF1C0240| 00000000 00000000 00000000 00000000 00000043 00000000 00000000 00000000 ...................C............
    SD:FF1C0260| 00000000 00000000 00000000 00000000 00000043 00000000 00000000 00000000 ...................C............
    SD:FF1C0280| 00000000 00000000 00000000 00000000 00000043 00000000 00000000 00000000 ...................C............
    SD:FF1C02A0| 00000000 00000000 00000000 00000000 00000043 00000000 00000000 00000000 ...................C............
    SD:FF1C02C0| 05060708 01020304 50800008 00000001 00000043 00000000 00000000 00000000 ........P..........C............
    SD:FF1C02E0| 00000000 00000000 00000000 00000000 00000043 00000000 00000000 00000000 ...................C............
    SD:FF1C0300| 00000000 00000000 00000000 00000000 00000043 00000000 00000000 00000000 ...................C............
    SD:FF1C0320| 00000000 00000000 00000000 00000000 00000047 00000000 00000000 00000000 ...................G............
    SD:FF1C0340| 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C0360| 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C0380| 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C03A0| 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................

    As you can see
    - our FIFO starts at FF1C0020, ends at FF1C0320
    - elements are stored in FIFO between FF1C0020-FF1C0160
    - at addr FF1C02C0 there is a CAN message sitting with data "0102030405060708"

    This dump has been made when our application stopped on the

  • Hi, there.

    (1) How do you read the FIFO every 2.5ms? Do you always start with the FIFO object with the lowest message number?

    (2) In your post, the message with ID 660 is at the wrong place. Did you read this message before? I would suggest adding ID checking to your test to see if there are any missing messages. You can also record the message object (mailbox) number where each message is saved.

    On page 1186 of the TRM, there is a note about the limitation of the FIFO buffers.

    "All message objects of a FIFO buffer needs to be read and cleared before the next batch of
    messages can be stored. Otherwise true FIFO functionality can not be guaranteed, since the
    message objects of a partly read buffer will be re-filled according to the normal (descending)
    priority."


    Thanks and regards,

    Zhaohong
  • Hi Zhaohong,

    Thank you for your reply.

    (1) We read the FIFO every 2.5ms, starting from lowest message object (in the above case from 1) to the higher numbered
    messagebox (in the above case it is 24)


    (2) Yes message ID660 is at the wrong place, that is our problem, since we are sending messages with incrementing ID,
    we expect that (in case of the FIFO is read more frequently than it can be filled fully) message IDs will be also
    in order. So this is our issue, that ID660 in this case was somehow placed in the wrong FIFO message box.
    And yes we do have that implementation of checking the ID, that is how we detected this issue.
    In the other example (trace dump) you can see that there is a message inserted somewhere around the end of the FIFO,
    which should not be even possible if the FIFO is read that frequently, also the previous elements are all 0,
    meaning that those have never been written since the ecu was reset, so it is pretty hard to imagine how can the processor
    place a message there without trying to place it at the lower numbered messageboxes.


    (3) I read the referenced section, but what does "cleared before the next batch" mean? We clear the NEW_DAT flag
    when we read the corresponding FIFO messagebox; is there anything else we could do?

  • Hi Zhaohong,

    Additional information that I was able to debug (please confirm if this makes sense and my assumption is right in points below).

    Let's assume the ECU receives 12 CAN messages as a burst, but only 11 messages are actually stored in the FIFO at the time of checking;
    the 12th message is just being shifted into the 12th FIFO message box.
    Let's represent the FIFO mailboxes with [] and
    - 'n' {n=1...24} represents a message with flag NEW_DAT set, number stands for the ORDER of the message not the mailbox index,
    - '-' as empty mailbox,
    - '*' being filled, but not finished.
    In this case the FIFO looks like:
    [ 1][ 2][ 3][ 4][ 5][ 6][ 7][ 8][ 9][10][11][ *][-][-]...[-]

    So when checking all the NEW_DAT flags, the application will see 11 messages, and will clear all 11 NEW_DAT flags.
    By the next check (if no further message is received) the FIFO will have stored the 12th element and marked it as NEW_DAT:
    [ -][ -][ -][ -][ -][ -][ -][ -][ -][ -][ -][12][-][-]...[-]

    ------------------------------------------------------------------------------------------------------------------------

    Let's consider another case where we receive a burst of 3 messages: the first 2 are already stored, the 3rd is on the way.
    At the time of checking, the FIFO looks like this:
    [ 1][ 2][ *][ -][ -]...[ -]

    Just after reading the messages and clearing the NEW_DAT flags, the 3rd message arrives in the FIFO:
    [ -][ -][ 3][ -][ -]...[ -]

    Let's assume that, in addition, 4 new messages are received after we cleared the 1st and 2nd flags in the previous cycle:
    [ 4][ 5][ 3][ 6][ 7]...[ -]

    ------------------------------------------------------------------------------------------------------------------------

    So my assumption is that when the CAN message is shifted to the first matching mailbox,
    there might be a race condition between checking the NEW_DAT and clearing the previous FIFO new_dat flags.
    The user can end up having a message in the "middle of the FIFO" without noticing it,
    or will end up in an unordered FIFO where there is no way to tell which message was received first.

    Of course this only explains the issue with wrongly ordered CAN messages, it still can not explain the issue
    of the above memory dump, where the message was inserted at a completely wrong place near the end of the FIFO.
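The second scenario above can be replayed in a small host-side model. This is a sketch of the hypothesis stated in the post, not confirmed DCAN behaviour: it assumes the core picks the target mailbox for message 3 while mailboxes 1 and 2 are still occupied, and sets NEW_DAT only after the reader has already stopped. All names (`fifo_model`, `lowest_free`, `drain`, `replay_race`) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define FIFO_LEN 8

/* Hypothetical model: slot[i] holds the arrival number of the message in
 * mailbox i + 1, or 0 if that mailbox's NEW_DAT flag is clear. */
typedef struct {
    int slot[FIFO_LEN];
} fifo_model;

/* Lowest-numbered free mailbox (0-based index), or -1 if the FIFO is full. */
int lowest_free(const fifo_model *f)
{
    for (int i = 0; i < FIFO_LEN; i++) {
        if (f->slot[i] == 0) {
            return i;
        }
    }
    return -1;
}

/* Read from mailbox 1 upwards until the first empty mailbox, clearing
 * each mailbox as it is read. Returns the number of messages read. */
int drain(fifo_model *f, int *out)
{
    int n = 0;
    for (int i = 0; i < FIFO_LEN && f->slot[i] != 0; i++) {
        out[n++] = f->slot[i];
        f->slot[i] = 0;
    }
    return n;
}

/* Replay the scenario: messages 1 and 2 are stored, message 3 is still
 * being shifted in when the first read happens, then 4..7 arrive. */
int replay_race(int *first_read, int *first_count, int *second_read)
{
    fifo_model f;
    memset(&f, 0, sizeof f);

    f.slot[lowest_free(&f)] = 1;
    f.slot[lowest_free(&f)] = 2;
    int pending = lowest_free(&f);   /* assumed target chosen for message 3 */
    assert(pending >= 0);

    *first_count = drain(&f, first_read);   /* sees 1 and 2, then stops */

    f.slot[pending] = 3;                    /* message 3 completes afterwards */
    for (int msg = 4; msg <= 7; msg++) {
        f.slot[lowest_free(&f)] = msg;      /* re-fill from the lowest free mailbox */
    }
    return drain(&f, second_read);
}
```

Under these assumptions the second read returns the messages in the order 4, 5, 3, 6, 7, which matches the last diagram. The model does not reproduce the stray message near the end of the FIFO seen in the memory dump.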
  • Hi Zhaohong,

    Another capture of the CAN buffer (CTRL.TEST=1, TEST.RDA=0):

    ________address|_0________4________8________C________0________4________8________C________0123456789ABCDEF0123456789ABCDEF
    SD:FF1C0000|>00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C0020| 00000000 C0000000 1E400000 00001008 04030201 08070605 00000000 00000000 .........@......................
    SD:FF1C0040| 00000000 C0000000 1E200000 00001008 04030201 08070605 00000000 00000000 ......... ......................
    SD:FF1C0060| 00000000 C0000000 1E240000 00001008 04030201 08070605 00000000 00000000 .........$......................
    SD:FF1C0080| 00000000 C0000000 1E280000 00001008 04030201 08070605 00000000 00000000 .........(......................
    SD:FF1C00A0| 00000000 C0000000 1E2C0000 00001008 04030201 08070605 00000000 00000000 .........,......................
    SD:FF1C00C0| 00000000 C0000000 1E300000 00001008 04030201 08070605 00000000 00000000 .........0......................
    SD:FF1C00E0| 00000000 C0000000 1E340000 00001008 04030201 08070605 00000000 00000000 .........4......................
    SD:FF1C0100| 00000000 C0000000 1E380000 00001008 04030201 08070605 00000000 00000000 .........8......................
    SD:FF1C0120| 00000000 C0000000 1E3C0000 00001008 04030201 08070605 00000000 00000000 .........<......................
    SD:FF1C0140| 00000000 C0000000 1E000000 00001008 04030201 08070605 00000000 00000000 ................................
    SD:FF1C0160| 00000000 C0000000 1D040000 00001008 04030201 08070605 00000000 00000000 ................................
    SD:FF1C0180| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C01A0| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C01C0| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C01E0| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C0200| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C0220| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C0240| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C0260| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C0280| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C02A0| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C02C0| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C02E0| 00000000 C0000000 1E180000 00001008 04030201 08070605 00000000 00000000 ................................
    SD:FF1C0300| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000 ................................
    SD:FF1C0320| 00000000 C0000000 00000000 00001080 00000000 00000000 00000000 00000000 ................................

    as you can see, in FIFO entry #FF1C02E0 there is an element placed, the elements before (#FF1C0180-FF1C02C0)
    are totally cleared so the processor has never ever written a message into them (we do not clear anything but the NEW_DAT flag).

    The ID field "1E180000" stands for 1926.
    I think the datasheet is incorrect regarding the Message RAM representation; chapter 24.5.3 says that:
    MsgAddr+0x08 [RSV:1][XTD:1][DIR:1][ID 28:16][ID 15:0]

    So 1E180000 would mean an ID=1E180000, which is impossible in our case (we are using 11-bit identifiers for this test).
    The only explanation is that the field description is actually
    [RSV:1][XTD:1][DIR:1][RESERVED :1][ID 10:0][RESERVED :2]+[NOT_USED 15:0]

    if you break down the hex values to binary, they are similar just not aligned:
    1E18 = 0001.1110.0001.1000 = 0001111000011000 < if you shift this down by 2 bits = 1926
    1926 = 0000.0111.1000.0110 = 0000011110000110

    1) This randomly inserted message at the end of the FIFO is still an issue
    2) Is the datasheet incorrect or did I misinterpret something?

  • Okay, #2) can be explained together with chapter 24.17.20 (IF1/IF2 arbitration registers): the 11-bit CAN ID is stored in bits 28:18 of the ID field.
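A short sketch of that decoding, checked against the value from the dump. The function names are hypothetical, and the XTD position (bit 30) is taken from the field layout quoted in the previous post; please verify it against the TRM before relying on it.

```c
#include <assert.h>
#include <stdint.h>

#define ARB_XTD      (UINT32_C(1) << 30)    /* extended-identifier flag (assumed bit 30) */
#define ARB_ID_MASK  UINT32_C(0x1FFFFFFF)   /* ID[28:0] */

/* Decode the CAN identifier from an arbitration word.
 * Standard frames keep their 11-bit ID in ID[28:18]; extended frames
 * use all of ID[28:0]. */
uint32_t arb_to_can_id(uint32_t arb)
{
    uint32_t id = arb & ARB_ID_MASK;
    if ((arb & ARB_XTD) != 0) {
        return id;
    }
    return (id >> 18) & UINT32_C(0x7FF);
}

/* Encode an 11-bit standard ID into the ID field of an arbitration word. */
uint32_t std_id_to_arb(uint32_t std_id)
{
    assert(std_id <= UINT32_C(0x7FF));
    return std_id << 18;
}
```

For the stray entry in the dump, `arb_to_can_id(0x1E180000)` gives 1926, the same value obtained above by shifting 1E18 down by 2 bits.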
  • Could you please increase the severity of this issue as it seems others have the same issue with this DCAN interface (with another processor not TMS570), see:
    e2e.ti.com/.../1810212
  • Please keep in mind that the message objects will be re-filled according to the normal (descending) priority. That is, the DCAN module will always save the incoming message to the available message objects with lowest mail box ID.

    We also need to make sure that the message objects are read faster than messages come in. You need to disable interrupts before you start reading the message objects. In your test, you can set one GIO pin high before reading and set it low after reading is complete. This way, you can see on the scope how often reading occurs and whether each read completes within 2.5ms. This I/O pin toggling is required to prove that there is no other issue in the system.

    Please do the following in reading.

    (1) Check the NewDat flags. Assume that NewDat bit is set for 11 objects (from 1 to 11).
    (2) Read the message objects from the lowest ID.
    (3) After reading the last object (11 here), check if the NewDat bit is set for the next object (12 here). If it is set, the message came in after NewDat was checked and before object 1 was read. Read the object. I believe that this is the step you are missing.

    Do not check the NewDat bit for all message objects in step 3. If there is any new incoming message after object 1 is read, it will be saved to object 1. You can read it in the next 2.5ms period. For this mechanism to work, you need to be sure that the reading speed is faster than the message incoming speed. This is why interrupts need to be disabled during reading. To keep reading fast, do not do any data processing in the reading function.

    Thanks and regards,

    Zhaohong
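The three steps above boil down to a decision about which mailboxes to read in one polling cycle. The sketch below captures only that decision logic as a pure function so it can be checked on a host; it is not driver code. The name `plan_reads` is hypothetical, bit (n - 1) of each mask is assumed to stand for the NewDat flag of mailbox n, and a single 32-bit mask is assumed to be enough (as with NEWDAT12 for a 24-entry FIFO).

```c
#include <assert.h>
#include <stdint.h>

/* Decide which mailboxes to read in one polling cycle.
 * snapshot: NewDat flags sampled on entry (step 1).
 * recheck:  NewDat flags sampled again after the snapshot run was read (step 3).
 * order:    receives the mailbox numbers in read order; needs fifo_len entries.
 * Returns the number of mailboxes to read. */
int plan_reads(uint32_t snapshot, uint32_t recheck, int fifo_len, int *order)
{
    assert(fifo_len >= 1 && fifo_len <= 32 && order != 0);
    int count = 0;
    int n = 1;

    /* Step 2: read the contiguous run that starts at mailbox 1. */
    while (n <= fifo_len && (snapshot & (UINT32_C(1) << (n - 1))) != 0) {
        order[count++] = n++;
    }
    /* Step 3: continue only above that run. Lower mailboxes that were
     * re-filled in the meantime are left for the next cycle. */
    while (n <= fifo_len && (recheck & (UINT32_C(1) << (n - 1))) != 0) {
        order[count++] = n++;
    }
    return count;
}
```

With mailboxes 1 to 11 set in the snapshot and mailboxes 1 and 12 set at the recheck, the plan is to read 1 to 12 and leave the re-filled mailbox 1 for the next 2.5ms period. A message that lands far above the run, as in the dumps earlier in this thread, would not be picked up by this logic.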
  • Dear Zhaohong

    Please keep in mind that the message objects will be re-filled according to the normal (descending) priority. That is, the DCAN module will always save the incoming message to the available message objects with lowest mail box ID.
    > That is clear, I read the datasheet. According to my DCAN memory dump above this is violated, this requirement is not fulfilled. Please read again all the messages I posted.

    We also need to make sure that the speed of reading the message objects is faster than the incoming message.
    > This is ensured by having a FIFO with length of 24, and we only send 13 messages/5ms -> 2.5ms reading assures this.

    You need to disable interrupts before start reading the message objects.
    > Interrupt is not used, 2.5ms loop polls the fifo

    In your test, you can set one GIO pin high before reading and set it low after reading is complete.
    In this way, you can see on the scope how often reading occurs and if each reading can be complete within 2.5ms.
    This I/O pin toggling is required to prove that there is no other issue in the system.

    Please do the following in reading.

    (1) Check the NewDat flags. Assume that NewDat bit is set for 11 objects (from 1 to 11).
    (2) Read the message objects from the lowest ID.
    (3) After reading the last object (11 here), check if NewDat bit is set for the next object (12 here). If set, the message comes in after the NewDat is checked and before object 1 is read. Read the object. I believe that this is the step you are missing.

    Do not check NewDat bit for all message objects in step 3. If there is any new incoming message after object 1 is read, it will saved to object 1. You can read it it in the next 2.5ms period. For this mechanism to work, you need to be sure that reading speed is faster than the message incoming speed. This is why interrupts need to be disabled during reading. To ensure the speed in reading, do not do any data processing in the reading function.
    > This is exactly what we did, and I only modified our implementation to prove to you that the message object is stored in the FIFO but in the wrong place. Again, please read our previous posts.
    > I would like to ask you to set up a measurement with the above parameters and check it with your expert before we continue this discussion.

    Thank you.

  • Dear Zhaohong,

    I was experimenting with different approaches, and although I haven't found a solution yet, I could conclude the following:
    - Reading the FIFO element and clearing its NEW_DAT flag one step later > increases the chance of having a wrongly placed message;
    doing it in one step (reading the FIFO with NEW_DAT set) improves the robustness
    - Adding an atomic block (interrupt lock) to the FIFO element read > reduces the chance of having a wrongly placed message
    - Adding an atomic block to the whole FIFO read > significantly reduces the chance of having a wrongly placed message


    Although these methods greatly increase the robustness, I still see some randomly inserted messages, and this is verified
    by the NEWDAT12 register (summary of the FIFO mailboxes' NEW_DAT flags).
    I implemented a logger to save all the bits and NEWDAT12 at each and every read of a mailbox; the logs show that
    a new message still gets inserted at random places.

    The following pseudo code explains the trace dump below better than anything:

    clear_array(new_dat);
    clear_array(new_dat_fifo);

    for( i = 1; i < 24; i++ )
    {
        ...
        new_dat[i] = NEWDAT12;

        ifx = read_fifo(i);

        new_dat_fifo[i] = (ifx & NEW_DAT) << i;

        if((ifx & NEW_DAT) == 0)
            break; // end of fifo -> exit

        ...
    }

    // With this method I ended up having the following log:
    new_dat_fifo = (
    2 = 0x00000002 = 0y00000000.00000000.00000000.00000010,
    4 = 0x00000004 = 0y00000000.00000000.00000000.00000100,
    8 = 0x00000008 = 0y00000000.00000000.00000000.00001000,
    0 = 0x00000000 = 0y00000000.00000000.00000000.00000000,
    0 = 0x00000000 = 0y00000000.00000000.00000000.00000000,
    0 = 0x00000000 = 0y00000000.00000000.00000000.00000000,

    new_dat_x = (
    134 = 0x00000086 = 0y00000000.00000000.00000000.10000110,
    134 = 0x00000086 = 0y00000000.00000000.00000000.10000110,
    132 = 0x00000084 = 0y00000000.00000000.00000000.10000100,
    128 = 0x00000080 = 0y00000000.00000000.00000000.10000000,
    0 = 0x00000000 = 0y00000000.00000000.00000000.00000000,
    0 = 0x00000000 = 0y00000000.00000000.00000000.00000000,

    can_id_expected = 544 = 0x0220 < This is where my breakpoint was hit because the expected CAN id wasn't found.

    // dump of CAN memory:
    ____address|_0________4________8________C________0________4________8________C________
    SD:FF1C0000| 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
    SD:FF1C0020| 00000000 C0000000 088C0000 00001008 04030201 08070605 00000000 00000000
    SD:FF1C0040| 00000000 C0000000 08840000 00001008 04030201 08070605 00000000 00000000
    SD:FF1C0060| 00000000 C0000000 08880000 00001008 04030201 08070605 00000000 00000000
    SD:FF1C0080| 00000000 C0000000 08600000 00001008 04030201 08070605 00000000 00000000
    SD:FF1C00A0| 00000000 C0000000 08640000 00001008 04030201 08070605 00000000 00000000
    SD:FF1C00C0| 00000000 C0000000 08680000 00001008 04030201 08070605 00000000 00000000
    SD:FF1C00E0| 00000000 C0000000 086C0000 00001008 04030201 08070605 00000000 00000000
    SD:FF1C0100| 00000000 C0000000 08800000 00001008 04030201 08070605 00000000 00000000
    SD:FF1C0120| 00000000 C0000000 08740000 00001008 04030201 08070605 00000000 00000000
    SD:FF1C0140| 00000000 C0000000 08780000 00001008 04030201 08070605 00000000 00000000
    SD:FF1C0160| 00000000 C0000000 074C0000 00001008 04030201 08070605 00000000 00000000
    SD:FF1C0180| 00000000 C0000000 1C380000 00001008 04030201 08070605 00000000 00000000
    SD:FF1C01A0| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000
    SD:FF1C01C0| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000
    SD:FF1C01E0| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000
    SD:FF1C0200| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000
    SD:FF1C0220| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000
    SD:FF1C0240| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000
    SD:FF1C0260| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000
    SD:FF1C0280| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000
    SD:FF1C02A0| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000
    SD:FF1C02C0| 00000000 C0000000 00000000 00001000 00000000 00000000 00000000 00000000

    CanID = 544 in CAN memory equals 0x880 (found in #FF1C0100), which is indicated correctly by "new_dat_x";
    the SW just could not retrieve it because I was reading up to the last set NEW_DAT and only checked the next element.

  • Currently, we do not have a setup to repeat your test.

    (1) Would you please share your CCS project for us to understand exactly what you are doing?
    (2) Please add the I/O toggling in your test to ensure that there is no timing issue.

    Thanks and regards,
  • Could you tell me why you insist on the IO toggling?
    If we are sending packets at a rate of 13 packets/5ms and checking the 24-long FIFO every 2.5ms,
    we would notice if the period time is not correct (having more messages than 13 would indicate this).
    So I don't really understand what the IO toggling or the cycle time has to do with this issue.

    Nevertheless, I did it.
    Cycle time is correct and stable.
    Also the Vector tool is sending the 13 packets / 5ms correctly (as you would expect from 2 already certified tools).

    If others (see my forum reference above) using another processor with the same DCAN peripheral have experienced the same issue,
    how can this be related to the TMS570 configuration?

    Currently I cannot hand over the CCS project as we have our own makefile system, but I am open to dumping any DCAN memory or
    configuration you would like to see.

  • Would you please share the source code for setting up the FIFO buffers and the different ways you used to read the FIFO buffers?

    Thanks and regards,

    Zhaohong
  • The correct way to use the mailboxes as FIFO is already given by Zhaohong - step by step - in the first suggested answer. Give that a try.
    The CAN controller will transfer data to the lowest available mailbox. The NEWDAT12 register reflects the status of the first 32 mailboxes, so why wait for the IFX transfer to find the status of each mailbox?
  • According to the datasheet:

    "
    When the CPU initiates a data transfer between the IF1/IF2 Registers and Message RAM, the Message
    Handler sets the Busy bit in the respective Command Register to ‘1’. After the transfer has completed, the
    Busy bit is set back to ‘0’ (see Figure 24-7)."

    The "ReadMessageFifo"  uses the datafields of the "not anymore busy IFX", we only transfer the data once.

    But I'm open to simplifications, if you know any other documented method that needs no waiting in "normal mode" I will happily try it out.

  • I am not sure what you meant by "documented" method. Although the TRM mentions "FIFO", you know that it is "not" a true FIFO but just a buffer with some overflow handling. With the steps given in the suggested answers, you can handle this buffer in the order it is received (kind of a FIFO ;-)

    • Consider that you have the first 32 mailboxes (NEWDAT12 gives the Status of all of them)
    • Also consider that the bus is at 1 Mbit/s and STD messages with 8-byte payload always => ~100 µs => 25 packets in 2.5 ms
    • Since your transmission involves 13 pkts every 5 ms, the receiver should be able to keep up with just 13 buffers as long as the polling loop is < 5 ms including the data processing.
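
    As a rough cross-check of the ~100 µs estimate, the sketch below counts the bits of a classic CAN data frame with an 11-bit ID. The function names are hypothetical, and the numbers assume the classical frame layout (44 overhead bits plus 3 intermission bits) and the usual worst-case bound for stuff bits; it is not taken from the TRM.

```c
#include <assert.h>
#include <stdint.h>

/* Bits on the wire for one classic CAN data frame with an 11-bit ID,
 * including the 3-bit intermission and ignoring stuff bits.
 * SOF 1 + ID 11 + RTR 1 + IDE 1 + r0 1 + DLC 4 + CRC 15 + CRC delimiter 1
 * + ACK 1 + ACK delimiter 1 + EOF 7 = 44 bits of overhead. */
uint32_t can_std_frame_bits(uint32_t payload_bytes)
{
    assert(payload_bytes <= 8u);
    return 44u + 3u + 8u * payload_bytes;
}

/* Same frame with the worst-case number of stuff bits added.
 * Only SOF..CRC (34 + 8n bits) is subject to bit stuffing. */
uint32_t can_std_frame_bits_worst(uint32_t payload_bytes)
{
    uint32_t stuffable = 34u + 8u * payload_bytes;
    return can_std_frame_bits(payload_bytes) + (stuffable - 1u) / 4u;
}

/* Number of complete frames that fit into a time window. */
uint32_t can_frames_per_window(uint32_t bits_per_frame,
                               uint32_t bitrate_bps,
                               uint32_t window_us)
{
    uint64_t bits_in_window = ((uint64_t)bitrate_bps * window_us) / 1000000u;
    return (uint32_t)(bits_in_window / bits_per_frame);
}
```

    Under these assumptions an 8-byte frame occupies 111 to 135 bit times, so at 1 Mbit/s between 18 and 22 back-to-back frames fit into a 2.5 ms window, somewhat fewer than the rounded figure of 25 above.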

    1. The polling loop should first read NEWDAT12 register to a local "Data-Ready" variable.
    2. The data copy loop should iterate through the "Data-Ready" bits starting at LSB and transfer into your own Q (the real FIFO ;).
    3. Read NEWDAT12 again and transfer any new mailboxes higher than those handled in the previous step. Note that this data was received after you first read NEWDAT12 and before any lower mailboxes were made available.
    4. Process your real FIFO

    My preferred way to handle this situation is to use a dma channel, transfer data as it arrives into a Q and process it in the polling loop or generate an interrupt to let you know after receiving “n” (13) pkts to start processing the Q.
    Good Luck,
    Joe
  • I forgot to mention that the copy loop routine should not be interrupted for this to work correctly. If you get interrupted after you free up the MB#1 for example and the FIFO is filled out to MB#13, next packet received will end up in MB#1 and the packet received after that will end up in MB#14, since you haven’t cleared MB#2-13 yet, not something you want to handle!
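
    The interrupted-read scenario above can be played through on a host with a small model. This is only a sketch of the fill rule as it is described in this thread (a new frame goes to the lowest-numbered mailbox whose NewDat is clear); it does not model the silicon, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_LEN 24u

typedef struct {
    uint8_t  new_dat[FIFO_LEN + 1u]; /* index 0 unused, mailboxes are 1..FIFO_LEN */
    uint32_t seq[FIFO_LEN + 1u];     /* sequence number of the stored frame */
} fifo_model_t;

/* Store a frame in the lowest-numbered free mailbox.
 * Returns the mailbox used, or 0 if every mailbox is occupied. */
uint32_t model_receive(fifo_model_t *f, uint32_t seq)
{
    for (uint32_t mb = 1u; mb <= FIFO_LEN; mb++) {
        if (!f->new_dat[mb]) {
            f->new_dat[mb] = 1u;
            f->seq[mb] = seq;
            return mb;
        }
    }
    return 0u;
}

/* Read one mailbox and clear its NewDat flag. */
uint32_t model_read(fifo_model_t *f, uint32_t mb)
{
    assert(mb >= 1u && mb <= FIFO_LEN);
    f->new_dat[mb] = 0u;
    return f->seq[mb];
}

/* 13 frames are pending; the reader frees mailbox 1, is interrupted while
 * two more frames arrive, then continues with mailbox 2 upwards.
 * Writes the order in which frames are read and returns the count. */
uint32_t interrupted_read_order(uint32_t out[], uint32_t out_len)
{
    fifo_model_t f = {0};
    uint32_t n = 0u;

    assert(out_len >= 1u);
    for (uint32_t seq = 1u; seq <= 13u; seq++) {
        (void)model_receive(&f, seq);
    }
    out[n++] = model_read(&f, 1u);   /* mailbox 1 is now free */
    (void)model_receive(&f, 14u);    /* lands in mailbox 1 */
    (void)model_receive(&f, 15u);    /* lands in mailbox 14 */
    for (uint32_t mb = 2u; mb <= FIFO_LEN && f.new_dat[mb] && n < out_len; mb++) {
        out[n++] = model_read(&f, mb);
    }
    return n;
}
```

    In this model the reader sees frames 1 to 13 followed by frame 15, while frame 14 stays in mailbox 1 until the next read cycle.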
  • Thank you for your answer,

    The FIFO is 24 long because in 2.5ms there is no chance to receive that many messages,
    it is just an example implementation to reproduce the issue and prevent any FIFO overflow (I made really sure to avoid anything related to FIFO overflow).

    As you can see in my attached sample code, this is exactly what I'm doing:
    - ATOMIC block read of the _COMPLETE_ "FIFO" _as long as_ new messages are in the FIFO; on the first mailbox whose NewDat is not set -> exit

    If the DCAN implementation is according to the datasheet, EVEN if there is an incoming message
    while reading the "FIFO", by freeing the MB1 the DCAN should place the new message there.
    There is NO way to receive another (second) message during the reading, WITHOUT having the MB2 freed.

    Checking only the NEWDAT12 is making things worse.
    According to my experience, NEWDAT12 is updated much more slowly than the status obtained by reading the actual FIFO mailbox.

    What I have now uses both (NEWDAT12 and MB.NEWDAT) as in the following pseudo code;
    this works pretty stably, but in some cases the error still appears:

    0640.ti_fifo_pseudo.c

  • And please, post an approved method how I should read the FIFO (register settings and loop).
  • Just to give you a hint what is really happening, I made a software logger where I'm saving each and every cycle's FIFO read result.
    Here is a nice piece of log.
    [cycle -1] shows the cycle before the error (breakpoint was hit)
    [cycle 0] shows the state when the breakpoint was hit:

    6431.ti_fifo_example.txt

    there is absolutely no way you can explain why the id 166 was placed in the 3rd MB!

  • Took a look at your data. Please see if my observation is correct.

    In your data, cycle -1 data is the CAN data acquired in the 2.5ms loop. Cycle 0 data is the data acquired in the next 2.5 ms loop. Assume that the lowest mailbox ID in the FIFO buffer is 1.

    At cycle -1:
    Message with ID 156 is saved to mailbox 1.
    Message with ID 157 is saved to mailbox 2.
    Message with ID 158 is saved to mailbox 3.
    Message with ID 159 is saved to mailbox 4.
    Message with ID 160 is saved to mailbox 5.
    Message with ID 161 is saved to mailbox 6.
    Message with ID 162 is saved to mailbox 7.
    Message with ID 163 is saved to mailbox 8.
    Message with ID 164 is saved to mailbox 9.

    From the code you sent earlier, the status and data are cleared in sequence starting from mailbox 1.

    At cycle 0
    Message with ID 165 is saved to mailbox 1.
    Message with ID 167 is saved to mailbox 2.
    Message with ID 166 is saved to mailbox 3.
    Message with ID 168 is saved to mailbox 4.

    If message with ID 166 came after the message with ID 165, it should be saved to mailbox 2 instead of mailbox 3.

    If the understanding of your data is correct, I do not think that there is an arbitration issue in the DCAN module because mailbox 2 is cleared earlier than mailbox 3. There must be something else wrong. Would you please share the definition of the data struct you defined for DCAN registers and the compiling option you used? What run-time library do you use? If you have a CAN analyzer, I would suggest using it to monitor the CAN bus for cross reference.

    Thanks and regards,

    Zhaohong
  • Zhaohong is correct, cannot explain the behavior, there is something wrong with the setup. Definitely not the same issue as before.

    Also the pseudo code is not checking the correct bit of NEWDAT12 register for a given MB, you are checking one higher - explains your experience of bit getting updated late (until the next MB gets filled).

    Please check this modified code :
    {
        atomic_start(); // disable interrupts?
        {
            l_NewDat12 = NEWDAT12; // read NEWDAT12 register first
            for (i = 0; i < 24; i++) // Start from Bit0 => lowest MB
            {
                if (l_NewDat12 & (1 << i)) // work on the copy of NEWDAT12
                {
                    mb_read(i+1, buffer); // Read MB into buffer, set flags to clear NEW_DAT during the ifx transfer.
                }
                else
                {
                    break; // done with NEWDAT12 first pass
                }
            }
            if (i < 24)
            {
                l_NewDat12 = NEWDAT12; // read NEWDAT12 register again
                for (; i < 24; i++) // read any new higher MB only to keep FIFO
                {
                    if (l_NewDat12 & (1 << i))
                    {
                        mb_read(i + 1, buffer); // Read MB into buffer, set flags to clear NEW_DAT during the ifx transfer.
                    }
                    else
                    {
                        break; // done with NEWDAT12 second pass
                    }
                }
            }
            else
            {
                // Overflow?
            }
        }
        atomic_end();
    }
  • @Zhaohong: Yes, you understood the problem correctly. This is also what we do not understand: HOW can this happen if the DCAN works as you described (that a new message is always inserted into the highest-priority (lowest-indexed) mb)?

    @Joe: I think I'm correct with checking the bits in the NEWDAT12. Please re-check the datasheet (there is no mb #0, also newdat12 has valid bits in range [31:1])
    but to prove it I'm also logging the newdat and the fifo NEW_DAT bits in each readout of a mailbox; the attached text shows an erroneous breakpoint hit again.

    - Data is the same in cycles [-1] and [0]. Here the 1817 is located at the end of the FIFO stream in cycle [0]
    - The "new_dat_fifo" array: each element stores the NEW_DAT bit which was read from the mailbox (and set to its proper position; the index of the bit represents the mailbox index).
    - The "new_dat" array is similar, but I'm saving the NEWDAT12 before reading out a MB.

    In the example you can see that in cycle [0] I read 5 elements, 5 bits are set in the FIFO's MB.NEW_DAT, but NEWDAT12 is inconsistent(!) with the FIFO.
    As I see it, either I'm (and probably the other thread with a different CPU also) facing a huge miscompilation bug or the DCAN interface simply has some issues under this burst load.

    8688.ti_fifo_example_2.txt

    We are using CANalyzer to generate the packets and have also cross-checked the CAN bus already; the first thing we verified was that the CAN messages are sent correctly and that it is not a setup failure. Actually, we are investigating this problem because we had issues during development of a new system where a totally different setup was used, and we saw that messages are (or might be) wrongly placed in the FIFO.

    @Zhaohong: Could you clarify what you meant by data struct of the DCAN registers? Full DCAN register configuration dump? I don't know how to dump this nicely, but I was able to copy the TRACE32 peripheral setup to a file; I hope this is what you are looking for (note that the CPU was halted on a breakpoint when I copied this data):

    6327.ti_dcan_config.txt

    Runtime library? Nothing special that I know of.
    Compiler options (hope this is enough):
    3581.ti_compiler_options.txt

  • @Zhaohong- You are suggesting something like a circular buffer here. Nice :)
  • @michael- Take a look at the answer given by Bob here. e2e.ti.com/.../501318
  • Are you sure that the message at the wrong place in the FIFO is not a result of the CAN protocol itself (i.e. retransmission)? Check the error counters just to make sure.
  • Apologies, NEWDAT12 indeed is wrongly handled, bit0 reflects the state of MB#1.NEW_DAT.
  • (1) Can you add the mailbox ID in your data log? It will confirm what we discussed earlier.
    (2) Can you log data from the CAN analyzer so that we can compare the data captured by the CAN analyzer and the data captured by TMS570 for the same scenario? This is very important in confirming the issue.
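
    A minimal sketch of the mapping stated above, assuming bit 0 of a NEWDAT12 snapshot corresponds to mailbox 1 and bit 31 to mailbox 32. The helper names are hypothetical and the snapshot is passed in as a plain value, so nothing here touches hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Bit mask of mailbox mb (1..32) within a NEWDAT12 snapshot,
 * assuming bit 0 <-> mailbox 1. */
uint32_t newdat12_mask(uint32_t mb)
{
    assert(mb >= 1u && mb <= 32u);
    return (uint32_t)1u << (mb - 1u);
}

/* Lowest-numbered mailbox whose NewDat bit is set in the snapshot,
 * or 0 if no bit is set. */
uint32_t newdat12_lowest_mb(uint32_t snapshot)
{
    for (uint32_t mb = 1u; mb <= 32u; mb++) {
        if (snapshot & newdat12_mask(mb)) {
            return mb;
        }
    }
    return 0u;
}
```

    With this mapping a logged value of 0x86 has bits set for mailboxes 2, 3 and 8, and 0x80 for mailbox 8 only. That would correspond to the entry at #FF1C0100 in the earlier memory dump, assuming mailbox 1 starts at #FF1C0020 and each entry is 0x20 bytes long.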
    (3) From your compiling options, you are using the eabi runtime library. Can you check the assembly code generated for the following instructions?

    ifx->IFCMD = CAN_BASE_IFCMD_MASK
    | CAN_BASE_IFCMD_ARB
    | CAN_BASE_IFCMD_NW_DAT
    | CAN_BASE_IFCMD_CTRL
    | CAN_BASE_IFCMD_CLR_INT_PND
    | CAN_BASE_IFCMD_DAT_A
    | CAN_BASE_IFCMD_DAT_B;

    ifx->IFNO = i_u8;

    Please be aware that the above two operations write data into different byte locations of the same register. If the above are separate writes, they should be byte writes (STB) for the operation to work correctly. The assembly code generated by the compiler depends on how the data is defined and the run-time library used. We have DCAN test code which works correctly with the ARM9 run-time library. When we switched to the EABI library, all byte writes were changed to word (32 bit) writes for the way we defined the registers in a data struct. That is why I would like to see how the DCAN register is defined in your ifx struct. I just want to see the definition.
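
    The concern can be illustrated with a small host-side model. This is only a sketch: the register is simulated as four byte lanes, the lane numbers and names are hypothetical, and the word write is modelled as putting zero into the other three lanes, which may differ from what a real 32-bit store would place there.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated 32-bit command register, one array element per byte lane. */
typedef struct {
    uint8_t lane[4];
} reg32_model_t;

#define LANE_MSG_NO 0u /* hypothetical lane of the message number */
#define LANE_CMD    2u /* hypothetical lane of the command mask   */

/* Byte store: only the addressed lane changes. */
void write_byte(reg32_model_t *r, uint32_t lane, uint8_t value)
{
    assert(lane < 4u);
    r->lane[lane] = value;
}

/* Word store of a single field: the whole register is rewritten,
 * so the other three lanes lose their previous contents. */
void write_word(reg32_model_t *r, uint32_t lane, uint8_t value)
{
    assert(lane < 4u);
    for (uint32_t i = 0u; i < 4u; i++) {
        r->lane[i] = (i == lane) ? value : 0u;
    }
}
```

    Writing the command mask with `write_byte` and then the message number with `write_byte` leaves both fields intact, whereas following the command write with `write_word` for the message number wipes the command lane in this model.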

    Thanks and regards,

    Zhaohong
  • Zhaohong,

    I also invested a lot of time to figure out what's going on (similar to what user4642853 did).

    Here's my conclusion:
    When receiving bursts of data frames and the FIFO is being read during those bursts, it sometimes happens that newly received frames ARE DEFINITELY NOT placed in the lowest mailbox number (1). When reading a FIFO that is filled to the Nth element, sometimes a new frame appears at the (N-1)th position.

    I tried many things with polling the NEWDAT before and doing a second round, but nothing helps. So I'm only using the interface (IFx) if the NEWDAT indicates there is some data. Here's an example where I've been reading a FIFO from mailbox 1 to 4 (4 was the last element). Afterwards there's suddenly a new message in mailbox 3. Mailbox 1 and 2 are empty... :-(

    5340.Mailbox3_CAN.txt

    I should mention that in this case, the FIFO reading operation is interrupt triggered. The RxIE is set for all the FIFO's elements. In the ISR, the FIFO is always being read starting from position 1. If the interrupt register tells me something different than 1, I know that I caught the error case again. This is an easy way to detect the fault. Maybe it's interesting for you to reproduce... If the FIFO reading is working perfectly fine, the interrupt register shall NEVER EVER report something else than 1 (where my one used FIFO starts that has interrupts enabled).
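
    The detection idea can be sketched as a pure function over the value read from the interrupt register. The field layout used here (interrupt source in the low 16 bits, 0x8000 meaning a status interrupt, 0 meaning nothing pending) is an assumption that should be checked against the TRM, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    INT_SRC_NONE,       /* nothing pending                          */
    INT_SRC_STATUS,     /* error/status interrupt                   */
    INT_SRC_FIFO_HEAD,  /* first FIFO mailbox: the expected case    */
    INT_SRC_FIFO_OTHER, /* another FIFO mailbox: the suspected fault */
    INT_SRC_UNEXPECTED  /* anything outside the FIFO range          */
} int_src_t;

/* Classify the interrupt source for a FIFO spanning first_mb..last_mb. */
int_src_t classify_int_source(uint32_t int_reg, uint32_t first_mb, uint32_t last_mb)
{
    uint32_t id = int_reg & 0xFFFFu; /* assumed position of the source ID */

    assert(first_mb >= 1u && first_mb <= last_mb);
    if (id == 0u) {
        return INT_SRC_NONE;
    }
    if (id == 0x8000u) {
        return INT_SRC_STATUS;
    }
    if (id == first_mb) {
        return INT_SRC_FIFO_HEAD;
    }
    if (id > first_mb && id <= last_mb) {
        return INT_SRC_FIFO_OTHER;
    }
    return INT_SRC_UNEXPECTED;
}
```

    An ISR could log or trap on `INT_SRC_FIFO_OTHER` to catch the case described above without changing the read sequence itself.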

    I also verified that the compiler is using STRB instructions. So this is also fine:

            3243|                        ifx->IFCMD = CAN_BASE_IFCMD_MASK
     SR:0012A298|E3A0C07F            mov     r12,#0x7F        ; r12,#127
     SR:0012A29C|E5C6C001            strb    r12,[r6,#0x1]
                |                                   | CAN_BASE_IFCMD_ARB
                |                                   | CAN_BASE_IFCMD_CTRL
                |                                   | CAN_BASE_IFCMD_CLR_INT_PND
                |                                   | CAN_BASE_IFCMD_NW_DAT
                |                                   | CAN_BASE_IFCMD_DAT_A
                |                                   | CAN_BASE_IFCMD_DAT_B;
            3250|                        ifx->IFNO = i_u8;
     SR:0012A2A0|E5C67003            strb    r7,[r6,#0x3]     ; i_u8,[r6,#3]

    Here's the register definition:

    typedef volatile struct CSM_CAN_IF12
    {
    #if (CPU_BYTE_ORDER == LOW_BYTE_FIRST)
        uint8       IFNO;          /**< IF1/2 Command Register, Msg Number */
        uint8       IFSTAT;        /**< IF1/2 Command Register, Status     */
        uint8       IFCMD;         /**< IF1/2 Command Register, Command    */
        uint8       reserved5;     /**< IF1/2 Command Register, Reserved   */
    #endif
    #if (CPU_BYTE_ORDER == HIGH_BYTE_FIRST)
        uint8       reserved5;     /**< IF1/2 Command Register, Reserved   */
        uint8       IFCMD;         /**< IF1/2 Command Register, Command    */
        uint8       IFSTAT;        /**< IF1/2 Command Register, Status     */
        uint8       IFNO;          /**< IF1/2 Command Register, Msg Number */
    #endif
        uint32      IFMSK;         /**< IF1/2 Mask Register                */
        uint32      IFARB;         /**< IF1/2 Arbitration Register         */
        uint32      IFMCTL;        /**< IF1/2 Message Control Register     */
        uint8       IFDATx[8U];    /**< IF1/2 Data A and B Registers       */
        uint32      reserved6[2U]; /**< Reserved                           */
    } CSM_CAN_IF12_t;

    Any ideas?

    Michael

  • Zhaohong,

    here's another error case....

    I got an interrupt and the FIFO is read from the beginning (1) to the last mailbox with new data, indicated by the NEWDAT register, which is number 4.
    Afterwards, another interrupt is thrown, telling me in the interrupt register, that there's a message waiting again (!!!) in mailbox number 4.

    I really want to avoid to 'grab' in the middle of the FIFO in this case, as it would be not very clean and doesn't work at all for the polling mode.

    I attached a full peripheral dump.

    6683.Mailbox4_CAN.txt

    Regards,
    Michael

  • One thing to note:
    In my peripheral dump, the debugger is doing an invalid interpretation of one register. Inside IF1MCTRL, the debugger shows the EoB as 'Single/Last' if it is 0. The opposite is true. Don't be confused by that...
  • Michael,

    I think that the message (which causes the new interrupt) came in before you started reading the FIFO buffers. If my guess is true, you observed expected behavior. I would suggest changing the data in each message so that you know if you have missed any. You can also use I/O pin toggling to show the timing. If you can use a CAN analyzer to monitor the CAN bus, please always compare the data captured by the analyzer and the data captured by TMS570. Can you share your project for us to see exactly what you are doing?

    Thanks and regards,

    Zhaohong
  • Michael,

    I discussed this thread with Charles and Zhaohong. We could not understand the behavior you are seeing so I tried to recreate it. I think I have recreated the problem and can explain what is happening. I used two TMS570LS3137 devices talking to each other on the CAN bus at 500 kbaud. One device is simply sending frames out with the same ID and EoB cleared. The payload is a simple 32-bit number that is incremented with each frame. The other device has a 16-frame FIFO. I use the RTI to generate an interrupt every 1.5 ms. In each interrupt the device reads the FIFO starting with mailbox 1 and incrementing mailboxes until it finds an empty one. It checks that the value received is always one more than the previous value. If it finds a value out of order, the program hangs in a loop. If it detects an overrun (in mailbox 16) it also hangs in a loop.

    At first I did not detect any errors. Then I halted the receive CPU with the debugger and checked that it was still running properly. Restarting the receive CPU did not cause an error because with only two devices on the net, the send device kept resending the same frame until the receive device resumed from suspend and gave the acknowledge. After stopping and starting the receive CPU with the debugger several times, I was able to catch an error.  The receive CPU read a frame that was 2 greater than the previous frame. The missing value was in a different mailbox location that was totally out of sequence.

    I repeated the results several times. It was always necessary to suspend and resume the receive CPU with the debugger for me to make this error occur. It appears that suspending the DCAN state machine while a CAN message is coming in can cause the state machine to load the message into the wrong mailbox. (Note: I am using the default from HALCoGen that IDS (DCAN CTL bit 8) is 0. Suspend happens at the end of a transmission or reception.)

    To verify that the FIFO works properly when not suspended by a debugger, I have the two units talking to each other with no debugger attached. I am monitoring the state by watching the frequency of the ECLK pin with an oscilloscope. So far it has run an hour with no errors. I will let it run overnight and then post an update.

    I have attached the two projects I used. I used CCS Version: 6.1.1.00022, HALCoGen Version 04.05.02 and compiler version 5.2.6.

    /cfs-file/__key/communityserver-discussions-components-files/312/8132.DCANFIFO.zip

  • Thank you for your answer.

    It is good to see that - although it's not exactly the same - you have managed to recreate the issue.
    The problem is that we do not halt the CPU, only in case of an error (an application breakpoint was set where the ID is not as expected).
    Our guess is that - since we have other peripherals running in the background using DMA - the CPU bus is overloaded (by a DMA) in the background, which results in an "interrupted" DCAN access.

    I will try to flash your receive application and check it with the CANalyzer.
    If it works then it is certainly some background issue that we do not see currently!

    Best Regards,
    Tamas

  • OK, the search continues. (I thought I had found it.) As an update, my routine has now run over 14 hours without an error. At least in my test the FIFO mode is working as expected. I cannot see how DMA would affect the CAN state-machine. The state-machine to DCAN RAM interface is not affected by DMA accesses. The software doesn't set the INIT bit in the DCAN CTL register other than the one time you call canInit(), right? That would also interrupt the CAN state-machine.

  • Wohooo, I managed to reproduce the issue with your code! Now we have at least a common ground.

    Setup:
    - TI Hercules board
    - CAN analyzer (Vector VN1630) sending 13 msgs/ 5ms (same as before), message order is cross&double checked with PCAN-View
    - Project was modified: Since I'm checking the CAN IDs I had to add the ID read to canGetData (see attachment).

    You can see that in "ids" (attached txts shows watch window and can memory dump) there is a message "345", wrongly placed.
    We have tried it out with and without the debugger attached, same behaviour,
    the time needed for the error to occur varies between 1s-30s after starting the CAN transmission.

    What I will try next is to flash another TMS570 with the send example, modify it to use incrementing ID and check if it causes this,
    maybe the sender behaves differently and that is why it works on your side(?).

    PS: what we meant is that while DMA uses the internal CPU bus, there will be a delay between reading and writing
    the CAN interface registers (if the CPU internal bus load is huge)

    6013.DCAN1_MBOX_REG_AREA.txt

    8360.watch_window_state_of_bp_hit.txt

    7065.DCANReceive.zip

    Thanks,

    Tamas

  • Oh and 2 additional info:

    - I am using standard format messages (not extended, mailboxes are configured to standard)

    - your project was compiled with CCS and the latest compiler, I only converted the out file to a flashable HEX. Our code (the one where we detected the issue) is compiled using a custom makefile system and the 5.1.6 TI compiler. So we can rule out the compiler and compiler settings as well.

  • I've just tried out your sender example, slightly modified to fit our test setup: instead of incrementing the payload, the ID is incremented.
    Using the receiver attached above and the modified sender (see attachment below), the issue is 100% reproducible.

    0815.DCANSend.zip

    Next step: I will try out your original implementation to play with the payload, maybe this issue only occurs when mb is configured to accept all CAN IDs and the ID varies(?).

  • Update: I tried using a fixed CAN ID (TMS board as tx - our device as rx) and using the payload to transmit the incremented value.
    Same issue, after a while there is a misplaced CAN message.

    Update2: I think I've found something. I set up 2 TMS Hercules boards, used your original code and this was indeed running. But when I then added some debuggability (logging messages etc.), I ran into the issue again. So I started to chop and get rid of every change I made in your original project.
    I ended up having a single tiny difference which clearly can cause the issue.

    So if you could just modify the internal check of your "checkCanMessages" function like pasted below (where lastCount used as "expected id"), then you should see that the endless loop is hit:

    else if (status == 1)
    {
        count = candata[0] + (candata[1]<<8) + (candata[2]<<16) + (candata[3]<<24);
    
        if (lastCount == 0)
        {
            lastCount = count;
        }
    
        if(count != lastCount)
        {
            while(1)
            {
                systemREG1->SYSPC4 ^= 1U;  // Toggle ECLK, stay here (50% duty cycle)
            }
        }
    
        lastCount++;
    }

    Don't ask me what the difference is here; the only explanation we could come up with is some timing. Your implementation requires ~51 cycles to perform one loop while mine takes 97 or so. :)