TMS320F28335: eCAN memory overwrite issue

Part Number: TMS320F28335
Other Parts Discussed in Thread: C2000WARE

Hello,

I am having an issue with CAN inbound mailboxes apparently overwriting other variables when a message is received. 

In the reference guide, it is stated that:

The message mailboxes are the RAM area where the CAN messages are actually stored after they are
received or before they are transmitted.
The CPU may use the RAM area of the message mailboxes that are not used for storing messages as
normal memory.

1. I was wondering under which conditions the CPU will attempt to use the mailbox RAM area, especially if a mailbox is enabled at some point during program execution.

2. I am also wondering whether there is a way to ensure that the CPU will not use the mailbox RAM, in order to avoid conflicts.

 I would be very grateful for any help with these issues.

Thank you.

Edit: we found a bug in a function that processes a particular CAN message; the problem was not due to misconfiguration of the CAN module.

  • In what part of RAM are these "other variables" stored?

    The mailbox RAM of the CAN module may be used as general-purpose RAM if the CAN module is not used in an application. Even then, the mailbox RAM must be defined in the linker-command file and sections explicitly assigned to it. If the CAN module is enabled, the mailbox RAM is under the control of either the CPU or the CAN module, depending on whether the RAM locations belong to a transmit mailbox or a receive mailbox.

    The only time the CPU will write to mailbox RAM is for a transmit mailbox, under program control. The CPU does not use any RAM area on its own, so there is no room for conflict.
  • Thank you very much for your reply. 

    1. In what part of RAM are these "other variables" stored?

    Unfortunately I am not quite sure how to determine this.

    One of the variables that I have confirmed to have been corrupted is a constant array of strings defined in a header file. Inspecting the .asm output, I believe the data is located in the .econst section, which is allocated as:

     .econst                 : > FLASH,              PAGE = 0

    with the FLASH block defined in MEMORY as

    FLASH           : origin = 0x30000c, length = 0x037FF0

    which seems fine to me (the ECAN blocks are located between 0x6100 and 0x63ff and match the layout described in the specification document). 

    2. Looking at the linker-command file, I can confirm that no memory block overlaps with the ECAN blocks.

    However, I noticed that some regions are defined for both PAGE 0 and PAGE 1. For example, 

    PAGE 0: 
    RAMM : origin = 0x000000, length = 0x000800

    PAGE 1: 
    RAMM : origin = 0x000000, length = 0x000800

    with another 5 or 6 regions defined in the same way in both PAGE 0 and PAGE 1. 

    Is it OK to do so? The F28335.cmd file warns that "the same memory region should not be defined for both PAGE 0 and PAGE 1". 

    Thank you very much.

  • If you can find out the address of the variables that are getting corrupted, you could examine the memory map output (xxxx.map file) or the linker command file to determine the section allocation.
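    As a minimal sketch of that lookup, assuming you have copied the MEMORY entries from the .cmd file into a table by hand, the C code below reports which region contains a given address. The MemRegion type and region_of() function are hypothetical helpers, not part of any TI library; the values in the usage note are the RAMM and FLASH entries and the 0x6100-0x63FF eCAN range quoted earlier in this thread.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* One MEMORY entry, copied by hand from the linker command file (hypothetical). */
    typedef struct {
        const char *name;
        uint32_t    origin;   /* start address in 16-bit words */
        uint32_t    length;   /* length in 16-bit words */
    } MemRegion;

    /* Returns the name of the first region containing addr, or NULL if none does. */
    const char *region_of(uint32_t addr, const MemRegion *regions, size_t n)
    {
        for (size_t i = 0; i < n; ++i) {
            /* Subtracting first avoids overflow when origin + length is large. */
            if (addr >= regions[i].origin && addr - regions[i].origin < regions[i].length)
                return regions[i].name;
        }
        return NULL;
    }
    ```

    For example, with a table of {"RAMM", 0x000000, 0x000800}, {"ECAN", 0x006100, 0x000300} and {"FLASH", 0x30000C, 0x037FF0}, an address of 0x6120 taken from the .map file maps to "ECAN", which would indicate that the variable sits inside the eCAN address window.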

    Flash memory cannot get "corrupted" the way a RAM location can. A specific, precise sequence of low-level commands must be executed to erase or program a flash location.

    CCS would give an error during the compile/link process if there were overlapping sections of memory within a PAGE. Could you use the linker command file released as part of C2000Ware? Would you be able to copy and paste the contents of the file into your post?

  • Regarding your question #2 (regions mapped in both PAGE 0 and PAGE 1), this is not allowed. Unfortunately, the linker can’t detect this, so CCS won’t complain. Note that the 28x follows the unified memory model, i.e., there is no separate Program/Data/I-O memory, as there was in some older (24x/24xx) C2000 devices.
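    Because the linker will not flag a region declared in both PAGE 0 and PAGE 1, one option is a small host-side check. The sketch below is a hypothetical helper (not a TI tool) that assumes the MEMORY entries have been transcribed into a table together with their PAGE number; since the 28x uses a unified memory map, it compares addresses across pages and reports every PAGE 0/PAGE 1 pair that shares any addresses.

    ```c
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* One MEMORY entry with its PAGE, copied by hand from the .cmd file (hypothetical). */
    typedef struct {
        const char *name;
        unsigned    page;     /* 0 or 1 */
        uint32_t    origin;   /* start address in 16-bit words */
        uint32_t    length;   /* length in 16-bit words */
    } MemRegion;

    /* Do the half-open ranges [a, a + alen) and [b, b + blen) intersect? */
    static int ranges_overlap(uint32_t a, uint32_t alen, uint32_t b, uint32_t blen)
    {
        return alen != 0 && blen != 0 && a < b + blen && b < a + alen;
    }

    /* Prints and counts every pair of regions on different pages that share addresses.
       Same-page pairs are skipped: overlaps within one PAGE are already reported at link time. */
    size_t report_cross_page_overlaps(const MemRegion *r, size_t n)
    {
        size_t found = 0;
        for (size_t i = 0; i < n; ++i) {
            for (size_t j = i + 1; j < n; ++j) {
                if (r[i].page == r[j].page)
                    continue;
                if (ranges_overlap(r[i].origin, r[i].length, r[j].origin, r[j].length)) {
                    printf("%s (PAGE %u) overlaps %s (PAGE %u)\n",
                           r[i].name, r[i].page, r[j].name, r[j].page);
                    ++found;
                }
            }
        }
        return found;
    }
    ```

    Fed the duplicated RAMM entries from the question (origin 0x000000, length 0x000800 on both pages), it reports one overlap; removing the PAGE 1 copy brings the count to zero.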

  • Thank you very much for your invaluable help. I am not sure whether I am allowed to post the entire .cmd file here; I will gladly do so if I can.


    >> Flash memory cannot get "corrupted" the way a RAM location could. A very specific/precise sequence of low-level commands must be executed to erase/program a flash location


    This is interesting. The constant strings themselves are defined in a header file and referred to by a pointer in the application, along the lines of:

    [.h file]
    const char strings1[n1][n2][n3] = {{ ...}, ... {...}};

    [.c file]

    const char (*pStrings1)[n1][n2][n3];

    pStrings1 = &strings1;

    ...

    Thus, if I understand your comment correctly, it is much more likely that the data being corrupted is the value of the pointer pStrings1 rather than the contents of the array strings1 itself. Is this correct? This would be very helpful in pinpointing the root cause of the problem.

    >> Regarding your question #2 (regions mapped in both PAGE 0 and PAGE 1), this is not allowed

    As far as I understand, this means that defining the same region in both pages can potentially lead to different data being mapped to the same address. What I can say is that in our .cmd section definitions, each memory region is consistently referenced with either PAGE 0 or PAGE 1, but not both; for example, there is nothing like

    .cinit : > FLASH, PAGE = 0
    .pinit : > FLASH, PAGE = 1

    which, as far as I understand, is where the real trouble would begin. I've also thoroughly inspected the .map and linkInfo.xml files and could not find any conflicts in the memory mapping, so I am starting to think that the problem lies somewhere other than the .cmd file.

    Nevertheless, I don't see any good reason to define the same region in both pages and will suggest that we fix this for safety. Please correct me if I am wrong here. 

    >> the 28x follows the unified-memory model. i.e. there is no separate Program/Data/I-O memory, like we had in some older (24x/24xx) c2000 devices

    So this basically means that the page number is not important, and that the separation into PAGE 0 and PAGE 1 is mostly for style/clarity?

    I noticed that the section .CsmPwlFile has been moved from PAGE 1 to PAGE 0 in our code relative to the sample file DSP2833x_Headers_nonBIOS.cmd, i.e., we have:

    .CsmPwlFile             : > CSM_PWL,            PAGE = 0

    which I found slightly odd, but in light of what you remarked I presume that this is of no consequence by itself.

    Thanks again and happy holidays,

    Laurent Badel

  • Thus, if I understand your comment correctly, it is much more likely that the data being corrupted is the value of the pointer pStrings1 rather than the contents of the array strings1 itself. Is this correct? This would be very helpful in pinpointing the root cause of the problem.

    Answer --> If the strings are stored in flash, that is likely the case. As mentioned before, flash contents are altered only by executing the flash API in a precise sequence (erase followed by program). Application code cannot "casually" write to flash the way it can to RAM, so a corrupted pointer cannot alter flash contents the way it could corrupt RAM.
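    If the table itself is in flash, one cheap way to catch the RAM pointer being overwritten is to keep a redundant, bit-inverted copy of it and compare the two before each use. The sketch below is one hypothetical way to do this: the dimensions N1/N2/N3, the table contents, and the helper names strings_ptr_set()/strings_ptr_ok() are placeholders (the real n1, n2, n3 are not given), and the technique is a generic redundancy check, not something taken from TI documentation.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Placeholder dimensions; the real n1, n2, n3 are not given in the thread. */
    #define N1 2
    #define N2 2
    #define N3 8

    /* Stand-in for the const table (placed in .econst, i.e. flash, on the target). */
    const char strings1[N1][N2][N3] = { { "ab", "cd" }, { "ef", "gh" } };

    /* RAM-resident pointer to the table, plus a bit-inverted shadow copy. */
    const char (*pStrings1)[N1][N2][N3];
    static uintptr_t pStrings1_shadow;

    /* Always update the pointer through this function so the shadow stays in sync. */
    void strings_ptr_set(const char (*p)[N1][N2][N3])
    {
        pStrings1 = p;
        pStrings1_shadow = ~(uintptr_t)p;
    }

    /* Returns false if pStrings1 was modified without going through strings_ptr_set(). */
    bool strings_ptr_ok(void)
    {
        return (uintptr_t)pStrings1 == (uintptr_t)~pStrings1_shadow;
    }
    ```

    Calling strings_ptr_ok() at the start of the code that reads the strings (or from a periodic task) and halting when it fails narrows down when the overwrite happens. If the pointer only ever targets strings1, comparing it directly with &strings1 is an even simpler check.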

     

    Nevertheless, I don't see any good reason to define the same region in both pages and will suggest that we fix this for safety.

    Answer --> Correct.

  • In case you are not aware of it already, this page gives some good information: processors.wiki.ti.com/.../C28x_Compiler_-_Understanding_Linking
  • Laurent,
    I haven’t heard from you for a while, so I’m assuming you were able to resolve your issue. If this isn’t the case, please reject this resolution or reply to this thread. If this thread locks, please make a new thread describing the current status of your issue.
  • Thank you for the message. Unfortunately, the issue isn't solved yet, but it is likely not due to the CAN bus after all. It turned out we had two unrelated issues: one was a simple bug in the handling of a particular CAN message; the other is still a mystery.

    I am now looking into the possibility of a stack overflow. Related to this, I am wondering whether the stack frame of an ISR is pushed directly onto the current stack or whether there is a separate stack for interrupts.

    Thank you

  • There is only one stack. Please refer to SPRU430 for more information about the stack. http://www.ti.com/lit/an/spra820/spra820.pdf provides helpful tips to resolve stack issues.
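    Since ISRs share that single stack, a common way to check for overflow is stack painting: fill the unused stack region with a known pattern at startup, then later scan for the highest word that no longer holds the pattern. The sketch below models this on a plain array; it assumes the stack grows toward higher addresses (as the C28x stack does) and that the real stack base and size would come from your own linker symbols, which are not shown. The fill value and function names are arbitrary choices.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Arbitrary fill pattern; any value unlikely to appear in real stack data works. */
    #define STACK_FILL 0xA5A5u

    /* Paint the stack region (on a target, only the part above the current SP). */
    void stack_paint(uint16_t *base, size_t words)
    {
        for (size_t i = 0; i < words; ++i)
            base[i] = STACK_FILL;
    }

    /* For a stack growing toward higher addresses, scan down from the top for the
       highest word that no longer holds the pattern. Returns the high-water mark,
       i.e. the number of words used so far, counted from the base. */
    size_t stack_high_water(const uint16_t *base, size_t words)
    {
        size_t i = words;
        while (i > 0 && base[i - 1] == STACK_FILL)
            --i;
        return i;
    }
    ```

    The result is in 16-bit words; comparing it with the allocated stack size shows how much headroom remains, including space taken by nested interrupts. It can under-report if the topmost word ever used happens to equal the fill pattern.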

     

    Please close the original post. If a new/different issue is identified, please open a new post.

  • I will close the post. Thank you very much for your kind assistance.