C6678: PA multicore example: handling packets larger than 300 bytes

Hello,

I am attempting to set up a basic UDP loopback server that runs on all 8 cores of a C6678 chip. I am working from the PA_multicoreExample project in PDK 1.1.2.6.

I noticed that the TX_BUF_SIZE and RX_BUF_SIZE defines are set to 304 bytes, which seems to require received packets to be 300 bytes or less. I changed RX_BUF_SIZE to 1518 so that I can send packets as large as the MTU. With only 1 core this worked, but when I try to get all 8 cores running, cores 1-7 fail to set up PASS and print the message:

Error obtaining a Tx free descriptor
PASS setup failed

When I also set TX_BUF_SIZE to 1518, core 0 fails to set up PASS and prints the message:

Add_IPAddress:Found an entry in PA response queue with swinfo0 = 0x00010000, expected 0x55550001
PASS setup failed

What is the simplest method for modifying PA_multicoreExample_exampleProject to allow for larger packets?

Regards,
Chris Johnson
Signalogic, Inc

  • Another note on this problem,

    I had set the NUM_CORES define to 8. When I set it back to 4, it looks like all cores still work okay with the default RX_BUF_SIZE value. With NUM_CORES defined to 4 and RX_BUF_SIZE defined to 1518, I get a different error message from my first post on cores 1-7: 

    Timeout waiting for reply from PA to Pa_addPort command
    PASS setup failed

    Is there a simple way to allow for larger packet sizes in PA_multicoreExample_exampleProject? 

    Chris

  • Chris,

    What value is MAX_NUM_PACKETS set to?

    Did you modify TX_BUF_SIZE and RX_BUF_SIZE only?

    #define TX_BUF_SIZE (((300+15)/16)*16)
    #define RX_BUF_SIZE TX_BUF_SIZE

    /* Number of packets to be used for testing the example. */
    #define MAX_NUM_PACKETS 10u
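
The rounding done by the TX_BUF_SIZE macro can be checked on a host machine. The following is a minimal sketch in plain C, not PDK code: `round_up_16` and `pool_bytes` are hypothetical helper names, and the buffer count passed to `pool_bytes` is whatever the project actually allocates.

```c
#include <stddef.h>

/* Round n up to the next multiple of 16, mirroring (((n+15)/16)*16). */
static size_t round_up_16(size_t n)
{
    return ((n + 15u) / 16u) * 16u;
}

/* Hypothetical helper: total bytes needed for num_bufs buffers once each
 * buffer has been rounded up to a 16-byte multiple. */
static size_t pool_bytes(size_t num_bufs, size_t requested_size)
{
    return num_bufs * round_up_16(requested_size);
}
```

With these helpers, a requested size of 300 rounds to 304 (the default), while 1518 rounds to 1520 and 1500 rounds to 1504, so each buffer grows roughly fivefold and the memory section holding the buffers has to grow with it.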

  • Pubesh,

    I have MAX_NUM_PACKETS defined to 50. What effect does this value have on the size of packets that can be received?

    I modified the test part of the project so that it sits in an infinite loop calling ReceivePacket() immediately after the point where it waits for all cores to reach the barrier, so I don't think MAX_NUM_PACKETS is even used anymore. 

    With the default settings I can receive some packets that are larger than 300 bytes, but eventually I stop getting packets (Qmss_getQueueEntryCount() stops returning anything other than 0). The number of packets I am able to receive depends on how big the packets are, but it is at most 127 packets.

    Chris

  • Chris,

    I have forwarded your query to the MCSDK team and will let you know the status.

  • I tested the PA_multicoreExample_exampleProject inside PDK 2.1.2.6 on a C6678 EVM. The code is changed as follows in cppi_qmss_mgmt.c:

    #define TX_BUF_SIZE   (((1518+15)/16)*16) 
    #define RX_BUF_SIZE   TX_BUF_SIZE 

       //dataBufferSize  =   sizeof (pktMatch);
        dataBufferSize  = 1510;

    NUM_CORES is 4. I didn't see any problem. Attached is a screenshot at the switch register level, showing 0x28 = 40 decimal packets sent, with 0xec90 bytes = 60560 = 40 * (1510+4).
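
The byte count quoted above can be cross-checked with a few lines of C. This is a minimal sketch: `expected_tx_octets` is a hypothetical helper, and reading the extra 4 bytes per packet as the Ethernet frame check sequence is an assumption, since the post only shows the arithmetic.

```c
#include <stdint.h>

/* Assumption: the "+4" per packet in the post is the Ethernet FCS. */
#define ETH_FCS_BYTES 4u

/* Expected value of the switch's transmitted-octets counter after sending
 * `packets` frames that each carry `frame_bytes` bytes before the FCS. */
static uint32_t expected_tx_octets(uint32_t packets, uint32_t frame_bytes)
{
    return packets * (frame_bytes + ETH_FCS_BYTES);
}
```

Calling `expected_tx_octets(0x28, 1510)` gives 60560, which is 0xec90 and matches the register value reported for 4 cores sending 10 packets each.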

    Regards, Eric

    ************************************************

    Waiting for global config...

    [C66xx_0] ************************************************

    [C66xx_1] ************************************************

    [C66xx_0] *** PA Multi Core Example Started on Core 0 ***

    [C66xx_1] *** PA Multi Core Example Started on Core 1 ***

    [C66xx_0] ************************************************

    [C66xx_1] ************************************************

    [C66xx_0] Initializing Free Descriptors.

    [C66xx_1] Waiting for global config...

    [C66xx_0] QMSS successfully initialized

    CPPI successfully initialized

    PASS successfully initialized

    Ethernet subsystem successfully initialized

    Tx setup successfully done

    Rx setup successfully done

    PASS setup successfully done

    Waiting for all cores to reach the barrier before transmission starts ...

    [C66xx_1] QMSS Local successfully initialized

    [C66xx_2] QMSS Local successfully initialized

    [C66xx_3] QMSS Local successfully initialized

    [C66xx_1] Rx setup successfully done

    [C66xx_2] Rx setup successfully done

    [C66xx_3] Rx setup successfully done

    [C66xx_1] PASS setup successfully done

    [C66xx_2] PASS setup successfully done

    [C66xx_3] PASS setup successfully done

    [C66xx_1] Waiting for all cores to reach the barrier before transmission starts ...

    [C66xx_2] Waiting for all cores to reach the barrier before transmission starts ...

    [C66xx_3] Waiting for all cores to reach the barrier before transmission starts ...

    [C66xx_1] Packet Transmission Start ...

    [C66xx_2] Packet Transmission Start ...

    [C66xx_3] Packet Transmission Start ...

    [C66xx_1] Packet Transmission Done.

    [C66xx_2] Packet Transmission Done.

    [C66xx_3] Packet Transmission Done.

    [C66xx_1] Wait for all packets to be Received ...

    [C66xx_2] Wait for all packets to be Received ...

    [C66xx_3] Wait for all packets to be Received ...

    [C66xx_1] Core 1: Packets Sent = 10

    [C66xx_2] Core 2: Packets Sent = 10

    [C66xx_3] Core 3: Packets Sent = 10

    [C66xx_1] Core 1: Packets Received = 10

    [C66xx_2] Core 2: Packets Received = 10

    [C66xx_3] Core 3: Packets Received = 10

    [C66xx_1] **********************************************

    [C66xx_2] **********************************************

    [C66xx_3] **********************************************

    [C66xx_1] *** PA Multi Core Example Ended on Core 1 ***

    [C66xx_2] *** PA Multi Core Example Ended on Core 2 ***

    [C66xx_3] *** PA Multi Core Example Ended on Core 3 ***

    [C66xx_1] **********************************************

    [C66xx_2] **********************************************

    [C66xx_3] **********************************************

    [C66xx_0] Packet Transmission Start ...

    Packet Transmission Done.

    Wait for all packets to be Received ...

    Core 0: Packets Sent = 10

    Core 0: Packets Received = 10

    Wait for all packets to be Received in all cores...

    Test passed on core 0

    Test passed on core 1

    Test passed on core 2

    Test passed on core 3

    **********************************************

    *** PA Multi Core Example Ended on Core 0 ***

    **********************************************

    I have also attached my .out file; you need to load it to cores 0-3 and run.

    8524.PA_multicoreExample_exampleProject.out

     

  • Eric,

    My initial test with these settings gives me the same message about PA timeout on add_port() I mentioned previously in the thread.

    #define TX_BUF_SIZE   (((1518+15)/16)*16)  
    #define RX_BUF_SIZE   TX_BUF_SIZE 

    #define NUM_CORES 4

    I am testing on a DSPC-8681E card and have made several modifications to the project for this reason. I'm going to try starting with a project from a fresh PDK install. 

    You mentioned you are using the project in PDK 2.1.2.6. Is that a typo, or is there a more recent version of the PDK than the one I am using (1.1.2.6)? Also, is there a download page specifically for the PDK, or is it only available with MCSDK?

    Regards,
    Chris

  • It is a typo; it is PDK 1.1.2.6 inside MCSDK 2.1.2.6. There is no separate download page for the PDK; you can only download MCSDK.

    Regards, Eric

  • Eric,

    I'm using a DSPC-8681E card to run this test. Working from a clean PDK install, I have added the following excerpt to main() to start cores 1-7:

        if(DNUM == 0)
        {
           int core;
           CSL_BootCfgUnlockKicker();
    
           for (core = 1; core < NUM_CORES; core++) {
             CSL_IPC_genGEMInterrupt( core, 0 );
           }
        }

    I have also modified the .cfg file as follows:
     1. commented out lines 27-29
     2. changed .text and .const to be placed into MSMCSRAM instead of L2SRAM
     3. added these lines for the log:
       var SysMin = xdc.useModule('xdc.runtime.SysMin');
       SysMin.bufSize = 102400;

    I am seeing the same results as you but if I change NUM_CORES from 4 to 8, I see problems with initialization on cores 1-7. e.g.:

    ************************************************
    *** PA Multi Core Example Started on Core 6 ***
    ************************************************
    Waiting for global config...
    QMSS Local successfully initialized
    Rx setup successfully done
    Pa_addPort returned error -11
    PASS setup failed

    Can the example be run on all 8 cores with larger TX/RX buffers? If not, what is limiting it to 4 cores?

    Thanks,

    Chris

  • I didn't see the issue, only thing I changed:

    1)  ti\pdk_C6678_1_1_2_6\packages\ti\drv\pa\example\multicoreExample\cppi_qmss_mgmt.c:

    #define TX_BUF_SIZE   (((1500+15)/16)*16) 
    #define RX_BUF_SIZE   TX_BUF_SIZE 

        //dataBufferSize  =   sizeof (pktMatch);
        dataBufferSize  = 1500;

    2) ti\pdk_C6678_1_1_2_6\packages\ti\drv\pa\example\multicoreExample\multicore_example.h

    #define         NUM_CORES      8

    Then I used CCS to load all 8 cores and run.

    Can you make sure the basic case is working before using core 0 to trigger cores 1-7, with all cores using the same MSMC memory? Where did you keep the stack for each core? It should be local to each core; if you use MSMC, one core may corrupt another's stack. I think the SysMin changes are irrelevant.

    I attached my code so you can see if anything is different.

    Regards, Eric

    ===

    [C66xx_6] ************************************************

    *** PA Multi Core Example Started on Core 6 ***

    ************************************************

    Waiting for global config...

    [C66xx_7] ************************************************

    *** PA Multi Core Example Started on Core 7 ***

    ************************************************

    Waiting for global config...

    [C66xx_0] ************************************************

    [C66xx_1] ************************************************

    [C66xx_2] ************************************************

    [C66xx_3] ************************************************

    [C66xx_4] ************************************************

    [C66xx_5] ************************************************

    [C66xx_0] *** PA Multi Core Example Started on Core 0 ***

    [C66xx_1] *** PA Multi Core Example Started on Core 1 ***

    [C66xx_2] *** PA Multi Core Example Started on Core 2 ***

    [C66xx_3] *** PA Multi Core Example Started on Core 3 ***

    [C66xx_4] *** PA Multi Core Example Started on Core 4 ***

    [C66xx_5] *** PA Multi Core Example Started on Core 5 ***

    [C66xx_0] ************************************************

    [C66xx_1] ************************************************

    [C66xx_2] ************************************************

    [C66xx_3] ************************************************

    [C66xx_4] ************************************************

    [C66xx_5] ************************************************

    [C66xx_0] Initializing Free Descriptors.

    [C66xx_1] Waiting for global config...

    [C66xx_2] Waiting for global config...

    [C66xx_3] Waiting for global config...

    [C66xx_4] Waiting for global config...

    [C66xx_5] Waiting for global config...

    [C66xx_0] QMSS successfully initialized

    CPPI successfully initialized

    PASS successfully initialized

    Ethernet subsystem successfully initialized

    Tx setup successfully done

    Rx setup successfully done

    PASS setup successfully done

    Waiting for all cores to reach the barrier before transmission starts ...

    [C66xx_1] QMSS Local successfully initialized

    [C66xx_2] QMSS Local successfully initialized

    [C66xx_3] QMSS Local successfully initialized

    [C66xx_4] QMSS Local successfully initialized

    [C66xx_5] QMSS Local successfully initialized

    [C66xx_6] QMSS Local successfully initialized

    [C66xx_7] QMSS Local successfully initialized

    [C66xx_1] Rx setup successfully done

    [C66xx_2] Rx setup successfully done

    [C66xx_3] Rx setup successfully done

    [C66xx_4] Rx setup successfully done

    [C66xx_5] Rx setup successfully done

    [C66xx_6] Rx setup successfully done

    [C66xx_7] Rx setup successfully done

    [C66xx_1] PASS setup successfully done

    [C66xx_2] PASS setup successfully done

    [C66xx_3] PASS setup successfully done

    [C66xx_4] PASS setup successfully done

    [C66xx_5] PASS setup successfully done

    [C66xx_6] PASS setup successfully done

    [C66xx_7] PASS setup successfully done

    [C66xx_1] Waiting for all cores to reach the barrier before transmission starts ...

    [C66xx_2] Waiting for all cores to reach the barrier before transmission starts ...

    [C66xx_3] Waiting for all cores to reach the barrier before transmission starts ...

    [C66xx_4] Waiting for all cores to reach the barrier before transmission starts ...

    [C66xx_5] Waiting for all cores to reach the barrier before transmission starts ...

    [C66xx_6] Waiting for all cores to reach the barrier before transmission starts ...

    [C66xx_7] Waiting for all cores to reach the barrier before transmission starts ...

    [C66xx_1] Packet Transmission Start ...

    [C66xx_2] Packet Transmission Start ...

    [C66xx_3] Packet Transmission Start ...

    [C66xx_4] Packet Transmission Start ...

    [C66xx_5] Packet Transmission Start ...

    [C66xx_6] Packet Transmission Start ...

    [C66xx_7] Packet Transmission Start ...

    [C66xx_1] Packet Transmission Done.

    [C66xx_2] Packet Transmission Done.

    [C66xx_3] Packet Transmission Done.

    [C66xx_4] Packet Transmission Done.

    [C66xx_5] Packet Transmission Done.

    [C66xx_6] Packet Transmission Done.

    [C66xx_7] Packet Transmission Done.

    [C66xx_1] Wait for all packets to be Received ...

    [C66xx_2] Wait for all packets to be Received ...

    [C66xx_3] Wait for all packets to be Received ...

    [C66xx_4] Wait for all packets to be Received ...

    [C66xx_5] Wait for all packets to be Received ...

    [C66xx_6] Wait for all packets to be Received ...

    [C66xx_7] Wait for all packets to be Received ...

    [C66xx_1] Core 1: Packets Sent = 10

    [C66xx_2] Core 2: Packets Sent = 10

    [C66xx_3] Core 3: Packets Sent = 10

    [C66xx_4] Core 4: Packets Sent = 10

    [C66xx_5] Core 5: Packets Sent = 10

    [C66xx_6] Core 6: Packets Sent = 10

    [C66xx_7] Core 7: Packets Sent = 10

    [C66xx_1] Core 1: Packets Received = 10

    [C66xx_2] Core 2: Packets Received = 10

    [C66xx_3] Core 3: Packets Received = 10

    [C66xx_4] Core 4: Packets Received = 10

    [C66xx_5] Core 5: Packets Received = 10

    [C66xx_6] Core 6: Packets Received = 10

    [C66xx_7] Core 7: Packets Received = 10

    [C66xx_1] **********************************************

    [C66xx_2] **********************************************

    [C66xx_3] **********************************************

    [C66xx_4] **********************************************

    [C66xx_5] **********************************************

    [C66xx_6] **********************************************

    [C66xx_7] **********************************************

    [C66xx_1] *** PA Multi Core Example Ended on Core 1 ***

    [C66xx_2] *** PA Multi Core Example Ended on Core 2 ***

    [C66xx_3] *** PA Multi Core Example Ended on Core 3 ***

    [C66xx_4] *** PA Multi Core Example Ended on Core 4 ***

    [C66xx_5] *** PA Multi Core Example Ended on Core 5 ***

    [C66xx_6] *** PA Multi Core Example Ended on Core 6 ***

    [C66xx_7] *** PA Multi Core Example Ended on Core 7 ***

    [C66xx_1] **********************************************

    [C66xx_2] **********************************************

    [C66xx_3] **********************************************

    [C66xx_4] **********************************************

    [C66xx_5] **********************************************

    [C66xx_6] **********************************************

    [C66xx_7] **********************************************

    [C66xx_0] Packet Transmission Start ...

    Packet Transmission Done.

    Wait for all packets to be Received ...

    Core 0: Packets Sent = 10

    Core 0: Packets Received = 10

    Wait for all packets to be Received in all cores...

    Test passed on core 0

    Test passed on core 1

    Test passed on core 2

    Test passed on core 3

    Test passed on core 4

    Test passed on core 5

    Test passed on core 6

    Test passed on core 7

    **********************************************

    *** PA Multi Core Example Ended on Core 0 ***

    **********************************************

    0451.multicoreExample.zip

     

  • Also, I am adding a screenshot showing the number of bytes transmitted for all 80 packets (8 cores x 10 packets/core). Each packet is 1500 bytes.

  • Eric,

    I have reverted the changes to the location of .text and .const sections. Comparing your source to what I have, they are nearly identical except for the SysStd -> SysMin change in the .cfg file and the code for core 0 starting cores 1-7. I also reduced the size of the SysMin buffer to 10 KB. I now get the same message about PA timeout as I was getting originally on cores 1-7. e.g.:

    ************************************************
    *** PA Multi Core Example Started on Core 3 ***
    ************************************************
    Waiting for global config...
    QMSS Local successfully initialized
    Rx setup successfully done
    Timeout waiting for reply from PA to Pa_addPort command
    PASS setup failed

    I don't think there should be any issues with stack. I didn't make any changes to its location and every reference to "stack" in the .map file is at an L2 mem address. 

    Can you clarify what you mean in this question:
       Can you make sure the basic case is working before using core 0 to trigger cores 1-7, with all cores using the same MSMC memory? 

    I was under the impression that core 0 had to trigger cores 1-7 and that there wasn't another way to start cores 1-7. 

    -Chris

  • Eric,

    My suspicion is that this issue is hardware related. I can set NUM_CORES as high as 6 on an A103 revision of the DSPC card and don't see any issues, but see issues with NUM_CORES set to 2 on an A101 revision of the DSPC card.

    My understanding is that with the default settings for the project, internal loopback would be done and the board that the C66x chip is on should be irrelevant as the test would run entirely on the chip and even DDR3 memory would be unused. Is this correct?

    If this is correct, is it possible that the issue is related to the C6678 chip itself and has been fixed in newer revisions?

    -Chris

  • Chris,

    Your DSPC-8681E card has a different PHY and DDR3 than our 6678 EVM. But this is an internal loopback example, so the PHY shouldn't matter and DDR is not used at all.

    I don't know of any issue with the 6678 DSP itself from the errata. The way you trigger cores 1-7 is different from my test. Maybe you are loading the code via PCIe, and that is why you use core 0 to trigger cores 1-7? Can you simply use CCS over JTAG to load all 8 cores and run them at the same time? This can verify whether it is a code issue or an issue with the way the code is run.

    Regards, Eric


  • Eric-

    > I don't know of any issue with the 6678 DSP itself from the errata. The way
    > you trigger cores 1-7 is different from my test. Maybe you are loading the
    > code via PCIe, and that is why you use core 0 to trigger cores 1-7? Can you
    > simply use CCS over JTAG to load all 8 cores and run them at the same time?
    > This can verify whether it is a code issue or an issue with the way the code
    > is run.

    Yes we are loading via PCIe on 8681 card.  I've not used JTAG on the card other than for firmware upgrade, being worried about causing the server to lock up.  Can you clarify what is the concern about starting cores 1-7?  They need to all start exactly simultaneously?  Otherwise?  Thanks.

    -Jeff

  • There is no requirement that all 8 cores start at the same time. Starting with CCS or with the IPCGR method should give the same results.

    I am just saying that using JTAG to load code and run all cores is easier to debug; once this works, we have a baseline.

    So your DSPC-8681 is plugged into a server; I don't know why using JTAG would lock the server up. In the past when we used the 8681 card, we always used JTAG/CCS to develop multicore code first, then moved to PCIe loading with core 0 triggering the other cores.

    Regards, Eric  

  • Eric-

    Do you have a procedure for this?  For example, my guess would be:

    1) Server is off, card inserted.  Physically connect JTAG to card, but no connection to target in CCS. 

    2) Power on and boot the server.  Allow C6678 devices to boot from I2C EEPROM firmware on the card, PCIe enumeration by server CPU BIOS.

    3) Connect to target in CCS on C6678 device 0... connect all cores?

    4) Run tests.

    Step 3 will stop any C66x code from running, so this is the crucial point -- hopefully no effect on PCIe connectivity and no IRQ or other issues with BIOS or kernel.  If you guys have a document for this please advise, or comment on my plan.  Thanks.

    -Jeff

  • These are the right steps. In step 3, just connect all 8 cores. When you run the test application the next time, you need to do a DSP system reset from CCS; this will tear down the PCIe link, but has no other impact on your Linux server.

    If you need the PCIe link back again, you need to power cycle the Linux server for re-enumeration.

    Regards, Eric 

  • Eric,

    We finally have a JTAG debugger setup working with an EVM board. I have tested my project without the code that has core 0 start the other cores and it seems to be working okay with the larger buffer sizes. I noticed that I have to do a "System Reset", as opposed to a "CPU Reset", in order to run the project again though.

    My plan is to start testing the PCIe card next, but I had a quick question for you. Is there a way to have CCS Debug just load cores 1-7 and have core 0 use the kickstart code to start cores 1-7? I.e. Is there a way to test the project with the JTAG connection and CCS that allows the code to run the way I'll need it running on the PCIe card without the JTAG connection?

    Regards,
    Chris

  • Chris,

    If your program is in shared memory, I believe you can just use CCS to load core 0 and, inside that code, use the kickstart method to run cores 1-7 as well. You don't need to connect to cores 1-7 to load the program.

    Regards, Eric

     

  • Eric,

    I have done some testing with the PCIe card now and am getting the same results as I was on the EVM.

    I tried what you suggested and it doesn't seem like the other cores get started. Core 0 ends up waiting for the other cores to reach the barrier:

    [C66xx_0] ************************************************
    *** PA Multi Core Example Started on Core 0 ***
    ************************************************
    Initializing Free Descriptors.
    QMSS successfully initialized
    CPPI successfully initialized
    PASS successfully initialized
    Ethernet subsystem successfully initialized
    Tx setup successfully done
    Rx setup successfully done
    PASS setup successfully done
    Waiting for all cores to reach the barrier before transmission starts ...

    It seems like the original problem I was facing is caused either by loading over PCIe or using core 0 to start the other cores. Is there an alternative method for starting cores 1-7 over PCIe and why would the kickstart code cause the problem?

    Thanks,
    Chris

  • If only core 0 started but not the other cores (as you saw, core 0 is waiting for the others to sync up), where is the Program Counter (PC) of cores 1-7? Are they still in bootrom (0x20b0_xxxx) or in DDR3A (your shared program memory)? If in DDR3A, can you load symbols to core 1 to see where the PC is? This is to get an idea of whether cores 1-7 started running or not.
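
The PC check described above amounts to mapping an address to a memory region. Below is a minimal host-side sketch in C. The address ranges are assumptions taken from the commonly published C6678 memory map and should be verified against the device data manual; `classify_pc` and `pc_region_t` are hypothetical names.

```c
#include <stdint.h>

typedef enum {
    REGION_L2,        /* per-core L2 SRAM, local or global alias */
    REGION_MSMC_SRAM, /* shared on-chip MSMC SRAM */
    REGION_BOOT_ROM,  /* on-chip boot ROM */
    REGION_DDR3,      /* external DDR3 */
    REGION_OTHER
} pc_region_t;

/* Classify a program counter value. All ranges are assumed values. */
static pc_region_t classify_pc(uint32_t pc)
{
    uint32_t offset = pc & 0x00FFFFFFu;   /* offset within a 16 MB window */
    uint32_t window = pc >> 24;           /* 0x10..0x17 = global L2 aliases */

    if (pc >= 0x00800000u && pc <= 0x0087FFFFu)
        return REGION_L2;                 /* local L2 view */
    if (window >= 0x10u && window <= 0x17u &&
        offset >= 0x00800000u && offset <= 0x0087FFFFu)
        return REGION_L2;                 /* global alias for cores 0-7 */
    if (pc >= 0x0C000000u && pc <= 0x0C3FFFFFu)
        return REGION_MSMC_SRAM;
    if (pc >= 0x20B00000u && pc <= 0x20B1FFFFu)
        return REGION_BOOT_ROM;
    if (pc >= 0x80000000u)
        return REGION_DDR3;
    return REGION_OTHER;
}
```

Under these assumed ranges, a PC of the form 0x20b0_xxxx classifies as boot ROM, which would mean the core has not yet branched into the application.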

    Regards, Eric

  • Eric,

    I'm seeing that the other cores are still in bootrom

    PC = 0x20B002A8

    I have all sections of the code going into MSMCSRAM (I know this will cause problems with data memory, but I figured the cores should at least be able to start). 

    I am only connecting to core 0 and leaving the other cores disconnected, other than when I checked the PC of the other cores.
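
A PC that stays in the boot ROM is consistent with secondary cores that have not been released yet. As an assumption based on TI's bootloader documentation for KeyStone I devices (not on anything stated in this thread), each secondary core idles in boot ROM until it receives an IPC interrupt and then branches to the entry address stored in its BOOT_MAGIC_ADDRESS, the last word of its L2 SRAM. The thread does not record the exact fix, so the sketch below only illustrates how those per-core addresses would be computed; the constants and the helper name `boot_magic_global_addr` are assumptions to verify against the data manual.

```c
#include <stdint.h>

/* Assumed values from the C6678 memory map. */
#define L2_GLOBAL_BASE    0x10000000u /* global alias window of core 0 */
#define L2_GLOBAL_STRIDE  0x01000000u /* step between per-core windows */
#define BOOT_MAGIC_OFFSET 0x0087FFFCu /* last 32-bit word of local L2 */

/* Global address of the BOOT_MAGIC_ADDRESS word for a given core (0-7).
 * On the target, core 0 would write the entry point of the loaded image
 * to this address before generating the IPC interrupt for that core. */
static uint32_t boot_magic_global_addr(unsigned core)
{
    return L2_GLOBAL_BASE + core * L2_GLOBAL_STRIDE + BOOT_MAGIC_OFFSET;
}
```

Under these assumptions, core 1 would use 0x1187FFFC and core 7 would use 0x1787FFFC. A kickstart loop that only generates the IPC interrupt, without a valid entry address at these locations, would be one possible reason for cores to remain in boot ROM.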

    Regards,
    Chris

  • Eric,

    We have resolved the issue. The problem was entirely due to how the cores were being started as you suspected.

    Thanks for your help.

    Chris