DSPC-8681E (quad) card exhaustive memory read/write errors

TI C667x experts-

In an HP DL380 server with a DSPC-8681E (quad) card, running exhaustive tests for robustness, we see intermittent memory read/write errors with the A103 version of the card.  Here is a problem description and notes:

1) The test program writes two (2) random memory locations with random 32-bit data and then reads these locations back.  Each access is 32 bits wide (no block accesses in this test).  L2 cache is disabled.

2) Errors do not occur:

  -with A101 version cards

  -with shared mem (MCSRAM)

3) Errors occur:

  -with L2SRAM and DDR3 mem

  -only on core0 of each C6678 device

  -only on lower 16 bits of read-back values

  -more often with a 1 GHz C667x clock rate.
   A 1.25 GHz clock rate completely removes
   errors from core0 of DSP0, but there are still
   errors with core0 of DSPs 2-3.  Errors are
   always worse on DSP3
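
One way to confirm the "lower 16 bits only" pattern is to XOR each written value with its read-back value and check which half of the word the differing bits fall in. Below is a minimal shell sketch of that check; `classify_word` is a hypothetical helper for illustration, not part of any driver or test utility mentioned here.

```shell
# classify_word WRITTEN READBACK
# Hypothetical helper: reports which half of a 32-bit word differs.
classify_word() {
    cw_diff=$(( ($1 ^ $2) & 0xFFFFFFFF ))
    if [ "$cw_diff" -eq 0 ]; then
        echo match
    elif [ $(( cw_diff & 0xFFFF0000 )) -eq 0 ]; then
        echo low16      # only bits 0-15 differ
    elif [ $(( cw_diff & 0x0000FFFF )) -eq 0 ]; then
        echo high16     # only bits 16-31 differ
    else
        echo both
    fi
}

classify_word 0xDEADBEEF 0xDEAD1234   # prints: low16
```

Tallying the result over all failing words would show at a glance whether any error ever touches the upper half.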

It's possible this is a PCIe vs. C667x timing issue, as they run on different clocks.  The timing could be affected by PCIe bridge priority in the card's PLX chip (with DSP3 being last), or even by the physical placement and trace lengths of the devices on the card.

What tests can we run to affect PCIe timing?  Can we change C667x PCIe register settings?  Any ideas on how to shine more light on this problem welcome.  We've tried various server BIOS settings, such as disabling C-states, enabling PCIe 1.0 transaction rate, and changing x86 CPU clock rate.

Thanks.

-Jeff
Signalogic

  • Jeff,

    I want to know:

    1) How soon do you start the read-back after a write?

    2) Is the DL380 a Linux PC or a Windows PC?

    3) Is the observation that A101 always works and A103 may fail based on tests of multiple A101 and A103 boards, or on only one board of each revision?

    4) What is the typical failure rate, as a percentage?

    5) Do you have sample test code (with build and run instructions) so we can reproduce the issue?

    6) In the PCIe driver code, can you disable relaxed ordering in both registers below to see if it makes any difference?

     ---- 3.1.6 TLP Attribute Configuration Register (TLPCFG):

    0 = Disable relaxed ordering for all outgoing TLPs. ===================> set this bit to 0

    ----- 3.7.4 Device Status and Control Register (DEV_STAT_CTRL)

    4 RELAXED Enable Relaxed Ordering ===================> set this bit to 0

    7) When you read/write to DDR3, is cache enabled or not?
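
    For item 6, the Device Control half of DEV_STAT_CTRL is also visible from the host at offset 0x08 of the endpoint's PCI Express capability, where bit 4 is "Enable Relaxed Ordering". The sketch below only does the bit arithmetic. The `setpci` lines are left as comments, the bus address `03:00.0` is a placeholder rather than a value from this thread, and the helper names are hypothetical. TLPCFG is assumed not to be reachable this way and would have to be changed in the DSP-side or driver code.

```shell
# Bit 4 of the PCIe Device Control register is "Enable Relaxed Ordering".
RELAXED_BIT=0x10

# relaxed_enabled DEVCTL -> prints 1 if bit 4 is set, else 0
relaxed_enabled() {
    echo $(( ($1 & $RELAXED_BIT) != 0 ))
}

# clear_relaxed DEVCTL -> prints DEVCTL with bit 4 cleared, as 4 hex digits
clear_relaxed() {
    printf '0x%04x\n' $(( $1 & ~$RELAXED_BIT & 0xFFFF ))
}

clear_relaxed 0x2810   # prints: 0x2800

# Hypothetical use on real hardware (root required, placeholder address):
#   devctl=0x$(setpci -s 03:00.0 CAP_EXP+8.w)
#   relaxed_enabled "$devctl"
#   setpci -s 03:00.0 CAP_EXP+8.w=0:10    # value:mask, clears only bit 4
```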

    Regards, Eric

  • Eric-

    > 1) How soon do you start the read-back after a write?

    With Signalogic software, the test sequence is as follows:

      -write a block of random data to random mem address A

      -write a 2nd block of (different) random data to random
       mem address B (addresses A and B do not overlap)

      -read back block A, compare with what was written

      -read back block B, compare with what was written

    With Advantech software, the test sequence is:

      -write an N byte file to mem area under test,
       filling all of allowable memory

      -read back to file

      -compare files

    > 2) Is the DL380 a Linux PC or a Windows PC?

    RedHat 6.2 Linux, kernel version 2.6.32-220.el6.x86_64, 2.2 GHz CPU clock rate.

    > 3) Is the observation that A101 always works and A103 may fail
    > based on tests of multiple A101 and A103 boards, or on
    > only one board of each revision?

    We've tried multiple cards of each revision, and so has one of our customers, so we're confident the issue is not a card outlier.

    > 4) What is the typical percentage of failure rate?

    Very low, and an interesting thing is that after an error occurs at a particular mem location, the error is less likely to occur there.  Eventually if we repeat the test often enough it becomes error free.  We can make this happen faster by not seeding the random number generator used to randomly generate memory addresses.
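
    To illustrate the seeding point: with a fixed seed, a pseudo-random generator produces the same address sequence on every run, so repeated runs keep revisiting the same locations. The sketch below uses awk's `srand`/`rand`, which is not the generator memTest uses; `gen_offsets` and its parameters are hypothetical.

```shell
# gen_offsets SEED COUNT REGION_BYTES
# Prints COUNT pseudo-random, 32-bit-aligned byte offsets within the region.
gen_offsets() {
    awk -v seed="$1" -v n="$2" -v size="$3" 'BEGIN {
        srand(seed)                      # same seed -> same sequence
        for (i = 0; i < n; i++)
            printf "0x%08X\n", int(rand() * size / 4) * 4
    }'
}

# Two runs with the same seed print identical offsets (512 KB region).
gen_offsets 1 4 524288
gen_offsets 1 4 524288
```

    Passing a different seed on each run (for example, one derived from the time) would spread the accesses over new addresses instead.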

    > 5) Do you have sample test code (with build and run instructions)
    > so we can reproduce the issue?

    So far we can reproduce the problem with the Signalogic driver and memTest utility and the Advantech driver and dsp_loader utility.  Next we'll try to do it with TI driver and SDK software.  But, I think in order for you guys to reproduce, you will need a similar server and/or OS install.  I say this as we have another nearly identical HP server with Ubuntu 12.04 installed (kernel version 3.2.0-49-generic-x86_64) and we cannot reproduce the errors, so whatever is going on does have some server / OS dependency.

    > 6) In the PCIe driver code, can you disable relaxed ordering
    > in both registers below to see if it makes any difference?

    With both Signalogic and Advantech drivers, relaxed ordering is disabled.

    > 7) When you read/write to DDR3, is cache enabled or not?

    Cache is disabled -- or at least we think so.  In the case of Signalogic software test, the TI "init.out" code has been downloaded and run.  In the case of Advantech software test, init.out has not been downloaded.  No application C66x code has been downloaded and run.

    New info:  The most straightforward reproduction of the error is using only L2SRAM.  In that case, errors typically:

      -are always on core 0 of each C6678

      -are confined to addresses 0x1000 or lower

      -decrease in frequency, eventually there are none

      -do not re-appear until the server is rebooted (restart, not power down)
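
    If the L2SRAM contents are dumped to a file and compared with the data that was written, the address and bit-position patterns can be tallied from `cmp -l` output. This is only a sketch under stated assumptions: file offset 0 corresponds to L2SRAM offset 0, and the DSP is little-endian, so bytes 0 and 1 of each 32-bit word hold the lower 16 bits. `tally_diffs` is a hypothetical helper.

```shell
# tally_diffs WRITTEN_FILE READBACK_FILE
# Counts differing bytes, how many sit at offset 0x1000 or lower, and how
# many fall in byte lanes 0-1 of a 32-bit word (lower 16 bits, little-endian).
tally_diffs() {
    cmp -l "$1" "$2" 2>/dev/null | awk '
        {
            off = $1 - 1                 # cmp -l offsets are 1-based
            total++
            if (off <= 4096) low_addr++  # 4096 = 0x1000
            if (off % 4 < 2) low16++     # byte lanes 0 and 1
        }
        END { printf "total=%d low_addr=%d low16=%d\n", total, low_addr, low16 }'
}

# Hypothetical usage: tally_diffs written.bin readback.bin
```

    If every failing byte were in the lower 16 bits, `low16` would equal `total`.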

    -Jeff

  • All, 
    Here is additional information on how to reproduce the problem using Advantech software: 
    1. Reboot the server.
    2. Load the Advantech driver. 
    3. All files should be created under the directory: /Lightning_PCIE/dsp_loader/app/bin/
    4. Create a file containing all zeros using the command below:
     dd if=/dev/zero of=512KB_Random_0 bs=512K count=1
    5. Create a file containing all ones using the command below (iflag=fullblock makes dd keep reading from the pipe until the full 512 KB block is filled):
     tr '\000' '\377' < /dev/zero | dd of=512KB_Random_1 bs=512K count=1 iflag=fullblock
    6. Copy the attached "l2sram_capacity.sh" to the directory mentioned in step 3, then run the script from that directory:
     cd /Lightning_PCIE/dsp_loader/app/bin/
     sh l2sram_capacity.sh
    This creates a file (cmp_coreX.txt) for each core, containing the differences between the loaded and saved data.
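
    The attached script is not reproduced here. As a rough stand-in for its comparison step, a per-core difference count can be obtained with `cmp -l`. It is also worth confirming that each pattern file is exactly 512 KB, since dd can return a short block when it reads from a pipe. The helper names and the saved-file name below are hypothetical.

```shell
# is_512k FILE -> succeeds only if FILE is exactly 524288 bytes
is_512k() {
    [ $(( $(wc -c < "$1") )) -eq 524288 ]
}

# count_diffs LOADED SAVED -> prints the number of differing bytes
# (cmp -l prints one line per differing byte)
count_diffs() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Hypothetical usage:
#   is_512k 512KB_Random_1 || echo "pattern file has the wrong size"
#   count_diffs 512KB_Random_1 saved_core0.bin
```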

    In the first run, there are many differences between the loaded and saved files for cores 0, 8, and 24.
    After running the same script 10 times, the differences decreased.
    For example:
    First run:
     Core 0: 2 differences between loaded and saved files.
     Core 8: 14 differences between loaded and saved files.
     Core 16: 6 differences between loaded and saved files.
     Core 24: 24 differences between loaded and saved files.
    Tenth run:
     Core 0: no differences between loaded and saved files.
     Core 8: no differences between loaded and saved files.
     Core 16: nearly 16000 differences between loaded and saved files.
     Core 24: no differences between loaded and saved files.

    We repeated this procedure across multiple server reboots. Differences between the loaded and saved files
    typically decrease for all cores, but in some cases at least one core gets "stuck" on a few addresses,
    or its errors continue to get worse.

    Regards,
    Shradha
  • Hi Support,

    We are also able to reproduce the issue using the TI driver and SDK software, as follows:

    1. Load the TI driver: in /opt/ti/desktop-linux-sdk_01_00_00_07/demos/scripts, run sh driver_init.sh

    2. All files should be created under the directory /opt/ti/desktop-linux-sdk_01_00_00_07/demos/dsp_utils/bin

    3. I have attached an updated dsp_utils, with load-binary and save-binary functionality added, similar to the Advantech utility.

    4. Run the attached script TI_L2SRAM.sh

    In the first run, there are many differences between the loaded and saved files for DSPs 0, 1, 2, and 3.
    After running the same script 10 times, the differences decreased.

    For example:

    First run:
    DSP 0: 1024 differences between loaded and saved files.
    DSP 1: 18 differences between loaded and saved files.
    DSP 2: 8 differences between loaded and saved files.
    DSP 3: 12 differences between loaded and saved files.

    Second run onwards:

    DSP 0: no differences between loaded and saved files.
    DSP 1: no differences between loaded and saved files.
    DSP 2: nearly 8 differences between loaded and saved files.
    DSP 3: no differences between loaded and saved files.

    The differences between loaded and saved files decrease for all DSPs except DSP 2.
    As with the Advantech and Signalogic software, errors are intermittent, server and/or card dependent, and tend to decrease with repeated test runs.

    Srinivasalu

    -Signalogic

    4370.TI_Driver_ L2Sram.zip

  • Does this depend on the board revision and OS of the Linux PC?

    Regards, Eric

  • Hi Eric,

    We have tested both A101 and A103 under Ubuntu 12.04 and did not get any errors.
    Under Red Hat 6.2, A101 likewise produced no errors, but A103 does produce errors.

    The Ubuntu and Red Hat machines are identical (HP DL380p G8), although there could be some slight differences we haven't been able to identify.

    Srinivasalu
    -Signalogic

  • Srinivasalu,

    Thanks for your efforts to reproduce the issue with the TI Linux SDK driver. I don't have a Red Hat PC here to reproduce it. It looks more hardware related; can you check with Advantech in parallel?

    Regards, Eric 

  • Eric,

    We have replicated the same issue with the Advantech driver and have also passed this information on to the Advantech support team.

    -Srinivasalu

    Signalogic