AM6442: DDR4 initialization failure

Part Number: AM6442

We are testing DDR4 parts from multiple manufacturers on our custom board and are having trouble with an ISSI part. The initial symptom was that the board would not boot, even with what appears to be a valid SysConfig configuration. After enabling debug output, DDRSS register dumps, and the R5 memtester, we can see that the ISSI part fails the initial Stuck Address test but passes all subsequent tests over multiple loops. So it seems our SysConfig-generated .dtsi file for the ISSI DDR4 is correct and, based on my rudimentary understanding of the register dump, training appears to be happening.

Is there any obvious reason we would be failing *just* the initial Stuck Address test?

Additionally, I think our design and configurations are ok since we have boards with both Micron and Alliance DDR4 that appear to work fine, boot, and pass all memory tests.

The DDR4 configuration is a single 1GB chip (512M x 16) at 800MHz.

The TI Processor SDK version is 10.01.10.04.

The parts we are testing are:

Micron: MT40A512M16LY-062E IT:E

Alliance: AS4C512M16D4A-62BIN

ISSI: IS43QR16512A-083TBLI

Attached a boot log from an ISSI board that includes the debug, register dump, and a few loops of memtester to show what we are seeing. I'll also attach our ISSI .dtsi file so you can see how it's configured.

Any help is greatly appreciated.

Greg

k3-am64-ddr-ISSI_DDR4-2400T_CL12_CWL9.zip

ISSI_DDR4_WarmBoot_RegDump_Memtester.zip

  • Hi Greg,

    First, thanks for the details you provided in your initial post.

    I checked your register dump and everything looks good. I also checked the ISSI register configuration, and that checks out too. Do you have register dumps from the other devices? It would be good to compare against those as well.

    I noticed that the ISSI device is the only one of the three that doesn't support Read DBI.  Without Read DBI, you might be seeing the effects of additional supply noise which may be affecting the memtester results.  We don't recommend using devices that do not support Read DBI.  They can be used, but you are adding this unnecessary risk.  

    You can also try testing the ISSI device at a reduced speed (maybe 667MHz) to see if that changes behavior.  If it does, that could also indicate some signal integrity issues, maybe because of the increased supply noise.

    How many boards have you tried with the ISSI device?  

    You may also want to contact ISSI about the issue, and ensure you have the latest lot code devices.  We did have some issues early last year with some devices needing updated production trims.  

    Regards,

    James

  • Hi James,

    Thanks for your reply.

    I've tried two boards with the ISSI device and get the same result. I'll try a slower speed to see if that helps. I will upload a Micron log, but the Alliance one will have to wait until Monday.

    For what it's worth, what I'm assuming is the date code on the ISSI part is 2240, which maybe seems kind of old?

    One other quick question: How big of a role does the enable timing of VTT and VREFCA play? Initially I had it enabled immediately but noticed it should be after VDD and VDDQ. It doesn't seem to make any difference and might not be necessary with only one DDR4 device.

    Again, thanks for the help. I'll be sure to post the results of my testing.

    Regards,

    Greg

  • I also wanted to add that our power supply is based on the TPS6521903 and follows the Application Note:

    https://www.ti.com/lit/pdf/slvafe9

  • Hi Greg, I think that date code means the 40th week of 2022, but you may need to check with ISSI. At the least, they should be able to send you the latest parts to check before you start chasing old issues.

    VTT should be valid and stable before DDR initialization and training begin. Typically it is enabled along with power (with just a pull-up on the enable of the VTT regulator) or with a GPIO before DDR initialization begins. You are correct: if you only have one DDR device, VTT is not necessary. We have an EVM which demonstrates this for reference.

    Regards,

    James

      

  • Hi James,

    Here is a Micron Register Dump along with our Micron .dtsi configuration.

    One thing I've noticed when comparing register dumps is the ISSI always has non-zero values at

    0x0f30a024 0x00000003 //DDRSS_PI_9_DATA
    0x0f30a028 0x0000000f //DDRSS_PI_10_DATA

    while the Micron and Alliance are always zero here. I'm not a DDR4 expert, so I don't know how to interpret what the Reference Manual says about those registers. For example, is a non-zero maximum number of DFI clock cycles between req and ack bad, or does it indicate an error? Anyway, it's just something I noticed.

    I'll try the ISSI at 667 MHz next.

    Greg

    Micron_DDR4_RegDump(Micron_Settings).zip

    k3-am64-ddr-Micron.zip

  • Hi Greg,

    Those PI registers are just some internal timing status, not really indicative of anything associated with the error.  

    The Micron register dump looks fairly equivalent to the ISSI one. The ISSI configuration has a higher processor VREFdq (70% vs 66%), but right now I don't think that is the cause of the failures.

    I did look through some of my archives, and a customer with the same ISSI memory had issues with VTT on his board which caused boot failures. Please double-check the VTT voltage and ensure it is stable at 0.6V before PORz transitions high and remains stable until the memtester runs.

    Regards,

    James

  • For what it's worth, attached is a Reg Dump of the ISSI device at 667MHz, CWL=9, and CL=10. Same result with the initial Stuck Address test failing.

    It's always just the first test. I'm curious what that test does. I have the source for memtester, but I'm not sure how that test works. Maybe understanding that will lead me toward finding the problem.

    Today I will work on verifying VTT voltage and enable timing.

    Thanks again,

    Greg

    ISSI_DDR4_667MHz_CWL=9_CL=10.zip

  • Greg, I believe the stuck address test writes either the address itself or the complement of that address as data, alternating through the memory, and then reads back and verifies. That said, I don't think you can rule out data issues just because only this test fails.

    I've never actually seen just the address test fail. Usually if you have a bad address, some of the data tests fail too. Is the failure on the ISSI memory always at the same address? I still think everything so far points to a bad memory cell or some other issue with the memory device.

    One other idea: can you try changing the size of the memory being tested? The log shows 0x2000000:

    Testing memory starting 0x82000000, size 0x2000000...

    This may hit the issue in more of the loops (the test run will be much slower). If you hit the error at the same address every time, this will really point to a memory issue.

    Or let the memtester run overnight. I believe it allocates a different block of memory on each loop, which will help with coverage.

    Regards,

    James
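
The pattern James describes can be sketched in a few lines of C. This is a minimal sketch based only on the behaviour described in this thread (each word holds either its own address or the complement of that address, alternating with the word index and the pass number); it is not the memtester source. The helper names `stuck_pattern` and `stuck_address_pass` are hypothetical, and the TI port may differ in details such as the number of passes and how failures are reported.

```c
#include <stddef.h>
#include <stdint.h>

/* Value expected at word i during pass j: the word's own address when
 * (j + i) is even, the bitwise complement of that address otherwise. */
static uintptr_t stuck_pattern(const volatile uintptr_t *p, unsigned j, size_t i)
{
    uintptr_t addr = (uintptr_t)p;

    return ((j + i) % 2 == 0) ? addr : ~addr;
}

/* Run one pass over buf[0..count): fill every word, then read every word
 * back and compare. Returns the number of mismatching words. */
static size_t stuck_address_pass(volatile uintptr_t *buf, size_t count, unsigned j)
{
    size_t i;
    size_t bad = 0;

    for (i = 0; i < count; i++)
        buf[i] = stuck_pattern(&buf[i], j, i);

    for (i = 0; i < count; i++)
        if (buf[i] != stuck_pattern(&buf[i], j, i))
            bad++;

    return bad;
}
```

Because the whole buffer is written before any of it is read back, a write that is lost at any point during the fill phase only shows up later, during the read phase. Note that on a 64-bit host the words are 8 bytes wide, whereas the logs in this thread show 32-bit words.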

  • Hi James,

    Thanks for the idea on the memory size. I'll run some tests tomorrow.

    For what it's worth, I've run the test overnight before and it always passes every loop after the initial address fail on Loop 1 (20+ loops). I've tested two different ISSI boards many times, and they always pass after the Loop 1 address fail. Also, the address that fails is different in subsequent cold boots. That all leads me to believe there is something that happens on initialization or reset. I've looked at the timing of voltage rails and it all looks nominal.

    I contacted ISSI and their sales rep promised to get me in touch with one of their FAEs, but I haven't heard back yet. I'll for sure update with any useful information they provide.

    At some point I had planned to look at running the TI DDR Margin Test Firmware, but I just read that firmware is LPDDR only. Bummer.

    Does your parsing of the register dumps give you confidence I have all the DDRSS settings correct? I can't help but think there is something wrong with my ISSI settings.

    Thanks for all the help,

    Greg

  • Hi Greg, yes, it is very strange that only the first loop has the failure. With the register dump running before the memtester, there should be plenty of time after initialization. Weird.

    Everything I saw in the register dumps looks normal. I will double-check the settings tomorrow.

    Do you know if it is failing on the very first access to the memory every time? Or is it somewhere in the middle of the Stuck Address test?

    Regards,

    James

  • Greg, I double-checked the register dumps and didn't see any issues with the timings.

    The only thing I saw which may require a different setting is a table in the DDR datasheet (image not preserved).

    The table implies that if you run a 3.9us refresh interval, then only the extended temperature range is applicable, even with TCR disabled. You can try changing the TCR range setting in the online tool to match (screenshot not preserved).

    I don't have high hopes that this will change the behavior, but based on the datasheet, it would be technically correct.

    Regards,

    James

  • Hi James,

    Thanks for finding that. Unfortunately, you were right and that didn't seem to make any difference. That said, I'll leave TCR range set to Extended from here on out.

    For what it's worth, I ran a number of tests to see if the failing address was truly random and it seems so to me. Here are ten trials at 800MHz:

    Cold boot #1, failed at address: 0x00000000831122a8
    Cold boot #2, failed at address: 0x00000000831224a8
    Cold boot #3, failed at address: 0x00000000831731b8
    Cold boot #4, failed at address: 0x000000008311216c
    Cold boot #5, failed at address: 0x000000008311e2a8
    Cold boot #6, failed at address: 0x00000000831662e8
    Cold boot #7, failed at address: 0x0000000083170878
    Cold boot #8, failed at address: 0x00000000831128a8
    Cold boot #9, failed at address: 0x0000000083122ba8
    Cold boot #10, failed at address: 0x0000000083174778

    Note, these are always in the 0th iteration of the function test_stuck_address in tests.c.

    My next test will be to lower the size in /arch/arm/mach-k3/am642_init.c from

    #if defined(CONFIG_SPL_MEMTESTER)
     /* test for 32MB at 0x82000000 which is R5 SPL stack on DDR */
     memtester(0x82000000, 0x100000 * 32);
    #endif

    to

    #if defined(CONFIG_SPL_MEMTESTER)
     /* test for 16MB at 0x82000000 */
     memtester(0x82000000, 0x010000 * 32);
    #endif

    Greg

  • I just realized that size should have been 0x080000*32 for 16MB.

    Strangely, the size I accidentally set it to (0x010000*32, which is only 2MB) seems to work.

    memtester version 4.6.0-ti (64-bit)
    Copyright (C) 2001-2020 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).

    Testing memory starting 0x82000000, size 0x200000...

    Loop 1:
    Stuck Address: ok
    Random Value: ok
    Compare XOR: ok
    Compare SUB: ok
    Compare MUL: ok
    Compare DIV: ok
    Compare OR: ok
    Compare AND: ok
    Sequential Increment: ok
    Solid Bits: ok
    Block Sequential: ok
    Checkerboard: ok
    Bit Spread: ok
    Bit Flip: ok
    Walking Ones: ok
    Walking Zeroes: ok

    Loop 2:
    Stuck Address: ok
    Random Value: ok
    Compare XOR: ok
    Compare SUB: ok
    Compare MUL: ok
    Compare DIV: ok
    Compare OR: ok
    Compare AND: ok
    Sequential Increment: ok
    Solid Bits: ok
    Block Sequential: ok
    Checkerboard: ok
    Bit Spread: ok
    Bit Flip: ok
    Walking Ones: ok
    Walking Zeroes: ok

    ----------------------------------------------------------------

    Edit: I ran it with a memtester size 0x080000*32 and it also passed:

    memtester version 4.6.0-ti (64-bit)
    Copyright (C) 2001-2020 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).

    Testing memory starting 0x82000000, size 0x1000000...

    Loop 1:
    Stuck Address: ok
    Random Value: ok
    Compare XOR: ok
    Compare SUB: ok
    Compare MUL: ok
    Compare DIV: ok
    Compare OR: ok
    Compare AND: ok
    Sequential Increment: ok
    Solid Bits: ok
    Block Sequential: ok
    Checkerboard: ok
    Bit Spread: ok
    Bit Flip: ok
    Walking Ones: ok
    Walking Zeroes: ok

    Loop 2:
    Stuck Address: ok
    Random Value: ok
    Compare XOR: ok
    Compare SUB: ok
    Compare MUL: ok
    Compare DIV: ok
    Compare OR: ok
    Compare AND: ok
    Sequential Increment: ok
    Solid Bits: ok
    Block Sequential: ok
    Checkerboard: testing 63
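
The three size expressions used in this part of the thread are easy to mix up, so here is the arithmetic as a small C sketch. The helper names `memtester_size` and `word_count` are hypothetical; the 4-byte word size is an assumption taken from the `i = .../0x800000` count that the modified test prints for a 0x2000000-byte region.

```c
#include <stdint.h>

/* Size in bytes passed to memtester() when written as unit * 32. */
static uint32_t memtester_size(uint32_t unit)
{
    return unit * 32u;
}

/* Number of 32-bit words covered by a region of size_bytes bytes. */
static uint32_t word_count(uint32_t size_bytes)
{
    return size_bytes / 4u;
}

/*
 * memtester_size(0x100000) = 0x2000000  (32MB, the original setting)
 * memtester_size(0x010000) = 0x0200000  ( 2MB, the accidental setting)
 * memtester_size(0x080000) = 0x1000000  (16MB, the intended setting)
 */
```

These values match the "size 0x200000" and "size 0x1000000" lines in the two logs above.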

  • Since these passed, I thought maybe there was a chance Linux would boot with this configuration. Nope - still gets stuck.

  • Hi Greg, strange behavior. Can you send the full boot log from when it gets stuck?

    Do you happen to have a JTAG connection to the board?

    Have you tried both CWL=9 and CWL=11?  

    I'm a little at a loss at the moment about what is going on.  Any luck contacting the vendor?

    Regards,

    James

  • Hi James,

    I've been testing with CWL=9, but I'm pretty sure I tried CWL=11. I'll try it again to make sure.

    We do have JTAG with an XDS110 Debug Probe.

    Our ISSI rep just got back to me today as he was out of the office last week. I'm hoping he can get me in touch with the ISSI engineering team. I'll be sure to update this thread when I find out more.

    In the meantime, I've modified tests.c to print some debug data so I can see where it's failing the address test. I was somewhat surprised to see that it fails over halfway through the i loop. For what it's worth, here are four memtester runs that output the failing *p1 inequality along with i/count.

    memtester version 4.6.0-ti (64-bit)
    Copyright (C) 2001-2020 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).
    
    Testing memory starting 0x82000000, size 0x2000000...
    
    Loop 1:
                       Stuck Address: testing   0
    *p1 = 0xd25ad25a != 0x83170878, i = 0x45c21e/0x800000
    FAILURE: possible bad address line at physical address 0x0000000083170878.
    Skipping to next test...
                        Random Value:  ok
    
    memtester version 4.6.0-ti (64-bit)
    Copyright (C) 2001-2020 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).
    
    Testing memory starting 0x82000000, size 0x2000000...
    
    Loop 1:
                       Stuck Address: testing   0
    *p1 = 0xd25ad25a != 0x83172480, i = 0x45c920/0x800000
    FAILURE: possible bad address line at physical address 0x0000000083172480.
    Skipping to next test...
    
    memtester version 4.6.0-ti (64-bit)
    Copyright (C) 2001-2020 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).
    
    Testing memory starting 0x82000000, size 0x2000000...
    
    Loop 1:
                       Stuck Address: testing   0
    *p1 = 0xd25bd25a != 0x831428c8, i = 0x450a32/0x800000
    FAILURE: possible bad address line at physical address 0x00000000831428c8.
    Skipping to next test...
    
    memtester version 4.6.0-ti (64-bit)
    Copyright (C) 2001-2020 Charles Cazabon.
    Licensed under the GNU General Public License version 2 (only).
    
    Testing memory starting 0x82000000, size 0x2000000...
    
    Loop 1:
                       Stuck Address: testing   0
    *p1 = 0xd25ad25a != 0x83112da8, i = 0x444b6a/0x800000
    FAILURE: possible bad address line at physical address 0x0000000083112da8.
    Skipping to next test...
    

    Is it possible the initial DDR4 training values are marginal and at some point it retrains to better values? I left a test running over the weekend and it successfully completed close to 100 loops after the initial address failure.

  • Ok, thanks for the added info.  Something weird is going on especially if you can run overnight without failure!

    My idea with JTAG was to break in right after a failure to see what the behavior is at the failing address.  Opening up a memory browser at that location to see what you can read, and see if you can peek/poke values successfully.  This might give us an indication on whether it is a read or write failure.

    Actually, I just thought of something else. I thought the address test was just writing the address to itself as data (or its complement). Why does it keep failing with 0xd25ad25a? Did you happen to change the dts to account for the fact that you only have 1GB? The SDK is configured for 2GB by default (to support the total memory on the EVM), so there are a couple of entries in the dts that would need to be changed. Some information is here: https://dev.ti.com/tirex/explore/content/am64x_academy_10_00_00_01/am64x_academy_10_00_00_01/source/linux/ch-porting/porting-uboot.html#modifying-ram-size

    Let me know which SDK version you are working with, some of this info may have changed with the latest SDKs. 

    We have seen issues with speculative fetching or aliasing if the SDK is not configured with the representative memory that is on the board.  Not sure why this wouldn't show up with other memories, but it needs to be corrected nonetheless.

    Regards,

    James

  • I'm using the latest SDK (ti-processor-sdk-linux-am64xx-evm-10.01.10.04). I did modify k3-am642-evm.dts in the memory section, but I very easily could be doing something wrong:

    memory@80000000 {
        bootph-all;
        device_type = "memory";
        /* 2G RAM */
        /* reg = <0x00000000 0x80000000 0x00000000 0x80000000>; */ /* GED: 2GB EVM DDR4 size */
        /* 1G RAM */
        reg = <0x00000000 0x80000000 0x00000000 0x40000000>; /* GED: Our 1GB size. This is per Resource Explorer > Training > AM64x Academy - 9.02.00.00 v1 > Linux > Porting Linux to Custom Hardware > Porting U-Boot */
    };
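
As a cross-check of the `reg` property above, the four cells can be decoded into a base address and a size. This is a minimal sketch that assumes the parent node uses two address cells and two size cells (high word first), which is consistent with the four 32-bit values shown but is not stated in the thread; `struct mem_region` and `decode_reg` are hypothetical names.

```c
#include <stdint.h>

struct mem_region {
    uint64_t base;
    uint64_t size;
};

/* Decode <addr_hi addr_lo size_hi size_lo> into a 64-bit base and size.
 * Assumes #address-cells = <2> and #size-cells = <2>. */
static struct mem_region decode_reg(const uint32_t cells[4])
{
    struct mem_region r;

    r.base = ((uint64_t)cells[0] << 32) | cells[1];
    r.size = ((uint64_t)cells[2] << 32) | cells[3];
    return r;
}
```

Under that assumption, the 1GB entry decodes to base 0x80000000 and size 0x40000000, so the described region ends at 0xC0000000, while the commented-out EVM entry decodes to a size of 0x80000000 (2GB).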

  • OK, that looks correct.

    I'm still a little confused by the failure results, and I wonder whether the recurrence of 0xd25ad25a is a clue. I expected either the address or the complement of the address for each 32-bit word.

    Also, I wonder if you can remove the code which switches to the next test upon failure. I'd like to see if there are further failures.

    Regards,

    James

  • That's an excellent idea. I think I can just return 0 on the mismatch instead of -1. I also started printing the previous values to see what they were (of course they match). I'm not sure how this test works, but the previous values are the complement of 0x8312eea4, which makes sense since that word has an odd index:

    Loop 1:
    Stuck Address: testing 0
    prev_*p1 = 0x7ced115b, prev_tern = 0x7ced115b
    *p1 = 0xd25ad25a != 0x8312eea8, i = 0x44bbaa/0x800000

    I'll update when I have some more results.

    Greg

  • James,

    Thank you for suggesting letting the test run. Surprisingly there were only around 80 failed addresses.

    Below is one test I ran. The code prints out the two previous iteration passes followed by the failures. The failures always happen in the j=0 outer loop. The test continues without failure through the rest of the j=0 loop and all subsequent j loops. I'm running another longer test now to verify there are no address errors on the next iteration of the main loop. 

    I guess there is no way to know whether the writes were wrong, or the reads are wrong. Where does the 0xd25ad25a come from? Thoughts?

    Testing memory starting 0x82000000, size 0x2000000...
    
    Loop 1:
                       Stuck Address: testing   0
    i = 4524976, p1 = 0x83142ec0, *p1 = 0x83142ec0, t = 0x83142ec0
    i = 4524977, p1 = 0x83142ec4, *p1 = 0x7cebd13b, t = 0x7cebd13b
    i = 4524978, p1 = 0x83142ec8, *p1 = 0xd25ad25a, t = 0x83142ec8
    i = 4524979, p1 = 0x83142ecc, *p1 = 0xd25ad25a, t = 0x7cebd133
    i = 4524980, p1 = 0x83142ed0, *p1 = 0xd25ad25a, t = 0x83142ed0
    i = 4524981, p1 = 0x83142ed4, *p1 = 0xd25ad25a, t = 0x7cebd12b
    i = 4524982, p1 = 0x83142ed8, *p1 = 0xd25ad25a, t = 0x83142ed8
    i = 4524983, p1 = 0x83142edc, *p1 = 0xd25ad25a, t = 0x7cebd123
    i = 4524984, p1 = 0x83142ee0, *p1 = 0xd25ad25a, t = 0x83142ee0
    i = 4524985, p1 = 0x83142ee4, *p1 = 0xd25ad25a, t = 0x7cebd11b
    i = 4524986, p1 = 0x83142ee8, *p1 = 0xd25ad25a, t = 0x83142ee8
    i = 4524987, p1 = 0x83142eec, *p1 = 0xd25ad25a, t = 0x7cebd113
    i = 4524988, p1 = 0x83142ef0, *p1 = 0xd25ad25a, t = 0x83142ef0
    i = 4524989, p1 = 0x83142ef4, *p1 = 0xd25ad25a, t = 0x7cebd10b
    i = 4524990, p1 = 0x83142ef8, *p1 = 0xd25ad25a, t = 0x83142ef8
    i = 4524991, p1 = 0x83142efc, *p1 = 0xd25ad25a, t = 0x7cebd103
    i = 4524992, p1 = 0x83142f00, *p1 = 0xd25ad25a, t = 0x83142f00
    i = 4524993, p1 = 0x83142f04, *p1 = 0xd25ad25a, t = 0x7cebd0fb
    i = 4524994, p1 = 0x83142f08, *p1 = 0xd25ad25a, t = 0x83142f08
    i = 4524995, p1 = 0x83142f0c, *p1 = 0xd25ad25a, t = 0x7cebd0f3
    i = 4524996, p1 = 0x83142f10, *p1 = 0xd25ad25a, t = 0x83142f10
    i = 4524997, p1 = 0x83142f14, *p1 = 0xd25ad25a, t = 0x7cebd0eb
    i = 4524998, p1 = 0x83142f18, *p1 = 0xd25ad25a, t = 0x83142f18
    i = 4524999, p1 = 0x83142f1c, *p1 = 0xd25ad25a, t = 0x7cebd0e3
    i = 4525000, p1 = 0x83142f20, *p1 = 0xd25ad25a, t = 0x83142f20
    i = 4525001, p1 = 0x83142f24, *p1 = 0xd25ad25a, t = 0x7cebd0db
    i = 4525002, p1 = 0x83142f28, *p1 = 0xd25ad25a, t = 0x83142f28
    i = 4525003, p1 = 0x83142f2c, *p1 = 0xd25ad25a, t = 0x7cebd0d3
    i = 4525004, p1 = 0x83142f30, *p1 = 0xd25ad25a, t = 0x83142f30
    i = 4525005, p1 = 0x83142f34, *p1 = 0xd25ad25a, t = 0x7cebd0cb
    i = 4525006, p1 = 0x83142f38, *p1 = 0xd25ad25a, t = 0x83142f38
    i = 4525007, p1 = 0x83142f3c, *p1 = 0xd25ad25a, t = 0x7cebd0c3
    i = 4525008, p1 = 0x83142f40, *p1 = 0xd25ad25a, t = 0x83142f40
    i = 4525009, p1 = 0x83142f44, *p1 = 0xd25ad25a, t = 0x7cebd0bb
    i = 4525010, p1 = 0x83142f48, *p1 = 0xd25ad25a, t = 0x83142f48
    i = 4525011, p1 = 0x83142f4c, *p1 = 0xd25ad25a, t = 0x7cebd0b3
    i = 4525012, p1 = 0x83142f50, *p1 = 0xd25ad25a, t = 0x83142f50
    i = 4525013, p1 = 0x83142f54, *p1 = 0xd25ad25a, t = 0x7cebd0ab
    i = 4525014, p1 = 0x83142f58, *p1 = 0xd25ad25a, t = 0x83142f58
    i = 4525015, p1 = 0x83142f5c, *p1 = 0xd25ad25a, t = 0x7cebd0a3
    i = 4525016, p1 = 0x83142f60, *p1 = 0xd25ad25a, t = 0x83142f60
    i = 4525017, p1 = 0x83142f64, *p1 = 0xd25ad25a, t = 0x7cebd09b
    i = 4525018, p1 = 0x83142f68, *p1 = 0xd25ad25a, t = 0x83142f68
    i = 4525019, p1 = 0x83142f6c, *p1 = 0xd25ad25a, t = 0x7cebd093
    i = 4525020, p1 = 0x83142f70, *p1 = 0xd25ad25a, t = 0x83142f70
    i = 4525021, p1 = 0x83142f74, *p1 = 0xd25ad25a, t = 0x7cebd08b
    i = 4525022, p1 = 0x83142f78, *p1 = 0xd25ad25a, t = 0x83142f78
    i = 4525023, p1 = 0x83142f7c, *p1 = 0xd25ad25a, t = 0x7cebd083
    i = 4525024, p1 = 0x83142f80, *p1 = 0xd25ad25a, t = 0x83142f80
    i = 4525025, p1 = 0x83142f84, *p1 = 0xd25ad25a, t = 0x7cebd07b
    i = 4525026, p1 = 0x83142f88, *p1 = 0xd25ad25a, t = 0x83142f88
    i = 4525027, p1 = 0x83142f8c, *p1 = 0xd25ad25a, t = 0x7cebd073
    i = 4525028, p1 = 0x83142f90, *p1 = 0xd25ad25a, t = 0x83142f90
    i = 4525029, p1 = 0x83142f94, *p1 = 0xd25ad25a, t = 0x7cebd06b
    i = 4525030, p1 = 0x83142f98, *p1 = 0xd25ad25a, t = 0x83142f98
    i = 4525031, p1 = 0x83142f9c, *p1 = 0xd25ad25a, t = 0x7cebd063
    i = 4525032, p1 = 0x83142fa0, *p1 = 0xd25ad25a, t = 0x83142fa0
    i = 4525033, p1 = 0x83142fa4, *p1 = 0xd25ad25a, t = 0x7cebd05b
    i = 4525034, p1 = 0x83142fa8, *p1 = 0xd25ad25a, t = 0x83142fa8
    i = 4525035, p1 = 0x83142fac, *p1 = 0xd25ad25a, t = 0x7cebd053
    i = 4525036, p1 = 0x83142fb0, *p1 = 0xd25ad25a, t = 0x83142fb0
    i = 4525037, p1 = 0x83142fb4, *p1 = 0xd25ad25a, t = 0x7cebd04b
    i = 4525038, p1 = 0x83142fb8, *p1 = 0xd25ad25a, t = 0x83142fb8
    i = 4525039, p1 = 0x83142fbc, *p1 = 0xd25ad25a, t = 0x7cebd043
    i = 4525040, p1 = 0x83142fc0, *p1 = 0xd25ad25a, t = 0x83142fc0
    i = 4525041, p1 = 0x83142fc4, *p1 = 0xd25ad25a, t = 0x7cebd03b
    i = 4525042, p1 = 0x83142fc8, *p1 = 0xd25ad25a, t = 0x83142fc8
    i = 4525043, p1 = 0x83142fcc, *p1 = 0xd25ad25a, t = 0x7cebd033
    i = 4525044, p1 = 0x83142fd0, *p1 = 0xd25ad25a, t = 0x83142fd0
    i = 4525045, p1 = 0x83142fd4, *p1 = 0xd25ad25a, t = 0x7cebd02b
    i = 4525046, p1 = 0x83142fd8, *p1 = 0xd25ad25a, t = 0x83142fd8
    i = 4525047, p1 = 0x83142fdc, *p1 = 0xd25ad25a, t = 0x7cebd023
    i = 4525048, p1 = 0x83142fe0, *p1 = 0xd25ad25a, t = 0x83142fe0
    i = 4525049, p1 = 0x83142fe4, *p1 = 0xd25ad25a, t = 0x7cebd01b
    i = 4525050, p1 = 0x83142fe8, *p1 = 0xd25ad25a, t = 0x83142fe8
    i = 4525051, p1 = 0x83142fec, *p1 = 0xd25ad25a, t = 0x7cebd013
    i = 4525052, p1 = 0x83142ff0, *p1 = 0xd25ad25a, t = 0x83142ff0
    i = 4525053, p1 = 0x83142ff4, *p1 = 0xd25ad25a, t = 0x7cebd00b
    i = 4525054, p1 = 0x83142ff8, *p1 = 0xd25ad25a, t = 0x83142ff8
    i = 4525055, p1 = 0x83142ffc, *p1 = 0xd25ad25a, t = 0x7cebd003
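
The extent of the failing run in the log above can be checked with a little arithmetic. This is a small sketch using only the first and last failing addresses printed in that log; the helper names are hypothetical, and the 4096-byte block size is simply the alignment being tested, not a claim about how the DDR controller maps addresses to DRAM pages or banks.

```c
#include <stdint.h>

/* Number of 32-bit words in the inclusive address range [first, last]. */
static uint32_t words_in_span(uint32_t first, uint32_t last)
{
    return (last - first) / 4u + 1u;
}

/* Nonzero when the word following `last` starts a new block of
 * block_size bytes (block_size must be a power of two). */
static int ends_at_block_boundary(uint32_t last, uint32_t block_size)
{
    return ((last + 4u) & (block_size - 1u)) == 0;
}
```

For this run, the span from 0x83142ec8 to 0x83142ffc covers 78 words, in line with the "around 80 failed addresses" observation, and the run stops at the last word before the 4096-byte-aligned address 0x83143000.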

  • This is really perplexing!  Can you break in with JTAG to inspect those addresses?    

    Regards,

    James

  • I'm not sure I know how to or if I can break in with JTAG, but I can certainly give it a try. Currently I'm compiling u-boot and moving the files over to an SD card to boot. My usual procedure is to power cycle the board, so it's a full POR.

    I've run a number of tests, and the address fails can be at different addresses and the total number can be in the hundreds. Whatever is happening it seems asynchronous to the memory tests.

    Also, our board has an FPGA connected to the reset signals on the AM6442 and I can assert SOC_Warm_RESETz. When I issue a warm reset, I still see failures, but there are no 0xd25ad25a values. Instead, they look like previous tests that are off by one address. This leads me to think there is something affecting the writes during a window of time. Could 0xd25ad25a be a power-on reset value in the ISSI device and the first address test is failing to write to those locations? 

    I'll attach a portion of debug output from a warm boot below.

    Testing memory starting 0x82000000, size 0x2000000...
    
    Loop 1:
                       Stuck Address: testing   0
    i = 4475608, p1 = 0x83112b60, *p1 = 0x83112b60, t = 0x83112b60
    i = 4475609, p1 = 0x83112b64, *p1 = 0x7ceed49b, t = 0x7ceed49b
    i = 4475610, p1 = 0x83112b68, *p1 = 0x7ceed497, t = 0x83112b68
    i = 4475611, p1 = 0x83112b6c, *p1 = 0x83112b6c, t = 0x7ceed493
    i = 4475612, p1 = 0x83112b70, *p1 = 0x7ceed48f, t = 0x83112b70
    i = 4475613, p1 = 0x83112b74, *p1 = 0x83112b74, t = 0x7ceed48b
    i = 4475614, p1 = 0x83112b78, *p1 = 0x7ceed487, t = 0x83112b78
    i = 4475615, p1 = 0x83112b7c, *p1 = 0x83112b7c, t = 0x7ceed483
    i = 4475616, p1 = 0x83112b80, *p1 = 0x7ceed47f, t = 0x83112b80
    i = 4475617, p1 = 0x83112b84, *p1 = 0x83112b84, t = 0x7ceed47b
    i = 4475618, p1 = 0x83112b88, *p1 = 0x7ceed477, t = 0x83112b88
    i = 4475619, p1 = 0x83112b8c, *p1 = 0x83112b8c, t = 0x7ceed473
    i = 4475620, p1 = 0x83112b90, *p1 = 0x7ceed46f, t = 0x83112b90
    i = 4475621, p1 = 0x83112b94, *p1 = 0x83112b94, t = 0x7ceed46b
    i = 4475622, p1 = 0x83112b98, *p1 = 0x7ceed467, t = 0x83112b98
    i = 4475623, p1 = 0x83112b9c, *p1 = 0x83112b9c, t = 0x7ceed463
    i = 4475624, p1 = 0x83112ba0, *p1 = 0x7ceed45f, t = 0x83112ba0
    i = 4475625, p1 = 0x83112ba4, *p1 = 0x83112ba4, t = 0x7ceed45b
    i = 4475626, p1 = 0x83112ba8, *p1 = 0x7ceed457, t = 0x83112ba8
    i = 4475627, p1 = 0x83112bac, *p1 = 0x83112bac, t = 0x7ceed453
    i = 4475628, p1 = 0x83112bb0, *p1 = 0x7ceed44f, t = 0x83112bb0
    i = 4475629, p1 = 0x83112bb4, *p1 = 0x83112bb4, t = 0x7ceed44b
    i = 4475630, p1 = 0x83112bb8, *p1 = 0x7ceed447, t = 0x83112bb8
    i = 4475631, p1 = 0x83112bbc, *p1 = 0x83112bbc, t = 0x7ceed443
    i = 4475632, p1 = 0x83112bc0, *p1 = 0x7ceed43f, t = 0x83112bc0
    i = 4475633, p1 = 0x83112bc4, *p1 = 0x83112bc4, t = 0x7ceed43b
    i = 4475634, p1 = 0x83112bc8, *p1 = 0x7ceed437, t = 0x83112bc8
    i = 4475635, p1 = 0x83112bcc, *p1 = 0x83112bcc, t = 0x7ceed433
    i = 4475636, p1 = 0x83112bd0, *p1 = 0x7ceed42f, t = 0x83112bd0
    i = 4475637, p1 = 0x83112bd4, *p1 = 0x83112bd4, t = 0x7ceed42b
    i = 4475638, p1 = 0x83112bd8, *p1 = 0x7ceed427, t = 0x83112bd8
    i = 4475639, p1 = 0x83112bdc, *p1 = 0x83112bdc, t = 0x7ceed423
    i = 4475640, p1 = 0x83112be0, *p1 = 0x7ceed41f, t = 0x83112be0
    i = 4475641, p1 = 0x83112be4, *p1 = 0x83112be4, t = 0x7ceed41b
    i = 4475642, p1 = 0x83112be8, *p1 = 0x7ceed417, t = 0x83112be8
    i = 4475643, p1 = 0x83112bec, *p1 = 0x83112bec, t = 0x7ceed413
    i = 4475644, p1 = 0x83112bf0, *p1 = 0x7ceed40f, t = 0x83112bf0
    i = 4475645, p1 = 0x83112bf4, *p1 = 0x83112bf4, t = 0x7ceed40b
    i = 4475646, p1 = 0x83112bf8, *p1 = 0x7ceed407, t = 0x83112bf8
    i = 4475647, p1 = 0x83112bfc, *p1 = 0x83112bfc, t = 0x7ceed403
    i = 4475648, p1 = 0x83112c00, *p1 = 0x7ceed3ff, t = 0x83112c00
    i = 4475649, p1 = 0x83112c04, *p1 = 0x83112c04, t = 0x7ceed3fb
    i = 4475650, p1 = 0x83112c08, *p1 = 0x7ceed3f7, t = 0x83112c08
    .
    .
    .
    i = 4475833, p1 = 0x83112ee4, *p1 = 0x83112ee4, t = 0x7ceed11b
    i = 4475834, p1 = 0x83112ee8, *p1 = 0x7ceed117, t = 0x83112ee8
    i = 4475835, p1 = 0x83112eec, *p1 = 0x83112eec, t = 0x7ceed113
    
    

  • The memory should have random data at power up, so I'm still not sure where 0xd25ad25a is coming from. It could be data from the training that is performed during initialization, but I've never seen that before.

    I think the warm reset test is showing that writes are failing, especially if you think the data in the memory is stale data from the previous test run.

    The last log seems to be showing a failure on every other address, while the one previous to that shows solid failures on consecutive addresses (am I reading that right?). The failures seem to be in groups of addresses and are more or less on a bank boundary. Is there anything out of the ordinary with the BA signals on the board (i.e., different routing relative to other address signals, different VTT termination, etc.)?

    Regards,

    James
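
One way to make the "stale data" reading concrete is to classify each mismatch by whether the observed value is one of the two values the stuck address test could ever have written to that word (the address or its complement). This is a minimal sketch under that assumption; the enum and function names are hypothetical, and the classification only says what a value is consistent with, not what actually happened on the bus.

```c
#include <stdint.h>

enum mismatch_kind {
    MATCHES,             /* observed value is the expected one */
    STALE_OTHER_PARITY,  /* observed value is the other pattern for this word */
    UNRELATED_VALUE      /* observed value was never written by this test */
};

/* Classify one log row: addr is p1, observed is *p1, expected is t. */
static enum mismatch_kind classify(uint32_t addr, uint32_t observed, uint32_t expected)
{
    if (observed == expected)
        return MATCHES;
    if (observed == addr || observed == (uint32_t)~addr)
        return STALE_OTHER_PARITY;
    return UNRELATED_VALUE;
}
```

Applied to the logs in this thread, every failing row in the warm reset log falls in the second category (for example 0x7ceed497 at 0x83112b68 is the complement of that address), which is consistent with a dropped write leaving data from an earlier pass in place. The cold boot rows with 0xd25ad25a fall in the third category.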

  • I heard back from ISSI and they said there is an issue with false CA Parity alarms on these older die revs, and the command gets skipped. Does that sound like it could be the source of missing writes? Would that also indicate where we might see these errors, e.g. on a certain address boundary?

  • Ok, well that might be an issue.  If the memory thinks there is a CA Parity error, it will ignore the command.  So that may be why you see stale data in that region.  CA parity is enabled by default in the DDR tool.

    You can disable CA parity in the DRAM Timing A section of the DDR tool. I think that should be all you need to do. Give it a try with the new configuration.

    Regards,

    James
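
For readers unfamiliar with the mechanism, the check that the DRAM performs can be illustrated with a simplified model. This is a sketch only: it assumes even parity over the covered command/address signals (my understanding of the DDR4 scheme), packs those signals into a plain bit field in a hypothetical layout, and does not model which signals are covered or the timing of the PAR pin. The function names are hypothetical as well.

```c
#include <stdint.h>

/* Parity bit the controller would drive for even parity:
 * 1 when ca_bits contains an odd number of ones, 0 otherwise. */
static unsigned even_parity_bit(uint32_t ca_bits)
{
    unsigned p = 0;

    while (ca_bits) {
        p ^= 1u;
        ca_bits &= ca_bits - 1u;  /* clear the lowest set bit */
    }
    return p;
}

/* Device-side check: nonzero when the received parity bit is consistent
 * with the received command/address bits. */
static int ca_parity_ok(uint32_t ca_bits, unsigned par)
{
    return even_parity_bit(ca_bits) == (par & 1u);
}
```

In this model, a device that wrongly evaluates `ca_parity_ok` as false for a valid write would, as James describes, ignore the command, and the target location would keep whatever it held before.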

  • I had already discovered I could disable CA Parity and gave it a try. Sure enough, it passed all memory tests and I have it booting Linux! What, if any, are the ramifications of disabling CA Parity?

    ISSI is sending us newer date code parts and we'll most likely rework our prototypes to verify the new parts work. I'm a little frustrated that there are no errata mentioning this and there are known marginal parts in the supply chain.

    I think the mystery is solved. I really appreciate all your help. 

    Greg