DDR3 problem with 6678 board

Jeff Brower73

All-

We are testing DDR3 mem on a custom 6678 board based on the EVM 6678 design. One change is the DDR3 devices, which are Micron MT41K256M16, but the circuit and layout still closely follow the EVM. The only significant differences are:

-15 address bits connected instead of 13

-two (2) chip selects connected to each device instead of one
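
As a rough sanity check on the address-line count, the device and rank capacity can be worked out from the address geometry. This is a minimal sketch: the 15 row bits come from the post above, but the 10 column bits and 8 banks are assumptions about a 256 Meg x 16 part and should be confirmed against the datasheet for the exact device fitted.

```python
# Capacity arithmetic for a x16 DDR3 device on a 64-bit data bus.
# ROW_BITS is from the post; COL_BITS and BANK_BITS are assumptions
# to be checked against the device datasheet.
ROW_BITS = 15       # 15 address lines connected
COL_BITS = 10       # assumption: 1K columns per row
BANK_BITS = 3       # assumption: 8 internal banks
DEVICE_WIDTH = 16   # x16 device
BUS_WIDTH = 64      # data bus width, ECC device not counted


def device_bits(row_bits, col_bits, bank_bits, width):
    """Total storage of one device, in bits."""
    return (1 << (row_bits + col_bits + bank_bits)) * width


def rank_bytes(row_bits, col_bits, bank_bits, width, bus_width):
    """Total storage of one rank (enough devices to fill the bus), in bytes."""
    devices = bus_width // width
    return devices * device_bits(row_bits, col_bits, bank_bits, width) // 8


per_device_gbit = device_bits(ROW_BITS, COL_BITS, BANK_BITS, DEVICE_WIDTH) // 2**30
rank_gib = rank_bytes(ROW_BITS, COL_BITS, BANK_BITS, DEVICE_WIDTH, BUS_WIDTH) // 2**30
print(per_device_gbit, "Gbit per device")   # 4 Gbit per device
print(rank_gib, "GiB per rank")             # 2 GiB per rank
```

With the same assumed column and bank counts, 13 row bits would give 1 Gbit per device, so the two extra address lines account for a fourfold increase in capacity.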

We are using 67 MHz input and 667 MHz output DDR3 clocks, so we have left timings in the EVM6678L.gel file as-is. We have set

IBANK_POS = 0
IBANK = 3
PAGESIZE = 3
EBANK = 0

If we do a Fill Memory with a 32-bit unsigned constant, we see this:

12345678 1234xxxx 12345678 1234xxxx

where xxxx indicates intermittent, inconsistent data (tends to be zero, but not always).

Questions:

1) With device placement sequence in the layout the same as the EVM, in the above data pattern, would DQ[63:48] (4th device) be the one not working?

2) If this should be a timing problem, is there a systematic way to adjust leveling and DDR3 device settings in order to "bracket" the issue?

Thanks.

-Jeff
Signalogic

over 11 years ago

0 Bill Taboada over 11 years ago

TI__Mastermind 42545 points

Hi Jeff,

1) The routing of the extra chip select and the extra address lines shouldn't have any effect, assuming that they are routed correctly. Note that the connection of an extra CS implies you are using the twin-die version of the MT41K256M16. As has been stated in the following thread, the dual-rank option for the C6678 has not been verified at this time. Also note that you've set EBANK to 0 so you are currently only using one rank.

http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/p/317347/1114777.aspx#1114777

2) I agree that this appears to be a timing problem. We have had a number of problems with customers who have closely followed the routing on the EVM but have not verified that all the routing rules have been met. In one of the responses for the link below you will find the DDR rules spreadsheet. Have you filled out a spreadsheet similar to this to check your routing lengths match the requirements? The leveling is automatic. We don't have a reliable method for adjusting the leveling manually.

http://e2e.ti.com/support/dsp/c6000_multi-core_dsps/f/639/t/245304.aspx

Regards, Bill

0 Jeff Brower73 over 11 years ago in reply to Bill Taboada

Genius 3420 points

Bill-

Thanks for your reply. We are using single-die devices which is why we have EBANK = 0.

Yes we did a critical net list / routing tolerance spreadsheet prior to the design, carefully following the TI 6678 hardware design guide. We are going through that again now.

Could I ask you to confirm my question about which device? We need to make absolutely sure we're focusing on the exact area of the layout that may be having trouble, thanks.

-Jeff

0 Jeff Brower73 over 11 years ago in reply to Jeff Brower73

Genius 3420 points

Bill-

Also, can you send us the latest PHY_CALC spreadsheet? It appears the various TI links for this are currently broken.

Thanks.

-Jeff

0 Bill Taboada over 11 years ago in reply to Jeff Brower73

TI__Mastermind 42545 points

Jeff-

From the numbers you've provided it's difficult to tell which device is failing. It depends on the endianness and on how the data is displayed. Ideally, you can set up a DMA transfer so that the data is accessed in 64-bit transfers. The phycalc spreadsheet is attached. The link should be fixed soon.

Regards, Bill

4530.sprabl2a.zip

0 Jeff Brower73 over 11 years ago in reply to Jeff Brower73

Genius 3420 points

Bill-

Thanks again. Here is an update.

1) We are set up as little endian. If this were an EVM 6678, which device would it be? From that I can translate to our board.

2) We are using a Rev 1.0 silicon device. We have checked all errata.

3) We went through the PHY_CALC spreadsheet and used resulting values to modify our version of evm6678l.gel (the values were not much different). No improvement.

4) We reduced clock rate from 1333 MHz to 1266 MHz. It works. We are now doing more complex memory tests, but so far everything is passing.
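
To put the speed reduction in item 4 into timing terms, the arithmetic below converts each rate into a clock period and a per-bit unit interval. It is only a sketch, and it assumes that "1333 MHz" and "1266 MHz" are data rates in MT/s (two transfers per clock), which is consistent with the 667 MHz DDR3 clock mentioned in the first post.

```python
def clock_period_ps(data_rate_mtps):
    """Clock period in picoseconds for a DDR data rate given in MT/s."""
    clock_mhz = data_rate_mtps / 2.0   # DDR: two transfers per clock cycle
    return 1e6 / clock_mhz             # 1 / MHz = microseconds; x 1e6 = ps


def unit_interval_ps(data_rate_mtps):
    """Time occupied by one data bit on a DQ line, in picoseconds."""
    return clock_period_ps(data_rate_mtps) / 2.0


fast, slow = 1333, 1266   # assumed to be MT/s, see the note above
gain = unit_interval_ps(slow) - unit_interval_ps(fast)
print(f"unit interval at {fast} MT/s: {unit_interval_ps(fast):.1f} ps")
print(f"unit interval at {slow} MT/s: {unit_interval_ps(slow):.1f} ps")
print(f"extra time per bit at the lower rate: {gain:.1f} ps")
```

Under that assumption the lower rate adds roughly 40 ps to each bit time, which gives a feel for how small the margin shortfall at full speed may be.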

Question -- if we suspect a "victim route" or other one-off problem affecting a byte lane or one out of the four DDR3 devices, and we go in with our highest speed digital scope, based on the above info, what signals would you start with first?

Thanks.

-Jeff

0 Bill Taboada over 11 years ago in reply to Jeff Brower73

TI__Mastermind 42545 points

Hi Jeff,

If you are doing 32-bit accesses in little endian starting at address 0x00, followed by 0x04, and so on, I would suspect that you are seeing a problem with the memory attached to data bits 47-32. Using Code Composer is another way to tell. Bring up the SDRAM in a memory window configured to display memory in hex64bit mode. That should point to the data lines that are failing pretty quickly.
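
The mapping described above can be sketched as a small helper that turns a failing 32-bit word into DQ bit ranges. This is an illustration only: the function name and parameters are hypothetical, and it assumes a 64-bit bus, little-endian byte order, an 8-byte-aligned memory base, and a straight (unswizzled) mapping from byte address to DQ bits.

```python
def failing_lanes(byte_addr, expected, observed, bus_bytes=8, lane_bits=16):
    """Return the DQ bit ranges (lo, hi) whose data differs from expected.

    Assumes little endian: lower byte addresses sit on lower DQ bits, and
    each x16 device owns one 16-bit lane of the 64-bit bus.
    """
    diff = expected ^ observed
    # Position of this 32-bit word within the 64-bit bus word, in bits.
    offset_bits = (byte_addr % bus_bytes) * 8
    lanes = []
    for lane in range(32 // lane_bits):
        mask = ((1 << lane_bits) - 1) << (lane * lane_bits)
        if diff & mask:
            lo = offset_bits + lane * lane_bits
            lanes.append((lo, lo + lane_bits - 1))
    return lanes


# Second 32-bit word (offset 0x04) reads back with its low half corrupted.
print(failing_lanes(0x04, 0x12345678, 0x12340000))   # prints [(32, 47)]
```

With the pattern from the first post, where the low 16 bits of every second word are wrong, this points at DQ[47:32] rather than DQ[63:48].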

When the memory is failing, are you seeing the same wrong value returned for each write, or does the value change? The same wrong value would indicate a write failure. A different value would indicate a read failure. It's odd that you were seeing zeros on that word and that it was affecting both bytes of the device. Since accesses are always in bursts, we normally see an incorrect value from an adjacent memory location. The fact that it's both bytes implies that it's a problem with a control signal or an address line.
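
The read-versus-write rule of thumb above can be sketched as a test that writes once and reads back several times. The `read_word` callable is hypothetical and stands in for whatever mechanism reads the target location (a debugger script, a test program on the DSP, and so on); the classification is only a heuristic, not a definitive diagnosis.

```python
def classify_failure(read_word, expected, reads=8):
    """Classify a failing location from repeated reads after a single write.

    read_word: hypothetical zero-argument callable returning the current
               value at the location under test.
    """
    samples = [read_word() for _ in range(reads)]
    bad = [s for s in samples if s != expected]
    if not bad:
        return "pass"
    # Every read returns the same wrong value: the stored data is likely bad.
    if len(bad) == reads and len(set(bad)) == 1:
        return "write suspect"
    # Values change between reads, or some reads succeed: the read path is suspect.
    return "read suspect"


# Simulated example: the value changes from one read to the next.
simulated = iter([0x12340000, 0x12345678, 0x12340001, 0x12340000])
print(classify_failure(lambda: next(simulated), 0x12345678, reads=4))   # prints read suspect
```

On real hardware the cache should be disabled or invalidated between reads so that each sample actually reaches the DDR3 interface.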

Regards, Bill

0 Jeff Brower73 over 11 years ago in reply to Bill Taboada

Genius 3420 points

Bill-

We are using CCS Memory Fill and the Memory Browser window -- I didn't see an option for hex64bit. For the results I gave in my initial post, we are using "32-bit unsigned integer" for "Type-size". If that changes your suspicion away from bits 47-32 please let me know.

Re. incorrect values -- yes they are intermittent. Scrolling the memory display window causes changes in the afflicted byte lanes, so it does appear to be a read problem. Among DDR3 control signals is there a "usual suspect" that is likely to succumb to marginal timing?

Also another data point: the board has the ECC memory device populated, but currently ECC is not enabled in our Gel file DDR3 config. Could that alter termination of control signals and cause a problem?

Thanks.

-Jeff

0 Bill Taboada over 11 years ago in reply to Jeff Brower73

TI__Mastermind 42545 points

Hi Jeff,

I don't suspect that the unpopulated ECC memory would cause this type of issue. Normally when we see this type of error we're dealing with a data length matching error. Are you seeing any failures during leveling? The fact that it affects both bytes on one device could just be an artifact of the close proximity of the data balls on that device. I would double check the lengths on the data lines and ensure that the lengths match your spreadsheet. Be sure to account for any portion of the trace that is routed on an outer layer. These lengths need to be adjusted due to the different propagation delay on the outer layer. In addition, be sure you are using the latest phycalc and regcalc spreadsheets.
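
The outer-layer adjustment can be illustrated by comparing nets in terms of delay rather than physical length. This is a sketch under stated assumptions: the per-inch delays below are generic placeholder figures for FR-4 style stackups, not values from TI or from any particular board, and the net names are hypothetical. Real numbers should come from the board stackup or the fabricator.

```python
# Assumed propagation delays in ps per inch (placeholders, not board data).
PS_PER_INCH = {"inner": 170.0, "outer": 150.0}


def trace_delay_ps(segments):
    """Total delay of a net given (layer, length_in_inches) segments."""
    return sum(length * PS_PER_INCH[layer] for layer, length in segments)


def skew_report(nets):
    """Delay of each net relative to the fastest net, in picoseconds."""
    delays = {name: trace_delay_ps(segs) for name, segs in nets.items()}
    fastest = min(delays.values())
    return {name: delay - fastest for name, delay in delays.items()}


# Two hypothetical nets with the same 2.0 inch physical length.
nets = {
    "DQ32": [("inner", 2.0)],
    "DQ33": [("outer", 0.5), ("inner", 1.5)],
}
for name, skew in skew_report(nets).items():
    print(f"{name}: +{skew:.1f} ps")
```

With these placeholder figures the two length-matched nets differ by 10 ps, which is the kind of hidden mismatch a length-only spreadsheet check would miss.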

Regards, Bill
