TMS320C6748: Sporadic corruption of SATA Read data

joeb

Part Number: TMS320C6748
Other Parts Discussed in Thread: MATHLIB

We are experiencing sporadic corruption of SATA read data on the TMS320C6748, both on a custom-developed circuit card assembly (CCA) and on the TMS320C6748 LCDK, when interfacing with SATA II drives (we have used several, including an M.2 SATA II solid-state drive and a WD SATA II HDD).

We are using the following software packages:
- XDC Tools 3.25.6.96
- SYS/BIOS 06.37.3.30 (includes FatFS R0.08a)
- C6748 PDK 2.00.00.00 (includes BIOSPSP 3.00.01.00 and NSP 1.10.00.03)
- EDMA 02.11.14.18
- DSPLIB 3.4.0.0
- MATHLIB 3.1.0.0
- C6000 Code Generation Tools (CGT) 7.4.21
- C6748 StarterWare 1.20.04.01
- SATA driver and Block Media driver from BIOSPSP 1.30.00.05

We are using Code Composer Studio 7.1.0.00016.

Some data points of interest:
- We are experiencing the same issues on our target CCA hosting the TMS320C6748 and on the L138/C6748 Development Kit (LCDK, board rev A6). We originally suspected layout issues on our target, but when the same behavior was seen on the LCDK, we considered our target layout validated.
- In the instances in which a sector read of the M.2 SATA II solid-state drive returns corrupted data, an external SATA analyzer inserted in series (Teledyne LeCroy Sierra M6-2) reports the correct data on the link, even though the C6748 returns corrupted data to our software application.
- We have a software workaround in place that performs multiple reads of the same sector, which seems to alleviate the corruption. We added this workaround at the Block Media layer, at the point where a SATA sector read is commanded. We read the same sector repeatedly until two successive reads return the same sector contents. Most of the time, the first 2 reads return the same data (i.e., no corruption). However, roughly 0.028% of the time a 3rd read is required (i.e., the data returned by the first 2 reads doesn't match, but the 3rd read matches the 2nd). This works out to 280 instances per 1,000,000 sector reads.
- If we extend our software workaround to the SATA write path (i.e., multiple reads following a SATA write to validate the data written), we see very rare cases in which 2 successive corroborating reads do not match the data written. In this case, we perform the write again and re-validate. We have seen this at a rate of 0.0013% (i.e., 13 instances per 1,000,000 sector writes).
- The M.2 SATA SSD and WD SATA HDD work perfectly when mounted by Linux, so the drives themselves are considered validated.
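The corroborated-read workaround and its write-verify extension described above can be sketched in C roughly as follows. This is a minimal, host-testable sketch under stated assumptions, not the actual Block Media code from the thread: the callback types `sector_read_fn` / `sector_write_fn`, the 512-byte sector size, and the retry caps are all hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE        512u /* assumed sector size */
#define MAX_READ_ATTEMPTS    8u /* hypothetical cap on reads per sector */
#define MAX_WRITE_ATTEMPTS   4u /* hypothetical cap on write retries */

/* Hypothetical driver hooks: return 0 on success, nonzero on error. */
typedef int (*sector_read_fn)(void *ctx, uint32_t lba, uint8_t *buf);
typedef int (*sector_write_fn)(void *ctx, uint32_t lba, const uint8_t *buf);

/* Read one sector repeatedly until two successive reads agree.
 * Returns the number of reads it took (2 in the common case),
 * or -1 on a driver error or if the attempt cap is reached. */
int corroborated_read(sector_read_fn rd, void *ctx, uint32_t lba, uint8_t *out)
{
    uint8_t prev[SECTOR_SIZE];
    uint8_t cur[SECTOR_SIZE];

    if (rd(ctx, lba, prev) != 0)
        return -1;
    for (unsigned n = 2; n <= MAX_READ_ATTEMPTS; ++n) {
        if (rd(ctx, lba, cur) != 0)
            return -1;
        if (memcmp(prev, cur, SECTOR_SIZE) == 0) {
            memcpy(out, cur, SECTOR_SIZE);
            return (int)n;
        }
        memcpy(prev, cur, SECTOR_SIZE); /* compare the next read to this one */
    }
    return -1;
}

/* Write a sector, then validate it with a corroborated read; rewrite on
 * mismatch. Returns the number of writes used, or -1 on failure. */
int verified_write(sector_write_fn wr, sector_read_fn rd, void *ctx,
                   uint32_t lba, const uint8_t *data)
{
    uint8_t check[SECTOR_SIZE];

    for (unsigned n = 1; n <= MAX_WRITE_ATTEMPTS; ++n) {
        if (wr(ctx, lba, data) != 0)
            return -1;
        if (corroborated_read(rd, ctx, lba, check) < 0)
            return -1;
        if (memcmp(check, data, SECTOR_SIZE) == 0)
            return (int)n;
    }
    return -1;
}
```

Returning the attempt count (rather than just success/failure) makes it straightforward to log how often a 3rd read or a second write is needed, which is how rates like 0.028% and 0.0013% can be measured.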

We were suspicious of some of the Port PHY Control Register values (e.g. RXEQ, RXCDR, RXTERM) but haven't been able to come up with a combination that makes any appreciable difference in our corruption. Our current values are assigned as follows:
- RXEQ: 0x1 (Adaptive)
- LB: 0x1 (Ultra High Bandwidth)
- RXCDR: 0x6 (First order, threshold of 1 with fast lock)
- RXTERM: 0x1 (Common point set to 0.8 VDDA)
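For reference, a read-modify-write that applies these four receive-side values to a P0PHYCR register image might look like the sketch below. The bit positions are assumptions: RXCDR (bit 10) and RXEQ (bit 13) match the shift macros in the Linux `ahci_da850` driver, while LB (bits 5:4) and RXTERM (bits 9:8) are a reading of the device TRM's P0PHYCR description. Verify all of them against the TRM before use. MPY and the other fields are left untouched.

```c
#include <stdint.h>

/* ASSUMED P0PHYCR field positions; check against the device TRM. */
#define P0PHYCR_LB_SHIFT      4u /* bits 5:4   */
#define P0PHYCR_LB_WIDTH      2u
#define P0PHYCR_RXTERM_SHIFT  8u /* bits 9:8   */
#define P0PHYCR_RXTERM_WIDTH  2u
#define P0PHYCR_RXCDR_SHIFT  10u /* bits 12:10 */
#define P0PHYCR_RXCDR_WIDTH   3u
#define P0PHYCR_RXEQ_SHIFT   13u /* bits 16:13 */
#define P0PHYCR_RXEQ_WIDTH    4u

/* Replace one bit field in a register image, leaving all other bits intact. */
static uint32_t set_field(uint32_t reg, unsigned shift, unsigned width,
                          uint32_t value)
{
    uint32_t mask = ((1u << width) - 1u) << shift;
    return (reg & ~mask) | ((value << shift) & mask);
}

/* Apply the receive-side settings listed above to an existing P0PHYCR image. */
uint32_t p0phycr_apply_rx_settings(uint32_t reg)
{
    reg = set_field(reg, P0PHYCR_RXEQ_SHIFT,   P0PHYCR_RXEQ_WIDTH,   0x1u); /* adaptive */
    reg = set_field(reg, P0PHYCR_LB_SHIFT,     P0PHYCR_LB_WIDTH,     0x1u); /* ultra high bandwidth */
    reg = set_field(reg, P0PHYCR_RXCDR_SHIFT,  P0PHYCR_RXCDR_WIDTH,  0x6u); /* 1st order, thr 1, fast lock */
    reg = set_field(reg, P0PHYCR_RXTERM_SHIFT, P0PHYCR_RXTERM_WIDTH, 0x1u); /* 0.8 VDDA common point */
    return reg;
}
```

Under these assumed positions, the four fields alone contribute 0x3910 to the register value.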

Any assistance or direction would be appreciated.

over 5 years ago

Yordan Kovachev over 5 years ago

Hi,

I've notified the RTOS team. They will post their feedback directly here.

Best Regards,
Yordan

joeb over 5 years ago in reply to Yordan Kovachev

Thank you...looking forward to hearing their feedback.

Mark Mckeown over 5 years ago in reply to joeb

Hi,

I will investigate this issue.
When you say the SATA drives work perfectly when mounted by Linux, what hardware are you using?
Where does your Teledyne LeCroy Sierra M6-2 attach to observe the signals?
Does your custom board use the same components, connector, and oscillator as the LCDK EVM?
How do you power the SATA drives?
What data rate are you using? Is it reproducible at slower data rates?
See e2e.ti.com/.../116687
It does not sound like you are running into SPRZ301M Advisory 2.3.22 SATA: Link Establishment Fails With SATA GEN3 Capable Targets. Please confirm.

Regards,
Mark

joeb over 5 years ago in reply to Mark Mckeown

Mark,

Thank you for your assistance.

I agree that the issue is not SPRZ301M Advisory 2.3.22. We are deliberately using SATA II drives (one is an M.2 SATA II SSD from Fortasa, and the other is a WD SATA II HDD) to circumvent that. And we have absolutely no issues with link establishment. The SATA controller enumerates the drive without issue.

As for mounting via Linux, we use an M.2-to-SATA adapter sled, connect it to a SATA-to-USB adapter, and then connect the drive to the Linux desktop via USB. Linux mounts the file system, and we can read and write files all day long without any CRC failures in our write/read/compare testing. So we know our drives are good.

When we had the SATA analyzer in the loop, it was inserted in series between the M.2 connector on our custom CCA and the M.2 drive (we needed some adapters and external SATA cables to wire it up).

On our custom board, the connector to the M.2 SSD is a surface-mount M.2 connector (on the eval kit, it's a SATA connector). We are using the same 0.01 uF coupling caps. Instead of the 100 MHz VCO used as the reference clock on the eval kit, we are using a 150 MHz low-jitter, high-performance LVDS oscillator on our custom CCA that's designed for SATA/SAS/Fibre Channel (and we have, of course, set the MPY field in P0PHYCR accordingly to produce a PHY PLL output frequency of 1.5 GHz, just like the eval kit). On our custom board, the M.2 drive is powered by an on-board 3.3 V supply that's under software control and fed to the drive via the M.2 connector. On the eval kit, the drive is powered externally by an AC/DC power brick.
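The MPY arithmetic is simply PLL output / reference clock: 1.5 GHz / 100 MHz = 15x on the eval kit and 1.5 GHz / 150 MHz = 10x on the custom CCA. The sketch below turns that into a field code. The code table is an assumption that mirrors the Linux `ahci_da850` driver's MPY mapping; confirm it against the TRM. The function name is hypothetical.

```c
#include <stdint.h>

/* Target PHY PLL output frequency (1.5 GHz). */
#define SATA_PLL_OUT_HZ 1500000000u

/* Return the P0PHYCR MPY code for a reference clock, or 0 if the ratio
 * is not an exact supported multiplier. The ratio is computed in tenths
 * so that the 12.5x multiplier can be represented (as 125). */
uint32_t sata_mpy_code(uint32_t refclk_hz)
{
    if (refclk_hz == 0u || refclk_hz % 10u != 0u)
        return 0u;

    uint32_t ref_tenth = refclk_hz / 10u;
    if (SATA_PLL_OUT_HZ % ref_tenth != 0u)
        return 0u; /* not an exact multiple: no valid MPY */

    switch (SATA_PLL_OUT_HZ / ref_tenth) { /* multiplier x 10 */
    case 50:  return 0x1u; /* 5x    */
    case 60:  return 0x2u; /* 6x    */
    case 80:  return 0x4u; /* 8x    */
    case 100: return 0x5u; /* 10x   */
    case 120: return 0x6u; /* 12x   */
    case 125: return 0x7u; /* 12.5x */
    case 150: return 0x8u; /* 15x   */
    case 200: return 0x9u; /* 20x   */
    case 250: return 0xAu; /* 25x   */
    default:  return 0u;
    }
}
```

For example, `sata_mpy_code(150000000u)` yields the 10x code used on the custom CCA, and `sata_mpy_code(100000000u)` yields the 15x code for the eval kit's 100 MHz reference.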

On both our custom board and the eval kit, we are able to switch between the M.2 SATA II SSD from Fortasa and the WD SATA II HDD via a variety of adapters that we have (e.g., M.2-to-SATA adapter, SATA-to-M.2 adapter, etc.).

As I stated, we see the issue on both the eval kit and our custom board. When we do see corruption, it's typically in the first few bytes of the sector. We are using a FatFS file system and block media layer, which never issues a write/read greater than 1 sector.
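To characterize where the corruption lands (e.g., to confirm it is confined to the first few bytes), one option is to log the span of differing bytes whenever two reads of the same sector disagree. This is a hypothetical diagnostic helper, not part of FatFS or the Block Media driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Find the first and last byte offsets at which two reads of the same
 * sector differ. Returns 1 and fills *first / *last if they differ,
 * or 0 (leaving the outputs untouched) if the buffers are identical. */
int sector_diff_range(const uint8_t *a, const uint8_t *b, size_t len,
                      size_t *first, size_t *last)
{
    int found = 0;

    for (size_t i = 0; i < len; ++i) {
        if (a[i] != b[i]) {
            if (!found) {
                *first = i;
                found = 1;
            }
            *last = i;
        }
    }
    return found;
}
```

Collecting these offsets over many mismatches would show whether the damage always starts at offset 0 and how far into the sector it extends.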

We are running at Gen 2 speed (3.0 Gbps). We have tried Gen 1 speed (1.5 Gbps) and still saw corruption, but did not test extensively enough to characterize it. Note that we looked at the TX and RX differential signals using a high-speed differential scope, and the eye diagrams (RX probed between the coupling caps and the DSP itself) look great, so we are not leaning towards a signal integrity issue on our custom CCA (and the fact that we see the issue on the eval kit lends credence to that opinion).

Please let me know if you have any other questions. Thank you,

Joe

Mark Mckeown over 5 years ago in reply to joeb

Hi Joe,

Very sorry for the late response. Any new updates?

I'm not able to reproduce the issue as I don't have a Gen 2 drive or the right adapter handy.

I asked around and nobody has encountered this type of issue before. I have found out that not many people use SATA on the C6748 because of the Gen3/Gen2 errata.

I will reach out to the automotive and design teams.

-Mark

Mark Mckeown over 5 years ago in reply to Mark Mckeown

Hi Joe,

After following up with the design team, I learned something for you to try out with the SATA corruption issue.

Most of the SATA data corruption problems encountered were caused by a race condition between descriptor completion and data completion, i.e., the SATA controller would signal 'complete' to the processor, and the processor would read the memory location BEFORE the data from the SATA controller had actually landed.

A workaround was to insert some sort of MMR (memory-mapped register) read between the completion signal and reading the data. Would you give that a try?
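The suggested workaround can be sketched as follows: after the completion signal, read back a SATA controller register before touching the DMA buffer. This is a host-testable sketch of the idea only, not a confirmed fix. Which register to read (for example, the port's interrupt status register) is an assumption, as is the function name. The buffer is accessed through a `volatile` pointer so the compiler cannot move the buffer reads ahead of the register read-back; on the target, a cacheable buffer would additionally need to be invalidated (e.g., with SYS/BIOS `Cache_inv()`) before it is read.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a completed DMA transfer out of dma_buf only after a read-back of
 * a controller MMR (ctrl_mmr is an assumed register, e.g. port interrupt
 * status). Returns the value read back so callers can log or ignore it. */
uint32_t sata_copy_after_completion(const volatile uint32_t *ctrl_mmr,
                                    const volatile uint8_t *dma_buf,
                                    uint8_t *out, size_t len)
{
    /* Dummy MMR read: volatile, so the compiler must emit it here, before
     * any of the volatile buffer reads below. */
    uint32_t readback = *ctrl_mmr;

    /* On the target: invalidate dma_buf here if it is cacheable (not shown). */

    for (size_t i = 0; i < len; ++i)
        out[i] = dma_buf[i];

    return readback;
}
```

In the corroborated-read workaround, this read-back would sit at the point where the Block Media layer is notified that the sector read has completed.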

Regards,
Mark
