TMS320DM8168CCYGA2 BCH-8 ECC implementation

Glenn Varnon42

Other Parts Discussed in Thread: 4430

I am trying to verify the BCH-8 ECC code in a flash device that was generated by a TI TMS320DM8168CCYGA2. We are trying to get a programmer to program a flash device that will generate the same BCH code that the TI part generates. For some reason the BCH codes do not match. We need to understand how the code is generated in your chip.

over 9 years ago

0 Sivaraj Kuppuraj over 9 years ago

TI__Mastermind 35645 points

Hi,

Moving your post to right forum to be better answered.

Thanks & regards,
Sivaraj K

0 Glenn Varnon42 over 9 years ago in reply to Sivaraj Kuppuraj

Prodigy 100 points

What forum does it belong to?

0 Pavel Botev over 9 years ago

TI__Guru**** 170625 points

Hi Glenn,

You can flash files in NAND from u-boot, linux kernel and CCS tool. From what I understand you need more details on the CCS tool. Please see if the below pointers will help:

processors.wiki.ti.com/index.php
processors.wiki.ti.com/.../TI81XX_PSP_Flashing_Tools_Guide
processors.wiki.ti.com/.../DM816x_AM389x_PSP_Flashing_Tools_Guide
processors.wiki.ti.com/.../TI81XX_PSP_FAQ
processors.wiki.ti.com/.../Raw_NAND_ECC
processors.wiki.ti.com/.../TI81XX_PSP_NAND_Driver_User_Guide
www.ti.com/.../spry164.pdf

BR
Pavel

0 Glenn Varnon42 over 9 years ago in reply to Pavel Botev

Prodigy 100 points

What I really need to know is how the TMS320DM8168CCYGA2 generates the BCC-8 ECC code.

0 Pavel Botev over 9 years ago in reply to Glenn Varnon42

TI__Guru**** 170625 points

Glen,

Refer to:

DM816x datasheet, section 9.8 General-Purpose Memory Controller (GPMC) and Error Locator Module (ELM)
DM816x TRM, sections 1.15 1.15 Error Location Module, chapter 9 GPMC

BR
Pavel

0 Glenn Varnon42 over 9 years ago in reply to Pavel Botev

Prodigy 100 points

Well, my understanding is that when the TMS320DM8168CCYGA2 writes the BHC error correction code to the user area of the Micron MT29F4G16ABADAH4-IT we use four BCH-8bit (13 bytes long) ECC code for each 2K block (one for each 512 bytes). Does that sound correct?

0 Pavel Botev over 9 years ago in reply to Glenn Varnon42

TI__Guru**** 170625 points

Glenn,

See if the below wiki page will help:
processors.wiki.ti.com/.../TI81XX_PSP_UBOOT_User_Guide

BR
Pavel

0 Glenn Varnon42 over 9 years ago in reply to Pavel Botev

Prodigy 100 points

I looked at the user guide and it tells me how to command the chip to output the 8 bit BCH ECC to a NAND chip. What I really need is the algorithm that generates the 8 bit BCH error correction code. I am trying to get my programmer to generate the exact error correction code that this chip does.

Currently I have some C code that I am using to generate the BCH-8 ECC but it is not giving me the same results that this chip is giving me. So I thought I would ask if any one has any C source code that successfully emulated the TI TMS320DM8168CCYGA2's BCH-8 ECC generation to see maybe where I went wrong.

0 Pavel Botev over 9 years ago in reply to Glenn Varnon42

TI__Guru**** 170625 points

Glenn,

Regarding SW BCH8 ECC, I can provide you the below pointers:

processors.wiki.ti.com/.../Linux_Core_NAND_User's_Guide
processors.wiki.ti.com/.../Error_Correction_User_Guide

e2e.ti.com/.../214826
e2e.ti.com/.../1415407

u-boot/arch/arm/cpu/arm_cortexa8/omap3/board.c
u-boot/drivers/mtd/nand/omap_gpmc.c
u-boot/drivers/mtd/nand/nand_base.c

BR
Pavel

0 Matthijs van Duin over 9 years ago in reply to Glenn Varnon42

Mastermind 8040 points

Although Pavel already linked to my detailed explanation (second e2e link) let me summarize: it's exactly the same as a standard CRC32 calculation, except:

1. it's considerably longer: 104-bit. If you have an uint128_t available then this is not really an issue.
2. no pre/post inversion

And that's it! The "CRC" polynomial (including leading 1-bit) is: bch8 = 0x115f914e07b0c138741c5c4fb23

All usual speed-ups for CRCs apply. Unlike typical CRCs, the bch8 polynomial is not irreducible but splits into eight factors of degree 13, so you can also calculate two 52-bit or four 26-bit CRCs in parallel and then recombine the results at the end. This may be considerably faster.

UPDATE: added missing leading 1-bit

0 Glenn Varnon42 over 9 years ago in reply to Matthijs van Duin

Prodigy 100 points

That is interesting. Where did you get the polynomial?

Below is my code for creating the CRC for the 104 bit BCH code. I do not have unit128 (boy I wish I did) since I'm writing in Microsoft C#. But I should be your typical CRC generation code just exapnded to a UInt64 (64 bit number). This is as big as I can go. Take a look and let me know if this general code would work with the only addition being the use of "unit128.

Basically, this code lets you select a file, then run the text data in it through the CRC process.

*******************************************************************************************************************************************

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace CR_104
{
public partial class Form1 : Form
{
public static class Constraint
{
public static String CRC_Version = "0.0";
public static int FALSE = 0;
public static int TRUE = 1;

}
public static class Global_variables
{
public static String FileData = "F"; // can change because not const
public static String File_Name = "F";
public static String data = "F";
public static System.IO.StreamReader Datafile;
public static UInt64 crc_tab104_init = 0;
public static UInt64[] crc_tab104 = new UInt64[256];
//public static BigInteger crc = new BigInteger(0);
;

}

public Form1()
{
int i =0;
InitializeComponent();

}
/****************************************************************************************************************************************
* Start Button:
* When you hit the start button, the data from the file selected will be manipulated to provide a BCH-8 output for the entire file.
*
* *********************************************************************************************************************************************
*/

private void Start_Button_Click(object sender, EventArgs e)
{

UInt64 crc_code = 0;
string Data; // Clear out DATA.
int number_of_elements = 0;

Data = Global_variables.FileData;

while (number_of_elements < Data.Length )
{

crc_code = update_crc_104(crc_code,Data[number_of_elements]);
++number_of_elements;

}

string hexValue = crc_code.ToString("X");

Output_Data_Box.Text = String.Empty; // This clears the data in the box.
Output_Data_Box.Text += hexValue; // This puts the data in the box.

}

private void Form1_Load(object sender, EventArgs e)
{

}

/********************************************************************************************************************************
*
* File Select button:
* This opens the file and puts the data into "FileData".
*
* ******************************************************************************************************************************
*/

private void File_Select_Button_Click(object sender, EventArgs e)
{
if (openFileDialog1.ShowDialog() == DialogResult.OK)
{

if ((openFileDialog1.OpenFile()) != null)
{
System.IO.StreamReader Datafile = new System.IO.StreamReader(openFileDialog1.OpenFile());
Global_variables.FileData = Datafile.ReadToEnd();

File_Selected_Box.Text = String.Empty; // This clears the data in the box.
File_Selected_Box.Text = "" + openFileDialog1.FileName + ""; // This puts the data in the box.
}

}
}

//***************** If any one tries to put data in the data box, clear it out. ***************************************
private void Output_Data_Box_TextChanged(object sender, EventArgs e)
{

}
//********************* If anyone tries to put data in the file box, clear it out. ******************************
private void File_Selected_Box_TextChanged(object sender, EventArgs e)
{

}

public void init_crc104_tab()
{

int i, j;
UInt64 crc = 0;
UInt64 c;

UInt64 P_104 = 1583319698367058823; // This is the decimal value for Hex value = 0x15f914e07b0c138741c5c4fb23

for (i = 0; i < 256; i++)
{
//crc = 0;

c = (UInt64)i;

for (j = 0; j < 8; j++)
{
if (((crc ^ c) & 0x01L)>0) crc = (crc >> 1) ^ P_104;
else crc = crc >> 1;

//c = c >> 1;
}

Global_variables.crc_tab104[i] = crc;
}

Global_variables.crc_tab104_init = 1;

}

public UInt64 update_crc_104(UInt64 crc, char next_data_byte)
{

UInt64 tmp, long_c;
UInt64 decimal_value = 0;

decimal_value = Convert.ToUInt64(next_data_byte);

long_c = (0xffL & (UInt64) next_data_byte);

if (Global_variables.crc_tab104_init < 1) init_crc104_tab();

tmp = crc ^ long_c;
crc = (crc >> 8) ^ Global_variables.crc_tab104[tmp & 255];

return crc;
}

}

}

0 Glenn Varnon42 over 9 years ago in reply to Matthijs van Duin

Prodigy 100 points

So I rewrote my code to add something called BigInteger to my code. This allowed me to create a CRC for a text file that I read that had some data and create the CRC.

I used a known set of data that was loaded into a flash device by the TI TMS320DM8168CCYGA2. I then loaded that file data into my program and let it calculate a CRC for the data (in hex).

Here is the data that the TI chip says should be put in the OOB area of the nand flash device:

ff ff a8 ac 28 af b4 30
81 07 cf 4e 66 1c dc 00
1b b9 70 e6 f7 5f 09 a9
e2 b9 0d 7b 11 00 85 ed
70 73 40 29 e3 44 ee 7b
ee 4e f8 00 13 34 42 6b
ce 7b 49 a6 e3 be 54 37
59 00 ff ff ff ff ff ff

I can send you the data if you want to check that out and see if you can get a similar output.

Here is the outcome of my program.

4A93130DE595DDE.

As you can see, there are many more bytes that are produced by the BCH-8 algorithm than my CRC produces.

If you want, I can send you the code.

0 Matthijs van Duin over 9 years ago in reply to Glenn Varnon42

Mastermind 8040 points

Glenn Varnon42 said:
That is interesting. Where did you get the polynomial?

The GPMC documentation is reasonably detailed except for the actual BCH parameters used. Using this knowledge the polynomial for each of the three BCH codes was easily obtained by feeding test data into the GPMC: feeding it all-zeros except for a final 0x01 byte makes the ECC code equal to the polynomial (except for the leading 1-bit, which is actually the 0x01 byte of the data).

CAUTION: the GPMC chapter is damaged is several TRMs including the one of the DM816x, specifically all superscripted text seems to have disappeared there, which is rather inconvenient when discussing polynomials, or powers of two. An undamaged version can be found e.g. in the OMAP 4430 TRM, chapter 15.4.4.13.3 for the ECC calculation.

I then further examined the polynomials using PARI/GP, which revealed that the BCH4 polynomial was a factor of the BCH8 polynomial which itself was a factor of the BCH16 polynomial, i.e. they all came from the same BCH family, which makes perfect sense. Factoring the BCH4 polynomial showed one factor to be x¹³+x⁴+x³+x+1, which is a nice pentomial and the lexicographically first primitive GF(2)-polynomial of degree 13, hence the most obvious choice for constructing the field GF(2¹³). If we call its generator α and let f_k be the minimum polynomial of α^k then the BCH4 polynomial is f₁ · f₃ · f₅ · f₇ and similarly the BCH8 and BCH16 polynomials are the products of f_k for all odd k up to 15 and 31 respectively. (The even k play no role because f_2k = f_k for all k.)

Unless you have a solid background in algebra the paragraph above may be rather hard to follow. There's also a DSP way of interpreting it: the field GF(2¹³) has 8191-th roots of unity, which is precisely what you need to be able to perform a length-8191 discrete fourier transform. The ECC code that's appended to the data (at least formally, even if it's physically stored elsewhere) ensures that the first 8 odd spectrum lines (in case of BCH8) are zero. Calculation of such a code is easily done in the time domain using an IIR filter of order 104 (all poles, no zeros). Since we're doing arithmetic in finite fields, all filters are stable so there's no need to flee in terror upon seeing that filter order ;-) If later any of those spectrum lines turn out nonzero, reconstructing which combination of impulses is responsible is left as an exercise to the reader.

(Note there's actually no specific reason to select the number of correctible errors to be a power of two. If for example the OOB area of your NAND flash has enough room to accomodate 18 ECC bytes per sector then you could use BCH11 by letting the GPMC calculate the BCH16 code while writing data and then reducing it to BCH11 in software. Reading would be a bit trickier, although configuring GPMC and ELM to BCH8 would at least work for up to 8 errors, more creativity would be needed to get the full 11 bits of error correction.)

Below is my code for creating the CRC for the 104 bit BCH code. I do not have unit128 (boy I wish I did) since I'm writing in Microsoft C#. But I should be your typical CRC generation code just exapnded to a UInt64 (64 bit number).

There's something I do want to comment about this fairly commonly seen implementation (slightly rewritten from your version thereof):

code = (code >> 8) ^ table[(code & 255) ^ input_u8];

Both CRC32 (the most common variant) and the BCH codes used by TI are big-endian, hence bytes should actually be shifted in from the right:

code = (code << 8) ^ table[(code >> (NUMBITS-8)) ^ input_u8];

The reason the first version works nevertheless is because every step of the calculation operates only on entire bytes. It requires that you perform byte-reversal when reading or storing multi-byte quantities (such as the final code itself), but since the data is supposed to be big-endian and you're using a little-endian platform this byte-reversal in fact already happens implicitly, and therefore the right-shift version actually avoids having to perform byte-reversal to compensate for the different endianness. This matters in particular if you process data in larger chunks, e.g.

code ^= input_u32le; // wrong endianness compensated by wrong shift direction
code = (code >> 8) ^ table[code & 255];
code = (code >> 8) ^ table[code & 255];
code = (code >> 8) ^ table[code & 255];
code = (code >> 8) ^ table[code & 255];

It's a nice trick, but it is something to beware of since it somewhat obfuscates the code, requires the table entries to be byte-reversed, and it fails spectacularly if you ever want to process a chunk of data smaller than a byte. For example, if the current code is 0x1100 then repeatedly shifting in four zero-bits would need to have the following sequence of intermediate codes:

0x120000
0x200100
0x001200
0x002001
0x000002 ^ table[ 0x01 ]
0x000000 ^ table[ 0x12 ]

This very clearly reveals it's actually left-shifting, but byte-reversed.

So I rewrote my code to add something called BigInteger to my code.

Yeah I was going to suggest that, although it'll be slow. A faster way to calculate BCH8 would be to use a pair of 64-bit integers to simulate a 128-bit one, or split the whole calculation into two parallel ones using polynomials 0x14523043ab86ab and 0x141ef659bed539 and then recombine the two results at the end using the Chinese Remainder Theorem.

You could also practice first with generating BCH4 codes, since those fit in a 64-bit integer, although to be able to compare with the GPMC on the DM816x you'd need to use the same wrong polynomial (see its errata) and accept that the code is useless for error-correcting purposes.

I used a known set of data that was loaded into a flash device by the TI TMS320DM8168CCYGA2.

BTW, there's no reason to be this specific about the processor. Exactly the same BCH8 code is used on numerous TI SoCs. The only difference is that some (including the DM816x) have a bug in its BCH4 implementation, and older ones (e.g. OMAP3) do not support BCH16.

Here is the data that the TI chip says should be put in the OOB area of the nand flash device:

I've reformatted it to make its structure more clear:

ff ff // bad block mark
a8 ac 28 af b4 30 81 07 cf 4e 66 1c dc 00 // ecc for sector 0
1b b9 70 e6 f7 5f 09 a9 e2 b9 0d 7b 11 00 // ecc for sector 1
85 ed 70 73 40 29 e3 44 ee 7b ee 4e f8 00 // ecc for sector 2
13 34 42 6b ce 7b 49 a6 e3 be 54 37 59 00 // ecc for sector 3
ff ff ff ff ff ff // unused space

Beware of the null-byte appended to each ecc as padding.

4A93130DE595DDE

That looks implausibly short. In fact it would still fit in a 64-bit integer, so something clearly went very wrong there.

I suggest validating your code first with the simple test case I mentioned at the top of this post: a sector consisting of 511 0x00 bytes and a final 0x01 byte should produce the polynomial as ECC code (except for the leading 1-bit, which is in fact the 0x01 of the last data byte):

15 F9 14 E0 7B 0C 13 87 41 C5 C4 FB 23

UPDATE: I just noticed my previous post accidently omitted the leading 1-bit of the polynomial; I have corrected this now.

Processors

Processors forum

TMS320DM8168CCYGA2 BCH-8 ECC implementation