Sitara AM3352ZCZ60 BCH (ECC) algorithm source code request

Frantisek Zabecky

Expert 2090 points

Hello,

For mass-production NAND flash programming, we need to pre-process the raw binary image from our client so that it is compatible with the GPMC ECC controller of the Sitara AM3352ZCZ60 processor. The ECC algorithm in use is BCH (BCH-8).

In particular, our programmer needs to calculate the required ECC checksums and append them to the client's data. The available images don't contain the spare area.

So we are looking for BCH algorithm source code, or any other clear material that can help us implement the necessary software engine.

Frantisek

over 10 years ago

0 Biser Gatchev-XID over 10 years ago

TI__Guru**** 393215 points

Hi Frantisek,

I have forwarded your request to the factory team.

0 Matthijs van Duin over 10 years ago

Mastermind 8040 points

Performing a BCH calculation is basically the same as a CRC calculation, but with a different "polynomial" / "magic constant". Also, the BCH codes are much wider than a typical CRC: a BCH-n tag is n*13 bits wide (104 bits for BCH-8), so you either need a compiler that supports a 128-bit integer type (for example unsigned __int128 in GCC/Clang) or have to split the code into separate words. Still, since you only need to shift and xor, this isn't too bad.

In this post I identify the particular BCH code used and include an example of converting a BCH-8 tag to a BCH-4 tag (as workaround for older devices whose BCH-4 calculator in the GPMC was broken). It includes a table-driven implementation similar to many CRC32 implementations (although I use a 16-entry lookup table rather than a 256-entry table). Note however that this one operates in "reverse direction" (shifting right rather than left) because it's used to reduce a tag, working backwards from the end.

Using a 128-bit integer type and a 256-entry lookup table, the BCH-8 code would look something like

#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 bch8_t; // GCC/Clang extension; 13 bytes (104 bits) used

static bch8_t table[256]; // must be precomputed before use

bch8_t calc_bch8( const uint8_t data[], size_t len, bch8_t code = 0 ) {
    for( size_t i = 0; i < len; i++ )
        code = ( code << 8 ) ^ table[ ( code >> 96 & 0xff ) ^ data[i] ];
    // mask off garbage that's in the top 3 bytes
    // (mask built by shifting, since there are no 128-bit integer literals)
    return code & ( ( (bch8_t)1 << 104 ) - 1 );
}

All the usual techniques used to speed up CRC calculation can be applied to BCH also. Precomputing the table is likewise similar, using "CRC polynomial" 0x15f914e07b0c138741c5c4fb23.
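As a rough sketch of that precomputation: the table can be filled bit by bit, exactly as for a left-shifting CRC. This assumes GCC/Clang's unsigned __int128 on a 64-bit host, and it assumes the constant above is the 104-bit generator with its x^104 term left implicit (as is usual for CRC polynomials). The names init_bch8_table, BCH8_POLY and BCH8_MASK are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// GCC/Clang extension (assumption: 64-bit host); only the low 104 bits are used.
__extension__ typedef unsigned __int128 bch8_t;

// 128-bit constants are assembled from 64-bit halves, since C++ has no
// 128-bit integer literals.
const bch8_t BCH8_MASK = (((bch8_t)1) << 104) - 1;
// Polynomial quoted above, assumed to have its x^104 term implicit.
const bch8_t BCH8_POLY = (((bch8_t)0x15f914e07bULL) << 64) | 0x0c138741c5c4fb23ULL;

static bch8_t table[256];

// table[i] = (i * x^104) mod generator, computed bit by bit, MSB first.
void init_bch8_table() {
    for (unsigned i = 0; i < 256; i++) {
        bch8_t r = (bch8_t)i << 96;          // byte i placed in bits 96..103
        for (int bit = 0; bit < 8; bit++) {
            bool top = (r >> 103) & 1;       // bit about to be shifted out
            r = (r << 1) & BCH8_MASK;
            if (top)
                r ^= BCH8_POLY;              // reduce by xoring in the generator
        }
        table[i] = r;
    }
    // Sanity check: a single 1 bit shifted out leaves the polynomial itself.
    assert(table[1] == BCH8_POLY);
}
```

Call init_bch8_table() once at startup; the table is 256 * 16 bytes = 4 KiB, so it could equally be generated offline and stored as a constant array.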

Since the data is treated as being in big-endian (the code is shifted left for each data byte), the code must be appended in big-endian order as well. (Note however that little-endian order is used for the code in the GPMC's and ELM's registers.)
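A minimal sketch of writing the code out most significant byte first, again assuming GCC/Clang's unsigned __int128. The name store_bch8_be is hypothetical, and where the 13 bytes go within the spare area is not covered here.

```cpp
#include <cassert>
#include <cstdint>

// GCC/Clang extension (assumption); only the low 104 bits are used.
__extension__ typedef unsigned __int128 bch8_t;

// Write the 104-bit code as 13 bytes, most significant byte first.
void store_bch8_be(bch8_t code, uint8_t out[13]) {
    assert((code >> 104) == 0);              // top 3 bytes must already be masked off
    for (int i = 0; i < 13; i++)
        out[i] = (uint8_t)(code >> (8 * (12 - i)));
}
```

A handy self-check follows from the CRC analogy: with a zero initial value and no inversion, running the same calculation over the data followed by these 13 bytes should give a code of zero.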

0 Matthijs van Duin over 10 years ago in reply to Matthijs van Duin

Mastermind 8040 points

An alternative that processes wordwise:

static bch8_t table[256];  // put the index into byte 13 of each table entry

bch8_t calc_bch8( const uint32_t data[], size_t nwords, bch8_t code = 0 ) {
    for( size_t i = 0; i < nwords; i++ ) {
        code ^= (bch8_t)__builtin_bswap32( data[i] ) << (104 - 32);  // bswap assumes a little-endian host
        code = ( code << 8 ) ^ table[ code >> (104 - 8) ];
        code = ( code << 8 ) ^ table[ code >> (104 - 8) ];
        code = ( code << 8 ) ^ table[ code >> (104 - 8) ];
        code = ( code << 8 ) ^ table[ code >> (104 - 8) ];
    }
    return code;
}

In this case I'm keeping bytes 13-15 of the code zero at all times, which avoids the need for masking. This is maintained by putting the table index (which is byte 12 of the code) into byte 13 of the table entry, which will cancel against byte 13 of ( code << 8 ). You can use the same principle to process per doubleword (or any other amount up to 13 bytes at a time), so if efficiency is a concern you can experiment with that (and with alternatives to a 128-bit integer type, which is likely to be slow if supported at all).
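One such alternative, for compilers whose largest integer is 64 bits, is to split the 104-bit code across two 64-bit words. The following is only a bytewise sketch under that assumption. The struct Bch8 and the helper names are invented for illustration, and the polynomial is the one quoted earlier, taken to have its x^104 term implicit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 104-bit code in two words: hi holds bits 64..103 (40 bits), lo holds bits 0..63.
struct Bch8 {
    uint64_t hi;
    uint64_t lo;
};

const uint64_t HI_MASK = (1ULL << 40) - 1;
const Bch8 POLY = { 0x15f914e07bULL, 0x0c138741c5c4fb23ULL };

static Bch8 table[256];

// table[i] = (i * x^104) mod generator, computed bit by bit, MSB first.
void init_table() {
    for (unsigned i = 0; i < 256; i++) {
        Bch8 r = { (uint64_t)i << 32, 0 };       // byte i in bits 96..103
        for (int bit = 0; bit < 8; bit++) {
            bool top = (r.hi >> 39) & 1;         // bit 103 of the code
            r.hi = ((r.hi << 1) | (r.lo >> 63)) & HI_MASK;
            r.lo <<= 1;
            if (top) {
                r.hi ^= POLY.hi;
                r.lo ^= POLY.lo;
            }
        }
        table[i] = r;
    }
    assert(table[1].hi == POLY.hi && table[1].lo == POLY.lo);
}

// Same bytewise update as the 128-bit version, with the carry from lo into hi
// done by hand.
Bch8 calc_bch8(const uint8_t data[], size_t len, Bch8 code = Bch8()) {
    for (size_t i = 0; i < len; i++) {
        const Bch8 &t = table[((code.hi >> 32) & 0xff) ^ data[i]];
        code.hi = (((code.hi << 8) | (code.lo >> 56)) & HI_MASK) ^ t.hi;
        code.lo = (code.lo << 8) ^ t.lo;
    }
    return code;
}
```

Masking hi to 40 bits on every step plays the same role as the final mask in the 128-bit version. A compiler with only 32-bit integers would need the same idea spread over four words.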

0 Matthijs van Duin over 10 years ago in reply to Matthijs van Duin

Mastermind 8040 points

And as a final remark, note that BCH (like CRC, ignoring pre/post-inversion done by some CRCs) is xor-linear, meaning bch8(data1 ^ data2) = bch8(data1) ^ bch8(data2). This is a really convenient property for updating a code when applying a small patch (as long as it doesn't move data).

data ^= patch;
code ^= bch8( patch );

Since the patch will be mostly zeros, you can "fast-forward" through those instead of having to read data from memory (or flash even). An additional benefit is that if the original (data,code) pair contains a correctable error, the patched one is still correctable, while if you discard and recalculate the code the data would be corrupted in that case, so this trick avoids the need to first perform error correction (which is really not-funny-complicated if you don't have a hardware engine like ELM to do it for you).
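To make the fast-forwarding concrete, here is a hedged sketch, assuming GCC/Clang's unsigned __int128 and the polynomial quoted earlier with its x^104 term implicit. The function bch8_of_patch and the other names are hypothetical. Zeros before the patch leave the code at zero and are skipped outright; zeros after it are fed in as a constant, so no data is read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

__extension__ typedef unsigned __int128 bch8_t;   // GCC/Clang extension (assumption)

const bch8_t MASK = (((bch8_t)1) << 104) - 1;
const bch8_t POLY = (((bch8_t)0x15f914e07bULL) << 64) | 0x0c138741c5c4fb23ULL;

static bch8_t table[256];

void init_table() {
    for (unsigned i = 0; i < 256; i++) {
        bch8_t r = (bch8_t)i << 96;
        for (int bit = 0; bit < 8; bit++) {
            bool top = (r >> 103) & 1;
            r = (r << 1) & MASK;
            if (top) r ^= POLY;
        }
        table[i] = r;
    }
}

// Advance the code by one input byte; passing 0 needs no memory read.
inline bch8_t step(bch8_t code, uint8_t byte) {
    return ((code << 8) ^ table[(unsigned)((code >> 96) & 0xff) ^ byte]) & MASK;
}

bch8_t bch8(const uint8_t data[], size_t len) {
    bch8_t code = 0;
    for (size_t i = 0; i < len; i++)
        code = step(code, data[i]);
    return code;
}

// Code of a total_len-byte buffer that is zero except for patch[0..plen)
// placed at byte offset `offset`.
bch8_t bch8_of_patch(const uint8_t patch[], size_t plen,
                     size_t offset, size_t total_len) {
    assert(offset <= total_len && plen <= total_len - offset);
    bch8_t code = 0;                  // zeros before the patch keep the code at 0
    for (size_t i = 0; i < plen; i++)
        code = step(code, patch[i]);
    for (size_t i = offset + plen; i < total_len; i++)
        code = step(code, 0);         // zeros after the patch: shift and reduce only
    return code;
}
```

The stored code is then updated with code ^= bch8_of_patch( ... ), without touching the unpatched data. The trailing loop still costs one table step per remaining byte.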

0 Frantisek Zabecky over 10 years ago in reply to Matthijs van Duin

Expert 2090 points

I'm afraid I'm lost somewhere in your thoughts...
From your words, it seems that Sitara's GPMC can calculate BCH-8 correctly.
There is some problem with other numbers of correctable errors, but that is out of scope here, since I need exactly BCH-8. Am I right?

You identified primitive polynomial 0x201B (x^13+x^4+x^3+x+1).
Can I use it with m=13 and t=8 (number of corrected bits) and any general BCH implementation, e.g. with the one in Linux kernel MTD driver to get the same results as Sitara produces?

I would prefer to use something that I already have (and have tested) rather than implement 128-bit arithmetic...

Frantisek

0 Matthijs van Duin over 10 years ago in reply to Frantisek Zabecky

Mastermind 8040 points

I'm sorry if I caused confusion by referencing that post. The BCH-4 issue it specifically addresses is not relevant here, I merely referenced it because it contained an explanation of the BCH algorithms used by the GPMC.

Yes you should be able to use any standard implementation, such as the one in linux. (In fact, older TI SoCs didn't have ELM yet so there linux is responsible for doing the error correction, so it can definitely be configured to the format that GPMC uses.)

Note also that if you're going to be preparing the images on a modern 64-bit host machine the compiler most likely has a built-in 128-bit integer type already, so generating the ECC data directly actually seems easier to me than using the MTD driver's implementation, but you can choose whichever you prefer.

0 Wolfgang Muees1 over 10 years ago

Genius 3685 points

Frantisek,

your programmer will be a very complex piece of software:

- ECC

- Bad Block handling

- File System handling (you NEED a NAND file system even for a Read-Only Image on modern MLC NAND chips).

regards

Wolfgang

0 Matthijs van Duin over 10 years ago in reply to Wolfgang Muees1

Mastermind 8040 points

Good point, calculating the ECC is a triviality compared to coping with bad blocks (including having to adapt the filesystem image). They can however avoid that if they're just pre-flashing the guaranteed-valid part (usually just the first erase block I think?), i.e. just the bootloader or a small application.

0 Frantisek Zabecky over 10 years ago in reply to Matthijs van Duin

Expert 2090 points

Hi,

Our programmer software is already very complex :-). For our customers we MUST implement ECC (various algorithms), BB handling (also various algorithms) and file system handling (various algorithms, according to customers' demands). You can check our software in DEMO mode (www.elnec.com/.../pg4uwarc.exe)

Our application is a 32-bit application compiled with a 32-bit compiler; the programmer software is universal and must run on both 32-bit and 64-bit systems. The compiler we use doesn't support 128-bit integer types (the largest available integer is 64 bits).

The NAND memories will be programmed offline, so the whole image must be written by the programmer. There is no opportunity to program the NAND flash in the end application.

0 Biser Gatchev-XID over 10 years ago in reply to Frantisek Zabecky

TI__Guru**** 393215 points

Frantisek,

Why don't you contact your nearest TI representative and request support for this?

0 Frantisek Zabecky over 10 years ago in reply to Biser Gatchev-XID

Expert 2090 points

Biser,

Because there is no direct business relationship between Elnec and the nearest TI representative. We provide a service for our customer, who will program chips that were bought somewhere else. A few years ago it was possible to contact TI technical staff through asktexas@ti.com, where front-end TI staff dispatched each question to the appropriate person in TI. Now we're "pressed" (more exactly, we were advised) to use the E2E community to get technical support from TI.

If you can provide me direct contacts for any kind of technical questions, you'll seriously help us.

Best regards,
Frantisek

0 Biser Gatchev-XID over 10 years ago in reply to Frantisek Zabecky

TI__Guru**** 393215 points

I have asked the factory team for support on this last Friday, and just now I renewed the request. I suppose the persons involved were out of office.

0 RonB over 10 years ago in reply to Biser Gatchev-XID

TI__Mastermind 30736 points

Frantisek,

I'm contacting you privately.

0 John Westmoreland43 over 9 years ago in reply to RonB

Guru 13945 points

Hello,

I was wondering if I could get the BCH source code also.

I need to run this on a Hercules - but saw this thread and wanted to ask.

Thanks,
jwest

0 Matthijs van Duin over 9 years ago in reply to John Westmoreland43

Mastermind 8040 points

johnw said:

I was wondering if I could get the BCH source code also.

I need to run this on a Hercules - but saw this thread and wanted to ask.

It is probably more useful to create a new thread for this. Note that this thread is about offline programming of NAND flash and therefore only about generating correct ECC codes. Based on you mentioning the Hercules processor I'm guessing you also need to read data from NAND and be able to perform error correction. Generating the codes is really nothing more than a big CRC (see earlier posts in this thread or this one and its follow-up). Performing error correction however is much, much harder. See for example the linux BCH implementation.

0 John Westmoreland43 over 9 years ago in reply to Matthijs van Duin

Guru 13945 points

Hello Matthijs,

Yes - that's what I was talking about - and wondering if TI had code for that. I found some code for the ti81xx which looks like what I am looking for or along those lines - ti81xx_nand.c to be specific. I will look into that more.

Thanks,
John

0 Matthijs van Duin over 9 years ago in reply to John Westmoreland43

Mastermind 8040 points

johnw said:
Yes - that's what I was talking about - and wondering if TI had code for that. I found some code for the ti81xx which looks like what I am looking for or along those lines - ti81xx_nand.c to be specific. I will look into that more.

The OMAP4 and Netra SoCs and their descendants (that includes the dm81xx, am335x, am437x, am57xx) have BCH support implemented in hardware, including error location, so that file will probably just be a driver for that. For a software solution of error location you need to go all the way back to the omap3/am35xx.

Note that doing error location in software can be quite slow. The new linux implementation is relatively fast but also quite complicated, and of course GPL.

0 Frantisek Zabecky over 9 years ago in reply to John Westmoreland43

Expert 2090 points

Hello,

I received all the documentation necessary to generate ECC codes for the programmed NAND flash memories. However, I'm not sure whether that documentation is covered by the NDA, so I can't share it.

Frantisek

0 John Westmoreland43 over 9 years ago in reply to Frantisek Zabecky

Guru 13945 points

Hello Frank,

I am not sure what NDA you are talking about - please PM me if you need to talk about this privately; but I would like to see your documentation if possible.

Thanks,
John W.

0 John Westmoreland43 over 9 years ago in reply to Matthijs van Duin

Guru 13945 points

Hello Matthijs,

Thanks for your replies - I will take a look at the new linux implementation.

The issue I have run into with the Spansion NAND code revolves around big-endian support: the code has evidently been used on little-endian processors and not entirely proven on a big-endian one.

Regards,
John W.
