RTOS/AM3358: Possible cache issue with multi-threading

Sean Bedford

Part Number: AM3358

Tool/software: TI-RTOS

Good evening all,

I have a problem where the first 56 bytes of either a 128k global buffer array or a calloc'ed buffer get corrupted when an ISR has fired. However, if I move the buffer to be a local variable, the problem goes away.

The code in question is an FTP client running on lwIP. We are using a custom AM335x PCB, CCS v6.1.1.00022, SYS/BIOS v642335 and XDC tools v331024_core. I have cut the code back as far as I can while still seeing the bug, so only the FTP task and the lwIP ISRs are running. As far as I can tell, the corruption only happens when an ISR fires during the following code in the FTP task, and either gp_ftp_buffer or the buffer inside the MlsdBuffer class gets corrupted.

fs_fread((uint8_t *)gp_ftp_buffer, btr, &br, ftp_instance->file_read);
//sys_prot_t hwi_state = sys_arch_protect();
ftp_instance->MlsdBuffer->push(gp_ftp_buffer, br);

if(ftp_instance->bytes_read == 0)
{
    char buf[4];
    //ftp_instance->MlsdBuffer->push(gp_ftp_buffer, br);
    ftp_instance->MlsdBuffer->peep(buf,4);
    if((gp_ftp_buffer[3] != 'S') || (buf[3] != 'S'))
        stdout("Bad start to file!!! %.2x %.2x", (int)gp_ftp_buffer[3], (int)buf[3]);
    ftp_instance->dbg_file_start = 1;
    ftp_instance->tx_state = SM_FTP_TRANSFER_IN_LIMBO;
}
//sys_arch_unprotect(hwi_state);

As I said, if I move the buffer to be local the problem goes away, but this is quite a large application and we have seen corruption elsewhere that may or may not be related, so I would like to get to the bottom of this issue and understand what is going on.

I have ruled out:

Any particular part of the task code causing the corruption, by detecting where the corruption can happen and introducing delays so that the ISR fires at different points.
Stack or heap overflow, based on the buffer's position in the memory map, on observing that nothing around it gets corrupted, and on doubling all the stacks and heaps to make sure.

What I am left with is suspecting that this is cache related, and that the way I have set up or am using the cache is not thread safe. When I disable the cache, or set bufferable to false in the MMU setup for the DDR3 region, I have not observed any corruption; however, the performance is terrible, and lwIP keeps reporting asserts and eventually kills the connection. Does this explanation sound feasible? If so, what can I try to stop the corruption without ruining the performance?

This is how I set up the MMU and cache in my project .cfg file:

var Cache = xdc.useModule('ti.sysbios.family.arm.a8.Cache');
var Mmu = xdc.useModule('ti.sysbios.family.arm.a8.Mmu');

// Enable the cache
Cache.enableCache = true;

// Enable the MMU (Required for L1/L2 data caching)
Mmu.enableMMU = true;

// Force peripheral section to be NON cacheable
var peripheralAttrs = {
    type : Mmu.FirstLevelDesc_SECTION, // SECTION descriptor
    tex: 0,
    bufferable : false, // bufferable
    cacheable : false, // cacheable
    shareable : false, // shareable
    noexecute : true, // not executable
};

// Set the descriptor for each entry in the address range
for (var i=0x44000000; i < 0x80000000; i = i + 0x00100000) {
    // Each 'SECTION' descriptor entry spans a 1MB address range
    Mmu.setFirstLevelDescMeta(i, i, peripheralAttrs);
}

// descriptor attribute structure
var attrs = {
    type: Mmu.FirstLevelDesc_SECTION,  // SECTION descriptor
    tex: 0x1,
    bufferable: true,                  // bufferable
    cacheable: true,                   // cacheable
};

// Set the descriptor for each entry in the address range
for (var i=0x80000000; i < 0x90000000; i = i + 0x00100000) {
    // Each 'SECTION' descriptor entry spans a 1MB address range
    Mmu.setFirstLevelDescMeta(i, i, attrs);
}

I am at a loss as to where to go with this now; any advice on what I am doing wrong would be greatly appreciated.

Thanks

Sean

over 8 years ago

ToddMullanix over 8 years ago

I expect that some buffer in lwIP is not aligned on a cache-line boundary. What sits right before the "first 56 bytes of either a 128k global buffer array" that is getting corrupted?

Curious... why aren't you using TI's NDK?

Todd
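Todd's cache-boundary theory can be checked with a little arithmetic. A minimal sketch, assuming 64-byte data cache lines (the Cortex-A8 line size, and the value implied by the 0xffffffc0 mask used later in this thread): a buffer that starts 8 bytes into a line shares its first 64 - 8 = 56 bytes with whatever sits before it. That is consistent with, though not proof of, the "first 56 bytes" symptom. The helper names and the example address are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed data cache line size in bytes (check SOC_CACHELINE_SIZE). */
#define CACHE_LINE_SIZE 64u

/* Offset of an address within its cache line. */
static size_t line_offset(uintptr_t addr)
{
    return (size_t)(addr % CACHE_LINE_SIZE);
}

/* Number of bytes at the start of a buffer that live in a cache line also
 * holding data that precedes the buffer. Returns 0 for an aligned buffer. */
static size_t bytes_sharing_first_line(uintptr_t addr, size_t len)
{
    size_t off = line_offset(addr);
    size_t shared;

    if (off == 0)
        return 0;
    shared = CACHE_LINE_SIZE - off;
    return shared < len ? shared : len;
}
```

For a hypothetical 128k buffer at 0x80001008, bytes_sharing_first_line returns 56; at 0x80001000 it returns 0. Printing the real address of gp_ftp_buffer and feeding it through the same calculation would show whether the numbers line up on the target.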

Sean Bedford over 8 years ago in reply to ToddMullanix

Hi Todd,

Neither of the corrupted buffers is in lwIP; they are both in our own FTP code.

Right before gp_ftp_buffer are other global variables from this file and two other files, and then the heap; specifically, a semaphore and a task handle are directly before it. The other buffer is on the heap, so it is most likely next to other dynamically allocated memory. Like I said, though, I see nothing else get affected, only the two buffers that are actively being used when the interrupt fires, so I don't think this is relevant.

To my knowledge, TI's NDK does not support the AM335x processors, which is why we were forced to port lwIP and the drivers from StarterWare into a SYS/BIOS project.

Sean

Sean Bedford over 8 years ago

I have done some further investigation, and the following code fixes the corruption of the buffers while keeping the performance at an acceptable level.

OxTS_fread((uint8_t *)gp_ftp_buffer, btr, &br, ftp_instance->file_read);
CacheDataInvalidateBuff((unsigned int) gp_ftp_buffer, GP_FTP_BUFFER_SIZE);
ftp_instance->MlsdBuffer->push(gp_ftp_buffer, br);

if(ftp_instance->bytes_read == 0)
{
    char buf[4];
    ftp_instance->MlsdBuffer->peep(buf,4);
    if((gp_ftp_buffer[3] != 'S') || (buf[3] != 'S'))
        oxts_stdout("Bad start to file!!! %.2x %.2x", (int)gp_ftp_buffer[3], (int)buf[3]);
    ftp_instance->tx_state = SM_FTP_TRANSFER_IN_LIMBO;
}

However, I now get the following error from lwIP a lot, and large transfers fail:

lwip-1.4.1/src/core/ipv4/ip.c, assert: p->ref == 1

This is the same error I got when disabling the cache, except that now the performance does not suffer noticeably. Could somebody who understands how the caches in the AM335x work shed some light on what is happening here and what I could try to get a workable solution?

I would also be grateful if you could direct me to a good resource to learn about how it works.

Thanks

Sean

Sean Bedford over 8 years ago in reply to Sean Bedford

Good evening,

I have pretty much gotten to the bottom of this. It seems to be down to the StarterWare MMCSD library requiring the pointers for data to be read/written to be cache-line aligned, because it invalidates and writes back the cache so that data modified by DMA is handled correctly. I have not seen this requirement documented anywhere; that would have been helpful...
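The requirement can be expressed as a simple check. A minimal sketch, assuming 64-byte cache lines: a buffer is safe to hand to a driver that cleans/invalidates the cache only if both its start address and its length are whole multiples of the line size, so that no line is shared with unrelated data. The C11 _Alignas declaration is one way to get such a buffer; the names and sizes here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed data cache line size in bytes (check SOC_CACHELINE_SIZE). */
#define CACHE_LINE_SIZE 64u

/* True if [ptr, ptr + len) starts and ends on a cache-line boundary, so
 * cache maintenance on it cannot touch any neighbouring data. */
static bool is_dma_safe(const void *ptr, size_t len)
{
    uintptr_t addr = (uintptr_t)ptr;
    return (addr % CACHE_LINE_SIZE) == 0 && (len % CACHE_LINE_SIZE) == 0;
}

/* Hypothetical DMA target: line-aligned start, and a size that is a whole
 * number of lines so the end is aligned as well. */
#define EXAMPLE_BUFFER_SIZE (128u * 1024u)
static _Alignas(CACHE_LINE_SIZE) uint8_t example_dma_buffer[EXAMPLE_BUFFER_SIZE];
```

With the TI ARM compiler, `#pragma DATA_ALIGN(gp_ftp_buffer, 64)` may be the more familiar spelling; check the compiler manual for your version. A calloc'ed buffer carries no 64-byte alignment guarantee, so for heap buffers xdc.runtime's Memory_alloc(heap, size, align, eb) with align set to the line size, and the size rounded up to a whole number of lines, may be an option.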

To get around this, I plan to add a check that pointers passed for reading/writing are cache-line aligned and, if not, copy the data through a local buffer that is. That way there is no loss of efficiency for callers that already align their buffers, and for callers that don't, the reads/writes will still work and a warning will be printed to the debug output. I have yet to write and test the final part of this, but hopefully it works.
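The plan above is a classic bounce buffer. A minimal sketch of the read path, assuming 64-byte lines and 512-byte sectors: aligned callers go straight to DMA, while misaligned callers are served through a line-aligned static buffer, chunk by chunk, with a warning. dma_read_sectors is a hypothetical stand-in for the MMCSD driver's DMA read (here it just fills each sector with its block number so the sketch runs on a host). Because the length is always a whole number of 512-byte sectors, and 512 is a multiple of 64, only the start address needs checking.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_LINE_SIZE 64u   /* assumed line size */
#define SECTOR_SIZE     512u
#define BOUNCE_SECTORS  8u    /* hypothetical bounce-buffer capacity */

/* Hypothetical stand-in for the driver's DMA sector read. On the target this
 * would be the real MMCSD read, which does cache maintenance on dst. */
static int dma_read_sectors(void *dst, uint32_t block, uint32_t nblks)
{
    uint8_t *p = dst;
    for (uint32_t i = 0; i < nblks; i++)
        memset(p + (size_t)i * SECTOR_SIZE, (int)((block + i) & 0xFFu), SECTOR_SIZE);
    return 0;
}

static _Alignas(CACHE_LINE_SIZE) uint8_t bounce[BOUNCE_SECTORS * SECTOR_SIZE];
static unsigned bounce_uses; /* counts chunks that took the slow path */

/* Read nblks sectors starting at block into ptr, whatever its alignment. */
static int read_sectors_safe(void *ptr, uint32_t block, uint32_t nblks)
{
    if (((uintptr_t)ptr % CACHE_LINE_SIZE) == 0)
        return dma_read_sectors(ptr, block, nblks);   /* fast path */

    fprintf(stderr, "warning: unaligned buffer %p, using bounce buffer\n", ptr);
    while (nblks > 0) {
        uint32_t chunk = nblks < BOUNCE_SECTORS ? nblks : BOUNCE_SECTORS;
        int err = dma_read_sectors(bounce, block, chunk);
        if (err != 0)
            return err;
        memcpy(ptr, bounce, (size_t)chunk * SECTOR_SIZE);
        ptr = (uint8_t *)ptr + (size_t)chunk * SECTOR_SIZE;
        block += chunk;
        nblks -= chunk;
        bounce_uses++;
    }
    return 0;
}
```

The write path is the mirror image: memcpy into the bounce buffer first, then DMA out of it. Since the bounce buffer is a single static array, in a multi-task system it would need to be guarded, for example with a SYS/BIOS GateMutex or Semaphore, so that two tasks cannot use it at once.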

I did try rounding down to the start of the cache line the data began in and invalidating an additional line, but this caused exceptions and other problems no matter how I tried it. Any thoughts on whether this should be possible? For example, I tried:

CacheDataCleanBuff(((unsigned int)ptr) & 0xffffffc0, (512 * nblks) + SOC_CACHELINE_SIZE);
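Rounding the start down and adding one line does cover a misaligned buffer, as the sketch below shows, assuming 64-byte lines: an 8-byte offset with 512 bytes expands to 576 bytes, the same as (512 * nblks) + SOC_CACHELINE_SIZE for one block. The catch is the extra bytes at each end. Cleaning them is normally harmless, but invalidating them throws away any dirty data the CPU has written to the neighbouring variables (for gp_ftp_buffer, the semaphore and task handle just before it), which is one plausible cause of the exceptions. This is a reading of the symptoms, not a confirmed diagnosis; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* assumed; matches the 0xffffffc0 mask */

struct line_range {
    uintptr_t start;      /* first byte of the first line touched */
    size_t    len;        /* bytes covered, a whole number of lines */
    size_t    head_extra; /* bytes before the buffer that are also covered */
    size_t    tail_extra; /* bytes after the buffer that are also covered */
};

/* Expand [addr, addr + len) outward to whole cache lines. Any invalidate on
 * the result also hits head_extra + tail_extra bytes of other data. */
static struct line_range covering_lines(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(CACHE_LINE_SIZE - 1);
    uintptr_t end = addr + len;
    uintptr_t end_up = (end + CACHE_LINE_SIZE - 1) & mask;
    struct line_range r;

    r.start = addr & mask;
    r.len = (size_t)(end_up - r.start);
    r.head_extra = (size_t)(addr - r.start);
    r.tail_extra = (size_t)(end_up - end);
    return r;
}
```

Only when head_extra and tail_extra are both 0, i.e. the buffer is line aligned at both ends, can the range be invalidated without side effects, which is why copying through an aligned buffer is the safer fix.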

I also tried to replace the cache invalidate functions with the SYS/BIOS ones and couldn't get those to work either. I replaced them as below; could somebody from TI perhaps comment on why this doesn't work?

    //CacheDataInvalidateBuff(((unsigned int)ptr), (512 * nblks));
    Cache_inv(ptr , (512 * nblks) + 1, Cache_Type_ALLD, true);

and

    //CacheDataCleanBuff(((unsigned int)ptr), (512 * nblks));
    Cache_wb(ptr , (512 * nblks), Cache_Type_ALLD, true);

Any comments on my findings/understanding are welcome, and I would be grateful if somebody could address the two remaining questions above.

Many thanks

Sean

ToddMullanix over 8 years ago in reply to Sean Bedford

Sean Bedford said:
I tried to replace the cache invalidating functions with the sys/bios ones and couldn't get them to work either I replaced them as below could somebody from TI perhaps comment on why?

What did not work? The Cache_inv or the Cache_wb or both?

Please note that the second parameter is a byte count. Do you really mean for (512 * nblks) + 1 bytes to be invalidated? Note: the cache commands operate on 32-bit words, so your + 1 will be rounded up to 4 in the Cache_inv and Cache_wb calls.

Sean Bedford over 8 years ago in reply to ToddMullanix

Hi Todd,

Sorry for taking so long to reply; I have been busy with other issues.

Yes, I really mean it (or rather, you really mean it, as this is TI's library code): 512 is the sector size on the SD card and nblks is the number of sectors being written or read. The +1 was me trying to get around the data needing to be cache aligned by rounding down to a cache line and writing an extra line, so it should have been 64; however, I couldn't get that to work.

Not getting the SYS/BIOS functions to work was me tripping over my own debug code; they do work, and it really was just about making sure the data passed in is cache-line aligned. I have now modified the drivers to warn about and protect against incorrectly aligned data being passed in, so the problem all seems sorted.

Like I said, it would have saved me a lot of aggro if this had been documented, as it certainly isn't obvious.

Sean
