b0tcm

Other Parts Discussed in Thread: HALCOGEN, RM48L952

Hello,

I have created a b0tcm check and a b1tcm check, according to spna106a.pdf.

When correcting 1-bit errors, everything goes smoothly, but when I execute a 2-bit error check, something unexpected happens. I know it is supposed to generate a data abort exception, and it does so correctly.

However, when I check the ESMSR3 register for group 3 errors for b0tcm and b1tcm, there are none.

My data abort handler:

	.text
	.arm

	.ref	dataAbort
	.def 	_asm_dataEntry

	.asmfunc

_asm_dataEntry

	stmed	sp!, {r0 - r12, lr}
	bl		dataAbort
	ldmfd	sp!, {r0 - r12, lr}
	cps		#0x13
	subs		pc, lr, #0x4

	.endasmfunc

My dataAbort function:

void dataAbort() {
	if(ESMReg->ESMSR3 & 0x8U) {		/*Uncorrectable error was detected in B0TCM*/
		if ((TCRAM1Reg->RAMCTRL & 0x100U)) {
			TCRAM1Reg->RAMERRSTATUS = 0x20U;
			ESMReg->ESMSR3 = 0x8U;
			ESMReg->ESMEKR = 0x5U;
		} else {
			while(1);
		}
	} else if (ESMReg->ESMSR3 & 0x20U){	/*Uncorrectable error was detected in B1TCM*/
		if (!(TCRAM2Reg->RAMCTRL &0x1U)) {
			TCRAM2Reg->RAMERRSTATUS = 0x20U;
			ESMReg->ESMSR3 = 0x20U;
			ESMReg->ESMEKR = 0x5U;
		} else {
			while(1);
		}
	} else {
		while(1);
	}
}
  • Hello Pablo,

    The ECC check occurs within the CPU and will result in an abort when identified, as you have seen. However, the ESM error notification comes from the Flash Wrapper, so you must also enable ECC within the flash wrapper in order for the ESM error to be flagged.

  • Hello Pablo,

      Make sure you enable the CPU's event bus by calling the function _coreEnableEventBusExport_() so the uncorrectable event can get to the ESM module.

    regards,

    Charles

  • Hello,

    Sorry it took me a long time to respond. I am certain I have enabled ECC within the flash wrapper.

    Also the thing is that I can see that other flags have been set in the ESM register, but this one hasn't been set, which I thought was strange.

    Let me check again and see if I haven't disabled ECC.

  • Yes, I have already enabled the event bus export

  • Hello,

    I have checked FEDACTRL1 and its value is 0x000A060A, and RAMCTRL = 0x0005010A. It just isn't signaling the ESM module and I have no idea why.

  • But I know that it is signaling the ESM module. When I checked the 1-bit error for the ECC, I noticed it was signaling the ESM module correctly.

  • Hello Pablo,

      Do you see bit 5 (the DERR bit) of the RAMERRSTATUS register set?

    regards,

    Charles

  • Hello Charles,

    No. Its value is 0x0.

  • Hello Pablo,

      Looks like the RAM memory controller did not see a double-bit ECC error event. I assume you have enabled both the event bus from the CPU and ECC detection in the RAM wrapper, as you indicated in your prior reply. There is one circumstance where the RAM memory controller will ignore the double-bit ECC error event from the CPU: when the ECC error is detected while accessing the ECC space, which starts at a 4MB offset from 0x0800_0000. Can you please tell me what the CPU captures in the DFAR (data fault address register) and the DFSR (data fault status register)? Could you also tell me which address you read from when you take the abort?

    regards,

    Charles
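The distinction between the TCRAM data space and its ECC space can be sketched as a pair of address checks. This is only an illustration: the base address and the 4MB offset come from the reply above, while `TCRAM_SIZE` and both function names are assumptions made for the example and should be checked against the device datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

#define TCRAM_BASE       0x08000000u /* data space base, as quoted in the thread */
#define TCRAM_ECC_OFFSET 0x00400000u /* 4MB offset to the ECC space, as quoted in the thread */
#define TCRAM_SIZE       0x00040000u /* ASSUMPTION: 256KB of RAM; verify for your part */

/* True when addr lies in the normal TCRAM data space. */
bool addr_in_tcram_data(uint32_t addr)
{
    return addr >= TCRAM_BASE && addr < TCRAM_BASE + TCRAM_SIZE;
}

/* True when addr lies in the ECC space that mirrors the data space. */
bool addr_in_tcram_ecc(uint32_t addr)
{
    uint32_t ecc_base = TCRAM_BASE + TCRAM_ECC_OFFSET;
    return addr >= ecc_base && addr < ecc_base + TCRAM_SIZE;
}
```

A fault address that passes `addr_in_tcram_data()` rather than `addr_in_tcram_ecc()` would rule out the special case described above.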

  • DFAR = 0x08000010 (CP15_DATA_FAULT_ADDRESS in CCS)

    DFSR = 0x00000409 (CP15_DATA_FAULT_STATUS in CCS)

    If it's necessary: CP15_AUX_DATA_FAULT_STATUS = 0x00800000

    My export event is:

    	.def	_asm_enableCPUeventBus
    	.asmfunc
    
    _asm_enableCPUeventBus
            stmfd sp!, {r0}
            mrc   p15, #0x0, r0,         c9, c12, #0x0
            orr   r0,  r0,    #0x10
            mcr   p15, #0x0, r0,         c9, c12, #0x0
            ldmfd sp!, {r0}
            bx    lr
    
    	.endasmfunc

    My code to enable ECC RAM:

    	.def	_asm_enableSECDEDtoCPUram
    	.asmfunc
    
    _asm_enableSECDEDtoCPUram
    	mrc p15, #0, r0, c1, c0, #1
    	orr r0, r0, #0x0C000000
    	mcr p15, #0, r0, c1, c0, #1
    
    	bx lr
    
    	.endasmfunc
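For readers following along, the DFSR value reported above can be decoded with a small helper. This is a sketch, not code from the thread: the function names are hypothetical, and the status encoding (bit 10 supplies the top bit of a 5-bit status, with 0b11001 meaning a synchronous parity/ECC error) is taken from my reading of the Cortex-R4 documentation and should be verified there.

```c
#include <stdint.h>

/* ASSUMPTION: 0b11001 = synchronous parity/ECC error (verify in the Cortex-R4 TRM). */
#define DFSR_STATUS_SYNC_ECC 0x19u

/* Build the 5-bit fault status: DFSR bit 10 is the top bit, bits [3:0] the rest. */
uint32_t dfsr_status(uint32_t dfsr)
{
    return (((dfsr >> 10) & 0x1u) << 4) | (dfsr & 0xFu);
}

/* Non-zero when the DFSR reports a synchronous parity/ECC error. */
int dfsr_is_sync_ecc_error(uint32_t dfsr)
{
    return dfsr_status(dfsr) == DFSR_STATUS_SYNC_ECC;
}
```

Under that encoding, the reported 0x00000409 decodes to status 0x19, which is consistent with the abort being caused by an ECC error on a read from 0x08000010.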
    




  • Hello Pablo,

       The status does indicate a fault at 0x0800_0010. At the moment I'm not certain why it is not exporting the ECC error event to the RAM memory controller. I believe you have enabled the X bit to export the event bus. Can we just make sure it is actually set? Can you tell if bit 4 of the PMNC register in the CPU is set?

      Can you also show the code sequence you used to generate the double-bit error? Thanks.

    regards,

    Charles

  • CP15_PERFORMANCE_MONITOR_CONTROL = 0x41141810

    void bnetmECC(){
    	volatile unsigned int TccramAreaB0;
        volatile unsigned int TccramAreaB1;
    	volatile unsigned int ReadAddrErrorB0;
        volatile unsigned int ReadAddrErrorB1;
        /*TCRAM module does not have a test mode, so all values, when checking must restore their previous status*/
        volatile unsigned long int tcramB01Bit = (*(volatile unsigned long int *)0x08000000);   /*Supports only 64 bit read. Even address*/
        volatile unsigned long int tcramB11Bit = (*(volatile unsigned long int *)0x08000008);   /*Supports only 64 bit read. Odd address*/
        volatile unsigned long int tcramB02Bit = (*(volatile unsigned long int *)0x08000010);   /*Supports only 64 bit read. Even address*/
        volatile unsigned long int tcramB12Bit = (*(volatile unsigned long int *)0x08000018);   /*Supports only 64 bit read. Odd address*/
    
    	TCRAM1Reg->RAMCTRL = 0x0005010AU; /*Enables writes to the ECCRAM*/
    	TCRAM2Reg->RAMCTRL = 0x0005010AU;
    
    
    	TCRAM1Reg->RAMTHRESHOLD = 0x1;	/*1 bit error will cause a response*/
    	TCRAM2Reg->RAMTHRESHOLD = 0x1;
    
    	TCRAM1Reg->RAMINTCTRL = 0x1;	/*Enables error to be reported to ESM*/
    	TCRAM2Reg->RAMINTCTRL = 0x1;
    
    	(*(unsigned int *)(0x08400000)) ^= 0x1; /*Causes an ECC 1 bit error*/
        (*(unsigned int *)(0x08400008)) ^= 0x1; /*Causes an ECC 1 bit error*/
    
    	TCRAM1Reg->RAMCTRL = 0x0005000AU; /*Disables writes to the ECCRAM*/
    	TCRAM2Reg->RAMCTRL = 0x0005000AU;
    
    
    	TccramAreaB0 = (*(volatile unsigned long int *)0x08000000); /*Read the data with 1 bit error.*/
        TccramAreaB1 = (*(volatile unsigned long int *)0x08000008);
    
    	if (!((TCRAM1Reg->RAMERRSTATUS & 1) || (TCRAM2Reg->RAMERRSTATUS & 1))) {	/*If an error was not detected*/
    		class2Error();	/*class 2 error*/
    	} else {
    		TCRAM1Reg->RAMERRSTATUS = 0x1;	/*Clears error flag*/
    		TCRAM2Reg->RAMERRSTATUS = 0x1;
    		ESMReg->ESMSR1 = 0x14000000;	/*Clears bits 26 and 28 of the group1 ESM. */
    	}
    
    	/*This is done in order to  generate a 2-bit error*/
    	TCRAM1Reg->RAMCTRL = 0x0005010AU; /*Enables writes to the ECCRAM*/
    	TCRAM2Reg->RAMCTRL = 0x0005010AU;
    
    	(*(unsigned int *)(0x08400010)) ^= 0x3;    /*Causes an ECC error. 2-bit error*/
        (*(unsigned int *)(0x08400018)) ^= 0x3; /*Causes an ECC error. 2-bit error*/
    
    	ReadAddrErrorB0 = (*(unsigned int *)0x08000010); /*Read the data with 2 bit error and generates a data abort*/
        ReadAddrErrorB1 = (*(unsigned int *)0x08000018); /*Read the data with 2 bit error and generates a data abort*/
    
        if (!((TCRAM1Reg->RAMUERRADDR & 0xFFFFFFFFU) || (TCRAM2Reg->RAMUERRADDR & 0xFFFFFFFFU))) {    /*If an error was not detected*/
            class2Error();  /*class 2 error*/
        } else {
            ReadAddrErrorB0 = TCRAM1Reg->RAMUERRADDR;
            ReadAddrErrorB1 = TCRAM2Reg->RAMUERRADDR;
            ESMReg->ESMSR2 = 0x140U;
        }
    
        (*(volatile unsigned long int *)0x08000000) = tcramB01Bit;
        (*(volatile unsigned long int *)0x08000008) = tcramB11Bit;
        (*(volatile unsigned long int *)0x08000010) = tcramB02Bit;
        (*(volatile unsigned long int *)0x08000018) = tcramB12Bit;
    }

    The thing is that I can catch group 2 and group 1 errors. But for some reason it is not catching the group 3 errors, even though it should. I'll try to go step by step again to see if I missed anything.
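The test above injects errors by XOR-ing a mask into the ECC word: 0x1 flips one bit and 0x3 flips two. A minimal sketch of how the mask relates to the expected SECDED outcome follows. The names are hypothetical, and the classification only states what SECDED guarantees in general (one flipped bit is corrected, two are detected but not corrected); it does not model the actual hardware.

```c
#include <stdint.h>

typedef enum {
    ECC_NO_ERROR,       /* nothing flipped */
    ECC_CORRECTABLE,    /* one bit flipped: corrected, reported as a single-bit error */
    ECC_UNCORRECTABLE,  /* two bits flipped: detected, expected to cause a data abort */
    ECC_NOT_GUARANTEED  /* three or more: outside what SECDED guarantees */
} ecc_outcome;

/* Count the bits set in the XOR mask, i.e. how many stored bits get flipped. */
unsigned flipped_bits(uint32_t mask)
{
    unsigned n = 0;
    while (mask != 0u) {
        mask &= mask - 1u; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Map the number of flipped bits to the outcome SECDED promises. */
ecc_outcome secded_outcome(uint32_t mask)
{
    switch (flipped_bits(mask)) {
    case 0:  return ECC_NO_ERROR;
    case 1:  return ECC_CORRECTABLE;
    case 2:  return ECC_UNCORRECTABLE;
    default: return ECC_NOT_GUARANTEED;
    }
}
```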

  • I found a typo. I was writing to the auxiliary register instead of reading from it first. Fixed the typo and the new values are:

    GRP( Cp15 ).REG( CP15_PERFORMANCE_MONITOR_CONTROL )  = 0x41141810    

    GRP( Cp15 ).REG( CP15_AUXILIARY_CONTROL ) = 0x0E000027    

    Still not signaling the ESM.
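The two CP15 values quoted above can be checked against the bits discussed earlier in the thread. The sketch below uses hypothetical function names; the PMNC bit 4 (X bit) comes from Charles's question and the 0x0C000000 mask from the `_asm_enableSECDEDtoCPUram` routine. Reading that mask as "B0TCM and B1TCM ECC check enable" is my assumption and should be confirmed in the Cortex-R4 documentation.

```c
#include <stdint.h>

#define PMNC_X_BIT          0x00000010u /* bit 4: export events to the event bus */
#define ACTLR_BTCM_ECC_MASK 0x0C000000u /* bits set by _asm_enableSECDEDtoCPUram (assumed B0/B1TCM ECC enables) */

/* Non-zero when the PMNC value has the event bus export (X) bit set. */
int event_bus_export_enabled(uint32_t pmnc)
{
    return (pmnc & PMNC_X_BIT) != 0u;
}

/* Non-zero when both bits of the BTCM ECC mask are set in the auxiliary control value. */
int btcm_ecc_check_enabled(uint32_t actlr)
{
    return (actlr & ACTLR_BTCM_ECC_MASK) == ACTLR_BTCM_ECC_MASK;
}
```

Both checks pass for the reported values 0x41141810 and 0x0E000027, which supports the statement that the CPU-side configuration was already in place.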



  • Another weird thing is that I can clearly see that there is an ESM3 error signaled when that happens (group 3 channel 1 - efuse).

    After I saw that, I modified my stuck-at-zero check to clear ESMSR3 and write 0x5 to the ESMEKR register. I thought maybe it was that, but the problem remained.

  • Sorry to keep bothering you, but I really have no idea what's going on. Everything is set accordingly, but still no response from ESMSR3, which is weird, because when I check the efuse I can see that it signals the ESMSR3 register.

    Any ideas?

  • Hello Pablo,

      I just copied your code into main() and single-stepped through it. I did comment out some portions of the code to simplify it. For example, I commented out the portion where you check whether RAMUERRADDR is non-zero and call the class2Error() function. It works for me. I see the DERR bit set in the RAM wrapper, and bit 3 of ESMSR3 also gets set. RAMUERRADDR shows 0x10, which is the offset address from 0x0800_0000. Upon hitting the double-bit error the CPU takes the data abort.

      I will try to rerun it without anything commented out.

    regards,

    Charles

  • Hello Pablo,

    1122.RAM double bit error.zip

      Could you try the attached project? The code sequence to test single- and double-bit ECC errors is pretty much yours with minor modifications, but I simply embedded the test code into main(). The data abort handler is the standard one from HalCoGen. The handler will clear the RAM double-bit error and the ESM group 3 error.

      In your code I see the sequence below after the double-bit error is generated. I have two comments here.

      First, you check whether the UERR error address is non-zero. The issue I see is that in a real application the error address register could legitimately capture 0x0 if the error occurs at 0x0800_0000. Since you are writing test code to check the behavior of the ECC, maybe this is OK. I would also suggest that you handle this in the data abort routine.

      Second, you are trying to clear bit 6 and bit 8 in ESM group 2. Please note that these two bits are related to address decode errors, not ECC double-bit errors. Is this what you want to check? Inside the RAM memory controller there are two address decode logic blocks working in parallel. Both decode the address coming from the CPU, and their decode outputs (i.e. the memory select signals to the memory banks) are checked against each other. If they don't match, an address decode failure is detected. Unless the silicon truly has a stuck-at fault or something similar, you will not see this error. You would need to put the RAM memory controller into test mode to generate an address decode failure.

     

        if (!((TCRAM1Reg->RAMUERRADDR & 0xFFFFFFFFU) || (TCRAM2Reg->RAMUERRADDR & 0xFFFFFFFFU))) {    /*If an error was not detected*/
            class2Error();  /*class 2 error*/
        } else {
            ReadAddrErrorB0 = TCRAM1Reg->RAMUERRADDR;
            ReadAddrErrorB1 = TCRAM2Reg->RAMUERRADDR;
            ESMReg->ESMSR2 = 0x140U;
        }
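The first comment can be illustrated with a short comparison of the two detection strategies. This is a sketch with hypothetical function names that operate on plain register values rather than on the real peripheral; the DERR position (bit 5 of RAMERRSTATUS) is the one quoted earlier in the thread.

```c
#include <stdint.h>

#define RAMERRSTATUS_DERR 0x20u /* bit 5: double-bit error flag, as quoted in the thread */

/* Strategy used in the posted test: treat a non-zero error address as "error seen". */
int detected_by_address(uint32_t ramuerraddr)
{
    return ramuerraddr != 0u;
}

/* Alternative: rely on the DERR status flag, which does not depend on the address. */
int detected_by_derr(uint32_t ramerrstatus)
{
    return (ramerrstatus & RAMERRSTATUS_DERR) != 0u;
}
```

For an error at offset 0x0 (address 0x0800_0000), the captured address would be 0x0, so `detected_by_address()` misses it while `detected_by_derr()` still reports it.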

     

  • Hello,

    I have tested your application in main and it works. The problem is that mine is failing to report to the ESM module during boot.

    As I told you, I did some more tests, and I can see that when I do the stuck-at-zero check during efuse self check that it signals the ESM module (ESMSR3 = 0x2). When I perform the test for single bit errors on TCRAM I can see that it signals ESMSR1, and when I try to check the same for double bit errors it doesn't signal ESMSR3.

    I really don't know what might be wrong, so I am going over my code once again to check any mistakes, or if I didn't set other necessary registers. If you have any idea as to why it signals some times and others it doesn't, please let me know. I've been breaking my head for the past 2-3 days on this problem.

  • Hello Pablo,

      I will suggest that you isolate the problem by first not performing the efuse check. I don't know if running first the efuse check has anything to do with your current problem. I will try from my side too.  Another thing to check is if you have any code that clears out the RAMERRSTATUS and ESMSR3 for double bit error unintentionally. Setting the watchpoints in CCS on the address for these registers will help you see if these registers got cleared out as you intended.  Can you also check before you clear out the ESM SR3 for the efuse check, are there other bits getting set too (i.e. B0 and B1TCM ECC uncorrectable error)?

    regards,

    Charles

  • Hello Pablo,

      I added an efuse stuck-at-zero test. I'm able to see ESM bit 2 of SR3 and bit 9 of SR1 set due to the efuse test. Once I clear them and continue on to the RAM ECC test, I do not see any problem with the DERR bit and the status bit in the ESM module being set.

    regards,

    Charles

  • About the efuse check: It was just to inform you that it is signaling the ESMSR3, but because of some unknown reason it is not signaling when I perform the b0tcm check and b1tcm check.

    I'll try putting a watchpoint, but I really doubt it is clearing the register somewhere else.

    PS: This is all during boot time, and I am executing the instructions according to spna106a.pdf

  • Hello Pablo,

      If you don't mind sharing your project, I can take a look tomorrow. I will also check with our spna106a owner to see if he has any insights.

    Regards,

    Charles

  • Hello Pablo,

      I never asked which device part you are using. Can you tell me the part#? Also, do you use HalCoGen? HalCoGen's startup code mimics spna106a for the various checks.

    Regards,

    Charles

  • Part# RM48L952

    PS: Some parts are still a work in progress. They depend on lots of definitions for the project, and we do not intend to use HALCoGen because of some constraints in this project. I've been reading and rereading the technical reference guide and the datasheet in order to be able to do what is specified in spna106a.

    PS: I still need to fix many TODOs and finish some necessary implementations.

    When you receive it, please let me know so I can delete it. I removed some important parts, but still, maybe it's better to remove it.

    View:http://e2e.ti.com/cfs-file.ashx/__key/communityserver-discussions-components-files/312/7457.Boot.zip]

  • Hello Pablo,

      Can you please upload the attachment again? I'm having problem downloading the attachment.

    regards,

    Charles

  • http://e2e.ti.com/cfs-file.ashx/__key/communityserver-discussions-components-files/312/7457.Boot.zip

    I just insert the link in the bar and I can download it without problems. I must have deleted a [ (I think it's called bracket) when I sent you the file. I was in a hurry. 

    Now I'm going home for recess and I'll be back the first week of January.

  • Pablo,

    Can you tell me on which device and hardware board you are running these tests?

  • RM48L952

    Hitex's functional development kit

  • Hello Pablo,

      I added the branch-to-self statement below, highlighted in red, to your data abort handler. I can see that DERR (bit 5 of RAMERRSTATUS) and bit 3 of ESMSR3 are set when the data abort handler is called. I then do a "move line" to the next line to see how the code sequence will follow. Your code will clear these two bits and then continue to the while (1) highlighted in orange. Can you try the same in your setup to see whether these two bits are set?

    void dataAbort() {
        asm(" b #-8");

        if(ESMReg->ESMSR3 & 0x8U) { /*Uncorrectable error was detected in B0TCM*/
            if ((TCRAM1Reg->RAMCTRL & 0x100U)) {
                TCRAM1Reg->RAMERRSTATUS = 0x20U;
                ESMReg->ESMSR3 = 0x8U;
                ESMReg->ESMEKR = 0x5U;
            } else {
                while(1);
            }
        } else if (ESMReg->ESMSR3 & 0x20U){ /*Uncorrectable error was detected in B1TCM*/
            if (!(TCRAM2Reg->RAMCTRL & 0x1U)) {
                TCRAM2Reg->RAMERRSTATUS = 0x20U;
                ESMReg->ESMSR3 = 0x20U;
                ESMReg->ESMEKR = 0x5U;
            } else {
                while(1);
            }
        } else {
            while(1);
        }
    }

  • At the moment I am almost leaving for home for the holidays. I'll be back only in January. 

    When I get back this is the first thing I will try, and I will let you know immediately about the result.

  • Hello, I'm back.

    I did everything you did, I added the branch instruction and verified the registers.

    RAMERRSTATUS = 0x0

    ESMSR3 = 0x80

    How is it possible that it worked for you but not for me? I'm using CCS 5.5.0

  • Hello Pablo,

      You said you observed 0x80 on ESMSR3.  0x80 means the channel 7 of ESM group 3 is set. This is a flash uncorrectable error. Can you please confirm this is indeed what you see? I don't expect to see flash error since you are testing B0 and B1TCM which are RAM.

       The thing we need to resolve is why RAMERRSTATUS/ESMSR3 are not set properly in your environment while they work properly in my setup. I'm actually using a device very similar to yours, but operating in big-endian mode.

     Can you put a breakpoint at the below line in your bnetmECC() function?

    ReadAddrErrorB0 = (*(unsigned int *)0x08000010); /* should be line 755 in the file */

    After I put the breakpoint at the above line, I single-step and immediately see the abort. I just want to make sure that when you take the abort it is due to the RAM double-bit error and not something else, since the ESMSR3 = 0x80 you just reported indicates a flash error instead.

    regards,

    Charles
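The ESMSR3 values mentioned in this thread can be decoded with a small lookup. This is a sketch with hypothetical function names; the channel meanings are limited to the four that participants state in the thread (channel 1 efuse, channel 3 B0TCM, channel 5 B1TCM, channel 7 flash), and any other channel should be looked up in the device's technical reference manual.

```c
#include <stdint.h>

/* Return the lowest channel number set in an ESMSR3 value, or -1 if none is set. */
int esm_group3_first_channel(uint32_t esmsr3)
{
    for (int ch = 0; ch < 32; ch++) {
        if ((esmsr3 & (UINT32_C(1) << ch)) != 0u) {
            return ch;
        }
    }
    return -1;
}

/* Describe a group 3 channel, covering only those named in the thread. */
const char *esm_group3_channel_name(int ch)
{
    switch (ch) {
    case 1:  return "efuse error";
    case 3:  return "B0TCM uncorrectable ECC error";
    case 5:  return "B1TCM uncorrectable ECC error";
    case 7:  return "flash uncorrectable ECC error";
    default: return "not discussed in this thread";
    }
}
```

With this mapping, the reported 0x80 decodes to channel 7, whereas a B0TCM double-bit error would be expected to show 0x8 (channel 3).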

     

  • Ok I'll do that, but it may take me some time. For some reason now the line:

    if (SYSReg1->SYSESR & 0x8000U) {

    causes a prefetch exception (MVN #27 instruction). When that happens something goes horribly wrong and the disassembly view shows that something modified the code (almost every single instruction changes to BLT #0x...).

    Everything was working properly when I left for Christmas break, but today I found this huge problem. Maybe the data I gave you in my reply is wrong; it's very possible that it didn't even access this part of the code.

    PS: When you talk about a breakpoint, I assume you mean  hardware breakpoint, which would stop the code at that point, no matter if it is an interruption or something else. This works ok, but when the uC resets due to the STC self check, it completely ignores all hardware breakpoints.

  • Hi,

      Right before you execute "if (SYSReg1->SYSESR & 0x8000U)" you enabled ECC for the ATCM memory, which is flash. Did you reflash your code after Christmas? Is it possible that you didn't have ECC in the flash and hence got a prefetch abort as soon as you enabled the ATCM ECC? You can check the IFSR to see if you have an ECC error.

      Yes, the breakpoint I was talking about is a hardware breakpoint.

    regards,

    Charles

    Sorry it took me such a long time to answer. I didn't touch my code after I left.

    Found out it was a problem on my computer. Tried it in 3 different computers, all of them worked except mine. I will format it, and then give it a try again tomorrow. 

    This was a really bizarre problem, never happened before. 

  • Finally I was able to do what you suggested (adding asm(" b #-8")).

    And I can see that it goes into the interruption correctly

    RAMERRSTATUS = 0x20

    ESMSR3 = 0x8

    I can see that it clears everything correctly, but I found the problem. When it goes back into the asm function that called it, it doesn't go to the right place:

    stmed	sp!, {r0 - r12, lr}
    bl		dataAbort
    ldmfd	sp!, {r0 - r12, lr}
    cps		#0x13
    subs		pc, lr, #0x4

    When it comes back from dataAbort, it runs it again, which was giving me the impression that the bits were not being set correctly.

    PS: I think I found a bug on CCS 5.5.0.00077. That weird error that was rewriting the flash, and generating a prefetch abort in a MVN instruction was because of the version. Only explanation I could come up with. Tried it in 2 different PCs and it gave me the same error. Tried version 5.4 on the same PCs and it worked.

  • Hello Pablo,

      I'm glad that you are able to see the same behavior as I see in my setup and able to resolve the problem. I will suggest that you open a new ticket for the CCS issue for better tracking.

    regards,

    Charles

  • At the end of dataAbort() the compiler adds this code "SUBS PC, LR, #0x8", which makes it run the exception again.

    Is there a way that I can force the generated code there to be "BX LR"? I think I can't do that, so .. I'll have to write it in assembly? Anyway, I'll start translating it to assembly now, if you read this and find another way to fix this problem, please let me know.

  • Pablo,

    The return instruction from a data abort is intentionally done this way. A data abort is a serious exception and the abort handler is expected to correct the cause of the abort and then make the processor execute the instruction again, so that this time the access works correctly (without another abort).

    You should use the instruction "SUBS PC, LR, #0x4" only for the case when you are causing a data abort intentionally. See the file dabort.asm generated by HALCoGen for an example.

    Regards, Sunil
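The difference between the two return instructions comes down to simple address arithmetic. In ARM state, the abort-mode link register holds the address of the aborted instruction plus 8 when a data abort is taken, so subtracting 8 retries that instruction and subtracting 4 resumes at the one after it. The sketch below only models that arithmetic; the function names and the example address are hypothetical.

```c
#include <stdint.h>

/* Value placed in LR_abt on a data abort taken from ARM state. */
uint32_t lr_abt_for_data_abort(uint32_t faulting_instr_addr)
{
    return faulting_instr_addr + 8u;
}

/* Effect of "SUBS PC, LR, #8": re-execute the instruction that aborted. */
uint32_t return_and_retry(uint32_t lr_abt)
{
    return lr_abt - 8u;
}

/* Effect of "SUBS PC, LR, #4": resume at the instruction after the aborted one. */
uint32_t return_and_skip(uint32_t lr_abt)
{
    return lr_abt - 4u;
}
```

For a load at a hypothetical address 0x00001000, LR_abt would be 0x00001008; retrying returns to 0x00001000, and skipping returns to 0x00001004. The skip variant suits an intentionally injected ECC error, because the corrupted location would abort again on a retry unless the handler has repaired it first.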

  • Yes, I know, but here's the problem:

    In my vectors table I have this:

    "_asm_dataEntry"

    which calls an asm entry, which calls a C function called dataAbort. When this C function returns, the compiler-generated return instruction is SUBS PC, LR, #0x8, which makes the C function run again, but this time with all the bits cleared, so it goes into an infinite loop. This was giving me the impression that the bits were not being set.

    I wanted to be able to fix this in the C code, but I wasn't able to, so I rewrote the C function in assembly. Then it worked properly.

  • If you know how to fix this problem without resorting to assembly I'd appreciate that.

  • Pablo,

    There is a way to "cheat" the compiler so that it generates a return instruction that would take you to the instruction after the one that caused the data abort. However, this would be the return instruction then for every case of a data abort, and you may not want this for a safety application.

    You can change the pragma for the data abort handler and indicate it to be a "prefetch abort" handler instead of a data abort handler. This will change the return instruction to be "SUBS PC, LR, #0x4".

    Regards, Sunil

  • Hello Sunil,

    Thanks a lot for this workaround, but I don't think it suits the development requirements for the team.

    So apparently I'm kinda stuck with assembly here. It isn't ideal, but if it gets the problem solved, then I'm ok with it.

    Thank you very much for your help.