xdc.useModule("ti.sysbios.family.c64p.MemoryProtect"); causes abort operation on 2nd load of program

I'm trying to implement a memory protection scheme. So far it has worked, finding two issues, but I noticed that when I reload the image in the debugger after running it the first time, it goes straight to abort. The load appears to work, then there is a brief "jumping to entry point" message that I can read if I'm lucky. I tried with and without a GEL file, with the same result: an exception occurs.

 I have tried using the Exception module with memory protection enabled and without.  It always boils down to this line in the bios configuration:

var memProtect = xdc.useModule("ti.sysbios.family.c64p.MemoryProtect");

If it's removed from my build, I can reload the image over and over.

****************************************************************************************

1/11/2016 update:  I discovered by looking at the ROV that the BIOS in "Scan for Errors" is reporting no errors in the first run.  In the 2nd run, it states that loggerbuf, record #1 has "Unknown File".  I presume that "Unknown file" error is related to the exception that is generated.

The details in my setup. 

 Host OS: Windows 7, 64-bit.  CCS Version is Version: 5.1.0.09000. 

Board rev number: can be found, if needed

SysBios Version: 06.33.5.46

XDC Tools Version: 3.23.2.47

Processor Mode: Big Endian

Executable Output: Elf-format

My image is contained in DDR2, and the program is loaded there. L1P and L1D are defined in the linker command file and appear in the symbol map file. My processor is a C64x+ (C6457). My DDR2 memory is 256 MB. My L2 is set up as 1 MB cache and 1 MB RAM. I'm using a Spectrum Digital JTAG emulator over LAN. As I stated, I've tried with a GEL file and without one.

An example is attached (it outputs big endian, COFF format). It fails the same way, as explained above. (No GEL file is provided in the example.)

0412.willsTest.zip

Thanks for any help,

Bill

  • Bill,

    If I understand correctly, the second load completes successfully since it then jumps to the entry point. Is that correct?

    Can you explain please what exception occurs? The Exception Handler would give you a report if it were initialized fully, but at the least you would be able to look at registers to determine the exception values that indicate what has gone wrong. Or is it a different type of exception?

    What does it mean to be a first or second load of an image? Is that first or second after a power cycle, after a hard reset, after a code change, or what exactly?

    My early guess is that the first load/execute sets the protection such that something in either the re-load or the next execution is prevented by the protection settings and you cannot move forward. I suspect it is either 1) a setting that is wrong (doubtful), 2) a setting that is done later in the execution that prevents something done normally early in the execution (likely), or 3) a simple requirement that reset is required after the settings you are using (maybe the same as 2).

    Regards,
    RandyP
  • Randy, 

    It "appears" that the CSR register is being written to, if I trust NRP = 0xe084bed0 (the next PC to execute after the exception). CSR is the Control Status Register, which is associated with cache control operations. Jumping ahead to see whether the two are related, I can check the memory protection lock registers for L1/L2 in the debugger's Registers window. They appear to be set intermittently.


    I added a breakpoint in the TimerSupport_enable() function, where the exception appears to be happening. The program goes into abort upon setting the pwrSaveLock[1] value.

    /* enable timer 0 and 1 */

    pwrSaveLock[1] = mdctl0 | PWRSAVE_ENABLE_TIMER0AND1;

    The value of mdctl0 is the same whether or not an exception occurs.

     

    Note: the strange thing I discovered is that reloading the program once more makes it work. It appears that every second load-and-run attempt fails.

     Any thoughts?

    thanks,

     

    Bill

     

    Block of code that appears to be part of NRP

              ti_sysbios_family_c64p_tci6482_TimerSupport_enable__E:
    e084bea0:   028403E2            MVC.S2        CSR,B5
    ...
    e084bec8:   02102276            STW.D1T2      B4,*+A4[1]
    70            Hwi_restore(key);
    e084becc:   009403A2            MVC.S2        B5,CSR
    e084bed0:   00000000            NOP          
    71        }

  • Bill,

    The C64x/C64x+ CPU & Instruction Set Reference Guide SPRU732 explains the CPU Exceptions in section 6. Table 6-1 lists 9 registers that would be needed to understand what the status of the exception is. Please capture the hex values of all of these and post those. The Exception Handler very conveniently does this and some more when it works, but since you are not getting that dump it will have to be done manually.

    One thing that could be useful to debug this would be to turn off the automatic Run To main(). I think it is available from right-clicking on your project name in CCS to get Properties, then find the Debug parameters on the left-hand pane. One of the items will be something like Automatically run to <symbol>, and you can turn it off. With that off, you can do the second load and it will just set the PC to the entry point and run no further. If you get there, then you know the load was okay and the problem occurs during execution. Then you can use a series of breakpoints to gradually go through the code to find the failure point. This might be a difficult path to follow, but a possible one.

    Many times, the GEL file will be set to do a GEL_reset(); in OnPreFileLoaded(), along with commands to re-initialize the EMIF for the external memory, plus whatever else needs to be re-initialized after a reset. This might be worth a try as a work-around/solution.

    When you have run for a while after the first load, try a Restart instead of re-loading the .out file. This will send the PC back to the entry point and run to main() if that is enabled. This could give some visibility into the effect of the actual load operation vs. the execution of the code.

    Regards,
    RandyP
  • RandyP,

    I was trying to state in my response above what I think you suggested.  Maybe you read my first, longer and more confusing, edit of it, which I changed 3-4 times before getting my final edit of the reply.  Thank you for wading through my rambling.

    I used the symbol map file to get the c_int00 entry point address, then set a hardware breakpoint there so that I could stop on the 2nd load and step through. Eventually execution reaches TimerSupport_enable() in TimerSupport.c and attempts to execute line 68 before going into the ti_sysbios_family_c64p_Hwi1 disassembly.

    This is the line

    pwrSaveLock[1] = mdctl0 | PWRSAVE_ENABLE_TIMER0AND1;

    Here's the disassembly for the instructions for the NRP value below. 

    68            pwrSaveLock[1] = mdctl0 | PWRSAVE_ENABLE_TIMER0AND1;
    e084bec4:   020C9FFA            OR.L2X        B4,A3,B4
    e084bec8:   02102276            STW.D1T2      B4,*+A4[1]

    I don't know why this is bad. It's a TI function, and as far as I know my example has no code that explicitly configures the timer. Any thoughts on whether something in our code, or in our understanding of the MPC, could cause this? I want to make sure we understand the problem before saying our software, in a release configuration, is ready for testing. Would the Memory Protection settings in the BIOS be desirable for released software?

         The exception registers are

    TSR 0x00000404 Core Register: Task state register 
    ITSR 0x0000000E Core Register: Interrupt task state register 
    NTSR 0x0001000C Core Register: Non maskable TSR snapshot 
    EFR 0x40000000 Core Register: Exception Flag Register 
    ECR 0x40000000 Core Register: Exception clear register 

    IER 0x00004003 Core Register: Interrupt enable Register 
    ISTP 0xE084E800 Core Register: Interrupt service table pointer 
    NRP 0xE084BEC8 Core Register: Non maskable interrupt 
    ERP 0xE084BEC8 Core Register: Exception return pointer 

    I wasn't able to find the address for the REP (Restricted entry point register).  If we need it to debug this issue, can you provide the physical address for the REP on the C64x+? 

    For the GEL file, the final solution could be to insert a reset of the cache/memory protection controller, as you suggested above. It's probably a good thing to do regardless of whether Memory Protection is enabled in the BIOS configuration file.

    Thank you for your time. 

    Bill

  • Bill,

    Some of the registers may not be memory-mapped but have to be accessed by the MVC instruction. The IERR register is one that would have some helpful information.

    You may be able to find these using the Registers Window in CCS.

    Because of the processor's pipeline, it is likely that the exception occurred due to an instruction that executed earlier. If it is related to memory protection, then there could be additional latency if that hardware takes a few cycles to get a flag back to the CPU.

    Your GEL file contains several automatic functions that are called by CCS. There is Startup(), which runs only when you first bring up the .ccxml file for your emulator, and then there are several OnXxxx() functions that are called when something else happens. For example, OnReset() is called after a hardware or CCS-driven reset occurs, OnRestart() after a CCS Restart is clicked, and OnPreFileLoaded() and OnFileLoaded() are the other common ones.

    GEL functions are not deterministic and they do not have to complete before the next GEL command is started. This can lead to some race conditions and might explain why you have a different result every other time when you re-load your program.

    One way to debug these automatic functions is to take one out by inserting an 'x' in front of the function name, like xOnPreFileLoaded(). Then you can manually call that function from within CCS or even do the individual steps manually, and see if you get more consistent behavior.

    It could be worthwhile to always do a CCS CPU Reset before each load and see if everything is cleared out so the loads always work.

    If you have a reset solution and are finished with this thread, please click Verify Answer under the post that has the best description of your solution.

    Regards,
    RandyP
  • RandyP,

    Are you able to duplicate my problem on a C64x+ processor? The attached example doesn't point to a valid GEL file, but from my testing one isn't needed; the issue occurs either way. Knowing whether this should happen for everyone would help me verify an answer.

    To answer your last post:

        The gel file is 'turned off' to minimize the items I'm tracking.  When I 'turn it back on' (point to valid file), the same issue is still there.  I believe we can rule out any GEL file timing issues, as the error will occur in either case.  I'm using a bootloader to initialize the system without the gel file.   CPU power cycling in between still demonstrates the issue on the 2nd load. 

    Before the 2nd load, I performed the CCS CPU Reset per your suggestion.  That doesn't prevent the exception from occurring.

    As a test, I added code to my GEL file that writes 1 to the L1P, L1D, and L2 Memory Protection clear registers, but that doesn't seem to make the exception go away. If I load the image and hit my breakpoint, then clear the registers, it works. So the fault appears after loading and running to line 68 of TimerSupport_enable() in TimerSupport.

    The exception registers I could get from the Registers window are posted below. IERR is 0x0.

    IER 0x00000003 Core Register: Interrupt enable Register 
    ISTP 0xE084E800 Core Register: Interrupt service table pointer 
    IRP 0x00000000 Core Register: Interrupt return pointer 
    NRP 0xE08495CC Core Register: Non maskable interrupt 
    ERP 0xE08495CC Core Register: Exception return pointer 

    NTSR 0x0001000C Core Register: Non maskable TSR snapshot 
    ETSR 0x0001000C Core Register: Exception TSR snapshot 
    EFR 0x40000000 Core Register: Exception Flag Register 
    ECR 0x40000000 Core Register: Exception clear register 
    IERR 0x00000000 Core Register: Internal exception cause register 

         

    Gel file addition

    #define MEMPROT_L2 0X0184A008

    #define MEMPROT_L1P 0X0184A408

    #define MEMPROT_L1D 0X0184AC08

    The code below was added to OnRestart and Global_Default_Setup_Silent:

    GEL_TextOut( "***clearing the fault registers... \n" );

    memProt = ( unsigned int* )MEMPROT_L2;

    *memProt = 0x1;

    memProt = ( unsigned int* )MEMPROT_L1P;

    *memProt = 0x1;

    memProt = ( unsigned int* )MEMPROT_L1D;

    *memProt = 0x1;

  • William Martin said:
    EFR 0x40000000 Core Register: Exception Flag Register 
    ECR 0x40000000 Core Register: Exception clear register 
    IERR 0x00000000 Core Register: Internal exception cause register

    The EFR shows an external exception, in which case I only know to look at the L2/L1P/L1D fault registers, whose addresses are:
        L2MPFAR - 0x0184A000
        L2MPFSR - 0x0184A004

        L1PMPFAR - 0x0184A400
        L1PMPFSR - 0x0184A404

        L1DMPFAR - 0x0184AC00
        L1DMPFSR - 0x0184AC04

    The MPFAR (Fault Address Register) will have the faulting address and the MPFSR (Fault Status Register) will contain the fault status, which reflects MPPA bits (Permission Attributes) that were lacking in the particular MPPA that covers the faulting address.

    I can see from the source code for TimerSupport_enable() that the pwrSaveLock pointer is set to 0x02AC0004.  Your situation does indeed look like it's faulting in there.  There are two writes performed on the pwrSaveLock registers:

        /* unlock Powersave hardware */
        pwrSaveLock[0] = PWRSAVE_LOCKVAL;

        /* enable timer 0 and 1 */
        pwrSaveLock[1] = mdctl0 | PWRSAVE_ENABLE_TIMER0AND1;

    Due to the nature of external exceptions being delayed it could be either one of these writes.  Perhaps the unlocking operation (1st write) does not successfully happen and the exception is due to attempting to write to a locked PWRSAVE component.  I don't know anything about that HW so I can't really comment there.

    So, do you see any non-zero values in any MPFAR?

    I would expect that a bad write to the PWRSAVE address of 0x02AC0004 would show up in the L2MPFAR/L2MPFSR.  Is L2MPFAR non-zero?  If so, is the address 0x02AC0004 or 0x02AC0008?

    Did you answer Randy's question about what constitutes a "2nd load"?  I didn't pick that up in my review of this thread.  Perhaps you can answer again here - when you say "1st Load", what does 1st mean?  1st Load after what, exactly?  I assume you do something to make it the 1st load and then don't do that something before the 2nd load.

    Regards,

    - Rob