RM48L952: SafeTI: FLASH_ADDRESS_PARITY_SELF_TEST signals an actual group3 error to the application in the demo project and for the same reason does not work in a "real application" either

Part Number: RM48L952
Other Parts Discussed in Thread: CCSTUDIO

Hi,

I tried asking this via email on 23.5, but got no answer...

Now since some other bugs have been found & fixed (DMA test) I was finally able to even run the 2.3.1 SafeTI IAR RM48 demo application to try to replicate the issue there.

According to this post there should not be any problems in this test anymore
https://e2e.ti.com/support/microcontrollers/hercules/f/312/p/497516/1800406#1800406

I disagree. Yes, the possible endless loop no longer exists, but the test signals a real group3 error to the application.

After calling that test the code goes first to the FIQ handler (as expected) and then to the data_abort handler. In the data abort handler the code takes this branch, which does not mask the error away (my guess is that clearing the diagnostics mode is what prevents the endless loop):

    /* DAbort due to an Flash Wrapper test 2 bit ecc fault inject? */
    else if (0 != (sl_flashWREG->FDIAGCTRL & 0x7)) {
        maskDAbort = FALSE;

        /* Though it's not necessary, turn-off the diag mode */
        sl_flashWREG->FDIAGCTRL &= 0xFFF0FFF8; /* Clear Diagnostics enable key */
        callbkParam1 = sl_flashWREG->FUNCERRADD;
        callbkParam2 = sl_flashWREG->FEDACSTATUS;
        ESM_ApplicationCallback((uint32)(ESM_GRP3_MASK | ESM_G3ERR_FMC_UNCORR), callbkParam1, callbkParam2, callbkParam3);
    }


Since the data abort is not masked via the flag (and the demo application's ESM callback "stupidly" just clears the group3 ESM bits even though a real error was signaled to it), this "looks to work" in the demo application even though it is not actually working. If you add even minimal error handling to the data abort handler (a while(1) loop for non-masked aborts), the code goes there and stays there.
    if (FALSE == maskDAbort) {

        /* Extract err address & report to application */
        //while( 1 )
        //{
        //};
    }

Proper instructions are needed on how to mask that error out, and the demo application also needs to be fixed. I also think the group3 ESM error should be acked inside the SafeTI code (sl_selftest.c), in case this data abort cannot be avoided, as is done for every other group3 error; currently that test clears only the group2 error inside sl_selftest.c. The acking of the group3 ESM channel for a SafeTI test definitely cannot live inside ESM_ApplicationCallback, which is where the real errors are signaled.

This test looks to work without this diagnostic key clearing (it restores the settings in sl_selftest.c)
sl_flashWREG->FDIAGCTRL &= 0xFFF0FFF8; /* Clear Diagnostics enable key */

Also the code comment in data abort handler refers to ECC 2 bit fault injection test, so this branch shall not be entered in any other test?

Should the data abort handler check that SL_FLAG_GET says this test is active and that diagnostic mode 7 is enabled (if (7U == (sl_flashWREG->FDIAGCTRL & 0x7U)))? By the way, the current check, 0 != (sl_flashWREG->FDIAGCTRL & 0x7U), looks like a bug: that branch is entered whenever any diag mode is selected...
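
The difference between the two checks can be sketched on the host without touching hardware. This is only an illustration: the function names and the `accepted_modes` helper are hypothetical and not part of SafeTI, and the only thing taken from the thread is the 0x7 mask on FDIAGCTRL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The diag mode field is FDIAGCTRL[2:0], matching the 0x7 mask used in the handler. */
#define DIAG_MODE_MASK 0x7u
#define DIAG_MODE_7    0x7u

/* Check as written in the shipped data abort handler: true for modes 1..7. */
static bool any_diag_mode(uint32_t fdiagctrl)
{
    return (fdiagctrl & DIAG_MODE_MASK) != 0u;
}

/* Stricter check proposed in the post: true only when mode 7 is selected. */
static bool diag_mode_is_7(uint32_t fdiagctrl)
{
    return (fdiagctrl & DIAG_MODE_MASK) == DIAG_MODE_7;
}

/* Hypothetical helper: count how many of the 8 mode encodings a check accepts. */
static unsigned accepted_modes(bool (*check)(uint32_t))
{
    unsigned count = 0u;
    assert(check != NULL);
    for (uint32_t mode = 0u; mode <= DIAG_MODE_MASK; ++mode) {
        if (check(mode)) {
            ++count;
        }
    }
    return count;
}
```

With these definitions the shipped condition accepts seven of the eight mode encodings, while the stricter one accepts only mode 7.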


This test works in my real application during the CPU start-up phase, when FIQ is not yet enabled, but with FIQ enabled the code is stopped, since we catch every non-masked error :). So the root cause must actually be the FIQ enabling, which causes a data abort that is handled neither in the demo app nor in the SafeTI code (the SafeTI code should check whether FIQ is enabled and, if so, require that the group3 error is raised and ack it before returning ST_PASS to the caller).
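
A minimal host-side sketch of that FIQ check, under the assumption that the behavior observed in this thread holds (group2 always, group3 plus data abort only with FIQ enabled). On the target the CPSR value would come from an `mrs` read; here it is passed in as a plain number. The type and function names are hypothetical, only the CPSR F bit position (bit 6) is architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPSR_F_BIT (1u << 6)  /* ARM CPSR: F = 1 means FIQ is masked */

typedef struct {
    bool expect_group2;            /* address parity error, always expected */
    bool expect_group3_and_abort;  /* observed in this thread only with FIQ enabled */
} parity_test_expectation_t;       /* hypothetical type, not part of SafeTI */

static bool fiq_disabled_from_cpsr(uint32_t cpsr)
{
    return (cpsr & CPSR_F_BIT) != 0u;
}

/* Derive which side effects the self test should see for a given CPSR value. */
static void expected_side_effects(uint32_t cpsr, parity_test_expectation_t *out)
{
    assert(out != NULL);
    out->expect_group2 = true;
    out->expect_group3_and_abort = !fiq_disabled_from_cpsr(cpsr);
}
```

The test code could then compare the actual ESM and flash status bits against this expectation before deciding between ST_PASS and ST_FAIL.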

Here is my real application's report when running that test at runtime:

===DATA_ABORT===
  DFSR: 0x1008
  DFAR: 0x20000000
  Status: 0x8
  Read: TRUE
  AxiDec: FALSE
====================
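
For readers who want to reproduce the decoding of that report, here is a small sketch of how the fields can be extracted from the DFSR value. It assumes the Cortex-R4 DFSR layout (status in bits [3:0] plus bit 10, RW in bit 11, SD in bit 12); the struct and function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t status;           /* 5-bit fault status: DFSR[10] : DFSR[3:0] */
    bool    is_read;          /* DFSR[11] (RW) == 0 */
    bool    axi_decode_error; /* DFSR[12] (SD) == 0; meaningful for external aborts only */
} dfsr_fields_t;              /* hypothetical type for this sketch */

static void decode_dfsr(uint32_t dfsr, dfsr_fields_t *out)
{
    assert(out != NULL);
    out->status = (uint8_t)((dfsr & 0xFu) | (((dfsr >> 10) & 0x1u) << 4));
    out->is_read = ((dfsr >> 11) & 0x1u) == 0u;
    out->axi_decode_error = ((dfsr >> 12) & 0x1u) == 0u;
}
```

For the reported DFSR of 0x1008 this yields status 0x8 (the synchronous external abort encoding), a read access, and no AXI decode error, which agrees with the Status, Read and AxiDec lines of the report.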

  • Here is code which looks to work also in runtime when FIQ is enabled


    data_abort() checks whether this is a possible "residual" call from the flash address parity test after the FIQ interrupt, with a PROPER diag mode check:
    else if( ((TRUE == SL_FLAG_GET((int32)FLASH_ADDRESS_PARITY_SELF_TEST))) && ((sl_flashWREG->FDIAGCTRL & 0x7U) == 7U) )
    { // NOTE: JSI 14.6.2017: in case FIQ is enabled this test causes also data abort
    maskDAbort = TRUE;
    }
    /* DAbort due to an Flash Wrapper test 2 bit ecc fault inject? */
    else if (0U != (sl_flashWREG->FDIAGCTRL & 0x7U))



    sl_selftest.c "heavily modified" to check whether FIQ is enabled and to act accordingly (group3 is cleared here, since the error is caused by SafeTI):

    _SL_Barrier_Data_Access();
    flashread = *(volatile uint32 *)flashBadECC1;

    sl_flashWREG->FPAROVR = regBkupFparOvr; // return content immediately after error, before new flash accesses
    sl_flashWREG->FDIAGCTRL = regBckupFdiagctrl; // same to restore already here

    if(FLASH_ADDRESS_PARITY_FAULT_INJECT != testType)
    {
        extern uint32 _fiq_disabled_(void);
        boolean bGroup3Ok = TRUE;

        // NOTE: data abort is generated only if FIQ is enabled, therefore group3 error may or may not exists
        if( !_fiq_disabled_() )
        {
            if ((F021F_FEDACSTATUS_B1_UNC_ERR == (uint32)(sl_flashWREG->FEDACSTATUS & F021F_FEDACSTATUS_B1_UNC_ERR))
                && (sl_flashWREG->FUNCERRADD == (uint32)0x0u) && (BIT(ESM_G3ERR_FMC_UNCORR) == (sl_esmREG->SR1[2] & BIT(ESM_G3ERR_FMC_UNCORR))))
            {
            }
            else
            {
                bGroup3Ok = FALSE;
            }

            /* Anyways clear flash & ESM status registers */
            sl_flashWREG->FEDACSTATUS = F021F_FEDACSTATUS_B1_UNC_ERR;
            sl_esmREG->SR1[2] = BIT(ESM_G3ERR_FMC_UNCORR);
            // do not read sl_flashWREG->FUNCERRADD, it is read below
        }

        if ( bGroup3Ok )
        {
            if ((F021F_FEDACSTATUS_ADD_PAR_ERR == (uint32)(sl_flashWREG->FEDACSTATUS & F021F_FEDACSTATUS_ADD_PAR_ERR)))
            {
                //sl_flashWREG->FPAROVR = regBkupFparOvr;
                _SL_HoldNClear_nError(); /*Clear nError */
                /* Clear flash & ESM status registers */
                //sl_flashWREG->FEDACSTATUS = F021F_FEDACSTATUS_ADD_PAR_ERR; // twice, also below
                //sl_esmREG->SR1[1] = GET_ESM_BIT_NUM(ESM_G2ERR_FMC_UNCORR); // twice, also below
                //flashread = sl_flashWREG->FUNCERRADD;
                *flash_stResult = ST_PASS;
            }else {
                *flash_stResult = ST_FAIL;
            }
        }
        else
        {
            *flash_stResult = ST_FAIL;
        }

        /*SAFETYMCUSW 134 S MR: 12.2 <APPROVED> Comment_5*/
        /*SAFETYMCUSW 96 S MR: 6.2,10.1,10.2,12.1,12.6 <APPROVED> Comment_25*/

        /* Anyways clear flash & ESM status registers */
        sl_flashWREG->FEDACSTATUS = F021F_FEDACSTATUS_ADD_PAR_ERR;
        sl_esmREG->SR1[1] = GET_ESM_BIT_NUM(ESM_G2ERR_FMC_UNCORR);
        sl_esmREG->SSR2 = GET_ESM_BIT_NUM(ESM_G2ERR_FMC_UNCORR);
        flashread = sl_flashWREG->FUNCERRADD;

        /* Clear the diag mode settings */
        sl_flashWREG->FPAROVR = regBkupFparOvr;
        sl_flashWREG->FDIAGCTRL = regBckupFdiagctrl;
    }


    Also please note that when FIQs are enabled, this kind of register clearing is completely useless (and might even swallow a real error arriving at the same time?), since FIQ interrupt handling automatically clears the SR1[1] bits; only the SSR2 bits are left active and need acknowledging...
    sl_esmREG->SR1[1] = GET_ESM_BIT_NUM(ESM_G2ERR_FMC_UNCORR);


    And yes, the diagnostics register backup restore appears twice here, since I didn't want to modify the code more than necessary...

    It cannot be the purpose of the SafeTI library that the integrator needs to access "private functions" such as SL_FLAG_GET and guess, based on register settings, that something may be ongoing, especially when the DEMO APPLICATION does not even handle the situation properly (well, if the demo application stops at the DMA test and cannot even continue to these tests, what can you expect...).


    Asking again: when is a FIXED version of SafeTI coming? The current 2.3.1 is broken in multiple ways. Yes, you can live with that (except that you are not allowed to modify it :)), but it requires "heavy modification" of TI code and unnecessary time spent first figuring out the root cause and then repairing it yourself.
  • Hello Jarkko,

    I wanted to send a quick message to let you know that we have spoken to the original SW designer of this test and have not yet received an acceptable and conclusive answer. The designer's initial response was to move the ESM handler to execute from SRAM in order to prevent the exception, which I do not believe is an acceptable solution for integration into a real safety application. We are continuing to investigate to determine the best fix, one that avoids moving any of the elements to SRAM for execution.
  • Hi,

    Just to clarify: does that mean that, in the current form of the code, the group3 error + abort is a "real and expected" side effect of running this test if FIQ is enabled, and not some kind of coding error on my side?

    At least I can live with that group3 error if that is real deterministic side effect so I do not require running code from RAM to prevent it from happening or anything else more complex stuff - I can easily see that esm3/abort belongs to the test if that behavior is documented and properly handled in original test. Integrator can then "copy the logic" also for fault inject test since that behavior is expected.

    I tried to ask here what these diag modes actually do, since I do not understand why this esm3 comes up in this test (I still don't understand it, but if it is a real side effect maybe I do not need to; just knowing that esm3/abort is activated for a reason would be enough for me). The esm3/abort appears to activate when returning from FIQ, as the next line of code is executed, so the C code executed inside the FIQ does not cause that behavior. I am also wondering why the manual 0x5400 write is needed in the FIQ, since there should be the "auto clear" DIAG_TRIG; that trig bit is not set at all in the test, yet the initial esm2 error is still generated, even though TRM 5.6.2 says that nothing should happen if DIAG_TRIG is not set... But most likely I do not need to understand the details of this either, if this esm3/abort is a "confirmed" feature of this test... As long as I need to make rather "heavy" modifications myself, I would like to know the root cause, since if you do not understand the root cause the "fix" may be completely wrong even though it looks to work, and may lead to an even more catastrophic failure...
    https://e2e.ti.com/support/microcontrollers/hercules/f/312/t/603175

    //////// about same test's fault-inject variant
    Here another user has exactly the same esm3 problem (there it is in the fault-inject variant of the same test; I have encountered the same, but since I found a way to "fix" the real test, the same steps apply to the fault-inject variant). I am just wondering why it is said there that the test runs without problems, given the esm3 'side effect problem' you describe above. Or does this test work differently on different CPU variants? At minimum the 0x5400 modification is needed to prevent a continuous FIQ loop, and depending on where the 0x5400 is set, a manual SR1 ack may also be needed just to reach the state where the esm3/abort pops up. At least in my code the "secondary FIQ" is interpreted as a real fault, since the "one-time consumable FI flag" is no longer active (it was cleared in the first round), and the code ends up in a while(1) loop.
    https://e2e.ti.com/support/microcontrollers/hercules/f/312/t/608591

    Here is my experience with the fault-inject version. Basically you get it to work if you set the same 0x5400 for that test in sl_esm.c (or do it in the application-side ESM error handler, but then you also need to manually ack the pending FIQ away from SR[1], since it has re-raised itself; without that manual ack the code jumps straight back into the FIQ after exiting it once). I think sl_esm.c is the correct location for the 0x5400 write, since it relates to the test, not to real fault handling; for a real fault there is no need to write 0x5400 on the application side to handle the fault if code execution is meant to continue...
    e2e.ti.com/.../2217778
    - after that 0x5400 is set in sl_esm.c, the application just needs to expect the esm3 error to pop up and handle it on the application side the same way as for the real test


    ///// nice to know information

    Meanwhile I have improved the data abort masking a bit compared to the code in this post. I am trying to make sure that real errors are not accidentally masked out, so I have at least 2 "keys" per individual masking decision, both of which must match before the data abort may be masked. This makes it more robust against single points of failure and also catches possible real errors while running the tests. A flag could also be set in sl_esm.c when writing that 0x5400 value, then checked here and cleared; that way masking would be allowed only once per test, but it would require modifying sl_esm.c, which may cause more manual work when applying the CSP.

        // Originally the content is F021F_FPAROVR_ADD_INV_PAR|FPAROVR_TEST_EN but inside FIQ interrupt the content is switched
        if( (SL_FLAG_GET((int32)FLASH_ADDRESS_PARITY_SELF_TEST)) && DIAG_CHECKS( F021F_FDCTRL_DMODE_TEST_MODE, 0x5400U ) )
        {   // NOTE: JSI 14.6.2017: in case FIQ is enabled this test causes also data abort
            if( _SL_Get_DataFault_Address() == 0x20000000U ) // make some barrier, not just trusting to SL_FLAG_GET_VALUE
            {
                maskDAbort = TRUE;
            }
        }

    And here is my current masking logic for fault-inject variant (inside DIAG_vEsmAppCallback()  the esm3 error is acked away in case the tEsmFiCb.bFlashPar_2-flag has been cleared there)
        // Originally the content is F021F_FPAROVR_ADD_INV_PAR|FPAROVR_TEST_EN but inside FIQ interrupt the content is switched
        if( (SL_FLAG_GET((int32)FLASH_ADDRESS_PARITY_FAULT_INJECT)) && DIAG_CHECKS( F021F_FDCTRL_DMODE_TEST_MODE, 0x5400U ) )
        {   // NOTE: JSI 14.6.2017: in case FIQ is enabled this test causes also data abort
            if( _SL_Get_DataFault_Address() == 0x20000000U ) // make some barrier, not just trusting to SL_FLAG_GET_VALUE
            {
                // here is the logic for masking
                boolean bFlashAddParCb = tEsmFiCb.bFlashPar_2;

                callbkParam1 = sl_flashWREG->FUNCERRADD;
                callbkParam2 = sl_flashWREG->FEDACSTATUS;
                uint32 u32Error = SET_HIWORD_U32( ESM_GROUP_3 ) | SET_LOWORD_U32( ESM_G3ERR_FMC_UNCORR );
                DIAG_vEsmAppCallback(u32Error, callbkParam1, callbkParam2, callbkParam3);

                if( (bFlashAddParCb) && (!tEsmFiCb.bFlashPar_2) )
                {
                    maskDAbort = TRUE;
                }
            }
        }
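
    The one-time flag idea mentioned in this post (set when 0x5400 is written, checked and cleared in the abort handler) could look roughly like the sketch below. It is host-side and uses plain values in place of SL_FLAG_GET and _SL_Get_DataFault_Address(); the token type and function names are hypothetical and not part of SafeTI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_BAD_ECC1_ADDR 0x20000000u  /* fault address reported in this thread */

typedef struct {
    bool armed;  /* set when the test path writes 0x5400, cleared on first use */
} abort_mask_token_t;  /* hypothetical type, not part of SafeTI */

/* Would be called from the test path at the point where 0x5400 is written. */
static void token_arm(abort_mask_token_t *t)
{
    assert(t != NULL);
    t->armed = true;
}

/* All keys must match: test flag, fault address, and an unconsumed token. */
static bool should_mask_abort(bool test_flag_active, uint32_t fault_addr,
                              abort_mask_token_t *t)
{
    assert(t != NULL);
    if (!test_flag_active || (fault_addr != FLASH_BAD_ECC1_ADDR) || !t->armed) {
        return false;
    }
    t->armed = false;  /* consume: a second abort in the same test is not masked */
    return true;
}
```

    A second data abort during the same test run, or one at a different address, would then fall through to the normal non-masked handling.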

  • Hi Jarkko,

    Comments embedded in your quoted reply below.

    Jarkko Silvasti said:

    Hi,

    Just to clarify: does that mean that, in the current form of the code, the group3 error + abort is a "real and expected" side effect of running this test if FIQ is enabled, and not some kind of coding error on my side?

    The exceptions and interrupt that are seen are, I think, expected based on the current state of the code. What I cannot say is whether this means the test is being run correctly and the exceptions handled correctly. We need to determine the root cause of the exception to decide if it can be avoided or if we keep it as an expected outcome of the test. In my view, this test is testing the Flash Address Parity feature, which is ESM G2 Ch4, and no other ESM error or abort should be triggered, but there may be a logical reason for it which hasn't been identified yet.

    At least I can live with that group3 error if that is real deterministic side effect so I do not require running code from RAM to prevent it from happening or anything else more complex stuff - I can easily see that esm3/abort belongs to the test if that behavior is documented and properly handled in original test. Integrator can then "copy the logic" also for fault inject test since that behavior is expected.

    I don't see anywhere in the thread where you mention which G3 channel flag is set but I am assuming it is channel 7 indicating an uncorrectable Flash error (2-bit or more error). This may be a result of some shared location in flash that can be used for both tests, but even if this is the case, it shouldn't be necessary since the data read doesn't have to have a data error to check Flash address parity.

    I tried to ask here what these diag modes actually do, since I do not understand why this esm3 comes up in this test (I still don't understand it, but if it is a real side effect maybe I do not need to; just knowing that esm3/abort is activated for a reason would be enough for me). The esm3/abort appears to activate when returning from FIQ, as the next line of code is executed, so the C code executed inside the FIQ does not cause that behavior. I am also wondering why the manual 0x5400 write is needed in the FIQ, since there should be the "auto clear" DIAG_TRIG; that trig bit is not set at all in the test, yet the initial esm2 error is still generated, even though TRM 5.6.2 says that nothing should happen if DIAG_TRIG is not set... But most likely I do not need to understand the details of this either, if this esm3/abort is a "confirmed" feature of this test... As long as I need to make rather "heavy" modifications myself, I would like to know the root cause, since if you do not understand the root cause the "fix" may be completely wrong even though it looks to work, and may lead to an even more catastrophic failure...

    https://e2e.ti.com/support/microcontrollers/hercules/f/312/t/603175

    I agree that modification without the proper context/background may lead to either masking a real problem or somehow defeating the intended use of the test. That is why we are here to help spread the understanding of the test. Review of the datasheet and ESM error types can certainly help but is not conclusive. These devices are very complex, and there is a lot of detail to be learned and a lot of documents to review for full understanding. Even then, meaning in the documents is sometimes lost without practical real-life use and examples. I have not yet had a chance to look at your linked posts in detail, but I am planning to review each to make sure we capture issues with our SafeTI Diag Lib (SDL), so that we can include as many fixes as possible in the next release, which we are trying to pull in on the schedule as much as possible to meet your project requirement.

    //////// about same test's fault-inject variant
    Here another user has exactly the same esm3 problem (there it is in the fault-inject variant of the same test; I have encountered the same, but since I found a way to "fix" the real test, the same steps apply to the fault-inject variant). I am just wondering why it is said there that the test runs without problems, given the esm3 'side effect problem' you describe above. Or does this test work differently on different CPU variants? At minimum the 0x5400 modification is needed to prevent a continuous FIQ loop, and depending on where the 0x5400 is set, a manual SR1 ack may also be needed just to reach the state where the esm3/abort pops up. At least in my code the "secondary FIQ" is interpreted as a real fault, since the "one-time consumable FI flag" is no longer active (it was cleared in the first round), and the code ends up in a while(1) loop.
    https://e2e.ti.com/support/microcontrollers/hercules/f/312/t/608591

    Here is my experience with the fault-inject version. Basically you get it to work if you set the same 0x5400 for that test in sl_esm.c (or do it in the application-side ESM error handler, but then you also need to manually ack the pending FIQ away from SR[1], since it has re-raised itself; without that manual ack the code jumps straight back into the FIQ after exiting it once). I think sl_esm.c is the correct location for the 0x5400 write, since it relates to the test, not to real fault handling; for a real fault there is no need to write 0x5400 on the application side to handle the fault if code execution is meant to continue...
    e2e.ti.com/.../2217778
    - after that 0x5400 is set in sl_esm.c, the application just needs to expect the esm3 error to pop up and handle it on the application side the same way as for the real test

    I am still a bit confused as to the real purpose of the Fault Injection mode. Certainly it can be used to test the error path and handling, provided proper system level notifications are in place, but my original understanding was that it was to be used as a verification tool during the end user's development of their safety system, i.e., it could be used to profile the error response. I am not certain every test needs to have an FI test included; perhaps only the basic elements need to be considered. This, however, is up to system level requirements, and we rely on the experience and judgment of the end application integrator about how to use the provided SW.

    ///// nice to know information

    Meanwhile I have improved the data abort masking a bit compared to the code in this post. I am trying to make sure that real errors are not accidentally masked out, so I have at least 2 "keys" per individual masking decision, both of which must match before the data abort may be masked. This makes it more robust against single points of failure and also catches possible real errors while running the tests. A flag could also be set in sl_esm.c when writing that 0x5400 value, then checked here and cleared; that way masking would be allowed only once per test, but it would require modifying sl_esm.c, which may cause more manual work when applying the CSP.

        // Originally the content is F021F_FPAROVR_ADD_INV_PAR|FPAROVR_TEST_EN but inside FIQ interrupt the content is switched
        if( (SL_FLAG_GET((int32)FLASH_ADDRESS_PARITY_SELF_TEST)) && DIAG_CHECKS( F021F_FDCTRL_DMODE_TEST_MODE, 0x5400U ) )
        {   // NOTE: JSI 14.6.2017: in case FIQ is enabled this test causes also data abort
            if( _SL_Get_DataFault_Address() == 0x20000000U ) // make some barrier, not just trusting to SL_FLAG_GET_VALUE
            {
                maskDAbort = TRUE;
            }
        }

    And here is my current masking logic for fault-inject variant (inside DIAG_vEsmAppCallback()  the esm3 error is acked away in case the tEsmFiCb.bFlashPar_2-flag has been cleared there)
        // Originally the content is F021F_FPAROVR_ADD_INV_PAR|FPAROVR_TEST_EN but inside FIQ interrupt the content is switched
        if( (SL_FLAG_GET((int32)FLASH_ADDRESS_PARITY_FAULT_INJECT)) && DIAG_CHECKS( F021F_FDCTRL_DMODE_TEST_MODE, 0x5400U ) )
        {   // NOTE: JSI 14.6.2017: in case FIQ is enabled this test causes also data abort
            if( _SL_Get_DataFault_Address() == 0x20000000U ) // make some barrier, not just trusting to SL_FLAG_GET_VALUE
            {
                // here is the logic for masking
                boolean bFlashAddParCb = tEsmFiCb.bFlashPar_2;

                callbkParam1 = sl_flashWREG->FUNCERRADD;
                callbkParam2 = sl_flashWREG->FEDACSTATUS;
                uint32 u32Error = SET_HIWORD_U32( ESM_GROUP_3 ) | SET_LOWORD_U32( ESM_G3ERR_FMC_UNCORR );
                DIAG_vEsmAppCallback(u32Error, callbkParam1, callbkParam2, callbkParam3);

                if( (bFlashAddParCb) && (!tEsmFiCb.bFlashPar_2) )
                {
                    maskDAbort = TRUE;
                }
            }
        }

    In general, I think the exception handlers are included as part of the example/demo project as simple examples. They would not be covered by the CSP since the CSPs are intended for the Library alone. Again, this is my understanding and I need to check this to be certain.
    While on this topic, do you plan to use the full CSP including the LDRA tools? or do you plan to use just the reports/collateral evidence? There is a discussion to offer the reports for free so that customers that use the SDL without changes (or to apply only to those parts not changed by the customer). Or as an alternative, customers could use the code and compile as long as they use the same compiler version and options so that the TI generated reports would still apply.
    Also, to clarify, you are using v2.3.1 of the SafeTI diagnostic library, correct?

  • Hi,

    I was on vacation, that's why the late reply.

    Yes, it raises ESM channel bit together with abort which is what you already guessed, the number 7 'ESM_G3ERR_FMC_UNCORR'.


    Yes, we are currently using SafeTI version 2.3.1.


    Since I have heard that new release is coming, I'll guess that it will reveal then the status of this test? I wouldn't mind in case you can tell here that will it in the future only signal that expected Group2 error or also this group3 error and what is/was the reason for that group3 error, why it comes/came - just to improve my own understanding of the device, as you can see from my abort handler code the offending address is 0x20000000 which is mirrored flash and according to this should contain 1 bit error, #define flashBadECC1 0x20000000u?


    I'll guess that this goes a 'bit' off-topic but trying to answer what you asked/mentioned/wondered:

    I think the fault insertion tests corresponding to the actual SafeTI tests should be run at least once during product development, possibly behind conditional compile flags, in order to verify that the system is capable of reacting to failures. I chose to always run them at runtime together with the normal tests, since the work needed was quite minimal when the test harness was built with that in mind (and it is much more cost effective than running them manually). But as you said, it is the integrator's problem to determine when and how to run them; some may not run them at all, and that could still be as valid an approach as any other :)... I have a habit of doing a bit "extra" just to be sure, rather than spending almost the same time trying to explain why it isn't needed.


    What comes to the CSP, our original intention was to use this SafeTI library code 'as is' and then use the documents just to minimize our certification related work&time (and possibly run the LDRA tests if needed), but in current form that direct approach looks to be impossible since some (not many) manual changes has been required to be made to SafeTI code so some extra manual work would any case needed. Lets see if hopefully new coming release would change that and code can be used as is. Even better if documentation to do that "certification" is received for free since if understood correctly for 2.3.1 you still have to buy that CSP in order to get documentation even though LDRA would not be used?

    Assuming from your comment that the incoming release will not yet contain the documentation (for free), and that if it does, it will state the library settings only for CCStudio: can one compile the library with CCStudio and then use it in IAR? Using the unmodified library would of course be the simplest way (and preferred in any case), but let's see if that is possible. Bundled with free documentation this would also be cost and time effective, and if it were possible it would most likely be our selection.

    This whole "how to certify" concept is still a bit open/foggy for us, since previously we have only used our own code, ready-certified code, and 3rd party code without any pre-made safety material. This kind of 3rd party code with a CSP is new for us, which is why I can't say how we will finally "certify" it. We haven't thought about it that much yet, since the focus has been on getting the required SafeTI tests to execute and the whole product (ours, not SafeTI) to work; there is nothing to certify before that... Especially foggy is what to do if we need to make even a 1-line change to the CSP code: just use the documentation for the rest of the code and, for example, use a review to justify the change, and/or write our own or LDRA-based (if that package is bought) unit test for the changed part? We are also planning to make an RM44 based product, so if some synergy can be gained from a particular certification process, that is most likely the best way to proceed. We also have to keep in mind that an existing product may receive an update which takes new SafeTI items (tests) onboard...

    If you have suggestion into this area I am willing to listen though not making money related decisions :).
  • Hello Jarkko,

    Welcome back. I hope that you had a wonderful vacation.

    Jarkko Silvasti said:
    Since I have heard that new release is coming, I'll guess that it will reveal then the status of this test? I wouldn't mind in case you can tell here that will it in the future only signal that expected Group2 error or also this group3 error and what is/was the reason for that group3 error, why it comes/came - just to improve my own understanding of the device, as you can see from my abort handler code the offending address is 0x20000000 which is mirrored flash and according to this should contain 1 bit error, #define flashBadECC1 0x20000000u?

    I can't speak to the specific fix that will be implemented, since this is handled by a different group. I can state that your concerns, and the fact that the test is triggering the incorrect fault and the abort, have been documented in the bug report. Whether the address 0x20000000 has a single bit or an uncorrectable error depends on the configuration of Diagnostic Mode 7. It is my understanding that this mode allows a value to be XORed into the ECC byte in order to create the specified error. If the value chosen in the setup of the XOR is incorrect, it may very well result in more than a single bit error.
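
    To illustrate the point about the XOR value, here is a minimal sketch. It assumes that the XOR mask is applied to the 8 ECC check bits only and that the flash ECC behaves as a SECDED code, so the number of set bits in the mask is the number of injected bit errors. Neither assumption has been verified against the device in this thread, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    ECC_XOR_NO_FLIP,     /* mask 0x00: ECC unchanged */
    ECC_XOR_SINGLE_BIT,  /* one bit flipped: correctable under SECDED */
    ECC_XOR_MULTI_BIT    /* two or more bits flipped: not correctable */
} ecc_xor_class_t;       /* hypothetical type for this sketch */

/* Classify an XOR mask by the number of ECC bits it would flip. */
static ecc_xor_class_t classify_ecc_xor(uint8_t xor_mask, unsigned *flipped_bits)
{
    unsigned count = 0u;
    assert(flipped_bits != NULL);
    for (uint8_t m = xor_mask; m != 0u; m = (uint8_t)(m & (uint8_t)(m - 1u))) {
        ++count;  /* each iteration clears the lowest set bit */
    }
    *flipped_bits = count;
    if (count == 0u) {
        return ECC_XOR_NO_FLIP;
    }
    return (count == 1u) ? ECC_XOR_SINGLE_BIT : ECC_XOR_MULTI_BIT;
}
```

    Under these assumptions a mask such as 0x01 would inject a single bit error, while 0x03 would already produce an uncorrectable one.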

    Jarkko Silvasti said:
    I think the fault insertion tests corresponding to the actual SafeTI tests should be run at least once during product development, possibly behind conditional compile flags, in order to verify that the system is capable of reacting to failures. I chose to always run them at runtime together with the normal tests, since the work needed was quite minimal when the test harness was built with that in mind (and it is much more cost effective than running them manually). But as you said, it is the integrator's problem to determine when and how to run them; some may not run them at all, and that could still be as valid an approach as any other :)... I have a habit of doing a bit "extra" just to be sure, rather than spending almost the same time trying to explain why it isn't needed.

    I agree with this sentiment. I can see the point of view that fault insertion run during the operation of the system can add some value, depending on how long the system up time is. These insertion tests will certainly validate that the error path and the diagnostic fault detection are working, so this is ideal as a latent fault diagnostic, but it may also overlap with some of the additional SW test of function diagnostics.

    Jarkko Silvasti said:
    What comes to the CSP, our original intention was to use this SafeTI library code 'as is' and then use the documents just to minimize our certification related work&time (and possibly run the LDRA tests if needed), but in current form that direct approach looks to be impossible since some (not many) manual changes has been required to be made to SafeTI code so some extra manual work would any case needed. Lets see if hopefully new coming release would change that and code can be used as is. Even better if documentation to do that "certification" is received for free since if understood correctly for 2.3.1 you still have to buy that CSP in order to get documentation even though LDRA would not be used?

    Assuming from your comment that the upcoming release will not yet contain documentation (for free), and that if it does, it will state library settings only for CCStudio: can one compile the library with CCStudio and then use it in IAR? Using the unmodified library would of course be the simplest way (and preferred in any case), but let's see if that is possible. Bundled with free documentation, this would also be cost and time effective, and if it is possible it would most likely be our choice.

    This whole "how to certify" concept is still a bit open/foggy for us, since previously we have only used our own code, ready-certified code, and 3rd party code that did not come with any pre-made safety material. This kind of 3rd party code with a CSP is new to us, and that is why I can't say how we will finally "certify" it. We haven't thought about it much yet, since the focus has been on getting the required SafeTI tests to execute and the whole product (ours, not SafeTI) to work, since there is nothing to certify before that... Especially foggy is what to do if we need to make even a 1-line change to the CSP code: just use the documentation for the rest of the code and, for example, use a review to justify the change, and/or write our own or an LDRA-based (if that package is bought) unit test for the changed part? We are also planning to make an RM44-based product, so if some synergy can be gained from a certain certification process then that is most likely the best way to proceed. We also have to keep in mind that an existing product may receive an update that takes new SafeTI items (tests) onboard...

    Unfortunately, I am not in the main decision path on the CSP and how these packages are offered and at what costs. I do understand the concept behind them and can only offer that the cost is really associated with the TAU (test automation unit) which includes the test suite and the LDRA limited use license. There has been discussion of releasing the CSP reports/paper documents that document the testing as done by TI with a specific compiler and specific build options. Therefore, if you were to either use the binaries provided or were to compile the source as is without any modifications with the same compiler and options to create the exact same binaries tested and released by TI, the CSP static and dynamic reports together with the TI certified software development process would usually be sufficient for component qualification in a customer's safety system.

    Since, I believe, you are using a different certified compiler than the TI compiler that was qualified and used to build the code tested by TI, the reports would not apply even if you didn't modify the code and you would then need to use the TAU tool to regression test the generated binaries.

    In the end, it is really about compliance to the safety standards. What we have tried to recreate is a compliant process (demonstrated by a certification of the process) together with evidence generated by that process just as our customers would do in their development of their safety system. The gap is often in the details such as we have discussed when bugs are found or the customer's build environment is different from ours. To solve this problem we offer the TAU/LDRA solution. 

  • Chuck Davenport said:

    Since, I believe, you are using a different certified compiler than the TI compiler that was qualified and used to build the code tested by TI, the reports would not apply even if you didn't modify the code and you would then need to use the TAU tool to regression test the generated binaries.


    Just to continue this "off-topic" a bit: please correct me if I am wrong, but I have assumed that it should be possible to compile a library with the TI tools and then use that library in IAR? This way we should be able to use exactly the specified compiler version and settings (even if we needed to make a code change to, say, only 1 function, the rest of the unmodified code would still be covered directly by your testing documents).

  • Hello Jarkko,

    Jarkko Silvasti said:
    Just to continue this "off-topic" a bit: please correct me if I am wrong, but I have assumed that it should be possible to compile a library with the TI tools and then use that library in IAR? This way we should be able to use exactly the specified compiler version and settings (even if we needed to make a code change to, say, only 1 function, the rest of the unmodified code would still be covered directly by your testing documents).

    Absolutely, yes. Also, if you approach the project this way and have only a few places where you modified the code, you could develop your own tests and plans for those changes, as you would for any other code you have written for your project. That is, follow your own safety-compliant software development flow, which could help you avoid the expense of purchasing the TAU tool from LDRA and our software team.