SYS/BIOS and cache

Other Parts Discussed in Thread: SYSBIOS

Hello experts,

My C6745 application is growing, as they tend to do, and will soon no longer fit in L2 RAM, so I need a plan B. My system has:

- 256 MB of SDRAM on EMIFB
- 32G of flash (NAND flash with a PATA interface) on EMIFA
- I2C boot from EEPROM

So it would be fairly easy to write a secondary bootloader that boots from the I2C EEPROM, fetches the application from flash and copies it to SDRAM; no problems there, just work. The L1 RAMs can be configured as caches, and L2 as cache plus heap/stack. My concern is cache handling: I don't have the resources to study caching in detail, and I would be happy to let an optimized handler such as SYS/BIOS do the dirty work, most likely much more efficiently than my own "cachier" would.

So I would like your opinion: should I port my C++ code to run under SYS/BIOS, or try to optimize the cache myself? The time-critical functions of the application would fit in L1/L2, but I would still like to let the OS handle them through caching.
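As a rough illustration of the secondary-bootloader step described above (fetch the application from flash, copy it to SDRAM, then make sure the caches hold no stale contents), here is a minimal C sketch under stated assumptions. The flash driver and the cache-maintenance step are passed in as function pointers because the real drivers are not shown in the thread; `flash_read_fn`, `cache_sync_fn`, `load_image`, `CHUNK_SIZE` and the `fake_flash` host stand-in are all hypothetical names. If caching is enabled while the bootloader runs, the sync hook would presumably call cache routines such as StarterWare's `CacheWBInvAll()` and `CacheInvL1pAll()` before jumping to the copied code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flash-driver hook: read len bytes from byte offset 'off'
 * of the flash into dst. Returns 0 on success, nonzero on error. */
typedef int (*flash_read_fn)(uint32_t off, void *dst, size_t len);

/* Hypothetical cache-maintenance hook run after the copy. On the target it
 * might write back/invalidate the data caches and invalidate L1P
 * (e.g. StarterWare CacheWBInvAll() and CacheInvL1pAll()) so the CPU does
 * not execute stale instructions from the freshly written SDRAM. */
typedef void (*cache_sync_fn)(void);

#define CHUNK_SIZE 4096u /* assumption: a transfer size the driver accepts */

/* Copy img_len bytes of the application image from flash offset img_off
 * to dst (e.g. SDRAM on EMIFB), then sync the caches if a hook is given.
 * Returns 0 on success, -1 if a flash read fails. */
int load_image(flash_read_fn rd, cache_sync_fn sync,
               uint32_t img_off, size_t img_len, uint8_t *dst)
{
    size_t done = 0;

    assert(rd != NULL && dst != NULL);

    while (done < img_len) {
        size_t n = img_len - done;
        if (n > CHUNK_SIZE)
            n = CHUNK_SIZE;
        if (rd(img_off + (uint32_t)done, dst + done, n) != 0)
            return -1;
        done += n;
    }
    if (sync != NULL)
        sync();
    return 0;
}

/* Host-side stand-in for the flash, used only to exercise load_image(). */
static uint8_t fake_flash[10000];

int fake_flash_read(uint32_t off, void *dst, size_t len)
{
    if ((size_t)off + len > sizeof fake_flash)
        return -1; /* read past the end of the simulated device */
    memcpy(dst, fake_flash + off, len);
    return 0;
}
```

Copying in fixed-size chunks keeps the bootloader's own buffering small; the chunk size and the error convention are assumptions, not taken from any TI driver.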

BR, Risto

  • Hi Risto,

    Thanks for your post.

    Porting C++ code to the BIOS environment would be more complex; optimizing the cache configuration yourself would be a better idea.

    Please use the C674x DSP Cache User's Guide to optimize cache performance:

    http://www.ti.com/lit/ug/sprug82a/sprug82a.pdf

    In that document, section 2 covers configuring the L1 and L2 caches and cacheability, and section 3 covers optimizing cache performance.

    You may also want to read the E2E thread below: you can control whether external memory address ranges are cacheable or non-cacheable through the MAR bits. The memory attribute registers (MARs) are documented in the TMS320C674x DSP Megamodule Reference Guide (SPRUFK5); please check it.

    http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/112/t/311620.aspx
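The MAR mechanism mentioned above comes down to simple address arithmetic: each MAR register covers one 16 MB block of the address space, so the register index is the address shifted right by 24. The sketch below assumes the SPRUFK5 layout (MAR0 at 0x01848000, registers 4 bytes apart, bit 0 = PC making the region cacheable; please verify against the guide) and computes which MARs cover a given range. For example, 256 MB of SDRAM at 0xC0000000 maps to MAR192 through MAR207. `mar_range`, `mar_range_for` and `mar_address` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from SPRUFK5: MARn controls addresses n*16MB .. (n+1)*16MB-1,
 * MAR0 is at 0x01848000 and the registers are 4 bytes apart. */
#define MAR_BASE         0x01848000u
#define MAR_REGION_SHIFT 24u /* 16 MB = 1 << 24 */

typedef struct {
    unsigned first; /* index of the first MAR covering the range */
    unsigned last;  /* index of the last MAR covering the range  */
} mar_range;

/* Return the MAR indices covering [start, start + size). size must be > 0. */
mar_range mar_range_for(uint32_t start, uint32_t size)
{
    assert(size > 0);
    uint64_t end = (uint64_t)start + size - 1; /* inclusive end, no overflow */
    mar_range r = { start >> MAR_REGION_SHIFT,
                    (unsigned)(end >> MAR_REGION_SHIFT) };
    return r;
}

/* Address of the memory-mapped MARn register. On the target, setting its
 * bit 0 (PC) would make that 16 MB region cacheable, e.g.
 *     *(volatile uint32_t *)mar_address(n) |= 1u;
 * (not executed here, since it only makes sense on the DSP). */
uint32_t mar_address(unsigned n)
{
    return MAR_BASE + 4u * n;
}
```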

    I hope this helps!

    Thanks & regards,
    Sivaraj K

    ------------------------------------------------------------------------------------------------------- 
    Please click the Verify Answer button on this post if it answers your question.
    --------------------------------------------------------------------------------------------------------
  • Hi Sivaraj,

    I would like to continue this discussion if you don't mind.

    Since our last discussion I have been writing the code in C++, including C++98 library facilities such as <deque>, <array> and <string>. It was then time to set up the cache, which until now had only been configured in the GEL file (L1PMODE_32K only). I'm mainly using "C6748_StarterWare_1_20_04_01" for low-level interfacing, and I set up the cache with:

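    CacheEnableMAR(0xC0000000, 0x10000000);   /* make the 256 MB SDRAM region cacheable */
    CacheWBInvAll();                          /* write back and invalidate the L1D/L2 caches */
    CacheInvL1pAll();                         /* invalidate L1P */
    CacheEnable(L1PCFG_L1PMODE_32K | L1DCFG_L1DMODE_32K | L2CFG_L2MODE_64K);  /* 32K L1P, 32K L1D, 64K L2 cache */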

    Here 0xC0000000 is the base address of the external SDRAM, where the .text and .sysmem sections are located. The linker options are --stack_size=32767 --heap_size=0xc00000, which is more than enough for the reduced code I'm testing with. So both code and data are in SDRAM.

    My test code has two parts. The first fills a <deque> of <string> in one loop and empties it in another. The second just copies static 2D matrices of long into each other, with sizes up to [500][500].

    Observations:

    1. With caching of the external SDRAM off (CacheEnableMAR() not called), both test parts run OK and loop forever.

    2. With caching of the external SDRAM on, the second part is OK, but the first one kills the process completely after a few loop iterations.

    Even though I cannot see any connection between dynamic vs. static data, caching and the crash, do you have any suggestions for further testing, or coding restrictions to follow? I suspected the SDRAM interface parameters, but I guess the accesses look similar from the interface's point of view whether or not the cache is used?

    BR, Risto

  • Hi,

    I did some additional testing and found that the instability with cache enabled does not depend on dynamic vs. static allocation but on the time between accesses: if I run the fastest possible loop accessing SDRAM through the cache, the system gets stuck; if I add some delay between accesses, it stays stable, regardless of whether a <deque> or a static matrix is used. With cache, the access time drops to about 1/3. Below is example code that brings the C6745 and external SDRAM to their knees with cache enabled. m2 is in SDRAM, the code is in L2, and MAT=500:

    for (i = 0; i < MAT; i++)
        for (j = 0; j < MAT; j++)
            m2[i][j] += 666;   /* read-modify-write of every element in cached SDRAM */
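One way to follow up on the request for further testing is to make the loop self-checking, so that silent data corruption is reported rather than only a hang being noticed. The sketch below is a hypothetical, host-compilable version: `check_matrix` writes a position-dependent pattern, applies the same `+= 666` update as the loop above, and reads every element back. On the C6745 the matrix would sit in cached SDRAM as described in the thread; here `m2` is an ordinary static array, and the function name and return convention are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MAT 500

/* Stand-in for the matrix from the thread. On the target it would be placed
 * in cached SDRAM by the linker; here it is a plain static array. */
static long m2[MAT][MAT];

/* Write a known pattern, apply "+= delta" to every element, then verify.
 * Returns the flat index (i * MAT + j) of the first wrong element, or -1
 * if every element matches. */
long check_matrix(long (*m)[MAT], long delta)
{
    size_t i, j;

    assert(m != NULL);

    for (i = 0; i < MAT; i++)
        for (j = 0; j < MAT; j++)
            m[i][j] = (long)(i * MAT + j);

    for (i = 0; i < MAT; i++)
        for (j = 0; j < MAT; j++)
            m[i][j] += delta;

    for (i = 0; i < MAT; i++)
        for (j = 0; j < MAT; j++)
            if (m[i][j] != (long)(i * MAT + j) + delta)
                return (long)(i * MAT + j);

    return -1;
}
```

Running such a check with the MAR bit set and cleared, and with and without delays between passes, could help separate an SDRAM/EMIFB timing problem from a software bug.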

    Risto

  • Hi Risto,

    Thanks for your update.

    The behavior you have observed is common and expected, since it depends on the stall conditions explained in section 3.1.1 of the DSP Cache User's Guide:

    http://www.ti.com/lit/ug/sprug82a/sprug82a.pdf

    Basically, when data must first be fetched from external memory, the number of stall cycles depends on the particular device and the type of external memory. It also depends on access priorities, which are governed by the bandwidth management settings; please refer to section 6.2.1 of the C674x Megamodule Reference Guide:

    http://www.ti.com/lit/ug/sprufk5a/sprufk5a.pdf

    Thanks & regards,

    Sivaraj K

  • Thank you Sivaraj,

    Makes sense. As I understand it, with SYS/BIOS the heap can be split into two portions. Can this be done without SYS/BIOS with reasonable effort?

    Risto

  • Hi Risto,

    Thanks for your update.

    To learn more about SYS/BIOS and find answers to more advanced questions, there are training videos and workshops; see the link below to get started quickly:

    http://processors.wiki.ti.com/index.php/Category:SYSBIOS#Get_started_quickly_and_learn_as_you_go

    For questions about memory placement with SYS/BIOS applications, please check below:

    http://processors.wiki.ti.com/index.php/SYS/BIOS_FAQs#PlacingCodeAnchor

     For questions about debugging SYS/BIOS applications, please see:

    http://processors.wiki.ti.com/index.php/SYS/BIOS_FAQs#CoreDumpAnchor

    Thanks & regards,

    Sivaraj K

  • Risto,

    The standard C Run-Time Support library implements a heap for you to use through malloc/free.

    SYS/BIOS extends this to allow you to have multiple heaps in different regions of your memory map.
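Without SYS/BIOS, a second heap in a different memory region can be approximated by managing that region yourself. The sketch below is a hypothetical minimal bump ("arena") allocator in C, not an RTS or SYS/BIOS API: each `arena` hands out aligned blocks from one memory region and can only be reset as a whole. Two instances, one backed by memory in L2 and one in SDRAM, would give a "fast" and a "slow" heap. The backing arrays `fast_mem` and `slow_mem` are host stand-ins; their placement on the target is assumed to be done with `#pragma DATA_SECTION` and the linker command file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One independent heap over a caller-supplied memory block (hypothetical). */
typedef struct {
    uint8_t *base; /* start of the region            */
    size_t   size; /* total bytes in the region      */
    size_t   used; /* bytes handed out so far (<= size) */
} arena;

void arena_init(arena *a, void *mem, size_t size)
{
    assert(a != NULL && mem != NULL);
    a->base = (uint8_t *)mem;
    a->size = size;
    a->used = 0;
}

/* Allocate n bytes aligned to 'align' (a power of two), or return NULL. */
void *arena_alloc(arena *a, size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t p = (uintptr_t)(a->base + a->used);
    uintptr_t aligned = (p + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t pad = (size_t)(aligned - p);

    /* Both checks avoid unsigned underflow because used <= size. */
    if (pad > a->size - a->used || n > a->size - a->used - pad)
        return NULL;
    a->used += pad + n;
    return (void *)aligned;
}

/* Release everything in the arena at once. */
void arena_reset(arena *a)
{
    a->used = 0;
}

/* Backing stores: on the TI compiler these would be placed into sections
 * mapped to L2 RAM and SDRAM; on a host they are ordinary arrays. */
static uint8_t fast_mem[4096];
static uint8_t slow_mem[65536];
```

A bump allocator was chosen for brevity: it suits data allocated at startup and released all at once, but unlike malloc/free it cannot free individual blocks.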

    If you need a feature provided by SYS/BIOS, then use SYS/BIOS. That is why it exists: to make your job easier.

    You do not have to use every feature of SYS/BIOS, and features you choose not to include in your SYS/BIOS build cost you no memory. The training material Sivaraj pointed to will help you learn how to use SYS/BIOS quickly.

    Regards,
    RandyP

  • Randy, Sivaraj,

    Thank you both. My original question was about porting our C++ code to SYS/BIOS, and the reply was that it might be complex. Therefore we skipped SYS/BIOS (maybe also because I'm not a great fan of OSes in embedded applications) and started to think about alternatives. The STL looked good, but it has some disadvantages, such as a poor allocator, which would significantly hurt the performance of the <deque> and <vector> containers we would like to use. Using the cache also seemed a bit tricky, as the earlier messages show. So what to do?

    We will write a custom allocator that does not use the heap, so we can place the containers wherever we want. The "fast" and "slow" code can be directed to the right sections (L2 or SDRAM) with pragmas, or by choosing between dynamic and static objects, global and local variables, and so on. Something like that.
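The pragma-based placement mentioned above could look like the following sketch. It uses the TI C6000 compiler's `DATA_SECTION` and `CODE_SECTION` pragmas in their C form (symbol name, then section name); in C++ the compiler takes only the section name and applies it to the next declaration, so please check the compiler user's guide. The section names `.fast_text` and `.fast_data`, and the functions `fast_sum` and `fast_demo`, are hypothetical, and the linker command file would still need to map those sections to L2. The pragmas are guarded by `__TI_COMPILER_VERSION__` so the same file also builds on a host.

```c
#include <assert.h>
#include <stddef.h>

#define FAST_N 256

/* Hypothetical sections; the linker command file would map them to L2 RAM,
 * while everything else stays in the default (SDRAM) sections. In C the
 * pragma names the symbol and must precede its definition. */
#ifdef __TI_COMPILER_VERSION__
#pragma DATA_SECTION(fast_buf, ".fast_data")
#pragma CODE_SECTION(fast_sum, ".fast_text")
#endif

static long fast_buf[FAST_N];

/* Time-critical routine intended to run from L2. */
long fast_sum(const long *x, size_t n)
{
    long s = 0;
    size_t i;

    assert(x != NULL);
    for (i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Fill the fast buffer with 0..FAST_N-1 and sum it. */
long fast_demo(void)
{
    size_t i;

    for (i = 0; i < FAST_N; i++)
        fast_buf[i] = (long)i;
    return fast_sum(fast_buf, FAST_N);
}
```

With these definitions, fast_demo() returns 32640, the sum of 0 through 255.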

    BR, Risto