C6678 cache problem

Hi,

I am using the cache and have some questions.

My configuration: 32 KB L1D cache and 32 KB L2 cache.

I want to know how coherence works between the L1D and L2 caches:

1. When I use CACHE_wbInvAllL2, will it affect the L1D cache?

2. If I use CACHE_wbInvAllL1d, will it update the L2 cache? What will the L2 cache do?

I have read the cache and CorePac documents, but it is still not clear. Can anyone help me? Thanks.

 

Best regards

Si

 

  • I am not sure whether, if I use CACHE_wbInvAllL1d(), all the dirty data in L1D is written back to physical memory, or whether I also need to call CACHE_wbInvAllL2().

    Thanks

  • Hi Si Chang,

    There is coherence between L2 and L1. Because the L1 cache is usually smaller than the L2 cache, the L2 cache may hold more data/program content than the L1 cache.

    Thanks,

    HR

  • Si Cheng,

    The C66x DSP CorePac User Guide explains the cache operation and the answers to your questions, if I understand your questions correctly. Please refer especially to sprugw0 sections 3.3.5 (L1D cache coherence operations) and 4.3.6 (L2 cache coherence operations). Two tables from those, for block coherence operations, are shown below:

    If you have additional questions, please continue the thread.

    Regards,
    RandyP

  • hi,RandyP:

    Thanks for your reply.

    I  have read the table above.

    1. For Table 3-5: for an L1D writeback, the effect on L1D is given as "updated data written back to L2/external". I am not clear what "back to L2/external" means. Does L2 here mean just L2 SRAM, or both L2 SRAM and L2 cache?

    2. For Table 4-7: for an L2 writeback, if the data is cached in the L2 cache, will the dirty data in the L1D cache first be updated in the L2 cache and then written back from the L2 cache to the actual physical memory? And if it is not cached in the L2 cache, will it be written directly to L2 RAM/external?

     

    In my project, I set the L2 cache to 32 KB. Core 1 processes data that is larger than the L1D and L2 cache sizes. When the processing is finished, I am not sure which cache operations I need to perform. When the data size is larger than the L1D and L2 cache sizes, I think I need to use the global cache operations. Is that right?

     

    best regards,

    Si

  • Si,

    The Replacement and Allocation sections of the CorePac User Guide explain how reads and writes cause allocation or eviction, and what happens with some of these items. The Coherence sections discuss what happens with the two caches when evictions occur.

    When you set L2 cache to the same size or smaller than L1D, I am not sure that L2 cache will give you much benefit. L1D cache allocation does not have to be a pure subset of L2 cache allocation; L1D is not inclusive in L2. But if your algorithm is such that everything that is cached in L2 cache is also cached in L1D cache, then L2 cache may not be helping with your performance. The only way to determine this would be to do benchmarking of your algorithm with L2 cache set to 0KB and to 32KB for comparison. I would be interested to hear that result, if you choose to try it.

    Cache block operations will always work correctly even if the region of memory is larger than the L1D and L2 cache sizes. It is possible that the global operations will write back or invalidate memory contents other than the data in your large array, and that could be different from what you want. But the global operation could easily be faster than the block operation in this case, since the block operation would have to test every tag in the cache before operating on it, while the global operation just operates on every tag in the cache.
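
    The difference between a block operation and a global operation described above can be illustrated with a toy model. This is only a sketch of the description in this post, not of the actual C66x hardware: the structures, the 8-line cache, the 128-byte line size, and the names `toy_wbinv_block` and `toy_wbinv_all` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES  8u     /* toy cache size, chosen only for illustration */
#define LINE_BYTES 128u   /* assumed line size for this toy model */

struct toy_line {
    uint32_t addr;   /* line-aligned address held by this line */
    bool     valid;
    bool     dirty;
};

struct toy_cache {
    struct toy_line line[NUM_LINES];
    unsigned writebacks;  /* number of dirty lines written back so far */
};

/* Write back one line if it is dirty, then invalidate it. */
static void wbinv_line(struct toy_cache *c, struct toy_line *l)
{
    if (l->dirty) {
        c->writebacks++;
    }
    l->dirty = false;
    l->valid = false;
}

/* Block operation: test every tag, act only on lines that overlap
 * [base, base + bytes). Returns the number of lines acted on. */
static unsigned toy_wbinv_block(struct toy_cache *c, uint32_t base, uint32_t bytes)
{
    unsigned touched = 0;
    assert(c != NULL);
    for (size_t i = 0; i < NUM_LINES; i++) {
        struct toy_line *l = &c->line[i];
        uint64_t line_end  = (uint64_t)l->addr + LINE_BYTES;
        uint64_t block_end = (uint64_t)base + bytes;
        if (l->valid && line_end > base && l->addr < block_end) {
            wbinv_line(c, l);
            touched++;
        }
    }
    return touched;
}

/* Global operation: act on every valid line, whatever its address. */
static unsigned toy_wbinv_all(struct toy_cache *c)
{
    unsigned touched = 0;
    assert(c != NULL);
    for (size_t i = 0; i < NUM_LINES; i++) {
        if (c->line[i].valid) {
            wbinv_line(c, &c->line[i]);
            touched++;
        }
    }
    return touched;
}
```

    In this model, a block operation over the array leaves a dirty line at an unrelated address untouched, while the global operation also writes back and invalidates that unrelated line.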

    Regards,
    RandyP

  • RandyP,

    Thanks very much for your advice.

    I tried it and found that setting the L2 cache size to 32 KB gives the same performance as setting the L2 cache size to 0. But when I set the L2 cache to 32 KB, my algorithm does not run correctly.

    I have read the CorePac User Guide. Section 4.3.8.3, Policy Relative to L1D Victims, says:

    “L1D victim writebacks do not trigger line allocations in L2. L1D victims are written directly to external memory if they miss L2.

    L1D victim writebacks also do not update L2’s LRU if they hit in L2. They do update L2’s dirty status as needed.”

    I am not clear about the statement "They do update L2’s dirty status as needed". Does it only update L2’s dirty status, or does it also update the dirty data in the L2 cache?

     

    You said "cache block operations will always work correctly even if the region of memory is larger than the L1D and L2 cache sizes".

    I know the L1D Invalidate Word Count Register (L1DIWC) has a 16-bit field, L1DIWC, described as "Word count for block invalidation". Does that mean the size range is 0~65536 words?

    If my data is 0x100000 bytes, then the 16-bit L1DIWC field = 0, because the low 16 bits of 0x100000 are zero. So the cache operation I use has no effect!

     

    Regards,

    Si

  • Si Cheng,

    si cheng said:
    "They do update L2’s dirty status as needed". Does it only update L2’s dirty status, or does it also update the dirty data in the L2 cache?

    "writeback" means data is being written back. The details provided are to help you understand the additional details of the operation.

    si cheng said:

    I know the L1D Invalidate Word Count Register (L1DIWC) has a 16-bit field, L1DIWC, described as "Word count for block invalidation". Does that mean the size range is 0~65536 words?

    A 16-bit field can contain the range of values from 0 to 65535, not 65536. The range of word count is 0-65535, or 0x0000-0xFFFF.

    si cheng said:
    If my data is 0x100000 bytes, then the 16-bit L1DIWC field = 0, because the low 16 bits of 0x100000 are zero. So the cache operation I use has no effect!

    You must use a word count, not a byte count for the L1DIWC field. There are 4 bytes in a word.

    If your word count is greater than 65535, then you will need to use two or more block commands.
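
    As a worked example of the two points above: 0x100000 bytes is 0x40000 words, which is larger than 65535, and its low 16 bits are also zero, so a single block command cannot cover it. The sketch below splits such a region into commands of at most 0xFFFF words each. It is only an illustration under stated assumptions: `issue_block_commands`, `block_cmd_fn`, `record_cmd`, and `struct cmd_log` are hypothetical names that do not come from the CorePac User Guide or any TI library, and the hook stands in for whatever code actually programs the base-address and word-count registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BYTES_PER_WORD  4u       /* 4 bytes in a word */
#define MAX_WORD_COUNT  0xFFFFu  /* largest value a 16-bit field can hold */

/* Hypothetical hook: issues ONE block coherence command covering
 * `words` 32-bit words starting at `addr`. */
typedef void (*block_cmd_fn)(uintptr_t addr, uint32_t words, void *ctx);

/* Split a region of `bytes` bytes into block commands of at most
 * MAX_WORD_COUNT words each. Returns the number of commands issued. */
static unsigned issue_block_commands(uintptr_t addr, size_t bytes,
                                     block_cmd_fn cmd, void *ctx)
{
    /* Round up so that a trailing partial word is still covered. */
    size_t words_left = (bytes + BYTES_PER_WORD - 1u) / BYTES_PER_WORD;
    unsigned issued = 0;

    assert(cmd != NULL);

    while (words_left > 0) {
        uint32_t chunk = (words_left > MAX_WORD_COUNT)
                             ? MAX_WORD_COUNT
                             : (uint32_t)words_left;
        cmd(addr, chunk, ctx);
        addr       += (uintptr_t)chunk * BYTES_PER_WORD;
        words_left -= chunk;
        issued++;
    }
    return issued;
}

/* Stand-in hook for host-side experiments: records what was requested
 * instead of touching any hardware register. */
struct cmd_log {
    unsigned count;        /* commands issued */
    uint32_t last_words;   /* word count of the most recent command */
    size_t   total_words;  /* sum of all word counts */
};

static void record_cmd(uintptr_t addr, uint32_t words, void *ctx)
{
    struct cmd_log *log = (struct cmd_log *)ctx;
    (void)addr;
    log->count++;
    log->last_words = words;
    log->total_words += words;
}
```

    For a 0x100000-byte buffer this issues five commands: four of 0xFFFF words and a final one of 4 words. The chunk boundaries in this sketch are not cache-line aligned; a smaller, line-aligned chunk size may be preferable in a real application.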

    Regards,
    RandyP

  • Regards,

    I am not clear about your statement that "writeback" means data is being written back. If the victim hits in the L2 cache, is the data written back to the L2 cache and then written back to external memory?

    I just want to know YES or NO.

    Regards,

    Si

  • Si,

    When the document says L2/external, this means L2 SRAM or external RAM.

    si cheng said:
    I am not clear about your statement that "writeback" means data is being written back. If the victim hits in the L2 cache, is the data written back to the L2 cache and then written back to external memory?

    If your question is specific to Section 4.3.8.3 Policy Relative to L1D Victims, which is what is quoted most recently, then the answer is "no". An L1D victim will only go to the L2 cache if it finds a hit in L2 cache.

    This answer specifically applies to L1D victim traffic, which is different than most of your questions in this thread, which had to do with Block Coherence commands.
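
    The L1D victim policy discussed here (Section 4.3.8.3 as quoted earlier in the thread) can be summarized as a small toy model. It is a deliberate simplification with a single L2 line and a single data word, and every name in it (`l2_line_model`, `handle_l1d_victim`, `victim_dest`) is hypothetical; it only restates the quoted rules and is not a description of the hardware implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum victim_dest {
    VICTIM_TO_L2_CACHE_LINE,   /* victim hit in L2 cache */
    VICTIM_TO_BACKING_MEMORY   /* victim missed L2 cache */
};

/* Toy model of one L2 cache line. */
struct l2_line_model {
    bool     valid;
    bool     dirty;
    uint32_t tag;        /* address of the line currently held */
    uint32_t lru_stamp;  /* stand-in for the LRU state */
    uint32_t data;       /* one word standing in for the line contents */
};

/* Apply the quoted policy to one L1D victim writeback. */
static enum victim_dest handle_l1d_victim(struct l2_line_model *l2,
                                          uint32_t victim_addr,
                                          uint32_t victim_data,
                                          uint32_t *backing_word)
{
    assert(l2 != NULL && backing_word != NULL);

    if (l2->valid && l2->tag == victim_addr) {
        /* Hit: the data goes into the L2 cache line and the line is
         * marked dirty. The LRU state is left alone, and nothing is
         * forwarded to the backing memory at this point. */
        l2->data  = victim_data;
        l2->dirty = true;
        return VICTIM_TO_L2_CACHE_LINE;
    }

    /* Miss: no line is allocated in L2; the data goes straight to the
     * backing memory (L2 SRAM or external RAM). */
    *backing_word = victim_data;
    return VICTIM_TO_BACKING_MEMORY;
}
```

    In this model a hit changes only the data and dirty flag of the L2 line, and a miss changes only the backing memory word.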

    If you set L2 cache to 0, then it will be appropriate to use L1D coherence commands. If you set L2 cache to >0KB, then it will be appropriate to use L2 coherence commands, which will also handle all necessary L1D coherence operations.
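
    The rule above can be written as a small selection helper. This is a sketch under stated assumptions: `coherence_hooks`, `wbinv_for_config`, and the counting stubs are hypothetical names, and on the target the two hooks would wrap the corresponding L1D and L2 coherence calls (for example the CACHE_wbInvAllL1d and CACHE_wbInvAllL2 functions mentioned earlier in this thread).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook type: performs one coherence operation. */
typedef void (*coherence_fn)(void *ctx);

struct coherence_hooks {
    coherence_fn l1d_wbinv;  /* wraps an L1D writeback-invalidate */
    coherence_fn l2_wbinv;   /* wraps an L2 writeback-invalidate  */
};

enum coherence_level { USED_L1D_COMMAND, USED_L2_COMMAND };

/* Pick the coherence command from the configured L2 cache size:
 * 0 KB of L2 cache -> L1D command; any L2 cache -> L2 command only,
 * since the L2 command also handles the necessary L1D operations. */
static enum coherence_level wbinv_for_config(uint32_t l2_cache_kbytes,
                                             const struct coherence_hooks *hooks,
                                             void *ctx)
{
    assert(hooks != NULL && hooks->l1d_wbinv != NULL && hooks->l2_wbinv != NULL);

    if (l2_cache_kbytes == 0u) {
        hooks->l1d_wbinv(ctx);
        return USED_L1D_COMMAND;
    }
    hooks->l2_wbinv(ctx);
    return USED_L2_COMMAND;
}

/* Counting stubs for trying the helper on a host machine. */
struct call_counts { unsigned l1d; unsigned l2; };

static void count_l1d(void *ctx) { ((struct call_counts *)ctx)->l1d++; }
static void count_l2(void *ctx)  { ((struct call_counts *)ctx)->l2++; }
```

    With the counting stubs, a configuration of 0 KB calls only the L1D hook and a configuration of 32 KB calls only the L2 hook.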

    If you set L2 cache to 32KB, you may not get any performance benefit from the L2 cache because of the interactions with L1D and L1P. I recommend either using more L2 cache or setting L2 cache to 0KB. Please use testing and benchmarking to determine the effect on your performance, which may be very different than my recommendation.

    Regards,
    RandyP