    OMAP DM3730/3530 GPMC Access timing issues

    This question is not answered
    Posted by Angelo Joseph on Apr 02 2012 15:51 PM

    I am seeing something on the DM3730 GPMC access timing that is not making sense to us at this time.

    We have set up CS3 of the OMAP to access our FPGA and our timing has been set up to be roughly 100ns (i.e. time when CS is low)  for a single 16-bit read access from the FPGA. CS3 is set up as a multiplexed 16 bit NOR Flash interface in our design.

    When the processor issues a 32-bit read from CS3, we see two back-to-back 16-bit transactions with a 12ns delay between them, which is correct since that is the delay we set up in our GPMC timing. But when we issue the next 32-bit read from CS3, we see a delay of around 150-160ns before the two back-to-back reads appear as expected. We are trying to determine where this 150-160ns delay comes from.

    Following is the simple sequence of instructions that our code executes

    1 LDR r_n, [FPGA]

    2 STR r_n, [r_s]

    3 LDR r_n, [FPGA + 4]

    4 STR r_n, [r_s + 4]

    5 LDR r_n, [FPGA + 8]

    6 STR r_n, [r_s + 8]

     

    In the above sequence Instruction 1 issues two back to back 16 bit reads as expected.

    But between Instruction 1 and Instruction 3 we see a delay of around 150-160ns. As shown above, there is only one store (STR) instruction between the two reads, and it shouldn't take 150-160ns. My guess is that this store goes to the internal data cache, which should be very fast.

    Is there some other setting outside the GPMC that can influence this?
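
    For reference, a C loop of roughly the following shape would be expected to produce an alternating LDR/STR pattern like the one listed above. This is a minimal sketch, not the poster's actual code: the function name is hypothetical, and on the target `src` would point at the CS3 window, whose address is board specific and not given in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `words` 32-bit values from a memory-mapped source into `dst`.
 * `src` is volatile so the compiler performs exactly one load per element
 * and does not drop or combine the accesses. */
static void fpga_read_words(const volatile uint32_t *src,
                            uint32_t *dst, size_t words)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < words; i++) {
        dst[i] = src[i];   /* one load from the FPGA, one store to dst */
    }
}
```

    On a 16-bit GPMC chip select, each 32-bit load in this loop turns into two 16-bit bus accesses, and the loop cannot issue the next load until the previous one has returned.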

    All Replies
    • Posted by Brad Griffis on Jun 06 2012 14:53 PM

      Angelo,

      This is expected behavior. Typically an FPGA connected to the GPMC is mapped as non-cacheable on the ARM side. From the sound of your transactions that appears to be the case here, i.e. your 32-bit read results in two 16-bit accesses. If caching were enabled for that memory range you would see an entire cache line fetch instead.

      This ultimately comes down to how the CPU behaves and to the memory architecture as a whole. When you issue a "load" instruction, the CPU stalls until the data has actually landed in the CPU register. The request passes through the ARM cache controllers, the ARM AXI controller, the L3 Interconnect, and the GPMC, and the data then has to traverse that entire path back, which adds up to a significant amount of time. Only after the data has landed in the register does the CPU move on to the next instruction. This is why you see a big gap between each of these accesses.
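
      To put rough numbers on the cost of that stall, here is a small sketch that uses only the figures quoted in the original post (about 100ns of CS-low time per 16-bit access, 12ns between the two halves of a 32-bit read, and a 150-160ns gap before the next read). The function names are made up for illustration, and the model ignores everything except these three figures.

```c
#include <assert.h>

/* Bus time in ns for one 32-bit CPU read on a 16-bit GPMC chip select:
 * two CS-low periods plus the programmed gap between them. */
static double word_read_ns(double cs_low_ns, double inter_access_ns)
{
    assert(cs_low_ns > 0.0 && inter_access_ns >= 0.0);
    return 2.0 * cs_low_ns + inter_access_ns;
}

/* Effective read throughput in MB/s (1 MB = 1e6 bytes) when every
 * 4-byte read is followed by `stall_ns` of idle bus time. */
static double read_mbytes_per_s(double word_ns, double stall_ns)
{
    assert(word_ns > 0.0 && stall_ns >= 0.0);
    return 4.0 / (word_ns + stall_ns) * 1000.0;   /* bytes/ns -> MB/s */
}
```

      With 100ns and 12ns, `word_read_ns` gives 212ns of bus time per 32-bit read. Taking 155ns as the stall, `read_mbytes_per_s(212.0, 155.0)` comes out at roughly 10.9 MB/s, against roughly 18.9 MB/s if the reads were truly back to back, so the stall costs a bit over 40% of the achievable read rate in this simple model.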

      Here are my recommendations:

      • Create some "write only registers" in your FPGA.
      • Instead of having a single register such as DATA, you should have three registers: DATA, DATA_SET, DATA_CLEAR.
      • Currently, if you wanted to set bit 3, for example, you would do something like DATA |= 0x0008;  This in turn produces a read-modify-write sequence of instructions, which as you have noticed is "expensive" over the GPMC.
      • Instead of that read-modify-write, you would simply do DATA_SET = 0x0008, which would have the effect of setting that one bit.
      • Similarly, you would currently clear a bit by doing something such as DATA &= ~0x0008.  In the new paradigm you would simply do DATA_CLEAR = 0x0008.
      • Finally, make sure that the mapping of this address range from the ARM side is non-cacheable, but BUFFERABLE.  This will be key in making the writes "fire and forget".  In other words, an operation like DATA_CLEAR = 8 would result in a single-cycle instruction, and the CPU could continue executing while the data propagates out to the FPGA.  If you configure the region as strongly ordered, the CPU will still stall until each write completes, which guarantees that everything in the system occurs in a very precise order.
      • For larger transfers, use DMA instead of the CPU to read/write the data from the FPGA.  The DMA would not stall like the CPU between reads.
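
      The DATA / DATA_SET / DATA_CLEAR idea above can be sketched as follows. The register layout, the type name, and the helper names are hypothetical, and `fpga_model_step` is only a host-side stand-in for what the FPGA logic would do, so the snippet can be tried on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block; offsets and names are illustrative only. */
typedef struct {
    volatile uint16_t data;        /* read/write: current value              */
    volatile uint16_t data_set;    /* write-only: 1 bits set bits in data    */
    volatile uint16_t data_clear;  /* write-only: 1 bits clear bits in data  */
} fpga_regs_t;

/* CPU side: a single bus write each, with no read of DATA involved. */
static inline void fpga_set_bits(fpga_regs_t *r, uint16_t mask)
{
    r->data_set = mask;
}

static inline void fpga_clear_bits(fpga_regs_t *r, uint16_t mask)
{
    r->data_clear = mask;
}

/* Host-side model of the FPGA: apply pending set/clear masks to DATA. */
static void fpga_model_step(fpga_regs_t *r)
{
    assert(r != NULL);
    r->data = (uint16_t)((r->data | r->data_set) & (uint16_t)~r->data_clear);
    r->data_set = 0;
    r->data_clear = 0;
}
```

      Compared with `DATA |= 0x0008`, `fpga_set_bits(regs, 0x0008)` never reads across the GPMC, so the CPU does not have to wait for a round trip before it can continue.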

      Best regards,
      Brad

    • Posted by Angelo Joseph on Jun 06 2012 15:31 PM

      Brad,

      Thanks for the response.

      In my application I don't have too many read-modify-writes.

      I have a whole bunch of periodic reads and my writes are important but not that time critical.  In my example above you see reads from contiguous FPGA addresses.

      I did try a couple of things. I changed my reads to 64-bit reads, and this helped a little. (128-bit reads were converted by the GPMC into two 64-bit reads of four 16-bit accesses each; I'm guessing this is due to the 64-bit width of the L3 bus interface.) If I issue 128-bit reads there is a gap of 150-160ns between the two 64-bit transfers.

      I do set up my FPGA space as strongly ordered. I also used a simple trick: I set up 2 chip selects on the FPGA and, since writes followed by reads do not happen in my app on the heavily read registers, I put my read registers into a separate CS and optimized the GPMC timing for that CS, because my FPGA needed a slower write. But the huge gap between the 64-bit transfers still bothers me.

      In my situation, do you think changing it to non-cacheable but bufferable will make a difference?

      Also will DMA help for reads?

      Let me know

      Thanks

      Angelo

    • Posted by Brad Griffis on Jun 06 2012 16:45 PM

      Angelo Joseph wrote:
      "I did do a couple of things, I changed my reads to be 64 bit reads (128 bit reads were converted into two 64-bit reads of four 16 bit accesses each by the GPMC, I'm guessing this is due to the L3 Bus interface width of 64 bits) and this helped it a little bit. If I issue 128 bit reads there is a gap of 150-160ns between the two 64 bit transfers."

      The larger data type will reduce the total number of accesses.  As you have noticed it still cannot prevent a large gap that you see between reads of non-cacheable data.  The only way to avoid such a gap is to use DMA.

      Angelo Joseph wrote:
      "In my situation do you think changing it to non-cacheable but bufferable will make a difference?"

      To be clear, marking the space as "bufferable" will vastly improve your write performance, but it will have no effect on read performance.  From the GPMC perspective you should see writes occurring back-to-back.  From a CPU perspective you will not stall the CPU while waiting for writes to complete.  You would also not need to use large data types to get the best performance, as contiguous writes would be merged in the buffer.
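
      If you manage the MMU page tables yourself, the difference between the two mappings comes down to one bit in the translation table entry. The sketch below assumes the ARMv7-A short-descriptor format with 1 MB sections, TEX remap disabled, and domain 0; under those assumptions TEX=0b000, C=0, B=0 encodes Strongly-ordered memory, and TEX=0b000, C=0, B=1 encodes Device memory, which is the closest match to "non-cacheable, bufferable". The macro and function names are made up, and under an OS the kernel's own mapping interface would set these attributes instead of hand-built descriptors.

```c
#include <assert.h>
#include <stdint.h>

#define SECT_TYPE    (0x2u)                   /* bits[1:0] = 0b10: section   */
#define SECT_B       (1u << 2)                /* B (bufferable) bit          */
#define SECT_XN      (1u << 4)                /* execute never               */
#define SECT_AP_RW   (0x3u << 10)             /* AP[1:0] = 0b11: full access */
#define SECT_TEX(x)  ((uint32_t)(x) << 12)    /* TEX[2:0] in bits [14:12]    */

/* 1 MB section entry for Strongly-ordered memory (TEX=000, C=0, B=0). */
static uint32_t sect_strongly_ordered(uint32_t phys_base)
{
    assert((phys_base & 0x000FFFFFu) == 0u);  /* must be 1 MB aligned */
    return phys_base | SECT_TEX(0) | SECT_AP_RW | SECT_XN | SECT_TYPE;
}

/* 1 MB section entry for Device memory (TEX=000, C=0, B=1). */
static uint32_t sect_device_bufferable(uint32_t phys_base)
{
    return sect_strongly_ordered(phys_base) | SECT_B;
}
```

      For a hypothetical chip-select base of 0x18000000, the two functions differ only in bit 2 of the resulting entry. After changing an entry, the TLB entry for that range has to be invalidated before the new attributes take effect.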

      Angelo Joseph wrote:
      "Also will DMA help for reads?"

      The only way to "eliminate the gap" is to use DMA.  Of course there will be overhead for setting up a DMA transfer, so if you're only reading one or two elements then it's probably not worthwhile.  If, however, you're reading a substantial block of data then I expect a night and day difference between non-cacheable CPU reads and a DMA transfer.  You'll want to copy a block of FPGA data to a cacheable location in your DDR memory, i.e. you would make a copy of your FPGA registers.  You'll need to watch out for cache coherence issues.  Specifically, after you've DMA'd a copy of your registers to DDR you will first need to invalidate (i.e. throw away) whatever is in the cache and THEN perform your access so that the CPU is getting "fresh" data from the DDR.
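
      The ordering described above (DMA first, then invalidate, then read) can be sketched like this. The `platform_ops_t` hooks are hypothetical placeholders for whatever DMA driver and cache maintenance routines your OS or BSP provides; the `host_*` functions are stand-ins that only exist so the sequence can be exercised on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform hooks, not a real driver API. */
typedef struct {
    void (*dma_copy_blocking)(void *dst, const volatile void *src, size_t len);
    void (*cache_invalidate)(void *addr, size_t len);
} platform_ops_t;

/* Refresh the DDR shadow copy of the FPGA registers:
 * 1. DMA the block from the FPGA window into the shadow buffer,
 * 2. invalidate the cache lines covering the shadow buffer,
 * 3. only after this returns may the CPU read the shadow buffer. */
static void fpga_shadow_refresh(const platform_ops_t *ops, uint16_t *shadow,
                                const volatile uint16_t *fpga, size_t count)
{
    assert(ops != NULL && shadow != NULL && fpga != NULL);
    size_t len = count * sizeof(uint16_t);
    ops->dma_copy_blocking(shadow, fpga, len);
    ops->cache_invalidate(shadow, len);
}

/* Host stand-ins that record the order in which they were called. */
static int g_step, g_dma_step, g_inval_step;

static void host_dma(void *dst, const volatile void *src, size_t len)
{
    const volatile uint8_t *s = src;
    uint8_t *d = dst;
    for (size_t i = 0; i < len; i++) {
        d[i] = s[i];
    }
    g_dma_step = ++g_step;
}

static void host_inval(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    g_inval_step = ++g_step;
}
```

      On real hardware the shadow buffer should be aligned and padded to whole cache lines, so that invalidating it cannot discard unrelated data that happens to share a line with it.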
