TMS320C6678: C66x Cache L1D Miss Pipelining is missing from documentation

Part Number: TMS320C6678


Hi,

In Table 3-17 of the TMS320C66x CorePac User Guide (sprugw0c, section 3.5.4), there are no numbers, which I think is an error in the documentation.

Can someone provide these numbers?

Thanks,

Roman

  • Roman,

    I assume these were left blank until performance testing was performed.  However, that should have been done years ago.  We will research that but the data might not be available.

    Tom

  • Hi, Roman,

    As Tom indicated, the table is marked as TBD. We have memory performance measurements in another document. Please see Table 3, "C66x DSP Memory Read Performance", in Section 3.1 of the Throughput Performance Guide for Keystone-II Devices (SPRABK5B):

    http://www.ti.com/lit/an/sprabk5b/sprabk5b.pdf

    Rex

  • Thanks for your answer.

    So you're saying that this table will never be filled with valid numbers?

    The document you're linking actually covers Keystone-II devices, while the C6678 is a Keystone-I device, isn't it?

    Please also see my other post at

  • Hi, Roman,

    We are still tracking down why the table wasn't filled in at the time of publishing: whether it was covered by other documents, or was on a to-do list and got left out. Once we have the answer, I'll post back here.

    Rex

  • Hi, Roman,

    I just want to update you on our investigation. Two tables in the C66x CorePac User's Guide, 2-13 and 3-17, are marked as TBD.

    Table 2-13 L1P Miss Pipelining Performance (Average Number of Stalls per Execute Packet) (TBD) - For this table, we don't have numbers for that particular "3 wait states, 8x128-bit banks" configuration. However, the DSP Cache User Guide (http://www.ti.com/lit/ug/sprugy8/sprugy8.pdf) has this table:

    We'll update Table 2-13 in the C66x CorePac User's Guide with this table and use these values instead. Also, since the C66x supports SPLOOP, the instructions for most loops are prefetched into the SPLOOP buffer, so L1P misses should be reduced even further in software-pipelined code.

    For Table 3-17 L1D Performance Summary (TBD) - Again, the DSP Cache User Guide (sprugy8.pdf) does not cover that particular configuration, but it has this:

    We'll also update the C66x CorePac User's Guide with this table and use these values instead.

    An internal Jira ticket will be created and the documentation update will be scheduled accordingly.

    Rex