# Shared RAM Access Considerations on OMAPL1x/C674x/AM1x

#### Contents


- 1 Memory Access Considerations on OMAPL1x/C674x/AM1x Devices
  - 1.1 Non-cacheable Writes/Stores
  - 1.2 Reads/Loads/Cache Fill
  - 1.3 Shared RAM Access Considerations on OMAPL1x/C674x/AM1x

# Memory Access Considerations on OMAPL1x/C674x/AM1x Devices


#### Non-cacheable Writes/Stores


- "Buffered" or "fire-and-forget"
- Multiple transactions can be in flight at a time.
- Transactions progress through buffers or pipeline elements at each stage of the design:
  - in the DSP memory system
  - in the SCR+ bridges (system interconnect)
  - in the SRAM controller
- The CPU does not have to wait for the first transaction to complete at the destination before starting the next write.
- Latency does not matter, at least from the sender's point of view.



- The message can land at an arbitrary later point in time.
- CPU write throughput is limited by:
  - the throughput of #1 while the buffer is empty
  - the throughput of #2 while the buffer is full
- For SRAM writes, the throughput of #1 equals the throughput of #2.
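The buffering behaviour above can be illustrated with a small cycle-count model. This is a minimal sketch under stated assumptions, not a description of the real hardware. It lumps all buffering stages into one FIFO of hypothetical depth `depth`. It treats stage #1 as the CPU posting one write every `issue_cycles` cycles, and stage #2 as the path that retires one write into SRAM every `drain_cycles` cycles. The function `posted_write_issue_times` and every parameter value are illustrative, not TI data.

```python
def posted_write_issue_times(n_writes, issue_cycles, drain_cycles, depth):
    """Model posted ("fire-and-forget") writes through a FIFO of `depth` entries.

    Returns (issue, retire): the cycle at which the CPU posts each write and
    the cycle at which that write lands at the destination.
    All parameters are hypothetical, not measured device data.
    """
    issue, retire = [], []
    t = 0  # earliest cycle at which the CPU can post the next write
    for i in range(n_writes):
        # The FIFO holds writes that are posted but not yet retired.
        # When it is full, the CPU stalls until the oldest entry retires.
        if i >= depth:
            t = max(t, retire[i - depth])
        issue.append(t)
        # The destination retires writes one at a time, in order.
        start = max(t, retire[-1] if retire else 0)
        retire.append(start + drain_cycles)
        t += issue_cycles
    return issue, retire


# Deep buffer: the CPU posts one write per cycle (limited by stage #1),
# while the writes land at the destination much later.
issue, retire = posted_write_issue_times(8, issue_cycles=1, drain_cycles=4, depth=64)
print(issue)   # [0, 1, 2, 3, 4, 5, 6, 7]
print(retire)  # [4, 8, 12, 16, 20, 24, 28, 32]

# Shallow buffer: once it fills, the CPU is paced by the drain rate (stage #2).
issue, _ = posted_write_issue_times(8, issue_cycles=1, drain_cycles=4, depth=2)
print(issue)   # [0, 1, 4, 8, 12, 16, 20, 24]
```

With a deep buffer the CPU posts at its own rate while the writes land later. With a shallow buffer it is soon paced by the drain rate. When `issue_cycles` equals `drain_cycles` (the SRAM case described above), the two limits coincide and the buffer depth stops mattering.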

#### Reads/Loads/Cache Fill

- "Pended" or "blocking"
- A new read command cannot be issued until the previous read is complete.
- Latency therefore directly limits throughput.



- #1 and #2 represent the latency of the read command through the system interconnect (SCR+ bridges).
- #3 represents the read response data returning to the initiator.
- CPU read throughput is limited by the sum of:
  - the latency of #1
  - the latency of #2
  - the latency of #3
  - the number of data phases:
    - LDW: 4 bytes = 1 data phase (1 cycle)
    - LDDW: 8 bytes = 1 data phase (1 cycle)
    - Cache fill: 128 bytes = 16 data phases (16 cycles)
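The read model above amounts to a simple formula. Reads are blocking, so each access costs the whole round trip, and throughput = access size / (latency #1 + latency #2 + latency #3 + data phases). The sketch below encodes that formula with two assumptions. First, the 8-bytes-per-phase width is inferred from the 128-byte / 16-phase cache-fill figure. Second, the 10-cycle segment latencies are made-up values for illustration only. The function names are hypothetical.

```python
import math

# Bytes per data phase. Inferred from "Cache fill: 128 bytes = 16 data phases",
# i.e. an 8-byte read data path; treat this as an assumption.
BUS_BYTES = 8


def data_phases(access_bytes, bus_bytes=BUS_BYTES):
    """Number of data phases (one cycle each) needed to return `access_bytes`."""
    return math.ceil(access_bytes / bus_bytes)


def blocking_read_throughput(access_bytes, lat1, lat2, lat3):
    """Bytes per cycle for back-to-back blocking reads.

    lat1 and lat2 are the command-path latencies through the interconnect
    (#1 and #2); lat3 is the response latency (#3). Because a new read cannot
    be issued until the previous one completes, each read costs the full
    round trip plus one cycle per data phase.
    """
    cycles = lat1 + lat2 + lat3 + data_phases(access_bytes)
    return access_bytes / cycles


# Hypothetical latencies of 10 cycles per segment (not device data).
for name, size in [("LDW", 4), ("LDDW", 8), ("cache fill", 128)]:
    print(f"{name:10s} {blocking_read_throughput(size, 10, 10, 10):.2f} bytes/cycle")
```

For small loads the fixed round trip dominates, so any extra latency cuts throughput almost proportionally. A 128-byte cache fill spreads the same round trip over 16 data phases, which is why it achieves far more bytes per cycle.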

#### Shared RAM Access Considerations on OMAPL1x/C674x/AM1x

The OMAPL1x/C674x/AM1x family of devices has up to 128 KB of on-chip Shared RAM, separate from the C674x DSP megamodule memory and the ARM9 internal memory.

The following table lists the latency and throughput of Shared RAM accesses made by the DSP or the ARM on these devices.

| Access type | Access size (bytes) | C674x DSP latency (cycles) | C674x DSP throughput (bytes/cycle) | ARM9 latency (cycles) | ARM9 throughput (bytes/cycle) |
|-------------|---------------------|----------------------------|------------------------------------|-----------------------|-------------------------------|
| Writes      | 4                   | 16                         | 0.25                               | 6                     | 0.67                          |
| Writes      | 8                   | 16                         | 0.5                                | 6                     | 1.33                          |
| Reads       | 4                   | 32                         | 0.13                               | 27                    | 0.15                          |
| Reads       | 8                   | 32                         | 0.25                               | 27                    | 0.3                           |
| Reads       | 32                  |                            |                                    | 31                    | 1.03                          |
| Reads       | 128                 | 48                         | 2.67                               |                       |                               |
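Every throughput figure in the table equals access size divided by latency, for example 4 / 16 = 0.25 and 128 / 48 ≈ 2.67 bytes/cycle on the DSP. The sketch below assumes that one-access-per-latency model and ignores CPU overhead. It uses the table's DSP read latencies to estimate how long reading a buffer from Shared RAM would take. The 4 KB buffer size and the helper names `throughput` and `cycles_to_read` are hypothetical.

```python
import math

# DSP read latencies (cycles) from the table above, keyed by access size (bytes).
DSP_READ_LATENCY = {4: 32, 8: 32, 128: 48}


def throughput(access_bytes, latency_cycles):
    """Bytes per cycle when each access costs its full latency."""
    return access_bytes / latency_cycles


def cycles_to_read(n_bytes, access_bytes, latency_cycles):
    """Rough cycle count to read n_bytes with back-to-back blocking accesses."""
    return math.ceil(n_bytes / access_bytes) * latency_cycles


for size, lat in DSP_READ_LATENCY.items():
    print(f"DSP {size:3d}-byte reads: {throughput(size, lat):.3f} bytes/cycle")

# Hypothetical 4 KB buffer in Shared RAM:
print(cycles_to_read(4096, 4, DSP_READ_LATENCY[4]))      # 32768 (LDW)
print(cycles_to_read(4096, 128, DSP_READ_LATENCY[128]))  # 1536 (128-byte cache fills)
```

Under this rough model, 128-byte cache fills read the same buffer about 21 times faster than 4-byte loads. This mirrors the ratio of the 2.67 and 0.13 bytes/cycle entries in the table.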

### C6747 Shared memory performance



Edmund Pirali


One last question, and I may be missing it, but I can't find the performance of L3 memory when it is accessed by the DSP.

over 13 years ago



Mariana over 13 years ago


Some rough numbers with cache disabled:

- ~15 CPU cycles for read
- ~19 CPU cycles for write

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/12401/c6747-shared-memory-performance