AM5718: NVRAM remoteproc segment load to DSP

Part Number: AM5718

Hi folks,

We have successfully configured access from the DSP of an AM5718 to a 2 MB NVRAM connected through the GPMC. We load the DSP binary from Linux with remoteproc, and the DSP binary is able to read and print values at certain positions of the NVRAM by assigning its starting address to a uint16_t pointer (no variables/arrays declared). As far as we understand, this means that the resource table including the NVRAM (as devmem) is properly configured. This is the last entry of /sys/kernel/debug/remoteproc/remoteproc2/resource_table:

Entry 20 is of type devmem
  Device Address 0x15000000
  Physical Address 0x1000000
  Length 0x200000 Bytes
  Flags 0x0
  Reserved (should be zero) [0]
  Name DSP_NVRAM

However, when we try to locate a variable/array in the NVRAM section, remoteproc fails with a "bad phdr da 0x15000000 mem 0x40000" trace. Address 0x15000000 comes from the base of the L3_MAIN window in the DSP memory map (0x14000000; see Table 2-10, DSP Memory Map, of the TRM) plus the 0x01000000 offset of the NVRAM within the GPMC (which is mapped at address 0x0 of L3_MAIN, as stated in Table 2-1, L3_MAIN Memory Map).

Looking into the dts, we see that only l1pram, l1dram and l2ram are defined for the DSP, so we added the info for the nvram:

&dsp1 {
    reg = <0x40800000 0x48000>,
          <0x40e00000 0x8000>,
          <0x40f00000 0x8000>,
          <0x15000000 0x200000>;
    reg-names = "l2ram", "l1pram", "l1dram", "nvram";
};

It resulted in the same error. Then, digging into the omap_remoteproc driver (kernel 4.19.94), we realized that the omap_rproc_of_get_internal_memories function performs an address "translation" from L3_MAIN addresses to DSP addresses:

  • DSP1_L2_SRAM    [0x4080 0000] in L3_MAIN memory map --> DSP_L2   [0x0080 0000] in DSP memory map.
  • DSP1_L1P_SRAM [0x40E0 0000] in L3_MAIN memory map --> DSP_L1P [0x00E0 0000] in DSP memory map.
  • DSP1_L1D_SRAM [0x40F0 0000] in L3_MAIN memory map --> DSP_L1D [0x00F0 0000] in DSP memory map.

Even though the implementation is a slightly trickier set of operations (to support several OMAP variants, we guess), the result is essentially subtracting a 0x40000000 offset to translate an L3_MAIN address into a DSP address. But we can't use the very same translation to get from 0x15000000 (the NVRAM address as seen by the DSP) to 0x01000000 (the actual address of the NVRAM within the GPMC). We would need to modify the remoteproc driver to support our NVRAM, and this makes us wonder whether it's really the proper way, or whether a more suitable way exists.

We would be very grateful if anyone could tell us what we should modify to get the NVRAM fully working and be able to locate variables/arrays in the NVRAM section. Just the DTS? The OMAP remoteproc driver too? Anywhere else? How? Or should we give up trying to locate any kind of variables in a devmem region and just access it through pointers?

Are we right in setting the NVRAM as devmem, or should we change to carveout? Application Report SPRAC60 (AM57x Processor SDK Linux®: Customizing Multicore Applications to Run on New Platforms) mentions that CMA (for carveouts) is aimed at DSP and IPU application code as well as at IPC buffers, but we see no further difference from CMEM, and we are not clear on when to use CMA or CMEM. DDR goes to CMA as a carveout, as does OCMC_1, and so on. Any light on this?

Thank you very much in advance for your support

  • Hi Roberto,

    Thanks for the detailed question. We haven't used NVRAM with remoteproc like this before, but let's see if we can get this working for you. What SDK are you using?

    The DSP is behind an MMU (there are two MMUs actually within the DSP subsystem, one for processor and one for DMA). So, any address you need to access from the DSP-side needs to be programmed in the MMU.

    The reg property in the DSP node is reserved only for the DSP internal memories, not for external peripherals or DDR, so you cannot add the GPMC address space to the reg property. The DSP processor has its own fixed addresses to view these memories (they are at the same level as the DSP MMU and don't go through it). The linker map for the DSP will always use the DSP addresses, and the OMAP remoteproc driver has to translate these DSP addresses into equivalent kernel-mapped virtual addresses to support loading into these memories.

    The DEVMEM entry in the resource table is the key.

        Device Address => DSP view of the space (akin to DSP processor virtual address). You use this address in your linker cmd files, and to assign Cache attributes etc.

        Physical Address => Actual bus address. This is the address on the interconnect bus.

    You seem to have some translation set up; DEVMEM is the right entry for mapping peripherals/peripheral memories. A CARVEOUT entry is used to allocate DDR for you from the remoteproc device's CMA/DMA pool.

    I am not sure if raw memory access can be used for the NVRAM space or if all accesses need to go through a corresponding GPMC controller. Are you able to directly write into this memory from the Linux console using devmem2, omapconf or rwmem (/dev/mem access)? You mentioned a 2 MB NVRAM, but your physical address is a 16 MB address. The OMAP remoteproc driver is independent of the GPMC setup, so is that setup done already? If this works, then we will need a small fixup in either the remoteproc core or the OMAP remoteproc driver's da_to_va callback to perform the address translation.

    The "bad phdr da" trace is generated when rproc_da_to_va() cannot find the kernel virtual address to perform the loading. rproc_pa_to_da() looks through both the mappings and the carveouts, but rproc_da_to_va() only looks through carveouts and DSP internal memories, since most peripheral addresses are not actually RAM addresses. The OMAP remoteproc driver doesn't support loading into OCM RAMs either (these are typically controlled by the SRAM driver, but they are proper memories and don't require any additional setup).
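    To illustrate, here is a much-simplified model of the rproc_da_to_va() lookup (not the actual kernel code; names and structure simplified):

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Simplified model of a remoteproc carveout/internal-memory entry. */
    struct carveout {
        uint32_t da;  /* device address as seen by the DSP */
        size_t   len; /* length of the region              */
        void    *va;  /* kernel virtual address            */
    };

    /* Only regions for which the driver knows a kernel virtual address
     * (carveouts, internal memories) resolve; anything else -- such as a
     * devmem-only mapping -- returns NULL, which is what triggers the
     * "bad phdr" error during segment loading. */
    static void *da_to_va(struct carveout *list, size_t n,
                          uint32_t da, size_t len)
    {
        for (size_t i = 0; i < n; i++) {
            if (da >= list[i].da && da + len <= list[i].da + list[i].len)
                return (char *)list[i].va + (da - list[i].da);
        }
        return NULL; /* no kernel mapping known for this da */
    }

    int main(void)
    {
        static char l2ram[0x48000]; /* stand-in for the ioremapped L2 RAM */
        struct carveout mems[] = {
            { .da = 0x00800000, .len = sizeof(l2ram), .va = l2ram },
        };

        /* An L2 RAM da resolves... */
        assert(da_to_va(mems, 1, 0x00800100, 16) == l2ram + 0x100);
        /* ...but the NVRAM devmem da does not, hence "bad phdr da". */
        assert(da_to_va(mems, 1, 0x15000000, 16) == NULL);
        return 0;
    }
    ```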

    regards

    Suman

  • Hi Suman,

    Thank you very much for your fast answer! Glad to see we were not so disoriented. So, if we want remoteproc to load a segment including an initialized variable into the NVRAM, we will need to add the kernel virtual address resolution to the remoteproc driver at the rproc_da_to_va level.

    I think we can say that our resource table with the devmem entry is right, as we can read the NVRAM through a uint16_t pointer to 0x15000000. Regarding your comment about the size, the NVRAM size is 2 MB (0x00200000 bytes, i.e. 16 Mbit).

    It’s clear what you say about the DSP being behind the MMU, but is the remoteproc driver in charge of configuring the DSP MMU based on the resource table within the binary, or does the DSP binary itself configure the MMU? Do we have to configure anything on the MMU, in general or for proper access to the GPMC in particular, other than the resource table? The SDK RTOS documentation (we’re using the latest SDK 6.03.00.106, BTW) states the following in chapter 10.2.5.1.3, Understanding the Memory Map:

    First, it is important to understand that there are a pair of Memory Management Units (MMUs) that sit between the DSP subsystems and the L3 interconnect. One of these MMUs is for the DSP core and the other is for its local EDMA. They both serve the same purpose of translating virtual addresses (i.e. the addresses as viewed by the DSP subsystem) into physical addresses (i.e. addresses as viewed from the L3 interconnect).

    Knowing that, I would think that the MMU is needed to access L3_MAIN (including the GPMC for the NVRAM), but the TRM states:

    DSP accesses (to non-DSP memories like SDRAM on L3_MAIN) for addresses above 0x1000_0000 are handled via the DSP (XMC) MDMA 128-bit master interface and are routed to the DSP subsystem MDMA Initiator port either through DSP_MMU0 or bypassing MMU

    So now we are not sure if:

    1. the MMU is being bypassed and all DSP addresses above 0x10000000 (just L3_MAIN @ 0x14000000 in our case, per the DSP Memory Map of the TRM) are being directly translated, i.e. 0x14000000 (DSP address) → 0x00000000 (L3_MAIN real physical address), or
    2. the MMU is being configured by remoteproc based on the resource table contained in the DSP binary.

    Back to the NVRAM, and answering your question, yes, we do have access to NVRAM too through devmem2.

    Finally, we got it almost working by marking the memory section going to NVRAM as type NOLOAD in linker.cmd. As it’s an NVRAM, we need to keep its contents and avoid loading any other values over them; we just need to map variables/structures onto the NVRAM to parse its contents without changing anything in the memory. Let me explain why I say “almost”…

    Now, I have a dataNVram array in the code, mapped to the very beginning of the NVRAM segment @ 0x15000000 (checked in the .map file; nothing else goes to NVRAM). I increase the value of one position of the array every ten seconds.
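    For reference, the placement in linker.cmd looks roughly like this (memory and section names are our own):

    ```
    /* Illustrative linker.cmd fragment */
    MEMORY
    {
        NVRAM : org = 0x15000000, len = 0x00200000
    }

    SECTIONS
    {
        /* type = NOLOAD keeps the loader from writing initialization
           data over the existing NVRAM contents */
        .nvram: type = NOLOAD {} > NVRAM
    }
    ```

    The array is then placed there from the C source with #pragma DATA_SECTION(dataNVram, ".nvram").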

    In the same loop, I print the contents of this array by addressing it with a pointer to the base address of the NVRAM, and I can see the modified value.

    But if I check the modified NVRAM memory position with devmem2, the result shows that it has not been modified (0xFFFF). Likewise, if I modify the very same NVRAM memory position with devmem2, it is not reflected in the system_printf trace either.

    But if I stop the DSP and start it again, it prints the right value.

    We are quite sure that the NVRAM is being cached by the DSP (if we write with devmem2 and do a cold reset, then we see the written value correctly), but we don’t know how to prevent this memory or memory section from being cached. Any hints?

    By the way, we plan to write to the NVRAM from the DSP and maybe one of the IPUs; the A15 will just read from it. What would be the proper way to share access to the GPMC NVRAM? Can you suggest some type of intercore synchronization for this? GateMP? Spinlocks? Any other?

    Sorry for the “brick” size of the post; the points derived from the main issue needed to be detailed as clearly as possible. So, let me summarize the questions…

    1. DSP MMU: is it bypassed, configured by the DSP binary based on the resource table, or configured by remoteproc based on the resource table?
    2. If we are right that the NVRAM is being cached, how can we avoid it?
    3. What are the recommended ways to share access to the GPMC NVRAM from different cores?

    Again, thank you very much in advance for your support.

  • Hi Roberto,

    That's quite a long post :)

    Here are the responses to your summarized questions:

    1. The DSP MMU is enabled in the Linux SDK. The OMAP remoteproc driver configures the MMU, not the DSP binary. The resource table entries are used to program the DSP's da against the actual pa. An RSC_DEVMEM entry is mapped using the da and pa from the resource table entry directly. An RSC_CARVEOUT entry has its memory allocated dynamically on the Linux side to get the pa, which is then programmed against the da from the resource table entry.

    2. The DSP cache settings are configured by programming the corresponding MAR bits. Each MAR bit's granularity is 16 MB, so you have to set the corresponding bit to mark the associated 16 MB as non-cached.

    Please see the SYS/BIOS C64x Cache module API.

    http://software-dl.ti.com/dsps/dsps_public_sw/sdo_sb/targetcontent/bios/sysbios/6_83_00_18/exports/bios_6_83_00_18/docs/cdoc/index.html

    E.g.: Cache.MAR192_223 = 0x00000010; /* sets 0xc4000000 - 0xc4ffffff as non-cached */
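    For the 0x15000000 device address in your case, the MAR index would be 0x15000000 / 16 MB = 21, i.e. bit 21 of the MAR0_31 config parameter. An illustrative .cfg fragment (assuming the C66x Cache module for this DSP):

    ```javascript
    var Cache = xdc.useModule('ti.sysbios.family.c66.Cache');
    /* 0x15000000 / 0x01000000 = 21 -> bit 21 of MAR0_31 */
    Cache.MAR0_31 = (1 << 21); /* 0x15000000 - 0x15ffffff non-cached */
    ```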

    3. This is a memory-mapped peripheral, so the semantics are the usual. IPU and DSPs have their own MMUs, so you need to provide the RSC_DEVMEM entry and read/write using the corresponding device addresses.
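    If you do need cross-core serialization (per your GateMP question), a rough sketch with the IPC GateMP API could look like this (target-side code; the gate name is just an example, and IPC must already be initialized):

    ```c
    #include <ti/ipc/GateMP.h>

    /* Creator core (e.g. the DSP): */
    GateMP_Params params;
    GateMP_Params_init(&params);
    params.name = "nvramGate";              /* name is our choice        */
    GateMP_Handle gate = GateMP_create(&params);

    /* Other cores attach by name: GateMP_open("nvramGate", &gate);      */

    /* Any core wanting exclusive NVRAM access: */
    IArg key = GateMP_enter(gate);
    /* ... read/modify the NVRAM ... */
    GateMP_leave(gate, key);
    ```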

    ---

    W.r.t. DSP addresses, the TRM is correct. Please also refer to the DSP Memory Map in Chapter 2. In summary, even though the DSP MMUs are capable of addressing the full 4 GB space, the lower-range addresses (0x14000000, as per the DSP Memory Map) will never go through the DSP MMUs; these are all local addresses and terminate within the DSP subsystem itself.

    Please check the accesses either by making the corresponding region as non-cached through MAR register programming, or use Cache API to do invalidate for read/flush for write.
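    If you keep the region cached, the alternative is explicit cache maintenance around each access; a sketch using the SYS/BIOS Cache API (target-side code; dataNVram and idx stand for your array and index from earlier in the thread):

    ```c
    #include <ti/sysbios/hal/Cache.h>

    /* Before reading NVRAM contents another master may have changed: */
    Cache_inv((Ptr)0x15000000, 0x200000, Cache_Type_ALL, TRUE);

    /* After writing, push the data out so other masters see it: */
    Cache_wb((Ptr)&dataNVram[idx], sizeof(dataNVram[idx]),
             Cache_Type_ALL, TRUE);
    ```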

    IPUs behave differently w.r.t. both address ranges and cache programming. Their cache programming is provided by an Attribute MMU (AMMU), which can also provide some translation.

    regards

    Suman