Why is my PRU UART code halting?

Craig McQueen

I'm working on getting some PRU code running to drive the PRU-ICSS UART0 on the BeagleBone Black.

I've got some simple test code running that can TX "Hello" periodically and also loop back any RX bytes to TX. However, I'm finding that many seemingly trivial changes to the code cause it to crash; that is, the PC gets stuck at 0x148 (according to /sys/kernel/debug/remoteproc/remoteproc1/regs), which is a location past the end of my program code.

See the attached code, in particular the TODO comment in ir-pru-uart-test.c. I would really appreciate it if someone could check over my code and suggest why it runs properly when the variant without the 'tx' variable is used, but crashes when the variant using the 'tx' variable is used.

ir-pru-uart-test-2.tar.gz

For building the PRU code, I'm using the TI PRU compiler 2.1.0 under CCS 6.0.1.

For building Linux, I'm using the meta-ti layer and the linux-ti-staging 3.14.31 kernel in Yocto.

I'm loading the code using the pruss_remoteproc driver, since that seems the way of the future. That is, I copy the build output ir-pru-uart-test-2.out to the target /lib/firmware/rproc-pru0-fw. Then I start the pruss_remoteproc driver with the following commands:

insmod /lib/modules/3.14.31/kernel/drivers/rpmsg/virtio_rpmsg_bus.ko
insmod /lib/modules/3.14.31/kernel/drivers/remoteproc/pruss_remoteproc.ko

I have to say that the documentation for the pruss_remoteproc driver is sparse, so there are a few things I'm not sure about, such as the "Resource Table". I'm also not sure whether program execution has to start at 0 (which I'm doing by putting .text:_c_int00* > 0x0, PAGE 0 in the linker command file).

over 10 years ago

0 Craig McQueen over 10 years ago

I should also mention that I am not able to debug the PRU code at the moment. I'm curious about the presence of the file /sys/kernel/debug/remoteproc/remoteproc1/single_step and I wonder if remoteproc could enable any debugging possibilities (I'm totally new to remoteproc).

I hope to get an XDS100 JTAG adapter soon, and then use it to step through the PRU code directly. But I haven't got that yet.

0 Craig McQueen over 10 years ago

The other detail I should have mentioned before is this: when the code variant with the 'tx' variable is used, the code runs fine until the UART receives a byte, at which point it crashes (it doesn't echo the byte, and the PC ends up at 0x148). With the code variant that doesn't use the 'tx' variable, the code is able to echo the RX byte to TX and continue running fine.

0 Biser Gatchev-XID over 10 years ago in reply to Craig McQueen

Hi Craig,

Is this a soft UART? You can check this wiki, which has an example with up to 6 soft UARTs on the PRU: http://processors.wiki.ti.com/index.php/Soft-UART_Implementation_on_AM335X_PRU_-_Software_Users_Guide

0 Craig McQueen over 10 years ago in reply to Biser Gatchev-XID

No, my question has nothing to do with soft UARTs. It has more to do with execution flow, perhaps code loading.

0 Biser Gatchev-XID over 10 years ago in reply to Craig McQueen

Sorry for misunderstanding your question Craig. My fault - didn't read carefully enough. I will forward this to a PRU expert.

0 melissaw over 10 years ago in reply to Biser Gatchev-XID

Hi Craig,

Yes, there are some debug features enabled by the remoteproc driver. These features are located in the /sys/kernel/debug/remoteproc/remoteproc<0/1> directory.

The 'single_step' file uses the single-step execution mode of the PRU cores. Writing a non-zero value performs a single step, and writing zero restores the PRU to the mode it was executing in before the first single step. (Note: if the PRU is halted because of a halt instruction, then no change occurs.)

For example, you can perform a single-step by echoing 1 into single_step, and then you can read the PRU registers. So, typically, I would use it as:

echo 1 > /sys/kernel/debug/remoteproc/remoteproc1/single_step; cat /sys/kernel/debug/remoteproc/remoteproc1/regs

and keep repeating this to step the PRU one instruction at a time.

Echo 0 will return the PRU to the prior state before the first echo 1 (it will run continuously if it was running before; noop if it was halted before).

If you want to start stepping from the beginning of the program, you can write 0x0 to the PRU_CTRL register (i.e. address 0x4A322000 for PRU0) before echoing 1 into single_step.

Regards,

Melissa
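
For anyone who would rather script the stepping from a C test tool than from the shell, the following is a minimal sketch of the same write-then-read sequence. The helper name pru_step_and_read() is hypothetical, the paths are simply the ones quoted in this thread, and the sketch assumes debugfs is mounted and the process has permission to write there.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Paths as quoted in this thread; adjust remoteprocN for your board. */
#define PRU_STEP_PATH "/sys/kernel/debug/remoteproc/remoteproc1/single_step"
#define PRU_REGS_PATH "/sys/kernel/debug/remoteproc/remoteproc1/regs"

/*
 * Hypothetical helper: write `value` to the single_step file, then copy
 * the contents of the regs file into `out` (always NUL-terminated).
 * A non-zero value requests one step; zero restores the previous mode.
 * Returns the number of bytes read from regs, or -1 on any error.
 */
static long pru_step_and_read(const char *step_path, const char *regs_path,
                              int value, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);

    FILE *step = fopen(step_path, "w");
    if (step == NULL)
        return -1;
    int ok = fprintf(step, "%d\n", value) > 0;
    if (fclose(step) != 0 || !ok)   /* fclose flushes the buffered write */
        return -1;

    FILE *regs = fopen(regs_path, "r");
    if (regs == NULL)
        return -1;
    size_t n = fread(out, 1, out_size - 1, regs);
    out[n] = '\0';
    fclose(regs);
    return (long)n;
}
```

Calling pru_step_and_read(PRU_STEP_PATH, PRU_REGS_PATH, 1, buf, sizeof buf) in a loop would step one instruction per call and leave the register dump in buf; passing 0 instead of 1 returns the PRU to its earlier mode, as described above.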

0 Craig McQueen over 10 years ago in reply to melissaw

Thanks for explaining single-stepping via remoteproc. I will try it. Is it possible to see PRU registers and memory when single-stepping this way?

I got an XDS100 v2 JTAG, and today I was able to run the attached code on the PRU using it. I found that the code runs fine via the JTAG. That is significant—it indicates that my code is okay, but there is some issue with running it in the Linux remoteproc environment.

At this point, I have two hypotheses:

1) The remoteproc driver isn't loading the code into the PRU properly.

This seems possible but unlikely. I'm not sure how to verify it.

2) There is some PRU register configuration that is different when running under Linux and remoteproc.

Does my code need to set some other registers to configure the PRU, e.g. to set up the PRU to access shared data memory, or something else when running under remoteproc? E.g. PRU-ICSS CTRL register.

By the way, is it possible to do JTAG debugging of a PRU program when it's running under Linux remoteproc?

Any advice to resolve this would be greatly appreciated.

0 melissaw over 10 years ago in reply to Craig McQueen

Hi Craig,

Can you try replacing your resource table file with resource_table_empty.h? The resource table is the main difference I can think of between the CCS loader and pruss_remoteproc.

As for JTAG debugging, I am investigating if this is possible while running Linux remoteproc.

Regards,

Melissa

0 Craig McQueen over 10 years ago in reply to melissaw

I actually already tried that (I found it in e2e.ti.com/.../368069). It doesn't improve the situation.

I've tried JTAG debugging of PRU while Linux is running, but it gave me an error about not being able to load.

By the way, it's not easy to find how to debug the PRU with JTAG. This is the crucial info I found: Lab 1 of processors.wiki.ti.com/.../PRU_Training:_Hands-on_Labs

0 Craig McQueen over 10 years ago in reply to Craig McQueen

I've investigated, and results are pointing to my hypothesis (1): the remoteproc driver isn't loading the code into the PRU properly.

I've used the devmem2 tool under Linux to halt the PRU0 and read its IMEM. According to the map file, the first 0x118 bytes of IMEM are used by my program.
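
As a rough C equivalent of the devmem2 procedure, the sketch below maps the PRU-ICSS through /dev/mem, clears the ENABLE bit to halt the core, and copies words out of IMEM. It is only a sketch under assumptions: the 0x4A300000 base, the IRAM offsets and the ENABLE bit position are taken from the AM335x memory map as I understand it (only the PRU0 CTRL address 0x4A322000 is quoted in this thread), the function names are hypothetical, and poking the PRU behind the back of the remoteproc driver is only suitable for debugging.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define PRUSS_BASE      0x4A300000u /* PRU-ICSS base on AM335x (assumption) */
#define PRUSS_SIZE      0x40000u    /* window covering CTRL and IRAM (assumption) */
#define PRU_IRAM_SIZE   0x2000u     /* 8 KB of IMEM per PRU (assumption) */
#define PRU_CTRL_ENABLE (1u << 1)   /* ENABLE bit in PRU CTRL (assumption) */

/* Offset of the PRU control block; PRU0 gives 0x4A322000 as quoted above. */
uint32_t pru_ctrl_offset(int pru)
{
    assert(pru == 0 || pru == 1);
    return pru ? 0x24000u : 0x22000u;
}

/* Offset of the PRU instruction RAM (assumed from the AM335x memory map). */
uint32_t pru_iram_offset(int pru)
{
    assert(pru == 0 || pru == 1);
    return pru ? 0x38000u : 0x34000u;
}

/* Halt the given PRU and copy `count` 32-bit words of IMEM into `words`.
 * Returns 0 on success, -1 if /dev/mem cannot be opened or mapped. */
int pru_dump_imem(int pru, uint32_t *words, size_t count)
{
    assert(words != NULL && count <= PRU_IRAM_SIZE / 4);

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;
    void *map = mmap(NULL, PRUSS_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, (off_t)PRUSS_BASE);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    volatile uint32_t *pruss = (volatile uint32_t *)map;
    pruss[pru_ctrl_offset(pru) / 4] &= ~PRU_CTRL_ENABLE;  /* halt the core */

    const volatile uint32_t *iram = pruss + pru_iram_offset(pru) / 4;
    for (size_t i = 0; i < count; i++)
        words[i] = iram[i];

    munmap(map, PRUSS_SIZE);
    return 0;
}
```

For the program in this thread, pru_dump_imem(0, buf, 0x118 / 4) would read back the 0x118 bytes that the map file says are in use.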

I compared the IMEM read under Linux to the IMEM read when doing straight JTAG debugging. It looks as though the IMEM under Linux has not correctly loaded the last 6 words, from address 0x100 onwards.

Please see the attached file, sub-directory memory-check. In particular, this diff:

craigm@craig-linux-pc:~/git/ir-pru-uart-test-2/memory-check$ diff -u pru0-imem-jtag.dat pru0-imem-remoteproc-edited.txt 
--- pru0-imem-jtag.dat	2015-02-19 16:54:51.650676089 +1100
+++ pru0-imem-remoteproc-edited.txt	2015-02-19 16:55:21.370676665 +1100
@@ -62,9 +62,9 @@
 91002780
 E1040200
 100000E0
-81002780
-21001500
-230044C3
-21004300
-10000000
-20C30000
+00000001
+00000000
+00000000
+00000000
+00000000
+00000000

Compare to the file pru0-imem-disassembly-jtag.txt which shows the disassembly. The corrupted memory corresponds to the C line

tx = CT_UART.RBR;
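
The address quoted above can be cross-checked against the diff hunk with a few lines of C. This is a sketch only: it assumes both dump files hold one 32-bit word per line with no header line, so that the hunk's first line (line 62) is word index 61. The names first_mismatch() and mismatch_byte_address() are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the first differing word, or n if the two buffers match. */
size_t first_mismatch(const uint32_t *a, const uint32_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return i;
    }
    return n;
}

/* The nine words of the diff hunk: three context words, then six changed. */
static const uint32_t imem_jtag[9] = {
    0x91002780, 0xE1040200, 0x100000E0,
    0x81002780, 0x21001500, 0x230044C3,
    0x21004300, 0x10000000, 0x20C30000,
};
static const uint32_t imem_rproc[9] = {
    0x91002780, 0xE1040200, 0x100000E0,
    0x00000001, 0x00000000, 0x00000000,
    0x00000000, 0x00000000, 0x00000000,
};

/* Hunk starts at file line 62; assuming one word per line and no header,
 * that is word index 61 (zero-based). */
#define HUNK_FIRST_WORD 61u

/* Byte address in IMEM of the first word that differs. */
uint32_t mismatch_byte_address(void)
{
    size_t idx = first_mismatch(imem_jtag, imem_rproc, 9);
    return (HUNK_FIRST_WORD + (uint32_t)idx) * 4u;
}
```

Under those assumptions the first mismatch is at index 3 within the hunk, which is word 64, or byte address 0x100; the six changed words then run up to 0x118, the end of the program according to the map file.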

0 Craig McQueen over 10 years ago in reply to Craig McQueen

Here is the updated file which I forgot to attach before:

0412.ir-pru-uart-test-2.tar.gz

0 Craig McQueen over 10 years ago in reply to Craig McQueen

Hmm, that address 0x100 is interesting and suspicious. I wonder if that is the resource table, which my map file shows at address 0x100 to 0x134 in data memory, being erroneously loaded into program memory. It's a hypothesis.
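
One way to sanity-check this hypothesis is to see whether the words found at IMEM 0x100 look like the start of a resource table. In the Linux remoteproc headers, struct resource_table begins with a version word (currently 1), an entry count, and two reserved words that must be zero. The sketch below mirrors only that header; the struct and function names are hypothetical, and it assumes the dump shows each word as a native 32-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the fixed header of struct resource_table in the Linux
 * remoteproc headers (the offset[] array that follows is omitted). */
struct rsc_table_header {
    uint32_t ver;          /* table version, currently 1 */
    uint32_t num;          /* number of resource entries */
    uint32_t reserved[2];  /* must be zero */
};

/* Returns 1 if `words` could be the header of an empty, version-1
 * resource table, 0 otherwise. */
int looks_like_empty_rsc_table(const uint32_t *words, size_t count)
{
    struct rsc_table_header h;

    assert(words != NULL);
    if (count < sizeof h / sizeof words[0])
        return 0;
    memcpy(&h, words, sizeof h);
    return h.ver == 1 && h.num == 0 &&
           h.reserved[0] == 0 && h.reserved[1] == 0;
}
```

The six words read back under remoteproc in the earlier diff (00000001 followed by zeros) pass this check, while the six words read over JTAG do not. That is consistent with the hypothesis but does not prove it.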

0 Craig McQueen over 10 years ago in reply to Craig McQueen

In remoteproc_elf_loader.c, the function rproc_elf_find_loaded_rsc_table() calls rproc_da_to_va() with shdr->sh_flags as the flags parameter, but section header flags are not the same as program header flags. The section has its least significant bit, SHF_WRITE, set, but pru_da_to_va() interprets that bit as PF_X and so translates the address into IMEM.

I think that must be part of the problem. Now to try to understand it more and then figure out how to fix it. I'd really appreciate if there's anyone at TI who is willing and able to help with this.
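
The flag collision is easy to reproduce on a host machine, because SHF_WRITE and PF_X both have the value 0x1 in the standard ELF definitions. The sketch below is neither the kernel code nor the patch posted in this thread: region_from_phdr_flags() is a hypothetical stand-in for a translator that expects program header flags, and shdr_to_phdr_flags() is one possible way to convert section flags before calling it.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* The two flag namespaces overlap: bit 0 means "writable" for a section
 * but "executable" for a program header. */
_Static_assert(SHF_WRITE == PF_X, "SHF_WRITE and PF_X share bit 0");

enum pru_mem { PRU_MEM_DATA, PRU_MEM_INSTR };

/* Hypothetical stand-in for a translator that expects p_flags:
 * executable segments go to instruction RAM, everything else to data RAM. */
enum pru_mem region_from_phdr_flags(uint32_t p_flags)
{
    return (p_flags & PF_X) ? PRU_MEM_INSTR : PRU_MEM_DATA;
}

/* One possible conversion from section header flags (sh_flags) to the
 * program header style flags the translator expects. */
uint32_t shdr_to_phdr_flags(uint32_t sh_flags)
{
    uint32_t p_flags = PF_R;

    if (sh_flags & SHF_WRITE)
        p_flags |= PF_W;
    if (sh_flags & SHF_EXECINSTR)
        p_flags |= PF_X;
    return p_flags;
}
```

Passing a writable data section's flags (SHF_ALLOC | SHF_WRITE) straight to region_from_phdr_flags() selects instruction RAM, which is the misbehaviour described above; converting with shdr_to_phdr_flags() first selects data RAM.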

0 Craig McQueen over 10 years ago in reply to Craig McQueen

The code was added as follows:

commit 5e834aa9ce55e3ae1473a39868fca5420b7bedea
Author: Suman Anna <s-anna@ti.com>
Date:   Mon May 19 17:21:12 2014 -0500

    remoteproc/core: add a rproc ops for performing address translation
    
    The rproc_da_to_va API is currently used to perform any device to
    kernel address translations to meet the different needs of the remoteproc
    core/drivers (eg: loading). The functionality is achieved within the
    remoteproc core, and is limited only for carveouts allocated within the
    core or to internal memories published through the resource table.
    
    A new rproc ops, da_to_va, is added to provide flexibility to platform
    implementations to perform the address translation themselves when the
    above conditions cannot be met by the implementations. The rproc_da_to_va()
    is expanded to take in an additional flags field and invoke this ops if
    present, and fallback to regular processing if the platform implementation
    cannot provide the translation.
    
    Signed-off-by: Suman Anna <s-anna@ti.com>

It appears that this has a bug as previously described.

0 Craig McQueen over 10 years ago in reply to Craig McQueen

How about this? It is a patch file to be applied to git branch ti-linux-3.14.y, commit a7f4f71b32.

0001-remoteproc-Fix-loading-of-PRU-code-resource-table.patch.tar.gz

0 Suman Anna over 10 years ago in reply to Craig McQueen

Hi Craig,

You have found a valid bug, thanks a lot. The loading of the resource table is correct, but as you mentioned, when it tries to find the cached resource table, it gets an IRAM-equivalent address instead of the DRAM-equivalent address because of the difference in flags between program headers and section headers. I have taken a quick glance at your patch, and it seems to do the right things to get the proper translation. I will have to rework the patch to generalize it and make it apply to my feature tree; I will share the patch(es) once I am done.

regards
Suman

0 Suman Anna over 10 years ago in reply to Suman Anna

Craig,

I am going to push the following patchset for the address translation; let me know if you have any questions on the series. This will show up on ti-linux-3.14.y sometime next week.

pruss-rsc-table-fix.tar.gz

regards

Suman

0 Craig McQueen over 10 years ago in reply to Suman Anna

Thanks, they look fine to me after a quick review. (Though I'm not familiar with "carveouts" as mentioned in the doc changes.)

By the way, is the pruss_remoteproc driver available in a kernel after 3.14?

0 Suman Anna over 10 years ago in reply to Craig McQueen

Carveouts are what the traditional DSP or IPU processors use for loading and executing code in DDR. Essentially, memory is allocated from the kernel and mapped into the respective processors' MMUs at the specific addresses where those processors load their code/data.

There is no particular branch hosted on a newer kernel where the driver is available; the next branch that I will host will be on the 2015 LTS kernel. Depending on the kernel of your choice, you should be able to cherry-pick the remoteproc patches from my 3.14 remoteproc feature tree.

http://git.ti.com/gitweb/?p=rpmsg/remoteproc.git;a=shortlog;h=refs/heads/rproc-linux-3.14.y

The PRU-related commits start from ed052fb60b9a03fea110cb253ffeed5c6f198a34 onwards, and there shouldn't be many. Depending on the kernel you want, you may also have to pick some additional mailbox framework patches (the framework that 3.14 uses is available from 3.18 onwards).
