Why is my PRU UART code halting?



I'm working at getting some PRU code running to drive the PRU-ICSS UART0 on the BeagleBone Black.

I've got some simple test code running that can TX "Hello" periodically and also loop back any RX bytes to TX. However, I'm finding that many seemingly trivial changes to the code cause it to crash: the PC gets stuck at 0x148 (according to /sys/kernel/debug/remoteproc/remoteproc1/regs), which is a location past the end of my program code.

See the attached code, in particular the TODO comment in ir-pru-uart-test.c. I would really appreciate it if someone could check over my code and suggest why it runs properly when the variant without the 'tx' variable is used, but crashes when the variant using the 'tx' variable is used.

ir-pru-uart-test-2.tar.gz

For building the PRU code, I'm using the TI PRU compiler 2.1.0 under CCS 6.0.1.

For building Linux, I'm using the meta-ti layer and the linux-ti-staging 3.14.31 kernel in Yocto.

I'm loading the code using the pruss_remoteproc driver, since that seems the way of the future. That is, I copy the build output ir-pru-uart-test-2.out to the target /lib/firmware/rproc-pru0-fw. Then I start the pruss_remoteproc driver with the following commands:

insmod /lib/modules/3.14.31/kernel/drivers/rpmsg/virtio_rpmsg_bus.ko
insmod /lib/modules/3.14.31/kernel/drivers/remoteproc/pruss_remoteproc.ko

I have to say that the documentation for the pruss_remoteproc driver is sparse, so there are a few things I'm not sure about, such as the "Resource Table". I'm also not sure whether program execution has to start at 0, which I'm doing by putting the following line in the linker command file:

.text:_c_int00*    >  0x0, PAGE 0

  • I should also mention that I am not able to debug the PRU code at the moment. I'm curious about the presence of the file /sys/kernel/debug/remoteproc/remoteproc1/single_step and I wonder if remoteproc could enable any debugging possibilities (I'm totally new to remoteproc).

    I hope to get an XDS100 JTAG adapter soon, and then use it to step through the PRU code directly. But I haven't got that yet.

  • The other detail I should have mentioned before is this: when the code variant with the 'tx' variable is used, the code runs fine until the UART receives a byte, at which point it crashes (it doesn't echo the byte, and the PC ends up at 0x148). With the code variant that doesn't use the 'tx' variable, the code is able to echo the RX byte to TX and continue running fine.

  • Hi Craig,

    Is this a soft UART? You can check this wiki - there is an example with up to 6 soft UARTs on the PRU: http://processors.wiki.ti.com/index.php/Soft-UART_Implementation_on_AM335X_PRU_-_Software_Users_Guide

  • No, my question has nothing to do with soft UARTs. It has more to do with execution flow, perhaps code loading.

  • Sorry for misunderstanding your question Craig. My fault - didn't read carefully enough. I will forward this to a PRU expert.
  • Hi Craig,

    Yes, there are some debug features enabled by the remoteproc driver. These features are located in the /sys/kernel/debug/remoteproc/remoteproc<0/1> directory.

    The 'single_step' file uses the single-step execution mode of the PRU cores. Writing a non-zero value performs a single step, and writing zero restores the PRU to the mode it was in before the first single step. (Note: if the PRU is halted because of a halt instruction, then no change occurs.)

    For example, you can perform a single-step by echoing 1 into single_step, and then you can read the PRU registers. So, typically, I would use it as:

    echo 1 > /sys/kernel/debug/remoteproc/remoteproc1/single_step; cat /sys/kernel/debug/remoteproc/remoteproc1/regs

    and repeat this to execute the PRU code one instruction after another.

    Echoing 0 will return the PRU to the state it was in before the first echo 1 (it will run continuously if it was running before; it is a no-op if it was halted before).

    If you want to start stepping from the beginning of the program, you can write 0x0 to the PRU_CTRL register (i.e. address 0x4A322000 for PRU0) before echoing 1 into single_step.
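
    As a rough illustration of why writing 0x0 restarts execution from the beginning, here is a minimal sketch that decodes a PRU CONTROL register value. The bit positions are assumptions taken from my reading of the AM335x TRM and should be checked against your TRM revision; `pru_ctrl_view` and `pru_ctrl_decode` are hypothetical names, not part of any TI header or of the remoteproc driver.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed bit layout of the PRU CONTROL register (verify against the TRM). */
    #define PRU_CTRL_SOFT_RST_N   (1u << 0)   /* writing 0 here requests a reset  */
    #define PRU_CTRL_ENABLE       (1u << 1)   /* 1 = core enabled                 */
    #define PRU_CTRL_SINGLE_STEP  (1u << 8)   /* 1 = execute one instruction only */
    #define PRU_CTRL_PC_RST_SHIFT 16          /* upper half: PC value after reset */

    /* Hypothetical helper type: a decoded view of a CONTROL register value. */
    struct pru_ctrl_view {
        bool     reset_requested;
        bool     enabled;
        bool     single_step;
        uint16_t pc_reset_value;
    };

    /* Split a raw 32-bit CONTROL value into the fields defined above. */
    struct pru_ctrl_view pru_ctrl_decode(uint32_t value)
    {
        struct pru_ctrl_view v;

        v.reset_requested = (value & PRU_CTRL_SOFT_RST_N) == 0;
        v.enabled         = (value & PRU_CTRL_ENABLE) != 0;
        v.single_step     = (value & PRU_CTRL_SINGLE_STEP) != 0;
        v.pc_reset_value  = (uint16_t)(value >> PRU_CTRL_PC_RST_SHIFT);
        return v;
    }
    ```

    Under these assumptions, decoding 0x0 gives a reset request with the core disabled and a PC reset value of 0, which is consistent with the next single step starting at the first instruction.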

    Regards,

    Melissa 

  • Thanks for explaining single-stepping via remoteproc. I will try it. Is it possible to see PRU registers and memory when single-stepping this way?

    I got an XDS100 v2 JTAG, and today I was able to run the attached code on the PRU using it. I found that the code runs fine via the JTAG. That is significant—it indicates that my code is okay, but there is some issue with running it in the Linux remoteproc environment.

    At this point, I have two hypotheses:

    1) The remoteproc driver isn't loading the code into the PRU properly.

    This seems possible but unlikely. I'm not sure how to verify it.

    2) There is some PRU register configuration that is different when running under Linux and remoteproc.

    Does my code need to set some other registers to configure the PRU, e.g. to set up the PRU to access shared data memory, or something else when running under remoteproc? E.g. PRU-ICSS CTRL register.

    By the way, is it possible to do JTAG debugging of a PRU program when it's running under Linux remoteproc?

    Any advice to resolve this would be greatly appreciated.
  • Hi Craig,

    Can you try replacing your resource table file with resource_table_empty.h?   The resource table is the main difference I can think of between the CCS loader and pruss_remoteproc.   

    As for JTAG debugging, I am investigating if this is possible while running Linux remoteproc.  

    Regards,

    Melissa 

  • I actually already tried that (I found it in e2e.ti.com/.../368069). It doesn't improve the situation.

    I've tried JTAG debugging of PRU while Linux is running, but it gave me an error about not being able to load.

    By the way, it's not easy to find how to debug the PRU with JTAG. This is the crucial info I found: Lab 1 of processors.wiki.ti.com/.../PRU_Training:_Hands-on_Labs
  • I've investigated, and results are pointing to my hypothesis (1): the remoteproc driver isn't loading the code into the PRU properly.

    I've used the devmem2 tool under Linux to halt the PRU0 and read its IMEM. According to the map file, the first 0x118 bytes of IMEM are used by my program.

    I compared the IMEM read under Linux to the IMEM read when doing straight JTAG debugging. It looks as though the IMEM under Linux has not correctly loaded the last 6 words, from address 0x100 onwards.

    Please see the attached file, sub-directory memory-check. In particular, this diff:

    craigm@craig-linux-pc:~/git/ir-pru-uart-test-2/memory-check$ diff -u pru0-imem-jtag.dat pru0-imem-remoteproc-edited.txt 
    --- pru0-imem-jtag.dat	2015-02-19 16:54:51.650676089 +1100
    +++ pru0-imem-remoteproc-edited.txt	2015-02-19 16:55:21.370676665 +1100
    @@ -62,9 +62,9 @@
     91002780
     E1040200
     100000E0
    -81002780
    -21001500
    -230044C3
    -21004300
    -10000000
    -20C30000
    +00000001
    +00000000
    +00000000
    +00000000
    +00000000
    +00000000

    Compare to the file pru0-imem-disassembly-jtag.txt which shows the disassembly. The corrupted memory corresponds to the C line

    tx = CT_UART.RBR;
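
    The comparison above can be reproduced with a few lines of C. This is only a sketch: it assumes each dump file holds one 32-bit word per line starting at IMEM address 0, so the diff hunk starting at line 62 corresponds to word index 61. The helper names (`first_mismatch`, `word_to_byte_addr`) and the array names are made up for illustration; the word values are copied from the diff.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Word index of the first line in the diff hunk (line 62, counted from 1). */
    #define TAIL_FIRST_WORD 61u

    /* Last nine words of IMEM as read over JTAG (from the diff above). */
    const uint32_t jtag_tail[9] = {
        0x91002780u, 0xE1040200u, 0x100000E0u,
        0x81002780u, 0x21001500u, 0x230044C3u,
        0x21004300u, 0x10000000u, 0x20C30000u,
    };

    /* The same nine words as read under Linux after remoteproc loaded the code. */
    const uint32_t remoteproc_tail[9] = {
        0x91002780u, 0xE1040200u, 0x100000E0u,
        0x00000001u, 0x00000000u, 0x00000000u,
        0x00000000u, 0x00000000u, 0x00000000u,
    };

    /* Return the index of the first differing word, or n if the dumps match. */
    size_t first_mismatch(const uint32_t *a, const uint32_t *b, size_t n)
    {
        size_t i;

        for (i = 0; i < n; i++) {
            if (a[i] != b[i])
                return i;
        }
        return n;
    }

    /* Convert a 32-bit word index into a byte address within IMEM. */
    uint32_t word_to_byte_addr(size_t word_index)
    {
        return (uint32_t)(word_index * 4u);
    }
    ```

    With these inputs the first mismatch is at index 3 of the tail, i.e. word 64 of IMEM, which is byte address 0x100.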

  • Here is the updated file which I forgot to attach before:

    0412.ir-pru-uart-test-2.tar.gz

  • Hmm, that address 0x100 is interesting and suspicious. I wonder if that is the resource table, which my map file shows at address 0x100 to 0x134 in data memory, being erroneously loaded into program memory. It's a hypothesis.
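
    A quick arithmetic check of this hypothesis, as a sketch: if the data-memory range 0x100 to 0x134 were applied to program memory, it would overlap the program (0x0 to 0x118) by exactly the number of words that appear corrupted. `overlap_bytes` is a hypothetical helper, and the ranges are treated as half-open, which is an assumption about how the map file reports them.

    ```c
    #include <stdint.h>

    /* Bytes shared by the half-open ranges [a_start, a_end) and [b_start, b_end). */
    uint32_t overlap_bytes(uint32_t a_start, uint32_t a_end,
                           uint32_t b_start, uint32_t b_end)
    {
        uint32_t lo = a_start > b_start ? a_start : b_start;
        uint32_t hi = a_end < b_end ? a_end : b_end;

        return hi > lo ? hi - lo : 0u;
    }
    ```

    Calling overlap_bytes(0x0, 0x118, 0x100, 0x134) gives 0x18 bytes, which is 6 words, matching the 6 corrupted words in the diff. The first corrupted word, 00000001, would also fit a resource table whose first field is a version number of 1, though that is an inference rather than something verified here.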
  • In remoteproc_elf_loader.c, the function rproc_elf_find_loaded_rsc_table() calls rproc_da_to_va() with shdr->sh_flags as the flags parameter, but section header flags are not the same as program header flags. The section has SHF_WRITE set, which is the least significant bit (0x1), but pru_da_to_va() interprets that bit as PF_X and therefore translates the address into IMEM.

    I think that must be part of the problem. Now to try to understand it more and then figure out how to fix it. I'd really appreciate if there's anyone at TI who is willing and able to help with this.
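
    To make the collision concrete, here is a minimal sketch. The flag values are the ones defined by the ELF specification (renamed here so they don't clash with <elf.h>); `toy_pick_memory` is a hypothetical stand-in for a translation hook that expects program header flags, not the actual pru_da_to_va() code.

    ```c
    #include <stdint.h>

    /* Values from the ELF specification. */
    #define ELF_SHF_WRITE 0x1u /* section header flag: section is writable   */
    #define ELF_PF_X      0x1u /* program header flag: segment is executable */
    #define ELF_PF_W      0x2u /* program header flag: segment is writable   */
    #define ELF_PF_R      0x4u /* program header flag: segment is readable   */

    enum toy_pru_mem { TOY_PRU_DRAM, TOY_PRU_IRAM };

    /* Hypothetical hook: picks instruction RAM when the "executable" bit is set. */
    enum toy_pru_mem toy_pick_memory(uint32_t phdr_flags)
    {
        return (phdr_flags & ELF_PF_X) ? TOY_PRU_IRAM : TOY_PRU_DRAM;
    }
    ```

    Passing section flags to this hook, e.g. toy_pick_memory(ELF_SHF_WRITE), selects instruction RAM because SHF_WRITE and PF_X share the value 0x1, whereas the program header flags of a read/write data segment (ELF_PF_R | ELF_PF_W) select data RAM as intended.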

  • The code was added as follows:

    commit 5e834aa9ce55e3ae1473a39868fca5420b7bedea
    Author: Suman Anna <s-anna@ti.com>
    Date:   Mon May 19 17:21:12 2014 -0500
    
        remoteproc/core: add a rproc ops for performing address translation
        
        The rproc_da_to_va API is currently used to perform any device to
        kernel address translations to meet the different needs of the remoteproc
        core/drivers (eg: loading). The functionality is achieved within the
        remoteproc core, and is limited only for carveouts allocated within the
        core or to internal memories published through the resource table.
        
        A new rproc ops, da_to_va, is added to provide flexibility to platform
        implementations to perform the address translation themselves when the
        above conditions cannot be met by the implementations. The rproc_da_to_va()
        is expanded to take in an additional flags field and invoke this ops if
        present, and fallback to regular processing if the platform implementation
        cannot provide the translation.
        
        Signed-off-by: Suman Anna <s-anna@ti.com>

    It appears that this has a bug as previously described.

  • How about this? It is a patch file to be applied to git branch ti-linux-3.14.y, commit a7f4f71b32.


    0001-remoteproc-Fix-loading-of-PRU-code-resource-table.patch.tar.gz

  • Hi Craig,

    You have found a valid bug, thanks a lot. The loading of the resource table is correct but, as you mentioned, when the code tries to find the cached resource table it gets an IRAM-equivalent address instead of the DRAM-equivalent address, due to the difference in the flags between program headers and section headers. I have taken a quick glance at your patch, and it seems to do the right things to get the proper translation. I will have to rework the patch to generalize it and have it apply to my feature tree; I will share the patch(es) once I am done.

    regards
    Suman
  • Craig,

    I am gonna push the following patchset for the address translation; let me know if you have any questions on the series. This will show up on ti-linux-3.14.y sometime next week.

    pruss-rsc-table-fix.tar.gz


    regards

    Suman

  • Thanks, they look fine to me after a quick review. (Though I'm not familiar with "carveouts" as mentioned in the doc changes.)

    By the way, is pruss_remoteproc driver available in a kernel after 3.14?
  • The carveouts are what the traditional DSP or IPU processors use for loading and executing code in DDR. Essentially, memory is allocated from the kernel and mapped into the respective processors' MMUs at the specific addresses where those processors load the code/data.

    There is no particular branch hosted on a newer kernel where the driver is available; the next branch that I will host will be on the 2015 LTS kernel. Depending on the kernel of your choice, you should be able to cherry-pick the remoteproc patches from my 3.14 remoteproc feature tree.

    http://git.ti.com/gitweb/?p=rpmsg/remoteproc.git;a=shortlog;h=refs/heads/rproc-linux-3.14.y

    The PRU-related commits should start from ed052fb60b9a03fea110cb253ffeed5c6f198a34 onwards; there shouldn't be many. Depending on the kernel you want, you may also have to pick some additional mailbox framework patches (the framework that 3.14 uses is available from 3.18 onwards).