clock for PRU external bus interface

Ryan Bishop

I am looking at using the PRU to drive some parallel video data out of the R30 register. I was originally going to use the LCDC port to do this, but it turns out we need 20 bits of data rather than 16. My question is: does anyone know of a way to create a clock and/or sync signals for PRU data other than bit-banging them on the GPIO lines?

over 15 years ago

0 RandyP over 15 years ago

TI__Guru* 84110 points

One method that I can suggest would be to use the asynchronous 16-bit EMIFA interface and use the 4 least-significant address lines for data. The EMIFA interface will generate a write strobe for each write operation, and the PRU could apply the top 4 bits of the intended data to make bits 4:1 of the address.

For example, consider the memory address 0x60000000 to be the base address of your output video port. Then use the following pseudo code to write the 20-bit data from 32-bit words from a buffer in L2 at, say 0x11801000:

unsigned int *pInData = (unsigned int *)0x11801000; /* or pInData = BufferLoc */
volatile unsigned short *pOutData = (volatile unsigned short *)0x60000000;
unsigned int i, j, k;

for ( i = 0; i < nBufferLengthInWords; i++ )
{
    // read 32-bit word from buffer into variable j
    j = pInData[i];

    // shift out lower half-word of j and mask to keep the 4 remaining bits (19:16)
    k = (j >> 16) & 0xf;

    // write lower 16 bits on data bus, bits 19:16 of data as EMA_A[3:0]
    // (k is a half-word index, so it lands on byte-address bits 4:1)
    pOutData[k] = j & 0xffff;
}

The write rate should be EMA_CLK/3 if the async timing fields SETUP, STROBE, and HOLD are all set to 0, which gives timing values of 1, 1, 1 cycles.
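To make that arithmetic concrete, here is a minimal sketch of the calculation. It assumes, as stated above, that each timing field encodes (cycles - 1), and it ignores turnaround time and bus arbitration, so the result is a best case. The function names are hypothetical and not part of any TI library.

```c
#include <assert.h>
#include <stdint.h>

/* EMA_CLK cycles per asynchronous write. Each field holds (cycles - 1),
 * so field values 0,0,0 give 1 + 1 + 1 = 3 cycles. */
static uint32_t emifa_write_cycles(uint32_t w_setup, uint32_t w_strobe,
                                   uint32_t w_hold)
{
    return (w_setup + 1u) + (w_strobe + 1u) + (w_hold + 1u);
}

/* Best-case write (pixel) rate in Hz for a given EMA_CLK frequency. */
static uint32_t emifa_write_rate_hz(uint32_t ema_clk_hz, uint32_t w_setup,
                                    uint32_t w_strobe, uint32_t w_hold)
{
    assert(ema_clk_hz > 0u);
    return ema_clk_hz / emifa_write_cycles(w_setup, w_strobe, w_hold);
}
```

For example, with an assumed EMA_CLK of 100 MHz, emifa_write_rate_hz(100000000u, 0, 0, 0) gives roughly 33.3 million writes per second.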

Let us know if that is worth a try for you, and whether it works.


0 Ryan Bishop over 15 years ago in reply to RandyP

Intellectual 415 points

Randy,

I think I understand how this works, and the write strobe would make a good pixel clock. However I'm still not sure how this solution provides the LineValid and FrameValid or Hsync and Vsync signals. I am guessing I would still need to use the PRU or GPIO to generate those signals or add external logic. Also I assume when you said:

RandyP said:
and the PRU could apply the top 4 bits of the intended data to make bits 5:1 of the address.

you actually meant bits 4:1 of the address?

0 RandyP over 15 years ago in reply to Ryan Bishop

TI__Guru* 84110 points

Thanks for catching my error. I edited the posting so nobody will have to read 3 posts to find the right answer.

Ryan Bishop said:
However I'm still not sure how this solution provides the LineValid and FrameValid or Hsync and Vsync signals.

Sorry, I was just trying to solve the 20-bit data problem.

You can certainly turn more address lines into signals, but that might not be safe for signals that are not clocked at the destination by the "pixel clock". Logic levels on the address lines tend to stay at their last driven level, but this is not a clearly defined condition or situation. GPIOs might be safest for these signals, but you could try it out either way to see how it works.
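As a rough sketch of what turning more address lines into signals could look like, the half-word index can carry sync flags in the two bits above the four data bits. The flag names and the choice of index bits 5:4 are assumptions for illustration only, and the caveat about undefined address-line levels between writes still applies.

```c
#include <assert.h>
#include <stdint.h>

#define SYNC_HSYNC 0x1u /* hypothetical: first address line above the data bits */
#define SYNC_VSYNC 0x2u /* hypothetical: next address line up */

/* Half-word index into the 16-bit EMIFA window:
 * index bits 3:0 = pixel bits 19:16, index bits 5:4 = sync flags. */
static uint32_t emifa_index(uint32_t pixel20, uint32_t sync_flags)
{
    assert((sync_flags & ~0x3u) == 0u);
    return (sync_flags << 4) | ((pixel20 >> 16) & 0xFu);
}

/* The low 16 bits of the pixel go out on the data bus. */
static uint16_t emifa_data(uint32_t pixel20)
{
    return (uint16_t)(pixel20 & 0xFFFFu);
}
```

A write would then look like pOutData[emifa_index(px, SYNC_HSYNC)] = emifa_data(px); so the sync levels change on the same write strobe as the pixel data.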

One caution is that the same EMIFA bus will be available for communicating with other devices if you have them on this bus. Those reads and writes would be interleaved with these video writes, so this could affect the consistency of the video clock - stalls could be inserted.

If other devices can be present, then you will also have to qualify the write strobe/pixel clock with the CE2 or whichever chip select you choose.

Ryan Bishop said:
I am guessing I would still need to use the PRU or GPIO to generate those signals or add external logic.

Each PRU is basically another microprocessor inside the chip. So the PRU can be programmed to generate signals using the EMIFA bus and/or GPIO pins. Your application sounds like a perfect situation to use a PRU; this is a dedicated purpose that is well defined and does not require a lot of DSP-type instructions. PRU transfers can be bursty, so make sure everything works the way you want it to. Maybe the two PRUs could be used like ping-pong writers to keep the EMIFA running [more] constantly rather than being bursty.

0 Ryan Bishop over 15 years ago in reply to RandyP

Intellectual 415 points

Randy,

Thanks for your ideas and the feedback. I am still wondering whether, for my application, it would be easier not to use the EMIFA at all. Currently we have our EMIFA hooked up to FPGAs and flash memories, so the interleaving problem you mentioned could be an issue.

We are actually bringing our video data through the UPP, processing it and buffering it in DDR memory. From my understanding, each PRU R30 register is a 32-bit register directly mapped to what are essentially GPO pins, is that correct? Also does the PRU have DMA to the DDR? If so, my idea was just to set a semaphore in the DSP code that alerts the PRU when a processed frame is ready. Then the PRU would just take that frame and shove it directly out to our display device.

Am I missing something with regard to the function of the PRU that would make this not work?

Thanks,

Ryan

0 RandyP over 15 years ago in reply to Ryan Bishop

TI__Guru* 84110 points

Ryan Bishop said:
From my understanding, each PRU R30 register is a 32-bit register directly mapped to what are essentially GPO pins, is that correct?

Yes, see the datasheet to find which pins have the PRUn_R30[m] connections.

Ryan Bishop said:
does the PRU have DMA to the DDR?

The PRU is a bus master, but it does not have a SRC/DST/CNT functionality. This can be implemented in a PRU program, though, by reading from a source and writing to a destination, and the PRU can read from and write to the DDR memory space.
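A minimal sketch of that idea in plain C, written so it can be tried on a host: one function does a software SRC/DST/CNT copy, and another polls a frame-ready flag and streams 20-bit pixels to an output location. All names here are hypothetical. In particular, out_reg is only a memory stand-in for R30, which on the real PRU is a core register rather than a memory-mapped address, and a real PRU program would typically be written in PRU assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software equivalent of a SRC/DST/CNT transfer: each word is read into a
 * local (a register on the PRU) and then written to the destination. */
static void sw_copy_words(volatile uint32_t *dst, const volatile uint32_t *src,
                          size_t count)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++) {
        uint32_t word = src[i]; /* read from source */
        dst[i] = word;          /* write to destination */
    }
}

/* If the DSP has set the frame-ready flag, push every pixel (masked to
 * 20 bits) to the output location, clear the flag, and return the number
 * of pixels sent. Returns 0 if no frame was ready. */
static size_t send_frame_if_ready(volatile uint32_t *frame_ready,
                                  const volatile uint32_t *frame,
                                  size_t pixels,
                                  volatile uint32_t *out_reg)
{
    assert(frame_ready != NULL && frame != NULL && out_reg != NULL);
    if (*frame_ready == 0u) {
        return 0;
    }
    for (size_t i = 0; i < pixels; i++) {
        *out_reg = frame[i] & 0xFFFFFu;
    }
    *frame_ready = 0u;
    return pixels;
}
```

The flag here is a plain shared word that the DSP sets and the PRU clears; whether that is sufficient without further synchronization depends on the actual system.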

The only caution I would have is that the PRU memory accesses tend to be bursty rather than being long continuous streams of reads or writes. This fits very well with the DSP architecture for maximizing the available internal and external bus bandwidth, but can affect some peripherals if they are expecting continuous streams.

0 Ryan Bishop over 15 years ago in reply to RandyP

Intellectual 415 points

Randy,

Could you tell me a little bit more about what you mean by "bursty"? What is the cause of the burstiness? How long are the bursts typically and how often do they occur?

Thanks

Ryan

0 RandyP over 15 years ago in reply to Ryan Bishop

TI__Guru* 84110 points

The EDMA3 architecture is heavily optimized to move a lot of data on a lot of buses and to keep those buses as busy as possible. Each PRU in the PRUSS is a register-based CPU that allows very customized operations to take place outside of the main DSP/CPU, and there are two PRUs on the chip.

The DSP can read and write memory, but it is not as efficient as the EDMA3 is. The PRUSS can also read and write memory, and it has limitations similar to those of the DSP core, being a register-based architecture where a source must be read into a register and then the register is written to the destination.

There are some instructions in the PRUSS that can accelerate memory operations by reading or writing multiple locations with one instruction, but these are still single instructions that have to execute reads or execute writes. The EDMA3 has multiple access buses and multiple FIFOs that allow it to read and write simultaneously when conditions permit, and the EDMA3 can have multiple Transfer Controllers that allow multiple reads and multiple writes to take place simultaneously.

When you do a bunch of reads, then a bunch of writes, it will look like a burst of reads or a burst of writes on whichever bus you are monitoring. This is what I meant by burstiness.
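The pattern described above can be illustrated with a small host-side sketch that records the order of bus accesses for a register-based copy: 'R' for each read and 'W' for each write. The function name and the burst parameter (how many words are held in registers per pass) are hypothetical, and this models only the ordering, not real bus timing.

```c
#include <assert.h>
#include <stddef.h>

/* Fill trace with the access order of copying `words` words when `burst`
 * words are read into registers before being written out. Returns the
 * number of accesses recorded; trace is always NUL-terminated. */
static size_t trace_copy_pattern(size_t words, size_t burst,
                                 char *trace, size_t trace_len)
{
    assert(trace != NULL && trace_len > 0 && burst > 0);
    size_t n = 0;
    for (size_t done = 0; done < words; done += burst) {
        size_t chunk = (words - done < burst) ? (words - done) : burst;
        for (size_t i = 0; i < chunk && n + 1 < trace_len; i++) {
            trace[n++] = 'R'; /* burst of reads */
        }
        for (size_t i = 0; i < chunk && n + 1 < trace_len; i++) {
            trace[n++] = 'W'; /* followed by a burst of writes */
        }
    }
    trace[n] = '\0';
    return n;
}
```

With a burst of 4, the destination bus sees groups of four writes separated by gaps while the reads happen, which is the kind of unevenness a display expecting a steady pixel clock might notice.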

I have not had a chance to build up a good example, but it seems like there should be ways to get a lot of performance out of each PRU, and possibly double that by using both of them at the same time.

For reference here, you can find a lot of good information on the PRU on the TI Wiki Pages, and in particular in the PRU category.
