Multicore Navigator Tips ‘n Tricks, pt. 1


Our Multicore Mix blog site contest earlier this year solicited input on what content readers would like to see from our posts. In response to a request for a more tutorial-oriented blog, I am providing a multi-part series on our Multicore Navigator.

Multicore Navigator is the primary data mover in all TI Keystone-based devices.  It can be thought of as the “Swiss Army knife” of the device because it can be used for functions other than just data movement.  Understanding Navigator’s functionality and how to make use of it is critical to the efficient use of our Keystone devices.

In this first post, I will briefly highlight the hardware capabilities of Multicore Navigator, then talk about using it for synchronization of cores or processing tasks.  My next post will discuss using Navigator for notification purposes, and the third will discuss messaging, the biggest blade in Navigator’s Swiss Army knife arsenal.

Hardware capabilities of TI’s Multicore Navigator:

  • A hardware Queue Manager, supporting 8192 or 16384 queues:
    • Data items (packets) can be added to a queue by “pushing” their addresses to the queue.
    • Data items can be removed from a queue by “popping” the queue.
  • Multiple Packet-DMAs (a specialized type of DMA, or “direct memory access” – a hardware engine that copies data from point A to point B without CPU intervention).
    • Packet-DMAs (pktDMA) embedded in several peripherals, which can be chained to automatically send packets from one peripheral to the next.
    • One or two pktDMAs dedicated for core-to-core data transfers.
    • Packets pushed to special queues automatically trigger pktDMA transfers.
  • Embedded processors (PDSPs) for running Navigator-specific firmware, which perform additional processing without CPU intervention.
  • Several modes of generating interrupts.
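
To make the push/pop model from the Queue Manager bullets concrete, here is a minimal software sketch in Python. It is only a toy stand-in for a hardware queue, not TI's API: the `HwQueue` class, its method names, and the descriptor addresses are all made up for illustration, and returning `None` from an empty queue is a modelling choice of this sketch.

```python
from collections import deque


class HwQueue:
    """Toy stand-in for one hardware queue: a FIFO of descriptor addresses."""

    def __init__(self):
        self._entries = deque()

    def push(self, desc_addr):
        # Pushing adds the descriptor's address to the tail of the queue.
        self._entries.append(desc_addr)

    def pop(self):
        # Popping removes the head entry; an empty queue is modelled as None.
        return self._entries.popleft() if self._entries else None

    def count(self):
        # Number of descriptors currently waiting in the queue.
        return len(self._entries)


free_q = HwQueue()
for addr in (0x1000, 0x1040, 0x1080):  # hypothetical descriptor addresses
    free_q.push(addr)

first = free_q.pop()        # entries come out in the order they went in
remaining = free_q.count()  # two descriptors are still queued
```

The point of the sketch is that only addresses move through the queue; the packet data itself stays where it is in memory.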

Using Navigator for Synchronization

Synchronization, a type of InterProcessor Communication (IPC), is the first tool in the Swiss Army knife. It generally refers to bringing multiple tasks, cores or even devices to a known point in their programs at the same time. This sync point is sometimes called a sync barrier. There are many reasons why a program might do this – one reason would be to force slave cores to wait until the master core is finished with something.  In this case, the slaves reach the sync barrier and cannot continue until the master is ready (in practice, you can’t guarantee which cores will arrive at the sync barrier first).

The simplest method is shown here:

  

In this case, two or more general purpose queues are used. One is used as a free queue and the others are sync queues (there may be more than one sync queue). The master of the sync barrier pops from the free queue and pushes to the sync queue(s). The slaves wait for something to arrive in their sync queue (either by interrupt or polling). When it does, the slaves pop their sync queue, push back to the free queue, and resume running.
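
The simple barrier described above can be sketched in software. This is a simulation only: Python threads stand in for cores, `queue.Queue` objects stand in for hardware queues, and a blocking `get()` stands in for waiting by interrupt or polling. All names (`free_q`, `sync_qs`, `NUM_SLAVES`, the `desc` strings) are assumptions for illustration, and giving each slave its own sync queue is one of the arrangements the text allows.

```python
import queue
import threading

NUM_SLAVES = 3  # hypothetical number of slave cores

free_q = queue.Queue()                                 # free queue of descriptors
sync_qs = [queue.Queue() for _ in range(NUM_SLAVES)]   # one sync queue per slave
for i in range(NUM_SLAVES):
    free_q.put(f"desc{i}")                             # placeholder descriptors

resumed = []
resumed_lock = threading.Lock()


def slave(core_id):
    desc = sync_qs[core_id].get()   # wait at the barrier until something arrives
    free_q.put(desc)                # recycle the descriptor to the free queue
    with resumed_lock:
        resumed.append(core_id)     # the slave resumes running here


def master():
    # When the master is ready, it releases each slave in turn.
    for q in sync_qs:
        desc = free_q.get_nowait()  # pop from the free queue
        q.put(desc)                 # push to that slave's sync queue


threads = [threading.Thread(target=slave, args=(i,)) for i in range(NUM_SLAVES)]
for t in threads:
    t.start()
master()
for t in threads:
    t.join()
```

After the barrier, every descriptor is back in the free queue, so the same barrier can be reused without allocating anything new.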

In other cases, the master needs confirmation that all slaves have arrived before the slaves are allowed to continue. This type of sync barrier is a bit more complex. Here, the slaves would push to a sync queue for the master, then wait for a slave sync queue. The master would wait for the master sync queue, and only when all the slaves have “checked in” would it push to the slave sync queues to release the slaves.
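
A sketch of this check-in barrier, again as a software simulation with threads in place of cores and `queue.Queue` in place of hardware queues. The names are hypothetical, and one detail is an assumption not stated in the text: each slave takes its check-in descriptor from a free queue, and the master reuses those same descriptors to release the slaves.

```python
import queue
import threading

NUM_SLAVES = 3  # hypothetical number of slave cores

free_q = queue.Queue()
master_sync_q = queue.Queue()                                # slaves check in here
slave_sync_qs = [queue.Queue() for _ in range(NUM_SLAVES)]   # master releases here
for i in range(NUM_SLAVES):
    free_q.put(f"desc{i}")                                   # placeholder descriptors

events = []
events_lock = threading.Lock()


def log(kind, core_id):
    with events_lock:
        events.append((kind, core_id))


def slave(core_id):
    log("arrive", core_id)
    master_sync_q.put(free_q.get())       # check in: push to the master's sync queue
    desc = slave_sync_qs[core_id].get()   # wait on this slave's own sync queue
    free_q.put(desc)                      # recycle the descriptor
    log("resume", core_id)


def master():
    # Wait until every slave has checked in before releasing any of them.
    checked_in = [master_sync_q.get() for _ in range(NUM_SLAVES)]
    for q, desc in zip(slave_sync_qs, checked_in):
        q.put(desc)


threads = [threading.Thread(target=slave, args=(i,)) for i in range(NUM_SLAVES)]
for t in threads:
    t.start()
master()
for t in threads:
    t.join()

kinds = [kind for kind, _ in events]
```

In the recorded event list, every "arrive" entry precedes every "resume" entry, which is the property that distinguishes this barrier from the simpler one.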

Those are just two examples. There are many types of sync barriers, and many ways to implement them using Multicore Navigator.  Can you think of a way to make a cascading sync where tasks in a chain of tasks are each waiting for the previous task to release it, similar to falling dominos?

Stay tuned for the next installment, in which I will discuss using Navigator for notification, the next tool in the Swiss Army knife.

 

  • Hi,

    I want to use all cores of C6678 i.e 8 cores.

    2 cores are used by H.264 BP Decoder and 4 cores by H.264 BP Encoder.

    I can configure the number of cores by config file for this codec.

    but after Decoding the .264 by H.264BP decoder, how encoder will come to know that it has to take the decoded .YUV and encode it by using all 4 cores and generate once again .264 from this.

    How I can communicate between the cores of C6678 DSP ???

    Please do the reply.

    -Studinstru

  • Hi Studinstru,

    What you describe is an example of load balancing, which will be covered in a later post. There are many ways for one core to notify another core that work is ready to be performed. One way to do this using the Queue Manager is very similar to what is shown above: the decoder cores are the masters, and the encoder cores are the slaves. A master would pop a descriptor from a free queue, write into the descriptor the information needed by the encoder (perhaps the address and dimensions of the buffer to be encoded), then push the descriptor into a queue that is being monitored by a slave (encoder) core. The encoder core would then pop the descriptor, encode the buffer, recycle the descriptor to a free queue, then wait for another descriptor to appear in its job queue. Each encoder core would have its own queue to monitor, and the decoder cores would decide which core to send the job to, perhaps by monitoring the number of jobs (descriptors) in each slave's job queue, or maybe based on the size of the buffer to be encoded.
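
The dispatch scheme in this reply can be sketched as a small single-threaded simulation. Everything here is hypothetical: the descriptor fields (`buf_addr`, `width`, `height`), the buffer addresses, the frame size, and the "shortest job queue wins" policy are just one way to fill in the details, and the "encode" step is a placeholder that only computes a pixel count.

```python
from collections import deque

NUM_ENCODERS = 4  # encoder (slave) cores, each with its own job queue

# Free queue of reusable descriptors; the field names are made up for this sketch.
free_q = deque({"buf_addr": None, "width": 0, "height": 0} for _ in range(8))
job_qs = [deque() for _ in range(NUM_ENCODERS)]


def dispatch(buf_addr, width, height):
    """Decoder (master) side: fill a free descriptor, push it to the least-loaded encoder."""
    desc = free_q.popleft()                                # pop from the free queue
    desc.update(buf_addr=buf_addr, width=width, height=height)
    target = min(range(NUM_ENCODERS), key=lambda i: len(job_qs[i]))
    job_qs[target].append(desc)                            # push to that job queue
    return target


def encode_one(core_id):
    """Encoder (slave) side: pop one job, 'encode' it, recycle the descriptor."""
    if not job_qs[core_id]:
        return None                                        # nothing to do yet
    desc = job_qs[core_id].popleft()
    pixels = desc["width"] * desc["height"]                # placeholder for real encoding
    result = (core_id, desc["buf_addr"], pixels)
    free_q.append(desc)                                    # recycle to the free queue
    return result


# Six decoded frames at made-up addresses are spread across the four encoders.
targets = [dispatch(0x80000000 + n * 0x100000, 1280, 720) for n in range(6)]
done = encode_one(0)
```

With equal queue depths the first encoder in index order is chosen, so the six jobs land on encoders 0, 1, 2, 3, 0, 1. A size-based policy would only change the `key` function in `dispatch`.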