Is there a way to program the packet accelerator (or some other subsystem) so that all 8 cores on a C6678 share one IPv4 address, with a range of UDP ports assigned to each core? The PASS would then redirect packets to specific cores based on port number, instead of all packets going to a common memory space with one core tasked with forwarding them to the right cores for further processing. I am not interested in mapping just one UDP port to one core; I want a range of several hundred ports to map to each core. I know one can program a custom L4 lookup on a particular port in the PASS.
On our current proprietary system, multiple DSPs share a single IPv4 address, and a range of UDP port numbers is used to direct traffic to individual DSPs, so the DSPs do not each need their own IP address.
Thanks, Aamir
Hi, Aamir:
The PASS system is designed to support your use case.
We can configure the PASS with a single MAC address, a single IP address, and multiple UDP ports that direct packet traffic to different queues served by different cores.
Please note that multiple UDP ports can share the same destination.
If you would like to have a single LUT2 entry per core then you can use custom LUT2.
Please see a simple example at ti\drv\pa\example\multicoreExample.
Best regards,
Eric Ruei
Hi Eric,
I had a look through the multicore example you listed.
I am unsure what you mean by “If you want a single LUT2 entry per core then you can use custom LUT2”. Am I correct in understanding that this single entry per core will handle a range of UDP port numbers, and that this is done using custom LUT2? If so, I would need to modify how I call the Pa_addCustomLUT2 LLD function, and the response sent back from the driver to the host application would then be forwarded through the PKTDMA to set up the PA. Am I also correct that I will not need to modify the PA firmware that runs on the PDSPs, and that the PDSP firmware dealing with custom LUT2 can handle my requirement of having, say, ports 10000-11023 go to core 0, 11024-12047 go to core 1, etc.? What is the difference between the Pa_setCustomLUT2 and Pa_addCustomLUT2 functions? Is there an example of custom LUT2 among the examples that come with the MCSDK, or can you provide a rough example to help me better understand the LLD?
If I want to be able to have 1024 ports assigned to each core so does that mean I need to have 1024 entries in the LUT2 table normally? This would mean for 8 cores I will be exhausting the 8192 LUT2 entries leaving no room for expansion if I wanted more than 1024 ports per core. Does this also mean I need to have 1024 Pa_addPort commands issued by each core if not doing custom LUT2 and the efficient way of setting up the LUT2 table is through the use of custom LUT2?
The PA user guide talks of 9 tx and 24 Rx channels. I am assuming these are just the number of channels available just for the PKTDMA dealing with the PA? So with 8 cores receiving Ethernet traffic from the PA, would one Rx channel per core be sufficient?
I have a lot of questions about the keystone architecture as it is a big departure from the c6414/6 processor - can I take them offline with you or someone else?
The customLUT2 feature is the best option to route a group of UDP ports to the desired DSP core if you can control the range of UDP ports.
For example:
Core 0: 0x8000 - 0x83FF
Core 1: 0x8400 - 0x87FF
Core 2: 0x8800 - 0x8BFF
Core 3: 0x8C00 - 0x8FFF
Core 4: 0x9000 - 0x93FF
Core 5: 0x9400 - 0x97FF
Core 6: 0x9800 - 0x9BFF
Core 7: 0x9C00 - 0x9FFF
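The table above works because every range starts on a 1024-port boundary, so the top six bits of the port number identify the range. A minimal sketch of that arithmetic in plain C follows; the macro and function names are illustrative only and are not part of the PA LLD. Note that decimal ranges such as 10000-11023 do not start on a 1024 boundary, so they could not be covered by a single masked compare in the same way.

```c
#include <assert.h>
#include <stdint.h>

#define PORT_BASE      0x8000u  /* first port of core 0 (from the table above) */
#define PORTS_PER_CORE 0x0400u  /* 1024 ports per core */
#define NUM_CORES      8u

/* Returns the core index (0..7) that owns a UDP destination port,
 * or -1 if the port lies outside 0x8000..0x9FFF. */
static int port_to_core(uint16_t port)
{
    if (port < PORT_BASE || port >= PORT_BASE + NUM_CORES * PORTS_PER_CORE)
        return -1;
    return (int)((port - PORT_BASE) / PORTS_PER_CORE);
}

/* Returns the first port of the range assigned to a core. */
static uint16_t core_first_port(int core)
{
    assert(core >= 0 && core < (int)NUM_CORES);
    return (uint16_t)(PORT_BASE + (unsigned)core * PORTS_PER_CORE);
}
```

For example, port_to_core(0x8400) gives 1 and core_first_port(7) gives 0x9C00, matching the table.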
Call Pa_setCustomLUT2() to set up one custom lookup rule. Here are the recommended configuration parameters:
custIndex = 0 // It is the first custom LUT2 group
handleLink = TRUE // The custom lookup should be linked to the previous IP matching
byteOffsets[4] = {2, 3, 4, 5} // Point to destination port number
byteMasks[4] = {0xFC, 0x00, 0, 0}
setMask = 0 // not used
Call Pa_addMAC() to add the MAC entry
Call Pa_addIp() to add the IP entry which should be linked to the MAC entry
Where paRouteInfo_t should be set as follows:
dest = pa_DEST_CONTINUE_PARSE_LUT2
customType = pa_CUSTOM_TYPE_LUT2
customIndex = 0
Note: set paIpInfo_t.proto to 17 (UDP)
Call Pa_addCustomLUT2() to add one entry for each core. All the entries are linked to the single IP entry.
custIndex = 0
match[4] = {0x80, 0, 0,0} // core 0
{0x84, 0, 0,0 } // core 1
....
fReplace = FALSE
Only eight entries are used in the LUT2.
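To see why eight entries are enough, here is a small software model of the masked compare, written in plain C. It only mimics the idea (pick bytes at byteOffsets, AND them with byteMasks, compare against each match entry); it does not call the PA LLD, and the function names and table layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_CORES     8
#define NUM_KEY_BYTES 4

/* Values taken from the configuration above. */
static const uint8_t byte_offsets[NUM_KEY_BYTES] = {2, 3, 4, 5};
static const uint8_t byte_masks[NUM_KEY_BYTES]   = {0xFC, 0x00, 0x00, 0x00};

/* Hypothetical model of the table: one 4-byte match entry per core. */
static void build_match_table(uint8_t table[NUM_CORES][NUM_KEY_BYTES])
{
    for (int core = 0; core < NUM_CORES; core++) {
        memset(table[core], 0, NUM_KEY_BYTES);
        table[core][0] = (uint8_t)(0x80 + 4 * core); /* 0x80, 0x84, ... 0x9C */
    }
}

/* udp_hdr points at the start of the UDP header.
 * Returns the matching core index, or -1 if no entry matches. */
static int lookup_core(const uint8_t *udp_hdr, size_t len,
                       uint8_t table[NUM_CORES][NUM_KEY_BYTES])
{
    uint8_t key[NUM_KEY_BYTES];

    assert(udp_hdr != NULL);
    if (len < 6)            /* need bytes 0..5 of the header */
        return -1;

    /* Build the masked key from the selected header bytes. */
    for (int i = 0; i < NUM_KEY_BYTES; i++)
        key[i] = udp_hdr[byte_offsets[i]] & byte_masks[i];

    for (int core = 0; core < NUM_CORES; core++)
        if (memcmp(key, table[core], NUM_KEY_BYTES) == 0)
            return core;
    return -1;
}
```

With the mask 0xFC on the high byte of the destination port, all 1024 ports of a range collapse to one key, so a header with destination port 0x8601 matches the core 1 entry (0x84).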
The drawback is that the PASS will not perform UDP checksum verification since it treats the UDP header as a custom header.
You can use this mechanism to add 8 entries for TCP over IP.
Yes, the other choice is to call Pa_addPort to add 1024 entries per DSP core.
Regarding the PA CPPI channels, the 9 tx channels are connected to 9 transmit queues respectively.
The receive channels will be used to forward packets from PASS to any queues.
You just need to enable them all.
/* Open all CPPI Tx Channels. These will be used to send data to PASS/CPSW */
for (i = 0; i < NUM_PA_TX_QUEUES; i ++)
{
    txChCfg.channelNum  = i;   /* CPPI channels are mapped one-one to the PA Tx queues */
    txChCfg.txEnable    = Cppi_ChState_CHANNEL_DISABLE;   /* Disable the channel for now. */
    txChCfg.filterEPIB  = 0;
    txChCfg.filterPS    = 0;
    txChCfg.aifMonoMode = 0;
    txChCfg.priority    = 2;
    if ((gCpdmaTxChanHnd[i] = Cppi_txChannelOpen (gCpdmaHnd, &txChCfg, &isAllocated)) == NULL)
    {
        System_printf ("Error opening Tx channel %d\n", txChCfg.channelNum);
        return -1;
    }
    Cppi_channelEnable (gCpdmaTxChanHnd[i]);
}

/* Open all CPPI Rx channels. These will be used by PA to stream data out. */
for (i = 0; i < NUM_PA_RX_CHANNELS; i++)
{
    /* Open a CPPI Rx channel that will be used by PA to stream data out. */
    rxChInitCfg.channelNum = i;
    rxChInitCfg.rxEnable   = Cppi_ChState_CHANNEL_DISABLE;
    if ((gCpdmaRxChanHnd[i] = Cppi_rxChannelOpen (gCpdmaHnd, &rxChInitCfg, &isAllocated)) == NULL)
    {
        System_printf ("Error opening Rx channel: %d \n", rxChInitCfg.channelNum);
        return -1;
    }
    /* Also enable Rx Channel */
    Cppi_channelEnable (gCpdmaRxChanHnd[i]);
}
NUM_PA_TX_QUEUES = 9
NUM_PA_RX_CHANNELS = 24
You can find some training materials at the website link:
http://focus.ti.com/docs/training/catalog/events/event.jhtml?sku=OLT110027
Thanks very much for your reply.
I did have a look at the training material. It was not accessible till today as the external link was broken. Going through the PASS training it seems that the PASS can only do udp port filtering if it is within the first 128 bytes of the packet. What happens if there is sufficient number of optional headers in the UDP ipv4 packet and the UDP information is not within the first 128 bytes of the ethernet packet or is this 128 bytes of the start of the UDP?
What about ICMP handling? Can I have Ipv4 ICMP packets received at the EMAC redirected to different cores by looking at the UDP port #s in the embedded UDP packets in the ICMP packet as long as it fits within the 128 byte requirement by modifying the byteOffset, byteMask and paIpInfo_t.proto?
Extending this to IPv6 it would seem that with optional headers it becomes more likely that 128 bytes may not be sufficient and then each core may need to have its own IPv6 address especially when dealing with many optional headers with ICMPv6 or IPv6 for that matter.
Regarding the 9 Tx channels, should each of the cores have one channel statically assigned to them? While the queues for each core use the receive channels dynamically ?
Hi, Aamir:
Please read my answers below:
[Eric] I am not aware of the 128-byte limitation. I believe that the PASS is able to handle UDP header beyond 128-byte. Could you please inform us which training slide give you such an impression? We may need to clarify the content.
What about ICMP handling? Can I have Ipv4 ICMP packets received at the EMAC redirected to different cores by looking at the UDP port #s in the embedded UDP packets in the ICMP packet as long as it fits within the 128 byte requirement by modifying the byteOffset, byteMask and paIpInfo_t.proto?
[Eric] The PASS can recognize the ICMP packet by matching the IP proto field as ICMP (1), but it will not parse the ICMP header. It may not be a good idea to trigger another type of custom LUT2 lookup based on IP address and proto = ICMP, since there are so many different types of ICMP messages. It is possible to send the ICMP packets to eight different queues served by different cores by invoking the multi-route feature. Please refer to the API Pa_configMultiRoute() and paRouteInfo_t for multi-route configurations.
Extending this to IPv6 it would seem that with optional headers it becomes more likely that 128 bytes may not be sufficient and then each core may need to have its own IPv6 address especially when dealing with many optional headers with ICMPv6 or IPv6 for that matter.
[Eric] Refer to the answers above.
Regarding the 9 Tx channels, should each of the cores have one channel statically assigned to them? While the queues for each core use the receive channels dynamically?
[Eric] No, all the tx channels and rx channels are linked to core-independent hardware modules and are not accessed by the host software (application) directly. The host software just needs to enable all PASS tx and rx channels at system startup.
Look at page 33 of the keystone architecture online training. It mentions the 128 byte limitation.
I am only interested in the destination unreachable type. An individual core transmitting to the outside may receive back a destination unreachable message, so by making use of the port number the ICMP message can be directed to the core that initiated the UDP packet that resulted in the ICMP, minimizing extraneous processing by the other cores. When you say the PASS will not parse the ICMP message, can I not make use of the same process that is being used for UDP packets? Something similar to the way Linux packet filters work.
okay thanks.
[Eric] Thanks for pointing this out! No, there is no such limitation. We will correct the training material accordingly.
[Eric] Yes, you may be able to do this if the IP header size is fixed, in other words, if the offset from the ICMP header to the UDP port number is fixed. In this case, you can configure another type of custom LUT2 lookup. Please note that this approach will not work if you expect to receive any other types of ICMP packets.
Eric
I am not sure then whether this will work as for
1) IPv4 UDP packets with optional headers prior to the UDP header and
2) IPV4 ICMP packets with optional headers prior to the ICMP header followed by an embedded UDP IPV4 packet with optional headers prior to the UDP header,
the length from the start of the IPv4 packet to the start of the UDP or ICMP header is not known for both types above and also the length from the start of the ICMP header to the start of the UDP header is not known for type 2 but needs to be extracted from the header length fields in the IPV4 or embedded IPV4 packet header.
The impression I get from this conversation is that the PASS is able to point to the UDP header by looking at the optional header length in byte one of the IPv4 packet but is not able to look through the same byte in the embedded IPv4 packet hence routing packets to cores based on port #s cannot work if the ICMP packet has optional headers as the PASS has no way of knowing the start of the UDP within the embedded IPv4 packet. Am I correct?
[Eric] Yes, you are correct. The PASS does not parse the ICMP packet header. It provides header parsing and classification for the following protocol headers only: MAC (DIX and 802.3), IPv4, IPv6, UDP, TCP. There is some limited support for SCTP and GTP-U. For all other protocols over MAC or IP, the packets should be routed to the DSP core for full network stack processing.
The PASS does not replace the general-purpose network stack. It provides the fast data paths for dedicated data streams over UDP/TCP with or without IPSEC tunnels. It is in particular useful when the types of data traffic entering NetCP are limited.
That is too bad. Is this just a PDSP software limitation? Is it possible to have this capability added in the future if it is just a software issue?
This is a long-shot idea, but is it possible to have the output of PDSP3 with custom LUT2 for ICMP packets fed back to the first PDSP to parse the embedded UDP packet? This could work if the output of PDSP3 could strip off everything up to and including the ICMPv4 header.
Aamir
Can you describe your application in a little more detail? What kind of packets will enter NetCP? What does the network stack do at each DSP core? What do you expect the NetCP to do? It seems to me that you just want the NetCP to deliver the incoming packets to the required DSP core based on the UDP port number, and the network stack of each core will process the packet as if it was received from the EMAC. Is that true?
We may be able to figure out better ways to configure the NetCP to take the full advantages of it if we can understand your use case better.
Eric,
The Ethernet packets that enter the NetCP are UDP (for RTP traffic) and ICMP packets. The ICMP scenario arises when an RTP port is placed on a specific core and transmits RTP data to a destination port via UDP. If the destination is not reachable, it responds with an ICMP message, which the NetCP should try to redirect to the core that sent the original UDP packet so that core can process it and stop transmitting. We could of course have all ICMP packets go to one core and have them redirected to the appropriate cores, but the more elegant way would be for each core to receive only the packets relevant to it. The cores then handle the processing of the IP, UDP and subsequently RTP headers for media processing. In addition, we will have commands and other media messages sent over Ethernet as UDP packets on other specific ports. All the incoming packets could then be placed in a large buffer specific to each core in external DDR3 memory and copied one packet at a time to each core's L2 memory for individual processing on that core. Does this provide you with more information?
Aamir:
If the only type of ICMP packet that enters NetCP is message type 3 (Destination Unreachable) with message code 3 (Port Unreachable), every ICMP packet will have the same format: it carries the IP header and the first 64 bits of data of the egress packet. In that case, should the offsets from the ICMP header to the source port and destination port be fixed? If so, you may define another type of custom LUT2 to route the ICMP traffic to the specified DSP core based on the UDP port number.
Another solution is to broadcast (multi-route) the ICMP traffic to all DSP cores, with only the desired DSP core handling each packet. Please note that you can distinguish the types (UDP/ICMP) of packets by using a different swInfo0, which will be placed in the CPPI descriptor.
Besides, you may be able to simplify the application's UDP packet processing with NetCP, since the received packet descriptor contains the offset information to the IP and UDP headers.
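For the fixed-format case, the offset arithmetic can be sketched as below. This is plain C for illustration, not PA LLD code. It assumes the embedded IPv4 header has a known length and that offsets are counted from the start of the ICMP header, by analogy with the UDP configuration earlier in the thread; the macro and function names are made up for this example. Since the embedded packet is the original egress packet, the port that identifies the sending core would be the embedded UDP source port, assuming each core transmits from its own port range.

```c
#include <assert.h>
#include <stdint.h>

#define ICMP_HDR_LEN      8u  /* type, code, checksum, 4 unused bytes */
#define UDP_SRC_PORT_OFF  0u  /* source port: UDP header bytes 0-1 */
#define UDP_DST_PORT_OFF  2u  /* destination port: UDP header bytes 2-3 */

/* Offset, counted from the start of the ICMP header, of a field in the
 * embedded UDP header. inner_ihl_words is the IHL field of the embedded
 * IPv4 header (5 when that header carries no options). */
static unsigned embedded_udp_offset(unsigned inner_ihl_words,
                                    unsigned udp_field_off)
{
    assert(inner_ihl_words >= 5 && inner_ihl_words <= 15);
    return ICMP_HDR_LEN + inner_ihl_words * 4u + udp_field_off;
}
```

With no IP options (IHL = 5) the embedded source port sits at bytes 28-29 and the destination port at bytes 30-31. Any option in the embedded header shifts both offsets by a multiple of 4, which is exactly the case a fixed-offset lookup cannot handle.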
The problem is the ICMP header is followed by parts of the IPv4 packet which may or may not contain optional headers prior to the UDP header so there is no way of knowing the length to skip to get to the UDP header unless one examines the IPv4 header and the subsequent optional headers. A gateway could add these optional headers as it traverses over the ethernet.
What do you mean by the different swInfo0 ? I understand it to mean simplify processing at the core as both UDP and ICMP will use the same Rx queue for each core.
Actually our intent is to have a board with multiple C6678 processors sharing one IP address, so using the port number to filter incoming packets is quite beneficial. Without it, each Shannon DSP would receive ICMP messages meant for other DSPs, not just for other corepacs within the current Shannon DSP. In any case this is what I envision. Have a regular LUT1 rule which redirects packets to continue parsing, so that they are processed through 8 custom LUT2 rules for port-based UDP matching to 8 UDP Rx queues, each one with descriptors in a certain region of DDR3. Also set up the PASS with one custom LUT2 rule to redirect all type 3 ICMP packets to an Rx queue separate from the per-core UDP Rx queues. That queue refers to a region in DDR and is serviced by a specific core (say core 0), which decides which core the ICMP packet is for and then either redirects it by transmitting it to the appropriate UDP Rx queue through the infrastructure PKTDMA, or discards it if it is not for any of the cores. The UDP Rx queue, i.e. memory region, for each core will then only contain packets for that core. Alternatively, I could multi-route the ICMP packets to 8 different queues, one for each core, and have each core disregard the messages that are not directed to it.
Having the offset information to the UDP or ICMP header will be good to have.
[Eric] I understand now. The offset to the UDP port is not fixed, since the egress packet may be updated by the network before it reaches the node where the ICMP packet is generated.
[Eric] Yes, we can recognize the types of packets even if they share the same destination queue, provided we specify a unique swInfo0 word in the paRouteInfo_t data structure for each packet type.
[Eric] Yes, both methods will work.
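For either method, the core that services the ICMP queue has to locate the embedded UDP header itself, honoring the embedded IPv4 header length. A minimal sketch of that host-side step in plain C follows. The swInfo0 tag values and all function names are hypothetical, chosen only for this example; the port ranges are the ones from the table earlier in the thread, and the sketch assumes each core transmits from its own port range, so the embedded UDP source port identifies the originating core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical swInfo0 tags that the application would place in
 * paRouteInfo_t for each packet type. */
#define SWINFO0_UDP   0x55445000u
#define SWINFO0_ICMP  0x49434D00u

#define ICMP_HDR_LEN            8u
#define ICMP_TYPE_DEST_UNREACH  3u
#define IP_PROTO_UDP            17u

/* icmp points at the start of the ICMP header, len is the number of bytes
 * available from there. Returns the originating core (0..7) or -1. */
static int icmp_unreach_to_core(const uint8_t *icmp, size_t len)
{
    assert(icmp != NULL);
    if (len < ICMP_HDR_LEN + 20u + 8u)      /* ICMP + minimal IPv4 + UDP */
        return -1;
    if (icmp[0] != ICMP_TYPE_DEST_UNREACH)
        return -1;

    const uint8_t *ip = icmp + ICMP_HDR_LEN;
    if ((ip[0] >> 4) != 4)                  /* embedded packet must be IPv4 */
        return -1;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4u;   /* header length with options */
    if (ihl < 20u || len < ICMP_HDR_LEN + ihl + 8u)
        return -1;
    if (ip[9] != IP_PROTO_UDP)
        return -1;

    const uint8_t *udp = ip + ihl;
    uint16_t src_port = (uint16_t)((udp[0] << 8) | udp[1]);
    if (src_port < 0x8000 || src_port > 0x9FFF)
        return -1;
    return (src_port - 0x8000) >> 10;       /* 1024 ports per core */
}

/* Chooses the core for a received packet: UDP was already steered by the
 * PASS, so the receiving core keeps it; ICMP is resolved in software. */
static int dispatch_core(uint32_t sw_info0, int rx_core,
                         const uint8_t *l4, size_t len)
{
    if (sw_info0 == SWINFO0_ICMP)
        return icmp_unreach_to_core(l4, len);
    return rx_core;
}
```

A return value of -1 corresponds to the discard case, where the ICMP message does not belong to any core on this device.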