IPv4 UDP packets to multiple cores through PASS

I am wondering if there is any way to program the packet accelerator or some other subsystem so that all 8 cores on a C6678 can share one IPv4 address, with a range of UDP ports assigned to each core, so that the PASS redirects packets to specific cores based on certain port numbers, as opposed to all packets going to a common memory space with one of the cores tasked with distributing the packets to the other cores for further processing. I am not interested in just having one UDP port mapped to one core, but rather a range of port numbers in the hundreds mapped to each core. I know one can program a custom L4 match on a particular port in the PASS.

On our current proprietary system, multiple DSPs share a single IPv4 address, and a range of UDP port numbers is used to redirect traffic to individual DSPs, thereby avoiding the need for multiple IP addresses for the multiple DSPs.

Thanks, Aamir

  • Hi, Aamir:

    The PASS system is designed to support your use case.

    We can configure the PASS with a single MAC address, a single IP address and multiple UDP ports which direct packet traffic to different queues served by different cores.

    Please note that multiple UDP ports can share the same destination.

    If you would like to have a single LUT2 entry per core then you can use custom LUT2.

    Please see a simple example at ti\drv\pa\example\multicoreExample.

    Best regards,

    Eric Ruei

  • Hi Eric,

    I had a look through the multicore example you listed.

    I am unsure what you mean by "If you want a single LUT2 entry per core then you can use custom LUT2". Am I correct in understanding that this single entry per core will handle a range of UDP port numbers, that this is done using custom LUT2, and that I would therefore need to modify how I call the Pa_addCustomLUT2 LLD function, with the response sent back from the driver to the host application then forwarded on through the PKTDMA to set up the PA? Also, am I right that I will not need to modify the actual PA firmware that runs on the PDSPs, and that the custom LUT2 firmware on the PDSPs can handle my requirement of having, say, ports 10000-11023 go to core 0, 11024-12047 go to core 1, etc.? What is the difference between the Pa_setCustomLUT2 and Pa_addCustomLUT2 functions? Is there an example of this custom LUT2 in any of the examples that come with the MCSDK, or can you provide a rough example for me to better understand the LLD?

    If I want 1024 ports assigned to each core, does that mean I need 1024 entries in the LUT2 table normally? For 8 cores this would exhaust the 8192 LUT2 entries, leaving no room for expansion if I wanted more than 1024 ports per core. Does this also mean each core would need to issue 1024 Pa_addPort commands if not using custom LUT2, and that the efficient way of setting up the LUT2 table is through custom LUT2?

    The PA user guide talks of 9 Tx and 24 Rx channels. I am assuming these are the number of channels available for the PKTDMA associated with the PA? So with 8 cores receiving Ethernet traffic from the PA, would one Rx channel per core be sufficient?

    I have a lot of questions about the KeyStone architecture, as it is a big departure from the C6414/C6416 processors. Can I take them offline with you or someone else?

  • Hi, Aamir:

    The custom LUT2 feature is the best option to route a group of UDP ports to the desired DSP core, if you can control the range of UDP ports.

    For example:

    Core 0: 0x8000 - 0x83FF

    Core 1: 0x8400 - 0x87FF

    Core 2: 0x8800 - 0x8BFF

    Core 3: 0x8C00 - 0x8FFF

    Core 4: 0x9000 - 0x93FF

    Core 5: 0x9400 - 0x97FF

    Core 6: 0x9800 - 0x9BFF

    Core 7: 0x9C00 - 0x9FFF

    Call Pa_setCustomLUT2() to set up one custom lookup rule. Here are the recommended configuration parameters:

    custIndex = 0 // the first custom LUT2 rule (type)

    handleLink = TRUE // The custom lookup should be linked to the previous IP matching

    byteOffsets[4] = {2, 3, 4, 5} // Point to destination port number

    byteMasks[4] = {0xFC, 0x00, 0, 0}

    setMask = 0 // not used

    Call Pa_addMac() to add the MAC entry.

    Call Pa_addIp() to add the IP entry which should be linked to the MAC entry

    Call Pa_addIp() to add the IP entry, which should be linked to the MAC entry, where paRouteInfo_t should be set as follows:

    dest = pa_DEST_CONTINUE_PARSE_LUT2

    customType = pa_CUSTOM_TYPE_LUT2

    customIndex = 0

    Note: set paIpInfo_t.proto to 17 (UDP)

    Call Pa_addCustomLUT2() to add one entry for each core. All the entries are linked to the single IP entry.

    custIndex = 0

    match[4] = {0x80, 0, 0, 0} // core 0

    match[4] = {0x84, 0, 0, 0} // core 1

    ...

    fReplace = FALSE

    Only eight entries are used in the LUT2.

    The drawback is that the PASS will not perform UDP checksum verification since it treats the UDP header as a custom header.

    You can use this mechanism to add 8 entries for TCP over IP.

    Yes, the other choice is to call Pa_addPort to add 1024 entries per DSP core.
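
    Putting the pieces above together, a rough sketch of the configuration code might look like the following. This is a sketch only: the exact prototypes of Pa_setCustomLUT2() and Pa_addCustomLUT2() vary between PA LLD releases and should be checked against pa.h in your PDK, the command-buffer plumbing is elided, and the queue and flow numbers are placeholders.

    #include <ti/drv/pa/pa.h>
    #include <string.h>

    /* Sketch only: assumes the MAC and IP entries were already added with
     * Pa_addMac()/Pa_addIp(), with the IP entry's paRouteInfo_t set to
     * dest = pa_DEST_CONTINUE_PARSE_LUT2, customType = pa_CUSTOM_TYPE_LUT2,
     * customIndex = 0 as shown above. Check pa.h for the exact prototypes. */
    extern paHandleL2L3_t ipHandle;        /* handle returned by Pa_addIp()  */
    static paHandleL4_t   l4Handles[8];    /* one LUT2 handle per core       */

    void setupPortRangeRouting (Pa_Handle paInst, paCmd_t cmdBuf, uint16_t bufLen,
                                paCmdReply_t *reply)
    {
        /* Classify on the UDP destination port bytes: masking the first port
         * byte with 0xFC collapses each 1024-port range (0x8000-0x83FF, ...)
         * onto a single match value. */
        uint8_t  byteOffsets[4] = { 2, 3, 4, 5 };
        uint8_t  byteMasks[4]   = { 0xFC, 0x00, 0x00, 0x00 };
        uint16_t cmdSize = bufLen;
        int      cmdDest, i;

        Pa_setCustomLUT2 (paInst,
                          0,               /* custIndex: custom LUT2 rule 0    */
                          1,               /* handleLink: link to the IP match */
                          byteOffsets, byteMasks,
                          0,               /* setMask: not used                */
                          cmdBuf, &cmdSize, reply, &cmdDest);
        /* ... push cmdBuf to the PASS queue indicated by cmdDest and wait for
         *     the command response before continuing ... */

        for (i = 0; i < 8; i++) {
            /* Masked first port byte: 0x80, 0x84, ... 0x9C for cores 0-7 */
            uint8_t       match[4] = { (uint8_t)(0x80 + (i << 2)), 0, 0, 0 };
            paRouteInfo_t route;

            memset (&route, 0, sizeof(route));
            route.dest   = pa_DEST_HOST;
            route.queue  = 900 + i;        /* placeholder: core i's Rx queue */
            route.flowId = 0;              /* placeholder: CPPI Rx flow      */

            cmdSize = bufLen;
            Pa_addCustomLUT2 (paInst,
                              0,           /* custIndex: same rule as above  */
                              match,
                              ipHandle,    /* all 8 entries link to one IP entry */
                              0,           /* fReplace = FALSE               */
                              &route, l4Handles[i],
                              cmdBuf, &cmdSize, reply, &cmdDest);
            /* ... push cmdBuf and handle the response as above ... */
        }
    }

    Each call only formats a command packet; the host still has to push it to the PA command queue through the PKTDMA and wait for the response, as the multicore example does.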

    Regarding the PA CPPI channels, the 9 Tx channels are connected to the 9 transmit queues respectively.

    The receive channels are used to forward packets from the PASS to any queue.

    You just need to enable them all.

        /* Open all CPPI Tx Channels. These will be used to send data to PASS/CPSW */            
        for (i = 0; i < NUM_PA_TX_QUEUES; i ++)
        {
            txChCfg.channelNum      =   i;       /* CPPI channels are mapped one-one to the PA Tx queues */
            txChCfg.txEnable        =   Cppi_ChState_CHANNEL_DISABLE;  /* Disable the channel for now. */
            txChCfg.filterEPIB      =   0;
            txChCfg.filterPS        =   0;
            txChCfg.aifMonoMode     =   0;
            txChCfg.priority        =   2;
            if ((gCpdmaTxChanHnd[i] = Cppi_txChannelOpen (gCpdmaHnd, &txChCfg, &isAllocated)) == NULL)
            {
                System_printf ("Error opening Tx channel %d\n", txChCfg.channelNum);
                return -1;
            }

            Cppi_channelEnable (gCpdmaTxChanHnd[i]);
        }

        /* Open all CPPI Rx channels. These will be used by PA to stream data out. */
        for (i = 0; i < NUM_PA_RX_CHANNELS; i++)
        {
            /* Open a CPPI Rx channel that will be used by PA to stream data out. */
            rxChInitCfg.channelNum  =   i;
            rxChInitCfg.rxEnable    =   Cppi_ChState_CHANNEL_DISABLE;
            if ((gCpdmaRxChanHnd[i] = Cppi_rxChannelOpen (gCpdmaHnd, &rxChInitCfg, &isAllocated)) == NULL)
            {
                System_printf ("Error opening Rx channel: %d \n", rxChInitCfg.channelNum);
                return -1;
            }

            /* Also enable Rx Channel */
            Cppi_channelEnable (gCpdmaRxChanHnd[i]);   
        }

    NUM_PA_TX_QUEUES = 9

    NUM_PA_RX_CHANNELS = 24

    You can find some training materials at the website link:

    http://focus.ti.com/docs/training/catalog/events/event.jhtml?sku=OLT110027

    Best regards,

    Eric Ruei

  • Hi Eric,

    Thanks very much for your reply.

    I did have a look at the training material. It was not accessible until today as the external link was broken. Going through the PASS training, it seems that the PASS can only do UDP port filtering if the port is within the first 128 bytes of the packet. What happens if there are enough optional headers in the UDP IPv4 packet that the UDP information is not within the first 128 bytes of the Ethernet packet, or is this 128 bytes from the start of the UDP header?

    What about ICMP handling? Can I have IPv4 ICMP packets received at the EMAC redirected to different cores by looking at the UDP port numbers in the UDP packets embedded in the ICMP packets, as long as they fit within the 128-byte requirement, by modifying the byteOffsets, byteMasks and paIpInfo_t.proto?

    Extending this to IPv6, it would seem that with optional headers it becomes more likely that 128 bytes may not be sufficient, and then each core may need to have its own IPv6 address, especially when dealing with many optional headers with ICMPv6 or IPv6 for that matter.

    Regarding the 9 Tx channels, should each of the cores have one channel statically assigned to it, while the queues for each core use the receive channels dynamically?

  • Hi, Aamir:

    Please read my answers below:

    I did have a look at the training material. It was not accessible until today as the external link was broken. Going through the PASS training, it seems that the PASS can only do UDP port filtering if the port is within the first 128 bytes of the packet. What happens if there are enough optional headers in the UDP IPv4 packet that the UDP information is not within the first 128 bytes of the Ethernet packet, or is this 128 bytes from the start of the UDP header?

    [Eric] I am not aware of the 128-byte limitation. I believe that the PASS is able to handle UDP headers beyond 128 bytes. Could you please tell us which training slide gives you such an impression? We may need to clarify the content.

    What about ICMP handling? Can I have IPv4 ICMP packets received at the EMAC redirected to different cores by looking at the UDP port numbers in the UDP packets embedded in the ICMP packets, as long as they fit within the 128-byte requirement, by modifying the byteOffsets, byteMasks and paIpInfo_t.proto?
    [Eric] The PASS can recognize an ICMP packet by matching the IP proto field as ICMP (1), but it will not parse the ICMP header.
           It may not be a good idea to trigger another type of custom LUT2 lookup based on IP address and proto = ICMP, since there are so many different types of ICMP messages.
           It is possible to send the ICMP packets to eight different queues served by different cores by invoking the multi-route feature.
           Please refer to the API Pa_configMultiRoute() and paRouteInfo_t for multi-route configurations.

    Extending this to IPv6, it would seem that with optional headers it becomes more likely that 128 bytes may not be sufficient, and then each core may need to have its own IPv6 address, especially when dealing with many optional headers with ICMPv6 or IPv6 for that matter.
    [Eric] Refer to the answers above.

    Regarding the 9 Tx channels, should each of the cores have one channel statically assigned to it, while the queues for each core use the receive channels dynamically?
    [Eric] No, all the Tx and Rx channels are linked to core-independent hardware modules and are not accessed by the host software (application) directly. The host software just needs to enable all PASS Tx and Rx channels at system startup.

  • Hi, Aamir:

    Please read my answers below:

    I did have a look at the training material. It was not accessible until today as the external link was broken. Going through the PASS training, it seems that the PASS can only do UDP port filtering if the port is within the first 128 bytes of the packet. What happens if there are enough optional headers in the UDP IPv4 packet that the UDP information is not within the first 128 bytes of the Ethernet packet, or is this 128 bytes from the start of the UDP header?

    [Eric] I am not aware of the 128-byte limitation. I believe that the PASS is able to handle UDP headers beyond 128 bytes. Could you please tell us which training slide gives you such an impression? We may need to clarify the content.

    Look at page 33 of the KeyStone architecture online training. It mentions the 128-byte limitation.

    What about ICMP handling? Can I have IPv4 ICMP packets received at the EMAC redirected to different cores by looking at the UDP port numbers in the UDP packets embedded in the ICMP packets, as long as they fit within the 128-byte requirement, by modifying the byteOffsets, byteMasks and paIpInfo_t.proto?
    [Eric] The PASS can recognize an ICMP packet by matching the IP proto field as ICMP (1), but it will not parse the ICMP header.
           It may not be a good idea to trigger another type of custom LUT2 lookup based on IP address and proto = ICMP, since there are so many different types of ICMP messages.
           It is possible to send the ICMP packets to eight different queues served by different cores by invoking the multi-route feature.
           Please refer to the API Pa_configMultiRoute() and paRouteInfo_t for multi-route configurations.

    I am only interested in the Destination Unreachable type, as individual cores transmitting to the outside may receive back a Destination Unreachable; by making use of the port number, the ICMP message can be directed to the core that initiated the UDP packet that triggered the ICMP, minimizing extraneous processing by the other cores. When you say the PASS will not parse the ICMP message, can I not make use of the same process as is being done for UDP packets? Something similar to the way Linux packet filters work.

    Extending this to IPv6, it would seem that with optional headers it becomes more likely that 128 bytes may not be sufficient, and then each core may need to have its own IPv6 address, especially when dealing with many optional headers with ICMPv6 or IPv6 for that matter.
    [Eric] Refer to the answers above.

    Regarding the 9 Tx channels, should each of the cores have one channel statically assigned to it, while the queues for each core use the receive channels dynamically?
    [Eric] No, all the Tx and Rx channels are linked to core-independent hardware modules and are not accessed by the host software (application) directly. The host software just needs to enable all PASS Tx and Rx channels at system startup.

    okay thanks.

  • Hi, Aamir:

    Please read my answers below:

    I did have a look at the training material. It was not accessible until today as the external link was broken. Going through the PASS training, it seems that the PASS can only do UDP port filtering if the port is within the first 128 bytes of the packet. What happens if there are enough optional headers in the UDP IPv4 packet that the UDP information is not within the first 128 bytes of the Ethernet packet, or is this 128 bytes from the start of the UDP header?

    [Eric] I am not aware of the 128-byte limitation. I believe that the PASS is able to handle UDP headers beyond 128 bytes. Could you please tell us which training slide gives you such an impression? We may need to clarify the content.

    Look at page 33 of the KeyStone architecture online training. It mentions the 128-byte limitation.

    [Eric] Thanks for pointing this out! No, there is no such limitation. We will correct the training material accordingly.

    What about ICMP handling? Can I have IPv4 ICMP packets received at the EMAC redirected to different cores by looking at the UDP port numbers in the UDP packets embedded in the ICMP packets, as long as they fit within the 128-byte requirement, by modifying the byteOffsets, byteMasks and paIpInfo_t.proto?
    [Eric] The PASS can recognize an ICMP packet by matching the IP proto field as ICMP (1), but it will not parse the ICMP header.
           It may not be a good idea to trigger another type of custom LUT2 lookup based on IP address and proto = ICMP, since there are so many different types of ICMP messages.
           It is possible to send the ICMP packets to eight different queues served by different cores by invoking the multi-route feature.
           Please refer to the API Pa_configMultiRoute() and paRouteInfo_t for multi-route configurations.

    I am only interested in the Destination Unreachable type, as individual cores transmitting to the outside may receive back a Destination Unreachable; by making use of the port number, the ICMP message can be directed to the core that initiated the UDP packet that triggered the ICMP, minimizing extraneous processing by the other cores. When you say the PASS will not parse the ICMP message, can I not make use of the same process as is being done for UDP packets? Something similar to the way Linux packet filters work.

    [Eric] Yes, you may be able to do this if the IP header size is fixed, in other words, if the offset from the ICMP header to the UDP port number is fixed. In this case, you can configure another type of custom LUT2 lookup. Please note that this approach will not work if you expect to receive any other types of ICMP packets.

    Extending this to IPv6, it would seem that with optional headers it becomes more likely that 128 bytes may not be sufficient, and then each core may need to have its own IPv6 address, especially when dealing with many optional headers with ICMPv6 or IPv6 for that matter.
    [Eric] Refer to the answers above.

    Regarding the 9 Tx channels, should each of the cores have one channel statically assigned to it, while the queues for each core use the receive channels dynamically?
    [Eric] No, all the Tx and Rx channels are linked to core-independent hardware modules and are not accessed by the host software (application) directly. The host software just needs to enable all PASS Tx and Rx channels at system startup.

    okay thanks.

    Best regards,

    Eric

  • Hi, Aamir:

    Please read my answers below:

    I did have a look at the training material. It was not accessible until today as the external link was broken. Going through the PASS training, it seems that the PASS can only do UDP port filtering if the port is within the first 128 bytes of the packet. What happens if there are enough optional headers in the UDP IPv4 packet that the UDP information is not within the first 128 bytes of the Ethernet packet, or is this 128 bytes from the start of the UDP header?

    [Eric] I am not aware of the 128-byte limitation. I believe that the PASS is able to handle UDP headers beyond 128 bytes. Could you please tell us which training slide gives you such an impression? We may need to clarify the content.

    Look at page 33 of the KeyStone architecture online training. It mentions the 128-byte limitation.

    [Eric] Thanks for pointing this out! No, there is no such limitation. We will correct the training material accordingly.

    What about ICMP handling? Can I have IPv4 ICMP packets received at the EMAC redirected to different cores by looking at the UDP port numbers in the UDP packets embedded in the ICMP packets, as long as they fit within the 128-byte requirement, by modifying the byteOffsets, byteMasks and paIpInfo_t.proto?
    [Eric] The PASS can recognize an ICMP packet by matching the IP proto field as ICMP (1), but it will not parse the ICMP header.
           It may not be a good idea to trigger another type of custom LUT2 lookup based on IP address and proto = ICMP, since there are so many different types of ICMP messages.
           It is possible to send the ICMP packets to eight different queues served by different cores by invoking the multi-route feature.
           Please refer to the API Pa_configMultiRoute() and paRouteInfo_t for multi-route configurations.

    I am only interested in the Destination Unreachable type, as individual cores transmitting to the outside may receive back a Destination Unreachable; by making use of the port number, the ICMP message can be directed to the core that initiated the UDP packet that triggered the ICMP, minimizing extraneous processing by the other cores. When you say the PASS will not parse the ICMP message, can I not make use of the same process as is being done for UDP packets? Something similar to the way Linux packet filters work.

    [Eric] Yes, you may be able to do this if the IP header size is fixed, in other words, if the offset from the ICMP header to the UDP port number is fixed. In this case, you can configure another type of custom LUT2 lookup. Please note that this approach will not work if you expect to receive any other types of ICMP packets.

    I am not sure whether this will work, then, since for

    1) IPv4 UDP packets with optional headers prior to the UDP header, and

    2) IPv4 ICMP packets with optional headers prior to the ICMP header, followed by an embedded UDP IPv4 packet with optional headers prior to the UDP header,

    the length from the start of the IPv4 packet to the start of the UDP or ICMP header is not known for either type, and the length from the start of the ICMP header to the start of the UDP header is not known for type 2 but needs to be extracted from the header length fields in the outer or embedded IPv4 packet header.

    The impression I get from this conversation is that the PASS is able to locate the UDP header by looking at the header length in the first byte of the IPv4 packet, but is not able to look at the same byte in the embedded IPv4 packet; hence routing packets to cores based on port numbers cannot work if the ICMP packet has optional headers, as the PASS has no way of knowing the start of the UDP header within the embedded IPv4 packet. Am I correct?

    Extending this to IPv6, it would seem that with optional headers it becomes more likely that 128 bytes may not be sufficient, and then each core may need to have its own IPv6 address, especially when dealing with many optional headers with ICMPv6 or IPv6 for that matter.
    [Eric] Refer to the answers above.

    Regarding the 9 Tx channels, should each of the cores have one channel statically assigned to it, while the queues for each core use the receive channels dynamically?
    [Eric] No, all the Tx and Rx channels are linked to core-independent hardware modules and are not accessed by the host software (application) directly. The host software just needs to enable all PASS Tx and Rx channels at system startup.

    okay thanks.

    Best regards,

    Eric

  • Hi, Aamir:

    I am not sure whether this will work, then, since for

    1) IPv4 UDP packets with optional headers prior to the UDP header, and

    2) IPv4 ICMP packets with optional headers prior to the ICMP header, followed by an embedded UDP IPv4 packet with optional headers prior to the UDP header,

    the length from the start of the IPv4 packet to the start of the UDP or ICMP header is not known for either type, and the length from the start of the ICMP header to the start of the UDP header is not known for type 2 but needs to be extracted from the header length fields in the outer or embedded IPv4 packet header.

    The impression I get from this conversation is that the PASS is able to locate the UDP header by looking at the header length in the first byte of the IPv4 packet, but is not able to look at the same byte in the embedded IPv4 packet; hence routing packets to cores based on port numbers cannot work if the ICMP packet has optional headers, as the PASS has no way of knowing the start of the UDP header within the embedded IPv4 packet. Am I correct?

    [Eric] Yes, you are correct. The PASS does not parse the ICMP packet header. It provides header parsing and classification for the following protocol headers only: MAC (DIX and 802.3), IPv4, IPv6, UDP and TCP. There is some limited support for SCTP and GTP-U. For all other protocols over MAC or IP, the packets should be routed to a DSP core for full network stack processing.

    The PASS does not replace a general-purpose network stack. It provides fast data paths for dedicated data streams over UDP/TCP, with or without IPSEC tunnels. It is particularly useful when the types of data traffic entering the NetCP are limited.

    Best regards,

    Eric

  • Hi Eric,

    That is too bad. Is this just a PDSP software limitation? Is it possible to have this capability added in the future, if it is just a software issue?

    This is a long-shot idea, but is it possible to have the output of PDSP3, with custom LUT2 for ICMP packets, fed back to the first PDSP to parse the embedded UDP packet? This could work if the output of PDSP3 could strip off everything up to and including the ICMPv4 header.

    Aamir

  • Hi, Aamir:

    Can you describe your application in a little more detail? What kind of packets will enter the NetCP? What does the network stack do at each DSP core? What do you expect the NetCP to do? It seems to me that you just want the NetCP to deliver the incoming packets to the required DSP core based on the UDP port number, and the network stack on each core will process the packet as if it were received from the EMAC. Is that true?

    We may be able to figure out better ways to configure the NetCP to take full advantage of it if we can understand your use case better.

    Best regards,

    Eric

  • Eric,

    The Ethernet packets that enter the NetCP are UDP (for RTP traffic) and ICMP packets. The ICMP scenario arises when an RTP port assigned to a specific core is transmitting RTP data to a destination port via UDP Ethernet packets: if the destination is not reachable, it responds with an ICMP message, which the NetCP should redirect to the core that sent the original UDP packet so that core can do further processing and stop transmitting. We could of course have all ICMP packets go to one core and have them redirected to the appropriate cores, but the more elegant way would be to have only the packets relevant to a core reach that core.

    The cores then handle the processing of the IP, UDP and subsequently RTP headers for media processing. In addition, we will have commands and other media messages being sent over Ethernet as UDP packets on other specific ports. All the incoming packets could then be placed in a large per-core buffer in external DDR3 memory and then copied one packet at a time into each core's L2 memory for processing. Does this give you more information?

    Aamir

  • Aamir:

    If the only ICMP packets which enter the NetCP are message type 3 (Destination Unreachable) with message code 3 (Port Unreachable), the ICMP packets will all have the same format, containing the first 64 bytes of the egress packet. In this case, the offsets from the ICMP header to the source port and destination port should be fixed. Therefore, you may define another type of custom LUT2 to route the ICMP traffic to the specified DSP core based on the UDP port number.

    Another solution is to broadcast (multi-route) the ICMP traffic to all DSP cores and have only the desired DSP core handle the packets. Please note that you can distinguish the packet types (UDP/ICMP) by using a different swInfo0, which will be placed in the CPPI descriptor.
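
    A rough sketch of that multi-route configuration follows; the Pa_configMultiRoute() argument list and the paMultiRouteEntry_t field names are as documented in the PA LLD Doxygen and should be verified against pa.h in your release, and the queue numbers and swInfo0 tags are placeholders.

    #include <ti/drv/pa/pa.h>
    #include <string.h>

    /* Sketch: fan the matched ICMP packets out to eight per-core queues. */
    void setupIcmpMultiRoute (Pa_Handle paInst, paCmd_t cmdBuf, uint16_t bufLen,
                              paCmdReply_t *reply)
    {
        paMultiRouteEntry_t routes[8];
        uint16_t cmdSize = bufLen;
        int      cmdDest, i;

        memset (routes, 0, sizeof(routes));
        for (i = 0; i < 8; i++) {
            routes[i].ctrlBitfield = pa_MULTI_ROUTE_REPLACE_SWINFO; /* per-route swInfo0 */
            routes[i].flowId       = 0;               /* placeholder CPPI Rx flow   */
            routes[i].queue        = 900 + i;         /* placeholder per-core queue */
            routes[i].swInfo0      = 0xBBBB0000u | i; /* placeholder ICMP tag       */
        }

        Pa_configMultiRoute (paInst, pa_MULTI_ROUTE_MODE_CONFIG,
                             0,       /* multi-route set index       */
                             8,       /* number of routes in the set */
                             routes, cmdBuf, &cmdSize, reply, &cmdDest);
        /* ... push cmdBuf to the PASS queue given by cmdDest and wait for the
         *     response; the ICMP LUT entry then selects this set by putting
         *     the index into its paRouteInfo_t.mRouteIndex ... */
    }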

    Besides, you may be able to simplify the application's UDP packet processing with the NetCP, since the received packet descriptor contains the offset information to the IP and UDP headers.

    Best regards,

    Eric

  • Eric,

    The problem is that the ICMP header is followed by parts of the IPv4 packet which may or may not contain optional headers prior to the UDP header, so there is no way of knowing how many bytes to skip to get to the UDP header unless one examines the embedded IPv4 header and the subsequent optional headers. A gateway could add these optional headers as the packet traverses the network.

    What do you mean by the different swInfo0? I understand it to mean simplifying processing at the core, as both UDP and ICMP will use the same Rx queue for each core.

    Actually our intent is to have a board with multiple C6678 processors which could share an IP address, so the use of the port number to filter incoming packets is quite beneficial; without it, Shannon DSPs would receive ICMP messages destined for other DSPs, not just for other CorePacs within the current Shannon DSP. In any case, this is what I envision. Have a regular LUT1 rule which redirects packets to continue parsing and be processed through 8 custom LUT2 rules for port-based UDP matching into 8 UDP Rx queues, each with descriptors in a certain region of DDR3. Also have the PASS set up with one custom LUT2 rule to redirect all type 3 ICMP packets to an Rx queue separate from the per-core UDP Rx queues, referring to a region in DDR, and have that serviced by a specific core (say 0) that can decide which core the ICMP packet is for and then redirect it by transmitting it to the appropriate UDP Rx queue using the infrastructure PKTDMA, or discard the incoming packet if it is not for any of the cores. The UDP Rx queue, i.e. the memory region for each core, will then only contain packets for that core. I could also have the ICMP multi-routed to 8 different queues, one for each core, and have a particular core disregard the messages not directed towards it.

    Having the offset information to the UDP or ICMP header will be useful.

    Thanks, Aamir

  • Hi, Aamir:

    The problem is that the ICMP header is followed by parts of the IPv4 packet which may or may not contain optional headers prior to the UDP header, so there is no way of knowing how many bytes to skip to get to the UDP header unless one examines the embedded IPv4 header and the subsequent optional headers. A gateway could add these optional headers as the packet traverses the network.

    [Eric] I understand now. The offset to the UDP port is not fixed, since the egress packet may be updated by the network before it reaches the node where the ICMP packet is generated.

    What do you mean by the different swInfo0? I understand it to mean simplifying processing at the core, as both UDP and ICMP will use the same Rx queue for each core.

    [Eric] Yes, we can recognize the types of packets even if they share the same destination queue, if we specify a unique swInfo0 word in the paRouteInfo_t data structure for each packet type.
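
    For illustration, the receiving core can then pull that word back out of the descriptor with the CPPI LLD accessors. A minimal sketch, assuming the two tag values below are the ones that were programmed into paRouteInfo_t.swInfo0 at configuration time:

    #include <ti/drv/cppi/cppi_desc.h>
    #include <ti/drv/qmss/qmss_drv.h>

    /* Placeholder tags: these must match the swInfo0 values programmed into
     * paRouteInfo_t.swInfo0 for the UDP and ICMP entries (assumed values). */
    #define SWINFO0_UDP_PKT   0xAAAA0000u
    #define SWINFO0_ICMP_PKT  0xBBBB0000u

    void servicePacket (Qmss_QueueHnd rxQ)
    {
        Cppi_HostDesc *desc = (Cppi_HostDesc *) QMSS_DESC_PTR (Qmss_queuePop (rxQ));
        uint32_t       swInfo0;

        if (desc == NULL)
            return;

        /* swInfo0 travels in the descriptor's software-info section */
        swInfo0 = Cppi_getSoftwareInfo0 (Cppi_DescType_HOST, (Cppi_Desc *) desc);

        switch (swInfo0 & 0xFFFF0000u) {
            case SWINFO0_UDP_PKT:   /* RTP/UDP payload for this core  */ break;
            case SWINFO0_ICMP_PKT:  /* ICMP notification              */ break;
            default:                /* unexpected source: drop or log */ break;
        }
    }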

    Actually our intent is to have a board with multiple C6678 processors which could share an IP address, so the use of the port number to filter incoming packets is quite beneficial; without it, Shannon DSPs would receive ICMP messages destined for other DSPs, not just for other CorePacs within the current Shannon DSP. In any case, this is what I envision. Have a regular LUT1 rule which redirects packets to continue parsing and be processed through 8 custom LUT2 rules for port-based UDP matching into 8 UDP Rx queues, each with descriptors in a certain region of DDR3. Also have the PASS set up with one custom LUT2 rule to redirect all type 3 ICMP packets to an Rx queue separate from the per-core UDP Rx queues, referring to a region in DDR, and have that serviced by a specific core (say 0) that can decide which core the ICMP packet is for and then redirect it by transmitting it to the appropriate UDP Rx queue using the infrastructure PKTDMA, or discard the incoming packet if it is not for any of the cores. The UDP Rx queue, i.e. the memory region for each core, will then only contain packets for that core. I could also have the ICMP multi-routed to 8 different queues, one for each core, and have a particular core disregard the messages not directed towards it.

    [Eric] Yes, both methods will work.

    Best regards,

    Eric

  • Eric,

    I had a follow-up question in this thread. Can you please confirm the following?

    For UDP packets, the LUT1 rule does the IP check and UDP protocol check; if it matches, it then continues parsing into the custom LUT2. Does it then go through each of the 8 custom LUT2 rules that I define sequentially until it finds a match, or is it able to look at all of the rules in parallel? Can it also go through just a subset of the rules in the LUT2? I ask because if I could somehow guarantee that ICMP packets have no optional headers, then I could have another LUT1 rule for the ICMP proto type and IP address and another 8 LUT2 rules for looking at the UDP port number for each core, but I do not want those rules to be used if it is a UDP packet.

    Aamir

  • Hi, Aamir:

    I would like to clarify the terminology.

    In the current scenario, there is one LUT1 IP entry, one LUT2 custom rule (type) and eight LUT2 entries.

    LUT1 entry: (IP address with protocol equal to UDP); if matched, continue parsing with custom LUT2 type 0.

    LUT2 entries: each one serves a range of UDP ports and is linked to the IP entry. If matched, deliver the packet to the specified host queue.

    To handle the ICMP packets, you need to add another LUT1 IP entry, another LUT2 custom rule (type) and eight more LUT2 entries.

    LUT1 entry: (IP address with protocol equal to ICMP); if matched, continue parsing with custom LUT2 type 1.

    LUT2 entries: each one serves a range of UDP ports and is linked to the IP entry. If matched, deliver the packet to the specified host queue.

    The LUT2 search is a binary search based on the 32-bit input, which consists of the previous IP link, the custom type number and the masked UDP port number.

    Best regards,

    Eric Ruei

  • Eric,

    Thanks for the clarification.

    If one is to support both IPv4 and IPv6 packets coming in, can you clarify how the rules versus entries work at all levels in the PDSPs/LUTs, as you have done above? I could speculate on what I think it is, but would appreciate your response.

    Thanks, Aamir

  • Hi, Aamir:

    For both IPv4 and IPv6 packets, the PASS processing engine will point to the beginning of the next protocol header, such as UDP or ICMP, after the IP classification is complete. Therefore, the same custom LUT2 type can be used for UDP classification.

    I believe that you will define two LUT2 custom types, one for UDP over IP and another for UDP over ICMP.

    Best regards,

    Eric

  • Eric,

    Thanks for the reply. It is true that the PASS will point to the beginning of the next protocol header after the IP classification is complete, but in the case of ICMP the length of the embedded packet will differ, as it could be either IPv4 or IPv6, and so the LUT2 custom rule (type) for ICMP over IPv4 will be different from the LUT2 custom rule (type) for ICMP over IPv6, even though in all likelihood the 8 custom LUT2 entries, i.e. the port byte matching for ICMP, will be the same for both IPv4 and IPv6. This of course is only applicable to the case where there are no optional headers in the embedded IPv4 or IPv6 packet within the ICMP packet.

    We talked earlier about being able to filter on ports in the UDP header for IPv4. I may misunderstand how the PA really works, but it appears to me, based on the custom LUT1 example code in the PDK directory etc., that the PA may not be able to filter on L4 proto type = UDP for IPv6 if there are additional optional L3 headers. In other words, can the PA continue to skip over optional L3 headers and get to the L4 header? For example, one could have an IPv6 packet with an optional L3 header followed by an L4 UDP header, and so the proto check would fail, as the next header is not UDP but an optional L3 header, and only within the optional L3 header is the next-header field UDP. Also, there could be multiple optional L3 headers. For IPv4 this is not a problem, as the L4 header type is specified in the protocol field within the IPv4 header, but for IPv6 that is not the case. Although for IPv4, what may also not be known is the number of bytes one has to skip to get to the UDP header, as that is contained in the header length field in the IPv4 header. Can you confirm one way or the other?

    If the UDP-based filtering can work for IPv6, then I envision something along the lines of:

    1. Check the MAC address using PDSP0 from entry 0 in LUT1. If matched, continue parsing.

    2. Check the IP address using PDSP1 from entry 1 in LUT1. Also check that ipProto is UDP, irrespective of IPv4 or IPv6. If matched, continue parsing with the custom LUT2 type 0 rule, using 2 bytes for the destination port filtering mask and specifying the two offsets to those bytes. Previous handle linking is true, so it uses the 4th byte, leaving one matching byte unused.

    3. 8 LUT2 entries, each one a range of UDP ports per core, linked to item 2 above. If matched, send to the appropriate core queue number. If not matched, discard.

    4. Check the IP address using PDSP1 from entry 2 in LUT1. Also check that ipProto is ICMP and the packet type is IPv4. If matched, continue parsing with the custom LUT2 type 1 rule, using one byte to match the header length in the embedded IPv4 (if there are no optional headers then the header length should be 5) and 2 bytes for the source port filtering mask. Previous handle linking is true, so it uses the 4th and last byte.

    5. 8 LUT2 entries, each one a range of UDP ports per core, linked to item 4. These are the same port number ranges, but I guess they should be a different 8 entries as they are tied to the previous stage. If matched, send to the appropriate core queue number. If not matched, multi-route to all core queues, as the packet may have optional headers, so it is up to the DSP core to process the packet further and check whether it is for that core.

    6. Check the IP address using PDSP1 from entry 3 in LUT1. Also check that ipProto is ICMP and the packet type is IPv6. If matched, continue parsing with the custom LUT2 type 2 rule, using one byte to match the next-header field in the embedded IPv6 (if there are no optional headers then the next header should be UDP) and 2 bytes for the source port filtering mask. Previous handle linking is true, so it uses the 4th and last byte.

    7. 8 LUT2 entries, each one a range of UDP ports per core, linked to item 6. This is again the same port range as item 5, but they should be a different 8 entries as they are tied to the previous stage. If matched, send to the appropriate core queue number. If not matched, multi-route to all core queues, as the packet may have optional headers, so it is up to the DSP core to process the packet further and check whether it is for that core.

    NextRtFail is defined as "where to send a packet that matches here but fails the next parse level" in addCustomLUT1. So I should set up the nextRtFail for multi-route, i.e. to all DSP core queues, for both items 4 and 7 above.

    Please confirm that the Rx queue number that is used for the configuration (slide 20, step 6 in the PA KeyStone training) does not need to be the same as the Rx queue number for the actual receive (i.e. step 5 in slide 21)? Our intent is to have all the packets in a given time period for a particular core received and stored in external memory. The host can then grab them one by one into each core's L2 and process until there are no more packets in that time period.

    We could have packets arrive at a particular core via the infrastructure PKTDMA (core-to-core), via SRIO from another DSP or entity, or via Ethernet from another DSP or entity. Can we use the same Rx queue number that is used for the Ethernet packets going from the PA to the host? That way, the Rx free descriptor queue that we create will at start-up contain the number of entries for the packets that we expect in a given configurable time period, and the external memory buffer will hold packets from another DSP or entity via core-to-core, SRIO or Ethernet transfer. Am I right to understand that from the descriptors we should be able to differentiate between these 3 methods of transfer?

    I remember you mentioning that a core can use any of the 9 PA channels for Tx. Does that mean that any of the 9 Tx queues will just make use of any available channel, or are the channel numbers tied to a particular queue? The reason I ask is that in the KeyStone training for the Multicore Navigator, slide 10, the queue numbers are mapped to channel numbers for the SRIO Tx, which is the latter option. On the transmit side, as we could transmit via different mechanisms (core-to-core, SRIO, PA), we would need to use different queues for each method, and so would need different Tx free queues, and so we would have to increase the number of buffers/descriptors to account for the multiple uses by the 3 Tx and 3 Tx free queues. Is my understanding correct?

    I can take these questions offline if need be to provide more clarity. I very much appreciate your help in answering my questions.

    Thanks, Aamir

  • Hi, Aamir:

    Please read my answers in line.

    Thanks for the reply. It is true that the PASS will point to the beginning of the next protocol header after the IP classification is complete, but in the case of ICMP the length of the embedded packet will differ, as it could be either IPv4 or IPv6, and so the LUT2 custom rule (type) for ICMP over IPv4 will be different from the LUT2 custom rule (type) for ICMP over IPv6, even though in all likelihood the 8 custom LUT2 entries, i.e. the port byte matching for ICMP, will be the same for both IPv4 and IPv6. This of course is only applicable to the case where there are no optional headers in the embedded IPv4 or IPv6 packet within the ICMP packet.

    [Eric] Yes, you are right!

    We talked earlier about being able to filter on ports in the UDP header for IPv4. I may misunderstand how the PA really works, but it appears to me, based on the custom LUT1 example code in the PDK directory etc., that the PA may not be able to filter on L4 proto type = UDP for IPv6 if there are additional optional L3 headers. In other words, can the PA continue to skip over optional L3 headers and get to the L4 header? For example, one could have an IPv6 packet with an optional L3 header followed by an L4 UDP header, and so the proto check would fail, as the next header is not UDP but an optional L3 header, and only within the optional L3 header is the next-header field UDP. Also, there could be multiple optional L3 headers. For IPv4 this is not a problem, as the L4 header type is specified in the protocol field within the IPv4 header, but for IPv6 that is not the case. Although for IPv4, what may also not be known is the number of bytes one has to skip to get to the UDP header, as that is contained in the header length field in the IPv4 header. Can you confirm one way or the other?

    [Eric] The PDSP firmware handles the variable-size IPv4 header and the IPv6 extension headers. The lookup is based on the L4 protocol type, and the packet offset will point to the beginning of the L4 header.

    If the UDP-based filtering can work for IPv6, then I envision something along the lines of:

    1. Check the MAC address using PDSP0 from entry 0 in LUT1. If matched, continue parsing.

    2. Check the IP address using PDSP1 from entry 1 in LUT1. Also check that ipProto is UDP, irrespective of IPv4 or IPv6. If matched, continue parsing with the custom LUT2 type 0 rule, using 2 bytes for the destination port filtering mask and specifying the two offsets to those bytes. Previous handle linking is true, so it uses the 4th byte, leaving one matching byte unused.

    3. 8 LUT2 entries, each one a range of UDP ports per core, linked to item 2 above. If matched, send to the appropriate core queue number. If not matched, discard.

    4. Check the IP address using PDSP1 from entry 2 in LUT1. Also check that ipProto is ICMP and the packet type is IPv4. If matched, continue parsing with the custom LUT2 type 1 rule, using one byte to match the header length in the embedded IPv4 (if there are no optional headers then the header length should be 5) and 2 bytes for the source port filtering mask. Previous handle linking is true, so it uses the 4th and last byte.

    5. 8 LUT2 entries, each one a range of UDP ports per core, linked to item 4. These are the same port number ranges, but I guess they should be a different 8 entries as they are tied to the previous stage. If matched, send to the appropriate core queue number. If not matched, multi-route to all core queues, as the packet may have optional headers, so it is up to the DSP core to process the packet further and check whether it is for that core.

    6. Check the IP address using PDSP1 from entry 3 in LUT1. Also check that ipProto is ICMP and the packet type is IPv6. If matched, continue parsing with the custom LUT2 type 2 rule, using one byte to match the next-header field in the embedded IPv6 (if there are no optional headers then the next header should be UDP) and 2 bytes for the source port filtering mask. Previous handle linking is true, so it uses the 4th and last byte.

    7. 8 LUT2 entries, each one a range of UDP ports per core, linked to item 6. This is again the same port range as item 5, but they should be a different 8 entries as they are tied to the previous stage. If matched, send to the appropriate core queue number. If not matched, multi-route to all core queues, as the packet may have optional headers, so it is up to the DSP core to process the packet further and check whether it is for that core.

    [Eric] Yes, the design is good and it should work.

    NextRtFail is defined as "where to send a packet that matches here but fails the next parse level" in addCustomLUT1. So I should set up the nextRtFail for multi-route, i.e. to all DSP core queues, for both items 4 and 7 above.

    [Eric] Yes.

    Please confirm that the Rx queue number that is used for the configuration (slide 20, step 6 in the PA KeyStone training) does not need to be the same as the Rx queue number for the actual receive (i.e. step 5 in slide 21)? Our intent is to have all the packets in a given time period for a particular core received and stored in external memory. The host can then grab them one by one into each core's L2 and process until there are no more packets in that time period.

    [Eric] Yes, they are two different queues. The first one is the command response queue and the second one is the destination queue for the matched packets.

    We could have packets arrive at a particular core via the infrastructure PKTDMA (core-to-core), via SRIO from another DSP or entity, or via Ethernet from another DSP or entity. Can we use the same Rx queue number that is used for the Ethernet packets going from the PA to the host? That way, the Rx free descriptor queue that we create will at start-up contain the number of entries for the packets that we expect in a given configurable time period, and the external memory buffer will hold packets from another DSP or entity via core-to-core, SRIO or Ethernet transfer. Am I right to understand that from the descriptors we should be able to differentiate between these 3 methods of transfer?

    [Eric] Yes, you can configure the routing information and distinguish the packet source based on the software info word 0 (swInfo0).

    I remember you mentioning that a core can use any of the 9 PA channels for Tx. Does that mean that any of the 9 Tx queues will just make use of any available channel, or are the channel numbers tied to a particular queue? The reason I ask is that in the KeyStone training for the Multicore Navigator, slide 10, the queue numbers are mapped to channel numbers for the SRIO Tx, which is the latter option. On the transmit side, as we could transmit via different mechanisms (core-to-core, SRIO, PA), we would need to use different queues for each method, and so would need different Tx free queues, and so we would have to increase the number of buffers/descriptors to account for the multiple uses by the 3 Tx and 3 Tx free queues. Is my understanding correct?

    [Eric] No, the 9 CPPI Tx channels are used by the 9 PA Tx queues respectively. The 24 PA CPPI Rx channels are used by the PA receive operation based on their availability. In general, you do not need to worry about the CPPI Tx/Rx channels; you just need to enable them all.

    I can take these questions offline if need be to provide more clarity. I very much appreciate your help in answering my questions.

    Thanks, Aamir

  • Eric,

    Thanks for your answers.

    I remember you mentioning that a core can use any of the 9 PA channels for Tx. Does that mean that any of the 9 Tx queues will just make use of any available channel, or are the channel numbers tied to a particular queue? The reason I ask is that in the KeyStone training for the Multicore Navigator, slide 10, the queue numbers are mapped to channel numbers for the SRIO Tx, which is the latter option. On the transmit side, as we could transmit via different mechanisms (core-to-core, SRIO, PA), we would need to use different queues for each method, and so would need different Tx free queues, and so we would have to increase the number of buffers/descriptors to account for the multiple uses by the 3 Tx and 3 Tx free queues. Is my understanding correct?

    [Eric] No, the 9 CPPI Tx channels are used by the 9 PA Tx queues respectively. The 24 PA CPPI Rx channels are used by the PA receive operation based on their availability. In general, you do not need to worry about the CPPI Tx/Rx channels; you just need to enable them all.

    Okay, so the 9 channels serve the 9 Tx queues respectively. Your statement about just enabling all the channels is not fully comforting :-). Regarding the second part of my question above, given that a core can transmit using any of the three ways I mentioned, I am assuming a different queue number and queue is needed for each of the three methods for each of the cores? Also, can we do with one, or do we need 3 Tx free queues? So core 1 would use PA Tx queue 1, SRIO Tx queue 1 and core-to-core PKTDMA queue 2, and core 2 would use the corresponding queues 2, etc. They would all grab a buffer and descriptor from the Tx FDQ (actually the completion queue) that they share, and once a packet has been transmitted it is returned to the Tx FDQ for use by any of the 3 transmit procedures.

    A general question: when one continues parsing from L3 classify engine 0, i.e. PDSP1, to L3 classify engine 1, i.e. PDSP2, how exactly does that work for IPsec/tunneling, which appears to have an IP header embedded within an IP header? Is there any example code or document describing how this works in the KeyStone PA?

    Thanks, Aamir

  • Hi, Aamir:

    I believe that there is some misunderstanding here. The 9 PA Tx queues connect to 9 different endpoints respectively, as described in the training slide "PA LLD: Rx Configuration":

    Queues 640-645: PDSP0-PDSP5 within the PASS

    Queues 646-647: SA (Security Accelerator)

    Queue 648: GbE switch

    The master core should enable all Tx channels, and any core can transmit packets to the GbE switch by pushing the CPPI packet onto queue 648. The PASS does not include the SRIO queue or the core-to-core PKTDMA queue, but I believe those are common resources and can be used by any core, too.
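
    As a compact reference, that queue map can be captured in code; the enum names here are placeholders, and the per-PDSP roles are the ones described elsewhere in this thread:

    /* PASS Tx queue map on KeyStone I, as listed above (names are placeholders) */
    enum PassTxQueue {
        PASS_Q_PDSP0 = 640,   /* PDSP0: MAC classification        */
        PASS_Q_PDSP1 = 641,   /* PDSP1: outer IP / IPSEC SPI      */
        PASS_Q_PDSP2 = 642,   /* PDSP2: inner IP                  */
        PASS_Q_PDSP3 = 643,   /* PDSP3: LUT2 (TCP/UDP)            */
        PASS_Q_PDSP4 = 644,   /* PDSP4: ingress packet operations */
        PASS_Q_PDSP5 = 645,   /* PDSP5: egress packet operations  */
        PASS_Q_SA0   = 646,   /* Security Accelerator             */
        PASS_Q_SA1   = 647,   /* Security Accelerator             */
        PASS_Q_GBE   = 648    /* GbE switch (to the wire)         */
    };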

    Here is the Rx flow for an IPSEC ESP tunnel packet: PDSP0 (MAC) ==> PDSP1 (outer IP and IPSEC SPI) ==> SA (IPSEC decryption and authentication) ==> PDSP2 (inner IP) ==> PDSP3 (UDP) ==> host.

    The SA (Security Accelerator) user guide is available on the TI web page for the C6678.

    If I have answered your questions, could you please click the "Verify Answer" button?

    Best regards,

    Eric

  • Hi Eric,

    Thanks for clarifying the mixup about the PA Tx queues. I get it now; I was not paying attention to the queue numbers in the training slide :).

    If I want to Tx a packet from a core to queue 644 and then on to 648, I will need to set up 644 to forward to 648 after it completes its processing.

    I was talking about the SRIO queue and the core-to-core queue because I could transmit packets from one core to another through various means, but I get it. I could, if I want, have each core use one of the SRIO transmit queues in a dedicated way, or have all the cores use any of the queues.

    Thanks for the IPsec flow answer.

    I believe you have answered my questions.

    Thanks very much,

    Aamir

  • Hi, Aamir:

    You are welcome!

    If you want PDSP4 to perform some operation, such as UDP checksum, and then forward the packet to the GbE switch, you just need to send the packet to queue 644 with the UDP checksum command and the Next Route command, which specifies the next destination as pa_DEST_EMAC. Please refer to the PA LLD Doxygen, pa.h and test3.c in the PA unit test for details. By the way, I recommend using PDSP5 (queue 645) for egress packet operations, since all extra processing for ingress packets is executed at PDSP4.
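
    A minimal sketch of building those two commands with Pa_formatTxCmd() follows; the paCmdInfo_t field names are taken from the PA LLD Doxygen and should be verified against pa.h, the offsets assume a plain MAC + IPv4 header with no VLAN tag, and the pseudo-header checksum handling should be checked against test3.c.

    #include <ti/drv/pa/pa.h>
    #include <string.h>

    /* Sketch: UDP-checksum command plus next-route command for an egress
     * packet that is pushed to a PASS PDSP queue (e.g. 645) and then
     * forwarded to the GbE switch. */
    int buildEgressCmds (uint8_t *psDataBuf, uint16_t *cmdSize, uint16_t udpLen)
    {
        paCmdInfo_t cmds[2];
        memset (cmds, 0, sizeof(cmds));

        /* Command 1: let the PASS compute the UDP checksum */
        cmds[0].cmd                        = pa_CMD_TX_CHECKSUM;
        cmds[0].params.chksum.startOffset  = 34;     /* MAC (14) + IPv4 (20)   */
        cmds[0].params.chksum.lengthBytes  = udpLen; /* UDP header + payload   */
        cmds[0].params.chksum.resultOffset = 6;      /* checksum field in UDP  */
        cmds[0].params.chksum.initialSum   = 0;      /* pseudo-header sum here */
        cmds[0].params.chksum.negative0    = 1;      /* 0 becomes 0xFFFF       */

        /* Command 2: after the checksum, send the packet out of the EMAC */
        cmds[1].cmd               = pa_CMD_NEXT_ROUTE;
        cmds[1].params.route.dest = pa_DEST_EMAC;

        /* The formatted commands land in the Tx descriptor's PS data section */
        return (Pa_formatTxCmd (2, cmds, 0, (void *) psDataBuf, cmdSize) == pa_OK)
               ? 0 : -1;
    }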

    I am not familiar with SRIO and IPC. I recommend starting another thread, where you will be served by the TI engineers who know those areas best.

    If I have answered your questions, please click "Verify Answer" below.

    Best regards,

    Eric

  • Hi,

    I'm working on the PA_UnitTest_K2KC66BiosTestProject example now and have focused on the SCTP checksum.

    In that example project, configuring the CRC engine for the SCTP checksum is mentioned. However, I could not find it; I could only see the examples for the UDP and IPv4 checksums.

    In this post you mentioned that there is some limited support for SCTP and GTP-U. So how can I configure the PA for the SCTP checksum?

    Regards

    Anil

  • Hi, Anil:

    The SCTP checksum of the inner IP is demonstrated in test4.c (paTestL3Routing).

    /* CRC Configuration of CRC-32C for SCTP */
    static paCrcConfig_t   t4CrcCfg = {
                                        pa_CRC_CONFIG_INVERSE_RESULT |
                                        pa_CRC_CONFIG_RIGHT_SHIFT,          /* ctrlBitfield */
                                        pa_CRC_SIZE_32,
                                        0x1EDC6F41,                         /* polynomial */
                                        0xFFFFFFFF                          /* initValue */
                                      };

        /*
         * Configure the CRC engine for SCTP CRC-32C checksum
         * The CRC-engine connected to PDSP2 should be configured since the SCTP is within the
         * inner IP payload, which is parsed and looked up at PDSP2
         */
      cmdReply.replyId = T4_CMD_SWINFO0_CRC_CFG_ID; 
      cmdReply.queue = t4Encap.tf->QGen[Q_CMD_REPLY];
       
        hd[0] = testCommonConfigCrcEngine(t4Encap.tf, 2, &t4CrcCfg,  t4Encap.tf->QGen[Q_CMD_RECYCLE], t4Encap.tf->QLinkedBuf3,
                                          &cmdReply, &cmdDest, &cmdSize, &paret);

    ...

    If you need to do the SCTP checksum for the outer IP, you need to configure CRC engine 1 as well.

    The test packet (packet 3) is defined at test4pkts.h

    /* packet 3
     * mac dest = 00:01:02:03:04:aa (MAC Info 0)
     * out ip dest = 71.72.73.70 (out IP info 23)
     * inner ip dest = 10.11.12.13 (inner IP info 2)
     * inner ip src  = 10.11.12.13
     * inner ip protocol = 132 (0x84) (SCTP)
     * Designed to match inner IP configuration 2 */
    #pragma DATA_SECTION (pkt3, ".testPkts")
    static uint8_t pkt3[] = {
     0x00, 0x01, 0x02, 0x03, 0x04, 0xaa, 0x00, 0xe0,
     0xa6, 0x66, 0x57, 0x04, 0x08, 0x00, 0x45, 0x00,
     0x00, 0x64, 0x00, 0x00, 0x00, 0x00, 0x05, 0x04,
     0x19, 0x23, 0x9e, 0xda, 0x6d, 0x0b, 0x47, 0x48,
     0x49, 0x46, 0x45, 0x00, 0x00, 0x50, 0x00, 0x00,
     0x00, 0x00, 0x05, 0x84, 0x88, 0xfb, 0x0a, 0x0b,
     0x0c, 0x0d, 0x0a, 0x0b, 0x0c, 0x0d,
        0x8e, 0x3d, 0xd7, 0x5d, 0x45, 0x23, 0x05, 0x4f, /* SCTP source port, dest port and tag */             
        0x5f, 0x12, 0x9f, 0xa6, 0x05, 0x00, 0x00, 0x30, /* CRC-32C reference 0x5f129fa6 */
        0x00, 0x01, 0x00, 0x2c, 0x02, 0x00, 0x8e, 0x3d,
        0xc0, 0xa8, 0x01, 0x0d, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00,
        0x40, 0x43, 0x22, 0xdf, 0x56, 0x01, 0x00, 0x00,
        0x40, 0x69, 0x57, 0x00, 0xb6, 0x51, 0xe7, 0xc2,
        0xcc, 0x82, 0x90, 0xf9 };

    Best regards,

    Eric

  • Hi Eric,

    Thanks for the reply. I just want to know the difference between the inner IP and the outer IP. In test4.c (paTestL3Routing), which you mentioned, there is a 'triple layered IP headers' situation, as commented. Does inner IP mean the 'to network' IP and outer IP the 'from network' IP, or is the IP-in-IP protocol used there? What is the reason for configuring PDSP2 to do the SCTP checksum for the inner IP and PDSP1 for the outer IP?

    Regards

    Anil 

  • Hi, Anil:

    The PASS is designed to parse the input packet at multiple processing stages as follows:
    PDSP0: MAC
    PDSP1: Outer IP (or single IP)
    PDSP2: Inner IP (IPinIP)
    PDSP3: TCP/UDP

    There is a CRC engine associated with each processing stage, so the application needs to configure the corresponding CRC engine based on where the CRC operation occurs.
    For a MAC/IP/SCTP packet, the SCTP checksum needs to be verified at PDSP1 (CRC engine index = 1).
    For a MAC/IP/IP/SCTP packet, the SCTP checksum needs to be verified at PDSP2 (CRC engine index = 2).

    The PASS is designed to handle both IP and IP-in-IP packets, and it is able to handle more IP layers, in which case the application needs to configure the second IP match route to send the packet back to PDSP2 through an external queue. If this is one of your use cases, please share the use case in detail so that we can figure out how to configure the PASS to handle it.

    Best regards,

    Eric


  • Hi Eric

    Thanks for the explanation. I was just curious about it; I do not need to configure the PA for an IPinIP application for now.

    I am now working on the Rx checksum operation; however, I did not see any test code for it, and I also found a comment at the Pa_formatTxCmd function saying 'Support CRC calculation only in tx direction'. Should I use the Pa_configCmdSet function for the Rx checksum operation, with the paCmdInfo_t structure (which has CRC operation command specific parameters) as the configuration, like in test2.c? Could you please share an IPv4 checksum configuration for the from-network direction? My example packet is shown below.

    Anil

    uint8_t rx_IpV4_checksum_pkt[] = {

        /* MAC header */
        0x00, 0x01, 0x02, 0x03, 0x04, 0xaa,
        0x00, 0xe0, 0xa6, 0x66, 0x57, 0x04,
        0x08, 0x00,

        /* IP header */
        0x45, 0x00,
        0x00, 0x6c, /* Length (including this header) */
        0x00, 0x00, 0x00, 0x00, 0x05, 0x11,
        0xa5, 0x97, /* Header checksum -> after the Rx checksum verification I guess this field should be 0x00, 0x00 */
        0x9e, 0xda, 0x6d, 0x0a, 0x01, 0x02, 0x03, 0x04,

        /* UDP header */
        0x12, 0x34, 0x05, 0x55,
        0x00, 0x58, /* Length, including this header */
        0x00, 0x00, /* Header checksum */

        /* Payload */
        0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39,
        0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, 0x40, 0x41,
        0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49,
        0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50, 0x51,
        0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59,
        0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f, 0x60, 0x61,
        0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69,
        0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, 0x70, 0x71,
        0x72, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79,
        0x7a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, 0x80, 0x81
    };

  • Hi, Anil:

    The good news is that you do not need to configure the PASS for ingress IPv4 checksum, because the PASS automatically performs the IPv4 header checksum and UDP/TCP checksum during the header parsing operation. If a checksum error occurs, the CPPI error flags will be set as described in the PA LLD doxygen and below:

     /**
      *  @page appendix2 CPPI Error Flags
      *
      *  The sub-system performs IPv4 header checksum, UDP/TCP checksum and SCTP CRC-32c checksum autonomously.
      *  The sub-system can also perform the CRC verification for incoming packet as one of the actions specified
      *  by the post-classification command set.
      *
      *  The checksum and CRC verification results are recorded at the 4-bit error flags in the CPPI packet descriptor
      *  as described below:
      *  @li bit 3: IPv4 header checksum error
      *  @li bit 2: UDP/TCP or SCTP CRC-32c checksum error
      *  @li bit 1: Custom CRC checksum error
      *  @li bit 0: reserved
      *
      */

    Please refer to PA LLD unit test "paTestTxFormatRoute" (test3.c) for a simple test case.
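
    For example, on the receive side the flags could be checked roughly like this (a minimal sketch assuming the QMSS/CPPI LLD accessors Qmss_queuePop, QMSS_DESC_PTR and Cppi_getDescError; rxQ and the error handling are placeholders):

        /* Pop a received packet and inspect the 4-bit CPPI error flags */
        Cppi_HostDesc *hd = (Cppi_HostDesc *)QMSS_DESC_PTR(Qmss_queuePop(rxQ));
        uint32_t errFlags = Cppi_getDescError(Cppi_DescType_HOST, (Cppi_Desc *)hd);

        if (errFlags & (1 << 3))   /* bit 3: IPv4 header checksum error */
            ;                      /* drop or count the packet */
        if (errFlags & (1 << 2))   /* bit 2: UDP/TCP or SCTP CRC-32c checksum error */
            ;                      /* drop or count the packet */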

    Best regards,

    Eric


  • Hi Eric

    Thanks for the good news. I checked the checksum results via the error flags for the ingress scenario and handled that situation.

    I have some questions about the patching operation too. I referred to test6.c in the Pa_UnitTest_K2KC66BiosTestProject example project and tried to understand the patching capability of the PA.

    The example in test6 includes Tx patching operations with the 'Pa_formatTxCmd' and 'Pa_formatRoutePatch' functions. Is there any way to perform patching on packets that come from the network too? Could I patch the packets at any stage in the 'from network' direction, such as after a LUT1 match and/or a LUT2 match?

    There is a control bit field (ctrlBitfield) for the packet patching configuration, with definitions such as 'pa_PATCH_OP_INSERT', 'pa_PATCH_OP_MAC_HDR' and 'pa_PATCH_OP_DELETE'. I used all of these configuration parameters but did not see the difference between pa_PATCH_OP_MAC_HDR and pa_PATCH_OP_INSERT. How should I use pa_PATCH_OP_MAC_HDR; is it for ARP?

    Best Regards

    Anil

  • Hi, Anil:

    Please refer to pa.h or the PA LLD doxygen for the PA LLD API details:

    /**
     *  @def  pa_PATCH_OP_INSERT
     *        Control Info -- Set: Insert data into the packet
     *                        Clear: Patch data replaces existing packet data
     */
    #define pa_PATCH_OP_INSERT                 0x0001
    /**
     *  @def  pa_PATCH_OP_MAC_HDR
     *        Control Info -- Set: Replace MAC header with patch data
     *                        Clear: Normal Patch/Insert operation
     */
    #define pa_PATCH_OP_MAC_HDR                0x0002
    /**
     *  @def  pa_PATCH_OP_DELETE
     *        Control Info -- Set: Delete data in the packet
     *                        Clear: Normal Patch/Insert operation
     */
    #define pa_PATCH_OP_DELETE                 0x0004
    /*@}*/
    /** @} */


    /**
     *  @ingroup palld_api_structures
     *  @brief  Packet patching configuration
     *
     *  @details paPatchInfo_t is used to create data patch command. The patch command is used to patch
     *           existing data or insert data in the packet in both to-network and from-network directions.
     *
     *           In the to-network direction, it can be used to patch the authentication tag provided by SASS
     *           into the AH header within the packet. In this case, the patch data is not present at the command
     *           when it is formatted and it is appended by the SASS. The @ref Pa_formatRoutePatch is used to create
     *           a command block along with a packet routing command to forward the packet after the patch is complete
     *
     *           In the from-network direction, it can be used to insert up to 32 bytes to the offset location
     *           as part of the command set to be executed after a LUT1 or LUT2 match.
     *           This command can be used to patch the entire MAC header for MAC router functionality. It may be further
     *           enhanced and combined with other commands to support IP forwarding operation in the future.
     *           A short version of the patch command can be used to insert up to 2 bytes into the current parsing
     *           location of the packet after a LUT2 match.
     */

    typedef struct {

      uint16_t   ctrlBitfield;      /**<  Patch operation control information as defined at @ref patchCtrlInfo */
      uint16_t   nPatchBytes;       /**<  The number of bytes to be patched */
      uint16_t   totalPatchSize;    /**<  The number of patch bytes in the patch command, must be >= to nPatchBytes and a multiple of 4 bytes */
      uint16_t   offset;            /**<  Offset from the start of the packet for the patch to begin in the to-network direction
                                          Offset from the start of the current header for the patch to begin in the from-network direction */
      uint8_t    *patchData;        /**<  Pointer to the patch data */

    } paPatchInfo_t;
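
    For example, a from-network MAC header patch delivered through a command set could look roughly like the following sketch (the command-set index, the buffers and the new header bytes are all hypothetical; see the unit tests below for complete code):

        /* Hypothetical replacement MAC header: dest MAC, src MAC, ethertype */
        static uint8_t newMacHeader[14] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0xbb,
                                            0x00, 0xe0, 0xa6, 0x66, 0x57, 0x04,
                                            0x08, 0x00 };
        paCmdInfo_t cmdSet[1];

        cmdSet[0].cmd                         = pa_CMD_PATCH_DATA;
        cmdSet[0].params.patch.ctrlBitfield   = pa_PATCH_OP_MAC_HDR; /* replace the entire MAC header */
        cmdSet[0].params.patch.nPatchBytes    = 14;
        cmdSet[0].params.patch.totalPatchSize = 16;  /* >= nPatchBytes, multiple of 4 */
        cmdSet[0].params.patch.offset         = 0;
        cmdSet[0].params.patch.patchData      = newMacHeader;

        /* Download the command set as index 5; it is later attached to a
         * LUT1/LUT2 match route via a pa_CMD_CMDSET entry in routeInfo.pCmd */
        paret = Pa_configCmdSet(paInst, 5, 1, cmdSet, cmdBuf, &cmdSize, &cmdReply, &cmdDest);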

    Please refer to the PAUnitTest test2 and test7 for some examples.

    Best regards,

    Eric


  • Hi Eric

    Thanks for reply again. I successfully configured related PDSPs for patching operations for Tx and Rx.

    Now I am working on SCTP-based classification. Do I need to configure PDSP2 and add an entry to the lookup table for that operation, as I did for UDP, or is it enough to give the SCTP destination port and protocol number with the IP configuration?

    typedef struct {
        paIpAddr_t src;       /**< Source IP address */
        paIpAddr_t dst;       /**< Destination IP address */
        uint32_t   spi;       /**< ESP or AH header Security Parameters Index */
        uint32_t   flow;      /**< IPv6 flow label in 20 lsbs */
        int        ipType;    /**< @ref IpValues */
        uint16_t   greProto;  /**< GRE protocol field */
        uint8_t    proto;     /**< IP Protocol (IPv4) / Next Header (IPv6) */
        uint8_t    tos;       /**< IP Type of Service (IPv4) / Traffic class (IPv6) */
        uint16_t   tosCare;   /**< TRUE if the tos value is used for matching */
        uint16_t   sctpPort;  /**< SCTP Destination Port */
    } paIpInfo_t;

    Best regards 

    Anil

  • Hi, Anil:

    You do not need to consider which PDSP to configure. The LLD API will determine the command destination (cmdDest) for you. For example:
    MAC entry: PDSP0 (LUT1_0)
    IP entry with L2 link or no link: PDSP1 (LUT1_1)
    IP entry with L3 link: PDSP2 (LUT1_2)

    The caller can override the default destination by setting the parameter lutInst to the desired LUT instance.

    To specify an IP/SCTP entry, the caller can specify both the IP parameters and the SCTP destination port number. If you want to send all IP/SCTP packets to a single queue, you can just set the IP protocol number (proto) to SCTP (132).
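
    For example, an IP + SCTP destination port entry could be added roughly as follows (a minimal sketch; the queue/flow numbers, SCTP port, buffers and l2Handle are placeholders, and the exact Pa_addIp signature and *_NOT_SPECIFIED constants should be checked against the pa.h in your release):

        paIpInfo_t    ipInfo;
        paRouteInfo_t matchRoute, nfailRoute;

        memset(&ipInfo, 0, sizeof(ipInfo));
        ipInfo.ipType      = pa_IPV4;
        ipInfo.dst.ipv4[0] = 10;  ipInfo.dst.ipv4[1] = 11;
        ipInfo.dst.ipv4[2] = 12;  ipInfo.dst.ipv4[3] = 13;
        ipInfo.proto       = 132;     /* IP protocol number = SCTP */
        ipInfo.sctpPort    = 3868;    /* SCTP destination port (hypothetical) */

        memset(&matchRoute, 0, sizeof(matchRoute));
        matchRoute.dest        = pa_DEST_HOST;
        matchRoute.mRouteIndex = pa_NO_MULTI_ROUTE;
        matchRoute.flowId      = 0;   /* hypothetical CPPI flow */
        matchRoute.queue       = 900; /* hypothetical host queue */
        nfailRoute             = matchRoute;
        nfailRoute.dest        = pa_DEST_DISCARD;

        paret = Pa_addIp(paInst, pa_LUT_INST_NOT_SPECIFIED, pa_LUT1_INDEX_NOT_SPECIFIED,
                         &ipInfo, l2Handle, &matchRoute, &nfailRoute, &l3Handle,
                         cmdBuf, &cmdSize, &cmdReply, &cmdDest);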

    Best regards,

    Eric


  • Hi Eric

    Thanks for the reply. I see that I do not need to configure the related PDSP. Now I have a question about the from-network direction.

    In the from-network direction, the PA can execute commands like patching, multi-routing, etc. For example, I saw that the command buffers for multi-routing are created at PDSP5, but while data packets are flowing, the log shows that the packets are forwarded to PDSP4 for that operation. What is the reason for this? How should I use PDSP4 and PDSP5 for these operations?

    I got some questions about Gigabit Ethernet Switch Subsystem, too.

    Could the '1 Gb Ethernet Switch Subsystem in NETCP' forward packets directly to the host instead of forwarding them to the PA? Are all packets from the network always forwarded to the PDSP0 HW queue?

    Best Regards

    Anil

  • Hi, Anil:

    Some configuration command packets can be processed at any PDSP, and the PA LLD directs those command packets to PDSP5 since it is lightly used. The data packets themselves are forwarded to whichever PDSP executes the requested operation, which is why you see the multi-routed packets at PDSP4.

    Yes, the CPSW can forward packets to host queues without going through the PASS. The application should call Pa_create() with initDefaultRoute set to FALSE, or not call Pa_create() at all. The application should configure PASS CPPI flows 22 and 23 for packets from EMAC port 0 and port 1 respectively.
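
    For example, one of those flows could be set up with the CPPI LLD roughly as follows (a minimal sketch; cppiHnd is assumed to be an open handle to the PASS CPPI DMA, and the queue numbers are hypothetical):

        Cppi_RxFlowCfg flowCfg;
        uint8_t        isAllocated;

        memset(&flowCfg, 0, sizeof(flowCfg));
        flowCfg.flowIdNum        = 22;   /* packets from EMAC port 0 */
        flowCfg.rx_dest_qnum     = 900;  /* hypothetical host destination queue */
        flowCfg.rx_dest_qmgr     = 0;
        flowCfg.rx_desc_type     = Cppi_DescType_HOST;
        flowCfg.rx_fdq0_sz0_qnum = 737;  /* hypothetical free-descriptor queue */
        flowCfg.rx_fdq0_sz0_qmgr = 0;

        Cppi_FlowHnd flowHnd = Cppi_configureRxFlow(cppiHnd, &flowCfg, &isAllocated);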

    As a general concept, the PA LLD is designed to let the user application configure and use the PASS, which can be considered a black box. The APIs provided by the PA LLD should be sufficient for using the PASS. It is good to know the basic functional partitioning of the PASS, but it is not required.

    Best regards,

    Eric


  • Hi Eric

    Thanks for the reply and suggestions. I ran into some problems while working on the IP fragmentation & reassembly operation. I referred to test 10 and test 11 in PA_UnitTest_K2KC66BiosTestProject. Does the PA have any check for IP fragmentation, i.e. that the packet size is bigger than the MTU size so the packet should be fragmented? Or should I do that check in software by comparing the MTU size against the packet length (IP packet length field + MAC header length)?

    For reassembly, does the PA automatically detect that a packet from the network is fragmented? Is any PA configuration needed? For reassembly, the PA forwards the packets to the host, I guess.

    Best Regards

    Anil 

  • Hi, Anil:

    Please refer to the PA LLD API header file pa.h or the doxygen for general information.

    PASS-assisted IP Reassembly operation
    - pa.h appendix3 PA-assisted IP Reassembly Operation

    /**
     *  @ingroup palld_api_structures
     *  @brief  IP Reassembly Configuration Information.
     *
     *  @details  paIpReassmConfig_t is used to configure the PA-assisted IP reassembly operation. Two separate structures are used
     *            for the outer IP and inner IP respectively. The IP reassembly assistance feature is disabled until
     *            this information is provided. See section @ref appendix3 for a detailed description.
     *  @note The maximum number of traffic flows is limited due to processing time and internal memory restriction.
     */
    typedef struct {

      uint8_t numTrafficFlow; /**< Maximum number of IP reassembly traffic flows supported, default = 0, maximum = 32 */
      uint8_t destFlowId;     /**< CPPI flow which instructs how the link-buffer queues are used for forwarding packets */
      uint16_t destQueue;     /**< Destination host queue where PASS will deliver the packets which require IP reassembly assistance */

    } paIpReassmConfig_t;
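
    For example, enabling the PA-assisted reassembly for the outer IP could look roughly like this (a minimal sketch assuming the pa_CONTROL_SYS_CONFIG control path described in pa.h; the queue and flow numbers are hypothetical, and the reassembly itself still runs on the host):

        paIpReassmConfig_t outerCfg;
        paCtrlInfo_t       ctrlInfo;

        memset(&ctrlInfo, 0, sizeof(ctrlInfo));
        outerCfg.numTrafficFlow = 16;   /* up to 32 concurrent reassembly traffic flows */
        outerCfg.destFlowId     = 1;    /* CPPI flow used to deliver the fragments */
        outerCfg.destQueue      = 901;  /* host queue served by the reassembly task */

        ctrlInfo.code = pa_CONTROL_SYS_CONFIG;
        ctrlInfo.params.sysCfg.pOutIpReassmConfig = &outerCfg;

        paret = Pa_control(paInst, &ctrlInfo, cmdBuf, &cmdSize, &cmdReply, &cmdDest);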

    IP Fragmentation operation:


    /*  @def  pa_CMD_IP_FRAGMENT
     *        Perform IP fragmentation 
     */
    #define  pa_CMD_IP_FRAGMENT                 12

    /**
     *  @ingroup palld_api_structures
     *  @brief   IP fragmentation information
     *
     *  @details paCmdIpFrag_t is used to create the IPv4 fragment command. The IP fragment command is used to instruct the PASS to
     *           perform IPv4 fragmentation operation. This operation can be applied to both inner IP prior to IPSEC encapsulation and
     *           outer IP after IPSEC encapsulation.  This command should go with a next route command which provides the destination
     *           information prior to the fragmentation operation.
     *   
     *           For the inner IP fragmentation, follow the following procedure:
     *  @li      Host sends packets with the IP fragment command and the destination queue set to a host queue to PASS PDSP5
     *           for IP fragmentation operation.
     *  @li      All fragments will be delivered to the specified host queue.
     *  @li      Host adds the outer MAC/IP header, invokes the SA LLD sendData function and then sends the fragments to the SA queue.
     *  @li      Each fragment will be encrypted, authenticated and forwarded to the final destination.
     *
     *           For the outer IP fragmentation, the overall operation is stated below:
     *  @li      Packet is delivered to SASS for IPSEC operation
     *  @li      Packet is delivered to PASS for IP Fragmentation operation
     *  @li      The entire packet or its fragments are delivered to the network.
     *
     *  @note the next route command is required for step 2
     *  @note The IP fragment command can not be combined with some other tx commands such as checksum and CRC commands since
     *        those commands may require the PASS operation across multiple fragments. The workaround is to break the tx commands into
     *        two groups. The first group consists of the checksum, CRC, other commands and a next route command which routes the packet
     *        back to the same PDSP to execute the second command group which consists of the IP fragment command and the next route
     *        command which points to the final destination.
     *
     *        The IP fragment command can be combined with a single blind patch command to support the IPSEC AH use case in which the SASS
     *        passes the IPSEC AH packet with the blind patch command to the PASS so that the authentication tag can be inserted into the AH
     *        header. The recommended order of the tx commands is as the followings:
     *        - pa_CMD_IP_FRAGMENT
     *        - pa_CMD_NEXT_ROUTE with flag pa_NEXT_ROUTE_PROC_NEXT_CMD set
     *        - pa_CMD_PATCH_DATA
     *
     *        The IP fragment command can be also combined with up to two message length patching commands to support the message length
     *        field updating for each IP fragment. This operation is required for certain L2 header which contains a length field such as
     *        802.3 and PPPoE. The order of tx command is as the followings:
     *        - pa_CMD_PATCH_MSG_LEN (optional)
     *        - pa_CMD_PATCH_MSG_LEN (optional)
     *        - pa_CMD_IP_FRAGMENT
     *        - pa_CMD_NEXT_ROUTE
     */

    typedef struct  {
        uint16_t  ipOffset; /**< Offset to the IP header. */
        uint16_t  mtuSize;  /**< Size of the maximum transmission unit (>= 68) */
    } paCmdIpFrag_t;
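
    For example, a to-network fragmentation command list for the inner-IP case could be built roughly as follows (a minimal sketch; the buffer, queue, flow and MTU values are hypothetical):

        paCmdInfo_t cmdInfo[2];
        uint8_t     txCmdBuf[128];
        uint16_t    cmdSize = sizeof(txCmdBuf);

        memset(cmdInfo, 0, sizeof(cmdInfo));
        cmdInfo[0].cmd                    = pa_CMD_IP_FRAGMENT;
        cmdInfo[0].params.ipFrag.ipOffset = 14;   /* the MAC header precedes the IP header */
        cmdInfo[0].params.ipFrag.mtuSize  = 576;  /* hypothetical MTU */

        cmdInfo[1].cmd                 = pa_CMD_NEXT_ROUTE;
        cmdInfo[1].params.route.dest   = pa_DEST_HOST; /* inner-IP case: fragments back to the host */
        cmdInfo[1].params.route.flowId = 0;            /* hypothetical CPPI flow */
        cmdInfo[1].params.route.queue  = 902;          /* hypothetical host queue */

        paret = Pa_formatTxCmd(2, cmdInfo, 0, (Ptr)txCmdBuf, &cmdSize);
        /* The formatted commands go into the descriptor's protocol-specific info
         * and the packet is pushed to the PDSP5 transmit queue for fragmentation */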

    Test 10 and Test 11 perform the IP fragmentation and reassembly tests for IPv4 and IPv6 packets respectively.
    They send multiple streams of packets to PDSP5 for fragmentation and then deliver the packets and fragments to the PASS input queue for the classification and reassembly operation.

    Best regards,

    Eric