Getting 80% packet loss in DSP ingress direction with AIF2 Fast CnM over CPRI mode.
Setup:
Appleton baseband unit (BBU) to our RRH through CPRI
Xilinx CPRI core byteswap fix enabled (to overcome DSP errata)
RRH FPGA fabric nibble-swaps packet data (to overcome DSP errata)
DSP enables 4b5b mode
Run ping from RRH to BBU
Results:
The BBU->RRH (DSP egress) direction seems to work reliably
The RRH->BBU (DSP ingress) direction loses many packets (~80% as shown by ping)
As an experiment we changed the DSP to null delimiter mode and the DSP then received all the packets. No loss. We confirmed this by counters in the DSP and logic analyzer captures on the MII interface in the RRH.
The only difference between the null delimiter DSP code and the 4b5b code is that the 4b5b code sets the 4b/5b encoding bit in the “PD Link Register 1” register.
For registers 0x01f6A82C and 0x01f62804, i.e.:
regval &= 0x00ffffff;
regval |= 0x01000000; //set 4b/5b encoding ON (channel 0)
Are there more changes needed?
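For reference, the read-modify-write quoted above can be written as a small helper. This is only a sketch: the mask and value are the ones from the post, while the helper name enable_4b5b and the macro names are made up for illustration and are not taken from any TI header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mask and value as quoted in the post; all names here are hypothetical. */
#define LINK_ENC_MASK 0x00ffffffu /* keeps bits 23..0, clears the top byte */
#define LINK_ENC_4B5B 0x01000000u /* 4b/5b encoding ON (channel 0) */

/* Read-modify-write of one link register: preserve the lower 24 bits
 * and set the 4b/5b encoding bit, exactly as in the two-line snippet. */
static void enable_4b5b(volatile uint32_t *reg)
{
    assert(reg != NULL);
    uint32_t regval = *reg;
    regval &= LINK_ENC_MASK;
    regval |= LINK_ENC_4B5B;
    *reg = regval;
}
```

On the target this would be called once per register address listed above, with the address cast to a volatile uint32_t pointer.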
Below is a logic analyzer screen shot showing the tx & rx packets on the RRH FPGA with the DSP in 4b5b mode.
This is a trace of pings sent every second for 60 seconds. The rising edge of dbg1_txdv (tx data valid) represents a tx (to DSP) packet. The rising edge of dbg1_rxdv (rx data valid) represents an rx (from DSP) packet. There should be one rx packet for every tx packet. The screenshot shows about 60 tx packets and only 6 rx packets. When the DSP software receives a packet, it loops it back. However, the DSP software never got most of the packets.
Please advise,
Thanks
Bryan
We have an update:
We created a DSP executable that sends packets on egress with a 32bit count. The ingress side of the code checks the count and updates a missed sequence counter or a correct sequence count. The DSP code sends a preamble of six 55's and a 5D.
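The ingress check described above can be sketched roughly as follows. This is not the actual DSP code; the struct and function names are hypothetical, and it assumes the checker resynchronises to the received count after a gap, so the missed counter counts breaks in the sequence rather than the number of lost packets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ingress-side statistics for the 32-bit packet count. */
struct seq_stats {
    uint32_t expected; /* count we expect in the next packet */
    uint32_t good;     /* packets that arrived with the expected count */
    uint32_t missed;   /* packets that arrived with an unexpected count */
};

/* Check one received count and update the counters. */
static void seq_check(struct seq_stats *s, uint32_t count)
{
    assert(s != NULL);
    if (count == s->expected)
        s->good++;
    else
        s->missed++;
    s->expected = count + 1u; /* resynchronise; wraps naturally at 2^32 */
}
```

For example, feeding the counts 0, 1, 2, 5, 6 into a zero-initialised seq_stats leaves good at 4 and missed at 1.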
We put the DSP in Serdes loopback 4b5b encoding on and we received 100% of the packets correctly.
We then did a loopback in our RRH after the Xilinx CPRI core.
In this case we lost about 3 of every 4 packets sent by the DSP. Also, in this case we see the 802.1 compatible Ethernet packet with a preamble of seven 55's and a D5.
We then created a packet generator in the RRH FPGA.
The generator only had six 55.
We were then able to get 100% of the packets through.
We were somewhat surprised at this. Our understanding is:
DSP egress -> six 55 -> AIF2 (append SSD) -> SSD + six 55 -> Xilinx Core Rx (replace SSD with 55) -> seven 55 ->
-> Xilinx Core Tx (replace 55 with SSD) -> SSD + six 55 -> AIF2 ingress (remove SSD) -> six 55 to DSP memory
If this is the case then our packet generator should not have worked because there would only be SSD + five 55 to AIF2 ingress. Can you shed some light on this?
Thanks,
Bryan
You can ignore the previous post regarding five 55's. We were faked out into believing there were five. Another process added back the 55, so the packet being sent had 7x 55's as it should.
But we have some other important findings:
We are now able to sniff the CPRI traffic going from the RRH to the DSP. We never saw any bad or missing packets which indicates the packets are being dropped in the DSP AIF2.
Furthermore, we also measured the offset in relation to bfn_strobe (beginning of CPRI frame strobe). We have one FPGA build that gets either 100% or 0% packet loss after we reset the RRH CPRI core. We also noticed that this build had a fixed offset from bfn_strobe that changed after an RRH CPRI core reset.
We have another build that always gets about 50% packet loss. We noticed the packets in this build bounced between two offsets. Perhaps one offset had good packets and the other offset had bad packets.
We have another build that sends the packet a programmable number of clocks after bfn_strobe. We saw that the counter offset affects the packets received. Sometimes we would get 50% packets, sometimes we would get 0% packets and sometimes we would get lower packet loss. This was very predictable too. A certain count value would always give the same results.
Again, for each of these builds the packets over CPRI looked good.
Could this be related to an AIF2 configuration? I noted this section in the AIF2 spec:
7.3.4.2 Symbol Alignment
Symbol alignment based on K28 comma symbols is utilized by all standards that use 8b10b encoded data. Setting ALIGN to 01 will enable alignment of the received data stream based on comma symbols. However, it is normal to disable comma detection by setting ALIGN to 00 once some number of aligned commas have been received. SYNC indicates that an aligned comma has been received. This prevents inadvertent realignment occurring if a bit error changes another symbol into a comma. Typically, a small hardware state machine for each receiver channel is required to control ALIGN. The exact alignment protocol is dependent on the standard being used.
and this:
sd_auto_align_en - one bit
Enables the RM to automatically disable SerDes symbol alignment when the receiver state machine reaches state ST3. Disabling comma alignment may be necessary in OBSAI mode due to the K28.7 comma character, in combination with certain data, that tricks the SerDes into falsely realigning.
• 0: Disable auto alignment
• 1: Enable auto alignment
Could these be a clue as to what is happening?
Thanks,
Bryan
We are pretty sure the problem is that the AIF2 does not detect packets that are not aligned to CPRI control word boundaries. A control word is 40 bits (3G rate) in our case.
We captured packets from the DSP and from the RRH and saw these alignments:
The alignments look like this where the rightmost zero is the first bit of the packet. Packets read right to left.
DSP:
0110101110101101101101010101011011000100
Offset=40
RRH
Capture1
Packet 1
1011000100111111111111111111111111111111
Offset=10
Packet 2
0001001111111111111111111111111111111111
Offset=6
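The offsets in the listing above can be reproduced from the captured bit strings with a few lines of C. This is a sketch under two assumptions: the bits to the right of the packet are idle 1s, and the offset is counted from the left edge of the word up to and including the rightmost zero. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Offset as used in the listing: number of bits from the left edge of the
 * captured word up to and including the rightmost '0' (the first packet bit).
 * Assumes everything to the right of the packet is idle '1'.
 * Returns 0 when the word contains no '0' at all. */
static size_t packet_offset(const char *word)
{
    assert(word != NULL);
    size_t n = strlen(word);
    while (n > 0 && word[n - 1] == '1')
        n--; /* skip the idle bits on the right */
    return n;
}

/* A packet counts as control word aligned when its offset is a non-zero
 * multiple of the control word length (40 bits at the 3G rate). */
static bool is_word_aligned(size_t offset, size_t word_bits)
{
    assert(word_bits > 0);
    return offset != 0 && offset % word_bits == 0;
}
```

With the three captures above this gives 40 for the DSP packet (aligned) and 10 and 6 for the two RRH packets (not aligned).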
Requiring CPRI control word alignment is not part of the CPRI specification, so the AIF2 is non-conforming.
As a side note: We think there are other alignments that AIF2 detects because we see 20% of our packets getting through even though we don't think we are control word aligned. We have not been able to pin point which alignments do work.
Please advise on whether AIF2 can detect packets anywhere within a control word.
Hardware (AIF2) 4b5b encoding/decoding has never worked for us. We currently run the AIF2 in null delimiter mode and do the 4b5b encode/decode in software on the SOC ARM and are living with the packet segmentation and software resource utilization of this approach. We would like to off-load the ARM so we are going back to the AIF2 4b5b approach.
When we switch from 4b5b software to hardware we also enable a nibble swap workaround in the Xilinx core.
It is difficult for us to control the packet alignment over CPRI because we have little control over the Xilinx core. Over the last few weeks we have been trying to time packets going into the Xilinx core so that they align at deterministic times, but this has proven difficult and did not yield consistent results. We are working on two more options to force the alignment. One option we will have control over, and the other option involves Xilinx modifying the core (which will take some time because of their schedule).
CPRI is configured for 3G with 15bit I/Q at 15.36MSPS. Also, we have Ethernet configured for the max rate (p=20).
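As a rough cross-check of what p=20 means at this rate, the raw Fast C&M capacity can be estimated as below. This is a sketch that assumes the usual CPRI hyperframe layout (64 subchannels of 4 control words each, 15000 hyperframes per second, Fast C&M occupying subchannels p to 63). The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed CPRI framing constants (not taken from the post). */
#define CPRI_HYPERFRAMES_PER_SEC  15000u /* 3.84 MHz basic frames / 256 */
#define CPRI_SUBCHANNELS          64u    /* control subchannels per hyperframe */
#define CPRI_WORDS_PER_SUBCHANNEL 4u     /* control words per subchannel */

/* Raw Fast C&M capacity in bits per second for Ethernet pointer p and a
 * control word of word_bits bits (40 at the 3G rate). */
static uint64_t fast_cm_bits_per_sec(unsigned p, unsigned word_bits)
{
    assert(p < CPRI_SUBCHANNELS);
    uint64_t words = (uint64_t)(CPRI_SUBCHANNELS - p) * CPRI_WORDS_PER_SUBCHANNEL;
    return words * word_bits * CPRI_HYPERFRAMES_PER_SEC;
}
```

For p=20 and 40-bit control words this gives 176 control words per hyperframe, or 105.6 Mbit/s on the C&M channel before 4b/5b coding overhead.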
Hope that answers your questions.