TMS320TCI6614: AIF2 Fast CnM Errata Workaround

Bryan Hehn

Part Number: TMS320TCI6614

Getting 80% packet loss in the DSP ingress direction with AIF2 Fast CnM over CPRI.

Setup:

Appleton baseband unit (BBU) connected to our RRH through CPRI

Xilinx CPRI core byteswap fix enabled (to overcome DSP errata)

RRH FPGA fabric nibble-swaps packet data (to overcome DSP errata)

DSP enables 4b5b mode

Run ping from RRH to BBU

Results:

The BBU->RRH (DSP egress) direction seems to work reliably

The RRH->BBU (DSP ingress) direction loses many packets (~80%, as shown by ping)

As an experiment, we changed the DSP to null delimiter mode, and the DSP then received all the packets with no loss. We confirmed this with counters in the DSP and with logic analyzer captures on the MII interface in the RRH.

The only difference between the null delimiter DSP code and the 4b5b code is that the 4b5b code enables 4b5b in the “PD Link Register 1” register.

For registers 0x01f6A82C and 0x01f62804, i.e.:

regval &= 0x00ffffff;

regval |= 0x01000000; //set 4b/5b encoding ON (channel 0)

Are there more changes needed?

Below is a logic analyzer screenshot showing the tx & rx packets on the RRH FPGA with the DSP in 4b5b mode.

This is a trace of pings sent every second for 60 seconds. The rising edge of dbg1_txdv (tx data valid) marks a tx (to DSP) packet. The rising edge of dbg1_rxdv (rx data valid) marks an rx (from DSP) packet. There should be one rx packet for every tx packet. The screenshot shows about 60 tx packets and only 6 rx packets. Whenever the DSP software receives a packet, it loops it back; however, the DSP software never received most of the packets.

Please advise,

Thanks

Bryan

over 8 years ago

Yordan Kovachev over 8 years ago

Hi,

Sorry for the late reply.
I've notified the RADAR team.

Best Regards,
Yordan

Bryan Hehn over 8 years ago in reply to Yordan Kovachev

We have an update:

We created a DSP executable that sends packets on egress carrying a 32-bit count. The ingress side of the code checks the count and increments either a missed-sequence counter or a correct-sequence counter. The DSP code sends a preamble of six 55's and a 5D.
We put the DSP in SerDes loopback with 4b5b encoding on, and we received 100% of the packets correctly.

We then did a loopback in our RRH after the Xilinx CPRI core.
In this case we lost about 3 of every 4 packets sent by the DSP. Also, in this case we see an 802.3-compatible Ethernet packet with a preamble of seven 55's and a D5.

We then created a packet generator in the RRH FPGA.
The generator sent only six 55's.
We were then able to get 100% of the packets through.

We were somewhat surprised at this. Our understanding is:

DSP egress -> six 55 -> AIF2 (append SSD) -> SSD + six 55 -> Xilinx Core Rx (replace SSD with 55) -> seven 55 ->
-> Xilinx Core Tx (replace 55 with SSD) -> SSD + six 55 -> AIF2 ingress (remove SSD) -> six 55 to DSP memory
If this is the case, then our packet generator should not have worked, because AIF2 ingress would receive only SSD + five 55's. Can you shed some light on this?

Thanks,

Bryan

Bryan Hehn over 8 years ago in reply to Bryan Hehn

You can ignore the previous post regarding five 55's. We were misled into believing there were five. Another process added back the 55, so the packet being sent had seven 55's, as it should.

But we have some other important findings:

We are now able to sniff the CPRI traffic going from the RRH to the DSP. We never saw any bad or missing packets, which indicates that the packets are being dropped in the DSP AIF2.

Furthermore, we also measured the offset relative to bfn_strobe (the beginning-of-CPRI-frame strobe). We have one FPGA build that gets either 100% or 0% packet loss after we reset the RRH CPRI core. We also noticed that this build had a fixed offset from bfn_strobe that changed after an RRH CPRI core reset.

We have another build that always gets about 50% packet loss. We noticed that the packets in this build bounced between two offsets. Perhaps packets at one offset are received and packets at the other are dropped.

We have another build that sends the packet a programmable number of clocks after bfn_strobe. We saw that the counter offset affects which packets are received. Depending on the count, sometimes 50% of the packets got through, sometimes 0%, and sometimes packet loss was lower. This was also very predictable: a given count value always gave the same result.

Again, for each of these builds the packets over CPRI looked good.

Could this be related to the AIF2 configuration? I noted this section in the AIF2 spec:

7.3.4.2 Symbol Alignment

Symbol alignment based on K28 comma symbols is utilized by all standards that use 8b10b encoded data. Setting ALIGN to 01 will enable alignment of the received data stream based on comma symbols. However, it is normal to disable comma detection by setting ALIGN to 00 once some number of aligned commas have been received. SYNC indicates that an aligned comma has been received. This prevents inadvertent realignment occurring if a bit error changes another symbol into a comma. Typically, a small hardware state machine for each receiver channel is required to control ALIGN. The exact alignment protocol is dependent on the standard being used.

and this:

sd_auto_align_en - one bit

Enables the RM to automatically disable SerDes symbol alignment when the receiver state machine reaches state ST3. Disabling comma alignment may be necessary in OBSAI mode due to the K28.7 comma character, in combination with certain data, that tricks the SerDes into falsely realigning.

• 0: Disable auto alignment

• 1: Enable auto alignment

Could these be a clue as to what is happening?

Thanks,

Bryan

Bryan Hehn over 8 years ago in reply to Bryan Hehn

We are pretty sure the problem is that the AIF2 does not detect packets that are not aligned to CPRI control word boundaries. A control word is 40 bits (3G rate) in our case.

We captured packets from the DSP and from the RRH and saw these alignments:

In each capture the rightmost zero is the first bit of the packet; packets read right to left.

DSP:

0110101110101101101101010101011011000100

Offset=40

RRH

Capture 1

Packet 1

1011000100111111111111111111111111111111

Offset=10

Packet 2

0001001111111111111111111111111111111111

Offset=6

Requiring CPRI control word alignment is not part of the CPRI specification, so the AIF2 is non-conforming.

As a side note: we think AIF2 also detects some other alignments, because 20% of our packets get through even though we don't think they are control-word aligned. We have not been able to pinpoint which alignments work.

Please advise whether AIF2 can detect packets that start anywhere within a control word.

Bryan Hehn over 8 years ago in reply to Bryan Hehn

Some background: We have some FPGA test builds that can send a pre-canned packet at a certain time. Sometimes these builds send at a time that AIF2 likes and sometimes they don't. Also, the Xilinx CPRI core needs to be reset a random number of times to get it into the ‘good’ mode. We are trying, again through experimentation, to force the packets to be sent at ‘good’ times. However, this is not a long-term solution.

The problem is that we do not have control of the bit-level timing. The Xilinx CPRI core (it is encrypted IP) does the bit-level timing. As a long-term solution we are working with Xilinx to have them do the alignment. However, this may take a few months, because they have higher priorities than making a non-spec-compliant workaround for their IP block.

What we would like from TI:

1) Verify that packets must be aligned, and how.
The alignment seems to be needed based on our experiments, but it would be helpful if TI could verify that this is the case, and also specify how packets must be aligned. We have found that sometimes they do not need to be 40-bit aligned and other offsets will work. We are going through the effort to work with Xilinx to do the alignment, but we do not know for sure if it will fix 100% of the problem. We need to fully understand what the SOC needs so we can describe the fix to Xilinx.

2) Is there any other workaround or configuration that would allow the SOC to accept non-aligned packets in 4b5b mode?
This is obviously the best-case solution.

Thanks

db_woodall over 8 years ago in reply to Bryan Hehn

I went back and looked at my notes - and the last time we talked about this was 12/2015 and 01/2016. My memory has certainly faded since then... Was this working and now it's not, or has it not been worked on during this time?

To my knowledge, any departure from the CPRI 4.2 protocol spec has been documented in the device errata. The Fast C&M handling is one of them, but the errata does not mention any special alignment requirements. Other customers who have implemented control words have not complained about alignment issues.

It is somewhat confusing to me that the initial post mentions that if you switch from 4b5b to null delimiter, you lose no packets. Did you also switch the firmware running in the Xilinx? I take it that while using null delimiter, you encountered the problem of packet fragmentation whenever the null delimiter character was encountered.

At this point we need to find a quick solution that works. Have you considered using null delimiter, or possibly using the other offsets that you mentioned?

db_woodall over 8 years ago in reply to db_woodall

Also, can you send the following information:

1) What LTE rate are you configuring?
2) What is the AIF2 link rate?

Bryan Hehn over 8 years ago in reply to db_woodall

Hardware (AIF2) 4b5b encoding/decoding has never worked for us. We currently run the AIF2 in null delimiter mode and do the 4b5b encode/decode in software on the SOC ARM and are living with the packet segmentation and software resource utilization of this approach. We would like to off-load the ARM so we are going back to the AIF2 4b5b approach.

When we switch from 4b5b in software to 4b5b in hardware, we also enable a nibble-swap workaround in the Xilinx core.

It is difficult for us to control the packet alignment over CPRI because we have little control over the Xilinx core. Over the last few weeks we have been trying to time packets going into the Xilinx core so that they align at deterministic times, but this has proven difficult and did not yield consistent results. We are working on two more options to force the alignment. One option we will have control over, and the other involves Xilinx modifying the core (which will take some time because of their schedule).

CPRI is configured for 3G with 15-bit I/Q at 15.36 MSPS. Also, we have Ethernet configured for the maximum rate (p=20).

Hope that answers your questions.
