TCI6638K2K: Hyperlink initialization fails occasionally

Part Number: TCI6638K2K

Hi,

We have an intermittent problem in Hyperlink initialization. The reproduction rate is about 1 in 100.
The initialization passes, all PLLs are locked and everything looks good. But when the link is taken into use, the block and the accessing core stall.

I managed to read the register set before the crash (afterwards it is impossible), and the Link Status register caught my eye.
It is the only register that differs between the working and failing cases:

In the working case, both peers have the same value:
linkStatus = 0xfdf0 bdf0

In the failing case, the values are:

DSP1: linkStatus = 0xccf0 bdf0
DSP2: linkStatus = 0xfdf0 8dff

Could you share your view on what has gone wrong?
I can see that the PLS has not connected correctly, but is that a symptom or the cause?
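To make the difference between the working and failing Link Status values explicit, the snapshots can be XOR-ed against the known-good value. The following is only a minimal sketch: the helper names `reg_diff`, `diff_bits` and `print_diff` are hypothetical (they are not CSL functions), and the code only reports bit positions. It does not map them to register field names, which would have to come from the Hyperlink user guide.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* XOR two register snapshots; set bits mark positions where they differ. */
static uint32_t reg_diff(uint32_t good, uint32_t bad)
{
    return good ^ bad;
}

/* Store the differing bit positions (LSB = bit 0) in out[]; return the count. */
static int diff_bits(uint32_t good, uint32_t bad, int out[32])
{
    assert(out != NULL);
    uint32_t mask = reg_diff(good, bad);
    int n = 0;
    for (int bit = 0; bit < 32; bit++) {
        if (mask & (UINT32_C(1) << bit))
            out[n++] = bit;
    }
    return n;
}

/* Print the XOR mask and the differing bit positions for one snapshot. */
static void print_diff(const char *name, uint32_t good, uint32_t bad)
{
    int bits[32];
    int n = diff_bits(good, bad, bits);
    printf("%s: xor=0x%08x bits:", name, (unsigned)reg_diff(good, bad));
    for (int i = 0; i < n; i++)
        printf(" %d", bits[i]);
    printf("\n");
}
```

Calling `print_diff("DSP1", 0xfdf0bdf0u, 0xccf0bdf0u)` and `print_diff("DSP2", 0xfdf0bdf0u, 0xfdf08dffu)` shows that DSP1 differs from the good value in bits 24, 28 and 29 (upper half-word), while DSP2 differs in bits 0 to 3, 12 and 13 (lower half-word).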

There was a thread on the forum where the SERDES configuration was carried out twice, and the results looked somewhat similar.
Double configuration is not the issue in this case, since a power cycle is used to reboot the device.

The environment is a custom one. It is possible that the problem occurs more often in some particular environments.
Is it possible that glitches in the reference clocks can cause behavior like this?

SERDES values are measured and custom values are applied.

Thank you in advance.

  • Hi,

    I've notified the team. Their feedback will be posted here.

    Best Regards,
    Yordan
  • Hi,

    Please describe the HW and SW setup in detail. K2K has two Hyperlink ports: which device is connected to which device via Hyperlink? What is the lane rate? Is this a customer board, or TI EVMs connected with a cable? How do you set up C1, C2, CM, ATT and VREG? Do you do any Serdes tuning for this? What SW is this, RTOS or Linux?

    PCB layout improvements and Serdes tuning may be needed to achieve 100% reliable reboot for Hyperlink. You may look at the Hyperlink portion of www.ti.com/.../sprac37.pdf.

    Regards, Eric
  • Hi,

    It's good to know that we can actually aim for 100% reliability with the K2 design.

    The Hyperlink Device 0 is used in all K2s. This is a customer board with several K2s.

    Link Speed 10G, Rate Full and 4 lanes in use.

    cm_coeff = 3;
    c1_coeff = 3;
    c2_coeff = 0;
    tx_att = 11;
    tx_vreg = 4;

    Applied with CSL_SERDES_CONFIG_CM_C1_C2()

    BTS SW (custom-and-huge-pile). RTOS. The system also runs Linux, but the configuration is done by DSP core 0.

    Serdes values have been measured by our HW department and applied with CSL_SERDES_CONFIG_ATT_BOOST() lane by lane.

    I'll ask the HW team to check the design of the Hyperlink bus.

    Thank you.

  • Hi,

    It has all been checked out. The HW design is fine, the SERDES values are measured and applied, and we still see occasional errors after power-up.

    The XGE guys gave me a hint that they had a problem with a bug in SERDES which caused instability in XGE. They suspected the same CDR problem in Hyperlink as well.

    There's a comment in the Hyperlink to check the lanes and reset the CDR, so I have been trying to apply the CDR reset to the Hyperlink.

    Since I did not find any register in the SERDES that would indicate when a SERDES reset is needed, I just ran some tests with the CDR reset on every boot. It seemed to make a difference (but this is not yet confirmed).

    So, I need some assistance on how to detect the non-working lanes. If you think this is a dead end, please let me know.

    The Reset itself is carried out by toggling the signal detect, is that sufficient?

  • Hi,

    Final status:

    * No Errors visible in SERDES status registers at 0x1FF4 and 0x1FE0 ... 0x1FEC
       So it looks like the SERDES can spot no errors and is up.

    * CDR Reset made no difference.

    The problem still persists.

    What can we check next?

  • One more question:
    There are delay loops with the comment "Wait at least 10us".
    Does it matter if we wait longer in the delay loops, or should I disable interrupts during the wait?
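A "wait at least" requirement is usually read as a lower bound. One way to guarantee it independently of interrupts is to poll a free-running timer instead of counting loop iterations: an interrupt can then lengthen the wait but never shorten it. The sketch below shows the idea on a host system. `now_ns` and `delay_at_least_us` are hypothetical names, and `clock_gettime` stands in for whatever free-running hardware counter is available on the DSP (an assumption, not CSL code). Whether the SERDES sequence also has an upper bound on these waits is not stated in this thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical time source: monotonic time in nanoseconds.
 * On the DSP this would be replaced by a free-running hardware counter. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Busy-wait until at least min_us microseconds have elapsed.
 * Returns the time actually waited, in nanoseconds, so the caller
 * can log how far a preempted wait overshot the minimum. */
static uint64_t delay_at_least_us(uint32_t min_us)
{
    const uint64_t min_ns = (uint64_t)min_us * 1000u;
    const uint64_t start = now_ns();
    uint64_t elapsed;

    do {
        elapsed = now_ns() - start;
    } while (elapsed < min_ns);

    return elapsed;
}
```

Logging the value returned by `delay_at_least_us(10)` during initialization would show whether the failing boots coincide with unusually long waits.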