AM6442: Enabling ECC with RAM < 2GB

Part Number: AM6442

Hello,

I followed this thread to enable inline ECC by modifying drivers/ram/k3-ddrss/k3-ddrss.c, but on AM6442 and with memory <= 2GB:

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1058618/faq-tda4vm-how-to-enable-inline-ecc-for-memory-2gb

The differences are:

- I did not set up the RAT, since the RAM is < 2GB.

- I enabled only ECC range R1 (R2 and R3 are disabled).

- For the route IDs and the number of ECC cache lines, I used the following:

#define AM64_NUM_ROUTE_ID (20)
#define AM64_NUM_ECC_CACHE_LINES (64)

#define R5FSS00_ROUTE_ID 3000
#define R5FSS01_ROUTE_ID 3001
#define R5FSS10_ROUTE_ID 3002
#define R5FSS11_ROUTE_ID 3003
#define MCU_M4FSS0_ROUTE_ID 4092
#define DMASS0_SEC_PROXY_0_ROUTE_ID 132
#define DMASS0_RINGACC_0_ROUTE_ID 133
#define COMPUTE_CLUSTER0_RD_ROUTE_ID 16
#define COMPUTE_CLUSTER0_WR_ROUTE_ID 0
#define R5FSS00_RD_ROUTE_ID 66
#define R5FSS00_WR_ROUTE_ID 67
#define R5FSS01_RD_ROUTE_ID 69
#define R5FSS01_WR_ROUTE_ID 70
#define R5FSS10_RD_ROUTE_ID 71
#define R5FSS10_WR_ROUTE_ID 72
#define R5FSS11_RD_ROUTE_ID 73
#define R5FSS11_WR_ROUTE_ID 74
#define PRU_ICSSG0_ROUTE_ID 384
#define PRU_ICSSG1_ROUTE_ID 416
#define SA2_UL0_ROUTE_ID 271

static unsigned int am64_soc_route_id[] = {
	R5FSS00_ROUTE_ID,
	R5FSS01_ROUTE_ID,
	R5FSS10_ROUTE_ID,
	R5FSS11_ROUTE_ID,
	MCU_M4FSS0_ROUTE_ID,
	DMASS0_SEC_PROXY_0_ROUTE_ID,
	DMASS0_RINGACC_0_ROUTE_ID,
	COMPUTE_CLUSTER0_RD_ROUTE_ID,
	COMPUTE_CLUSTER0_WR_ROUTE_ID,
	R5FSS00_RD_ROUTE_ID,
	R5FSS00_WR_ROUTE_ID,
	R5FSS01_RD_ROUTE_ID,
	R5FSS01_WR_ROUTE_ID,
	R5FSS10_RD_ROUTE_ID,
	R5FSS10_WR_ROUTE_ID,
	R5FSS11_RD_ROUTE_ID,
	R5FSS11_WR_ROUTE_ID,
	PRU_ICSSG0_ROUTE_ID,
	PRU_ICSSG1_ROUTE_ID,
	SA2_UL0_ROUTE_ID
};

static unsigned int am64_ecc_cache_lines[] = {
	1, // R5FSS00_ROUTE_ID,
	1, // R5FSS01_ROUTE_ID,
	1, // R5FSS10_ROUTE_ID,
	1, // R5FSS11_ROUTE_ID,
	1, // MCU_M4FSS0_ROUTE_ID,
	4, // DMASS0_SEC_PROXY_0_ROUTE_ID,
	4, // DMASS0_RINGACC_0_ROUTE_ID,
	4, // COMPUTE_CLUSTER0_RD_ROUTE_ID,
	4, // COMPUTE_CLUSTER0_WR_ROUTE_ID,
	4, // R5FSS00_RD_ROUTE_ID,
	4, // R5FSS00_WR_ROUTE_ID,
	4, // R5FSS01_RD_ROUTE_ID,
	4, // R5FSS01_WR_ROUTE_ID,
	4, // R5FSS10_RD_ROUTE_ID,
	4, // R5FSS10_WR_ROUTE_ID,
	4, // R5FSS11_RD_ROUTE_ID,
	4, // R5FSS11_WR_ROUTE_ID,
	4, // PRU_ICSSG0_ROUTE_ID,
	4, // PRU_ICSSG1_ROUTE_ID,
	2, // SA2_UL0_ROUTE_ID
};

U-Boot crashes when I run mtest (which works before enabling ECC):

SoC: AM64X SR1.0
DRAM: 512 MiB
NAND: 4096 MiB
MMC: mmc@fa00000: 1
In: serial@2800000
Out: serial@2800000
Err: serial@2800000
Net: eth0: ethernet@8000000port@1, eth1: ethernet@8000000port@2
Hit any key to stop autoboot: 0
=> mtest
Testing 80100000 ... 83f00000:
Pattern 00000000 Writing... Reading...
ERROR: Unhandled External Abort received on 0x80000000 from EL2
ERROR: exception reason=1 syndrome=0x92000210
Unhandled Exception from EL2
x0 = 0x0000000000000020
x1 = 0x00000000fffffff6
x2 = 0x000000009deb7147
x3 = 0x0000000000000020
x4 = 0x00000000fffffff7
x5 = 0x000000009deb6f48
x6 = 0x0000000000000030
x7 = 0x000000000000000f
x8 = 0x000000009deb75b8
x9 = 0x0000000000000008
x10 = 0x00000000ffffffd8
x11 = 0x0000000000000010
x12 = 0x0000000000000006
x13 = 0x000000000001869f
x14 = 0x000000009deb7900
x15 = 0x0000000000000001
x16 = 0x000000009ff4c7d0
x17 = 0x0000000000000000
x18 = 0x000000009dec0de0
x19 = 0x000000009deb75f0
x20 = 0x000000009deb7149
x21 = 0x00000000ffffffd8
x22 = 0x000000009deb75f0
x23 = 0x000000009deb75f0
x24 = 0x000000009deb7118
x25 = 0x000000009ffa5840
x26 = 0x0000000000000008
x27 = 0x00000000ffffffff
x28 = 0x000000009deb752c
x29 = 0x000000009deb6f90
x30 = 0x000000009ff8e204
scr_el3 = 0x000000000000073d
sctlr_el3 = 0x0000000030cd183f
cptr_el3 = 0x0000000000000000
tcr_el3 = 0x0000000080803520
daif = 0x00000000000002c0
mair_el3 = 0x00000000004404ff
spsr_el3 = 0x00000000800003c9
elr_el3 = 0x000000009ff8dee4
ttbr0_el3 = 0x00000000701ce800
esr_el3 = 0x0000000092000210
far_el3 = 0x000000009ffa5840
spsr_el1 = 0x0000000000000000
elr_el1 = 0x0000000000000000
spsr_abt = 0x0000000000000000
spsr_und = 0x0000000000000000
spsr_irq = 0x0000000000000000
spsr_fiq = 0x0000000000000000
sctlr_el1 = 0x0000000030d00801
actlr_el1 = 0x0000000000000000
cpacr_el1 = 0x0000000000000000
csselr_el1 = 0x0000000000000000
sp_el1 = 0x0000000000000000
esr_el1 = 0x0000000000000000
ttbr0_el1 = 0x0000000000000000
ttbr1_el1 = 0x0000000000000000
mair_el1 = 0x0000000000000000
amair_el1 = 0x0000000000000000
tcr_el1 = 0x0000000000800080
tpidr_el1 = 0x0000000000000000
tpidr_el0 = 0x0000000000000000
tpidrro_el0 = 0x0000000000000000
par_el1 = 0x0000000000000000
mpidr_el1 = 0x0000000080000000
afsr0_el1 = 0x0000000000000000
afsr1_el1 = 0x0000000000000000
contextidr_el1 = 0x0000000000000000
vbar_el1 = 0x0000000000000000
cntp_ctl_el0 = 0x0000000000000000
cntp_cval_el0 = 0x0000000000000000
cntv_ctl_el0 = 0x0000000000000000
cntv_cval_el0 = 0x0000000000000000
cntkctl_el1 = 0x0000000000000000
sp_el0 = 0x00000000701cb400
isr_el1 = 0x0000000000000000
dacr32_el2 = 0x0000000000000000
ifsr32_el2 = 0x0000000000000000
cpuectlr_el1 = 0x0000000000000040
cpumerrsr_el1 = 0x0000000003140175
l2merrsr_el1 = 0x0000000012184d18
cpuactlr_el1 = 0x00001000090ca000

If you have any thoughts about this, or an implementation that works on AM6442, please let me know.

Regards.

  • Hi Mehdi,
    I'm checking internally, and will get back to you on DDRSS inline ECC support on AM64x.
    Best,
    -Hong

  • Hello Hong,

    The crash was caused by using addresses that should have been removed from the RAM size. I removed 1/9 of the DDR size from the 3 DTS files (2 U-Boot DTS files and the kernel DTS) and also from the U-Boot functions dram_init and dram_init_banksize. I'm not sure whether I should reduce the RAM size in all 5 of these places; please clarify.

    The kernel now boots, but LPDDR4 bandwidth is poor: memory bandwidth drops by about 2.5 times, from ~1500 MB/s to only ~630 MB/s, whereas I expected only an ~11% drop (1 additional ECC bit per 8 data bits). Here are the STREAM logs with ECC off and then with ECC on (LPDDR4 at 667 MHz):

    ECC off:

    -------------------------------------------------------------
    STREAM version $Revision: 5.10 $
    -------------------------------------------------------------
    This system uses 8 bytes per array element.
    -------------------------------------------------------------
    Array size = 10000000 (elements), Offset = 0 (elements)
    Memory per array = 76.3 MiB (= 0.1 GiB).
    Total memory required = 228.9 MiB (= 0.2 GiB).
    Each kernel will be executed 10 times.
    The *best* time for each kernel (excluding the first iteration)
    will be used to compute the reported bandwidth.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 122601 microseconds.
    (= 122601 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function Best Rate MB/s Avg time Min time Max time
    Copy: 1538.8 0.108162 0.103976 0.117037
    Scale: 1572.9 0.105863 0.101724 0.111943
    Add: 1318.8 0.183694 0.181977 0.194650
    Triad: 1326.4 0.181419 0.180944 0.181748
    -------------------------------------------------------------
    Solution Validates: avg error less than 1.000000e-13 on all three arrays
    -------------------------------------------------------------

    With ECC on:

    -------------------------------------------------------------
    STREAM version $Revision: 5.10 $
    -------------------------------------------------------------
    This system uses 8 bytes per array element.
    -------------------------------------------------------------
    Array size = 10000000 (elements), Offset = 0 (elements)
    Memory per array = 76.3 MiB (= 0.1 GiB).
    Total memory required = 228.9 MiB (= 0.2 GiB).
    Each kernel will be executed 10 times.
    The *best* time for each kernel (excluding the first iteration)
    will be used to compute the reported bandwidth.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 213500 microseconds.
    (= 213500 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function Best Rate MB/s Avg time Min time Max time
    Copy: 631.5 0.257656 0.253352 0.285207
    Scale: 662.9 0.245531 0.241375 0.259165
    Add: 587.1 0.415272 0.408818 0.462042
    Triad: 500.1 0.483798 0.479906 0.508016
    -------------------------------------------------------------
    Solution Validates: avg error less than 1.000000e-13 on all three arrays
    -------------------------------------------------------------

    Could you please check how I assigned the ECC cache lines to the route IDs? That is the only cause I can think of.

    Thank you.

  • Hi Mehdi,
    AM64x DDR inline ECC support is currently planned for the Linux SDK 8.4 release, in the July 2022 timeframe.
    Best,
    -Hong

  • Hi Hong,

    Understood, thank you.

    Mehdi

  • Hi Mehdi,
    Given the complexity of the DDR inline ECC feature, one option is to wait until support is added in Linux SDK 8.4.
    In the meantime, I'll keep you posted on the SDK feature development status.
    Best,
    -Hong

  • Hi Hong,

    Please do. Thank you for the feedback.

    Regards, Mehdi

  • Hi Mehdi,

    Yes, will keep you posted.

    Best,

    -Hong