TDA4VM: sa2ul performance test issue

Part Number: TDA4VM
Other Parts Discussed in Thread: AES-128

We are running an SA2UL performance test on a TI TDA4 / XJ721E HS SoC board. Following your test method, we ran:

insmod tcrypt.ko mode=500 sec=1 &

The result is shown below.

[ 1332.532988] testing speed of async cbc(aes) (cbc-aes-sa2ul) encryption
[ 1332.541674] tcrypt: test 0 (128 bit key, 16 byte blocks): 75656 operations in 1 seconds (1210496 bytes)
[ 1333.548523] tcrypt: test 1 (128 bit key, 64 byte blocks): 77942 operations in 1 seconds (4988288 bytes)
[ 1334.556530] tcrypt: test 2 (128 bit key, 256 byte blocks): 502658 operations in 1 seconds (128680448 bytes)
[ 1335.564864] tcrypt: test 3 (128 bit key, 1024 byte blocks): 56926 operations in 1 seconds (58292224 bytes)
[ 1336.572784] tcrypt: test 4 (128 bit key, 1472 byte blocks): 49726 operations in 1 seconds (73196672 bytes)
[ 1337.580790] tcrypt: test 5 (128 bit key, 8192 byte blocks): 22413 operations in 1 seconds (183607296 bytes)
[ 1338.588903] tcrypt: test 6 (192 bit key, 16 byte blocks): 72694 operations in 1 seconds (1163104 bytes)
[ 1339.596522] tcrypt: test 7 (192 bit key, 64 byte blocks): 74998 operations in 1 seconds (4799872 bytes)
[ 1340.604534] tcrypt: test 8 (192 bit key, 256 byte blocks): 475872 operations in 1 seconds (121823232 bytes)
[ 1341.612866] tcrypt: test 9 (192 bit key, 1024 byte blocks): 57911 operations in 1 seconds (59300864 bytes)
[ 1342.620802] tcrypt: test 10 (192 bit key, 1472 byte blocks): 47996 operations in 1 seconds (70650112 bytes)
[ 1343.628871] tcrypt: test 11 (192 bit key, 8192 byte blocks): 19848 operations in 1 seconds (162594816 bytes)
[ 1344.636982] tcrypt: test 12 (256 bit key, 16 byte blocks): 80624 operations in 1 seconds (1289984 bytes)
[ 1345.644622] tcrypt: test 13 (256 bit key, 64 byte blocks): 77801 operations in 1 seconds (4979264 bytes)
[ 1346.652612] tcrypt: test 14 (256 bit key, 256 byte blocks): 472843 operations in 1 seconds (121047808 bytes)
[ 1347.660950] tcrypt: test 15 (256 bit key, 1024 byte blocks): 55751 operations in 1 seconds (57089024 bytes)
[ 1348.668876] tcrypt: test 16 (256 bit key, 1472 byte blocks): 52313 operations in 1 seconds (77004736 bytes)
[ 1349.676878] tcrypt: test 17 (256 bit key, 8192 byte blocks): 20365 operations in 1 seconds (166830080 bytes)

The AES-CBC throughput generally increases as the block size increases. However, the 256-byte block size is an outlier: its operation count is several times higher than at any other size, and its throughput exceeds 100 MB/s. Is there an error in tcrypt?

In addition, can the performance of SA2UL AES be tested with the following command?
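
To make the outlier easier to see, the tcrypt lines can be converted to operations per second and MB/s. The following is a minimal sketch, not part of tcrypt itself: the function name `parse_tcrypt` and the regular expression are illustrative assumptions, and the sample data is copied from the 128-bit key lines of the log above.

```python
import re

# 128-bit key lines copied from the tcrypt log above.
LOG = """\
[ 1332.541674] tcrypt: test 0 (128 bit key, 16 byte blocks): 75656 operations in 1 seconds (1210496 bytes)
[ 1333.548523] tcrypt: test 1 (128 bit key, 64 byte blocks): 77942 operations in 1 seconds (4988288 bytes)
[ 1334.556530] tcrypt: test 2 (128 bit key, 256 byte blocks): 502658 operations in 1 seconds (128680448 bytes)
[ 1335.564864] tcrypt: test 3 (128 bit key, 1024 byte blocks): 56926 operations in 1 seconds (58292224 bytes)
[ 1336.572784] tcrypt: test 4 (128 bit key, 1472 byte blocks): 49726 operations in 1 seconds (73196672 bytes)
[ 1337.580790] tcrypt: test 5 (128 bit key, 8192 byte blocks): 22413 operations in 1 seconds (183607296 bytes)
"""

# Hypothetical pattern written for the line format shown above.
PATTERN = re.compile(
    r"test (\d+) \((\d+) bit key, (\d+) byte blocks\): "
    r"(\d+) operations in (\d+) seconds \((\d+) bytes\)"
)


def parse_tcrypt(log):
    """Return a list of (key_bits, block_bytes, ops_per_s, mb_per_s) tuples."""
    rows = []
    for match in PATTERN.finditer(log):
        _, key_bits, block, ops, secs, total = map(int, match.groups())
        rows.append((key_bits, block, ops / secs, total / secs / 1e6))
    return rows


rows = parse_tcrypt(LOG)
for key_bits, block, ops, mbps in rows:
    print(f"{key_bits}-bit key, {block:5d} B: {ops:9.0f} ops/s, {mbps:7.2f} MB/s")
```

With these numbers, the 256-byte case is the only one whose operation count rises instead of falling as the block size grows.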

openssl speed -elapsed -evp aes-128-cbc

  • Hi Yongjin ma,

    I see the same behavior on my end with SDK 7.0 on my GP board. I believe you are also on SDK 7.0.

    I will check with the team internally and get back to you on this.

    openssl speed -elapsed -evp aes-128-cbc can also be used to measure SA2UL performance.

     openssl speed -elapsed -evp aes-128-cbc
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 192465 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 194583 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 937839 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 150444 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 63532 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 16384 size blocks: 38512 aes-128-cbc's in 3.00s

    Best Regards,
    Keerthy

  • Thank you for your answer. Yes, I use SDK 7.0. The results above show that there is a problem with encryption at the 256-byte block size.

    In addition, does SA2UL support the MD5 algorithm? I ask because the speed measured by openssl speed -elapsed -evp md5 exceeds the theoretical range when the block size is 8192 bytes or 16384 bytes.

    openssl speed -elapsed -evp md5

    root@j7-evm:/# openssl speed -elapsed -evp  md5        
    You have chosen to measure elapsed time instead of user CPU time.
    Doing md5 for 3s on 16 size blocks: 382930 md5's in 3.00s
    Doing md5 for 3s on 64 size blocks: 372229 md5's in 3.00s
    Doing md5 for 3s on 256 size blocks: 345375 md5's in 3.00s
    Doing md5 for 3s on 1024 size blocks: 267819 md5's in 3.00s
    Doing md5 for 3s on 8192 size blocks: 87149 md5's in 3.00s
    Doing md5 for 3s on 16384 size blocks: 49372 md5's in 3.00s
    OpenSSL 1.1.1g  21 Apr 2020
    built on: Wed Jun 17 14:27:03 2020 UTC
    options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr) 
    compiler: aarch64-none-linux-gnu-gcc  --sysroot=recipe-sysroot -O2 -pipe -g -feliminate-unused-debug-types -fmacro-prefix-map=                      -fdebug-prefix-map=                      -fdebug-prefixG
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    md5               2042.29k     7940.89k    29472.00k    91415.55k   237974.87k   269636.95k

    root@j7-evm:/# openssl speed -elapsed -evp  sha1
    You have chosen to measure elapsed time instead of user CPU time.
    Doing sha1 for 3s on 16 size blocks: 87691 sha1's in 3.00s
    Doing sha1 for 3s on 64 size blocks: 85562 sha1's in 3.00s
    Doing sha1 for 3s on 256 size blocks: 161592 sha1's in 3.00s
    Doing sha1 for 3s on 1024 size blocks: 77782 sha1's in 3.00s
    Doing sha1 for 3s on 8192 size blocks: 38981 sha1's in 3.00s
    Doing sha1 for 3s on 16384 size blocks: 24491 sha1's in 3.00s
    OpenSSL 1.1.1g  21 Apr 2020
    built on: Wed Jun 17 14:27:03 2020 UTC
    options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr) 
    compiler: aarch64-none-linux-gnu-gcc  --sysroot=recipe-sysroot -O2 -pipe -g -feliminate-unused-debug-types -fmacro-prefix-map=                      -fdebug-prefix-map=                      -fdebug-prefixG
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    sha1               467.69k     1825.32k    13789.18k    26549.59k   106444.12k   133753.51k

    Moreover, at the same block size, MD5 is several times faster than SHA1, SHA256 and SHA512.
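
The "several times" claim can be checked directly from the two summary rows above. This is a small sketch under the assumption that the `md5` and `sha1` rows are comparable as posted; the helper name `speed_ratios` is hypothetical, and no SHA256 or SHA512 figures are included because none were posted in this thread.

```python
# Summary rows copied from the openssl output above (1000s of bytes per second).
BLOCK_SIZES = [16, 64, 256, 1024, 8192, 16384]
MD5_KBPS = [2042.29, 7940.89, 29472.00, 91415.55, 237974.87, 269636.95]
SHA1_KBPS = [467.69, 1825.32, 13789.18, 26549.59, 106444.12, 133753.51]


def speed_ratios(fast, slow):
    """Element-wise ratio of two throughput rows of equal length."""
    return [f / s for f, s in zip(fast, slow)]


ratios = speed_ratios(MD5_KBPS, SHA1_KBPS)
for size, ratio in zip(BLOCK_SIZES, ratios):
    print(f"{size:6d} B: md5 is {ratio:.2f}x sha1")
```

For the posted numbers, every ratio is above 2, with the largest gaps at the 16-byte and 64-byte block sizes.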

  • Hi Yongjin ma,

    SA2UL has an issue with packet sizes between 240 and 256 bytes, so we enable a software fallback
    for that range (240-256). That is why you see higher throughput there: the A72 executes it faster:

    [ 1334.556530] tcrypt: test 2 (128 bit key, 256 byte blocks): 502658 operations in 1 seconds (128680448 bytes)

    This is faster because the size falls within the 240-256 range, so we see a spike in performance from using the A72
    to work around the hardware limitation. This is expected; all other sizes use SA2UL.

    In SDK 7.0, we do not support MD5 on SA2UL.

    Best Regards,
    Keerthy
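
The size-based routing described in this reply can be pictured as a simple range check. The sketch below is only an illustration of that idea, not the driver's code: the constant and function names are hypothetical, and treating both 240 and 256 as inclusive bounds is an assumption based on the "240-256" wording above.

```python
# Hypothetical names; the bounds come from the reply above, not from driver source.
FALLBACK_MIN_BYTES = 240
FALLBACK_MAX_BYTES = 256  # assumed inclusive


def uses_software_fallback(request_bytes):
    """True if a request of this size would be routed to software."""
    return FALLBACK_MIN_BYTES <= request_bytes <= FALLBACK_MAX_BYTES


def backend_for(request_bytes):
    """Name the backend that would handle a request of this size."""
    return "software (A72)" if uses_software_fallback(request_bytes) else "SA2UL"


# Block sizes used by the tcrypt run earlier in the thread.
for size in (16, 64, 256, 1024, 1472, 8192):
    print(f"{size:5d} B -> {backend_for(size)}")
```

Of the block sizes in the tcrypt run, only 256 bytes falls inside the range, which matches the single spike in the log.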

  • Thank you for your answer. Will TI support AES encryption on the SA2UL engine for 256-byte blocks in the future?

  • In addition, when aes-128-cbc runs in software with a 256-byte block size, the speed is about 100 MB/s, which differs from the result measured with -evp. Why is this happening?

    root@j7-evm:~# openssl speed -elapsed -evp aes-128-cbc
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-cbc for 3s on 16 size blocks: 186053 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 196548 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 887255 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 145804 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 63436 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 16384 size blocks: 37919 aes-128-cbc's in 3.00s
    OpenSSL 1.1.1g  21 Apr 2020
    built on: Wed Jun 17 14:27:03 2020 UTC
    options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr) 
    compiler: aarch64-none-linux-gnu-gcc  --sysroot=recipe-sysroot -O2 -pipe -g -feliminate-unused-debug-types -fmacro-prefix-map=                      -fdebug-prefix-map=                      -fdebug-prefixG
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128-cbc        992.28k     4193.02k    75712.43k    49767.77k   173222.57k   207088.30k

    root@j7-evm:/usr/test# openssl speed aes-128-cbc
    Doing aes-128 cbc for 3s on 16 size blocks: 17731180 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 64 size blocks: 4715763 aes-128 cbc's in 2.99s
    Doing aes-128 cbc for 3s on 256 size blocks: 1192506 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 1024 size blocks: 300050 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 8192 size blocks: 37620 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 16384 size blocks: 18814 aes-128 cbc's in 2.99s
    OpenSSL 1.1.1g  21 Apr 2020
    built on: Wed Jun 17 14:27:03 2020 UTC
    options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr) 
    compiler: aarch64-none-linux-gnu-gcc  --sysroot=recipe-sysroot -O2 -pipe -g -feliminate-unused-debug-types -fmacro-prefix-map=                      -fdebug-prefix-map=                      -fdebug-prefixG
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128 cbc      94566.29k   100939.41k   101760.51k   102417.07k   102727.68k   103093.17k

  • Hi Yongjin,

    As mentioned earlier this is more of a hardware limitation between 240-256 byte length packets.
    So this will go via software.

    Best Regards,
    Keerthy

  • Hi,

    openssl speed -elapsed -evp aes-128-cbc goes all the way to the SA2UL driver before the fallback is triggered, hence it is slower.
    openssl speed aes-128-cbc does not enter the SA2UL driver path, hence it is faster (pure software).

    If all your questions are answered, please resolve this thread.

    Best Regards,
    Keerthy
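
The two 256-byte figures being compared can be reproduced from the operation counts in the outputs above. This is a minimal sketch: the helper name `kbytes_per_second` is hypothetical, and the unit follows the "1000s of bytes per second" note that openssl prints.

```python
def kbytes_per_second(operations, block_bytes, seconds):
    """Throughput in the unit openssl speed reports: 1000s of bytes per second."""
    return operations * block_bytes / seconds / 1000.0


# 256-byte rows copied from the two runs above.
evp_256 = kbytes_per_second(887255, 256, 3.00)   # openssl speed -elapsed -evp aes-128-cbc
sw_256 = kbytes_per_second(1192506, 256, 3.00)   # openssl speed aes-128-cbc

print(f"-evp run:     {evp_256:.2f}k")
print(f"software run: {sw_256:.2f}k")
print(f"software / evp at 256 B: {sw_256 / evp_256:.2f}x")
```

Note that the second run was started without -elapsed, so its timings appear to be user CPU time rather than wall-clock time, which may make the two runs not strictly comparable.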

  • Thank you for your answer. The previous question has not been answered. Will TI support AES encryption with SA2UL engine when the block size is 256 in the future?

  • Hi Yongjin,

    Keerthy J said:
    As mentioned earlier this is more of a hardware limitation between 240-256 byte length packets.
    So this will go via software.

    I have already answered your question: this is a hardware limitation, and hence 256 bytes will be supported via software.
    Please resolve this thread.

    Best Regards,
    Keerthy

  • Thank you Keerthy for your patient answer.