Hello,
We discovered a problem while testing hardware acceleration on the AM62X platform.
With the devcrypto engine, the CPU usage is 99%:
time -v openssl speed -evp SHA512 -engine devcrypto
Invalid engine "devcrypto"
Doing SHA512 for 3s on 16 size blocks: 1410387 SHA512's in 2.98s
Doing SHA512 for 3s on 64 size blocks: 1409513 SHA512's in 2.98s
Doing SHA512 for 3s on 256 size blocks: 765527 SHA512's in 2.98s
Doing SHA512 for 3s on 1024 size blocks: 323599 SHA512's in 2.98s
Doing SHA512 for 3s on 8192 size blocks: 50709 SHA512's in 2.99s
Doing SHA512 for 3s on 16384 size blocks: 25769 SHA512's in 2.97s
version: 3.0.9
built on: Tue May 30 12:31:57 2023 UTC
options: bn(64,64)
compiler: aarch64-oe-linux-gcc --sysroot=recipe-sysroot -O2 -pipe -g -feliminate-unused-debug-types -fmacro-prefix-map= -fdebug-prefix-map= -fdebug-prefix-map= -fdebug-prefix-map= -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_armcap=0xbd
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
SHA512 7572.55k 30271.42k 65763.39k 111196.43k 138932.48k 142154.65k
Command being timed: "openssl speed -evp SHA512 -engine devcrypto"
User time (seconds): 17.90
System time (seconds): 0.01
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 18.03s
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 23840
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 13
Minor (reclaiming a frame) page faults: 313
Voluntary context switches: 12
Involuntary context switches: 21
Swaps: 0
File system inputs: 1648
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
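For reference, the throughput row can be re-derived from the "Doing ..." lines as block size x number of blocks / elapsed seconds / 1000. Below is a minimal Python sketch of that arithmetic using two lines from the run above. The helper name throughput_k is ours and purely illustrative; it is not part of OpenSSL.

```python
import re

# Matches lines such as:
# "Doing SHA512 for 3s on 16384 size blocks: 25769 SHA512's in 2.97s"
LINE_RE = re.compile(
    r"Doing (\S+) for \d+s on (\d+) size blocks: (\d+) \S+ in ([\d.]+)s"
)


def throughput_k(line):
    """Return (block_size, thousands of bytes per second) for one 'Doing' line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not an openssl speed 'Doing' line: %r" % line)
    block_size = int(m.group(2))
    count = int(m.group(3))
    seconds = float(m.group(4))
    # 'numbers' are in 1000s of bytes per second processed
    return block_size, block_size * count / seconds / 1000.0


if __name__ == "__main__":
    lines = [
        "Doing SHA512 for 3s on 16 size blocks: 1410387 SHA512's in 2.98s",
        "Doing SHA512 for 3s on 16384 size blocks: 25769 SHA512's in 2.97s",
    ]
    for line in lines:
        size, k = throughput_k(line)
        print("%5d bytes: %.2fk" % (size, k))
```

The two values agree with the 16 bytes and 16384 bytes columns of the SHA512 row reported by openssl speed (7572.55k and 142154.65k).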
Without the devcrypto engine, the CPU usage is also 99%:
root@OK62xx:/# time -v openssl speed -evp SHA512
Doing SHA512 for 3s on 16 size blocks: 1410248 SHA512's in 2.98s
Doing SHA512 for 3s on 64 size blocks: 1407024 SHA512's in 2.99s
Doing SHA512 for 3s on 256 size blocks: 764826 SHA512's in 2.98s
Doing SHA512 for 3s on 1024 size blocks: 323385 SHA512's in 2.98s
Doing SHA512 for 3s on 8192 size blocks: 50719 SHA512's in 2.98s
Doing SHA512 for 3s on 16384 size blocks: 25800 SHA512's in 2.98s
version: 3.0.9
built on: Tue May 30 12:31:57 2023 UTC
options: bn(64,64)
compiler: aarch64-oe-linux-gcc --sysroot=recipe-sysroot -O2 -pipe -g -feliminate-unused-debug-types -fmacro-prefix-map= -fdebug-prefix-map= -fdebug-prefix-map= -fdebug-prefix-map= -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_armcap=0xbd
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
SHA512 7571.80k 30116.90k 65703.17k 111122.90k 139426.19k 141848.05k
Command being timed: "openssl speed -evp SHA512"
User time (seconds): 17.90
System time (seconds): 0.00
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 18.02s
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 23696
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 313
Voluntary context switches: 1
Involuntary context switches: 10
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
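To compare the two runs side by side, the relevant "time -v" fields can be pulled out with a short script. This is only a sketch: the function name parse_time_v is ours, and the two text snippets are copied from the outputs above. Our expectation (an assumption we have not confirmed) is that a working offload would lower user time and raise system time; here the two runs are practically identical.

```python
import re

FIELD_RE = re.compile(r"^\s*(User time|System time) \(seconds\): ([\d.]+)\s*$")
CPU_RE = re.compile(r"^\s*Percent of CPU this job got: (\d+)%\s*$")


def parse_time_v(text):
    """Extract user/system seconds and CPU percentage from 'time -v' output."""
    stats = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            key = "user_s" if m.group(1) == "User time" else "system_s"
            stats[key] = float(m.group(2))
            continue
        m = CPU_RE.match(line)
        if m:
            stats["cpu_percent"] = int(m.group(1))
    return stats


# Copied from the run with "-engine devcrypto"
WITH_ENGINE = """\
User time (seconds): 17.90
System time (seconds): 0.01
Percent of CPU this job got: 99%
"""

# Copied from the run without the engine
WITHOUT_ENGINE = """\
User time (seconds): 17.90
System time (seconds): 0.00
Percent of CPU this job got: 99%
"""

if __name__ == "__main__":
    with_engine = parse_time_v(WITH_ENGINE)
    without_engine = parse_time_v(WITHOUT_ENGINE)
    for key in ("user_s", "system_s", "cpu_percent"):
        print("%-12s with=%s without=%s" % (key, with_engine[key], without_engine[key]))
```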
We analyzed sa2ul.c and believe there is a problem with the cra_priority settings in the source code:
static struct sa_alg_tmpl sa_algs[] = {
...
[SA_ALG_CBC_AES] = {
.type = CRYPTO_ALG_TYPE_SKCIPHER,
.alg.skcipher = {
.base.cra_name = "cbc(aes)",
.base.cra_driver_name = "cbc-aes-sa2ul",
.base.cra_priority = 30000,
...
[SA_ALG_SHA256] = {
.type = CRYPTO_ALG_TYPE_AHASH,
.alg.ahash = {
.halg.base = {
.cra_name = "sha256",
.cra_driver_name = "sha256-sa2ul",
.cra_priority = 400,
...
[SA_ALG_SHA512] = {
.type = CRYPTO_ALG_TYPE_AHASH,
.alg.ahash = {
.halg.base = {
.cra_name = "sha512",
.cra_driver_name = "sha512-sa2ul",
.cra_priority = 400,
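Our understanding, which may be incomplete, is that when several drivers register the same cra_name, the kernel crypto API prefers the one with the highest cra_priority. The Python sketch below models that rule on text in the /proc/crypto layout. The sample text is made up for illustration and was not captured on our board: only sha512-sa2ul with priority 400 comes from the excerpt above, while sha512-example-sw and its priority of 500 are hypothetical. The function names are ours as well.

```python
def parse_proc_crypto(text):
    """Split /proc/crypto-style text into one dict per registered algorithm."""
    entries = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def preferred_driver(entries, name):
    """Return the driver with the highest priority for the given algorithm name."""
    candidates = [e for e in entries if e.get("name") == name]
    if not candidates:
        return None
    best = max(candidates, key=lambda e: int(e.get("priority", 0)))
    return best.get("driver")


# Illustrative sample only; sha512-example-sw is a hypothetical driver.
SAMPLE = """\
name         : sha512
driver       : sha512-sa2ul
priority     : 400

name         : sha512
driver       : sha512-example-sw
priority     : 500
"""

if __name__ == "__main__":
    print(preferred_driver(parse_proc_crypto(SAMPLE), "sha512"))
    raised = SAMPLE.replace("400", "30000")
    print(preferred_driver(parse_proc_crypto(raised), "sha512"))
```

In this model the hypothetical software driver wins at priority 400, and sha512-sa2ul wins once its priority is raised to 30000. Running the same parser on the real /proc/crypto of the board would show which sha512 driver is actually preferred there.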
After changing cra_priority = 400 to cra_priority = 30000, I recompiled sa2ul.c and tested again. The result is as follows:
root@OK62xx:/# time -v openssl speed -evp SHA512 -engine devcrypto
Invalid engine "devcrypto"
20306A87FFFF0000:error:1300006D:engine routines:dynamic_load:init failed:../openssl-3.0.9/crypto/engine/eng_dyn.c:514:
20306A87FFFF0000:error:13000074:engine routines:ENGINE_by_id:no such engine:../openssl-3.0.9/crypto/engine/eng_list.c:430:id=devcrypto
20306A87FFFF0000:error:12800067:DSO support routines:dlfcn_load:could not load the shared library:../openssl-3.0.9/crypto/dso/dso_dlfcn.c:118:filename(libdevcrypto.so): libdevcrypto.so: cannot open shared object file: No such file or directory
20306A87FFFF0000:error:12800067:DSO support routines:DSO_load:could not load the shared library:../openssl-3.0.9/crypto/dso/dso_lib.c:152:
20306A87FFFF0000:error:13000084:engine routines:dynamic_load:dso not found:../openssl-3.0.9/crypto/engine/eng_dyn.c:442:
Doing SHA512 for 3s on 16 size blocks: 1414478 SHA512's in 2.99s
Doing SHA512 for 3s on 64 size blocks: 1409612 SHA512's in 2.98s
Doing SHA512 for 3s on 256 size blocks: 765600 SHA512's in 2.98s
Doing SHA512 for 3s on 1024 size blocks: 323552 SHA512's in 2.98s
Doing SHA512 for 3s on 8192 size blocks: 50691 SHA512's in 2.98s
Doing SHA512 for 3s on 16384 size blocks: 25750 SHA512's in 2.98s
version: 3.0.9
built on: Tue May 30 12:31:57 2023 UTC
options: bn(64,64)
compiler: aarch64-oe-linux-gcc --sysroot=recipe-sysroot -O2 -pipe -g -feliminate-unused-debug-types -fmacro-prefix-map= -fdebug-prefix-map= -fdebug-prefix-map= -fdebug-prefix-map= -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_armcap=0xbd
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
SHA512 7569.11k 30273.55k 65769.66k 111180.28k 139349.22k 141573.15k
Command being timed: "openssl speed -evp SHA512 -engine devcrypto"
User time (seconds): 17.90
System time (seconds): 0.01
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0m 18.02s
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 23840
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 320
Voluntary context switches: 1
Involuntary context switches: 12
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
The test results show that although we passed the -engine devcrypto parameter, the engine failed to load (libdevcrypto.so could not be opened), so the sa3 hardware on our system was not used and OpenSSL still used software hashing. How can we fix this?
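Because openssl speed keeps running and exits with status 0 even when the engine cannot be loaded, the fallback is easy to miss. The Python sketch below scans captured output for the 'Invalid engine' line, which also appears in our very first run. The function name failed_engines is ours, and the sample text is shortened from the log above.

```python
import re

# openssl prints this line when the requested engine cannot be loaded
ENGINE_FAIL_RE = re.compile(r'^Invalid engine "([^"]+)"', re.MULTILINE)


def failed_engines(output):
    """Return the names of engines that openssl reported as invalid."""
    return ENGINE_FAIL_RE.findall(output)


# Shortened from the log of the run with "-engine devcrypto"
SAMPLE_OUTPUT = """\
Invalid engine "devcrypto"
Doing SHA512 for 3s on 16 size blocks: 1414478 SHA512's in 2.99s
Doing SHA512 for 3s on 64 size blocks: 1409612 SHA512's in 2.98s
"""

if __name__ == "__main__":
    failed = failed_engines(SAMPLE_OUTPUT)
    if failed:
        print("engine(s) not loaded, results are software only: %s" % ", ".join(failed))
    else:
        print("no engine load failure reported")
```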
We also ran "time -v openssl speed -evp aes-128-cbc -engine devcrypto" and "time -v openssl speed -evp aes-128-cbc". The two commands take almost the same time to run. Should hardware acceleration reduce the run time?