Linux/AM5718: AES crypto accelerator issue

Part Number: AM5718


Tool/software: Linux

I am trying to use the hardware crypto accelerator on the latest prebuilt Debian image. The "openssl speed" benchmark looks similar to the user guide, but I'm not seeing any practical acceleration when encrypting or decrypting data. Any thoughts on why the benchmark differs from actual use?

The prebuilt image has omap_aes_driver, _rng, and others out of the box, but no cryptodev/ocf, so I built them and openssl from source, following the crypto user guide and other online resources.

> lsmod
...
cryptodev              42926  0
omap_aes_driver        23912  1
...
> openssl version
OpenSSL 1.0.2l  25 May 2017

> openssl engine
(cryptodev) BSD cryptodev engine
(dynamic) Dynamic engine loading support

OpenSSL speed testing (without -elapsed) shows a huge difference:

> modprobe -r cryptodev
> time openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 10795232 aes-128-cbc's in 2.98s
...
OpenSSL 1.0.2l  25 May 2017
...
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      57960.98k    69444.09k    72792.31k    74125.49k    74541.66k

real    0m15.018s
user    0m14.856s
sys     0m0.004s

> modprobe cryptodev
> time openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 408163 aes-128-cbc's in 0.18s
...
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      36281.16k   125167.16k   710721.42k  1855129.60k         infk

real    0m15.018s
user    0m0.528s
sys     0m14.316s

Problem #1 - The speed test above matches what the guide/knowledge base suggests I should see. If I re-run the benchmark with -elapsed, though, I see a different story: with cryptodev loaded, fewer bytes per second are processed than in software.

> modprobe -r cryptodev
> time openssl speed -evp aes-128-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 10773301 aes-128-cbc's in 3.00s
...
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      57457.61k    68461.38k    72319.66k    73284.95k    73654.27k

real    0m15.018s
user    0m14.852s
sys     0m0.008s

> modprobe cryptodev
> time openssl speed -evp aes-128-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 418051 aes-128-cbc's in 3.00s
...
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc       2229.61k     7969.09k    21457.49k    37310.12k    45110.61k

real    0m15.019s
user    0m0.396s
sys     0m14.456s
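
The gap between the two cryptodev runs can be reproduced from the "Doing ..." lines alone. Without -elapsed, openssl speed divides the bytes processed by user CPU time, and with cryptodev loaded almost all of the time shows up as sys, so the divisor shrinks to 0.18s. The following is a minimal sketch, assuming a POSIX shell with awk; rate_k is a hypothetical helper written for this post, not part of openssl.

```shell
#!/bin/sh
# rate_k BLOCKS BLOCK_SIZE SECONDS
# Prints throughput in 1000s of bytes per second, the unit openssl speed reports.
# (Hypothetical helper; only reproduces the arithmetic behind the table.)
rate_k() {
    awk -v n="$1" -v size="$2" -v secs="$3" \
        'BEGIN { printf "%.2f\n", n * size / secs / 1000 }'
}

# cryptodev loaded, 16-byte blocks, divided by user CPU time (0.18s)
rate_k 408163 16 0.18

# cryptodev loaded, 16-byte blocks, divided by elapsed time (3.00s)
rate_k 418051 16 3.00
```

The first call gives the 36281.16k figure from the run without -elapsed and the second gives the 2229.61k figure from the -elapsed run, so the two tables count roughly the same number of blocks and differ mainly in the time they divide by.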

Problem #2 - Then the most important test: actually encrypting a file. The file is 1021MB in size and sits on a fast mSATA card. Here are the hdparm/dd speeds for that card:

> hdparm -Tt /dev/sda

/dev/sda:
 Timing cached reads:   1764 MB in  2.00 seconds = 881.94 MB/sec
 Timing buffered disk reads: 616 MB in  3.00 seconds = 205.19 MB/sec

> dd if=/dev/zero of=/home/debian/work/temp bs=8k count=100k; rm -f /home/debian/work/temp
102400+0 records in
102400+0 records out
838860800 bytes (839 MB) copied, 4.82506 s, 174 MB/s

Encrypt the file without and then with cryptodev loaded. Without cryptodev it encrypts at about 34MB/sec; with cryptodev it is slower, at about 22MB/sec.

> modprobe -r cryptodev
> time openssl enc -aes-128-cbc -salt -in bbb.mp4 -out bbb.mp4.enc -k "123456"
real    0m29.860s
user    0m15.524s
sys     0m7.256s

> modprobe cryptodev
> time openssl enc -aes-128-cbc -salt -in bbb.mp4 -out bbb.mp4.enc -k "123456"
real    0m46.070s
user    0m1.408s
sys     0m37.260s
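
The MB/sec figures come from dividing the file size by the "real" time of each run. The same timings also show how much of the wall-clock time the CPU was busy (user + sys). This is a rough sketch, assuming a POSIX shell with awk and taking the 1021MB size at face value; mb_per_s and cpu_share are hypothetical helper names.

```shell
#!/bin/sh
# mb_per_s SIZE_MB REAL_SECONDS
# Prints file throughput in MB/s with one decimal place.
mb_per_s() {
    awk -v mb="$1" -v real="$2" 'BEGIN { printf "%.1f\n", mb / real }'
}

# cpu_share USER_SECONDS SYS_SECONDS REAL_SECONDS
# Prints (user + sys) as a percentage of wall-clock time.
cpu_share() {
    awk -v u="$1" -v s="$2" -v r="$3" \
        'BEGIN { printf "%.1f\n", (u + s) / r * 100 }'
}

# Software AES (cryptodev unloaded)
mb_per_s 1021 29.860
cpu_share 15.524 7.256 29.860

# cryptodev loaded
mb_per_s 1021 46.070
cpu_share 1.408 37.260 46.070
```

By this arithmetic the software run comes out at about 34.2 MB/s with the CPU busy roughly 76% of the time, and the cryptodev run at about 22.2 MB/s with the CPU busy roughly 84% of the time, nearly all of it accounted as sys.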

  • Hi,

    Debian is not supported by TI. Please try with the AM57x Processor SDK: www.ti.com/.../PROCESSOR-SDK-AM57X You can find performance benchmarks here: processors.wiki.ti.com/.../Processor_SDK_Linux_Kernel_Performance_Guide
  • Great link, and it seems to match my findings: that page shows the hardware engine has lower throughput than software for all AES tests, while software has a higher CPU%. Is that what we should expect from the accelerator? It doesn't seem right.

    Hardware/Software   Test                                        am57xx-evm
    Hardware            aes-128-cbc_throughput_8192_by (KBytes/s)   32530.43
    Software            aes-128-cbc_throughput_8192_by (KBytes/s)   48944.47
  • Hi Shaun,

    Yes, it's true, the A15 is probably going to be faster than the dedicated crypto accelerator in Linux.

    A few things to consider:

    1) The crypto core runs at L3 bus speed, 266MHz max, while the A15 runs at up to 1.5GHz.  If your application needs to manage power consumption, offloading to the crypto core could save significant power (would be interesting to characterize!)
    2) A multiprocessing OS like Linux limits crypto hardware throughput substantially.  I would estimate we are only realizing ~25% of the performance potential of a bare-metal application.  Without some serious hacking of the driver, I'm not sure we can do much better.
    3) The A15 is free to do other tasks while the dedicated crypto hw chews on the data, though that may not help applications that block waiting on the encrypted data.

    Hopefully this helps. The intent is not to be antagonistic, but to shed some light on why the crypto core is not faster than the A15.

    Regards,
    Mike