 by swass » Sat Sep 05, 2015 1:14 am
			
Moonman, we're golden. It works. There does appear to be a separate problem that causes openssl, for example, to skip the hardware crypto for aes-ecb, while aes-cbc works perfectly fine. Not sure why at the moment. This is clear from examining the interrupts associated with the hardware crypto. On the other hand, LUKS does seem to be using the hardware crypto with aes-ecb, so this looks like something wrong in OpenSSL (I'm using the cryptodev build, but it behaves the same without it).
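For anyone who wants to repeat the interrupt check, here is a rough sketch of one way to do it: snapshot /proc/interrupts, run a benchmark, snapshot again, and compare. The pattern used to find the crypto engine's IRQ line is an assumption, since the label depends on the kernel and board, so check your own /proc/interrupts first. The helper names are made up for illustration.

```python
import re
import subprocess

def parse_interrupts(text, pattern):
    """Sum the per-CPU counts of every /proc/interrupts line matching pattern."""
    totals = {}
    for line in text.splitlines()[1:]:       # first line is the CPU header
        if not re.search(pattern, line):
            continue
        irq, _, rest = line.partition(":")
        total = 0
        for field in rest.split():
            if not field.isdigit():
                break                         # counts end where the controller name starts
            total += int(field)
        totals[irq.strip()] = total
    return totals

def read_interrupts(pattern, path="/proc/interrupts"):
    with open(path) as f:
        return parse_interrupts(f.read(), pattern)

def delta(before, after):
    """Interrupts received per IRQ between two snapshots."""
    return {irq: after.get(irq, 0) - count for irq, count in before.items()}

def irq_delta_during(cmd, pattern):
    """Run cmd and report how many matching interrupts fired while it ran."""
    before = read_interrupts(pattern)
    subprocess.run(cmd, check=False)
    return delta(before, read_interrupts(pattern))

# Example call (the pattern "crypto|cesa" is an assumption about the IRQ label):
# irq_delta_during(["openssl", "speed", "-evp", "aes-128-ecb",
#                   "-engine", "cryptodev", "-elapsed"], r"crypto|cesa")
```

A delta near zero during a run would suggest the engine was not used for that cipher, while a large delta would suggest it was.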
marvell_cesa without DMA:
# openssl speed -evp aes-128-cbc -engine cryptodev -elapsed
engine "cryptodev" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 45004 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 41091 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 37630 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 25995 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 5344 aes-128-cbc's in 3.00s
OpenSSL 1.0.2d 9 Jul 2015
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DHAVE_CRYPTODEV -DHASH_MAX_LEN=64 -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -march=armv5te -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -Wl,-O1,--sort-common,--as-needed,-z,relro -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc        240.02k      876.61k     3211.09k     8872.96k    14592.68k
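As a sanity check on how to read these numbers: the summary row is just operations times block size divided by elapsed time, in thousands of bytes per second. A minimal sketch using the counts from the run above (the function name is made up for illustration):

```python
def throughput_k(ops, block_bytes, seconds=3.00):
    """Thousands of bytes per second, as in the summary row of `openssl speed`."""
    return ops * block_bytes / seconds / 1000.0

# (block size in bytes, operations completed in 3.00 s) from the run without DMA
no_dma = [(16, 45004), (64, 41091), (256, 37630), (1024, 25995), (8192, 5344)]

for size, ops in no_dma:
    print(f"{size:5d} bytes: {throughput_k(ops, size):10.2f}k")
```

The printed values match the aes-128-cbc row, which confirms the "k" figures are 1000s of bytes per second, not KiB.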
marvell_cesa with DMA:
# openssl speed -evp aes-128-cbc -engine cryptodev -elapsed
engine "cryptodev" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-cbc for 3s on 16 size blocks: 43172 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 40990 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 40796 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 32189 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 10079 aes-128-cbc's in 3.00s
OpenSSL 1.0.2d 9 Jul 2015
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DHAVE_CRYPTODEV -DHASH_MAX_LEN=64 -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -march=armv5te -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -Wl,-O1,--sort-common,--as-needed,-z,relro -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc        230.25k      874.45k     3481.26k    10987.18k    27522.39k
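To put the two aes-128-cbc runs side by side, here is a small sketch that computes the DMA speedup per block size from the summary rows posted above. Nothing here is measured anew; it is only arithmetic on those numbers.

```python
sizes = [16, 64, 256, 1024, 8192]                           # block sizes in bytes
no_dma = [240.02, 876.61, 3211.09, 8872.96, 14592.68]       # k bytes/s without DMA
with_dma = [230.25, 874.45, 3481.26, 10987.18, 27522.39]    # k bytes/s with DMA

def speedups(base, new):
    """Ratio of new throughput to base throughput, per block size."""
    return [n / b for b, n in zip(base, new)]

for size, ratio in zip(sizes, speedups(no_dma, with_dma)):
    print(f"{size:5d} bytes: {ratio:.2f}x")
```

The gain only shows up at larger block sizes: small blocks are slightly slower or unchanged, while 8192-byte blocks come out close to twice as fast.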
marvell_cesa with DMA (other combinations):
# openssl speed -evp aes-128-ecb -engine cryptodev -elapsed
engine "cryptodev" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-128-ecb for 3s on 16 size blocks: 2066134 aes-128-ecb's in 3.00s
Doing aes-128-ecb for 3s on 64 size blocks: 557144 aes-128-ecb's in 3.00s
Doing aes-128-ecb for 3s on 256 size blocks: 142277 aes-128-ecb's in 3.00s
Doing aes-128-ecb for 3s on 1024 size blocks: 35786 aes-128-ecb's in 3.00s
Doing aes-128-ecb for 3s on 8192 size blocks: 4477 aes-128-ecb's in 3.00s
OpenSSL 1.0.2d 9 Jul 2015
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DHAVE_CRYPTODEV -DHASH_MAX_LEN=64 -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -march=armv5te -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -Wl,-O1,--sort-common,--as-needed,-z,relro -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-ecb      11019.38k    11885.74k    12140.97k    12214.95k    12225.19k
# openssl speed -evp aes-256-ecb -engine cryptodev -elapsed
engine "cryptodev" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-ecb for 3s on 16 size blocks: 1575155 aes-256-ecb's in 3.00s
Doing aes-256-ecb for 3s on 64 size blocks: 416983 aes-256-ecb's in 3.00s
Doing aes-256-ecb for 3s on 256 size blocks: 105958 aes-256-ecb's in 3.00s
Doing aes-256-ecb for 3s on 1024 size blocks: 26600 aes-256-ecb's in 3.00s
Doing aes-256-ecb for 3s on 8192 size blocks: 3323 aes-256-ecb's in 3.00s
OpenSSL 1.0.2d 9 Jul 2015
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DHAVE_CRYPTODEV -DHASH_MAX_LEN=64 -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -march=armv5te -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -Wl,-O1,--sort-common,--as-needed,-z,relro -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-ecb       8400.83k     8895.64k     9041.75k     9079.47k     9074.01k
# openssl speed -evp aes-256-cbc -engine cryptodev -elapsed
engine "cryptodev" set.
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 41799 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 39894 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 38471 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 29770 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 7218 aes-256-cbc's in 3.00s
OpenSSL 1.0.2d 9 Jul 2015
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DHAVE_CRYPTODEV -DHASH_MAX_LEN=64 -Wa,--noexecstack -D_FORTIFY_SOURCE=2 -march=armv5te -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -Wl,-O1,--sort-common,--as-needed,-z,relro -O3 -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc        222.93k      851.07k     3282.86k    10161.49k    19709.95k
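The ecb and cbc results also look different in shape, which fits the observation that openssl is not using the hardware for aes-ecb. A rough sketch of two indicators computed from the aes-128 numbers above: time per call at 16-byte blocks, and how much throughput varies across block sizes. Reading these as "software vs. hardware path" is an inference, not proof; the interrupt counts remain the direct evidence.

```python
def per_call_us(ops, seconds=3.00):
    """Average time per operation in microseconds."""
    return seconds * 1e6 / ops

def spread(row):
    """Ratio of best to worst throughput across block sizes."""
    return max(row) / min(row)

# aes-128 summary rows (k bytes/s), both from the runs with DMA
ecb_128 = [11019.38, 11885.74, 12140.97, 12214.95, 12225.19]
cbc_128 = [230.25, 874.45, 3481.26, 10987.18, 27522.39]

# operations completed in 3.00 s on 16-byte blocks
print(f"ecb per call: {per_call_us(2066134):6.2f} us, spread {spread(ecb_128):6.1f}x")
print(f"cbc per call: {per_call_us(43172):6.2f} us, spread {spread(cbc_128):6.1f}x")
```

The cbc calls carry a large fixed cost per operation and scale strongly with block size, which is what you would expect when each call goes out to the engine. The ecb calls are cheap and nearly flat across block sizes, which is more like a plain CPU implementation.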
LUKS without DMA:
  cipher:  aes-ecb
  keysize: 256 bits
Write - O_DIRECT
# dd if=/dev/zero of=./bigfile count=1024 bs=1M oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 116.527 s, 9.2 MB/s
Read - O_DIRECT
# dd of=/dev/null if=./bigfile count=1024 bs=1M iflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 110.617 s, 9.7 MB/s
LUKS with DMA:
Write - O_DIRECT
# dd if=/dev/zero of=./bigfile count=1024 bs=1M oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 80.6936 s, 13.3 MB/s
Read - O_DIRECT
# dd of=/dev/null if=./bigfile count=1024 bs=1M iflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 78.4823 s, 13.7 MB/s
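For reference, the LUKS numbers work out as follows. This is only arithmetic on the dd output above (dd reports MB as 10^6 bytes); the names are made up for illustration.

```python
BYTES = 1073741824  # 1024 blocks of 1 MiB, as written and read by dd

def mb_per_s(nbytes, seconds):
    """Throughput in MB/s, with MB = 10**6 bytes as dd reports it."""
    return nbytes / seconds / 1e6

# elapsed seconds: (without DMA, with DMA)
runs = {"write": (116.527, 80.6936), "read": (110.617, 78.4823)}

def gain_percent(t_no_dma, t_dma):
    """Throughput improvement from DMA for the same amount of data."""
    return (t_no_dma / t_dma - 1) * 100

for name, (t_no_dma, t_dma) in runs.items():
    print(f"{name:5s}: {mb_per_s(BYTES, t_no_dma):5.1f} -> "
          f"{mb_per_s(BYTES, t_dma):5.1f} MB/s "
          f"(+{gain_percent(t_no_dma, t_dma):.0f}%)")
```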
So, an improvement to be sure (roughly 40 to 45% on both writes and reads), but not an enormous one yet. I'll work on optimizing the block sizes and on switching from AES-256 to AES-128, which is still quite good.
			
Last edited by swass on Sat Sep 05, 2015 2:12 am, edited 1 time in total.