Does Nginx use the hardware AES support of Intel Core i7 or other similar architectures?
I am trying to use Nginx as a reverse proxy with features like load balancing and SSL offload, and I need to buy the proper hardware.
In some cases I need high-throughput SSL offload, and I would like to know whether Nginx uses the hardware AES features of the Intel Core i7 (or the Xeon Nehalem server CPU line).
Would using Nginx with such CPUs give me more SSL offload throughput, or would it be a waste of money?
You can verify that nginx was built with OpenSSL by running nginx -V:
[root@saurok ~]# nginx -V
nginx version: nginx/1.8.0
built by gcc 5.1.1 20150618 (Red Hat 5.1.1-4) (GCC)
built with OpenSSL 1.0.1k-fips 8 Jan 2015
TLS SNI support enabled
...
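If you want to check this from a script (for example, across several proxies), here is a minimal Python sketch. It assumes the banner contains a line starting with "built with OpenSSL", as in the output above; the helper names are hypothetical. run_nginx_v() captures both stdout and stderr because nginx prints its -V banner on stderr.

```python
import re
import subprocess
from typing import Optional

# Hypothetical helper pattern: the "built with OpenSSL <version> ..." line of nginx -V.
OPENSSL_LINE = re.compile(r"^built with OpenSSL (\S+)", re.MULTILINE)


def nginx_openssl_version(banner: str) -> Optional[str]:
    """Return the OpenSSL version nginx was built with, or None if not reported."""
    match = OPENSSL_LINE.search(banner)
    return match.group(1) if match else None


def run_nginx_v(binary: str = "nginx") -> str:
    """Run `<binary> -V` and return its combined stdout and stderr."""
    # nginx writes the -V banner to stderr, so capture both streams.
    result = subprocess.run([binary, "-V"], capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    sample = (
        "nginx version: nginx/1.8.0\n"
        "built by gcc 5.1.1 20150618 (Red Hat 5.1.1-4) (GCC)\n"
        "built with OpenSSL 1.0.1k-fips 8 Jan 2015\n"
        "TLS SNI support enabled\n"
    )
    print(nginx_openssl_version(sample))  # 1.0.1k-fips
```

On a real host you would call nginx_openssl_version(run_nginx_v()) instead of using the sample string.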
You can verify that OpenSSL uses Intel AES-NI by running OpenSSL's internal benchmarks.
Compare the output of openssl speed aes-128-cbc with openssl speed -evp aes-128-cbc. The former calls OpenSSL's low-level AES routines directly and skips hardware acceleration even if it is present, while the latter goes through the EVP interface and uses acceleration when available. Outside of this benchmark, OpenSSL uses the acceleration automatically when it is present.
For example:
[root@saurok ~]# openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 32797518 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 9030109 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 2311493 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 582201 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 72836 aes-128 cbc's in 3.00s
OpenSSL 1.0.1k-fips 8 Jan 2015
built on: Thu Aug 13 12:19:54 2015
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -m64 -DL_ENDIAN -DTERMIO -Wall -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wa,--noexecstack -DPURIFY -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128 cbc 174920.10k 192642.33k 197247.40k 198724.61k 198890.84k
Compare that with the -evp run:
[root@saurok ~]# openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 169042680 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 45311567 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 11536773 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2897474 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 362685 aes-128-cbc's in 3.00s
OpenSSL 1.0.1k-fips 8 Jan 2015
built on: Thu Aug 13 12:19:54 2015
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -m64 -DL_ENDIAN -DTERMIO -Wall -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -Wa,--noexecstack -DPURIFY -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128-cbc 901560.96k 966646.76k 984471.30k 989004.46k 990371.84k
As you can see, the -evp run is roughly five times faster (about 990 MB/s versus 199 MB/s on 8192-byte blocks), indicating that hardware acceleration is in use.
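The table figures follow directly from the "Doing ..." lines: operations × block size ÷ seconds ÷ 1000 (the unit is 1000s of bytes per second). The minimal Python sketch below, with hypothetical helper names, recomputes both tables from the "Doing" lines quoted above and divides the -evp figures by the plain ones to make the speedup explicit for every block size.

```python
import re
from typing import Dict

# "Doing ..." lines copied from the two runs above.
PLAIN_RUN = """\
Doing aes-128 cbc for 3s on 16 size blocks: 32797518 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 9030109 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 2311493 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 582201 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 72836 aes-128 cbc's in 3.00s
"""
EVP_RUN = """\
Doing aes-128-cbc for 3s on 16 size blocks: 169042680 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 45311567 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 11536773 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2897474 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 362685 aes-128-cbc's in 3.00s
"""

# Captures: block size, operation count, elapsed seconds.
DOING = re.compile(r"on (\d+) size blocks: (\d+) .+? in ([\d.]+)s")


def throughput_k(run: str) -> Dict[int, float]:
    """Block size -> throughput in 1000s of bytes per second (openssl speed's unit)."""
    return {
        int(size): int(count) * int(size) / float(secs) / 1000
        for size, count, secs in DOING.findall(run)
    }


def speedup(plain: Dict[int, float], evp: Dict[int, float]) -> Dict[int, float]:
    """Ratio of EVP throughput to non-EVP throughput for each shared block size."""
    return {size: evp[size] / plain[size] for size in plain if size in evp}


if __name__ == "__main__":
    plain, evp = throughput_k(PLAIN_RUN), throughput_k(EVP_RUN)
    for size, ratio in sorted(speedup(plain, evp).items()):
        print(f"{size:5d} bytes: {plain[size]:12.2f}k -> {evp[size]:12.2f}k ({ratio:.2f}x)")
```

The recomputed values match the tables (for example 32797518 × 16 ÷ 3.00 ÷ 1000 ≈ 174920.10k), and the ratio stays close to 5 across all block sizes.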
Nginx itself has nothing to do with hardware offloading; that is down to the crypto library it is linked against. Normally that is OpenSSL, and if it is a recent enough version, built appropriately, it will use hardware acceleration for the crypto operations that modern CPUs support in silicon, such as AES via AES-NI.
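Whether the CPU offers the instructions in the first place is a separate check. On Linux on x86, the kernel lists AES-NI as the aes flag in /proc/cpuinfo, so a minimal Python sketch (Linux/x86 only; the helper names are hypothetical) can look for it:

```python
def cpu_has_aesni(cpuinfo_text: str) -> bool:
    """True if the first 'flags' line in /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Flags are whitespace-separated; compare whole tokens so that
            # a flag such as 'vaes' is not mistaken for 'aes'.
            return "aes" in value.split()
    return False


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the kernel's CPU description (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    try:
        print("AES-NI advertised:", cpu_has_aesni(read_cpuinfo()))
    except OSError:
        print("/proc/cpuinfo not available (not Linux?)")
```

A CPU that advertises aes only helps if the OpenSSL build actually uses AES-NI, which is what the openssl speed comparison tests.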