Does Nginx use the hardware AES support of Intel Core i7 or other similar architectures?

I am trying to use Nginx as a reverse proxy with features like load balancing and SSL offload, and I need to buy the proper hardware.

In some cases I need high-throughput SSL offload, and I am wondering whether Nginx uses the hardware AES features of the Intel Core i7 (or the server Xeon Nehalem CPU product line).

Would using Nginx with such CPUs gain me more throughput on SSL offload, or would it be a waste of money?


You can verify that nginx was built with OpenSSL by running nginx -V.

[root@saurok ~]# nginx -V
nginx version: nginx/1.8.0
built by gcc 5.1.1 20150618 (Red Hat 5.1.1-4) (GCC)
built with OpenSSL 1.0.1k-fips 8 Jan 2015
TLS SNI support enabled
...
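If you only want the relevant line, a small filter helps. This is a minimal sketch: openssl_line is a hypothetical helper name, and it assumes the "built with OpenSSL" wording shown above. Note that nginx -V writes to stderr, so the stream has to be redirected before it can be piped.

```shell
# Hypothetical helper: keep only the line naming the OpenSSL build.
# Assumes the "built with OpenSSL ..." wording shown in the output above.
openssl_line() {
    grep -i 'built with OpenSSL'
}

# Usage sketch (nginx -V prints to stderr, hence 2>&1):
#   nginx -V 2>&1 | openssl_line
```

If the filter prints nothing, nginx was probably built without SSL support or against a different TLS library.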

You can verify that OpenSSL uses Intel AES-NI by running OpenSSL's internal benchmarks.

Compare the output of openssl speed aes-128-cbc with openssl speed -evp aes-128-cbc. The former skips hardware acceleration even if it is present, while the latter uses it if available. This distinction only applies to the benchmark; in normal use OpenSSL picks up the acceleration automatically when the CPU supports it.

For example:

[root@saurok ~]# openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 32797518 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 9030109 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 2311493 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 582201 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 72836 aes-128 cbc's in 3.00s
OpenSSL 1.0.1k-fips 8 Jan 2015
built on: Thu Aug 13 12:19:54 2015
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
compiler: -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -m64 -DL_ENDIAN -DTERMIO -Wall -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -Wa,--noexecstack -DPURIFY -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc     174920.10k   192642.33k   197247.40k   198724.61k   198890.84k

as compared to

[root@saurok ~]# openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 169042680 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 45311567 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 11536773 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 2897474 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 362685 aes-128-cbc's in 3.00s
OpenSSL 1.0.1k-fips 8 Jan 2015
built on: Thu Aug 13 12:19:54 2015
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
compiler: -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DKRB5_MIT -m64 -DL_ENDIAN -DTERMIO -Wall -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches  -m64 -mtune=generic -Wa,--noexecstack -DPURIFY -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc     901560.96k   966646.76k   984471.30k   989004.46k   990371.84k

As you can see, the latter is roughly five times faster at every block size, indicating hardware acceleration is in use.
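The comparison can also be reduced to a single number. The following is a rough sketch, not part of OpenSSL: last_column and speedup are hypothetical helper names, and the parsing assumes the result line starts with "aes-128" and ends with the largest block size column, as in the two runs above.

```shell
# Hypothetical helper: print the last throughput column of the aes-128
# result line from `openssl speed` output, without the trailing "k".
last_column() {
    awk 'tolower($1) ~ /^aes-128/ { v = $NF; sub(/k$/, "", v); print v }'
}

# Hypothetical helper: ratio of accelerated to plain throughput,
# rounded to two decimal places. Arguments: plain, accelerated.
speedup() {
    awk -v soft="$1" -v hw="$2" 'BEGIN { printf "%.2f\n", hw / soft }'
}

# Usage sketch (progress lines go to stderr and are discarded):
#   soft=$(openssl speed aes-128-cbc 2>/dev/null | last_column)
#   hw=$(openssl speed -evp aes-128-cbc 2>/dev/null | last_column)
#   speedup "$soft" "$hw"
```

With the 8192-byte figures from the two runs above (198890.84k and 990371.84k), speedup prints 4.98. A ratio close to 1 would suggest that AES-NI is not being used.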


Nginx itself has nothing to do with hardware acceleration; that is down to the crypto library it is linked against. Normally that is OpenSSL, and if it is a recent enough version, built with the relevant support, it will use the crypto instructions that modern CPUs implement in silicon.
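Before buying or deploying hardware, you can check whether a given CPU advertises the AES instructions at all. This is a minimal sketch that assumes a Linux x86 system, where the kernel lists CPU features on the "flags" line of /proc/cpuinfo; has_aesni is a hypothetical helper name, and the optional file argument exists only so the check can be run against a saved copy of cpuinfo.

```shell
# Hypothetical helper: succeed if the CPU flags list contains "aes"
# as a whole word. Defaults to /proc/cpuinfo (Linux x86 assumption).
has_aesni() {
    grep -m1 '^flags' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -qw aes
}

if has_aesni; then
    echo "AES-NI: advertised by the CPU"
else
    echo "AES-NI: not found"
fi
```

A positive result only shows that the CPU supports the instructions. Whether they are actually used still depends on the OpenSSL build, which the benchmark comparison is better suited to confirm.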