nginx reverse proxy slows down my throughput by half

I'm currently using nginx to proxy back to gunicorn with 8 workers. I'm using an Amazon EC2 Extra Large instance with 4 virtual cores. When I connect to gunicorn directly I get about 10K requests/sec. When I serve a static file from nginx I get about 25K requests/sec.

But when I place gunicorn behind nginx on the same physical server I get about 5K requests/sec. I understand nginx adds some overhead, but a 50% drop seems like a problem. Has anybody seen something similar? Any help would be great!

Here is the relevant nginx conf:

worker_processes 4;
worker_rlimit_nofile 30000;

events {
  worker_connections 5120;
}

http {
  sendfile on;
  tcp_nopush on;
  tcp_nodelay on;
  keepalive_timeout 65;
  types_hash_max_size 2048;
}

sites-enabled/default:

upstream backend {
  server 127.0.0.1:8000;
}

server {
  server_name api.domain.com;

  location / {
    proxy_pass http://backend;
    proxy_buffering off;
  }
}

Be sure to add the multi_accept on; directive to your events block. It tells each worker process to accept all new connections at a time instead of just one.
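For example, using the same worker_connections value from your config:

events {
  multi_accept       on;
  worker_connections 5120;
}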

Do not use tcp_nodelay on; unless you're serving huge data or streams. Even if you are, you should only enable it in the appropriate location block.
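As a sketch, scoping it to a single streaming location (the /stream/ path is made up for illustration):

location /stream/ {
  tcp_nodelay on;  # disable Nagle's algorithm for this streaming endpoint only
  proxy_pass  http://backend;
}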

Do not just proxy everything to your backend; only proxy the requests that really have to be served by it. You might also want to add a proxy cache to speed things up even more (there's a sketch after the virtual host below). The following is an example configuration I put together based on the one you posted above.

# /etc/nginx/nginx.conf

worker_processes                  4;
worker_rlimit_nofile              20480; # worker_connections * 4
events {
  multi_accept                    on;
  worker_connections              5120;
  use                             epoll;
}
http {
  charset                         utf-8;
  client_body_timeout             65;
  client_header_timeout           65;
  client_max_body_size            10m;
  default_type                    application/octet-stream;
  keepalive_timeout               20;
  reset_timedout_connection       on;
  send_timeout                    65;
  server_tokens                   off;
  sendfile                        on;
  server_names_hash_bucket_size   64;
  tcp_nodelay                     off;
  tcp_nopush                      on;
  include                         sites-enabled/*.conf;
}

And the virtual host.

# /etc/nginx/sites-available/default.conf

upstream backend {
  server 127.0.0.1:8000;
}

server {
  server_name api.domain.com;

  location / {
    try_files $uri $uri/ @backend;
  }

  location @backend {
    proxy_buffering off;
    proxy_pass http://backend;
  }
}
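And here is the proxy cache sketch mentioned above. The zone name, cache path, and validity times are placeholders to adjust for your setup; also note that nginx does not cache responses while proxy_buffering is off, so you would have to drop that directive from any location you want cached.

# in the http block
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=backend_cache:10m
                 max_size=100m inactive=60m;

# in the virtual host
location @backend {
  proxy_cache       backend_cache;                # placeholder zone name from above
  proxy_cache_valid 200 301 302 10m;              # cache successful responses for 10 minutes
  proxy_pass        http://backend;
}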