How to make an existing caching Nginx proxy use another proxy to bypass a firewall?

My question is about using Nginx as a proxy behind another proxy. (Somewhat confusing.)

I want to set up Nginx so it acts as a caching proxy server to an npm mirror. Here is the link: http://eng.yammer.com/a-private-npm-cache/

On my local machine, which is not restricted by a firewall, the following configuration works fine:

proxy_cache_path /var/cache/npm/data levels=1:2 keys_zone=npm:20m max_size=1000m
inactive=365d;
proxy_temp_path /var/cache/npm/tmp;

server {
   listen 80;
   server_name classen.abc.lan;
   location / {
      proxy_pass http://registry.npmjs.org/;
      proxy_cache npm;
      proxy_cache_valid 200 302 365d;
      proxy_cache_valid 404 1m;
      sub_filter 'registry.npmjs.org' 'classen.abc.lan';
      sub_filter_once off;
      sub_filter_types application/json;
   }
}

Now I want to apply it to a server that is behind an additional firewall. In the logs, I can confirm that it accesses the correct upstream IP, but the request fails because of the internal firewall.

We have one internal proxy, which I can use to bypass the firewall, for example:

$ curl http://registry.npmjs.org
curl: (7) couldn't connect to host
$ http_proxy=http://proxy.abc.lan:1234/ curl http://registry.npmjs.org
... succeeds ...

This trick does not work with Nginx, as it ignores the http_proxy environment variable. After reading the documentation, I still could not figure out how to modify the configuration so that Nginx sends its upstream requests through the proxy.

Is it possible to combine both solutions? It is important that the caching still works; otherwise, one could just use the external registry (registry.npmjs.org) directly.

Maybe Nginx should use the internal proxy (proxy.abc.lan) as the proxy_pass target, but then how does the internal proxy know that the request should be sent to the external npm registry (http://registry.npmjs.org)?

Update to Lukas's answer

I tried Lukas's solution:

rewrite ^(.*)$ "http://registry.npmjs.org$1" break;
proxy_pass http://proxy.abc.lan:1234;

The logs show that the URL is rewritten but it results in a redirect (triggered by curl classen.abc.lan/test-url):

2014/03/24 11:31:16 [notice] 13827#0: *2 rewritten redirect: "http://registry.npmjs.org/test-url", client: 172.18.40.33, server: classen.abc.lan, request: "GET /test-url HTTP/1.1", host: "classen.abc.lan"

The result of the curl call is not the expected JSON string from http://registry.npmjs.org but an HTML page generated by Nginx:

$ curl classen.abc.lan/test-url
<html>
<head><title>302 Found</title></head>
<body bgcolor="white">
<center><h1>302 Found</h1></center>
<hr><center>nginx/1.4.7</center>
</body>
</html>

Solution 1:

The issue with Lukas's solution is the rewrite module (HttpRewriteModule), which stops processing and returns a redirect (a 302 by default) whenever the replacement string starts with http:// or https://.

If you instead do the rewrite in two stages, with only the second one flagged 'break', it should work, e.g.:

rewrite ^(.*)$ "://registry.npmjs.org$1";
rewrite ^(.*)$ "http$1" break;
proxy_pass http://proxy.abc.lan:1234;

I suspect there's a nicer way to do this, but it appears to work.
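
Put together with the caching directives from the question, the whole thing might look like the sketch below (it belongs in the http context). I have not tested it against a real corporate proxy. The host names are the ones from the question, and the proxy_set_header line is my own assumption: a forward proxy should take the target host from the absolute URI anyway, so it may be unnecessary.

```nginx
proxy_cache_path /var/cache/npm/data levels=1:2 keys_zone=npm:20m max_size=1000m inactive=365d;
proxy_temp_path /var/cache/npm/tmp;

server {
   listen 80;
   server_name classen.abc.lan;

   location / {
      # Stage 1: prepend the target host without the scheme, so the
      # replacement does not start with "http://" and no redirect is issued.
      rewrite ^(.*)$ "://registry.npmjs.org$1";
      # Stage 2: add the scheme and stop rewriting; the URI is now absolute.
      rewrite ^(.*)$ "http$1" break;

      # No URI part on proxy_pass, so the rewritten URI is what gets sent.
      proxy_pass http://proxy.abc.lan:1234;
      # Assumption: name the real target in the Host header as well.
      proxy_set_header Host registry.npmjs.org;

      # Caching and response rewriting, unchanged from the question.
      proxy_cache npm;
      proxy_cache_valid 200 302 365d;
      proxy_cache_valid 404 1m;
      sub_filter 'registry.npmjs.org' 'classen.abc.lan';
      sub_filter_once off;
      sub_filter_types application/json;
   }
}
```

The cache key is left at its default here, so entries are keyed on the original request URI rather than on the rewritten one.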

Solution 2:

RFC 2616, Section 5.1.2 states:

The absoluteURI form is REQUIRED when the request is being made to a proxy.
[...]
An example Request-Line would be:

  GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1

So what you are supposed to do is pass the request to the proxy with these modified directives:

rewrite ^(.*)$ "http://registry.npmjs.org$1" break;
proxy_pass http://proxy.abc.lan:1234;

According to the nginx docs, when the URI is changed inside a proxied location with rewrite ... break; and proxy_pass is given without a URI part, nginx passes the rewritten URI (now an absolute URI, as the protocol requires) to the upstream instead of building it from the proxy_pass directive.

Solution 3:

I think it may be simpler than either of the examples above. They use rewrite to change the URL. I think you can instead point proxy_pass at the proxy and set the Host header to the host you want to reach, e.g.:

http {

   upstream corporate_proxy {
      server web-proxy.mycorp.com:8080;
   }

   server {
      listen 80;
      server_name classen.abc.lan;
      location / {
         # proxy_pass_header expects a header field name, not "on",
         # so that line is dropped here.
         proxy_set_header Host "registry.npmjs.org";
         proxy_pass http://corporate_proxy;
         # The npm zone is the one declared by proxy_cache_path in the question.
         proxy_cache npm;
         proxy_cache_valid 200 302 365d;
         proxy_cache_valid 404 1m;
         sub_filter 'registry.npmjs.org' 'classen.abc.lan';
         sub_filter_once off;
         sub_filter_types application/json;
      }
   }
}
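
Since the whole point is that caching keeps working, one way to check it, whichever variant you end up with, is to expose nginx's $upstream_cache_status variable in a response header. This is only a sketch: the header name X-Cache-Status is just a name I picked, and the line goes into the existing location block next to the other directives.

```nginx
location / {
   # ... proxy_pass, proxy_cache and sub_filter directives as before ...

   # Report how the cache handled this request (for example MISS or HIT).
   add_header X-Cache-Status $upstream_cache_status;
}
```

Requesting the same URL twice through classen.abc.lan should then typically show a miss followed by a hit, provided the upstream response is cacheable.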