HTTPS + Nginx Reverse Proxy + URL rewrite

I am trying to direct all HTTPS traffic to an Nginx server, which then forwards the requests as plain HTTP to the internal servers. So far, the template below works for most of my servers.

server {
    listen 443 ssl default_server;
    ssl_certificate /etc/letsencrypt/live/somesite.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/somesite.com/privkey.pem;
    server_name somesite.com;

    location ^~ /Service {
      proxy_pass http://192.168.1.2;
    }

    location / {
      proxy_pass http://192.168.1.3;
    }
}

However, this only works when the external and internal paths match: https://somesite.com/Service has to map to http://192.168.1.2/Service.

I can't map https://somesite.com/Service to http://192.168.1.2/Hello.

Nor can I direct it to another port, e.g. map https://somesite.com/Service to http://192.168.1.2:3000.

For instance, when I changed the above to this:

location /Service/ {
    proxy_pass http://192.168.1.2:80/;
}

location / {
    proxy_pass http://192.168.1.3;
}

Using the following logging setup:

log_format upstreamlog '[$time_local] $remote_addr - $remote_user - $server_name to: $upstream_addr: $request upstream_response_time $upstream_response_time msec $msec request_time $request_time';
access_log /var/log/nginx/access.log upstreamlog;

This is the log I got:

[22/Jan/2021:09:56:28 +0000] 172.56.38.95 - - - somesite.com to: 192.168.1.2:80: GET /Service/ HTTP/1.1 upstream_response_time 0.004 msec 1611309388.445 request_time 0.004
[22/Jan/2021:09:56:28 +0000] 172.56.38.95 - - - somesite.com to: 192.168.1.2:80: GET /Service/js/default.cache.a331c8c3.js HTTP/1.1 upstream_response_time 0.000 msec 1611309388.547 request_time 0.002
[22/Jan/2021:09:56:28 +0000] 172.56.38.95 - - - somesite.com to: 192.168.1.2:80: GET /Service/favicon.ico HTTP/1.1 upstream_response_time 0.012 msec 1611309388.757 request_time 0.012
[22/Jan/2021:09:56:28 +0000] 172.56.38.95 - - - somesite.com to: 192.168.1.3:80: GET /api/v1/oauth.json?_=1611309389573 HTTP/1.1 upstream_response_time 0.016 msec 1611309388.771 request_time 0.017

As the log shows, the first three requests are routed correctly, but the fourth one is not, and no further requests were made after it. After tracing a bit more, I realized that 192.168.1.2 already runs its own Nginx server, which processes PHP pages using FastCGI. I don't know whether that makes a difference.

So I tried combining rewrite with the configuration above, but got a Page Not Found error. I suspect rewrite may not work with HTTPS. That leads to my question: how do I configure Nginx as a reverse proxy that rewrites URLs while serving HTTPS externally?


Solution 1:

See the NGINX documentation; its first example is exactly what you are doing, and it is well explained:

If you configure a location block with path1 and a proxy_pass with path2, the whole "path tree" is relocated from path1 to path2:

server {
    ...
    server_name example.net;
    location /some/path/ {
        proxy_pass http://www.example.com/link/;
    }
}

If you access https://example.net/some/path/blah, Nginx proxies the request to http://www.example.com/link/blah.
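Applied to the mappings from the question (the /Hello path and port 3000 are the question's own examples), a sketch inside the same server block could look like this. Use one of the two blocks, not both: two locations with the same prefix in one server is a configuration error.

# https://somesite.com/Service/foo -> http://192.168.1.2/Hello/foo
location /Service/ {
    proxy_pass http://192.168.1.2/Hello/;
}

# alternative (replaces the block above, not alongside it):
# https://somesite.com/Service/foo -> http://192.168.1.2:3000/foo
location /Service/ {
    proxy_pass http://192.168.1.2:3000/;
}

Because the prefix ends with a slash and the location uses proxy_pass, a request for /Service without the trailing slash gets a 301 redirect to /Service/.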

The next caveat is what exactly your origin server returns. It may emit HTML, JS and CSS, and any of them can contain HTTP(S) links; some are relative, but some aren't. The key point is that proxy_pass does not make Nginx rewrite any links inside the proxied content. Non-relative paths (those starting with /) stay as they are, and the client interprets them as pointing "outside" the proxied prefix.
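A related point: some backends build absolute links from the Host header and scheme they receive, and behind this proxy they only see "192.168.1.2" over plain HTTP. If the application on 192.168.1.2 honors the conventional forwarded headers (an assumption; it depends entirely on the app), a sketch like the following lets it generate https://somesite.com links. It does not tell the app about the /Service/ prefix, so on its own it does not fix the problem in the log.

location /Service/ {
    proxy_pass http://192.168.1.2/;
    # by default Nginx sends "Host: 192.168.1.2" (taken from proxy_pass);
    # forward the public name instead. Check that the backend Nginx on
    # 192.168.1.2 still selects the right server block for this Host.
    proxy_set_header Host $host;
    # the backend only sees HTTP, so tell it the client used HTTPS
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    # proxy_redirect (on by default) rewrites the Location and Refresh
    # response headers only; it never touches links in the body
}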

As your log shows, the web server returns some JavaScript code on request 2:

[22/Jan/2021:09:56:28 +0000] 172.56.38.95 - - - somesite.com to: 192.168.1.2:80: GET /Service/js/default.cache.a331c8c3.js HTTP/1.1 upstream_response_time 0.000 msec 1611309388.547 request_time 0.002

That code likely contains some non-relative paths, e.g. a path starting with /api/. The client then requests that path, which lies outside the proxied tree (currently only /Service/), as we see on line 4:

[22/Jan/2021:09:56:28 +0000] 172.56.38.95 - - - somesite.com to: 192.168.1.3:80: GET /api/v1/oauth.json?_=1611309389573 HTTP/1.1 upstream_response_time 0.016 msec 1611309388.771 request_time 0.017

Nginx, following its configuration, correctly proxies this request elsewhere (it is caught by the location / rule).

There are several ways around that. If only a few prefixes are used in non-relative paths, and nothing else already uses them, you can simply proxy them as they are:

# (inside a "server" block, above "location /" rule)
    location /api/ {
        proxy_pass http://www.example.com/api/;
    }
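For the setup in the question, a sketch of the whole server block might look like this. It assumes the /api/ calls belong to the same application on 192.168.1.2 and are served there from its root; verify that against the backend before relying on it.

server {
    listen 443 ssl default_server;
    server_name somesite.com;
    ssl_certificate     /etc/letsencrypt/live/somesite.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/somesite.com/privkey.pem;

    # the application itself, relocated from / on 192.168.1.2
    location /Service/ {
        proxy_pass http://192.168.1.2/;
    }

    # absolute /api/ paths requested by that application's JS
    # (assumed to live on the same backend)
    location /api/ {
        proxy_pass http://192.168.1.2/api/;
    }

    # everything else
    location / {
        proxy_pass http://192.168.1.3;
    }
}

The trade-off is that /api/ is taken away from 192.168.1.3, which is only safe if that server doesn't use /api/ itself.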

Another way is to install a filter in Nginx using ngx_http_sub_module, which alters the served content, rewriting all URIs to the new base.
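A minimal sketch of that approach, assuming the application's absolute links start with "/api/" and appear inside quoted strings (the search patterns are hypothetical; adjust them to what the JS actually contains). ngx_http_sub_module is not built by default (it needs --with-http_sub_module; nginx -V shows whether it is present), several sub_filter lines on one level need Nginx 1.9.4 or later, and the module cannot rewrite compressed responses.

location /Service/ {
    proxy_pass http://192.168.1.2/;
    # sub_filter only works on uncompressed bodies, so ask the
    # backend not to compress its responses
    proxy_set_header Accept-Encoding "";
    # text/html is always processed; add the CSS and JS types
    sub_filter_types text/css text/javascript application/javascript;
    # replace every occurrence, not just the first one
    sub_filter_once off;
    # hypothetical patterns: prefix absolute /api/ paths with /Service
    sub_filter '"/api/' '"/Service/api/';
    sub_filter "'/api/" "'/Service/api/";
}

With this, the client requests /Service/api/v1/oauth.json, which the same location proxies to http://192.168.1.2/api/v1/oauth.json. Matching is literal, so paths that the JS assembles at runtime are not caught.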

Bear in mind that there is no completely bulletproof way to do this. Links may appear not only in text-based HTML, CSS and JS files, but also in proprietary binary files, and we can't know in advance which links they contain in order to proxy them preemptively. Adobe Flash was a recent example of such a binary format, and nobody knows what future formats will expose the same problem.