How to make nginx rewrite URIs in HTTP body content?

This is a follow-up to my earlier question, "Make nginx reverse proxy 302 redirect to a URI sub-folder instead of root".

I have an nginx proxy server that uses the rewrite and proxy_pass directives to proxy external requests to a URL like https://domain.com/my/web/app/ to an internal LAN server at https://10.0.0.22/. Here's my attempt to represent the translation in ASCII UML:

                                    .--------------.      .------------------.
                                    | Nginx proxy  |      | Local web server |
                                    | (domain.com) |      | (10.0.0.22)      |
                                    '--------------'      '------------------'
                                               |                 |
                                               |                 |
GET https://domain.com/my/web/app/ ----------->|                 |
                                               |---------------->| GET /
                                               |                 |
                                               |<----------------| 302 redirect /login.php
302 redirect /my/web/app/login.php <-----------|                 |
                                               |                 |
GET https://domain.com/my/web/app/login.php -->|                 |
                                               |---------------->| GET /login.php
                                               |                 |
                                               |<----------------| 200
HTML body content (images, CSS, links) <-------|

Here's the actual location block in my nginx configuration file:

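location ^~ /my/web/app/
{
    # Stream the response to the client without buffering it first.
    proxy_buffering   off;
    # Strip the external prefix: /my/web/app/foo becomes /foo.
    rewrite           /my/web/app/(.*) /$1 break;
    proxy_pass        https://10.0.0.22/;
    # Map "Location: https://10.0.0.22/..." back to /my/web/app/...
    proxy_redirect    default;
}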

It works great for URI translation between the internal and external URI paths for HTTP requests and responses, but any URIs in the HTML content (body images, CSS, scripts) are not translated.

For example, images referenced in the HTML response by root-relative URIs like /images/logo.png are passed back to the web client unchanged, so the browser resolves them as https://domain.com/images/logo.png instead of https://domain.com/my/web/app/images/logo.png.

I can understand why this is happening, but it would be wonderful if there were a way to dynamically proxy content as well as requests. Is there a way to get nginx to also convert URIs embedded in the HTML content? Is it possible to dynamically parse and update HTML content as it passes through the proxy server?


The only solution I have found so far is the HttpSubsModule, a third-party substitutions filter (see also the GitHub page).

The module is not part of the official nginx sources, so you will probably need to build nginx yourself to use it.
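
As a rough sketch of what using HttpSubsModule could look like, the lines below would go inside the location block from the question, after proxy_pass. This assumes nginx was compiled with the module added; the patterns and the extra text/css type are illustrative guesses about this particular app, not something tested against it.

```nginx
# text/html responses are filtered by default. Adding text/css is an
# assumption that the stylesheets also contain root-relative URLs.
subs_filter_types text/css;

# Flags: g = replace every occurrence, i = case-insensitive match.
subs_filter 'href="/'   'href="/my/web/app/'   gi;
subs_filter 'src="/'    'src="/my/web/app/'    gi;
subs_filter 'action="/' 'action="/my/web/app/' gi;
```

The filter works on the response body as text, so it can only rewrite responses that the backend sends uncompressed.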


Using the ngx_http_sub_module, one can do something like the following (added into the config example above, immediately under the proxy_pass directive):

    sub_filter_once off;
    sub_filter ' href="/' ' href="/my/web/app/';
    sub_filter ' src="/' ' src="/my/web/app/';
    sub_filter ' action="/' ' action="/my/web/app/';

That should rewrite URLs in the various link contexts (<a href=, <img src=, <link href=, <script src=, <form action=, etc.), so that any URL starting with / has that leading / replaced with /my/web/app/.

The first line tells it to keep scanning beyond the first match (critical for getting every link in the resulting HTML); the other three lines update the various forms of links and resource identifiers.
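
Put together with the location block from the question, the result might look like the sketch below. The proxy_set_header Accept-Encoding line is an addition not in the original config: sub_filter cannot rewrite compressed bodies, so it asks the backend for an uncompressed response. Treat this as a starting point under those assumptions rather than a tested configuration.

```nginx
location ^~ /my/web/app/
{
    proxy_buffering   off;
    rewrite           /my/web/app/(.*) /$1 break;
    proxy_pass        https://10.0.0.22/;
    proxy_redirect    default;

    # Assumption: the backend would otherwise gzip its responses.
    # An empty Accept-Encoding asks it to send plain text instead.
    proxy_set_header  Accept-Encoding "";

    # Replace every match in the body, not just the first one.
    sub_filter_once   off;
    sub_filter        ' href="/'   ' href="/my/web/app/';
    sub_filter        ' src="/'    ' src="/my/web/app/';
    sub_filter        ' action="/' ' action="/my/web/app/';
}
```

Two things to check if this does not load or has no effect: ngx_http_sub_module is not built by default (it needs --with-http_sub_module, which nginx -V will show), and several sub_filter directives in the same block require nginx 1.9.4 or later. By default only text/html responses are filtered; sub_filter_types extends that to other MIME types.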

If your server does redirects, you may also need one or more of the following (you probably only need one, but which one depends a bit on how the app behaves):

    proxy_redirect / /my/web/app/; # for redirects just using /
    proxy_redirect https://10.0.0.22/ /my/web/app/; # redirects using backend url
    proxy_redirect https://$http_host/ /my/web/app/; # proxy-aware (see below)

Note on that third item: in the environment I was using (a test Rails app, using http -- so a bit different from yours), Rails generated its Location: header from the Host header I'd passed in with proxy_set_header (which I needed for other reasons). The header looked something like Location: https://example.com/something, so the third line was the one I needed: it matches https://example.com/ and replaces it with /my/web/app/.

(It seems to me that I've somewhere seen another header that can be set that would make the app aware of the base URL pre-proxy, which might be able to make this not required... but I failed to find it just now, and this worked for me, so...)
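
For the proxy-aware case described in the note, a minimal sketch of the relevant directives follows. It assumes the backend builds its redirect URLs from the Host header it receives, which is how the Rails app above behaved; other apps may differ.

```nginx
location ^~ /my/web/app/
{
    proxy_pass        https://10.0.0.22/;

    # Pass the client's original Host header through, so the app
    # builds absolute URLs with the public host name.
    proxy_set_header  Host $http_host;

    # The backend's redirects then look like
    # "Location: https://domain.com/login.php"; map them back
    # under the public prefix.
    proxy_redirect    https://$http_host/ /my/web/app/;
}
```

With the Host header overridden like this, proxy_redirect default no longer matches, because it looks for the backend address from proxy_pass rather than the public host name.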