Using nginx to rewrite URLs inside outgoing responses

We have a customer with a site running on Apache. Recently the site has been seeing increased load, and as a stopgap we want to shift all the static content on the site to a cookieless domain, e.g. http://static.thedomain.com.

The application is not well understood. So, to give the developers time to amend the code to point their links at the static content server (http://static.thedomain.com), I thought about proxying the site through nginx and rewriting the outgoing responses so that links to /images/... are rewritten as http://static.thedomain.com/images/....

So for example, in the response from Apache to nginx there is a blob of Headers + HTML. In the HTML returned from Apache we have <img> tags that look like:

<img src="/images/someimage.png" />

I want to transform this to:

<img src="http://static.thedomain.com/images/someimage.png" />

That way the browser, upon receiving the HTML page, requests the images directly from the static content server.

Is this possible with nginx (or HAProxy)?

I have had a cursory glance through the docs, but nothing jumped out at me except rewriting inbound URLs.


Solution 1:

There is the HttpSubModule (http://wiki.nginx.org/HttpSubModule): "This module can search and replace text in the nginx response."

Copied from the docs:

Syntax:

sub_filter string replacement

Example:

location / {
  sub_filter      </head>
  '</head><script language="javascript" src="$script"></script>';
  sub_filter_once on;
}
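
Applied to the case in the question, a minimal sketch might look like the following. It assumes nginx was built with --with-http_sub_module and that Apache sits behind nginx on 127.0.0.1:8080 (that address is an assumption; use wherever Apache actually listens). Only src="/images/ attributes are covered here; links written in other forms (single quotes, href, CSS url(...)) would need their own rules.

```nginx
location / {
    # Assumed backend address; point this at the real Apache instance.
    proxy_pass        http://127.0.0.1:8080;
    proxy_set_header  Host $host;

    # sub_filter cannot see inside compressed bodies, so ask Apache
    # for an uncompressed response.
    proxy_set_header  Accept-Encoding "";

    # Rewrite every occurrence in the page, not just the first one.
    sub_filter       'src="/images/' 'src="http://static.thedomain.com/images/';
    sub_filter_once  off;
}
```

By default only text/html responses are filtered; sub_filter_types can widen that if other content types also carry the links.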

Solution 2:

It is best to use the proxy feature and fetch the content from the appropriate place, as opposed to rewriting URLs and sending redirects back to the browser.

A good example of proxying content looks like:

#
#  This configuration file handles our main site - it attempts to
# serve content directly when it is static, and otherwise pass to
# an instance of Apache running upon 127.0.0.1:8080.
#
server {
    listen 80;

    server_name  www.debian-administration.org debian-administration.org;
    access_log   /var/log/nginx/d-a.proxied.log;

    #
    # Serve directly:  /images/ + /css/ + /js/
    # (regex location, so the alternation is evaluated; with "^~"
    # the pattern would be matched as a literal prefix)
    #
    location ~ ^/(images|css|js)/ {
        root        /home/www/www.debian-administration.org/htdocs/;
        access_log  /var/log/nginx/d-a.direct.log;
    }

    #
    # Serve directly: *.js, *.css, *.rdf, *.xml, *.ico, etc.
    #
    location ~* \.(js|css|rdf|xml|ico|txt|gif|jpg|png|jpeg)$ {
        root        /home/www/www.debian-administration.org/htdocs/;
        access_log  /var/log/nginx/d-a.direct.log;
    }


    #
    # Proxy all remaining content to Apache
    #
    location / {

        proxy_pass         http://127.0.0.1:8080/;
        proxy_redirect     off;

        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;

        client_max_body_size       10m;
        client_body_buffer_size    128k;

        proxy_connect_timeout      90;
        proxy_send_timeout         90;
        proxy_read_timeout         90;

        proxy_buffer_size          4k;
        proxy_buffers              4 32k;
        proxy_busy_buffers_size    64k;
        proxy_temp_file_write_size 64k;
    }
}

In this configuration, instead of redirecting requests to static.thedomain.com and expecting the browser to make another request, nginx simply serves the file from the relevant local path. If the request is dynamic, the proxy kicks in and fetches the response from an Apache server (local or remote) without the end user ever knowing.

I hope that helps