Is it possible to replace content on every page passed through a proxy similar to how mod_rewrite is used for URLs?

The documentation on mod_substitute is not clear.

I am reverse proxying some pages that contain absolute URLs, and this breaks the site. The URLs need replacing, but tools like mod_rewrite do not pick them up because they appear in the response body rather than in URL requests.

<VirtualHost *:80>
    ServerName  servername1
    ServerAlias servername2

    ErrorLog "/var/log/proxy/jpuat_prox_error_log"
    CustomLog "/var/log/proxy/jpuat_prox_access_log" common

    RewriteEngine on
    LogLevel alert rewrite:trace2
    RewriteCond %{HTTP_HOST} /uat.site.co.jp$ [NC]
    RewriteRule ^(.*)$ http://jp.uat.site2uk.co.uk/$1 [P]

    AddOutputFilterByType SUBSTITUTE text/html
    Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|i"


    ProxyRequests Off

    <Proxy *>
            Order deny,allow
            Allow from all
    </Proxy>

    ProxyPass / http://uat.site.co.jp/
    ProxyPassReverse / http://uat.site.co.jp/
</VirtualHost>

Neither of the above succeeds in replacing the HTML string

<link href="//uat.site.co.jp/css/css.css

with

<link href="//uat.site2uk.co.uk/css/css.css

Conf after changes:

<VirtualHost *:80>
    ServerName  jp.uat.site2uk.co.uk
    ServerAlias uat.site.co.jp
    ErrorLog "/var/log/proxy/jpuat_prox_error_log"
    CustomLog "/var/log/proxy/jpuat_prox_access_log" common
    ProxyRequests Off
    <Proxy *>
        Order deny,allow
        Allow from all
    </Proxy>
    ProxyPass / http://uat.site.co.jp/
    ProxyPassReverse / http://uat.site.co.jp/
    AddOutputFilterByType SUBSTITUTE text/html
    Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|ni"
</VirtualHost>

Solution 1:

There's an Apache module called mod_substitute that can do this. Here's a short example:

<Location "/">
    AddOutputFilterByType SUBSTITUTE text/html
    Substitute "s/uat.site.co.jp/jp.uat.site2uk.co.uk/ni"
</Location>

Or, when combined with mod_proxy:

ProxyPass / http://uat.site.co.jp/
ProxyPassReverse / http://uat.site.co.jp/

AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|http://uat.site.co.jp/|http://jp.uat.site2uk.co.uk/|ni"

There's more information at the Apache documentation for mod_substitute.

Solution 2:

If you haven't restarted Apache since changing the configuration, do that first. If you already have, you could try a global output filter that runs a custom PHP script to do the replacement, just to see whether that works where Substitute does not.

EDIT: based on your comment, it could be that Substitute isn't working because the content coming back from the backend is compressed. To ask the backend for uncompressed content, add these lines to your VirtualHost (they require mod_headers):

RequestHeader unset Accept-Encoding
RequestHeader set Accept-Encoding identity
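
In context, the compression fix and the substitution might sit together as in the sketch below. It assumes mod_headers, mod_filter, mod_substitute and mod_proxy_http are already loaded, and it simply reuses the hostnames from the question, so treat it as a starting point rather than a tested configuration.

```apache
<VirtualHost *:80>
    ServerName jp.uat.site2uk.co.uk

    ProxyRequests Off
    ProxyPass        / http://uat.site.co.jp/
    ProxyPassReverse / http://uat.site.co.jp/

    # Ask the backend for an uncompressed body so the filter sees plain HTML.
    # "set" replaces any Accept-Encoding value sent by the client.
    RequestHeader set Accept-Encoding identity

    # Rewrite the hostname in HTML responses.
    # Flags: n = treat the pattern as a fixed string, i = case-insensitive.
    AddOutputFilterByType SUBSTITUTE text/html
    Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|ni"
</VirtualHost>
```

Because the pattern has no scheme, it also matches protocol-relative references such as //uat.site.co.jp/css/css.css.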

If that doesn't work, try the following:

Add these to your conf, updating the paths of course:

# Add this outside of any VirtualHost block (requires mod_ext_filter)
ExtFilterDefine proxiedcontentfilter mode=output cmd="/usr/bin/php /var/www/proxyfilter.php"

# Add these inside your VirtualHost block (RequestHeader requires mod_headers)
RequestHeader unset Accept-Encoding
RequestHeader set Accept-Encoding identity
SetOutputFilter proxiedcontentfilter

In proxyfilter.php have some code like the following:

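#!/usr/bin/php
<?php
// Read the whole response body that Apache pipes in on stdin
$html = file_get_contents('php://stdin');
// Case-insensitively replace the backend hostname with the proxy hostname
$html = str_ireplace('uat.site.co.jp', 'jp.uat.site2uk.co.uk', $html);
// Hand the rewritten body back to Apache on stdout
file_put_contents('php://stdout', $html);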

If this works, narrow the filter so that it only applies to text/html content, as in your example.
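
One way to do that narrowing, as a sketch, is the intype option of ExtFilterDefine, which restricts the external filter to responses of a given media type. The filter name and script path are the same placeholder values used above.

```apache
# Outside any VirtualHost block: only pass text/html responses through the script
ExtFilterDefine proxiedcontentfilter mode=output intype=text/html cmd="/usr/bin/php /var/www/proxyfilter.php"
```

With intype set, images, CSS and other non-HTML responses should bypass the PHP process entirely, which also avoids starting an external process for each of them.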

Solution 3:

According to the ProxyPassReverse documentation (https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypassreverse), that directive only rewrites response headers. For the page content itself, the documentation says:

To rewrite HTML content to match the proxy, you must load and enable mod_proxy_html.
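
A minimal sketch of what that could look like for the hostnames in the question follows. It assumes mod_proxy_html, mod_xml2enc and mod_headers are loaded; the ProxyHTMLLinks lines are a small illustrative subset, and the sample proxy-html.conf distributed with httpd carries a fuller list.

```apache
ProxyPass        / http://uat.site.co.jp/
ProxyPassReverse / http://uat.site.co.jp/

# mod_proxy_html parses the HTML, so the backend body must arrive uncompressed
RequestHeader set Accept-Encoding identity

ProxyHTMLEnable On

# Elements and attributes whose URLs should be inspected (illustrative subset)
ProxyHTMLLinks link   href
ProxyHTMLLinks a      href
ProxyHTMLLinks script src
ProxyHTMLLinks img    src

# Map protocol-relative and absolute references to the proxy's hostname
ProxyHTMLURLMap //uat.site.co.jp      //jp.uat.site2uk.co.uk
ProxyHTMLURLMap http://uat.site.co.jp http://jp.uat.site2uk.co.uk
```

Unlike a plain string substitution, this only touches URLs in the listed attributes, so occurrences of the hostname in visible page text are left alone.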