Is it possible to replace content on every page passed through a proxy similar to how mod_rewrite is used for URLs?
The documentation for mod_substitute is not clear.
Some of the pages I am reverse proxying contain absolute URLs, which breaks the site. They need to be replaced, but tools like mod_rewrite do not pick them up because they appear in the response body, not in request URLs.
<VirtualHost *:80>
ServerName servername1
ServerAlias servername2
ErrorLog "/var/log/proxy/jpuat_prox_error_log"
CustomLog "/var/log/proxy/jpuat_prox_access_log" common
RewriteEngine on
LogLevel alert rewrite:trace2
RewriteCond %{HTTP_HOST} /uat.site.co.jp$ [NC]
RewriteRule ^(.*)$ http://jp.uat.site2uk.co.uk/$1 [P]
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|i"
ProxyRequests Off
<Proxy *>
Order deny,allow
Allow from all
</Proxy>
ProxyPass / http://uat.site.co.jp/
ProxyPassReverse / http://uat.site.co.jp/
</VirtualHost>
Neither of the above replaces the HTML string
<link href="//uat.site.co.jp/css/css.css
with
<link href="//uat.site2uk.co.uk/css/css.css
Conf after changes:
<VirtualHost *:80>
ServerName jp.uat.site2uk.co.uk
ServerAlias uat.site.co.jp
ErrorLog "/var/log/proxy/jpuat_prox_error_log"
CustomLog "/var/log/proxy/jpuat_prox_access_log" common
ProxyRequests Off
<Proxy *>
Order deny,allow
Allow from all
</Proxy>
ProxyPass / http://uat.site.co.jp/
ProxyPassReverse / http://uat.site.co.jp/
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|uat.site.co.jp|jp.uat.site2uk.co.uk|ni"
</VirtualHost>
Solution 1:
There's an Apache module called mod_substitute that can do this. Here's a short example:
<Location "/">
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s/uat.site.co.jp/jp.uat.site2uk.co.uk/ni"
</Location>
Or, when combined with mod_proxy:
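ProxyPass / http://uat.site.co.jp/
ProxyPassReverse / http://uat.site.co.jp/
# The SUBSTITUTE filter must be enabled for the Substitute rule to run
AddOutputFilterByType SUBSTITUTE text/html
Substitute "s|http://uat.site.co.jp/|http://jp.uat.site2uk.co.uk/|i"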
There's more information in the Apache documentation for mod_substitute.
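Note that the link in the question is protocol-relative (//uat.site.co.jp/css/css.css), so a pattern that starts at // covers both http:// and scheme-less references. A minimal sketch, assuming Apache 2.4 (where AddOutputFilterByType is provided by mod_filter) and module paths relative to ServerRoot, which may differ on your system:

```apache
# Assumed module paths; adjust to your installation
LoadModule filter_module modules/mod_filter.so
LoadModule substitute_module modules/mod_substitute.so

<Location "/">
    AddOutputFilterByType SUBSTITUTE text/html
    # n = treat the pattern as a fixed string, i = case-insensitive
    Substitute "s|//uat.site.co.jp/|//jp.uat.site2uk.co.uk/|ni"
</Location>
```

With the n flag the dots in the host name are matched literally rather than as regex wildcards.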
Solution 2:
If you haven't restarted Apache since changing the config, do that first. If you already have, you could try a global output filter that runs a custom PHP script to do the replacing, just to see whether that works.
EDIT: based on your comment, substitute may not be working because the content is compressed. To ask the backend for uncompressed content, add these lines to your VirtualHost (RequestHeader requires mod_headers):
RequestHeader unset Accept-Encoding
RequestHeader set Accept-Encoding identity
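If the backend still sends compressed responses, another option is to decompress in the proxy before substituting. This is only a sketch, assuming mod_deflate is loaded (it provides the INFLATE and DEFLATE filters). SetOutputFilter applies to every content type under the Location, so treat it as a test rather than a final config:

```apache
<Location "/">
    # Decompress the backend response, substitute, then recompress for the client
    SetOutputFilter INFLATE;SUBSTITUTE;DEFLATE
    Substitute "s|//uat.site.co.jp/|//jp.uat.site2uk.co.uk/|ni"
</Location>
```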
If that doesn't work, add the following to your conf, updating the paths as needed:
#add this outside of any VirtualHost tags
ExtFilterDefine proxiedcontentfilter mode=output cmd="/usr/bin/php /var/www/proxyfilter.php"
#add these in your VirtualHost tag
RequestHeader unset Accept-Encoding
RequestHeader set Accept-Encoding identity
SetOutputFilter proxiedcontentfilter
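In proxyfilter.php, use code like the following:
#!/usr/bin/php
<?php
// Read the whole proxied response body from stdin
$html = file_get_contents('php://stdin');
// Case-insensitive replacement of the backend host name
$html = str_ireplace('uat.site.co.jp', 'jp.uat.site2uk.co.uk', $html);
// Write the rewritten body back to Apache via stdout
file_put_contents('php://stdout', $html);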
If this works, narrow the filter to just text/html content, as in your example.
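One way to do that narrowing is the intype parameter of ExtFilterDefine, which restricts the filter to responses of a given content type. A sketch reusing the filter name and script path from above:

```apache
# Outside any VirtualHost: only text/html responses are piped through the script
ExtFilterDefine proxiedcontentfilter mode=output intype=text/html cmd="/usr/bin/php /var/www/proxyfilter.php"
```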
Solution 3:
According to the ProxyPassReverse documentation (https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#proxypassreverse), that directive only rewrites response headers. For the content itself, the documentation says:
"To rewrite HTML content to match the proxy, you must load and enable mod_proxy_html."
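A minimal sketch of what that could look like for the hosts in the question, assuming Apache 2.4 with mod_proxy_html, mod_xml2enc and mod_headers loaded, and with the ProxyHTMLLinks definitions available (for example from the proxy-html.conf file shipped with httpd, whose location varies by distribution):

```apache
ProxyPass / http://uat.site.co.jp/
ProxyPassReverse / http://uat.site.co.jp/

# Ask the backend for uncompressed content so the HTML can be parsed
RequestHeader unset Accept-Encoding

# Rewrite URLs found in the HTML attributes listed by ProxyHTMLLinks
ProxyHTMLEnable On
ProxyHTMLURLMap http://uat.site.co.jp/ http://jp.uat.site2uk.co.uk/
ProxyHTMLURLMap //uat.site.co.jp/ //jp.uat.site2uk.co.uk/
```

The second map is there for protocol-relative links such as the css.css reference in the question. Without ProxyHTMLLinks definitions, mod_proxy_html does not know which attributes contain URLs and will not rewrite any.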