Storing duplicate files efficiently on Linux

I host a lot of websites, and our system makes it easy to duplicate items between these sites, which is handy but leads to lots of duplicated (and potentially quite large) files. I was wondering whether there is any mechanism in Linux (specifically Ubuntu) where the filesystem will store the file only once but link to it from all of its locations.

I'd need this to be transparent, and it would also have to handle the case where a user changes one of the files: the change shouldn't alter the contents of the original, but should instead create a new copy for just that particular instance of the file.

The point of the exercise is to reduce wasted space used by duplicated files.


"I'd need this to be transparent"

ZFS on Linux has a feature called "online deduplication".
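
As a minimal sketch of how that is typically enabled (the pool name "tank" and dataset "tank/www" are just placeholders, not anything from your setup), and keeping in mind that the dedup table needs to fit in RAM to perform well:

    # Create a dataset for the site files and turn on online dedup for it
    # ("tank" and "tank/www" are illustrative names)
    zfs create tank/www
    zfs set dedup=on tank/www

    # The DEDUP column reports the achieved deduplication ratio
    zpool list tank

Note that the property only affects data written after it is set, so existing files would need to be rewritten (e.g. copied into the dataset) to benefit.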

UPD.: I've re-read your question, and now it looks like Aufs could be of help to you. It's a very popular solution for hosting environments. Actually, I can mention Btrfs myself now as well: the pattern is that you have a template sub-volume which you snapshot every time you need another instance. It's copy-on-write (COW), so only changed file blocks take extra space. But keep in mind that Btrfs is, ergh… well, not too stable anyway. I'd use it in production only if the data on it is absolutely okay to lose.
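
A rough sketch of that template-plus-snapshot pattern, assuming the site roots live on a Btrfs filesystem (the directory names are made up for illustration):

    # One subvolume holds the shared "template" site
    btrfs subvolume create /srv/sites/template
    # ...populate /srv/sites/template with the common files...

    # Each new site instance is a writable snapshot of the template;
    # it initially shares all blocks with it, and edits only allocate
    # the changed blocks (copy-on-write)
    btrfs subvolume snapshot /srv/sites/template /srv/sites/site1
    btrfs subvolume snapshot /srv/sites/template /srv/sites/site2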


There is a Linux userspace/FUSE filesystem that will do this dedup:

http://sourceforge.net/p/lessfs/wiki/Home/
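
The usual setup, going from the lessfs documentation (so treat the exact flags, config keys, and paths here as assumptions that may vary between versions), is to describe the data and metadata stores in a config file, initialise them, and then mount the FUSE filesystem where you want deduplicated storage:

    # Initialise the lessfs databases described in /etc/lessfs.cfg
    mklessfs -c /etc/lessfs.cfg
    # Mount the deduplicating FUSE filesystem at the chosen mount point
    lessfs /etc/lessfs.cfg /mnt/dedup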

Linux Journal has a good article on it in its August 2011 issue. There are also various filesystem-specific options with Btrfs and ZFS.