How do I get transparent, efficient, file system snapshotting or versioning on ext3/4?

Solution 1:

If your file systems sit on LVM logical volumes, you can create a snapshot volume at the logical volume layer. It's a pretty simple process and surprisingly effective for standard "snapshotty" things, such as backups and undoing rm -fr oopsies.
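As a rough illustration, the two LVM steps (take a snapshot, later merge it back to undo changes) can be driven from a small Python helper. This is a sketch that assumes a volume group `vg0` and a logical volume `home` (both hypothetical names), root privileges, and enough free extents in the volume group to hold the snapshot's changed blocks. By default it only builds the commands; it runs them only with `dry_run=False`.

```python
import subprocess

def snapshot_cmd(vg: str, lv: str, snap: str, size: str = "1G") -> list[str]:
    # lvcreate --snapshot reserves `size` of space to hold blocks of the
    # origin LV that change after the snapshot is taken.
    return ["lvcreate", "--snapshot", "--size", size, "--name", snap, f"/dev/{vg}/{lv}"]

def merge_cmd(vg: str, snap: str) -> list[str]:
    # lvconvert --merge rolls the origin LV back to the snapshot's contents;
    # if the origin is in use, LVM defers the merge until its next activation.
    return ["lvconvert", "--merge", f"/dev/{vg}/{snap}"]

def run(cmd: list[str], dry_run: bool = True) -> list[str]:
    # Only execute when explicitly asked; otherwise just return the command.
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `run(snapshot_cmd("vg0", "home", "home-snap"))` returns the `lvcreate` invocation for review before you run it for real.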

Solution 2:

After 8 years of searching, I found SVNFS by Marco R. Gazzetta (not to be confused with an older project of the same name by John Madden, which does different things). This SVNFS uses svn transparently for read/write operations:

Instead of creating a file system that does its own versioning, I used an existing versioning tool, Subversion, and made its use transparent. The advantage is that this file system doesn't require you to learn a new tool, if you know Subversion.

It's written in Python and uses FUSE:

Now you start the versioning file system by invoking the script attached:

python svnfs.py -o svnroot=/home/marco/svnfiles /home/marco/myfiles

Once everything is fine, you should be able to get a listing of both directories and see that the contents are the same.

Now, if you create (almost) any file in either directory, it will show up on the other side of the fence, as well. The big difference is that if you create a file in the myfiles directory, it will automatically be placed under version control (the opposite is not true).
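What SVNFS automates when a file appears in the mounted directory is roughly what you would otherwise do by hand in the Subversion working copy: add the file, then commit it. The sketch below is not SVNFS's code; it only builds the equivalent `svn` commands for a hypothetical working copy and runs them there when `dry_run=False`. It assumes the file's parent directory is already under version control.

```python
import subprocess
from pathlib import Path

def commit_new_file_cmds(relpath: str) -> list[list[str]]:
    # Schedule the new file for addition, then commit just that path.
    # Assumes the parent directory is already versioned.
    return [
        ["svn", "add", relpath],
        ["svn", "commit", "-m", f"auto-commit: add {relpath}", relpath],
    ]

def version_new_file(working_copy: Path, relpath: str, dry_run: bool = True) -> list[list[str]]:
    cmds = commit_new_file_cmds(relpath)
    if not dry_run:
        for cmd in cmds:
            # Run inside the working copy so the relative path resolves there.
            subprocess.run(cmd, cwd=working_copy, check=True)
    return cmds
```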

In the example, SVNFS uses a separate directory for the repository. I haven't tested it, though. For my needs I'd like to have the repository right in my working directory.


Four years ago I also found a reference to Reiser4's versioning capabilities:

See Reiser 4. Files are directories.

e.g.: diff -u main.C main.C/r/123

Or, to access properties:

cat main.C/p/svn-eolstyle

echo "foobar" > main.C/p/my-property 

It seems that it would be best to follow that model, since a major filesystem is already going that route.

-Paul Querna

But I haven't checked that either.
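The quoted "files are directories" model boils down to a path convention: `<file>/r/<revision>` for an old revision and `<file>/p/<property>` for a property. The helpers below only build those paths from the quoted examples; the layout is taken from the quote alone and is unverified against any released Reiser4, and the function names are hypothetical.

```python
from pathlib import PurePosixPath

def revision_path(file: str, rev: int) -> str:
    # Revision N of a file is addressed as <file>/r/<N> in the quoted model.
    return str(PurePosixPath(file) / "r" / str(rev))

def property_path(file: str, name: str) -> str:
    # A property of a file is addressed as <file>/p/<name> in the quoted model.
    return str(PurePosixPath(file) / "p" / name)

def diff_against_revision_cmd(file: str, rev: int) -> list[str]:
    # Equivalent of the quoted `diff -u main.C main.C/r/123`.
    return ["diff", "-u", file, revision_path(file, rev)]
```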


Two years ago I searched further, found FiST, a project for generating stackable file systems, and contacted Prof. Erez Zadok of Stony Brook University, who long ago was the adviser for a project called Versionfs. Quoting:

http://www.fsl.cs.sunysb.edu/docs/versionfs-fast04/

http://www.fsl.cs.sunysb.edu/docs/versionfs-msthesis/versionfs.pdf

[Versionfs] allows users to manage their own versions easily and efficiently. Versionfs provides this functionality with no more than 4% overhead for typical user-like workloads. Versionfs allows users to select both what versions are kept and how they are stored through retention policies and storage policies, respectively. Users can select the trade-off between space and performance that best meets their individual needs: full copies, compressed copies, or block deltas. Although users can control their versions, the administrator can enforce minimum and maximum values, and provide users sensible defaults.
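To make the retention and storage policies concrete, here is a toy in-memory model: the user picks how many versions to keep and whether to store them as full or compressed copies, and administrator bounds clamp the user's choice. All names are hypothetical and this is not Versionfs's interface; the block-delta policy is omitted to keep it short.

```python
import zlib
from dataclasses import dataclass, field

@dataclass
class VersionStore:
    # Hypothetical per-file version store, not the Versionfs API.
    user_max_versions: int       # retention policy chosen by the user
    storage: str = "full"        # storage policy: "full" or "compressed"
    admin_min: int = 1           # administrator-enforced bounds
    admin_max: int = 10
    versions: list[bytes] = field(default_factory=list)

    def limit(self) -> int:
        # Clamp the user's retention choice into the administrator's range.
        return max(self.admin_min, min(self.user_max_versions, self.admin_max))

    def save(self, data: bytes) -> None:
        blob = zlib.compress(data) if self.storage == "compressed" else data
        self.versions.append(blob)
        # Retention: drop the oldest versions beyond the effective limit.
        excess = len(self.versions) - self.limit()
        if excess > 0:
            del self.versions[:excess]

    def restore(self, index: int) -> bytes:
        # Index 0 is the oldest retained version, -1 the newest.
        blob = self.versions[index]
        return zlib.decompress(blob) if self.storage == "compressed" else blob
```

Here a user asking for 50 versions under an administrator maximum of 3 keeps only the three newest.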

Additionally, through the use of libversionfs, unmodified applications can examine, manipulate, and recover versions. Users can simply run familiar tools to access previous file versions, rather than requiring users to learn separate commands, or ask the system administrator to remount a file system. Without libversionfs, previous versions are completely hidden from users.

Finally, Versionfs goes beyond the simple copy-on-write employed by past systems: we implement copy-on-change. Though at first we expected that the comparison between old and new pages would be too expensive, we found that the increase in system time is more than offset by the reduced I/O and CPU time associated with writing unchanged blocks. When more expensive storage policies are used (e.g., compression), copy-on-change is even more useful.
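The difference between copy-on-write and copy-on-change can be sketched in a few lines: instead of preserving every block that is written, compare each old block with the new one and keep the old copy only if its contents actually changed. The 4096-byte block size and the function names are assumptions for illustration, not Versionfs internals.

```python
BLOCK_SIZE = 4096  # assumed page/block size

def blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    # Split data into fixed-size blocks; the last one may be shorter.
    return [data[i:i + size] for i in range(0, len(data), size)]

def copy_on_change(old: bytes, new: bytes, size: int = BLOCK_SIZE) -> dict[int, bytes]:
    # Save the old contents of only those blocks that differ (or were truncated away).
    old_blocks, new_blocks = blocks(old, size), blocks(new, size)
    saved = {}
    for i, old_block in enumerate(old_blocks):
        new_block = new_blocks[i] if i < len(new_blocks) else None
        if old_block != new_block:
            saved[i] = old_block
    return saved

def restore(new: bytes, saved: dict[int, bytes], old_len: int, size: int = BLOCK_SIZE) -> bytes:
    # Rebuild the old version: unchanged blocks come from the new data,
    # changed ones from the saved copies.
    new_blocks = blocks(new, size)
    n_old = (old_len + size - 1) // size
    out = [saved[i] if i in saved else new_blocks[i] for i in range(n_old)]
    return b"".join(out)[:old_len]
```

If an application rewrites a whole file but only the middle block really changes, only that one old block is stored, whereas plain copy-on-write would have preserved every rewritten block.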

It seemed very interesting to me, but after contacting the people who worked on the project I learned that there is no known location for its source code. The professor himself stated in an email:

Versionfs's code is very old now, and it only worked in kernel 2.4. If you still want a stackable versioning f/s, then one would have to write it from scratch — possibly based on wrapfs (see wrapfs.filesystems.org/).

So there is no working project here, though the concept of stackable file systems seems very nice to me. If anyone would like to start a project based on wrapfs, please let me know :)

Solution 3:

You can check out gitfs. It's a FUSE file system based on git, pretty stable and super easy to use.

Basically, it's an overlay on top of git. Whenever you update a file or directory, it creates a commit with that change (it batches commits, so you don't end up with 100 commits when you unzip an archive). It also syncs with your remote and resolves merge conflicts using an "always accept mine" strategy.
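The batching behaviour can be pictured as a debounce: changes that arrive close together are grouped into one commit, and a new commit starts only after a quiet period. This is a hypothetical sketch; how gitfs actually decides batch boundaries isn't described here, and the two-second window is an arbitrary assumption.

```python
from datetime import datetime, timedelta

def batch_changes(events: list[tuple[datetime, str]],
                  quiet: timedelta = timedelta(seconds=2)) -> list[list[str]]:
    # Each event is (timestamp, path); each returned batch would become one commit.
    batches: list[list[str]] = []
    last = None
    for ts, path in sorted(events):
        # Start a new batch when the gap since the previous change exceeds `quiet`.
        if last is None or ts - last > quiet:
            batches.append([])
        batches[-1].append(path)
        last = ts
    return batches
```

Unzipping an archive produces a burst of events within the window, so they collapse into a single batch instead of one commit per file.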

When you mount it, it gives you two directories, current and history:

    ├── current
    │   ├── test1.md
    │   ├── test2.md
    │   ├── test3.md -> current/test2.md
    │   ├── test4.md
    │   └── test_directory
    └── history
        ├── 2014-11-23
        │   ├── 20-00-21-d71d1579a7
        │   │   └── testing.md
        │   └── 20-42-32-7d09611d83
        │       ├── test2.md
        │       └── testing.md
        ├── 2014-12-08
        │   ├── 16-38-30-6d6e71fe47
        │   │   ├── test2.md
        │   │   └── test1.md
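Judging from the listing, each commit appears under `history/<date>/<time>-<abbreviated hash>`, with the hash shortened to 10 characters. The helper below reproduces that naming from a commit's timestamp and hash. The format is inferred from the directory tree only, not from gitfs's documentation, the function name is hypothetical, and the time zone gitfs uses is not stated here.

```python
from datetime import datetime

def history_dir(commit_time: datetime, sha: str, abbrev: int = 10) -> str:
    # Inferred layout: history/YYYY-MM-DD/HH-MM-SS-<first `abbrev` hex digits of the hash>
    return f"history/{commit_time:%Y-%m-%d}/{commit_time:%H-%M-%S}-{sha[:abbrev]}"
```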

More information can be found on this page.