Solution 1:

There are many options out there depending on your goals, infrastructure, and media preferences.

First, you'll probably need to figure out how to set up cron jobs no matter what solution you end up going with; cron is what runs scheduled tasks on *nix.
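
For reference, a scheduled job is just a line in a crontab. Something like this (the script path and timing below are made up) is typically all you need; add it with `crontab -e`:

    # minute hour day-of-month month day-of-week command
    # run a (hypothetical) backup script every night at 02:30
    30 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1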

As for the backup itself, I tend to go with rsnapshot, as it's simple enough to set up and does what I need. Amanda and Bacula are both great solutions, but they involve databases and other things that complicate backup and recovery. I tend to avoid complicated things when I need something reliable, as in the case of backups. Rsnapshot uses rsync over ssh to transfer the data between systems, so it's secure and efficient. It then uses hard links so that you have many point-in-time snapshots of the filesystem you're backing up.
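
If it helps, here is roughly what a minimal rsnapshot setup looks like. The host names, paths and retention counts are just placeholders, and note that rsnapshot.conf wants tabs between fields:

    # /etc/rsnapshot.conf (excerpt) -- fields must be separated by tabs
    snapshot_root   /backup/snapshots/
    # "retain" is called "interval" on older rsnapshot versions
    retain  daily   7
    retain  weekly  4
    # what to pull (over rsync+ssh) and where to put it under the snapshot root
    backup  root@server.example.com:/etc/       server/
    backup  root@server.example.com:/var/www/   server/

    # crontab entries that actually create the rotating, hard-linked snapshots
    0  3 * * 1   /usr/bin/rsnapshot weekly
    30 3 * * *   /usr/bin/rsnapshot daily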

Databases have to be treated a bit specially, because you either need to lock the tables while you're running your backup job, or dump the database tables to another location which you then back up using whatever method you choose. This can be done with a tool like mysqldump if you're using MySQL. The dump is normally automated with a cron job.
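
As a sketch, assuming the MySQL credentials are already available to the script (e.g. via ~/.my.cnf) and the paths are placeholders, the dump-then-backup approach can look like this:

    #!/bin/sh
    # dump-mysql.sh -- dump all databases before the file-level backup runs
    # (--single-transaction gives a consistent dump for InnoDB without locking;
    #  MyISAM tables would need --lock-tables instead)
    /usr/bin/mysqldump --all-databases --single-transaction \
        | gzip > /var/backups/mysql/all-$(date +%F).sql.gz

    # crontab entry: run the dump nightly, before the file-level backup (e.g. rsnapshot)
    15 1 * * * /usr/local/bin/dump-mysql.sh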

Solution 2:

Backups are always hard to tune properly, especially because people have different needs, and those needs are usually a mix of data 'snapshot' backups, data archiving, server (config) backup, reliable service, and so on.

3dinfluence and davey are both right: it's important to try the restore operation (as Joel says), and a set of cron scripts is usually the first thing to do; but additional measures are needed depending on how much data you can accept to lose, and what level of reliability you need.

So the questions you have to ask yourself are:

  • the purpose of the backup - "protect" your data/services against:

    • (local) hardware failure, like disk crashes;
    • bigger damage, like fire on the building;
    • user mistake (accidental deletion) or need to retrieve old data;
    • buggy package releases (upgrading the services is usually tough, etc).
  • acceptable downtime (and data loss), in case of different types of issues

    • disk failure? e.g. no loss, no downtime(?)
    • other hardware failure (motherboard, CPU, etc.)? one day of work lost, a few hours of downtime
    • fire (and water from the firemen)? one week lost, a few days of downtime
    • earthquake or blackout?

Depending on the answers to those questions, you'll see whether daily backups are enough, or whether you need a warm standby server in a different geographical location.

I'm no guru at all in this field, but maybe my example can give you some ideas.

I'm managing a small (Debian) server providing databases (PostgreSQL), Subversion repositories, Trac sites and some other similar functions. The server is mainly used by our R&D group, so few people (~20 clients for Subversion) and some instruments (~50 clients for the database), but they work almost 24 hours a day, 7 days a week (especially the instruments, which feed the database with measurements).

In case of an average issue (like a major hardware failure), a downtime of 2 to 4 hours is acceptable (the instruments can work locally for a while). So I don't (yet) have a warm standby server, only a set of local and remote backups and dumps.

So the requirements are not drastic at all: about a hundred gigs of data, and fewer than a hundred clients to serve.

The first "line of defense" is provided by the disk redundancy and partitioning (which not only help in case of disk crashes, but also for further backup or server upgrade) The machine is equipped with 4 disks (500Gb each).

  • 2 (software) RAID arrays (RAID 1, across 3 disks):
    • a small one, dedicated to /boot
    • and a large one, used by LVM (see below)
  • 2 LVM volume groups:
    • one made from the large RAID array (+ 1 non-RAID partition on the 4th disk)
    • another one made from non-RAID partitions only (50 GB on each of the first 3 disks, and half of the 4th disk)
  • finally, the partitions:
    • / and /var on two LVM volumes from the large RAID array; user data are all stored on /var ...
    • the non-RAID extents of the first volume group are reserved for (LVM) snapshots
    • /boot directly on the small RAID 1 array
    • /tmp and a special /backup on two LVM (linear) volumes from the second volume group (non-RAID); the 4th drive is used, and extents on the 3 others are reserved for snapshots.
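
For what it's worth, here is a very rough sketch of the kind of mdadm/LVM commands behind such a layout; device names and sizes are invented, and in reality most of this was done by the installer:

    # two RAID 1 arrays across the first three disks: a small one for /boot,
    # a large one to act as the main LVM physical volume
    mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1
    mdadm --create /dev/md1 --level=1 --raid-devices=3 /dev/sda2 /dev/sdb2 /dev/sdc2

    # first volume group: the large array plus one plain partition on the 4th disk
    pvcreate /dev/md1 /dev/sdd1
    vgcreate vg_raid /dev/md1 /dev/sdd1
    lvcreate -L 20G  -n root vg_raid      # /
    lvcreate -L 300G -n var  vg_raid      # /var (all the user data)

    # second volume group: non-RAID partitions only, for /tmp and /backup
    pvcreate /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd2
    vgcreate vg_plain /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd2
    lvcreate -L 10G  -n tmp    vg_plain
    lvcreate -L 200G -n backup vg_plain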

The second defensive line is the regular backups. They are made from cron scripts, essentially launched every day (e.g. hot backup for svn, Trac sites, copies of the db files, etc.) or every week (database dump, svn dump, etc.). The exact way of doing each backup depends on the service; for example, Subversion provides tools for both a (fast) hot backup (using hard links, etc.) and a textual dump, while for the database a simple rsync, made from an LVM snapshot, is used.
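
To give an idea, a nightly job along those lines might look like this; the repository names, volume names and mount points are invented, and the rsync-from-a-snapshot part is just one way to get a consistent copy of the db files:

    #!/bin/sh
    # nightly-backup.sh -- rough sketch
    set -e

    # fast "hot" copy of a live subversion repository
    svnadmin hotcopy /var/svn/myrepo "/backup/svn/myrepo-$(date +%F)"

    # consistent copy of the postgresql files: snapshot /var, rsync from the snapshot
    lvcreate -s -L 5G -n var_snap /dev/vg_raid/var
    mount -o ro /dev/vg_raid/var_snap /mnt/snap
    rsync -a --delete /mnt/snap/lib/postgresql/ /backup/pgdata/
    umount /mnt/snap
    lvremove -f /dev/vg_raid/var_snap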

All those backups go to the (local!) /backup partition (to be fast); this partition is usually mounted read-only, and two sudo-able scripts are used to remount it read-write (at the beginning of a backup) and back to read-only (at the end). A semaphore built on lock files takes care of concurrent backups.
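
A sketch of what those two sudo-able helpers could look like, with a crude lock-file semaphore (the script names and lock directory are invented):

    #!/bin/sh
    # backup-rw.sh <job> -- take a lock, open /backup for writing
    mkdir -p /var/run/backup-locks
    touch "/var/run/backup-locks/$1"
    mount -o remount,rw /backup

    #!/bin/sh
    # backup-ro.sh <job> -- drop our lock; go back to read-only only when no other
    # backup still holds one (rmdir only succeeds on an empty directory)
    rm -f "/var/run/backup-locks/$1"
    if rmdir /var/run/backup-locks 2>/dev/null; then
        mount -o remount,ro /backup
    fi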

Each time /backup is switched back to read-only (and also every 4 hours), a mirroring action is scheduled (using 'at' with a small delay, to coalesce changes coming from the third line). The mirroring is done (with rsync) to a different server, from which the data are archived to tape (every day, with only one year of retention) and, over the network, to a set of remote TeraStations.
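
The scheduling itself can be as simple as queuing the rsync with 'at'; the host name and delay below are invented:

    # queue the mirror run a few minutes from now, so several ro-switches within
    # that window end up as a single rsync
    echo "rsync -a --delete /backup/ mirror.example.com:/srv/mirror/$(hostname)/" \
        | at now + 10 minutes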

To avoid the risk of losing an entire day of work, at minimal bandwidth cost, a third line is also in place, which consists of doing incremental backups where possible.

Examples (rough sketches follow the list):

  • for Subversion, every revision is dumped to a single compressed file from the post-commit and post-revprop-change hooks (using svn-backup-dumps)
  • for the database, using continuous archiving (the PITR concept)
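
Sketches of both, with invented paths. Plain svnadmin is shown in the hook (svn-backup-dumps builds rotation and compression options on top of the same idea), and the exact PostgreSQL settings depend on the version:

    #!/bin/sh
    # hooks/post-commit -- dump just the committed revision, compressed
    REPOS="$1"
    REV="$2"
    svnadmin dump -r "$REV" --incremental --deltas "$REPOS" \
        | gzip > "/backup/svn-increments/$(basename "$REPOS")-$REV.dump.gz"

    # postgresql.conf excerpt for continuous archiving (WAL shipping for PITR)
    archive_mode = on
    archive_command = 'rsync -a %p /backup/pg-wal/%f'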

These increments are saved using the same scheme as the daily and weekly copies (to the local backup partition first and, with a small delay, to the second host). Of course, the daily copy and weekly dump scripts have to take care of rotating the increments.

Notice that this third line is really close to a warm standby server (well, it's a necessary step toward that concept).

Finally, the config of the server itself (to minimize the work if a new one has to be set up). As I wasn't convinced by the ghosting solution (a new machine will have different hardware, disk config, etc.), I set up a dedicated Subversion repository in which I put every script or config file that I've edited by hand (directly, or indirectly via a user interface). So the root of the filesystem (/) is a working copy. In addition, a daily cron task is responsible for saving the list of installed packages (dpkg), the partition tables (fdisk), the RAID and LVM setup, and so on.
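
A sketch of that daily "describe the system" job; the output directory is invented, and it lives inside the working copy so it gets versioned along with the config files:

    #!/bin/sh
    # sysinfo.sh -- daily system description, committed with the configs
    OUT=/root/sysinfo
    dpkg --get-selections > "$OUT/packages.txt"
    fdisk -l              > "$OUT/partitions.txt" 2>&1
    cat /proc/mdstat      > "$OUT/mdstat.txt"
    vgdisplay -v          > "$OUT/lvm.txt" 2>&1
    svn commit -q -m "daily system description" "$OUT"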

BTW, this is certainly the weakest point: the server config is in a Subversion repository served by the "same" host. Anyway, it's fairly easy to use one of the repository backups (either a fast backup or the dump) from a different machine (even a Windows one) if needed.

In addition, I also try to conscientiously make LVM snapshots before touching any script (or upgrading packages) on the main system. But the lifetime of those LVM snapshots should be as short as possible, because of the big performance penalty they introduce on the other services.

Well, I hope it can help, or at least give you some ideas...