Cliffhanger: The backups are right... here... right?

At my work, backups have a surprisingly low priority. The backup strategy was implemented a while ago, and since then it's just assumed the backups are fine. If you ask the sysadmins, they'll say everything is backed up.

But then, when you ask for a SPECIFIC backup, half the time it is not there:

  • The disk got full
  • The tape failed
  • Looks like someone disabled the backup job
  • The network connection had downtime
  • We ordered that disk years ago, but finance hasn't approved the purchase order
  • The files are corrupt
  • File contains wrong database
  • Only transaction log backups (useless without a full one)

A few weeks ago, disaster came really close when one of the servers lost one too many RAID disks. Luckily, one disk was still kind enough to let us copy the data off, if you tried enough times.

But even after that near-disaster, I can't seem to convince the sysadmins to improve the situation. So I'm wondering, any tips for opening people's eyes? It seems to me we're walking along the edge of a cliff.


You always have to get these things fixed from the top.

Is the current backup strategy backed by and understood by management? If not, it's useless.

Executive management needs to know about the problems and the risks involved (losing financial data that you are legally required to produce in order to survive, or customer data that has taken years to collect?) and weigh that when deciding on actions, or when deciding to let someone (like you) take action.

If you can't get to management, try business controllers or others in financial roles, where data retrieval and integrity are of high importance to the company's reports. They in turn can "start the storm" if needed...


Where to begin? This is a disaster waiting to happen. A sysadmin's primary job function is to ensure data is backed up and recoverable. Everything else is secondary. No ifs, no buts.

Here are a few things you can do:

  1. Track KPIs for restores. It should be possible to produce a report showing how many requests for restores have been successful. Anything less than 100% should be investigated thoroughly. Management love reports and this is hard evidence.

  2. There should be documented procedures for all backup and restore operations, including all systems and their backup strategy, tape rotations, schedules, escalation paths, test restores etc. Ask to see them.

  3. Speak to the sysadmins' manager and voice your concerns. Go armed with proof that restores aren't working. If no joy, go higher.
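
For point 1, the report does not need to be fancy. Below is a minimal sketch in Python, assuming you keep a simple log of restore requests. The RestoreRequest record, its field names, and the sample entries are all hypothetical and only there for illustration; real data would come from your ticket system or whatever log you keep.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class RestoreRequest:
    """One restore request (hypothetical record layout)."""
    requested_on: date
    system: str
    succeeded: bool
    failure_reason: str = ""  # free text, e.g. "disk full"


def restore_success_rate(requests):
    """Return the percentage of restore requests that succeeded."""
    if not requests:
        raise ValueError("no restore requests recorded")
    succeeded = sum(1 for r in requests if r.succeeded)
    return 100.0 * succeeded / len(requests)


def failure_reasons(requests):
    """Count failed restores by reason, most common first."""
    failed = (r.failure_reason for r in requests if not r.succeeded)
    return Counter(failed).most_common()


# Made-up sample entries, purely for illustration.
log = [
    RestoreRequest(date(2024, 1, 8), "db01", True),
    RestoreRequest(date(2024, 1, 15), "db02", False, "disk full"),
    RestoreRequest(date(2024, 2, 2), "files01", False, "tape failed"),
    RestoreRequest(date(2024, 2, 20), "db01", False, "disk full"),
]

print(f"Restore success rate: {restore_success_rate(log):.0f}%")
for reason, count in failure_reasons(log):
    print(f"  {count} x {reason}")
```

Even a handful of entries like this gives you a number to put in front of a manager, plus a breakdown of why restores failed.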

Seriously - kick up a fuss. Stuff like this can destroy a company.