Long-term storage of Business Critical data
This is a really tricky one, and to some extent it's not a technical problem, and might not belong here, but
Server Fault is for system administrators ... who manage or maintain computers in a professional capacity
And I do.. and this is one of my tasks.. Anyway.
Imagine you had 5,000+ double-sided pages of A4. Company data, all business critical.
You need to back it up, somehow.
Proposed solutions so far are:
- PDF -> Online storage
- PDF -> DVD / BluRay / Tape
- PDF -> Portable HDD / SSD / Flash drive.
- Buy/Lease/Hire/'Steal' a big photocopier, and make copies.
- ???
Immediate problems with the above:
- Online storage: what if the storage partner goes bust?
- DVD / BluRay / Tape: DVDs do rot over time, and tapes degrade similarly.
- Portable HDD / SSD / Flash drive: these break over time too.
- Photocopies: expensive, slow, heavy, and not tree-friendly.
The Question(s):
What is the gold standard for medium-to-long-term data preservation and archiving? Have you solved a similar problem in the workplace?
After the initial loading, there is some requirement to add to the collection roughly 100 pages a month. Retrieval should be possible, easily, but probably is infrequent.
Ideally I'd like to guarantee that the solution will be workable long after I have left the company, and that it won't require a massive amount of effort to maintain. Storing many, many DVDs is therefore not a good long-term solution.
While just making paper copies is certainly the easiest, it's not the most environmentally friendly, not by a long way. Paper is also hard to manage, search, and index, and it is heavy and difficult to store physically.
I quite like the idea in principle of having everything stored electronically, but the actual mechanism of doing this needs to be transparent and easy. I really don't want to be responsible for this forever and a day, supporting office users as they cock it up and lose documents. I also don't want to be reliant on a single storage vendor. (We have an online backup solution at the moment, but it isn't Dropbox.) If a provider like Dropbox were to go bust, or otherwise experience a catastrophic event, how many businesses using its services would be up the creek, sans paddle?
There's some budget flexibility here, but I suspect anything that costs more than our current online backup (about 2,500 USD/year) would be viewed less than favourably compared to just putting it all in a shoebox under a bed, which is no doubt what would happen if I did nothing and resigned tomorrow.
Any ideas?
-Edit-
The reason for doing this is twofold.
1) To provide a sensible, secure backup of business-critical paperwork in the event the office burns down.
2) To satisfy data archiving requirements under UK tax law for businesses, and so on.
Edit 2:
Having some mechanism for indexing the documents would be bloody useful too..
Solution 1:
Keeping the data in a format like PDF is probably safe, because there are free tools to read it. The volume of data you're talking about is fairly small (1,200 pages / year), so even uncompressed 300 dpi colour scans only come to tens of gigabytes per year, and compressed PDFs will be far smaller.
The physical storage device problem is never going to go away, though. Whatever media you use to store electronic data (tape, optical, etc.) will eventually need to be replaced with newer media. Plan and budget for "kicking the data down the road" as new formats replace older ones.
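As a back-of-the-envelope check on that volume, here is a minimal sketch. It assumes the worst case of uncompressed 24-bit colour A4 scans; the function names are just illustrative, and real PDFs with compression will come out much smaller.

```python
# Rough upper bound on scan volume, assuming uncompressed 24-bit colour A4 scans.
A4_INCHES = (8.27, 11.69)  # A4 is 210 mm x 297 mm


def uncompressed_page_bytes(dpi: int = 300, bytes_per_pixel: int = 3) -> int:
    """Bytes needed for one scanned A4 side with no compression at all."""
    width_px = round(A4_INCHES[0] * dpi)
    height_px = round(A4_INCHES[1] * dpi)
    return width_px * height_px * bytes_per_pixel


def yearly_gigabytes(sides_per_year: int, dpi: int = 300) -> float:
    """Upper-bound yearly volume in decimal gigabytes."""
    return sides_per_year * uncompressed_page_bytes(dpi) / 1e9


if __name__ == "__main__":
    # 100 pages a month, counted once as single-sided and once as double-sided.
    for sides in (1200, 2400):
        print(f"{sides} sides/year: about {yearly_gigabytes(sides):.0f} GB uncompressed")
```

Even counting every page as double-sided, this stays in the tens of gigabytes per year.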
I'd probably look at optical media as a first choice simply because you have so little data. I'd also plan on burning 3x duplicates of everything and refreshing the media every 2 - 3 years.
If optical media is too small I'd go with LTO tape and refresh the media every 4 - 5 years. That's going to be pretty expensive, though, for such a small amount of data.
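Whichever media you pick, a refresh is only worth anything if you can show the new copy matches the old one. A minimal sketch of how that check could look, using SHA-256 checksums from the Python standard library; the function names and the directory layout are my own assumptions, not part of any particular backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest: dict[str, str], copy_root: Path) -> list[str]:
    """Return the relative paths that are missing or differ in the copy."""
    problems = []
    for rel, expected in manifest.items():
        candidate = copy_root / rel
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel)
    return problems
```

Build the manifest once from the master set, keep it alongside each duplicate, and run verify_copy against every disc or tape after burning and again at each refresh. An empty list means the copy is intact.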
Solution 2:
There are specific systems that internally use DVDs and migrate the data to new media every so often. Look up digital preservation.
Since storage requirements tend to rise pretty quickly, it is advisable to switch to a newer, bigger type of media every few years anyway.
Assuming you get the data in paper form, you need to:
- List the data at mail entry. This may mean giving each sheet a unique barcode.
- Scan it. Use the barcode identifier as filename. Archive the paper.
- Archive the data. Put it on a revision-safe (tamper-proof) archiving system. A plain file server is not good enough, because files that are write-accessible can be altered or deleted.
- Make it read accessible for other systems.
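The scan and archive steps above can be sketched roughly like this. It is only an illustration: ingest_scan and the archive layout are hypothetical, and clearing the write bits on a file is a guard against accidents, not a substitute for a real revision-safe archive system.

```python
import shutil
import stat
from pathlib import Path


def ingest_scan(scan: Path, barcode: str, archive_dir: Path) -> Path:
    """Copy a scan into the archive under its barcode and mark it read-only."""
    if not barcode.isalnum():
        raise ValueError(f"unexpected barcode: {barcode!r}")
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"{barcode}.pdf"
    # Mode "xb" raises FileExistsError if the target already exists,
    # so an archived document is never silently overwritten.
    with scan.open("rb") as src, target.open("xb") as dst:
        shutil.copyfileobj(src, dst)
    # Read-only for everyone; other systems can still read the file.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

Because the barcode is the filename, the list made at mail entry doubles as a simple index into the archive.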
In my customer's case, it is all the invoices for a large organization, which have to be transferred to an online system (SAP). The archive storage has gone through several iterations by now; currently they are moving to Blu-ray.
On the other hand, nowadays everything goes onto disks, so maybe something along these lines would be your way to go: http://www.eurostor.com/german/iTernity.D.php
Solution 3:
Our solution: Scan to PDF -> Backup to Tape
We have a document scanner that does ~30 pages/min and produces OCRed PDF files. We back those up to tape (LTO-4 specifically), which reportedly has a shelf life of 50 to 100 years, though manufacturers typically quote closer to 30 years. Finding a tape drive might be difficult in that time frame, but there are still data recovery places around that will recover 8" floppy disks.
Solution 4:
I think Amazon's new Glacier service is an interesting offering in this space.
Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable. With Amazon Glacier, customers can reliably store large or small amounts of data for as little as $0.01 per gigabyte per month, a significant savings compared to on-premises solutions.
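To put that price in context, here is a minimal sketch of the cumulative storage bill for an archive that grows every month. It assumes the quoted $0.01 per gigabyte per month stays flat and ignores retrieval and request fees, which are charged separately; the function name and the example volumes are my own assumptions.

```python
def cumulative_storage_cost(
    initial_gb: float,
    monthly_growth_gb: float,
    years: int,
    usd_per_gb_month: float = 0.01,
) -> float:
    """Total storage charges over the period, billed monthly on what is stored."""
    total = 0.0
    stored = initial_gb
    for _ in range(years * 12):
        total += stored * usd_per_gb_month
        stored += monthly_growth_gb  # new scans added after this month's bill
    return total


if __name__ == "__main__":
    # Generous assumption: 260 GB to start with, growing by 5 GB a month.
    cost = cumulative_storage_cost(260, 5, years=10)
    print(f"Ten years of storage: about {cost:.0f} USD")
```

Even with generous size assumptions, a decade of storage at that rate comes to far less than a single year of the 2,500 USD backup budget mentioned in the question.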