What is the best way to back up Azure Blob Storage contents?
I know that the Azure Storage entities (blobs, tables and queues) have built-in resiliency, meaning that they are replicated to 3 different servers in the same datacenter. On top of that, they may also be replicated to a different datacenter altogether that is physically located in a different geographical region. The chance of losing your data in this case is close to zero for all practical purposes.
However, what happens if a sloppy developer (or one under the influence of alcohol :)) accidentally deletes the storage account through the Azure Portal or the Azure Storage Explorer tool? Worse yet, what if a hacker gets hold of your account and clears the storage? Is there a way to retrieve the gigabytes of deleted blobs, or is that it? Somehow I think there has to be an elegant solution that the Azure infrastructure provides here, but I cannot find any documentation.
The only solution I can think of is to write my own process (worker role) that periodically backs up my entire storage to a different subscription/account, thus essentially doubling the cost of storage and transactions. Any thoughts?
Regards,
Archil
Solution 1:
Depending on where you want to back up your data, there are two options available:
Backing up data locally - If you wish to back up your data locally in your own infrastructure, you could: a. write your own application using either the Storage Client Library or the REST API, or b. use 3rd party tools like Cerebrata Azure Management Cmdlets (Disclosure: I work for Cerebrata).
Backing up data in the cloud - Recently, the Windows Azure Storage team announced Asynchronous Copy Blob functionality, which essentially allows you to copy data from one storage account to another without downloading the data locally. The catch here is that your target storage account must have been created after 7th June 2012. You can read more about this functionality on the Windows Azure Blog: http://blogs.msdn.com/b/windowsazurestorage/archive/2012/06/12/introducing-asynchronous-cross-account-copy-blob.aspx. A minimal sketch of such a copy is shown below.
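For illustration only, here is a minimal sketch of a single cross-account, server-side copy. The announcement above targets the 2012-era REST API/SDK; this sketch instead assumes the current Azure.Storage.Blobs SDK, and the container name, blob name and connection strings are placeholders, not part of the original answer:

// Minimal sketch - assumes the Azure.Storage.Blobs SDK; all names below are placeholders.
using System;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

string sourceConnectionString = "<source-account-connection-string>";
string targetConnectionString = "<backup-account-connection-string>";

var sourceBlob = new BlobServiceClient(sourceConnectionString)
    .GetBlobContainerClient("pictures")
    .GetBlobClient("photo.png");

var targetContainer = new BlobServiceClient(targetConnectionString)
    .GetBlobContainerClient("pictures-backup");
await targetContainer.CreateIfNotExistsAsync();

// Issue a short-lived read SAS so the target account can read the source blob,
// then start a server-side (asynchronous) copy - nothing is downloaded locally.
Uri sourceUri = sourceBlob.GenerateSasUri(
    BlobSasPermissions.Read, DateTimeOffset.UtcNow.AddHours(1));
await targetContainer.GetBlobClient("photo.png").StartCopyFromUriAsync(sourceUri);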
Hope this helps.
Solution 2:
The accepted answer is fine, but it took me a few hours to work through everything.
I've put together a solution which I now use in production. I expose a Backup() method through a Web API endpoint, which is then called by an Azure WebJob every day (at midnight).
Note that I've taken the original source code and modified it:
- it wasn't up to date, so I changed a few method names
- added a retry safeguard around the copy operation (it fails after 4 tries for the same blob)
- added a little bit of logging - you should swap it out with your own
- it does the backup between two storage accounts (replicating containers & blobs)
- added purging - it gets rid of old containers that are no longer needed (keeps 16 days' worth of data); you can always disable this, as space is cheap
The source can be found at: https://github.com/ChrisEelmaa/StackOverflow/blob/master/AzureStorageAccountBackup.cs
This is how I use it in the controller (note that your controller should only be callable by the Azure WebJob - you can check credentials in the headers):
[Route("backup")]
[HttpPost]
public async Task<IHttpActionResult> Backup()
{
try
{
await _blobService.Backup();
return Ok();
}
catch (Exception e)
{
_loggerService.Error("Failed to backup blobs " + e);
return InternalServerError(new Exception("Failed to back up blobs!"));
}
}
Note: I wanted to include the full backup code as part of this post, but I couldn't get the formatting to work, so please use the GitHub link above. A rough sketch of the overall approach follows.
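For orientation only, here is a hypothetical sketch of the general approach described above (cross-account replication of containers and blobs, plus purging of old backups). It is not the linked implementation; it assumes the current Azure.Storage.Blobs SDK, shared-key connection strings for both accounts, and a date-stamped container naming scheme:

// Hypothetical sketch only - the real implementation lives in the GitHub repository above.
using System;
using System.Globalization;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;

public class BlobBackupService
{
    private readonly BlobServiceClient _source;
    private readonly BlobServiceClient _target;

    public BlobBackupService(string sourceConnectionString, string targetConnectionString)
    {
        _source = new BlobServiceClient(sourceConnectionString);
        _target = new BlobServiceClient(targetConnectionString);
    }

    public async Task Backup()
    {
        // Date-stamped prefix so old backup containers can be purged by age later.
        string prefix = DateTime.UtcNow.ToString("yyyyMMdd") + "-";

        await foreach (var container in _source.GetBlobContainersAsync())
        {
            var sourceContainer = _source.GetBlobContainerClient(container.Name);
            var targetContainer = _target.GetBlobContainerClient(prefix + container.Name);
            await targetContainer.CreateIfNotExistsAsync();

            await foreach (var blob in sourceContainer.GetBlobsAsync())
            {
                // Server-side copy: hand the target account a short-lived read SAS to the source blob.
                var sourceBlob = sourceContainer.GetBlobClient(blob.Name);
                Uri sourceUri = sourceBlob.GenerateSasUri(
                    BlobSasPermissions.Read, DateTimeOffset.UtcNow.AddHours(1));

                await targetContainer.GetBlobClient(blob.Name).StartCopyFromUriAsync(sourceUri);
            }
        }

        // Purge: delete backup containers whose date stamp is older than 16 days.
        await foreach (var container in _target.GetBlobContainersAsync())
        {
            if (container.Name.Length > 8
                && DateTime.TryParseExact(container.Name.Substring(0, 8), "yyyyMMdd",
                       CultureInfo.InvariantCulture, DateTimeStyles.None, out var day)
                && day < DateTime.UtcNow.AddDays(-16))
            {
                await _target.DeleteBlobContainerAsync(container.Name);
            }
        }
    }
}

The retry safeguard and logging mentioned above are omitted here to keep the sketch short.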
Solution 3:
I have used Azure Data Factory to back up Azure Storage with great effect. It's really easy to use, cost-effective and works very well.
Simply create a Data Factory (v2), set up data connections to your data sources (it currently supports Azure Tables, Azure Blobs and Azure Files) and then set up a data copy pipeline.
The pipelines can merge, overwrite, etc. and you can set up custom rules/wildcards.
Once you've set up the pipeline, you should then set up a schedule trigger. This will kick off the backup at an interval to suit your needs.
I've been using it for months and it's perfect. No code, no VMs, no custom PowerShell scripts or third-party software. A pure Azure solution.
Solution 4:
I have had exactly the same requirement: backing up blobs from Azure, as we have millions of customer blobs, and you are right - a sloppy developer with full access can compromise the entire system.
Thus, I wrote an entire application, "Blob To Local Backup", free and open source on GitHub under the MIT license: https://github.com/smartinmedia/BlobToLocalBackup
It solves many of your issues, namely:
- you can give only READ access to this application, so the application cannot destroy any data on Azure
- it backs up to a server where your sloppy developer or the hacker does not have the same access as they have to your Azure account
- the software provides versioning, so you can even protect yourself from e.g. ransomware/encryption attacks
- it uses a serialization method instead of a database, so you can have millions of files on Azure and still keep them in sync (we have 20 million files on Azure)
Here is how it works (for more detailed information, read the README on GitHub):
- You set up the appsettings.json file in the main folder. You can give the LoginCredentials here for global access, or configure it more granularly at the storage account level:
{
    "App": {
        "ConsoleWidth": 150,
        "ConsoleHeight": 42,
        "LoginCredentials": {
            "ClientId": "2ab11a63-2e93-2ea3-abba-aa33714a36aa",
            "ClientSecret": "ABCe3dabb7247aDUALIPAa-anc.aacx.4",
            "TenantId": "d666aacc-1234-1234-aaaa-1234abcdef38"
        },
        "DataBase": {
            "PathToDatabases": "D:/temp/azurebackup"
        },
        "General": {
            "PathToLogFiles": "D:/temp/azurebackup"
        }
    }
}
- Set up a job as a JSON file like this (I have added numerous options):
{
    "Job": {
        "Name": "Job1",
        "DestinationFolder": "D:/temp/azurebackup",
        "ResumeOnRestartedJob": true,
        "NumberOfRetries": 0,
        "NumberCopyThreads": 1,
        "KeepNumberVersions": 5,
        "DaysToKeepVersion": 0,
        "FilenameContains": "",
        "FilenameWithout": "",
        "ReplaceInvalidTargetFilenameChars": false,
        "TotalDownloadSpeedMbPerSecond": 0.5,
        "StorageAccounts": [
            {
                "Name": "abc",
                "SasConnectionString": "BlobEndpoint=https://abc.blob.core.windows.net/;QueueEndpoint=https://abc.queue.core.windows.net/;FileEndpoint=https://abc.file.core.windows.net/;TableEndpoint=https://abc.table.core.windows.net/;SharedAccessSignature=sv=2019-12-12&ss=bfqt&srt=sco&sp=rl&se=2020-12-20T04:37:08Z&st=2020-12-19T20:37:08Z&spr=https&sig=abce3e399jdkjs30fjsdlkD",
                "FilenameContains": "",
                "FilenameWithout": "",
                "Containers": [
                    {
                        "Name": "test",
                        "FilenameContains": "",
                        "FilenameWithout": "",
                        "Blobs": [
                            {
                                "Filename": "2007 EasyRadiology.pdf",
                                "TargetFilename": "projects/radiology/Brochure3.pdf"
                            }
                        ]
                    },
                    {
                        "Name": "test2"
                    }
                ]
            },
            {
                "Name": "martintest3",
                "SasConnectionString": "",
                "Containers": []
            }
        ]
    }
}
- Run the application with your job file:
blobtolocal job1.json