What happens during a live SQL Server backup?
Some of my coworkers were surprised when I told them that I can back up an SQL Server database while it's still running and wondered how that's possible. I know that SQL Server is capable of backing up a database while it is still online but I'm not sure how to explain why it's possible. My question is what effect does this have on the database?
If data is modified (by an insert, update, or delete) while the backup is running, will the backup contain those changes or will it be added to the database afterwards?
I'm assuming that the log file plays an important role here but I'm not quite sure how.
edit: Just as a note, my case involves backing up the databases using SQL Server Agent and the effects of database modifications during this process.
Full backup contains both the data and log. For data, it simply copies each page of the database into the backup, as is at the moment it reads the page. It then appends into the backup media all the 'relevant' log. This includes, at the very least, all the log between the LSN at the start of the backup operation and the LSN at the end of the backup operation. In reality there is more log usually, as it has to include all active transactions at the start of backup and log needed by replication. See Debunking a couple of myths around full database backups.
When the database is restored, all the data pages are copied out into the database files, then all the log pages are copied out into the log file(s). The database is inconsistent at this moment, since it contains data page images that may be out of sync with one another. But now a normal recovery is run. Since the log contains all the log during the backup, at the end of the recovery the database is consistent.
You can't just copy it over since there can be alterations to the database mid-copy as you alluded to in the question.
It has to be done with agents that are aware of the database functionality and then take a "snapshot" via OS functions or can use a utility to dump the database in a safe state (like mysqldump, if using mysql).
Otherwise you get a backup that can be corrupted and you won't know it until you restore it. I think Joel and Jeff recently talked about it a little on a recent StackOverflow podcast.
And you're right in that the log file is important. If the journal/log file is out of sync with the actual data, restoring the files will result in corruption.
It boils down to a backup taken using a safe state of the database, either through a database-aware agent or snapshot application or application that is aware of how to properly hook the database into dropping data without interfering with updates during the data dump then backing up the resulting file.
During backup, snapshot will be created for the database and the data will be read for backup from that snapshot. The actual live DB operations won't affect the backup operation.