Database transactions - How do they work?
I'm trying to learn more about database transactions. I found the ACID rule of thumb for writing transactions and thought of a few questions.
The ACID rule of thumb:
A transaction must be:
- Atomic - it is one unit of work and does not depend on previous or following transactions.
- Consistent - data is either committed or rolled back; there is no “in-between” case where something has been updated and something hasn’t.
- Isolated - no transaction sees the intermediate results of the current transaction.
- Durable - the values persist once the data has been committed, even if the system crashes right after.
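To make the atomicity and durability points concrete, here is a minimal sketch using Python's built-in sqlite3 module (the bank-transfer table, names, and amounts are invented purely for illustration; any transactional database would behave similarly): both updates either commit together or are rolled back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one atomic unit of work."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.commit()      # both updates become visible (and durable) together
    except Exception:
        conn.rollback()    # neither update persists
        raise

transfer(conn, "alice", "bob", 40)
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 60), ('bob', 40)]
```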
I was wondering how they work under the hood, so I can better understand the factors that need to be considered when writing such a transaction. I guess the specific details will vary between the database implementations that are available, but certain rules will always be in place.
- How does the database handle concurrent transactions whilst still supporting the Atomic rule?
- Is there a queue of transactions that is processed in order?
- How is a lengthy transaction that is holding up all others handled?
- Are updates to tables done in memory so if a crash does occur before commit, there is no alteration to the database?
- Or are there some intermediate tables that are updated to survive such a crash?
- Whilst a transaction is in progress, is all read and write access to the affected tables prevented?
- Or would the database allow writes but the transaction would overwrite all changes upon commit?
Solution 1:
- There are many different ways, including transaction queueing, optimistic concurrency control, etc. This is actually a very complex question; there are entire books written about it:
http://www.amazon.co.uk/Databases-Transaction-Processing-Application-Oriented-Approach/dp/0201708728/ref=sr_1_3?ie=UTF8&s=books&qid=1281609705&sr=8-3
- It depends on the level of logging in the database. If strict write-ahead logs are kept, then in the case of a system crash the database can be wound back to a consistent state (a toy sketch of the write-ahead idea appears at the end of this answer).
- It depends on the type of concurrency control. Optimistic concurrency involves no locks; instead, if the state of the database has changed by the time the transaction finishes, the transaction is abandoned and restarted. This can speed up databases where collisions are rare (see the version-check sketch below). There are also different levels of locking: row, table, or even the entire database.
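Here is a hedged sketch of that optimistic idea in Python with sqlite3 (the items table, its version column, and the retry loop are invented for illustration, not any particular engine's mechanism): each row carries a version number, no lock is held while the application works, and the update only takes effect if the version is unchanged; otherwise the attempt is abandoned and retried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, qty INTEGER, version INTEGER)")
conn.execute("INSERT INTO items VALUES (1, 10, 0)")
conn.commit()

def decrement_qty(conn, item_id, amount):
    while True:
        # Read a snapshot of the row without holding any lock afterwards.
        qty, version = conn.execute(
            "SELECT qty, version FROM items WHERE id = ?", (item_id,)).fetchone()
        # The write only matches if nobody bumped the version in the meantime.
        cur = conn.execute(
            "UPDATE items SET qty = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (qty - amount, item_id, version))
        conn.commit()
        if cur.rowcount == 1:
            return    # our snapshot was still current: success
        # Collision detected: abandon this attempt and retry with fresh data.

decrement_qty(conn, 1, 3)
print(conn.execute("SELECT qty, version FROM items").fetchone())  # (7, 1)
```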
These are very complex questions; I'd advise buying a book or attending a concurrent-systems lecture series if you want to be able to answer them fully :-)
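And to illustrate the write-ahead idea from the second point, a deliberately toy sketch in plain Python (file names and record format invented; real engines log page-level redo/undo records, checkpoint, and so on): the change is appended to a log and flushed to disk before the data file is touched, so a crash in between leaves enough information to recover.

```python
import json, os

def commit_change(data_path, log_path, new_state):
    # 1. Append the intended change plus a commit marker to the log and force it to disk.
    with open(log_path, "a") as log:
        log.write(json.dumps(new_state) + "\n")
        log.write("COMMIT\n")
        log.flush()
        os.fsync(log.fileno())
    # 2. Only now overwrite the real data file; a crash before this point loses nothing
    #    that cannot be redone from the log.
    with open(data_path, "w") as data:
        data.write(json.dumps(new_state))
        data.flush()
        os.fsync(data.fileno())

def recover(log_path):
    """After a crash, return the last state whose commit marker made it to disk."""
    last = None
    if os.path.exists(log_path):
        lines = open(log_path).read().splitlines()
        for i, line in enumerate(lines):
            if line == "COMMIT" and i > 0:
                last = json.loads(lines[i - 1])
    return last
```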
Solution 2:
A few nitpicks on your definitions:
Atomic - it is one unit of work and does not depend on previous or following transactions.
A more correct definition of atomicity would not mention any "previous or following" transactions. Atomicity is a property of a single transaction taken by itself, namely that, in the end, either all of its actions persist or none of them do. In other words, it must never be the case that "only half a transaction" is allowed to persist.
The concept is, however, blurred by concepts such as nested transactions, savepoints, and the ability for the user to request explicit rollbacks back to a taken savepoint. These do allow, in a certain sense, that "only half the actions of a transaction" persist, albeit at the user's explicit request.
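As a hedged illustration of that blurring (Python's sqlite3 with its default transaction handling on Python 3.6+; the table and savepoint names are invented), a savepoint lets the user explicitly undo part of a transaction while the rest still commits:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

conn.execute("INSERT INTO t VALUES (1)")   # opens the enclosing transaction
conn.execute("SAVEPOINT sp1")
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("ROLLBACK TO sp1")            # undo only the work done since the savepoint
conn.execute("RELEASE sp1")
conn.commit()

print(conn.execute("SELECT x FROM t").fetchall())  # [(1,)] -- only "half" the work persisted
```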
Consistent - data is either committed or rolled back; there is no “in-between” case where something has been updated and something hasn’t.
This interpretation is totally wrong. Consistent means that the transaction processor (in this case, a DBMS engine) cannot leave the system (the database) in a state that violates any declared constraint that it (the transaction processor) is aware of. See, for example, "Introduction to Database Systems", Chapter 16.
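A hedged sqlite3 sketch of that reading of consistency (the CHECK constraint and account values are invented): the engine refuses to let a transaction produce a state that violates a declared constraint, and the whole unit is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

try:
    with conn:  # connection as context manager: commit on success, rollback on exception
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance - 500 WHERE id = 1")  # would go negative
except sqlite3.IntegrityError as e:
    print("rejected:", e)

print(conn.execute("SELECT balance FROM accounts").fetchone()[0])
# 100 -- the first update was undone too
```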
Isolated - no transaction sees the intermediate results of the current transaction.
Nitpick: no transaction other than the current one is allowed to see its intermediate states (states, not really results). Note furthermore that the "isolation levels" of transaction-processing engines typically define the degree to which the I property may be violated!
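A hedged two-connection sketch with sqlite3 (the temporary file path is just for the demo; with SQLite's default settings the reader should not observe the writer's open transaction, though the exact behaviour depends on the engine and its isolation level):

```python
import sqlite3, tempfile, os

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE t (x INTEGER)")
writer.commit()

writer.execute("INSERT INTO t VALUES (1)")  # uncommitted: part of an open transaction
print(reader.execute("SELECT COUNT(*) FROM t").fetchall())  # [(0,)] intermediate state is hidden
writer.commit()
print(reader.execute("SELECT COUNT(*) FROM t").fetchall())  # [(1,)] now visible
```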
Durable - the values persist once the data has been committed, even if the system crashes right after.
But this property, too, is blurred a bit by the possibility of nested transactions: even if an inner transaction has committed and completed, the containing transaction can still undo that commit by rolling back completely itself.
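Again as a hedged sqlite3 sketch (savepoints standing in for nested transactions; the names are invented): releasing the inner savepoint "commits" it within its parent, yet a rollback of the enclosing transaction still discards that work, so nothing becomes durable until the outermost commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

conn.execute("INSERT INTO t VALUES (1)")   # enclosing transaction begins implicitly
conn.execute("SAVEPOINT inner_tx")
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("RELEASE inner_tx")           # the "inner transaction" has committed...
conn.rollback()                            # ...but the enclosing rollback discards it anyway

print(conn.execute("SELECT x FROM t").fetchall())  # [] -- nothing persisted
```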
Solution 3:
The actual details would probably depend somewhat on which DB server it is, but this article might be of interest to you: Transaction Processing Cheat Sheet