Transaction isolation levels relation with locks on table

I have read about 4 levels of isolation:

Isolation Level       Dirty Read    Nonrepeatable Read  Phantom Read  
READ UNCOMMITTED      Permitted       Permitted           Permitted
READ COMMITTED              --        Permitted           Permitted
REPEATABLE READ             --             --             Permitted
SERIALIZABLE                --             --              --

I want to understand the lock each transaction isolation takes on the table

READ UNCOMMITTED - no lock on table
READ COMMITTED - lock on committed data
REPEATABLE READ - lock on block of sql(which is selected by using select query)
SERIALIZABLE - lock on full table(on which Select query is fired)

below are the three phenomena which can occur in transaction isolation
Dirty Read- no lock
Nonrepeatable Read - no dirty read as lock on committed data
Phantom Read - lock on block of sql(which is selected by using select query)

I want to understand where we define these isolation levels : only at jdbc/hibernate level or in DB also

PS: I have gone through the links in Isolation levels in oracle, but they looks clumsy and talk on database specific


I want to understand the lock each transaction isolation takes on the table

For example, you have 3 concurrent processes A, B and C. A starts a transaction, writes data and commit/rollback (depending on results). B just executes a SELECT statement to read data. C reads and updates data. All these process work on the same table T.

  • READ UNCOMMITTED - no lock on the table. You can read data in the table while writing on it. This means A writes data (uncommitted) and B can read this uncommitted data and use it (for any purpose). If A executes a rollback, B still has read the data and used it. This is the fastest but most insecure way to work with data since can lead to data holes in not physically related tables (yes, two tables can be logically but not physically related in real-world apps =\).
  • READ COMMITTED - lock on committed data. You can read the data that was only committed. This means A writes data and B can't read the data saved by A until A executes a commit. The problem here is that C can update data that was read and used on B and B client won't have the updated data.
  • REPEATABLE READ - lock on a block of SQL(which is selected by using select query). This means B reads the data under some condition i.e. WHERE aField > 10 AND aField < 20, A inserts data where aField value is between 10 and 20, then B reads the data again and get a different result.
  • SERIALIZABLE - lock on a full table(on which Select query is fired). This means, B reads the data and no other transaction can modify the data on the table. This is the most secure but slowest way to work with data. Also, since a simple read operation locks the table, this can lead to heavy problems on production: imagine that T table is an Invoice table, user X wants to know the invoices of the day and user Y wants to create a new invoice, so while X executes the read of the invoices, Y can't add a new invoice (and when it's about money, people get really mad, especially the bosses).

I want to understand where we define these isolation levels: only at JDBC/hibernate level or in DB also

Using JDBC, you define it using Connection#setTransactionIsolation.

Using Hibernate:

<property name="hibernate.connection.isolation">2</property>

Where

  • 1: READ UNCOMMITTED
  • 2: READ COMMITTED
  • 4: REPEATABLE READ
  • 8: SERIALIZABLE

Hibernate configuration is taken from here (sorry, it's in Spanish).

By the way, you can set the isolation level on RDBMS as well:

  • MySQL isolation level,
  • SQL Server isolation level
  • Informix isolation level (Personal Note: I will never forget about SET ISOLATION TO DIRTY READ sentence.)

and on and on...


As brb tea says, depends on the database implementation and the algorithm they use: MVCC or Two Phase Locking.

CUBRID (open source RDBMS) explains the idea of this two algorithms:

  • Two-phase locking (2PL)

The first one is when the T2 transaction tries to change the A record, it knows that the T1 transaction has already changed the A record and waits until the T1 transaction is completed because the T2 transaction cannot know whether the T1 transaction will be committed or rolled back. This method is called Two-phase locking (2PL).

  • Multi-version concurrency control (MVCC)

The other one is to allow each of them, T1 and T2 transactions, to have their own changed versions. Even when the T1 transaction has changed the A record from 1 to 2, the T1 transaction leaves the original value 1 as it is and writes that the T1 transaction version of the A record is 2. Then, the following T2 transaction changes the A record from 1 to 3, not from 2 to 4, and writes that the T2 transaction version of the A record is 3.

When the T1 transaction is rolled back, it does not matter if the 2, the T1 transaction version, is not applied to the A record. After that, if the T2 transaction is committed, the 3, the T2 transaction version, will be applied to the A record. If the T1 transaction is committed prior to the T2 transaction, the A record is changed to 2, and then to 3 at the time of committing the T2 transaction. The final database status is identical to the status of executing each transaction independently, without any impact on other transactions. Therefore, it satisfies the ACID property. This method is called Multi-version concurrency control (MVCC).

The MVCC allows concurrent modifications at the cost of increased overhead in memory (because it has to maintain different versions of the same data) and computation (in REPETEABLE_READ level you can't loose updates so it must check the versions of the data, like Hiberate does with Optimistick Locking).

In 2PL Transaction isolation levels control the following:

  • Whether locks are taken when data is read, and what type of locks are requested.

  • How long the read locks are held.

  • Whether a read operation referencing rows modified by another transaction:

    • Block until the exclusive lock on the row is freed.

    • Retrieve the committed version of the row that existed at the time the statement or transaction started.

    • Read the uncommitted data modification.

Choosing a transaction isolation level does not affect the locks that are acquired to protect data modifications. A transaction always gets an exclusive lock on any data it modifies and holds that lock until the transaction completes, regardless of the isolation level set for that transaction. For read operations, transaction isolation levels primarily define the level of protection from the effects of modifications made by other transactions.

A lower isolation level increases the ability of many users to access data at the same time, but increases the number of concurrency effects, such as dirty reads or lost updates, that users might encounter.

Concrete examples of the relation between locks and isolation levels in SQL Server (use 2PL except on READ_COMMITED with READ_COMMITTED_SNAPSHOT=ON)

  • READ_UNCOMMITED: do not issue shared locks to prevent other transactions from modifying data read by the current transaction. READ UNCOMMITTED transactions are also not blocked by exclusive locks that would prevent the current transaction from reading rows that have been modified but not committed by other transactions. [...]

  • READ_COMMITED:

    • If READ_COMMITTED_SNAPSHOT is set to OFF (the default): uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed. [...] Row locks are released before the next row is processed. [...]
    • If READ_COMMITTED_SNAPSHOT is set to ON, the Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions.
  • REPETEABLE_READ: Shared locks are placed on all data read by each statement in the transaction and are held until the transaction completes.

  • SERIALIZABLE: Range locks are placed in the range of key values that match the search conditions of each statement executed in a transaction. [...] The range locks are held until the transaction completes.


The locks are always taken at DB level:-

Oracle official Document:- To avoid conflicts during a transaction, a DBMS uses locks, mechanisms for blocking access by others to the data that is being accessed by the transaction. (Note that in auto-commit mode, where each statement is a transaction, locks are held for only one statement.) After a lock is set, it remains in force until the transaction is committed or rolled back. For example, a DBMS could lock a row of a table until updates to it have been committed. The effect of this lock would be to prevent a user from getting a dirty read, that is, reading a value before it is made permanent. (Accessing an updated value that has not been committed is considered a dirty read because it is possible for that value to be rolled back to its previous value. If you read a value that is later rolled back, you will have read an invalid value.)

How locks are set is determined by what is called a transaction isolation level, which can range from not supporting transactions at all to supporting transactions that enforce very strict access rules.

One example of a transaction isolation level is TRANSACTION_READ_COMMITTED, which will not allow a value to be accessed until after it has been committed. In other words, if the transaction isolation level is set to TRANSACTION_READ_COMMITTED, the DBMS does not allow dirty reads to occur. The interface Connection includes five values that represent the transaction isolation levels you can use in JDBC.