Saturday, 25 January 2014

All about transaction

As you can find many articles, tutorials in the web explaining about transaction, I will try to write this article in FAQ style. Hope you can find something interesting after reading this.

A. What is transaction?

Transaction in Java context normally mean database transaction. It comprises of one or more interactions with database, which should be treated as one unit of work. This unit of work can be committed or rolled back.

B. Why people invent transaction?

There are two concerns that cannot be solved without transaction:

1/ First, if you have some logical unit of work that involve interactions with multiple database that you will want it to success or fail together (think of transferring of money from 2 bank accounts store in 2 different databases)

2/ Secondly, there are many users accessing the database concurrently and your work requires a series of interaction with database. You do not want people to see the intermediate result of your work. For example, transfering money from one account to other account in the same DB still require 2 interactions with DB and you must not let the user see the DB changes after the first interaction (whether you have deduct the money from one account but not credit to the other or opposite).

C. Can transaction fully solve the concerns above?

Up to a certain extent.

For the first concern, transaction fit well as long as you do not write to anywhere except database. Sometimes, you do not have that luxury. For example, writing to file system, uploading content or interacting with legacy system are some of the things that do not have roll back feature.

For the second concern, you cannot fully isolate your database interaction in concurrent environment. Unless you lock other DB connections from reading records or tables, otherwise, there will be some expose up to certain level. The more isolated your transaction is, the more performance your system will suffer. Because of that, in real situations, developers rarely let the transaction fully isolated. Depend on your business requirements, you will choose one of the transaction isolation levels as below (from Productive JavaEE 5 by Adam Bien):

READ UNCOMMITTED
Already written, but still uncommitted data is visible for other transactions. If any of the changes are rolled back, another transaction might have retrieved an invalid row. This setting can even be problematic for read-only applications such as reporting. Reports created with READ_UNCOMMITTED can be partially based on stale or even non-existing data.

READ COMMITTED
Only committed data is visible to other transactions. The isolation is higher, because the stale data is isolated until the commit―and only then visible to others. You can rely on the consistency of the read data; however, the data remains consistent only for the length of the read operation. When you re-execute the same query in the current active transaction, you might see different results.

REPEATABLE READS
On this level, a query operation with the same parameters ensures the same results. Even if other transactions committed the changes, which affected some rows in the result set, it will not change the result set. Although delete and change operations are not visible in the result set, new rows can still be added by another transaction and appear in the result.

SERIALIZABLE
This is the highest possible level. The name is derived from serial access, no interferences between transactions are allowed. Rereading a query always returns the same result, regardless of whether the data was changed, added or deleted. This level is very consistent and very costly. Read and write locks are needed to ensure this consistency level. The price for the consistency is high as well: the lowest scalability (because lack of concurrency) and increased chances for deadlocks.

D. How you deal with the limit of the transaction

To make it short, to deal with the limit of transaction, your system need to know how to recover when error does happen. There are two categories of errors that may will need to tackle.

1/ Non database related error. It happens when you happen to write something to somewhere at the middle of the transaction and the transaction roll back due to some DB error. Fortunately, it is quite safe to assume that you will have some kind of exception thrown by transaction manager if the roll back happen. Unfortunately, as your system is not the database, it does not know how to undo the changes if there is transaction roll back. You will need to take responsibility to identify the exception type and attempt to recover system integrity if possible.

2/ The second issues rise from the transaction isolation. It is possible that the data you want to work with may not be latest or may not be available any more at the time the transaction commit. As the issue is well known, the solution for it is quite also well known. It is divided to two strategy:

Pessimistic Locking: As the solution is pessimistic, it is assume that interference between transaction will happen and the better way to fix the issue is to never let that happen. The idea is to avoid sharing resource, whether by row lock, table lock or synchronized access in application code. This approach sure work with the great performance sacrifice.

Optimistic Locking: As the performance sacrifice for resource locking is high, the preferred approach is optimistic locking. Optimistic locking is based on the assumption that we will have a reliable way to identify if there is any any data change at the middle of the transaction. If that happen, the transaction exception will be thrown but this unfortunate case should not happen often. To track this change, people often include version column in table and increase version any time modification happen.