Welcome, concurrency-minded nuonians, to part 3 of the MVCC overview! In part 1, we laid out the basics of MVCC and described how MVCC enables much higher concurrency than conventional read/write locks (even per row). In part 2, we explored how MVCC works in the presence of interleaved reads and updates. This post is going to cover visibility and the subtleties of maintaining transactional consistency with MVCC.
What is visibility?
The whole reason why we’re employing MVCC is to increase the amount of concurrency we can reasonably support in the database. This is good, because it means higher performance. However, it means that the system has to reason about two (or more) transactions executing concurrently while maintaining the illusion that one transaction has logically ‘happened before’ the other. In an MVCC system, multiple versions of a given record may be installed in the table and a reading transaction may have to compute which record version is ‘visible’ and therefore should be returned when the transaction reads that record. Given a record with multiple versions, a transaction needs to figure out which transactions installed those versions. With that information, any transaction can decide which version was installed when the transaction logically started. That version is the one that is logically visible.
Think about a system with a large table (many millions of rows), and one transaction that’s aggregating all the rows (summing them, say). While it is executing, many short transactions are coming in and updating those rows (due to user actions). Logically, the long-running transaction ‘happens before’ all of the updates (this is a circumstance where snapshot consistency is equivalent to having a serial ordering of all transactions), therefore it would be an error to have the long-running transaction read the updates of transactions that happen after the long-running transaction (assuming it’s running with an isolation level that mandates consistent read behavior).
Visibility calculation requires a couple of things: we need a way to figure out which transaction installed a given record version (or at least some high water mark); we also need a way of sorting all recent transactions so that we can establish a partial order. The first part is simple, some systems actually use the transactionId as a version number, but however one does it, being able to say ‘version X of this row was created by transaction Y’ is the important thing. Given that databases are always on, and may have been running for weeks/months/years the number of transactions can be huge. So the response might be something more like ‘version X of this row was created by a transaction that happened before transaction Y’. The second requirement is more subtle. The partial order of transactions is really about having transactions in three piles. The first pile is of transactions that definitely happened before (from the point of view of the transaction doing the visibility calculation); the second pile is of transactions that definitely happened after (think committed short-run transactions from the point of view of the long-running transaction); the third pile is of transactions that are still active and we can’t make a determination.
This ordering is different from strict serializability (for those who may care), it is about creating distinct points in time that we can ‘pivot’ events around. This is more like the concept of ‘linearizability’ than ‘serializability’, because visibility doesn’t care about the order of transactions that happened before or after with respect to each other, just that they happened before or after the current transaction. For isolation levels that demand consistent reads, the only visible record versions are those installed by transactions that definitely ‘happened before’ the transaction doing the calculation.
To illustrate visibility further, I’m going to employ an example. Assume a database with two tables. One maps a division id to an employee count. Another maps a division id to a sales count. Now assume that this is the end of the sales period and that some new sales people have been hired. So there are two transactions running to update those values (one to bump up the employee count, one to bump up the sales count). Let’s assume that there’s another transaction that’s gathering metrics, one of which is the average number of sales per employee. A possible interleaving is illustrated below:
Transaction 2 started while transaction 1 was still active, and transaction 3 began after transaction 2 started. So there is no clear ‘happens-before’ relationship here. Therefore, it’s on transaction 2′s shoulders to figure out which values are visible. Let’s just assume that transaction 2 will read the newest committed value. If it does that, it will read 400 for the sales figure and 20 for the employee count. That means that it will compute half as many sales per employee as if it’d done the calculation with all new or all old values. This number might be used to calculate bonuses and so it could be very important to get it right (or at least consistent). Therefore, just reading the most recent committed values is insufficient to achieve read consistency. Note too that if transaction 2 executed a little slower and read the sales figure after transaction 3 committed, it would get a different result.
In general, consistency can’t be dependent upon vagaries of transaction execution interleaving, which raises the question of what should the visibility be? The visibility calculation needs to be consistent given the same partial ordering of transactions. That is, as long as transaction 1 and 3 are concurrent with transaction 2, it must make the same visibility calculations. In general, the only versions that should be visible are those that were either written by the transaction itself, or by transactions that it’s deemed ‘happened before’. This basically means transactions that committed before our transaction began. All other transactions are suspect.
In the example above, that means that transaction 2, for consistent reading, should only read version 1 of records 100 and 71. If transaction 2 is run again later, it will read version 2 of both records. A good rule of thumb is that a transaction is reading consistently if at the end (the commit call) redoing the visibility calculation will result in the same record versions for all read records as when it was done for the initial read. Of course, the example above is purely illustrative. Transaction 1 and 3 probably should be the same transaction (atomicity violation). But the main point was to illustrate that visibility isn’t just reading the latest committed version.
Newly inserted records:
There is an additional subtlety. What if a record has only one version? That is, it was just installed by another transaction. Should that be visible? Again, there is some dependence upon the user’s desired isolation level, but if a row was inserted AFTER a transaction started, that row could rightly be considered ‘invisible’ and therefore won’t be read at all.
I hope that’s cleared up visibility somewhat. Stay tuned loyal readers, as part 4 is forthcoming, in which I will describe distributed MVCC with particular emphasis on NuoDB.