Lesson about concurrent updates

Timo Partanen wrote:

I would have a question regarding serialized updates. We have a table as follows:

CREATE TABLE MyTable ( ID INT NOT NULL, Data INT NOT NULL )
INSERT INTO MyTable VALUES ( 1, 1 )

There are two types of transactions for the MyTable table. Writer transactions and reader transactions. The writer transactions are very short and they only update values in the table as follows:

UPDATE MyTable
SET Data = Data + 1
WHERE ID = 1
RETURNING Data;

I.e. each writer increments a value by one and determines the incremented value.

The reader transactions read values from the MyTable table. They use optimistic concurrency by using the REPEATABLEREAD isolation transaction.

Application requirements:

Writers cannot block readers (and readers cannot block readers)
Each writer must update and get a unique value

How to fulfil both of these requirements? By definition, readers utilizing row versioning are not blocked by other readers and writers. So, I think R1 is already fulfilled. What about R2? Can we issue updates in a way that in case of N concurrent writers, each updates and gets a unique value, and the value in the table is incremented by N? Does my update already guarantee this if run under READCOMMITTED? And if it doesn't, how to change it and what isolation level to use?

Ann W. Harrison answers:

You've only got one record and you're updating it. In that case, you don't need a unique index - it won't do any good with only one record. You'll get update conflicts automatically. Roll back and try again. When you succeed, you will have stored the next value. If there would be more records, put a unique index on the field. When you get a duplicate key error, rollback your transaction and restart it.

Somewhere under the covers Firebird has a way of locking a table that allows one writer and n readers (called Protected access) but it's not surfaced in the SQL interface. Besides, it has almost exactly the same effect as trying, failing, rolling back, and trying again.

As a further elaboration, despite what you may have heard, repeatable read, as implemented by Firebird, is not "optimistic concurrency control". Under optimistic concurrency control, a transaction runs without checking for conflicts, but keeping a list of changes it has made - and in some implementations also records it has read. On commit, the database system compares that transactions changes with changes made by concurrent transactions and decides whether it succeeded or failed. Optimistic concurrency determines the outcome at the end of the transaction.

When a Firebird transaction - repeatable read or read committed - finds a conflict, it stops and waits for the conflicting transaction to complete. The conflict can be an update conflict, meaning that a transaction has attempted to change a record when it cannot read the most recent version, or it can be a unique constraint conflict. In either case, the transaction that sees the conflict waits until the previous update transaction finishes. If that other transaction commits, the waiter gets an error and must roll back and retry in a new transaction. Pessimistic concurrency control assumes that if there's the appearance of a problem during an operation, it's a problem that needs to be investigated.

Answer from Ann W. Harrison continues:

If you had selected the max value in the table and then stored a new row with that value plus one, you'd need the unique descending index.

Here's the reason you don't want to use generators...

no transactions: V1 is the current value in MyTable

T1: starts

T2: starts

T2: generates a next unique value, V2, updates the value in MyTable

T2: commits (new transactions see V2 after this, it is ok)

T1: reads value from MyTable, gets V1 (getting V2 would be incorrect)

T1: commits

If using a generator, T1 gets V2 if it reads the current value of the generator, and this is wrong in my case. I hope you understand what I am trying to say.

Well, another short answer is to use generators. Rather than reading the record and adding one to the stored value, get the next generator value, update the records with that, and return the value you got. T1 will continue to see V1.

That will leave holes in the sequence if any transaction rolls back, which may be a problem. Given that you're updating a single row rather than inserting new rows, you're going to get a lot of update conflicts using the generator solution and so a lot of rolled back transactions. Inserting the new value rather than updating one existing record will reduce the number of rolled back records and if you have a descending unique index, performance should be OK.

If holes in the sequence are a problem, search this list for the term "auditable" to find a solution - not simple, but a solution.

A longer answer is that transaction serializability is a hard problem, especially if you are willing to consider several different concurrency control methods. The SQL standard levels of isolation corresponds very closely to the behavior of systems that use two-phase locking for concurrency control. Two-phase locking works like this:

Don't hold any locks                  --- read uncommitted
Exclusive locks on records written    --- read committed
Shared locks on records read          --- repeatable read
"Predicate" locks                     --- serializable

One of the may cute aspects of the SQL standard is that reads in "repeatable read" isolation are not repeatable. They suffer from "phantoms" - meaning that if you look for brown bears (select species from bears where color = upcase ('brown'); several times in the same transaction, you may get more results on later queries than earlier, either because somebody stored and committed a new brown bear, or because somebody changed some bear from white to brown.

Thus "predicate" locks - i.e. locks on the "brownness" of bears. In practice predicate locking is normally accomplished by locking the access paths or by locking position in an index. The result is that any full table scan locks out writers for the whole table. And, for example, if you've got a table where some bears are black, some are brown, and some are white, with an index on color, a transaction that selects brown bears will block the insertion of grey bears, green bears, blue bears, and purple bears. Albino bears and yellow bears can still get in. As you can imagine, a fully-serializable lock based system is likely to deadlock.

So that's the locking perspective. When you get to serious levels of isolation, the effect of the locks is to order both reads and writes. If you remember, I said something earlier about ordering writes when I was blathering about optimistic concurrency control. Your particular problem requires that both reads and writes be ordered.

MVCC (multi-version concurrency control) establishes an order for writes but not reads.

Read stability is maintained by keeping multiple versions of records, each tagged with the transaction identification of the transaction that created it. When a transaction starts, it decides which other transactions were committed at that instant, and reads the newest version of each record that was created by a committed transaction. OK?

Seems pretty easy ... I see the state of the world as it was when I started. If you and I are running at the same time, we see the same thing. If we both write things that are ignored by the other, there's no problem - either of us could go first. If we both try to update or delete the same row, Firebird recognizes a fatal condition - one of us is trying to change a row when we can't see the most recent version. That rule prevents lost writes - where you make brown bears green and I make brown bears grey, accidentally eliminating all green bears. As long as we're changing the same records, our writes are ordered.

What MVCC doesn't prevent are the effects of unordered reads. Suppose that you change something that I subsequently read. I read the value before yours, so, thinking about serializing or order, I should run as if I ran before you. But suppose that I then change something that you subsequently read. You also read the version before mine, so in the ultimate ordering, you have to come before me. Which is not possible. Easy anomalies, like the one you suggest, can be avoided by adding unique constraints.

But there are harder ones. Some people take the position that MVCC just doesn't work because there is no way to serialize transactions. Others use it because the view it provides of data is actually stable, and the anomalies it produces are less disruptive than any more stable lock-based concurrency. Ask me some other time about the hybrid solution that InnoDB, the MySQL transactional engine, uses. It give different results for plain SELECT and SELECT ... FOR UPDATE.

Timo: "I am wondering there should be a more optimal solution in terms of performance. Is that really the case that a writer cannot increment a value by one and get the incremented value without running under the "sophisticated transaction control" of Firebird?"

Yes, it's really true. Two writers who start at the same time will see the same state of the record - say 23. Each will add one and try to write '24'. There are two ways to avoid that if you're updating a single record. The easier is to rollback and retry if you get an update conflict. Or, do your original read as a SELECT ...FOR UPDATE WITH LOCK.

Timo: "Is there a way to avoid using transactions?"

No, and transactions aren't your problem. Your problem is that there's an inherent inconsistency between seeing a non-blocking stable view of data, and also seeing all the changes going on in the system.

Timo: "As for Firebird transactions, the "no wait" option also allows a transaction to immediately fail without waiting for a concurrent update to complete. So waiting is not necessary but it is usually the best bet if the transaction is about to rerun."

Yes, there's no reason to continue until you're quite sure that the transaction that blocked you is out of the way.

Lesson about concurrent updates

Timo Partanen wrote:

Ann W. Harrison answers:

Answer from Ann W. Harrison continues:

Related Articles

Author

Reading Time

Published

Category

Tags