by Paul Beach

If you have a 24x7 database environment and you have noticed that it quickly starts to slow down, and performance begins to suffer, and the only cure thats seems to work is a backup and restore, and running an on-line sweep doesn't help. Whats happening and whats the possible cause?

First of all - run gstat -h database.gdb and collect some statistics:

Oldest transaction 127641
Oldest active 257007
Oldest snapshot 257007
Next transaction 257008

Now you run a sweep and then run isql.exe and commit at least one transaction, this should move the transaction markers on after the sweep, if it still hasn't moved on:

Oldest transaction 127641
Oldest active 257007
Oldest snapshot 257007
Next transaction 257008

Only after a backup and restore, does the problem resolve itself:

Oldest transaction 1
Oldest active 2
Oldest snapshot 1
Next transaction 17

Whats Happening?

OK lets try and work out whats happening.... First let us define a transaction, what it's possible states are, the life cycle of a transaction, what exactly is the Oldest Interesting Transaction and what is the Oldest Active or Snapshot Transaction, and then how they are set and moved along.

Definition

A transaction is an atomic unit of work made up of one or more operations against the data in one or more databases. It can contain one or many operations that might INSERT, UPDATE, DELETE, or SELECT data. Or it might be work that changes the physical structure of the database itself. The scope of the transaction is defined by the user/programmer when they START a transaction and then end it with a COMMIT or ROLLBACK.

Possible States

A transaction can have one of four states; active, committed, rolled back, or limbo. The life cycle of a transaction is first active, set that way by the execution of a isc_start_transaction() or an isc_start_multiple call(). Then the transaction can be either committed by a isc_commit_transaction() or an isc_commit_retaining() call, or rolled back by an isc_rollback_transaction() call. If the commit is happening for a transaction across multiple databases then the two-phase commit protocol is invoked. This first phase sets the transaction to limbo in each of the databases then the second phase races around the network to just switch the transaction bit to committed. If it fails anywhere in the two phases then the transaction is considered in limbo and the transaction bit is left set at the limbo state.

Definition of OIT and OST or OAT

The Oldest Interesting Transaction (OIT) is the first transaction in a state other than committed in the database's Transaction Inventory Pages (TIP). The TIP is a set of pages that log each transaction's information (transaction number and current state) in the database since the last time the database was created or last backed up and restored. The Oldest Active Transaction (OAT) is the first transaction marked as active in the TIP pages. The Oldest Snapshot Transaction (OST) is the oldest transaction that started in Snapshot mode. The Oldest Active Transaction is likely to be the Oldest Active Transaction, but it is not guarenteed. The way to find out the values of the OIT, OST and OAT is to run gstat -h locally against the database in question.

Movement of OIT and OAT

We have to refine the life cycle a bit first. To create a transaction the start transaction call will first read the header page of the database, pull off the Next Transaction number, increment it, and write the header page back to the database. It also reads the OIT value from the header page and starts reading the TIP pages from that transaction number forward up to the OAT. If the OIT is now marked as committed, then the process continues checking the transactions until it comes to the first transaction in a state other than committed and records that in the process's own transaction header block. The process then starts from the OIT and reads forward until it finds the first active transaction and records that in it's transaction header block also.

If and only if the process starts another transaction, will the information from the process's transaction header block update the information on the header page when it is read to get the next transaction number. Of course if another process has already updated the header page with newer numbers, i.e. larger, then the information will not be written.

There are only two non-committed and non-active transaction states; limbo and rolled back. The only way to change a limbo transaction to committed is for the user to run gfix on the database to resolve the limbo transaction by rolling back or committing it. The only way to change a rolled back transaction to committed is to sweep the database. The sweep can be executed by:

  1. Somebody running a gfix -s process
  2. Programmatically attaching to the database with a database parameter block set to cause a sweep
  3. Have the automatic sweep kicked off The automatic sweep interval is set by default to be 20,000

It can be changed by using the gfix -h N command to set the interval to N. If N is zero then the automatic sweep is completely turned off and the user will have to use options 1 or 2 from above to sweep the database.

Note

The automatic sweep is kicked off if the difference between OST (Oldest Snapshot Transaction) and OIT is greater than the sweep interval defined. The user's process that tried to start the transaction that exceeds the sweep interval by one will sweep the entire database before actually starting the transaction they requested.

As you can see, if you ever rollback a transaction, have an active transaction abnormally terminate, or always use processes that use only one transaction and then exit, then you will have to sweep the database to update the OIT OST, and OAT values. Of course, sweeping the database also provides the added benefit of removing any deleted records from the database. Except in this particular case the Oldest Interesting Transaction (determined by the Oldest Transaction) hasn't moved along after the sweep has finished - or so it seems by the data from the Database Header Page.... shown above.

The sweep can either be kicked off manually or automatically. By default, when the OIT is 20,001 transactions less then the Oldest Snapshot Transaction number, the process that tried to start the transaction will sweep the entire database and remove as many back difference records (versions) thats are no longer required as it possibly can. While this is happening, other users can continue to use the database. This threshold can be changed.

If you are going to start the sweep manually then it is advised that you first make sure there is no one connected to the database. This will not only clean out the back difference records and clean out the erased records, but also update the OIT number on the header page to be one less than the Oldest Snapshot and also the Oldest Active Transaction number when the next transaction starts after the sweep. It can do this because there are no other active transactions that might need to see any of the back difference records.

The Solution

So whats the problem, and what is the solution? In this case it was a transaction which was left permanently open, i.e. was never committed. This prevents the sweep from doing its job, in that the open transaction subsequently becomes the Oldest Active, and Oldest Interesting transaction. As such no garbage collection ever takes place, except when a backup and restore takes place. By making sure that you always commit transactions when you don't need them, and don't rollback transactions unecessarily, Firebird and InterBase can quite happily cope with 24x7 operations. Another possible cause of this, could be a failed two phase commit transaction that has been left in limbo, this is more unlikely than the above possible cause but can be checked and simply corrected by running the relevant gfix command(s).

This paper was written by Paul Beach originally in August 2000, updated May 2010 and is copyright Paul Beach and IBPhoenix. You may republish it verbatim, including this notation . You may update, correct, or expand the material, provided that you include a notation that the original work was produced by Paul Beach and IBPhoenix.

Like this post? Share on: TwitterFacebookEmail


Related Articles


Author

Paul Beach

Reading Time

~6 min read

Published

Category

Articles

Tags