Leng Bengco wrote:
Currently, I am enrolled in a database class. One of the requirements is to report to the class on a particular RDBMS. My groupmates and I chose Firebird. Can we seek your help on how "careful write" is performed in Firebird?
Ann W. Harrison answers:
The simple answer is "by writing pages in the correct order", but that probably doesn't help. The underlying rule for careful write is that you must write the page pointed at before you write the page that points at it.
Firebird uses careful write to keep the database on disk correct at all times. Assuming that the disk subsystem doesn't lie about the order in which it writes pages and there are no bugs, you can crash a Firebird server at any point and the database will restart without corruption. Some space may be unusable for reasons we'll get into, but the database will be correct and will include all committed changes made before the crash.
Here's an example of careful write in action. When Firebird creates a data page, it calls a routine called fake_page (I think) to get a page-sized buffer, which Firebird formats as a data page (DPG) and then fills with new record versions. All of this is in memory and uncommitted.
To put the new page on disk, Firebird finds a free page in the database from the active page inventory page (PIP). Then it must change the state of the page on the PIP, write the data page, and then write the page number on a pointer page (PPG) for the table, making it known as part of the table.
The order of page writes is PIP first, so the page is marked as being in use and can't be allocated by some other thread, then DPG, then PPG. If there were index entries for the newly created records on the page, they are written next on index pages (IDX).
All pages must be on disk before the changes are committed.
If there is a crash before the PIP is written, nothing has changed. If the crash comes between writing the PIP and writing the DPG and PPG, then that page becomes unavailable until someone runs gfix, but everything else is OK. If there is a crash after writing the DPG but before the PPG, the situation is the same - the DPG is allocated but not used. All the records on the page belong to transactions that were rolled back, so there's no data loss.
If there's a crash after writing the PPG but before writing the IDX pages, the page is part of the table, but all records on it belong to a rolled back transaction, so they will be garbage collected eventually.
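The crash cases above can be tabulated with a small sketch. This is only a toy model of the reasoning, not Firebird code: the names WRITE_ORDER and state_after_crash are made up for illustration, and the last case (all four pages written but no commit) is an extrapolation from the PPG case rather than something stated above.

```python
# Hypothetical model: the order in which the four pages reach disk.
WRITE_ORDER = ["PIP", "DPG", "PPG", "IDX"]


def state_after_crash(pages_written):
    """Describe the on-disk state if the server crashes after the first
    `pages_written` writes in WRITE_ORDER completed and before the commit."""
    on_disk = set(WRITE_ORDER[:pages_written])
    if "PIP" not in on_disk:
        return "nothing changed"
    if "PPG" not in on_disk:
        # The page is marked in use on the PIP, but no table points at it.
        return "page allocated but unused until gfix reclaims it"
    # The table points at the page, but the transaction never committed.
    return "page in table; its records are rolled back and garbage collected"


for n in range(len(WRITE_ORDER) + 1):
    print(n, "writes completed:", state_after_crash(n))
```

In every row the database is still correct; the worst outcome is one page that stays unavailable until gfix is run.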
Consider the case of an index page split. Actually, for that, check the Firebird for Database Experts articles at IBPhoenix. They've got pretty colored pictures of index splits and the writes they cause.
All the ordering of page writes is controlled by a dependency graph - a structure that maintains the order of dependencies among unwritten pages. In the case we just looked at, the IDX pages depend on the PPG, which depends on the DPG, which depends on the PIP. If some other transaction makes a change to one of those IDX pages and commits, it will force the write of the IDX, which can't happen until the PPG is written, which can't happen until the DPG is written, which can't happen until the PIP is written. So asking for a write of an IDX causes these writes in this order: PIP, DPG, PPG, IDX.
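As a rough illustration of that chain, here is a minimal sketch of a cache that writes a page only after the dirty pages it depends on. The class PageCache and its methods are hypothetical names, not Firebird's internals, and the sketch assumes the graph has no loops.

```python
class PageCache:
    """Toy model of dirty pages with write-order dependencies (not Firebird code)."""

    def __init__(self):
        self.dirty = set()
        self.must_precede = {}  # page -> pages that must reach disk first
        self.disk_log = []      # order in which pages were "written"

    def mark_dirty(self, page, after=()):
        """Record that `page` is dirty and may be written only after `after`."""
        self.dirty.add(page)
        self.must_precede.setdefault(page, set()).update(after)

    def write(self, page):
        """Write `page`, first writing every dirty page it depends on."""
        if page not in self.dirty:
            return  # already on disk, nothing to do
        for dep in sorted(self.must_precede[page]):
            self.write(dep)
        self.disk_log.append(page)
        self.dirty.discard(page)
        self.must_precede[page].clear()


cache = PageCache()
cache.mark_dirty("PIP")
cache.mark_dirty("DPG", after=["PIP"])
cache.mark_dirty("PPG", after=["DPG"])
cache.mark_dirty("IDX", after=["PPG"])

cache.write("IDX")     # another transaction commits a change to the IDX page
print(cache.disk_log)  # ['PIP', 'DPG', 'PPG', 'IDX']
```

Asking for the IDX page alone is enough to pull the other three out ahead of it, which is the behavior described above.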
So each page has its place in the dependency graph and will cause the pages it depends on to be written before it. That graph also shows potential loops - page A must be written before page B which must be written before page C which must be written before page A, resulting in an irredeemable mess.
When the dependency graph shows that the next entry will cause a loop, Firebird forces out enough pages to break the loop before entering the new dependency. Those writes are necessary only to make careful write possible. Firebird avoids the cost of writing and reading a recovery log at the expense of sometimes writing database pages that could be deferred in other schemes.
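One way to picture the loop check is the sketch below. It is an assumption-laden toy, not Firebird's actual algorithm: the class PrecedenceGraph and its methods are invented for illustration, the model tracks write order only (not page contents), and the choice of which pages to force out is just one plausible strategy.

```python
class PrecedenceGraph:
    """Toy write-order graph that refuses to record a loop (not Firebird code)."""

    def __init__(self):
        self.before = {}    # dirty page -> pages that must be written first
        self.disk_log = []  # order in which pages were "written"

    def depends_on(self, page, target):
        """True if `target` must, directly or transitively, precede `page`."""
        stack, seen = [page], set()
        while stack:
            current = stack.pop()
            for dep in self.before.get(current, ()):
                if dep == target:
                    return True
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return False

    def write(self, page):
        """Write `page` after the dirty pages it depends on."""
        if page not in self.before:
            return  # clean page
        for dep in sorted(self.before.pop(page)):
            self.write(dep)
        self.disk_log.append(page)

    def add_precedence(self, first, then):
        """Require that `first` reaches disk before `then`."""
        self.before.setdefault(first, set())
        self.before.setdefault(then, set())
        if self.depends_on(first, then):
            # Recording this edge would close a loop, so force `first`
            # (and everything ahead of it) to disk instead.
            self.write(first)
            self.before.setdefault(then, set())  # `then` is being changed again
            return
        self.before[then].add(first)


graph = PrecedenceGraph()
graph.add_precedence("A", "B")  # A before B
graph.add_precedence("B", "C")  # B before C
graph.add_precedence("C", "A")  # would close the loop A -> B -> C -> A
print(graph.disk_log)           # ['A', 'B', 'C']
```

The three writes in the last step are the extra cost mentioned above: they happen only because the new dependency could not be recorded safely.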