How the Validation Tools Work
This document explains the workings of the validation and repair tools included in gfix, the command-line utility toolbox that comes with Firebird and InterBase. It draws on an original article written many years ago by two Borland InterBase developers, Deej Bredenberg and David Schnepper, that was included in the open-sourced module val.c (which became validation.cpp from Firebird 2 onwards).
- How the Validation Tools Work
- Operating Conditions for Validation
- Validation Phases
Operating Conditions for Validation
Because it is essential that no structures in the database be modified during validation by any process other than validation itself, validation will not run unless and until it has exclusive access. On attach, validate attempts to obtain an exclusive lock on the database file. If it cannot get the lock, because other local or remote attachments exist, it gives up, returning the lock_timeout message (isc 335544510):
Lock time-out on wait transaction -- Object "database_filename.fdb" is in use
If other processes or servers are attached to the database, validate waits only one second for the exclusive lock. If the lock has not been granted by then, it abandons the attempt.
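The attach-time behaviour described above can be modelled as a brief timed wait for an exclusive lock. This is only a sketch of the idea using Python's `threading.Lock`; the function name and return values are illustrative, not Firebird's actual API:

```python
import threading

LOCK_TIMEOUT_MSG = ('Lock time-out on wait transaction -- '
                    'Object "database_filename.fdb" is in use')

def try_exclusive_attach(db_lock: threading.Lock, wait_seconds: float = 1.0):
    """Model of validate's attach: wait briefly for exclusive access,
    then abandon the attempt if other attachments hold the database."""
    if db_lock.acquire(timeout=wait_seconds):
        return "validating"          # exclusive access obtained
    return LOCK_TIMEOUT_MSG          # another attachment is active
```

If no other attachment holds the lock, validation proceeds; otherwise the lock-timeout message is returned after the one-second wait.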
Normally, when a process gains exclusive access to a database, all active transactions are marked "dead" on the Transaction Inventory Pages. For validation, this feature is turned off because validate needs to encounter the exact state of transactions.
- Terminology
- The gfix Validation Switches
- Page Types in a Firebird Database
Terminology
In this article, we are using some terms in specific ways. The following list is intended to help clarify what those terms refer to.
- Delta record
- A reduced record version which represents an update to the record.
- DPB
- Database Parameter Block (or Buffer), the structure that applications pass to the API when requesting an attachment.
- PIP
- Page Inventory Page, one of the set of database pages where the page allocation bitmap is stored.
- Record fragment
- The smallest piece of a record that the engine can recognise as being part of a record and that, when linked with other record fragments, makes up a distinct, single record version.
- Record version
- A distinct version of a record. In Firebird's multi-generational architecture, many versions of the same record may exist concurrently, each created when a particular transaction successfully requests an INSERT, UPDATE or DELETE. (Yes, even a DELETE operation creates a "new" record version, known as a delete stub).
- Record chain
- A set of record versions "chained together" as a linked list and managed as a single, logical representation of the record.
- Slot, page slot
- Each data page stores the offsets to the record versions stored on it, in a variable-length array. The slot is an index into that array; effectively, it is like a line number for the record version on the page on which it is physically written. Not all versions of a record necessarily live on the same data page, nor even on contiguous pages.
- TIP
- Transaction Inventory Page, a database page where information is stored about all transactions.
The gfix Validation Switches
These are the switches that can be passed on a gfix call when validating or attempting to repair a database:
|-validate||Invokes validation and repair. All other switches modify this switch. When used alone, validates page structures but does not walk all record versions. isc_dpb_verify|
|-full, -f||Visits the entire database at record level. Without this switch, only page structures will be validated, although some limited checking of records does occur when -validate is used alone. isc_dpb_records|
|-mend, -m||Attempts, by disabling corrupted structures, to get the database into a state where it can be read. Since the corrupted structures might involve old record versions, uncommitted new record versions or "latest committed" versions, -m[end] has the potential to destroy data permanently. isc_dpb_repair|
|-no_update, -n||Specifies "do-nothing" behaviour for orphan pages and for allocated pages found to be free. Ostensibly, the effect is to not release orphan pages and to not mark the free pages as "in use". However, when specified along with -m[end] it is a no-op, since -m[end] will update the database regardless. Use it with other switches to prevent updates that might otherwise occur. isc_dpb_no_update|
|-ignore, -i||Tells the engine to ignore checksums in fetching pages. For databases with an on-disk structure (ODS) lower than 12 it is ineffectual: Firebird and most versions of InterBase never maintained checksums on data pages. Nevertheless, -validate will report them, regardless of this switch. It's probably a good idea to use it always, if just to avoid having artifacts of deprecated checksum code interfere with validation. isc_dpb_ignore|
Databases of ODS 12 and higher will use the checksum structure to store page numbers.
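The switches listed above each map to an isc_dpb_* item that gfix folds into the DPB it sends to the engine. The sketch below shows the general idea of combining such switches into a single bitmask; the numeric bit values here are purely illustrative, not Firebird's real constants:

```python
# Illustrative bit values -- NOT Firebird's actual isc_dpb_* constants.
ISC_DPB_RECORDS   = 0x01   # -full
ISC_DPB_REPAIR    = 0x02   # -mend
ISC_DPB_NO_UPDATE = 0x04   # -no_update
ISC_DPB_IGNORE    = 0x08   # -ignore

SWITCH_FLAGS = {
    "-full": ISC_DPB_RECORDS,
    "-mend": ISC_DPB_REPAIR,
    "-no_update": ISC_DPB_NO_UPDATE,
    "-ignore": ISC_DPB_IGNORE,
}

def build_verify_mask(switches):
    """Combine gfix validation switches into a single verify bitmask.
    -validate alone contributes no modifier bits."""
    mask = 0
    for switch in switches:
        mask |= SWITCH_FLAGS.get(switch, 0)
    return mask
```

For example, `gfix -validate -full -ignore` would, under these assumed values, produce the mask `0x09`.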
Page Types in a Firebird Database
References to page types in validation error messages provide a number code that does not tell you a lot about what page type it is. The following are the numeric page type codes with their corresponding page types.
|pag_undefined||0||Undefined page type (purposely)|
|pag_header||1||Database header page|
|pag_pages||2||Page inventory page (PIP)|
|pag_transactions||3||Transaction inventory page (TIP)|
|pag_pointer||4||Pointer page|
|pag_data||5||Data page|
|pag_root||6||Index root page|
|pag_index||7||Index (B-tree) page|
|pag_blob||8||Blob data page|
|pag_ids||9||Generator page|
|pag_log||10||Write ahead log page (not used in Firebird)|
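A small lookup table makes the numeric codes in validation messages easier to read. This is just a convenience sketch built from the table above:

```python
# Page type codes as they appear in validation error messages.
PAGE_TYPES = {
    0: "Undefined page",
    1: "Database header page",
    2: "Page inventory page (PIP)",
    3: "Transaction inventory page (TIP)",
    4: "Pointer page",
    5: "Data page",
    6: "Index root page",
    7: "Index (B-tree) page",
    8: "Blob data page",
    9: "Generator page",
    10: "Write-ahead log page (not used in Firebird)",
}

def decode_page_type(code: int) -> str:
    """Translate the numeric code found in a message such as
    'Page xxx wrong type (expected 5 encountered 0)'."""
    return PAGE_TYPES.get(code, "Unknown page type %d" % code)
```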
Validation Phases
Validation is performed on all pages, in two distinct phases: walk-through and garbage collection.
First Phase of Validation: Walk-through
The first phase of validation is a walk through the entire database, during which the page numbers of all pages visited are stored in a bitmap for later use during the garbage collection phase. Each page that is fetched goes through a basic validation.
The page validation steps are:
Page type check The page is checked against its expected type. If the page header is the wrong type, the message returned is:
"Page xxx wrong type (expected xxx encountered xxx)"
This could indicate one of the following:
- the database has been overwritten,
- some kind of unexpected edge case in the page allocation mechanisms, whereby one page was written over another, or
- a page that was allocated but never written to disk. This is the most likely interpretation if the encountered page type was 0.
Checksum If -ignore is specified, the checksum is checked by the validate process itself, instead of by the engine. Anything that results in the checksum being "wrong" causes this error to be returned:
Checksum error on page xxx
For Firebird databases with an ODS lower than 12, it is quite academic, though, since all databases created by Firebird prior to V.3.0 have the same, static checksum (12345) from the moment of creation, forever. (V.3.0 creates databases with ODS 12, from which point the checksum structure will store page numbers.) A checksum error is harmless when validate finds it and does not stop the page validation.
If the validate call includes -mend, the checksum error does cause the page to be marked for write so that, when the page is written to disk at the end of validation, the checksum (12345) will be rewritten automatically.
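The checksum behaviour above can be summarised in a few lines: pre-ODS-12 pages always carry the static value 12345, while ODS 12 and higher reuse the field to hold the page number. This is a sketch of that logic, not the engine's actual code:

```python
STATIC_CHECKSUM = 12345  # the only value pre-ODS-12 Firebird ever writes

def check_page_checksum(stored_checksum: int, ods: int, page_number: int):
    """Return the validation message for a page whose checksum field is
    wrong, or None if it is acceptable.  For ODS >= 12 the field is
    expected to hold the page number instead of a checksum."""
    expected = page_number if ods >= 12 else STATIC_CHECKSUM
    if stored_checksum != expected:
        return "Checksum error on page %d" % page_number
    return None
```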
Revisit Each page fetched is checked against the page bitmap to make sure it has not been visited already. If it has, the error returned is:
Page xxx doubly allocated
This way, validate should catch the case where a page of the same type has been allocated for two different purposes.
The Revisit mechanism does not check data pages, since they are frequently revisited anyway, when record chains and fragments are being walked with validate -full.
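The walk-through bookkeeping described above amounts to recording visited page numbers and flagging a second visit to any non-data page. A minimal sketch, using a Python set in place of the engine's bitmap:

```python
def walk_pages(page_visits, data_pages=frozenset()):
    """Visit pages in order, recording each in a 'bitmap' (a set here).
    A non-data page seen twice is reported as doubly allocated; data
    pages are exempt because record-level walking revisits them."""
    visited, errors = set(), []
    for page in page_visits:
        if page in visited and page not in data_pages:
            errors.append("Page %d doubly allocated" % page)
        visited.add(page)
    return visited, errors
```

The visited set produced here is exactly what the garbage collection phase compares against the PIPs.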
Second Phase of Validation: Garbage Collection
During this phase, the Page Inventory (PIP) pages are checked against the bitmap of pages visited. Two types of errors can be detected during this phase: orphan pages and improperly freed pages.
Orphan pages If any pages in the page inventory were not visited during validation, the following error will be returned:
Page xxx is an orphan
The page will be marked as free on the PIP, unless -no_update was specified.
Improperly Freed Pages Any pages marked free in the page inventory that are found to be actually in use during validation will cause an error similar to:
Page nnn [is] in use but marked free
On the PIP, the page will be marked "in use" unless -no_update was specified.
If errors were found during the walk-through phase, the assumption made is that invalid structures were detected and that therefore not all pages had the opportunity to be visited. In this case, no changes will be made to the PIP pages.
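The whole garbage collection phase reduces to two set comparisons between the PIP and the visited bitmap. A sketch, under the assumption that both are represented as sets of page numbers:

```python
def garbage_collect_check(pip_free_pages, pip_used_pages, visited, had_errors):
    """Compare the PIP against the bitmap of visited pages.  Orphans are
    allocated pages never visited; improperly freed pages were visited
    but are marked free.  After walk-through errors, only report --
    never update the PIP."""
    messages = []
    for page in sorted(pip_used_pages - visited):
        messages.append("Page %d is an orphan" % page)
    for page in sorted(pip_free_pages & visited):
        messages.append("Page %d in use but marked free" % page)
    update_pip = not had_errors     # leave the PIP alone after errors
    return messages, update_pip
```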
The Walk-through Phase in Detail
This section describes each of the tasks that are conducted as validate walks through the pages of the database. Where appropriate, explanations are given for messages returned by validate.
If any corruption of a record fragment is seen during validation, the record header is marked as "damaged" but it does not cause an error to be returned. In subsequent visits, records (but not BLOBs) marked as "damaged" will still be retrieved by the engine and the following error message will appear:
Record xxx is marked as damaged
Small BLOBs (level 0) and records that are within the page size are visited only if -full is set. Without the -full switch, validate visits BLOBs of level 1 and higher and records whose size exceeds the page size.
Once a record is marked as "damaged" it is not validated again. Unless a full validation is done at some point, this error message would never be seen; once the full validation is done, the message will be returned thereafter, even if -full is not specified.
For no known reason, BLOBs are always validated, even if they are marked as "damaged".
The validate process tries to ensure that all pages are fetched during validation, if possible. It starts with the base pages, viz.
Database header page
If this one cannot be validated, all bets are probably off.
In some older versions, running out of disk space could cause the engine to start overwriting the database file from the first page, with dire consequences.
Log pages for after-image journalling
Not interesting for Firebird: write-ahead logging was disabled.
Page Inventory pages
Validate seriously needs the PIPs to be intact.
Transaction Inventory pages If the system relation RDB$PAGES could not be read or it did not contain any TIP pages, you would see the message:
Transaction inventory pages lost
If a particular page is missing from the sequence as established by RDB$PAGE_SEQUENCE, then the following message will be returned:
Transaction inventory page lost, sequence xxx
If -mend was specified, then a new TIP will be allocated on disk and stored in RDB$PAGES in the proper sequence. All transactions that would have been on that page are assumed committed. If a TIP page does not point to the next one in sequence, the following message will be returned:
Transaction inventory pages confused, sequence xxx
Generator pages as identified in RDB$PAGES If these are unaccounted for then, Houston, we have a problem.
Relation (Table) Walking
All the relations in the database are walked. For each relation, all indices defined on the relation are fetched, and all pointer and data pages associated with the relation are fetched.
Scan the metadata from RDB$RELATIONS to fetch the format of the relation
If this information is missing or corrupted the relation cannot be walked. If any bugchecks are encountered from the scan, the following message is returned:
Bugcheck during scan of table xxx (table_name)
This will prevent any further validation of the relation.
For views, the metadata is scanned but nothing further is done.
All the pointer pages for the relation are walked. As they are walked, all child data pages are walked.
Lost pointer page If a pointer page cannot be found, the following message is returned:
Pointer page (sequence xxx) lost
Pointer page does not fit with relation If the pointer page is not part of the relation we expected or if it is not marked as being in the proper sequence, the following message is returned:
Pointer page xxx is inconsistent
Pointer page is out of sequence If each pointer page does not point to the next pointer page as stored in the RDB$PAGE_SEQUENCE field in RDB$PAGES, the following error is returned:
Pointer page (sequence xxx) inconsistent
Each data page referenced by the pointer page is fetched. Here is where page-level corruption is determined. Both of the following conditions will cause a data page to be treated as corrupt:
- The data page is not marked as part of the current relation.
- The data page is not marked as being in the proper sequence.
If either of these conditions occurs, the following error is returned:
Data page xxx (sequence xxx) is confused
Any page found to be corrupt at the page level, with -mend specified, is deleted from its pointer page, causing the whole page of data to be lost.
Each of the slots on the data page is examined, up to the count of records stored on the page.
Retrieve record fragment from non-zero slot If the slot is non-zero and within the bounds of the slots array, the record fragment at the specified offset is retrieved.
Lose record fragment that is out of bounds If the record begins before the end of the slots array, or continues off the end of the page, the following error is returned:
Data page xxx (sequence xxx), line xxx is bad
The term "line" in this message means the slot number. If this condition is encountered, the data page is considered corrupt at the page level. If -mend was specified, it will be removed from its pointer page and thus result in the loss of any data on that page.
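The bounds check on a slot's record fragment can be sketched as follows. Offsets, lengths and the end of the slots array are assumed to be byte positions within the page; the function name and parameters are illustrative:

```python
def check_slot(offset, length, slots_end, page_size, page, sequence, slot):
    """A record fragment must start at or after the end of the slots
    array and must not run past the end of the page.  A zero offset
    means the slot is empty."""
    if offset == 0:
        return None                       # empty slot: nothing stored here
    if offset < slots_end or offset + length > page_size:
        return ("Data page %d (sequence %d), line %d is bad"
                % (page, sequence, slot))
    return None
```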
The record at each slot is examined for basic validation, whether or not -full was specified. The fragment could be any of the following:
Back version If the fragment is marked as a back version, then it is skipped. It will be fetched as part of its record.
Bad transaction If the record is marked with a transaction id greater than the last transaction started in the database and either -full is specified, or the record is larger than the page size or the data is a BLOB of level 1 or higher, the following error is returned:
Record xxx has bad transaction xxx
Damaged If the fragment is marked damaged already from a previous visit or a previous validation, the following error is returned:
Record xxx is marked as damaged
where xxx is the record number.
Corrupt If the fragment is determined to be corrupt for any reason, and -mend was specified, then the record header is marked as damaged.
If -full is specified, and the fragment encountered is the first fragment in a logical record, then the record at this slot number is fully retrieved. This involves retrieving all versions, and all fragments of each particular version. In other words, the entire logical record will be retrieved.
Back versions If there are any back versions, they are visited at this point. If the back version is on another page, the page is fetched but not validated, since the other page will be walked separately. The message:
Chain for record xxx is broken
- is returned when any of the following is encountered:
- the slot number of the back version is greater than the maximum number of records on the page, or
- there is no record stored at that slot number, or
- it is a BLOB record, or
- it is a record fragment, or
- the fragment itself is invalid
Marked "incomplete" If the record header is marked as incomplete, it means that there are additional fragments to be fetched: this occurs whenever a record was too large to be stored in one slot. In this circumstance, a pointer to the next fragment in the list is stored in the record. For fragmented records, all fragments are fetched to form a full record version. If any of the fragments is not in a valid position, or is not the correct length, the following error is returned:
Fragmented record xxx is corrupt
Once the full record has been retrieved, the length of the format is checked against the expected format stored in RDB$FORMATS.
The format number is stored with the record, representing the exact format of the relation at the time the record was stored.
If the length of the reconstructed record does not match the expected format length, the following error is returned:
Record xxx is wrong length
This check is not made for delta records.
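The fragment-chain walk and the subsequent length check can be sketched together. Here the chain is modelled as a dict mapping a fragment id to its data and the id of the next fragment; that representation and the function name are assumptions for illustration:

```python
def reassemble_record(fragments, first, expected_length, record_number):
    """Follow the fragment chain to rebuild the full record, then check
    its length against the expected format length.  A missing fragment
    or a cycle makes the record corrupt."""
    data, frag_id, seen = b"", first, set()
    while frag_id is not None:
        if frag_id not in fragments or frag_id in seen:
            return "Fragmented record %d is corrupt" % record_number
        seen.add(frag_id)
        piece, frag_id = fragments[frag_id]
        data += piece
    if len(data) != expected_length:
        return "Record %d is wrong length" % record_number
    return None
```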
If the slot on the data page points to a BLOB record, then the BLOB is fetched (even without -full).
BLOB fetching has several cases, corresponding to the various BLOB levels:
For each blob page found, some further validation is done:
Invalid backward pointer If the page does not point back to the lead page, the following error is returned (where xxx corresponds to the BLOB record number):
Warning: blob xxx appears inconsistent
Bad sequence If any of the blob pages are not marked in the sequence we expect them to be in, the following error is returned:
Blob xxx [is] corrupt
The message for the same error in level 2 includes the verb "is", while for level 3 blobs it is omitted.
Missing pieces If any of the BLOB pages in the sequence are missing, the following error is returned:
Blob xxx is truncated
If the fetched BLOB is determined to be corrupt for any of the reasons described, and -mend was specified, then the BLOB record is marked as "damaged".
BLOB records marked as "damaged" cannot be opened. This means that they will not be deleted from disk.
During backup, the damaged structures will be fetched and cause the backup to stop with a Blob not found error unless the backup is run with the -ignore switch. With the -ignore switch, gbak adds isc_dpb_damaged to its DPB, which tells the engine to return an empty BLOB in place of the damaged one and not to raise the exception.
BLOB Levels for Validation
The following BLOB levels apply.
|0||These are just records on a page, and no further validation is done.|
|1||All the pages pointed to by the blob record are fetched and validated in sequence.|
|2||All pages pointed to by the blob pointer pages are fetched and validated.|
|3||The blob page is itself a blob pointer page; all its children are fetched and validated.|
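The level-dependent walk above is naturally recursive: each level above 1 adds one layer of pointer pages. A sketch, modelling the page graph as a dict from a pointer page to the pages it references:

```python
def walk_blob(level, page, children):
    """Recursively collect the data pages of a blob.  Level 0 blobs live
    on the data page itself; level 1 points directly at data pages;
    levels 2 and 3 each add a layer of blob pointer pages."""
    if level == 0:
        return [page]                     # record on page; nothing to walk
    if level == 1:
        return list(children.get(page, []))
    pages = []
    for child in children.get(page, []):  # pointer pages one level down
        pages.extend(walk_blob(level - 1, child, children))
    return pages
```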
Index Walking
If the index root page is missing, validate reports Missing index root page and the indices are not walked. Otherwise, the index root page is fetched and all indices on the page are fetched. For each index, the btree pages are fetched top-down, left to right.
Basic validation On non-leaf pages, basic validation is done to verify that each node on the page points to another index page. If -full validation was specified, the lower level page is fetched to verify that its starting index entry is consistent with the parent entry.
On leaf pages, the records pointed to by the index pages are not fetched. Instead, the keys are examined to verify that they are in the correct ascending order.
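The leaf-page check reduces to verifying that adjacent keys are in ascending order, without touching the records themselves. A minimal sketch:

```python
def leaf_keys_ascending(keys):
    """True if the keys on an index leaf page are in ascending order.
    Duplicates are permitted; only a descending pair is a violation."""
    return all(a <= b for a, b in zip(keys, keys[1:]))
```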
Dissociated index page If a visited page is not part of the specified relation and index, the following error is returned:
Index xxx is corrupt at page nnn
Orphan child page If there are orphan child pages, i.e. a child page does not yet have its entry in the parent page, although the child's left sibling page has its btr_sibling updated, the following error is returned:
Index xxx has orphan child page at page xxx
Unexpected node count If the page does not contain the number of nodes expected from its marked length, the following error is returned:
Index xxx is corrupt on page xxx
Missing index entries During the walk of the leaf pages, a bitmap is kept of all record numbers seen in the index. At the conclusion of the index walk, this bitmap is compared to the bitmap, calculated during the data page/Record Validation phase, of all records in the relation. If the bitmaps match, it indicates that the index is good.
If the bitmaps are not equal, it indicates a corrupt index and the following error is reported:
Index %d is corrupt (missing entries)
There is no "one-to-one" check done to verify that each version of each record has a valid index entry; nor is any check performed to verify that the stored key for each item corresponds to a specific version of the record in hand.
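The missing-entries check is a straight comparison of the two bitmaps of record numbers. Sketched here with Python sets standing in for the bitmaps:

```python
def check_index_entries(index_record_numbers, relation_record_numbers, index_id):
    """Compare record numbers seen while walking the index leaves with
    those seen while walking the relation's data pages.  Equal bitmaps
    indicate a good index; anything else is reported as corrupt."""
    if set(index_record_numbers) == set(relation_record_numbers):
        return None
    return "Index %d is corrupt (missing entries)" % index_id
```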
Since Firebird 2, more checks are done on the index structures. Watch this space!
Separate counts are kept of the number of back versions seen while walking pointer pages and record chains. The counts should match. If they do not, it indicates either "orphan" back version chains or double-linked chains. The message returned is:
Relation has xxx orphan backversions (nnn in use)
Validate merely reports this condition: it takes no action of its own to try to correct it. Subsequent housekeeping should correct them, as follows:
- Clearing orphan back versions The space occupied by orphan back versions will be reclaimed by a backup/restore.
- Clearing double-linked back versions A sweep should remove double-linked back versions.