Indices and Sparse Bit Maps
By Paul Beach
Firebird has a relatively sophisticated query optimiser, which draws on information from a variety of sources when choosing its data access paths. Lets look at one aspect of the query optimiser and how it can use multiple indices at the same time to improve performance for data access.
Many database optimisers when accessing a single table are able to make use of one index (or key) at a time. Firebird on the other hand is able to make use of multiple indices at the same time. Taking advantage of this particular feature can dramatically improve query performance.
The one index at a time approach is used to reduce the time that is taken to retrieve the first record, but it does not consider how long a total query may take. This approach (whether multiple records with a single value are being retrieved or records constrained by a “between” clause) reads to the index and data pages are inter mixed. After each read of an index leaf node, the relevant data pages will be retrieved before the next index leaf node is read. This permits data records to be fetched before all the index data is read. The result of this approach is that all reads to data pages are effectively done at random. The order of reading is determined by the order of their appearance in the index.
Firebird takes a different approach. First of all Firebird will retrieve all the required index information in one pass (minimising index lock conflicts). This read allows the page addresses of all desired records to be sorted on page address (sequentially from 0 – n). Where the page address for a record in the query is true, then a bit is set in a sparse bit map. The true bits have a relative position that corresponds to the page address and record offset. This approach permits Firebird to generate sequential reads against the appropriate data pages. As such no data page is ever visited twice, thus ensuring maximum efficiency of I/O.
This bit map approach doesn’t just apply to single indices, Firebird can use multiple indices at the same time. Multiple indices use multiple bit maps that can be AND’d or OR’d in memory, this will then produce a single true bitmap. Data page and record retrieval is then determined by this bitmap. This approach to complex queries can result in significant performance gains. In its efforts to make best use of all available indices and reduce physical and logical reads to the database Firebird will attempt to use all applicable indices when executing a query.
For example, if you have a table where several different fields are used to restrict the data retrieved from a query, then most databases would require that you define a single or multiple indices that includes all the fields and supports all the various access methods. With Firebird “if you are looking for a movie that was released in 1964, directed by Stanley Kubrick, and distributed by Columbia you would need an index on Year, Director, and Distributor. If you ever wanted to find all pictures distributed by Stanley Kubrick, you would also need an index on Director alone etc. With Firebird, you would define one index on Director, one on Distributor, and one on ReleaseDate and they would be used in various combinations.”* Note that Firebird looks at the first column of an index to determine if it can be used, so if you used conventional methods of indexing where columns are placed into an index in the order that they will be accessed. It’s likely that you will create unnecessary overhead, as redundant indices are then used in the bit map merge. It is therefore recommended that you do not use indices that start with the same column in Firebird. In other words, single field indices tend to be better than compound indices.
But a database designer still needs to design indices to support how the data will be accessed (for the WHERE clause of the relevant SQL statements), adding more indexes simply in the hope they might prove useful, will actually add to the overhead of bitmap processing. Equally indices with a high degree of selectivity (fewer records for each index value) will be more useful. If selectivity is low then the bitmap merge will be flooded with bitmaps that refer to a large number of database pages and as such the resultant bitmaps would not be sparse. Although in reality most of the indices would be discarded by the optimiser if there is a highly selective index available.
- Ann Harrison, Firebird for the Database Expert: Episode 1 - Indexes