Introduction

The documentation for ALTER INDEX …​ [IN]ACTIVE is somewhat ambiguous. At first sight it appears to be obvious - you would use it to activate or de-activate an index. A switch to turn them on or off. De-activating an index can be useful, especially during bulk insert operations. It could also be useful for deactivating indexes with very little selectivity. Such indexes are usually created by foreign keys and they are surprisingly common, require additional page writes to maintain and can never be used in any useful sense. If only there was a way to turn them off! And then the documentation states quite clearly that you can’t de-activate indexes that are created by constraints. Hmmm…​ the command is suddenly not that useful after all. At this point most users will just shrug and move on. This is not the solution they were looking for. Dropping constraints just to de-active a bunch of indexes is way too complicated.

But hidden in the documentation for ALTER INDEX …​ ACTIVE is a little nugget. It says executing the command will rebuild the index. That is right - REBUILD it. It does not say that an index must be INactive in order to activate. And with no mention whether indexes created by constraints are excluded or not. So, what is going on here? Does using ALTER INDEX …​ ACTIVE on any already active index really rebuild it?

Why would you want to rebuild an index?

When a database is restored it recreates each index precisely based on the existing data. Each index page has space for changes, in the same way that tables are filled to 80% so are indexes. This is a reasonable compromise. Pages are reasonably well packed and can still accommodate some changes by writing to existing pages. But when a table sees many inserts, updates and deletes to records then holes start appearing, both in the data pages and in the index pages.

When a table is freshly restored retrieving a record should take around 4 page reads if the index has a depth of three. Unless the index depth changes retrieving a single record will always take 4 page reads. In a table with millions of records and thousands of data pages this is a win. And when pages are well packed an attempt to retrieve, say, 100 records will immediately benefit from this as many of the records in the set will already be in fetched pages. As holes start to appear in pages then obviously more pages need to be read to retrieve the data.

To take a hypothetical example, if most of the index pages are now half full then approximately twice many pages will need to be read in order to retrieve the hypothetical 100 rows. In theory ALTER INDEX …​ ACTIVE will rebuild the indexes and performance should return to something similar to the state after the last backup/restore cycle.

Well, that’s the theory. Does it work on all indexes, built by a constraint or otherwise? How long does it take? And is performance better afterwards? Let’s see.

The test database

The test database is just a simple table with a primary key, a foreign key and some indexes on two data columns.

create table lookup_colour(
 ID D_INTEGER,
 COLOUR varchar(12),
 constraint pk_lookup_colour PRIMARY KEY (ID)
);

create table test_indexes (
  id D_INTEGER,
  col_id D_INTEGER,
  txt D_GUID,
  txt2 D_GUID_TEXT,
  constraint pk_test_pk primary key (id)
);
commit;

create unique index test_uq_idx on test_indexes(id);
create index test_nonuq_idx on test_indexes(id);
create index test_nonfk_idx on test_indexes(col_id);
create asc  index test_txt_idx on test_indexes(txt);
create desc index test_txt__desc_idx on test_indexes(txt);
create unique asc  index test_txt2_idx on test_indexes(txt2) where txt2 is not null;
create unique desc index test_txt2_desc_idx on test_indexes(txt2) where txt2 is not null;

alter table test_indexes add constraint fk_lkp_col foreign key (col_id) references lookup_colour (id);

commit;

The tests

10 million rows were inserted in blocks of one million rows. After each insertion block 10% of the table was randomly selected for update and then another 10% was randomly selected for deletion. The final test database had around 7 million rows in the test table.

For these tests background garbage collection was turned off.
These tests were carried out using Firebird 5.0.1 SuperServer on Linux. The behaviour of earlier versions will be slightly different but the overall results will be similar.

Result of setting FK_LKP_COL active

To start with let’s look at what happens when we run ALTER INDEX …​ ACTIVE on a single index. For this test FK_LKP_COL was chosen.

Here is the gstat output from before and after the index was rebuilt.

DB before resetting FK_LKP_COL DB after executing ALTER INDEX ACTIVE
TEST_INDEXES (129)

  Index FK_LKP_COL (8)
Root page: 25240, depth: 3, leaf buckets: 10053     (1)
nodes: 9274374
Average node length: 4.99, total dup: 9274364, max dup: 928845
Average key length: 2.00, compression ratio: 0.90
Average prefix length: 1.80, average data length: 0.00
Clustering factor: 951868, ratio: 0.10
Fill distribution:
      0 - 19% = 0
     20 - 39% = 0
     40 - 59% = 9446
     60 - 79% = 7
     80 - 99% = 600     (2)
TEST_INDEXES (129)

  Index FK_LKP_COL (8)
Root page: 25062, depth: 3, leaf buckets: 5727      (1)
nodes: 9274374
Average node length: 4.99, total dup: 9274364, max dup: 928845
Average key length: 2.00, compression ratio: 0.90
Average prefix length: 1.80, average data length: 0.00
Clustering factor: 951868, ratio: 0.10
Fill distribution:
      0 - 19% = 0
     20 - 39% = 0
     40 - 59% = 0
     60 - 79% = 0
     80 - 99% = 5727    (2)
1 Leaf buckets (the actual pages storing index values) have almost been halved. ✅
2 Index pages are repacked to high density ✅

Side effects of setting FK_LKP_COL active

So far, so good. Now let’s look at the rest of the table after rebuilding a single index.

DB before resetting FK_LKP_COL DB after executing ALTER INDEX ACTIVE
TEST_INDEXES (129)
    Primary pointer page: 234, Index root page: 235
    Total formats: 1, used formats: 1
    Average record length: 33.41, total records: 9274374
    Average version length: 28.48
    total versions: 4112353  (1)
    Average fragment length: 34.00
    total fragments: 39439  (2)
    Average unpacked length: 172.00, compression ratio: 5.15
    Pointer pages: 60, data page slots: 96352
    Data pages: 96352, average fill: 83%  (3)
    Primary pages: 95196, secondary pages: 1156, swept pages: 0
    Empty pages: 5, full pages: 93737
    Fill distribution:
	 0 - 19% = 6
	20 - 39% = 1
	40 - 59% = 0
	60 - 79% = 35522
	80 - 99% = 60823  (4)
TEST_INDEXES (129)
    Primary pointer page: 234, Index root page: 235
    Total formats: 1, used formats: 1
    Average record length: 33.41, total records: 9274374
    Average version length: 44.76
    total versions: 2159963  (1)
    Average fragment length: 34.00
    total fragments: 36809  (2)
    Average unpacked length: 172.00, compression ratio: 5.15
    Pointer pages: 60, data page slots: 96384
    Data pages: 96384, average fill: 77%  (3)
    Primary pages: 95191, secondary pages: 1193, swept pages: 0
    Empty pages: 0, full pages: 89851
    Fill distribution:
	 0 - 19% = 0
	20 - 39% = 58
	40 - 59% = 688
	60 - 79% = 39157
	80 - 99% = 56481  (4)
1 Total back record versions appears to have halved. ✅
2 Record fragments are down by about 10% ✅
3 Average page fill is worse ❌
4 Fill distribution appears to be worse ❌

It appears that rebuilding a single index has had some side effects. They appear to be mostly positive but it is posible that in some cases performance may be worse due to some 700 pages that are now half empty.

Result of setting rebuilding ALL of the indexes

From the above it looks as if rebuilding a single index will improve performance where that index is used. What happens when we rebuild all the indexes?

We will start by taking a look at the primary key index

DB before resetting PK_TEST_PK DB after executing ALTER INDEX PK_TEST_PK ACTIVE
    Index PK_TEST_PK (0)
	Root page: 24634, depth: 3, leaf buckets: 7784 (1)
	nodes: 9274374 (2)
	Average node length: 6.03, total dup: 0, max dup: 0
	Average key length: 3.04, compression ratio: 1.52
	Average prefix length: 3.60, average data length: 1.02
	Clustering factor: 214796, ratio: 0.02
	Fill distribution:
	     0 - 19% = 1
	    20 - 39% = 0
	    40 - 59% = 517
	    60 - 79% = 1590
	    80 - 99% = 5676  (3)
    Index PK_TEST_PK (0)
	Root page: 23906, depth: 3, leaf buckets: 5414 (1)
	nodes: 7114411 (2)
	Average node length: 6.14, total dup: 0, max dup: 0
	Average key length: 3.15, compression ratio: 1.48
	Average prefix length: 3.59, average data length: 1.07
	Clustering factor: 210009, ratio: 0.03
	Fill distribution:
	     0 - 19% = 1
	    20 - 39% = 0
	    40 - 59% = 0
	    60 - 79% = 0
	    80 - 99% = 5413  (3)
1 Number of pages (leaf buckets) in the index have now been reduced by ~25% ✅
2 Number of indeex entries (nodes) now reflects the actual number of records in the table. ✅
3 Index pages have been cleaned up with more nodes stored per page. ✅

What happens to the other indexes ?

The effect upon the other indexes is similar. By way of example we will look at TEST_TXT_IDX in detail.

DB before resetting TEST_TXT_IDX DB after executing ALTER INDEX TEST_TXT_IDX ACTIVE
	Root page: 2334, depth: 3,
	leaf buckets: 32806  (1)
	nodes: 9274374  (2)
	Average node length: 19.77, total dup: 0, max dup: 0
	Average key length: 16.77, compression ratio: 0.95
	Average prefix length: 2.22, average data length: 13.78
	Clustering factor: 9274272, ratio: 1.00
	Fill distribution:
	     0 - 19% = 121
	    20 - 39% = 0
	    40 - 59% = 9226    (3)
	    60 - 79% = 16146
	    80 - 99% = 7313
	Root page: 2310, depth: 3
	leaf buckets: 17504  (1)
	nodes: 7114411  (2)
	Average node length: 19.82, total dup: 0, max dup: 0
	Average key length: 16.82, compression ratio: 0.95
	Average prefix length: 2.17, average data length: 13.82
	Clustering factor: 7114331, ratio: 1.00
	Fill distribution:
	     0 - 19% = 1
	    20 - 39% = 0
	    40 - 59% = 0
	    60 - 79% = 0
	    80 - 99% = 17503  (3)
1 Number of pages (leaf buckets) in the index have now been reduced by ~50%. ✅
2 The number of nodes now reflects the actual number of records in the table. ✅
3 Pages have been cleaned up with nodes stored more densely per page. ✅

There is no need to examine the effect of rebuilding the other indexes. In all case they exhibited similar results to the above.

The effect of rebuilding all a tables indexes on the data storage of the table itself.

Now that the indexes have all been rebuilt let’s take another look at the data pages.

Data Page analysis before resetting FK_LKP_COL Data Page analysis after executing ALTER INDEX ACTIVE on all indexes
TEST_INDEXES (129)
    Primary pointer page: 234, Index root page: 235
    Total formats: 1, used formats: 1
    Average record length: 33.41,
    total records: 9274374 (1)
    Average version length: 44.76
    total versions: 2159963 (2)
    Average fragment length: 34.00
    total fragments: 36809
    Average unpacked length: 172.00
    compression ratio: 5.15
    Pointer pages: 60, data page slots: 96384
    Data pages: 96384, average fill: 77%  (3)
    Primary pages: 95191, secondary pages: 1193
    swept pages: 0
    Empty pages: 0, full pages: 89851
    Fill distribution:
	 0 - 19% = 0
	20 - 39% = 58
	40 - 59% = 688
 	60 - 79% = 39157  (4)
	80 - 99% = 56481
TEST_INDEXES (129)
    Primary pointer page: 234, Index root page: 235
    Total formats: 1, used formats: 1
    Average record length: 40.82,
    total records: 7114411  (1)
    Average version length: 0.00
    total versions: 0, (2)
    Average fragment length: 34.00
    total fragments: 36809
    Average unpacked length: 172.00
    compression ratio: 4.21
    Pointer pages: 60, data page slots: 95840
    Data pages: 95728, average fill: 54%  (3)
    Primary pages: 95191, secondary pages: 537
    swept pages: 0
    Empty pages: 12, full pages: 28467
    Fill distribution:
	 0 - 19% = 14
	20 - 39% = 859
	40 - 59% = 61081
	60 - 79% = 33774  (4)
	80 - 99% = 0
1 The total number of records stored is now correct at the data page level. ✅
2 Back record versions have been cleaned out ✅
3 Average page fill is significantly worse ❌
4 Fill distribution is also worse ❌

Overall data page storage is better. And while page fill appears to have deteriorated the reality is that these pages stored back record versions of rows that had either been updated or deleted. So it was inaccessible to new transactions anyway. And now that the data pages have been cleaned up this space is now available for re-use. But this does indicate that we should be wary of the page fill statistics.

What about rebuilding indexes for a table that is in use?

Let’s try two tests:

  1. Table with an active (uncommitted) select

  2. Table with active deletes

Table with an active (uncommitted) select

Executing this statement

select r.ID, r.COL_ID, r.TXT
from TEST_INDEXES r
join LOOKUP_COLOUR c on r.COL_ID = c.id
where c.colour = 'White'
order by r.ID
rows 1 to 10

and subsequently executing ALTER INDEX FB_LKP_COL ACTIVE worked fine.

Gstat output showed the same results as earlier - so active selects do not appear to block rebuilding indexes.

Table with active deletes

Running this query

execute block
as
declare anum D_INTEGER;
begin
  execute procedure populate_pk_list ( 0, 9999); -- generate a random list of records to delete
  for
    select pk_list_id
    from pk_list
    into :anum
  do
    delete from test_indexes
    where id = :anum;

end ^
Statement prepared (elapsed time: 0.012s).
-- line 12, column 5
PLAN (TEST_INDEXES INDEX (PK_TEST_PK))
-- line 7, column 3
PLAN (PK_LIST NATURAL)


Executing statement...
Statement executed (elapsed time: 2.217s).
7505801 fetches, 1830 marks, 96397 reads, 6 writes.
711 inserts, 0 updates, 370 deletes, 481 index, 7115125 seq.
Delta memory: 412688 bytes.
TEST_INDEXES: 370 deletes.
PK_LIST: 711 inserts.
0 rows affected directly.
Total execution time: 2.267s
Script execution finished.

produces some minor changes to the data pages of the table:

Now let’s see if we can rebuild the index for FK_LKP_COL:

Starting transaction...
Preparing statement: ALTER INDEX FK_LKP_COL ACTIVE
Statement prepared (elapsed time: 0.000s).
Plan not available.


Executing statement...
Statement executed (elapsed time: 0.000s).
29 fetches, 2 marks, 9 reads, 0 writes.
0 inserts, 1 updates, 0 deletes, 5 index, 0 seq.
Delta memory: 55408 bytes.
RDB$INDICES: 1 updates.
Total execution time: 0.027s
Script execution finished.

All looking good, so far. Now let’s commit the transaction.

Committing transaction...
*** IBPP::SQLException ***
Context: Transaction::Commit

SQL Message : -901
Unsuccessful execution caused by system error that does not preclude successful execution of subsequent statements

Engine Code    : 335544345
Engine Message :
lock conflict on no wait transaction
unsuccessful metadata update
object TABLE "TEST_INDEXES" is in use

Oops.

It is clear that ALTER INDEXES …​ ACTIVE does not work on tables that are actively being changed. ❌
This is obvious really but it is worth bearing in mind. The command needs to take an exclusive lock on the table.

And how about performance?

We have established that rebuilding indexes improves the page storage of data pages and index pages but does performance actually improve? Let’s see…​

We executed this query on a table with seven million records:

select r.ID, r.COL_ID, r.TXT, r.TXT2
from TEST_INDEXES r
where TXT starting with x'aa' || x'aa'
--where TXT starting with x'aa'||x'aa'
and r.TXT2 is not null;

and it returned twelve rows.

DB before rebuilding indexes DB after rebuilding
Current memory = 53128800
Delta memory = 156672
Max memory = 53194240
Elapsed time = 0.165 sec (1)
Cpu = 0.000 sec
Buffers = 6000
Reads = 16271  (2)
Writes = 0
Fetches = 16503 (3)
Current memory = 57003328
Delta memory = 22864
Max memory = 147186672
Elapsed time = 0.083 sec  (1)
Cpu = 0.000 sec
Buffers = 6000
Reads = 7723  (2)
Writes = 0
Fetches = 7775  (3)
1 Execution time halved. ✅
2 Page reads from disc more than halved. ✅
3 Page fetches from cache more than halved. ✅

Summary

As data is changed indexes become unbalanced leading to poor density of index pages. An unbalanced index will require more page reads in order to retrieve a records.

It is clear that ALTER INDEX …​ ACTIVE will rebuild the index whether it was created by a constraint or not.

It is also clear that ALTER INDEX …​ ACTIVE must be executed for all indexes in a table in order to fully gain its benefits.

Performance will improve.

There is an overhead to execution of ALTER INDEX …​ ACTIVE however. The first execution takes the longest as it requires reading every row in the table. Subsequent executions to rebuild other indexes on the table complete far more quickly.

Indexes cannot be rebuilt if data in the table is being changed.

It is probable that the entire database needs to be put into single user maintenance mode in order to rebuild all the indexes. This is probably faster and easier than trying to gain exclusive locks, table by table.

The cost of putting the database into maintenance mode for this operation will require downtime, however this will be cheaper than a full backup/restore cycle. On the other hand, it is less effective than backup/restore at optimally filling pages. And neither does it recover lost pages.

Conclusion

Rebuilding indexes can be a definite win for performance if…​

  • The database is very large

  • The database sees many changes.

  • The backup/restore cycle is longer than the time required to rebuild indexes.

Like this post? Share on: TwitterFacebookEmail


Author

Paul Reeves

Reading Time

~12 min read

Published

Category

articles