Do inserts in FB benefit from ordered or sequential Primary Keys?

Jeff wrote:

Hi all,

(Using FB 1.5)

Do inserts in FB benefit from ordered or sequential PK?

Please allow me to clarify. I intend to use High/Low for table PKs. With this approach, it is very possible that PKs will not be in sequential order as they are inserted into the db. Will this be an issue in FB?

I based this question on this article.

Helen Borrie answers:

Firebird doesn't store row data in any particular order. In fact, it doesn't even necessarily store all row data for a particular table contiguously or even on the same disk. (A Firebird database can be split across multiple physical files.) Firebird cares about "pages" and maintains an inventory of various types of pages in order to locate specified rows and the indiexes, blob data and other stuff. It can't even be said that physical storage order in any way reflects arrival sequence, since Firebird cleans up garbage and releases whole pages and also the space freed on occupied pages for future re-use.

About the article: Despite the author's stating that his article was broader than the MSSQL focus that referenced articles had, he still pitched it at the assumption that all database engines store rows in some intrinsic sort order. Firebird doesn't. I suspect that including Oracle in that bunch was an error, as I believe Oracle does not rely on storage sequence for consistency, either....however, Ann will be able to provide a much better "take" on his approach than I could.

Thomas Steinmaurer answers:

I once run a test with insertion performance in Firebird 2.5 with a) sequential PK values coming from a generator and b) via the built-in function giving me a UUID.

For me, Firebird's (bulk) insertion performance had always been better with sequential PK values, because with somehow "random" values, for actualizing the index tree, index pages might need to be re-read from disk again etc. You can somehow tackle that with a larger page cache, which allows to cache more index pages, but still, sequential PK values perform better, at least for me.

Ann W. Harrison reply:

GUIDs and UUIDs are normally generated in a way that defeats prefix compression.

Nando Dessena reply:

Reversed UUIDs, by putting the fixed part at the front, should perform better in this regard than plai UUID. I don't know if the function you have used generates that kind of UUID. I know that I once used a UDF that did that and performance was better than with (at the time, client-generated) UUIDs. High/Low is also good, as huge batch of inserts would inherently use sequential IDs, but this requires a little more setup.

Ann W. Harrison answers:

Do inserts in FB benefit from ordered or sequential PK?

Yes. Unlike the databases that the article considers, Firebird stores data and indexes separately - even primary key indexes, so its talk about "shuffling" data is irrelevant. But still there are advantages to inserting keys into an index in ascending (for ascending indexes) or descending (for descending indexes) order. The most efficient way to generate primary keys is with a generator - also called a "sequence" in newer versions.

Please allow me to clarify. I intend to use High/Low for table PKs. With
this approach, it is very possible that PKs will not be in sequential order
as they are inserted into the db. Will this be an issue in FB?

Firebird is designed to handle random inserts into indexes with reasonable efficiency, but building indexes sequentially creates a dense index at low insert cost. When creating a new index on an existing table, Firebird reads the table, sorting by they index key and builds the index from the sorted output.

Why is that fast? Two reasons, neither of which as to do with the (relatively) low cost of moving index entries in memory.

The first is that random index entries tend to land on random pages in the index. Unless you can hold the entire index (and everything else you need) in memory, that means that pages will be written and read multiple times.

Note

One of the key misunderstandings in databases is that an index is a good alternative to a sort because the cost of an index look up is K*n, while the cost of a sort is L*nlog(n). It's rarely noticed that the value of K (the cost of a page read) is huge compared with log(n).

Filling an index page with ordered values and going on to the next page is more efficient that putting one entry on one page, the next on another, and so on, even if all the pages are in memory.

The second reason to store rows in the order of their index (i.e. not just sorted, but sorted ascending for ascending, descending for descending) is prefix compression. Firebird drop the first bytes of a key if they match the previous key value. So if you store AAAAAA, AAAAAAB, AAAAAAC, AAAAABA, what goes into the index is AAAAAA, 6B, 6C, 4BA. Prefix compression makes the indexes dense, meaning that you read fewer index pages to get to the data you want. Clearly if you store those rows in reverse order, the first key in is AAAAABA. When you store AAAAAAC, the index contains AAAAAAC and the second key value becomes 5BA. Each subsequent entry requires a change to the next older entry.

Like this post? Share on: TwitterFacebookEmail


Related Articles


Author

Firebird Community

Published

Category

Gems from Firebird Support list

Tags