woodsmailbox wrote:

A huge sort table gets created (and lousy performance) depending on what I put in the select list, not what I put in the order-by clause. SORT is the root of the plan, but why does it include other columns that are not included in the order-by clause in it? I'd wish it would choose to retrieve the other values _after_ the SORT. Currently, I trick the optimizer by creating a view on top, but this trick might not work in a future version of firebird and I'll be back where I started.

Btw, in fb 2.1.2 or 2.5, does the optimizer try to inline stacked views (views that select from views)?

Ann W. Harrison answers:

Generally, it is much faster to retrieve records in storage order than to access them randomly. Unless you happen to store records in the order that you're sorting them, selecting after the sort will produce much more I/O on the database than you're currently seeing on the sort file.

[Optimizer inlines stacked views] when possible, which it is if the inner views are simple joins, meaning that they don't includes group by or aggregation. That's been the case since Firebird was InterBase.

woodsmailbox reply:

Not sure how your answer is related to the problem described. Firebird builds a 1.2GB sort table for ~40 seconds while CPU = 100%.

This only happens when I also include a 4k utf8 varchar column in the _select list_. If I don't include that column in the select list, fb makes a sort table of 300K. The respective column is not part of the ORDER BY clause, so it shouldn't affect performance that much. This is quite a show stopper for me since the query is very simple and the select is a simple join of two tables of 6000 x 20 rows. Also, the execution plan shows no NATURAL joins, and SORT is at the root of the plan. The performance is so lousy for such small tables that I incline to think it's a bug. Besides, how can it build a 1.2GB sort table from a 45MB database on a simple join + order by? Is this normal performance?

Ann W. Harrison answers:

I tried to answer you're question which was, if I understand correctly, why doesn't firebird retrieve only the sort keys on the first pass, then sort them, then go back and retrieve all the other items in the select list. It doesn't because generally, I/O to the database is the most expensive operation it does, and doing it twice, once in storage order and once in random order relative to storage is likely to be slower than retrieving all the rows in the first pass.

"This only happens when I also include a 4k utf8 varchar column in the select list."

Yup. A 4k utf8 varchar field generated a buffer of 12 or 16k - depending on whether the current code is prepared to handle the absolute worst case. In this case, a blob would be a lot faster.

The first element of the plan is the last one executed - which would be the sort. Firebird normally executes indexed access in two stages, first getting all qualifying values from the index and setting bits in a storage-order bitmap, then second retrieving the rows in bitmap order. So you get storage order access even when an index is used.

Dmitry Yemanov adds:

First issue is the one explained by Ann: the engine always prefers two sequential (i.e. storage order) scans for a sort instead of a mixed sequential / random approach. This is by design and it cannot be turned off. The problem you experience is caused by the fact that the sort records are stored expanded, while they're compressed on data pages. This is why you see the big I/O difference. This could be improved via a proper tuning of the SortMemUpperLimit / TempCacheLimit parameters, provided that you have plenty of RAM on board. This is a design limitation and not a bug.

The second issue is that the optimizer is not clever enough to apply the sort to the deepest stream possible and only then join the other streams. It's not always possible and sometimes it could perform worse than sorting the entire resulting row set, but sometimes it could be very useful. Strictly speaking, this isn't a bug either, but an improvement request could be evaluated by the team.

Note

You can reliably convince firebird that it should perform the actions in the right order using a derived table (put the order by clause there).

Like this post? Share on: TwitterFacebookEmail


Related Articles


Author

Firebird Community

Published

Category

Gems from Firebird Support list

Tags