Geoff Worboys wrote:

I was making some changes to a database when I noticed a query like this:

SELECT TEXTBLOB
FROM TABLEA WHERE TEXTBLOB CONTAINING 'SOME TEXT'

was suddenly taking about 4 x longer than before. (A table of almost a million records, so it was quite distinct and very consistent.)

It turns out the difference was in my collation declaration.

The old database had this:

CREATE COLLATION WIN1252_UNICODE
FOR WIN1252;

CREATE COLLATION NOCASE
FOR WIN1252
FROM WIN1252_UNICODE
CASE INSENSITIVE;

The new database had this:

CREATE COLLATION WIN1252_UNICODE
FOR WIN1252;

CREATE COLLATION NOCASE
FOR WIN1252
FROM WIN1252_UNICODE
CASE INSENSITIVE
ACCENT INSENSITIVE;

Rebuilt the new database without "accent insensitive" and the performance matched the old database.

I have been considering migrating to UTF8 (but keep putting it off because I've quite a bit of other work to do before that is possible.), so I also checked the UNICODE_CI_AI collation. But after testing with a UTF8 based build, even UNICODE_CI_AI is between 3 and 4 times slower than just UNICODE_CI. So it seems accent insensitive has a definite and significant cost. Maybe it's obvious to people in the know but it surprises me.

Like this post? Share on: TwitterFacebookEmail


Related Articles


Author

Firebird Community

Published

Category

Gems from Firebird Support list

Tags