Fabiano wrote:

Firebird 2.5, database page size 4096.

create domain d1 varchar(100) character set utf8;
create table t1 (c1 d1, c2 d1);
alter table t1 add unique (c1, c2);

create domain d2 varchar(100) character set utf8 collate unicode_ci_ai;
create table t2 (c1 d2, c2 d2);
alter table t2 add unique (c1, c2);
-- unsuccessful metadata update. key size exceeds implementation restriction for index "RDB$2".

Is this error expected?

Mark Rotteveel answers:

As far as I know collations add additional information to each key, increasing the length. The maximum length of a key is ± 1/4 of the page size (assuming a single key). For a normal UTF8 collation that would be 253 bytes (see this ). So in the normal UTF8 case your key is > 200 characters. Here I also assume there is some additional overhead as it is a compound index.

I don't know the exact size increase for a collation (and I suspect it depends on the collation), but I guess this pushes it beyond the limit of 1015 bytes allowed per key. The simplest solution is to move to a greater page size. Another option of course is to check if you really need VARCHAR(100) for these columns.

Ann W. Harrison answers:

Yes, and the solution is to increase the page size. Here's what's going on. Without a specific collation, Firebird orders strings by the binary value of the character, leading to the well-known ugliness that all upper case characters sort before all lower case characters and accented characters appear even later.

For more elaborate collations, the key is broken down into a base character (e.g. All 'A' characters (A, a, and each with accents), and another byte is added to the end of the string to hold the more subtle effects. If I were guessing, I would have said that unicode_ci_ai could get by with just the base character, but the code Firebird uses for key generation obviously thinks otherwise.

You've already got a key that's right on the edge - two 100 character UTF8 fields. Compound keys are potentially larger that single field keys in an effort to make them actually smaller. What Firebird is doing is suppressing trailing blanks on the first column and using a marker byte every four bytes to indicate which field the value comes from. Assuming the worst case (which is pretty much necessary), you might have two 100 character fields each character of which is three bytes (or four?). That's 600 or 800 bytes. Add in 120 or 160 marker bytes. Then add an extra byte per character for secondary characteristics...

The actual keys will be much smaller because you'll never have a full 100 characters in the upper planes of Unicode. Back in early ancient history (InterBase V1.0) we allowed any key definition and threw a runtime error if the key was too bit. Users didn't like that very much and the problem was much more common back in the days before multi-byte character sets.

Like this post? Share on: TwitterFacebookEmail


Related Articles


Author

Firebird Community

Published

Category

Gems from Firebird Support list

Tags