mpp12310 wrote:

When I try to create a table with the Primary Key field being VARCHAR of length 255, and run Compile, I get an error with:

SQL Code = -607
IBError Number: 335544351

I checked my Database page size and it is 4096 (default).

Mark Rotteveel answers:

Are you using UTF-8 as a character set? Indices on (VAR)CHAR fields work only for limited lenghts

For a page size of 4096 it is 253 characters if you use UTF8.

mpp12310 adds:

Yes, I am using UTF8 as the character set; I need to ensure that the data supports UNICODE and UTF8 fits best with Delphi 7.

I did not know about the restrictions, based upon character length.

Ann W. Harrison answers:

Not exactly character length.

Here's a longer explanation of the maximum key size.

The relationship of key size to page size is important because a key that's larger than 1/3 of the page size will cause the index to degenerate from a tree to a list - every new entry causes splits all the way up the tree. Not good. With bad luck, a single insert can turn into an infinite split. Very bad for performance and definitely something to avoid.

Character keys are often bigger than they appear to be, first because of multi-byte character sets and then because of collations. Only the simplest collations generate keys that are the same size as the original data. http://unicode.org/reports/tr10/#Multi_Level_Comparison.

Several decades ago, the InterBase group decided that the maximum allowable key size would be set at definition time based on the ratio of the largest possibly representation of a key to the page size. In fact, index keys are compressed in two ways: trailing spaces in strings and trailing zeros in numbers are eliminated before concatenation in the case of compound keys, and prefix compression on the whole key which eliminates any bytes at the start of the key which duplicate the previous key. So, generally, keys are smaller than their maximum size. That's even more true now than it was in 1985 when almost all data was ASCII, using a binary collation. The most heavily used characters in UTF8 are one or two bytes, not three or four. Most characters do sort simply on their base identity.

On the other hand, the limits enforced are pretty minimal and an index where every key was actually at the limit would be very deep and inefficient. So the fact that gstat shows that you're getting 20 entries per page when the key calculation said you'd get five is really a good thing.

Returning to the question at hand, its not the character length that determines the size of an index key, exactly, but the maximum number of bytes required to represent any character in the character set plus the maximum number of bytes required to represent the levels in the collation, with the sum of those two multiplied by the number of characters.

Like this post? Share on: TwitterFacebookEmail


Related Articles


Author

Firebird Community

Published

Category

Gems from Firebird Support list

Tags