Bruce Dickinson wrote:

In next table I want to keep key-value data associated with some object (ID_MASTER) in master table. Currently I have 59 such parameters. Most of the time I am using only 2-5, so I thought that this structure will be ideal to save some space on the disk.

CREATE TABLE DETAILS_DATA
 (ID_MASTER INTEGER NOT NULL,
  ID_PARAM INTEGER NOT NULL,
  PARAM_VALUE VARCHAR(64) NOT NULL
 );

ALTER TABLE DETAILS_DATA ADD CONSTRAINT FK_DETAILS_DATA_ID_MASTER
  FOREIGN KEY (ID_MASTER) REFERENCES MASTER_DATA (ID);
CREATE UNIQUE INDEX UNQ_DETAILS_DATA ON DETAILS_DATA (ID_MASTER,ID_PARAM);

However, after running some tests I saw that database grows quickly. So for another test I create another table, this time I've included all keys in one row. Earlier I wrote that I have 59 parameters, so I needed to create 59 columns. Here is the table:

CREATE TABLE FLAT_DATA
 (ID_MASTER INTEGER NOT NULL,
  P1 VARCHAR(64),
  P2 VARCHAR(64),
  P3 VARCHAR(64),
  -- P4..P57
  P58 VARCHAR(64),
  P59 VARCHAR(64)
 );

ALTER TABLE FLAT_DATA ADD CONSTRAINT FK_FLAT_ID_MASTER
  FOREIGN KEY (ID_MASTER) REFERENCES MASTER_DATA (ID);

I've put to this table exactly the same data that was in DETAILS_DATA table. Obviously, most of the P1-P59 columns were NULL.

Here is the comparision of space taken by both tables:

DETAILS_DATA:

Size of the table: 9592 MB
FK_DETAILS_DATA_ID_MASTER: 953 MB
UNQ_DETAILS_DATA: 1161 MB
TOTAL SPACE: 11706  MB

FLAT_DATA:

Size of the table: 2084 MB
FK_FLAT_ID_MASTER: 52 MB
TOTAL SPACE: 2136 MB

As you can see DETAILS_DATA takes 5 times more of space. I was completely surprised by this result, after all I am not wasting space for 50+ columns. Could you explain me this phenomenon?

Dmitry Yemanov answers:

How long are actual strings inside PARAM_VALUE? It's worth looking at the gstat -r output and compare the average record size in both cases.

With the former (DETAILS_DATA) approach, the table is very narrow. Storage overhead (record header size, 13 bytes) is nearly the same as the data itself (perhaps even more, considering the data compression). It could explain the wasted space.

Bruce Dickinson reply:

Here are the results of gstat:

FLAT_DATA (290)
  Primary pointer page: 5239142, Index root page: 5239147
  Average record length: 213.90, total records: 8288203
  Average version length: 0.00, total versions: 0, max versions: 0
  Data pages: 533746, data page slots: 533746, average fill: 88%
  Fill distribution:
   0 - 19% = 0
  20 - 39% = 0
  40 - 59% = 0
  60 - 79% = 30
  80 - 99% = 533716

DETAILS_DATA (262)
  Primary pointer page: 842, Index root page: 843
  Average record length: 20.19, total records: 164512578
  Average version length: 0.00, total versions: 0, max versions: 0
  Data pages: 2455600, data page slots: 2455600, average fill: 61%
  Fill distribution:
   0 - 19% = 0
  20 - 39% = 1
  40 - 59% = 0
  60 - 79% = 2455599
  80 - 99% = 0

I guess this confirms your statement about record header size?

FLAT_DATA record length is 213.90 which is over 10 times more than DETAILS_DATA record length 20.19. However in DETAILS_DATA we have 164512578 records which is 19.8 times more than in FLAT_DATA.

Like this post? Share on: TwitterFacebookEmail


Related Articles


Author

Firebird Community

Published

Category

Gems from Firebird Support list

Tags