Practical Use of the RDB$DB_KEY
By Bjørge Sæther
About the sample code
The examples below are the results of months of efforts trying to perform a large financial analysis on a large database (> 4Gb) within reasonable timings. The real breakthrough came with the introduction of the RDB$DB_KEY. In addition to enormous speed increase, it minimized the need for indexing, thus making both development and implementation both safer and easier.
It should be emphasized, though, that you should make comparisons from time to time. After "having seen the light", you may easily overlook simpler (and faster) possible statements.
The samples could of course be rewritten in a more tutorial fashion. This takes quite some time, and I also wanted to show a few real-world examples (as opposed to the "CUSTOMER - like" examples).
And remember: These procedures are being run on tables having up to 6 million rows. The "success stories" of some of them is like reducing execution time from 1 hr 30 min down to 7 min when introducing the db_key.
Another tip when working with large databases: Field/Record size.
Although VARCHAR fields don't occupy more space in the database than the actual string lengths require, there is one often overlooked consequence of having some "head room" in VARCHAR field sizes:
When selecting rows from a table, equally sized record buffers are allocated, so the size is determined from max possible VARCHAR lengths. This is true when selecting from a client, but also when the server is storing a copy of the record (multi-generation handling) while updating & deleting !
So, watch out ! The following update statement will result in the server allocating 100Mb of RAM per some 80-90,000 records.
CREATE TABLE TEST ( ID INTEGER, STATUS INTEGER, SOMESTRING VARCHAR 1024); UPDATE TEST SET STATUS = 1;
...so, trim your fields !
The RDB$DB_KEY may be used is an unique key representing no cost in maintenance or in use. In stored procedures and as a parameter in an update- or delete-sql it is the most efficient record identification you may find in InterBase.
The db_key is not troubled with index balancing,
The db_key is available also when having no unique index. Very useful when it is desirable to drop a unique key while performing updates or deletes.
The db_key does not depend on available server RAM to be efficient, so it's more and more useful as the ration [size of dataset] / [database RAM size] grows. It's possible to perform fast updates/deletes with very little RAM allocated on the server using the db_key.
The db_key is slightly faster than locating rows through an unique index.
No PLANS necessary whatsoever.
Use DB_KEY with a FOR SELECT loop when a large part of the records are to be updated, and you would normally use a where-clause.
In such cases, indices are not very efficient.
Example: Different possible sources for getting a certain value, no joins. Approx. 3/4 of the records will be updated
CREATE PROCEDURE COMPUTE_DELIVER_BONUS AS DECLARE VARIABLE v_KEY CHAR(8); DECLARE VARIABLE v_V0 VARCHAR(10); DECLARE VARIABLE v_V2 VARCHAR(10); DECLARE VARIABLE v_V6 VARCHAR(10); DECLARE VARIABLE v_VK1 VARCHAR(10); DECLARE VARIABLE v_OT VARCHAR(2); DECLARE VARIABLE v_BONUS FLOAT; BEGIN FOR SELECT RDB$DB_KEY, V0, V6, VK1, V2, ORDER_TYPE FROM INVOICE WHERE DIREKTE <> 9 INTO :v_KEY, :v_V0, :v_V6, :v_VK1, :v_V2, :v_OT DO BEGIN SELECT MAX(VK2_NUM1) FROM VK2_IMP WHERE VK2 = :v_V6 AND VK2_TEKST1 = :v_VK1 AND (VK2_NAVN ="A" OR VK2_NAVN = :v_OT) AND :v_V2 >= VK2_TEKST2 AND :v_V2 <= VK2_TEKST3 AND :v_V0 >= VK2_TEKST4 AND :v_V0 <= VK2_TEKST5 INTO :v_BONUS; IF (:v_BONUS = 0 OR :v_BONUS IS NULL) THEN BEGIN SELECT MAX(VK2_NUM1) FROM VK2_IMP WHERE VK2 = :v_V6 AND VK2_TEKST1 = :v_VK1 AND (VK2_NAVN ="A" OR VK2_NAVN = :v_OT) AND :v_V2 >= VK2_TEKST2 AND :v_V2 <= VK2_TEKST3 AND VK2_TEKST5 = "0000000" INTO :v_BONUS; END; IF (:v_BONUS IS NULL) THEN v_BONUS = 0; UPDATE INVOICE SET COST5 = BASIC_COST * :v_BONUS /1000000 WHERE RDB$DB_KEY = :v_KEY; END END
Avoid creating an index needed only for conditional updates/deletes, replace with FOR SELECT loop and DB_KEY
Indexing has a cost: It's slowing down delete and update. And, if you do a lot of changes in indexed columns, indices are getting unbalanced, with a penalty in performance.
When you need to do a conditional update / delete on a table it may be a solution to use the db_key, where no indices are needed. The larger the ratio updates / rowcount, the larger is the gain of using db_key.
Setting a value in table based on values in joined tables - use an optimized FOR SELECT loop with DB_KEY and values from joined tables selected into variables
One way to do fast updates when you want to set a value in a table based on field values of a related table, is to create one FOR SELECT statement with related tables joined, and perform updates based on the db_key value of the driving table.
Example: Updating value in smaller table with value retrieved from master table
CREATE PROCEDURE UPD_RESKONT_IMP_FROM_Fimp AS DECLARE VARIABLE v_DBKey CHAR(8); DECLARE VARIABLE v_Faktno VARCHAR(10); DECLARE VARIABLE v_FaktDato DATE; DECLARE VARIABLE v_K0 VARCHAR(10); DECLARE VARIABLE v_FAKT_PERIODE INTEGER; DECLARE VARIABLE v_IMPORT_PERIODE INTEGER; DECLARE VARIABLE v_RESKONT_PERIODE INTEGER; BEGIN SELECT CAST(DATA AS INTEGER) FROM PROD_PARAMS WHERE ID_KEY = "IMPORT_PER" INTO :v_IMPORT_PERIODE; FOR SELECT R.RDB$DB_KEY, F.FAKTNO, F.FAKT_DATO, F.K0, F.FAKT_PERIODE FROM RESKONT_IMP R LEFT JOIN INVOICE_IMP F ON (R.FAKTNO = F.FAKTNO) WHERE F.FAKT_DATO IS NOT NULL OR R.IMPORT_PERIODE = :v_IMPORT_PERIODE INTO :v_DBKey, :v_Faktno, :v_FaktDato, :v_K0, :v_FAKT_PERIODE DO BEGIN IF (:v_FaktDato IS NULL) THEN v_RESKONT_PERIODE = NULL; ELSE v_RESKONT_PERIODE = :v_IMPORT_PERIODE; UPDATE RESKONT_IMP SET FAKT_DATO = :v_FaktDato, FAKT_PERIODE = :v_FAKT_PERIODE, IMPORT_PERIODE = :v_RESKONT_PERIODE, K0 = :v_K0 WHERE RDB$DB_KEY = :v_DBKey; END /*FOR SELECT*/ END
Example: Joining master table with 2 smaller tables
CREATE PROCEDURE ADD_CENTRALLAGER AS DECLARE VARIABLE v_KEY CHAR(8); DECLARE VARIABLE v_IsCL VARCHAR(2); /*K4 er Centrallager*/ DECLARE VARIABLE v_HasCL VARCHAR(2); /*K4 har Centrallager*/ DECLARE VARIABLE v_V7 VARCHAR(2); BEGIN FOR SELECT F.RDB$DB_KEY, K_Has.K2_TEKST1, V_Is.V7 FROM INVOICE_IMP F LEFT JOIN K2_IMP K_Has ON (F.K4 = K_Has.K2) LEFT JOIN V7_IMP V_Is ON (F.K4 = V_Is.V7) INTO :v_Key, :v_HasCL, :v_IsCL DO BEGIN IF (:v_IsCL IS NULL) THEN :v_V7 = :v_HasCL; ELSE :v_V7 = :v_IsCL; UPDATE INVOICE_IMP SET V7 = :v_IsCL WHERE RDB$DB_KEY = :v_KEY; END END
Example: Joining master table with smaller table joined twice
CREATE PROCEDURE ADD_REGIONS /*Region*/ AS DECLARE VARIABLE v_KEY CHAR(8); DECLARE VARIABLE v_K3_S VARCHAR(2); DECLARE VARIABLE v_K3_L VARCHAR(2); BEGIN FOR SELECT F.RDB$DB_KEY, KSelg.K3, KLev.K3 FROM INVOICE_IMP F LEFT JOIN K2_IMP KSelg ON (F.K2 = KSelg.K2) LEFT JOIN K2_IMP KLev ON (F.K4 = KLev.K2) INTO :v_KEY, :v_K3_S, :v_K3_L DO UPDATE INVOICE_IMP SET K3 = :v_K3_L, K9 = :v_K3_S WHERE RDB$DB_KEY = :v_KEY; END
Example: Master table joined with two smaller tables
CREATE PROCEDURE ADJUST_MHO_FIMP AS DECLARE VARIABLE v_KEY CHAR(8); DECLARE VARIABLE pV7 VARCHAR(20); /*=Centrallager*/ DECLARE VARIABLE pV5_T1 VARCHAR(20); /*=MHO Clager*/ DECLARE VARIABLE pV5_T2 VARCHAR(20); /*=MHO Inte Cl*/ DECLARE VARIABLE pMHO VARCHAR(5); /*=MHO justerad*/ BEGIN /* Sett justert MHO for selgende filial som ikke er sentrallager*/ FOR SELECT F.RDB$DB_KEY, V.V5_TEKST1, V.V5_TEKST2, CL.V7 FROM INVOICE_IMP F LEFT JOIN V5_IMP V ON (F.V5 = V.V5) LEFT JOIN V7_IMP CL ON (F.K2_LIKO = CL.V7) INTO :v_KEY, :pV5_T1, :pV5_T2, :pV7 DO BEGIN IF (:pV7 IS NULL OR :pV7 = "") THEN pMHO = :pV5_T2; ELSE pMHO = :pV5_T1; UPDATE INVOICE_IMP SET MHO_SELG = :pMHO WHERE RDB$DB_KEY = :v_KEY; END END