RFC: Tablespaces

By Dmitry Yemanov

Posted on the Firebird-Development list 2nd March 2016

Historically, Firebird databases consist of a sequential set of fixed-size pages (currently 4-16KB). This page set is distributed across one file (the usual case) or several files (*). The page number was initially an SLONG; now it's a ULONG. So the theoretical maximum database size is currently limited to 2^32 * 16KB = 64TB.
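To make the arithmetic concrete, here is a small sketch of how the limit follows from the page number width and the page size. The function and constant names are made up for illustration; only the 32-bit page number and the 4-16KB page sizes come from the text above.

```python
# Upper bound on database size implied by a 32-bit unsigned page number.
PAGE_NUMBER_BITS = 32  # ULONG page number, as described above
TIB = 1 << 40          # bytes in one TB (binary)


def max_database_bytes(page_size_bytes: int,
                       page_number_bits: int = PAGE_NUMBER_BITS) -> int:
    """Number of distinct page numbers multiplied by the page size."""
    return (1 << page_number_bits) * page_size_bytes


# Page sizes in the currently supported 4-16KB range.
for page_size_kb in (4, 8, 16):
    limit = max_database_bytes(page_size_kb * 1024)
    print(f"{page_size_kb} KB pages: {limit // TIB} TB")
```

With 4, 8 and 16KB pages this gives 16, 32 and 64TB respectively, so each doubling of the page size doubles the limit.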

When we speak about tablespaces, we usually mean that the database consists of multiple files and that different database objects are stored in different files. Each such file is named within the database and called a tablespace, and each tablespace has its own page set and page numbering.

A typical usage pattern is that tablespaces are used to separate table data from indices (and logs from the rest of the database) and thus allow better concurrent performance due to parallel I/O. It's often argued that RAIDs now handle the same job, and maybe even better. For many use cases, maybe. But I'm pretty sure that the opposite is also possible, where carefully designed partitioning could outperform automatic RAID data management.

Another use case could be extending the database size beyond the current limits. The current limit is 64TB, and the biggest FB database I know about is 7TB. Not that far off, I'd say. The limit could be raised with even larger page sizes, but that has its drawbacks as well.

Someone may think about per-tablespace physical backups and other possible use cases. So I'm sure this feature is something to be at least considered. On the other hand, tablespaces complicate maintenance, so this is more for enterprise users than for typical Firebird users.

Now back to the code. In the course of Firebird development, we have introduced a concept of "page spaces", represented by a PageSpace class. It implements a two-level numbering for database pages: pagespace ID + page number. The whole engine is aware of that. The default pagespace (ID == 0, IIRC) is reserved for the database file(s). Non-zero pagespace IDs are currently used for GTTs (global temporary tables), which have their data/indices stored in temporary files.
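As a rough illustration of the two-level numbering, the sketch below models a page address as a (pagespace ID, page number) pair and resolves it to a file and byte offset. This is not the engine's PageSpace class: PageAddress, PageSpaceRegistry, the file names and the default ID of 0 are all assumptions made for the example.

```python
from dataclasses import dataclass

DB_PAGE_SPACE = 0  # assumed ID of the default pagespace ("ID == 0, IIRC")


@dataclass(frozen=True)
class PageAddress:
    """Two-level page address: which pagespace, then which page inside it."""
    pagespace_id: int
    page_number: int


class PageSpaceRegistry:
    """Hypothetical map from pagespace ID to the file backing it."""

    def __init__(self, database_file: str) -> None:
        # The default pagespace always points at the main database file.
        self._files = {DB_PAGE_SPACE: database_file}

    def add_pagespace(self, pagespace_id: int, file_path: str) -> None:
        """Register a non-default pagespace, e.g. a temporary file for GTT data."""
        if pagespace_id in self._files:
            raise ValueError(f"pagespace {pagespace_id} already exists")
        self._files[pagespace_id] = file_path

    def locate(self, address: PageAddress, page_size: int) -> tuple[str, int]:
        """Return (file, byte offset) for a page; numbering restarts per pagespace."""
        return self._files[address.pagespace_id], address.page_number * page_size


registry = PageSpaceRegistry("employee.fdb")
registry.add_pagespace(1, "/tmp/fb_gtt_0001.tmp")

# The same page number refers to different pages in different pagespaces.
print(registry.locate(PageAddress(DB_PAGE_SPACE, 5), 8192))
print(registry.locate(PageAddress(1, 5), 8192))
```

The point of the pair is that page number 5 is only meaningful together with its pagespace ID, which is what would let each tablespace keep its own independent numbering.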

Technically, nothing prevents us from declaring named tablespaces via DDL (CREATE/ALTER/DROP TABLESPACE?), storing their definitions inside the metadata (RDB$TABLESPACES table?), allocating a pagespace ID to every tablespace, and allowing a tablespace to be specified when creating database objects (tables, indices, what else?).

Of course, there are more hidden details that must be addressed. Maybe I'm missing something in my review. But I think this thread could be a good starting point for discussion.

Others are welcome to contribute their thoughts.

(*) My personal opinion is that legacy multi-file databases must die, preferably in Firebird 4. They make zero sense on modern filesystems. They're not supported by nbackup. They may complicate the implementation of tablespaces. Anyone here still using multi-file databases?