Understanding the Assumptions Behind InterBase's Technology
by Ann Harrison, Borland Developers Conference 1999.
The views expressed in this paper are those of its author. The author has no connection with Inprise or borland.com, does not pretend to speak for them, and has no insight into their plans.
When InterBase was created in 1984 and 1985, the design included a number of elements that were unusual. The most important was multi-generational concurrency, but many other decisions affect the way InterBase works, even today. Some of those decisions were based on technical factors that are no longer relevant; sometimes those decisions are still valid despite changes in technology; sometimes they deserve to be reexamined. This paper considers several aspects of InterBase’s technology and the assumptions implicit in their design.
One of the major features of a database system is the ability to allow many users to access and update parts of the data without conflicting with each other or compromising the consistency of the data. Another is the concept of an "atomic" transaction: one that "transforms the database from one consistent state to another" without ever allowing an inconsistent state to be visible. A third is recovery: the ability of the database system to undo actions that are abandoned.
InterBase uses its multi-generational architecture to solve all three of those problems. The inspiration for the design was record shadowing, a way to provide a consistent view of data for long read-only transactions without blocking updates. As a performance optimization, back versions of rows were stored in the database, on the same page with the primary record if space existed there. Using the back versions to remove updates by transactions that roll back was a simple step. The real breakthrough was realizing that the versions could also be used for concurrency control.
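The interplay of versions, snapshots, and rollback can be sketched in a few lines. This is a toy illustration of the multi-generational idea, not InterBase's actual on-disk format: each update pushes the old value onto a back-version chain, every reader sees the newest version committed before its own transaction began, and rollback simply discards the abandoned versions. The `Row` class and its methods are hypothetical names for this sketch.

```python
class Row:
    """One logical row holding a chain of versions, newest first."""

    def __init__(self):
        self.versions = []  # list of (txn_id, value), newest first

    def update(self, txn_id, value):
        # the new version goes on the front; old versions remain behind it
        self.versions.insert(0, (txn_id, value))

    def read(self, snapshot):
        # snapshot = set of transaction ids committed when the reader started;
        # walk back until we find a version the reader is allowed to see
        for txn_id, value in self.versions:
            if txn_id in snapshot:
                return value
        return None

    def rollback(self, txn_id):
        # undoing an abandoned transaction just discards its versions
        self.versions = [(t, v) for t, v in self.versions if t != txn_id]


row = Row()
row.update(1, "v1")       # txn 1 commits
row.update(2, "v2")       # txn 2 is still active
print(row.read({1}))      # long read-only txn started before 2: sees "v1"
row.rollback(2)           # txn 2 aborts; its version vanishes, no undo log
print(row.read({1, 2}))   # "v1"
```

Note how the long reader is never blocked by the concurrent update, and how recovery falls out of the same structure: removing a version chain entry is the whole of the undo.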
Most database systems implement concurrency control with locks. Update locks prevent simultaneous users from modifying a single row. Read locks protect the consistency of a reader’s view of data. At the time InterBase was built, locking produced two problems:
- A full implementation of update and read locks reduced concurrency to an unacceptable level. A single large report could block all updates to the database.
- Locks were expensive. The lock table must be kept in memory for performance. Memory was expensive and very limited; two or three megabytes was a large memory configuration. Using record versions allowed InterBase to work with a lock table that was small and quite stable in size.
The cost of locks is no longer an issue. Improved techniques of lock management (lock promotion and demotion) and cheap memory solved that problem. However, the concurrency problems of read locks and the consistency problems of a system without read locks are still with us.
One of the original goals of the InterBase architecture was to support a "light weight" version of programming that didn't require a year of data structure analysis before starting to code. One of the requirements was that metadata changes be simple and minimally disruptive. Some systems available at the time required unloading and reloading the database to make any change. At the same time, we wanted to demonstrate that a relational database could be used to solve real problems, and driving the structure of the database from the system tables seemed like a good exercise, as did allowing changes by direct updates to the system tables.
Driving the database from system tables is a good idea. They are extensible, consistent, and easy (for me) to understand. Allowing metadata updates on line is a mixed blessing. When it works, it is very useful, but keeping it working requires very careful coding, and when bugs are introduced, they are difficult to find and fix.
Allowing direct system relation updates introduces more problems. Because the system tables aren't built to be bomb-proof, a user can make an ill-advised metadata change and seriously damage a database. Database elements like view definitions, triggers, and stored procedures are not simple to generate. The oldest utilities in InterBase still make changes to the metadata through direct system table updates. Middle-aged utilities use a DDL call that is more restrictive than system table updates, but compensates by being much simpler and less dangerous. Modern utilities use dynamic SQL, which is even more restrictive, without much compensation.
A 1999 design would allow on-line updates to system tables, but only through a data definition language processor or dynamic SQL. The data definition language should extend the SQL DDL to include such desirable features as increasing column sizes, changing the order of columns in a table, etc.
InterBase was designed to run in an environment where processors shared disks, but not memory. At that time, switching between process contexts was very expensive. Threading was unheard of. For those reasons, InterBase was designed to allow many independent processes to access the database at the same time. If Vax clusters had been the wave of the future, that would have been a great decision.
In the actual case, processors have become massively faster and the only multi-processor systems on the market share both disk and memory. In this world, the "process based" architecture adds little value, introduces complexity, and makes certain desirable enhancements difficult if not impossible.
Everyone's favorite questions: Why are index keys so small? Why do I run out of format versions? Why does compression work so badly on empty 32Kb character strings?
The answer to nearly all those questions is the PDP-11. The PDP-11 was a 16-bit octal machine with a fixed 64Kb process space. Jim (and I) spent several years cramming complex software into that very small box. Perhaps you have a relative who survived the Great Depression and can’t let go of some penny-pinching habits? Think of the PDP-11 as the programmer’s version of the Great Depression. We just can't stop reusing tea bags.
Indexes and Placement
At the time that InterBase was designed, data placement was a major part of the tuning of a database. Tables could be clustered, hashed, stored in stripes, all based on the DBA’s understanding of the storage and access patterns. Most systems supported several different styles of index; the DBA got to trade off their benefits and drawbacks. To avoid swinging the disk head back and forth between index and data, most systems offered options for index placement, again, based on the DBA’s understanding of the data, the access patterns, and disk performance.
Since simplicity and performance were goals of InterBase, Jim chose another route. InterBase supports one table storage mechanism, a very simple "next available page" algorithm. InterBase supports one style of index, a B* tree variant. To avoid bouncing the disk head between index and data, InterBase finds all the index matches before beginning to look at data. It builds a bitmap of that data, which has two benefits. First, data is read in storage order, which compensates for the lack of data placement. Second, several indexes can be used together to restrict a single query.
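The bitmap strategy described above can be sketched briefly. This is a hedged illustration, not InterBase's internal code: each index lookup is resolved to a set of record numbers first, the sets are intersected, and only then are data pages visited, in storage order. The index structures and names here are invented for the example.

```python
def index_lookup(index, key):
    """Resolve one index probe to the set of matching record numbers."""
    return set(index.get(key, ()))

# two hypothetical indexes mapping a value to record numbers
city_index = {"Boston": [3, 7, 12], "Berlin": [5, 9]}
dept_index = {"Sales": [7, 9, 12], "R&D": [3]}

# combine both indexes before touching any data: intersect the bitmaps
matches = index_lookup(city_index, "Boston") & index_lookup(dept_index, "Sales")

# visit matching records in storage order: one forward sweep of the disk,
# no head movement back and forth between index pages and data pages
for recno in sorted(matches):
    print("fetch record", recno)   # fetches records 7 and 12, in order
```

Both benefits named above fall out of this shape: the sorted walk reads data in storage order, and the set intersection lets several indexes restrict a single query.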
The original decisions about InterBase's cache were based on several factors that no longer apply. First, memory was very expensive and had to be used judiciously. Second, each connection to InterBase had a private cache. Third, the operating system we were most familiar with did not use a file cache, so writes were always performed in the order that they were queued. The file system also used a very small (512 byte) block size. Block writes were expensive, but still a reasonable trade-off to keep memory usage low.
Taken together, those factors argued for a small, write-back page cache. With the advent of cheap memory, shared cache, and different file system semantics, the InterBase cache design needs a complete reevaluation.
One significant difference between InterBase and other databases is that InterBase can recover from a system crash or power failure without any intervention or delay. Instant recovery is available because the InterBase "careful write" strategy guarantees that the database on disk is always consistent.
Instant recovery is not free. Careful write sometimes requires writing pages that could otherwise be left in cache. It always requires meticulous bookkeeping and programming.
Despite the enormous changes in processor speed, disk size, and the cost of memory, most of the major decisions that went into the design of InterBase still seem sound to me. But then, I’m prejudiced.