Dynamic SQL, Plumbing, and the Internal API
From Jim Starkey on the Firebird Development List 25th January 2004
Dynamic SQL handling in Firebird is, well, less than optimal. Vulcan is rapidly approaching a point where significant improvements in efficiency are possible. There are many possibilities that need to be carefully considered. To decide where we want to go, we need to understand where we are.
Where we are is a function of how we got here, rather than an actual design. I thought it would make some sense to talk about the history first, then the Firebird implementation, the Vulcan implementation, and sketch some of the directions we could go in.
Where It Began
When I s tarted the Rdb family in 1981 and Interbase in 1984, there were three contenders in the data manipulation language sweepstakes:
- Ingres, university and commercial, used QUEL, an instantiation of Codd's Alpha, used in his early relational database papers.
- Oracle, and to a confused degree, IBM, were pushing Sequel (originally a pun on QUEL, that was later renamed SQL)
- DEC was pushing the Datatrieve-like language that evolved from my work on the Arpa-funded Datalanguage.
- IBM was also pushing query-by-example as a non-syntax front end, but without success.
It was not clear which, if any, would win, though the smart money was on
Database API thinking at that point revolved around preprocessors. Preprocessors allowed integration of civilized DML technology into host language. They also allowed substantial processing to be done at precompilation times, reducing the runtime load on database engines.
BLR, SQL And The API
BLR (binary language representation) came out of this environment.It was designed for machine generation and processing, not for human consumption. For anyone doing a serious 4GL it was a delight. For anyone using a preprocessor it was below the horizon. For anyone wanting to generate ad hoc queries, it was a disaster.
While we we considering whether to design a dynamic SQL, dynamic GDML, or a civilized BLR generation library, somebody offered us a heap of money to implement dynamic SQL. Since we were also starving, that made the choice very easy.
The Interbase philosophy, bordering on a mantra, was "SQL: we don't fix, we don't extend it, we don't change it, we just implement it; if you want something intelligent, use GDML". This made the choice of dynamic SQL APIs simple. We decided to adopt the IBM DB2 dynamic SQL API and accoutrements lock, stock, and barrel. So that was the API.
I designed the original implementation as a client-side library. It was pretty much a no-brainer, since it had to work with all versions of Interbase as well as our gateways to other database products (primarily Oracle and Rdb). The downside was the DSQL required metadata to process queries, which induced a startup cost, but also provided some database access efficiencies.
At the time Ashton-Tate bought the company, DSQL was a well-established, client-side component of Interbase. Things changed radically after Borland bought Ashton-Tate. Borland proceeded to butcher not only the company (only about 10 of the 70 Interbase employees were retained by Borland) but the product as well.
The Borland Re-implementation
One of Borland's first major changes to the product architecture was to move DSQL from the client to the server. On one hand, this made sense, since all SQL emanating from Borland's non-Interbase data management products was dynamic SQL strings. On the other, since they needed it on the server in the worst way, they implemented it in the worst way.
The Borland dynamic SQL architecture (such as it was) kept the implementation almost unchanged, but moved it from the client to the Y-valve on the server. It was neither a layer on nor a subsystem under the Y-valve, but a wart on the side. It was still single-connection, still required metadata access to process SQL statements, and still made formal external engine API calls.
Wire Transport, Borland-style
Moving the implementation from the client to the server introduced another problem. For better or worse, the DSQL implementation used a rather stupid data structure called the SQLDA (SQL data access or something like that), which was later extended to be an equally stupid but more comprehensive structure called the XSQLDA. The SQLDAs were used to pass both data and metadata across the DSQL interface.
Shoving data structures across wires isn't particularly difficult—xdr, which Interbase shipped before Sun, handles the problem rather elegantly. But, for reasons known only to the developers and perhaps the Scotts Valley health authorities, the Borland developers chose a different path. I'd like to think that they had a professional aversion to cramming structures across formal API boundaries, but I don't think so.
The Borland (and current Firebird) dynamic SQL passes SQLDAs by processing the SQLDA to produce an BLR message declaration describing the data represented in the SQLDA, and a corresponding message structure. A populated SQLDA is first transformed in a BLR string and a corresponding message buffer. The data from the SQLDA is copied to the message buffer. The Y-valve then passes the length and address of the BLR string, the length and address of the message buffer, and the message number (which isn't used, but it must have seemed like a good idea at the time).
For example, one of the primary DSQL API functions, isc_dsql_execute, takes a previously prepared DSQL statement, an input SQLDA, and an (optional) output SQLDA as arguments. The Y-valve then transforms both the input and output SQLDAs in respective BLR string and message buffers, copies data from the input SQLDA into the input message. It passes them along to isc_dsql_execute2, which passes the now huge number of arguments to the appropriate subsystem. On the way out, isc_dsql_execute copies data from the output message to the output SQLDA. The isq_dsql_execute2 function is formally part of the API, but as far as I know, has never been used by anything but the Y-valve. If anyone knows differently, I want to know about it, now!
The Remote Interface
If the connection is to a remote server, the remote interface first passes verb, handle, and the input BLR string across the wire, then invokes the standard BLR-based XDR data mapping for the input data. It then passes the output BLR string so the remote server will know how to send back any resulting data.
The Local Interface
If the connection is a local engine, things are substantially different. As part of the DSQL prepare operation, the local DSQL parsed and analyzed the SQL string, and built an equivalent BLR string to implement the statement's semantics. The DSQL request keeps track of the formats of the input and output messages of the BLR request. When the local DSQL gets the input blr and message strings, it parses the BLR string to get the format of the message buffer, then copies the data from the DSQL message to the BLR message.
This is a lot of copying.
Let's assume a dynamic SQL query of select last_name from employee where ssn=:ssn
The order of battle goes like this:
- Client prepares the query which, as a side effect, formats the SQLDAs.
- The client puts the address of "ssn" into the input SQLDA.
- The client calls isc_dsql_execute with the input SQLDA.
- The Y-valve munches the SQLDA, creating a matching BLR record definition and a corresponding message buffer.
- The Y-valve copies the data from the SQLDA into the message buffer.
- The Y-valve calls isc_dsql_execute2.
- The isc_dsql_execute2 (this is remote request) calls REM_execute2. REM_execute2 passes, among other things, the BLR length and BLR string, then walks the message (driven by BLR string) sending data over the wire in canonical network format.
- The remote server function execute_statement gets the BLR length and string from the wire, allocates a buffer, and maps from the data from wire in canonical network format to local format. It also gets the output BLR length, output BLR string, and buffer length for later use.
- The remote server calls the Y-valve isc_dsql_execute2 with its glorious 13 arguments.
- The Y-valve passes the 13 glorious arguments to DSQL.
- DSQL parses the input BLR string, using this to copy data from the input message to the request message.
- DSQL then calls the Y-valve isc_start_and_send with its message. The Y-valve calls the engine JRD8_start_and_send to actually do something.
If everything was OK, the status rolls all the way back to the client. When the client calls isc_dsql_fetch, the ballad resumes:
- The client sticks the address of his variable to hold last_name in the output SQLDA.
- The client calls isc_dsql_fetch
- The Y-valve munches the SQLDA, creating a matching BLR record definition and a buffer big enough to hold the record.
- The Y-valve calls isc_dsql_fetch_m with the length and address of the BLR string, the address and length of the message buffer, and a useless message number.
- The Y-valve calls the remote dsql_fetch, which puts the BLR length, BLR string, message length, and useless record number over the wire.
- The remote server fetch picks up the stuff off the wire and calls the Y-valve fetch.
- The Y-valve fetch calls the DSQL fetch
- The DSQL fetch calls the Y-valve isc_receive
- The Y-valve calls the engine JRD_receive, which returns a formatted message.
- The DSQL fetch parses the BLR string to copy the data from the request record (returned by engine) to the DSQL message.
- The remote server parses the BLR string to copy the data from the request record to the wire format.
- The remote interface parses the BLR string to copy the data from the wire format to the DSQL message.
- The Y-valve copies the data from DSQL message to the SQLDA.
- The user gets his data.
That's a lot of copying. And most of it isn't necessary.
Wire transport, Vulcan-style
One of the changes I've made in Vulcan is to move DSQL from the Y-valve to the engine. In itself, this doesn't do anything to reduce the copying, but it does allow DSQL to share compiled statements (not yet activated), use internal engine metadata, and go straight to the engine, skipping another trip through the Y-valve. Over time, it should also permit, yea encourage, closer integration of engine and SQL, which is a very good thing.
There are constraints and there are alternatives.
Here are the constraints on a redesign:
- The DSQL API is sacrosanct.
- We're stuck with both the SQLDA and XSQLDA.
- We need to be able to talk to old servers.
- Existing programs need to be able to talk to new servers
The blr/msg forms of DSQL calls can probably be dropped, though the existing remote protocol makes this unfeasible.
Architectural alternatives abound, including:
- Pass the SQLDA/XSQLDAs through the Y-valve to the providers.
- Extend the engine with a BLR verb to give direct access to a SQLDA variable.
- Teach the remote interface to map SQLDAs directly.
- Variation: map SQLDA/XSQLDA to something less stupid and pass it instead.
- Implement a full-bore JDBC interface through the plumbing into the engine.
- Reimplement DSQL as a thin but ugly client service layered on JDBC.
Thoughts? Comments? Suggestions?
If the current architecture makes sense to you, you probably don't understand it yet. Keep trying.