IBPhoenix Research

Java UDF Functional Specification
Introduction

The term native UDF is used to distinguish UDFs which use the standard C linkage conventions. C, C++, and Delphi compilers can create UDF libraries (DLLs or Unix shared libraries) which use the C calling convention. Native UDFs are deployed natively as shared UDF libraries.

For example, a language such as Delphi can represent a C data type such as a pointer to an ISC_QUAD structure as follows:
type
ISC_QUAD = record
isc_quad_high : Integer ;
isc_quad_low : Cardinal ;
end;
PISC_QUAD = ^ISC_QUAD;
Such a pointer may be passed on the stack to the Delphi UDF in exactly the same way as it would be passed to a C UDF. So executing native UDFs for any compiled language supporting C data type representations and C calling conventions requires no additional support within the InterBase engine.

Java, on the other hand, is not compiled to native code, has no pointers or structures and therefore no language support for C data type representations, and does not support native C calling conventions. So executing Java UDFs requires communication with a Java Virtual Machine (JVM), and Java objects will need to be created within the InterBase engine in a portable way, independently of the object representation used by any particular JVM implementation.

The Java Native Interface (JNI) provides a mechanism for creating and manipulating portable Java objects as internal C structures and pointers. So for example, the representation for an 8-byte C date structure

Note: Beyond the lack of language support for C data type representation, the Java Language Specification does not even define how objects are laid out in memory. Java does not expose the structure of objects. So while a string in C has a known format as a sequence of character bytes in memory, this is not the case for a Java string.

So there are three key requirements for supporting Java UDFs from within InterBase

The functional implications of these requirements are detailed in section User Interface/Usability, but may be summarized briefly as follows
This specification assumes an understanding of native UDFs, and does not attempt to describe the meaning of UDFs in general, except where the behavior of Java UDFs differs from that of native UDFs. Also note that Sun changed the name of Java 1.2 to Java 2, and JDK 1.2 to Java 2 SDK. These terms are now used synonymously.

Description

Support for Java UDFs (User Defined Functions) will allow an external library of Java classes and methods to be used anywhere that a SQL function may be used. This allows runtime SQL execution to perform data manipulation tasks by communicating directly with a Java Virtual Machine (JVM) local to the InterBase server. Through the use of the Java Native Interface (JNI) we can embed and use a Java VM in a standardized way that will work with any VM implementation supporting the JNI. All future evolutions of the JNI will maintain complete binary compatibility.

The recursive evaluation (execution) of SQL containing a Java UDF invocation will perform all necessary conversions of the UDF arguments and UDF return values from the InterBase native datatype structures to the corresponding data and character representations used by Java. For example, InterBase

User Interface/Usability

When defining a Java UDF, there are two declarations to consider. The first declaration is for the actual Java Method in some external Java class library. The second declaration is for the UDF itself as declared to a database. Although the two declarations must correspond in the typing of their arguments and return value, they are nonetheless distinct declarations, and they will be referred to as the Java Method Declaration and the Java UDF Declaration.

For native UDF declarations, UDF type names, rather than C type names, are used to denote the types of UDF arguments and return values. For example,
DECLARE EXTERNAL FUNCTION foo
BLOB,
// C type is a
Similarly for Java UDFs, an extended class of UDF type names is used to denote the types of UDF arguments and UDF return values rather than the actual Java type names used in the Java Method Declaration. For example,
DECLARE EXTERNAL JAVA FUNCTION foo
BLOB,
// Java type is class
Although Java type names are not used in the Java UDF declaration, each UDF datatype corresponds strictly with a Java type or class.

Note: When a Java UDF is invoked by InterBase, arguments are provided whose engine native types must be converted to the corresponding Java types. Because of the necessary datatype conversions from InterBase native structures to Java representations, Java UDF invocations must be distinguishable from native UDF invocations. Given a Java UDF declaration, it must also be possible [for the user] to infer the Java types of the corresponding method arguments and method return value. Therefore the SQL syntax for declaring a UDF must provide a means to indicate that the UDF is a Java UDF, as well as provide a means to indicate the Java UDF types of the UDF arguments and UDF return value.

Java UDF Datatypes

The correspondence between the datatyping of Java UDFs and their corresponding Java Methods is as follows:
So, for example, a Java UDF may be declared to the database using a type name of

Java UDF Declaration Syntax and Semantics

The Java UDF declaration syntax (DDL) must be supported by DSQL, ISQL (which, as a design note, happens to be built on top of DSQL), and GPRE. The Java UDF invocation syntax (DML) will be identical to the native UDF invocation syntax. Which Java method is actually executed as a result of a Java UDF invocation depends on three settings: class name, method name, and classpath.

The proposed syntax for declaring Java UDFs follows. Please see the section Syntax Conventions for a description of the extended BNF notation used below. The LALR(1) syntax for Java UDFs is deferred as a design consideration.
DECLARE EXTERNAL JAVA FUNCTION udf-name
[ java-udf-datatype .,..]
[ RETURNS { java-udf-datatype
| PARAMETER argument-position } ]
CLASS "class-name"
METHOD "method-name";
java-udf-datatype ::=
JSTRING (maximum-character-length)
| NUMERIC(p,s) | NUMERIC(p) | DECIMAL(p,s) | DECIMAL(p)
| DATE | TIME | TIMESTAMP
| BLOB
| DOUBLE PRECISION
| INTEGER
| SMALLINT
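As a rough illustration of how the CLASS and METHOD clauses identify the method to run, the following Java sketch resolves a method by class name and method name using reflection. It is only an analogy: the engine itself would locate and call the method through JNI. The names UdfResolver, resolve, and invoke, the restriction to public static methods, and the choice of java.lang.Math.abs as the target are all assumptions made for this example and are not part of the specification.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper: not part of InterBase or of this specification.
public class UdfResolver {

    // Look up a public static method by class name, method name, and
    // parameter types. The class is located through the JVM classpath.
    static Method resolve(String className, String methodName, Class<?>... paramTypes)
            throws ClassNotFoundException, NoSuchMethodException {
        Class<?> cls = Class.forName(className);
        Method m = cls.getMethod(methodName, paramTypes);
        if (!Modifier.isStatic(m.getModifiers())) {
            throw new NoSuchMethodException(methodName + " is not static");
        }
        return m;
    }

    // Invoke the resolved method. An exception thrown by the method itself
    // is unwrapped and rethrown so that the caller can report it.
    static Object invoke(Method m, Object... args) throws Throwable {
        try {
            return m.invoke(null, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }

    public static void main(String[] argv) throws Throwable {
        Method abs = resolve("java.lang.Math", "abs", int.class);
        System.out.println(invoke(abs, -42));
    }
}
```

A class or method name that cannot be resolved surfaces here as ClassNotFoundException or NoSuchMethodException, which is the kind of condition a declaration with a wrong CLASS or METHOD setting would have to report.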
The semantics of java-udf-datatype have already been described under Java UDF Datatypes above. For native UDFs, type

Note: The number of parameters to a native UDF is limited to 10. There is no such limit on the number of parameters to a Java UDF.

Note: For simplicity, java-udf-datatype is used for both input-parameter datatypes and return-parameter datatypes. However, certain limitations are imposed on the syntactic rules. In particular, for both native UDFs and Java UDFs, a

Note: Memory management of returned values from Java UDFs does not need to be explicitly controlled by the user as with native UDFs via

Note: The setting of a classpath will be a JVM configuration, and not a Java UDF setting.

Public Class Signature for com.borland.interbase.Blob

The user interface, or class signature, provided in support of type
package com.borland.interbase;
/**
* This class represents a Blob as passed to a Java UDF.
* A Blob UDF cannot open or close a Blob,
* but instead invokes Blob methods to perform Blob access.
* A UDF that returns a Blob does not actually define a return value.
* Instead, the return-Blob must be passed as the last
* input parameter to the UDF.
**/
public class Blob
{
/**
* Read a Blob segment into a buffer, and return the number
* of bytes read.
**/
public int getSegment (byte[] buffer);
/**
* Write a Blob segment of bytesToPut bytes from a buffer.
**/
public void putSegment (byte[] buffer, int bytesToPut);
/**
* Returns the total number of segments in the Blob.
**/
public long numberOfSegments ();
/**
* The size, in bytes, of the largest single segment in the Blob.
**/
public int maxSegmentLength ();
/**
* Returns the actual total size, in bytes, of the Blob.
**/
public long size ();
}
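As an illustration of how a UDF method might use this signature, the following minimal sketch copies the contents of a Blob into a byte array by reading one segment at a time. Because com.borland.interbase.Blob is only available inside the server, the sketch defines a stand-in interface SegmentedBlob with three of the methods listed above and an in-memory ArrayBlob to exercise it. Those two names and the helper readAll are hypothetical, and the sketch assumes that a buffer of maxSegmentLength() bytes is large enough to hold any single segment.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class BlobReadSketch {

    // Stand-in for the read side of com.borland.interbase.Blob (hypothetical).
    interface SegmentedBlob {
        int getSegment(byte[] buffer);   // reads the next segment, returns bytes read
        long numberOfSegments();
        int maxSegmentLength();
    }

    // In-memory blob used only to exercise readAll.
    static class ArrayBlob implements SegmentedBlob {
        private final byte[][] segments;
        private int next = 0;

        ArrayBlob(byte[]... segments) { this.segments = segments; }

        public int getSegment(byte[] buffer) {
            byte[] s = segments[next++];
            System.arraycopy(s, 0, buffer, 0, s.length);
            return s.length;
        }

        public long numberOfSegments() { return segments.length; }

        public int maxSegmentLength() {
            int max = 0;
            for (byte[] s : segments) max = Math.max(max, s.length);
            return max;
        }
    }

    // Read every segment in order into one contiguous byte array.
    static byte[] readAll(SegmentedBlob blob) {
        byte[] buffer = new byte[blob.maxSegmentLength()];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long count = blob.numberOfSegments();
        for (long i = 0; i < count; i++) {
            int n = blob.getSegment(buffer);
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        SegmentedBlob blob = new ArrayBlob(
            "Hello, ".getBytes(StandardCharsets.US_ASCII),
            "Blob".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(readAll(blob), StandardCharsets.US_ASCII));
    }
}
```

Driving the loop from numberOfSegments() avoids relying on any end-of-Blob return value from getSegment, which the class signature does not define.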
JVM Configuration

A JVM may be shared by the InterBase server and all its connections (users). The JVM is thread-safe and therefore may be shared by concurrent query threads. The JVM must be configured when the JVM is initialized, so the JVM may only be configured once after the InterBase server is started, and the configuration must be at the server level. If the JVM is to be reconfigured, the InterBase server must be shut down and restarted. Let's consider the functional requirements for a configurable JVM.

Functional Requirements For A Server-Wide Configurable JVM

First off, we'll need to have a way to configure the server to enable Java UDF support. This could be an option in the
LOAD_JAVA_VIRTUAL_MACHINE TRUE
or it could be a system environment variable of the same name. The default for

When the JVM is initialized, the classpath for all user-defined Java classes must be supplied. The classpath indicates the location of all Java class libraries for the Java UDFs and must be local to the InterBase server. By default, the classpath for all user-defined Java functions could be
<interbase-dir>/java_udfs
where

All directories and jar files in the classpath setting are separated by semi-colons according to the standard Java conventions for setting classpath on Windows. Here is an example setting:
JAVA_UDF_CLASSPATH c:\interbase\java_udfs;d:\fredsUdfs\mathUdfs.jar

For native UDFs, a library module (

For Java UDFs which utilize native libraries via JNI, the directory location of the native libraries (
JAVA_UDF_NATIVE_LIBRARY_PATH d:\fredsUdfNativeLibs

There is a secondary option of when to create the JVM. The JVM could be created when the InterBase server starts up (accepted), or alternatively, it could be created upon invocation of the first Java UDF (rejected). Which choice is taken would affect the design under JDK 1.1 because of threading issues. So an ancillary design issue is addressed here.

Design note: In JDK 1.1, the main thread which created the JVM must be maintained for the life of the embedding application, and only this main thread may destroy the JVM (thereby releasing JVM resources). Therefore, in JDK 1.1, a transient query thread cannot be used to create the JVM, as would be tempting to do if the JVM is created on the first invocation of a Java UDF during SQL execution. If the JVM has not yet been created, then the first transient query thread to invoke a Java UDF must yield to the dedicated main thread to create the JVM. This main thread must also destroy the JVM at server shutdown time. If the JVM is already started, any transient query thread may "attach" itself to the JVM before invoking it, and "detach" itself from the JVM before being returned to the internal pool of InterBase query threads. These design requirements have changed in the JDK 1.2 version of the JNI, in which any thread may destroy the JVM.

The idea of loading the JVM upon Java UDF invocation, rather than at server startup, has been rejected. Here's a quote from Mark Duquette which best explains why:
Functional Requirements For Multiple Connection-Wide JVMs (Rejected Alternative)

This alternative is academic: it is not actually possible given the current JVM implementations, and would probably not be desirable even if it were possible, but it is included here for completeness.

As an alternative to a single server-wide JVM, separate JVMs could be created for each connection which requests a JVM. This gives control over the configuration of the JVM to the user connection and does not require a server restart for a new JVM configuration to accommodate some new connection.

The JNI provides a mechanism for creating multiple JVMs to facilitate thread isolation in multi-threaded programming environments. One simple way to allocate JVMs is to create a dedicated JVM for each connection which needs Java UDF support. In this case, a JVM may be created and configured when a connection which requests Java UDF support is established to a database. A connection requesting a JVM may specify a server-side classpath, as well as a server-side native library path if necessary. Other ways of distributing multiple JVMs are possible, such as one JVM per query (way too costly), but one JVM per requesting connection is probably the most logical if one opted for multiple JVMs.

Multiple connection-wide JVMs could be configured in the same way as a single server-wide JVM is configured via the
isc_dpb_load_java_virtual_machine
isc_dpb_java_udf_classpath
isc_dpb_java_udf_native_library_path

SQL support would also need to be surfaced by extending the syntax of the
CONNECT "employee.gdb" LOAD_JAVA_VIRTUAL_MACHINE JAVA_UDF_CLASSPATH "d:\java_udfs";
Alternatively, we could eliminate the need for

Because each JVM maintains its own object memory, using multiple JVMs would present some difficulties if static class variables were modified by a UDF. Because of this and the amount of resources that would be required by multiple JVMs, a single server-wide JVM is undoubtedly our best option under the super-server model. In fact, I asked a JavaSoft JNI engineer the following question to get an idea of the intended usage of multiple JVMs:

Here's a further comment giving another reason for rejecting this design alternative:

A System Table For Java UDFs

Rather than introduce a new system table for Java UDFs (eg.

Table

Exception Handling

Testing has shown that the Java VM will crash with a segmentation violation upon UDF invocation if the Java UDF Declaration and the Java Method Declaration signatures do not match. Exceptions occurring from within the JVM will be trapped by the engine, then an appropriate error message will be logged, and the server will exit gracefully.

Unlike native UDFs, Java UDF exceptions include both abnormal terminations of the Java VM, and normal Java exceptions thrown from within the UDF Java method itself. So the engine will trap both normal Java exceptions thrown from a Java method, as well as abnormal terminations of the Java VM. Furthermore, the server should not exit for a Java exception, as it does for an abnormal termination such as a segmentation violation. Rather, the server should log a message for the Java exception (by way of the status vector) and abort the associated query, but not exit.

Design note: The implementation could leverage the work done for UDF exception handling in which the server does not terminate. This is not currently in force for 6.0 since it's unsafe to continue the server after a segmentation violation.

Deploying The Java Runtime

In order for end users to use Java UDFs, they'll need to have a Java runtime environment installed on their server. The Java 2 SDK software can serve as a runtime environment. However, we shouldn't assume all users have the Java 2 SDK software installed, and the Java 2 SDK software license doesn't allow us to redistribute SDK software files. To solve this problem, Sun provides the Java 2 runtime environment as a free, redistributable runtime environment, available for Win32 and Solaris systems. By distributing the JRE with InterBase, we can ensure that customers will have the correct version of the Java platform for running our software.

The Java Runtime Environment (JRE) is the minimum standard Java platform
for running applications written in the Java programming language. It contains
the Java virtual machine, Java core classes, and supporting files. The JRE does
not contain any of the development tools (such as

The Win32 version comes with a built-in installation program suitable for end-users. Solaris versions require the developer to provide installation support. This means the InterBase install could invoke the Sun JRE installation exe for Win32 if desired, but must install the JRE files manually on Solaris.

The Java 2 runtime environment for Win32 is available both with and without international support. The non-international version is much smaller, but is suitable only for English-speaking users.

We also must make sure that our installation procedure never overwrites an existing JRE installation, unless the existing runtime environment is an older version. The Win32 installation program records program information in the Windows Registry. This registry information includes the software version, which we will need to compare with the Java 2 runtime environment version compatible with our InterBase software.

One approach is to install the Java 2 runtime environment files manually into our own InterBase directory or any other directory specified by the installer. If we choose this approach, we must redistribute the JRE in its entirety except for some optional files which we may choose not to redistribute. The files that are optional are listed in the JRE

The Java 2 runtime environment includes

In the case of the Win32 Java 2 runtime environment, the native C runtime library,

Although InterBase already distributes this file, it is stated here for the record that this file should be included in redistributions of the Win32 version of the Java 2 runtime environment.

MetaData Extract Utility For Java UDFs

For each Java UDF declared to the database, extract out
DECLARE EXTERNAL JAVA FUNCTION udf-name
[ java-udf-datatype .,..]
[ RETURNS { java-udf-datatype
| PARAMETER argument-position } ]
CLASS "class-name"
METHOD "method-name";
java-udf-datatype ::=
JSTRING (maximum-character-length)
| NUMERIC(p,s) | NUMERIC(p) | DECIMAL(p,s) | DECIMAL(p)
| DATE | TIME | TIMESTAMP
| BLOB
| DOUBLE PRECISION
| INTEGER
| SMALLINT
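To make the shape of the extracted text concrete, here is a minimal Java sketch that assembles a declaration in the syntax above from its individual parts. The class JavaUdfExtractSketch, the method formatDeclaration, its parameter list, and the sample UDF, class, and method names are all hypothetical; the actual extract utility would read these values from the database metadata, and the sketch does not escape quotes embedded in names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical formatter: not the actual metadata extract utility.
public class JavaUdfExtractSketch {

    // Build a DECLARE EXTERNAL JAVA FUNCTION statement from its parts.
    // returnClause is the text after RETURNS (a datatype or
    // "PARAMETER n"), or null when the declaration has no RETURNS clause.
    static String formatDeclaration(String udfName, List<String> argTypes,
                                    String returnClause, String className,
                                    String methodName) {
        StringBuilder sb = new StringBuilder();
        sb.append("DECLARE EXTERNAL JAVA FUNCTION ").append(udfName).append('\n');
        if (!argTypes.isEmpty()) {
            sb.append("  ").append(String.join(", ", argTypes)).append('\n');
        }
        if (returnClause != null) {
            sb.append("  RETURNS ").append(returnClause).append('\n');
        }
        sb.append("  CLASS \"").append(className).append("\"\n");
        sb.append("  METHOD \"").append(methodName).append("\";");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatDeclaration(
            "str_len", Arrays.asList("JSTRING (80)"), "INTEGER",
            "com.example.StringUdfs", "length"));
    }
}
```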
Standard Java UDF Library

A custom Java UDF library comparable to our FreeUDF library, or Gregory Deatz' or MER System's native UDF libraries is not really necessary because most of these functions already exist in the core Java class libraries. One of the advantages of providing support for Java UDFs is that the core Java class libraries already provide a wealth of built-in methods ready for use. This also means it is unnecessary to port the standard InterBase native UDF library. However, a standard set of Blob UDFs, especially for converting String data to and from Blobs, would be a useful add-on Java UDF library.

Linking With Unknown Java Virtual Machines

The ability to embed a JVM from within a native application such as
InterBase requires us to link with a Java virtual machine implementation. How
we link with a Java virtual machine depends on whether we intend to deploy with
only one particular virtual machine implementation or a variety of virtual
machine implementations from different vendors. Because the JNI does not
specify the name of the native library that implements a Java virtual machine,
we should be prepared to work with Java virtual machine implementations that
are shipped under different names. In general, different vendors may name their
virtual machine implementations differently. For example, on Win32, Sun's
virtual machine is shipped as

The solution is to use programmatic run-time dynamic linking to load the particular virtual machine library specified in the
JAVA_VIRTUAL_MACHINE_LIBRARY "c:\jdk1.2\jre\bin\classic\jvm.dll"

Design Note: Linking in this way, we would not need to make explicit JNI function calls from within the InterBase engine code, and we would therefore not need to link the engine with

The Win32 version is:

#include <windows.h>

// Return a function pointer to the exported JNI invocation function
// "JNI_CreateJavaVM" in a variable JVM library. Functions such as
// AttachCurrentThread are not exported by name; they are reached
// through the JavaVM interface pointer that JNI_CreateJavaVM returns.
void *findCreateJavaVM (char *jvmLibrary)
{
    HINSTANCE hVM = LoadLibrary (jvmLibrary);
    if (hVM == NULL) return NULL;
    return (void *) GetProcAddress (hVM, "JNI_CreateJavaVM");
}

The Solaris version is:

#include <dlfcn.h>

// Return a function pointer to the exported JNI invocation function
// "JNI_CreateJavaVM" in a variable JVM library.
void *findCreateJavaVM (char *jvmLibrary)
{
    void *libVM = dlopen (jvmLibrary, RTLD_LAZY);
    if (libVM == NULL) return NULL;
    return dlsym (libVM, "JNI_CreateJavaVM");
}
Requirements and Constraints

The Java analog to the native UDF module name (

Like native UDFs, the invocation of a Java UDF will release the engine thread lock by performing a thread-exit before transferring control to the Java runtime (invoking the UDF). When the Java UDF returns, a thread-enter will be performed to regain the thread lock on the engine.

The JVM port must provide native Java thread support for the deployed platform. Therefore initial support for Java UDFs will be for Win32 only. Here's a quote from Sun's JNI FAQ (http://java.sun.com/products/jdk/faq/jnifaq.html):

This non-native Java thread implementation was known as green threads. But further information is now to be found at the new Java 2 JNI FAQ (http://java.sun.com/products/jdk/faq/jni-j2sdk-faq.html#nativethreads):

This will need to be tested directly on Solaris for confirmation. Note that we must embed the native threads VM since green threads and native threads don't mix, and of course InterBase already links with

Because of significant differences in the JNI API between Java 1 and Java 2, only Java 2 and above will be supported. The JVM port must support the Java 2 JNI interfaces.

The blr and dyn generation for Java UDFs is deferred as a design consideration.

Thread-safety of Java UDFs is up to the author of the Java UDF class library. However, this is a relatively easy task in Java.

Performance of Java UDFs will be inferior to that of native UDFs because of the necessary internal conversions from native InterBase datatypes to Java objects.

Migration Issues

Although

Open Issues
Syntax Conventions

The syntax diagram conventions mostly follow BNF, with a few variations to enhance readability. Here is a description of the general rules for specifying syntax in this extended BNF. Please be aware that BNF is a high-level specification syntax, and is not a low-level LALR(1) syntax as used by parser generators such as YACC. The LALR(1) syntax for Java UDFs is deferred as a design consideration. These rules are taken from the book "SQL Instant Reference" by Martin Gruber, Sybex Publishing.
Reference Documents