GlueGen is a tool which automatically generates the Java and JNI code necessary to call C libraries. It reads as input ANSI C header files and separate configuration files which provide control over many aspects of the glue code generation. GlueGen uses a complete ANSI C parser and an internal representation (IR) capable of representing all C types to represent the APIs for which it generates interfaces. It has the ability to perform significant transformations on the IR before glue code emission. GlueGen is currently powerful enough to bind even low-level APIs such as the Java Native Interface (JNI) and the AWT Native Interface (JAWT) back up to the Java programming language.
GlueGen is currently used to generate the JOGL interface to the OpenGL 3D graphics API and the JOAL interface to the OpenAL audio library. In the case of JOGL, GlueGen is used not only to bind OpenGL to Java, but also the low-level windowing system APIs on the Windows, X11 and Mac OS X platforms. The implementation of the JOGL library is thereby written in the Java programming language rather than in C, which has offered considerable advantages during the development of the library.
GlueGen is designed in modular form and can be extended to alter the glue code emission style or to generate interface code for other languages than Java.
This manual describes how to use GlueGen to bind new C libraries to the Java programming language.
GlueGen supports two basic styles of glue code generation: everything in one class, or a separate interface and implementing class. The first mode, "AllStatic", exposes the underlying C functions as a set of static Java methods in a concrete class. This is a straightforward binding mechanism, but has the disadvantage of tying users to a concrete class (which may or may not be a problem) and makes it more difficult to support certain kinds of call-through-function-pointer semantics required by certain C APIs. The second mode, "InterfaceAndImpl", exposes the C functions as methods in an interface and emits the implementation of that interface into a separate class and package. The implementing class is not intended to be in the public API; this more strongly separates the user from the implementation of the API. Additionally, because it is necessary to hold an instance of the implementing class in order to access the underlying C routines, it is easier to support situations where call-through-function-pointer semantics must be followed, in particular where those function pointers might change from instance to instance.
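As a rough illustration of the difference between the two styles, the sketch below mimics the shape of each binding in plain Java. It is not GlueGen output: all names (StylesDemo, MathLibStatic, MathLib, MathLibImpl, onePlus) are hypothetical, and ordinary Java bodies stand in for the native calls so that the example runs on its own.

```java
import java.util.function.IntUnaryOperator;

class StylesDemo {
    public static void main(String[] args) {
        // AllStatic-like usage: call straight through the concrete class.
        System.out.println(MathLibStatic.onePlus(41));

        // InterfaceAndImpl-like usage: hold an instance, program against the interface.
        MathLib lib = new MathLibImpl(x -> x + 1);
        System.out.println(lib.onePlus(41));
    }
}

// Hypothetical stand-in for an "AllStatic" binding: static methods on a concrete class.
final class MathLibStatic {
    private MathLibStatic() {}

    // Real glue code would declare a native method; a Java body stands in here.
    static int onePlus(int a) {
        return a + 1;
    }
}

// Hypothetical stand-in for the public half of an "InterfaceAndImpl" binding.
interface MathLib {
    int onePlus(int a);
}

// The implementing class would normally live in a separate, non-public package.
final class MathLibImpl implements MathLib {
    // Per-instance entry point, standing in for a per-instance function pointer.
    private final IntUnaryOperator entryPoint;

    MathLibImpl(IntUnaryOperator entryPoint) {
        this.entryPoint = entryPoint;
    }

    @Override
    public int onePlus(int a) {
        return entryPoint.applyAsInt(a);
    }
}
```

Because each MathLibImpl instance carries its own entry point, two instances can dispatch to different underlying routines, which is the property the InterfaceAndImpl style relies on.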
The generated glue code follows some basic rules in binding C APIs to Java:
Pointers to typed C primitives (int*, float*) are bound to java.nio Buffer subclasses (IntBuffer, FloatBuffer) and optionally to Java arrays (int[], float[]).
Pointers of type void* are bound to java.nio.Buffer.
By default, a C function taking a void* argument will allow either a direct or non-direct java.nio Buffer to be passed as argument. If the NioDirectOnly directive is specified, however, only a direct Buffer will be accepted.
Rules similar to those for pointers to typed primitives apply to void* return values.
C arguments and return values of type char* may be bound to java.lang.String using the ArgumentIsString or ReturnsString directives.
#define statements in header files mapping names to constant values are exposed as public static final constant values in either the generated interface or AllStatic class.
GlueGen can bind C functions that take the JNI data types JNIEnv* and jobject. The tool understands that the JNIEnv* argument is implicit and that jobject maps to java.lang.Object at the Java programming language level. While this is most useful when binding JDK-internal APIs such as the JAWT to Java, there may be other JNI libraries which expose C functions taking these data types, and GlueGen can very easily bind to them.
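The pointer rules above can be pictured with a small, self-contained sketch. It assumes a hypothetical C function "int sum(int* values, int n)" and shows the two Java-side shapes such a function might take (a Buffer variant and an array-plus-offset variant), together with the kind of check a direct-only binding would need. The method names and bodies are illustrative and are not generated by GlueGen.

```java
import java.nio.Buffer;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

class PointerBindingDemo {
    // Buffer variant: reads n elements starting at the buffer's current position.
    static int sum(IntBuffer values, int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += values.get(values.position() + i);
        }
        return total;
    }

    // Array variant: the extra offset lets a caller point into the middle of the array.
    static int sum(int[] values, int valuesOffset, int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += values[valuesOffset + i];
        }
        return total;
    }

    // Sketch of the check a direct-only binding would have to perform.
    static void requireDirect(Buffer buffer) {
        if (!buffer.isDirect()) {
            throw new IllegalArgumentException("a direct buffer is required");
        }
    }

    public static void main(String[] args) {
        // A direct buffer lives outside the Java heap, so native code can keep a pointer to it.
        IntBuffer direct = ByteBuffer.allocateDirect(4 * Integer.BYTES)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
        direct.put(new int[] {1, 2, 3, 4});
        direct.flip();

        // A wrapped buffer is backed by a Java array and is therefore non-direct.
        IntBuffer wrapped = IntBuffer.wrap(new int[] {1, 2, 3, 4});

        System.out.println(sum(direct, 4));
        System.out.println(sum(wrapped, 4));
        System.out.println(sum(new int[] {9, 1, 2, 3}, 1, 3));
        System.out.println(direct.isDirect() + " " + wrapped.isDirect());
    }
}
```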
This section provides motivation for the design of the GlueGen tool and is not necessary to understand how to use the tool.
There are many tools available for assisting in the autogeneration of foreign function interfaces for various high-level languages. A few examples include Header2Scheme, an early tool allowing binding of a limited subset of C++ to the Scheme programming language; SWIG, a tool released at roughly the same time as Header2Scheme which by now supports binding C and C++ libraries to a variety of scripting languages; JNIWrapper, a commercial tool automating the binding of C APIs to Java; and NoodleGlue, a recently-released tool automating the binding of C++ APIs to Java. Other language-specific tools such as Perl's XS, Boost.Python and many others exist.
GlueGen was designed with a few key principles in mind. The most fundamental was to support binding of the lowest-level APIs on a given platform up to the Java programming language. The intended goal, in the context of the JOGL project, was to allow subsets of the Win32 and X11 APIs to be exposed to Java, and to use those APIs to write the behind-the-scenes OpenGL context creation and management code in Java instead of C. This informed several other design goals:
GlueGen was designed with the Java programming language in mind, but is not necessarily restricted to generating glue code for the Java language. The tool is divided into separate parse and code generation phases, and the internal representation is fairly easy to iterate over. The core driver of GlueGen may therefore be useful in producing other tools which autogenerate foreign function interfaces to C libraries for other languages.
The GlueGen sources can be checked out from CVS with the following command:
    cvs -d :pserver:guest@cvs.dev.java.net:/cvs co gluegen
To build GlueGen, first download the ANTLR jar file from antlr.org. Currently GlueGen is only compatible with ANTLR releases up to 2.7.x and does not work with ANTLR 3.x. (NOTE: do not place the ANTLR jar file in the Extensions directory of the JRE or JDK, as this will cause the build to fail.) Next, copy make/gluegen.properties from the GlueGen workspace to your home directory (pointed to by the Java system property user.home; on Windows this is e.g. C:\Documents and Settings\username). Edit the copy of gluegen.properties in your home directory to point the antlr.jar property to the location of the ANTLR jar file on your local disk. Finally, cd to the make/ subdirectory and type "ant". Ant 1.6.x or later is required.
GlueGen can be run either as an executable jar file (java -jar gluegen.jar; note that antlr.jar must be in the same directory as gluegen.jar in order for this invocation to work) or from within Ant as described in the following section. When run from the command line, GlueGen accepts four kinds of command-line arguments:
-I[directory] adds a directory to the include path used to resolve #include directives. Unlike most C preprocessors, however, GlueGen has no default include path, so it is typically necessary to supply at least one -I option on the command line in order to handle any #include directives in the file being parsed.
-E[emitter class name] names the emitter class, which must implement the com.sun.gluegen.GlueEmitter interface. If this option is not specified, a com.sun.gluegen.JavaEmitter will be used by default.
-C[configuration file] names a configuration file for the emitter.
[filename] names the C header file to parse.
GlueGen can also be invoked as a subtask within Ant. In order to do so, a path element should be defined as follows:
    <path id="gluegen.classpath">
      <pathelement location="${gluegen.jar}" />
      <pathelement location="${antlr.jar}" />
    </path>
where the gluegen.jar and antlr.jar properties point to the respective jar files. A taskdef defining the GlueGen task should then be specified as follows:
    <taskdef name="gluegen"
             classname="com.sun.gluegen.ant.GlueGenTask"
             classpathref="gluegen.classpath" />
At this point GlueGen may be invoked as follows:
    <gluegen src="[header to parse]"
             config="[configuration file]"
             includeRefid="[dirset for include path]"
             emitter="com.sun.gluegen.JavaEmitter">
      <classpath refid="gluegen.classpath" />
    </gluegen>
Please see the JOGL and JOAL build.xml files for concrete, though non-trivial, examples of how to invoke GlueGen via Ant.
GlueGen contains and uses a minimal C preprocessor called the "Pseudo C Pre-Processor", or PCPP. A slightly specialized C preprocessor is required for correct glue code generation with most libraries. Constant values intended for use by end users are defined in many C libraries' headers using #defines rather than constant int declarations, and if the header is processed by a full C preprocessor then the #define statements will be stripped and become unavailable for processing by the glue code generator.
PCPP is largely an invisible part of the glue code generation process; however, it has certain limitations which make it difficult to parse certain header files. First, it does not support macro evaluation in any form, so if a header relies on macro evaluation in order to generate code, PCPP will fail. It is possible that PCPP may fail silently in this situation, causing GlueGen to simply not produce code for the associated constructs. If GlueGen's output is not as expected and there is heavy use of the C preprocessor in the header, run PCPP against the header directly (PCPP takes simply the -I and filename arguments accepted by GlueGen) and examine the output.
Second, PCPP contains only limited support for #if clauses. Generally speaking, its handling of #if defined(foo) || defined(bar) constructs is limited to approximately what is required to handle the OpenGL header files. If the header being parsed relies on moderately complicated expressions being evaluated by the C preprocessor, check the output from PCPP and ensure it is as expected.
Contributions to PCPP would be especially welcome. It would be very desirable to turn it into a full-blown C preprocessor with simply the option of passing through #define statements unchanged.
Error reporting by GlueGen's parser is currently less than ideal. Because PCPP makes #include directives disappear completely with respect to the C parser (it appears that the #line directives it emits are not being consumed properly, an area which needs more investigation), the line numbers reported in parse failures are incorrect in all but the simplest cases. This makes it difficult to determine in exactly what header file and on exactly what construct the C parser failed.
Fortunately, there is a relatively simple workaround. PCPP can be run with all of the same -I arguments passed to GlueGen and the result piped to a new .c file. GlueGen can then be invoked on that .c file (now containing no #include directives) and the line numbers on any parse failures will be correct.
As much as is possible, GlueGen is intended to operate on unmodified C header files, so that it is easy to upgrade the given C API being bound to Java simply by dropping in a new set of header files. However, most C headers contain references to standard headers like stdio.h, and if this header is parsed by GlueGen, the tool will automatically attempt to generate Java entry points for such routines as fread and fwrite, among others. It is impractical to exclude these APIs on a case by case basis. Therefore, the suggested technique to avoid polluting the binding with these APIs is to "stub out" the headers.
GlueGen searches the include path for headers in the order the include directories were specified to the tool. Placing another directory in front of the one in which the bulk of the headers are found allows, for example, an alternative stdio.h to be inserted which contains few or no declarations but which satisfies the need of the dependent header to find such a file.
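The search-order behaviour that makes stub headers work can be sketched in a few lines of Java. This is only an illustration of first-match-wins lookup over an ordered include path; the class and method names are hypothetical and the code is not taken from GlueGen or PCPP.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

class IncludePathDemo {
    // Returns the first match, searching the directories in the order given.
    static Optional<Path> resolve(List<Path> includeDirs, String header) {
        for (Path dir : includeDirs) {
            Path candidate = dir.resolve(header);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // Builds two temporary include directories and checks which stdio.h is found.
    static boolean stubShadowsReal() {
        try {
            Path stubs = Files.createTempDirectory("stub-includes");
            Path real = Files.createTempDirectory("real-includes");
            Files.writeString(stubs.resolve("stdio.h"), "/* stub: no declarations */\n");
            Files.writeString(real.resolve("stdio.h"), "int puts(const char* s);\n");

            // The stub directory is listed first, so its stdio.h is the one resolved.
            Optional<Path> found = resolve(List.of(stubs, real), "stdio.h");
            return found.isPresent() && found.get().getParent().equals(stubs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(stubShadowsReal());
    }
}
```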
GlueGen uses a complete ANSI and GNU C parser written by John Mitchell and Monty Zukowski from the set of grammars available for the ANTLR tool by Terence Parr. As a complete C parser, this grammar requires all data types encountered during the parse to be fully defined. Often a particular header will be included by another one in order to pick up data type declarations rather than API declarations. Stubbing out the header with a smaller one providing a "fake" type declaration is a useful technique for avoiding the binding of unnecessary APIs during the glue code generation process.
Here's an example from the JOGL glue code generation process. The glext.h header defining OpenGL extensions references stddef.h in order to pick up the ptrdiff_t data type. We choose to not include the real stddef.h but instead to swap in a stub header. The contents of this header are therefore as follows:
    #if defined(_WIN64)
      typedef __int64 ptrdiff_t;
    #elif defined(__ia64__) || defined(__x86_64__)
      typedef long int ptrdiff_t;
    #else
      typedef int ptrdiff_t;
    #endif
This causes the ptrdiff_t data type to be defined appropriately for the current architecture. It will be referenced during the glue code generation and cause a Java value of the appropriate type (int or long) to be used to represent it.
This is not the best example because it involves a data type which changes size between 32- and 64-bit platforms, and there are other considerations to take into account in these situations (see the section 32- and 64-bit considerations). Here's another example, again from the JOGL source tree. JOGL binds the AWT Native Interface, or JAWT, up to the Java programming language so that the low-level code which binds OpenGL contexts to Windows device contexts may be written in Java. The JDK's jawt_md.h on the Windows platform includes windows.h to pick up the definitions of data types such as HWND (window handle) and HDC (handle to device context). However, it is undesirable to try to parse the real windows.h just to pick up these typedefs; not only does this header contain thousands of unneeded APIs, but it also uses certain macro constructs not supported by GlueGen's minimal C preprocessor. To avoid these problems, a "stub" windows.h header is placed in GlueGen's include path containing only the necessary typedefs:
    typedef struct _handle* HANDLE;
    typedef HANDLE HDC;
    typedef HANDLE HWND;
Note that it is essential that the type being specified to GlueGen is compatible at least in semantics with the real definition of the HANDLE typedef in the real windows.h, so that during compilation of GlueGen's autogenerated C code, when the real windows.h is referenced by the C compiler, the autogenerated code will compile correctly.
This example is not really complete as it also requires consideration of the size of data types on 32- and 64-bit platforms as well as a discussion of how certain opaque data types are described to GlueGen and exposed in its autogenerated APIs. Nonetheless, it illustrates at a basic level why using a stub header is necessary and useful in certain situations.
When binding C functions to the Java programming language, it is important that the resulting Java code support execution on a 64-bit platform if the associated native methods are compiled appropriately. In other words, the public Java API should not change if the underlying C data types change to another data model such as LP64 (in which longs and pointers become 64-bit).
GlueGen internally maintains two descriptions of the underlying C data model: one for 32-bit architectures and one for 64-bit architectures. These machine descriptions are used when deciding the mapping between integral C types such as int and long and the corresponding Java types, as well as when laying out C structs for access by the Java language. For each autogenerated C struct accessor, both a 32-bit and 64-bit variant are generated behind the scenes, ensuring that the resulting Java code will run correctly on both 32-bit and 64-bit architectures.
When generating the main class containing the bulk of the method bindings, GlueGen uses the 64-bit machine description to map C data types to Java data types. This ensures that the resulting code will run properly on 64-bit platforms. Note that it also generally means that C longs will be mapped to Java longs, since an LP64 data model is assumed.
If Opaque directives are used to cause a given C integer or pointer data type to be mapped directly to a Java primitive type, care should be taken to make sure that the Java primitive type is wide enough to hold all of the data even on 64-bit platforms. Even if the data type is defined in the header file as being only a 32-bit C integer, if there is a chance that on a 64-bit platform the same header may define the data type as a 64-bit C integer or long, the Opaque directive should map the C type to a Java long.
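A short sketch may make the width concern concrete. It checks whether a native value survives a round trip through a Java int. The sample values are made up for illustration, and the helper name is hypothetical.

```java
class OpaqueWidthDemo {
    // Returns true if the value survives a round trip through a Java int.
    static boolean fitsInInt(long nativeValue) {
        return (long) (int) nativeValue == nativeValue;
    }

    public static void main(String[] args) {
        long smallHandle = 0x12345678L;          // made-up value that fits in 32 bits
        long largeHandle = 0x00007FFF12345678L;  // made-up 64-bit pointer-sized value

        System.out.println(fitsInInt(smallHandle));
        // The upper 32 bits are lost when narrowing, so the second check fails.
        System.out.println(fitsInInt(largeHandle));
    }
}
```

Mapping such a type to a Java long avoids the narrowing step entirely, at the cost of a wider value on platforms where 32 bits would have sufficed.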
Complex header files may contain declarations for certain data types that are either too complex for GlueGen to handle or unnecessarily complex from the standpoint of glue code generation. In these situations a stub header may be used to declare a suitably compatible typedef for the data type. An Opaque directive can be used to map the resulting typedef to a Java primitive type if it is undesirable to expose it as a full-blown Java wrapper class.
GlueGen hashes all typedefs internally down to their underlying primitive type. (This is probably not really correct according to the C type system, but is correct enough from a glue code generation standpoint, where if the types are compatible they are considered equivalent.) This means that if the parser encounters
    typedef void* LPVOID;
then an Opaque directive stating
    Opaque long LPVOID
will cause all void* or LPVOID arguments in the API to be mapped to Java longs, which is almost never desirable. Unfortunately, it is not currently possible to distinguish between the LPVOID typedef and the underlying void* data type in this situation.
A similar problem occurs for other data types for which Opaque directives may be desired. For example, a Windows HANDLE equates to a typedef to void*, but performing this typedef in a stub header and then adding the Opaque directive
    Opaque long HANDLE
will cause all void* arguments to be exposed as Java longs instead of Buffers, which is again undesirable. Attempting to work around the problem by typedef'ing HANDLE to an integral type, as in:
    typedef long HANDLE;
may itself have problems, because GlueGen will assume the two integral types are compatible and not perform any intermediate casts between HANDLE and jlong in the autogenerated C code. (When casting between a pointer type and a JNI integral type such as jlong in C code, GlueGen automatically inserts casts to convert the pointer first to an "intptr_t" and then to the appropriate JNI type, in order to silence compiler warnings and/or errors.)
What is desired is to produce a new type name distinct from all others but still compatible with the pointer semantics of the original type. Then an Opaque directive can be used to map the new type name to, for example, a Java long.
To implement this in the context of the HANDLE example, the following typedef may be inserted into the stub header:
    typedef struct _handle* HANDLE;
This uses a pointer to an undeclared (incomplete) struct type to produce a new pointer type. This is legal ANSI C and is supported by GlueGen's parser without having seen a declaration for "struct _handle". Subsequently, an Opaque directive can be used to map the HANDLE data type to a Java long:
    Opaque long HANDLE
Now HANDLEs are exposed to Java as longs as desired. A similar technique is used to expose XIDs on the X11 platform as Java longs.
Certain configuration file directives allow the insertion of Java or C code at various places in the generated glue code, both to eliminate the need to hand-edit the generated glue code and to minimize the amount of hand-written glue code, which sidesteps the GlueGen process. In some situations the inserted code may reference incoming arguments to compute some value or perform some operation. Examples of directives supporting this substitution include ReturnValueCapacity and ReturnedArrayLength.
The expressions in these directives may contain Java MessageFormat expressions like {0} which refer to the incoming argument names to the function. {0} refers to the first incoming argument.
Strongly-typed C primitive pointers such as int*, which ordinarily expand to overloaded Java methods taking e.g. int[] as well as IntBuffer, present a problem. The expansion to int[] arr also generates an int arr_offset argument to be able to pass a pointer into the middle of the array down to C. To allow the same MessageFormat expression to be used for both cases, the substitution that occurs when such a primitive array is referenced is the string "arr, arr_offset"; in other words, the substituted string contains a comma. This construct may be used in the following way: the code being manually inserted may itself contain a method call taking e.g. {3} (the incoming argument index of the primitive array or buffer). The user should supply two overloaded versions of this method, one taking a strongly-typed Buffer and one taking e.g. an int[] arr and int arr_offset argument. The implementation of RangeChecks for primitive arrays and strongly-typed buffers uses this construct.
It should be noted that in the autogenerated C code the offset argument is expressed in bytes while at the Java level it is expressed in elements. Most uses of GlueGen will probably not have to refer to the primitive array arguments in C code so this slight confusion should be minor.
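The substitution scheme described in this section can be tried out directly with java.text.MessageFormat. The sketch below expands one expression for both the Buffer and the array case and shows the matching pair of overloads. The helper names (expand, check, byteOffset) are hypothetical and are not part of GlueGen. The last helper illustrates the element-to-byte offset conversion mentioned above.

```java
import java.nio.IntBuffer;
import java.text.MessageFormat;

class ArgSubstitutionDemo {
    // Expands a directive-style expression by substituting argument names for {n}.
    static String expand(String expression, String... argumentNames) {
        return MessageFormat.format(expression, (Object[]) argumentNames);
    }

    // Buffer overload: matches the expansion "check(buf, n)".
    static boolean check(IntBuffer buf, int n) {
        return buf.remaining() >= n;
    }

    // Array overload: matches the expansion "check(arr, arr_offset, n)".
    static boolean check(int[] arr, int arrOffset, int n) {
        return arr.length - arrOffset >= n;
    }

    // Converts an element offset (Java level) into a byte offset (C level).
    static int byteOffset(int elementOffset, int elementSizeInBytes) {
        return elementOffset * elementSizeInBytes;
    }

    public static void main(String[] args) {
        String expression = "check({0}, {1})";

        // Buffer case: {0} is replaced by a single name.
        System.out.println(expand(expression, "buf", "n"));

        // Array case: {0} is replaced by a string containing a comma,
        // so the same expression now selects the three-argument overload.
        System.out.println(expand(expression, "arr, arr_offset", "n"));

        System.out.println(byteOffset(3, Integer.BYTES));
    }
}
```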
In addition to the C headers, GlueGen requires a certain amount of metadata in the form of configuration files in order to produce its glue code. There are three basic reasons for this: first, GlueGen must be informed into which Java classes the C methods are to be bound; second, there are many configuration options for the generated glue code, and passing them all on the command line is infeasible; and third, there are ambiguities in many constructs in the C programming language which must be resolved before a Java binding can be produced.
The contents of the configuration file are dependent on the class of emitter specified to GlueGen. Currently there are three built-in emitter classes: JavaEmitter, which produces a basic, static Java binding of C functions; ProcAddressEmitter, which extends JavaEmitter by calling the underlying C functions through function pointers, resulting in more dynamic behavior and supporting C APIs with optional functionality; and GLEmitter, which specializes ProcAddressEmitter to support some OpenGL-specific constructs. The GLEmitter will be ignored in this manual as it is specialized for JOGL and provides very little additional functionality beyond the ProcAddressEmitter. The JavaEmitter and ProcAddressEmitter support many options in their configuration files. As the ProcAddressEmitter is a subclass of JavaEmitter, all of the constructs in the JavaEmitter's configuration files are also legal in the ProcAddressEmitter's configuration files.
The configuration files have a very simple line-by-line structure, and are parsed by a very rudimentary, hand-written parser. Each non-whitespace and non-comment line (note: comment lines begin with '#') contains a directive like Package, Style or JavaClass followed by arguments to that directive. There is a certain set of directives that are required for any code generation; others are optional and their omission results in some default behavior. Directives are case-insensitive.
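The line-by-line structure just described can be illustrated with a deliberately simplified reader. This is not GlueGen's parser: the class names are hypothetical, and splitting arguments on whitespace is an assumption that would not suit directives carrying free-form code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class ConfigLineDemo {
    // One parsed directive: a lower-cased name plus its raw arguments.
    static final class Directive {
        final String name;
        final List<String> arguments;

        Directive(String name, List<String> arguments) {
            this.name = name;
            this.arguments = arguments;
        }
    }

    static List<Directive> parse(List<String> lines) {
        List<Directive> result = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and comment lines, which begin with '#'.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] tokens = trimmed.split("\\s+");
            // Directive names are case-insensitive, so normalize them.
            String name = tokens[0].toLowerCase(Locale.ROOT);
            result.add(new Directive(name, Arrays.asList(tokens).subList(1, tokens.length)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# sample configuration",
                "",
                "Package testfunction",
                "STYLE AllStatic");
        for (Directive d : parse(lines)) {
            System.out.println(d.name + " " + d.arguments);
        }
    }
}
```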
The following is an exhaustive list of the options currently supported by each of these emitters' configuration files. It is difficult to see exactly how to use the tool based simply on these descriptions, so the examples may be more helpful in seeing exactly how to structure a configuration file for proper glue code generation.
Note that only a very few of the following directives are specified as being "required" rather than "optional"; these indicate the minimal directives needed for a valid configuration file to begin to get glue code to be produced. In general, these are Package, ImplPackage, JavaClass, ImplJavaClass, and Style. Other directives such as NioDirectOnly are required in some circumstances for the glue code to be correct, and some such as ReturnedArrayLength, ReturnValueCapacity, and ReturnValueLength should be specified in some situations in order for certain return values to be useful at the Java level.
The following directives are specified in alphabetical order, although this is not necessarily the best semantic order.
AccessControl [method name] [ PUBLIC | PROTECTED | PRIVATE | PACKAGE_PRIVATE ]
ArgumentIsString [function name] [indices...]
The first argument index is 0. For a C function with one or more char* (or compatible data type) arguments, indicates that those arguments are semantically null-terminated C strings rather than arbitrary arrays of bytes. The generated glue code will be modified to emit those arguments as java.lang.String objects rather than byte[] or ByteBuffer.
ClassJavadoc [class name] [code...]
The leading /** and trailing */ of the Javadoc comment must be included in the correct place. Each line of Javadoc is emitted in the order encountered during parsing of the configuration files.
CustomCCode [code...]
CustomJavaCode [class name] [code...]
EmitStruct [C struct type name]
GlueGenRuntimePackage [package name, like com.sun.gluegen.runtime]
Defaults to com.sun.gluegen.runtime (no quotes). This is useful if you want to bundle the runtime classes in your application without the possibility of interfering with other versions elsewhere in the system.
Extends [Java interface name] [interface name to extend]
HierarchicalNativeOutput true
Ignore [regexp]
IgnoreField [struct type name] [field name]
Implements [Java class name] [interface name to implement]
ImplJavaClass [class name]
ImplPackage [package name]
Import [package name] (no trailing semicolon)
Include [filename]
IncludeAs [prefix tokens] [filename]
For example: IncludeAs CustomJavaCode MyClass MyClass-CustomJavaCode.java
JavaClass [class name]
JavaEpilogue [C function name] [code...]
JavaOutputDir [directory name]
JavaPrologue [C function name] [code...]
ManuallyImplement [function name]
NativeOutputDir [directory name]
NioDirectOnly [function name]
For C functions taking pointer arguments such as void*, float*, etc., GlueGen will typically generate up to two overloaded Java methods, one taking a Buffer or Buffer subclass such as FloatBuffer, and one taking a primitive array such as float[]. (In the case of void* outgoing arguments, GlueGen produces only one variant taking a Buffer.) Normally the generated glue code accepts either a "direct" or non-"direct" buffer (according to the New I/O APIs) as argument. However, if the C function either expects to hold on to this pointer past the point of the function call, or can block while holding on to the pointer, the NioDirectOnly directive must be specified for this C function in order for the generated glue code to be correct. Failing to observe this requirement may cause JVM hangs or crashes.
Opaque [Java primitive data type] [C data type]
Maps the given C data type directly to the given Java primitive type, for example to expose certain pointer types to Java as longs. It is also useful for forcing certain integral C data types to be exposed as e.g. long to Java to ensure 64-bit cleanliness of the generated glue code. See the examples. The C data type may be a multiple-level pointer type; for example Opaque long void**. Note that it is not currently supported to make a given data type opaque for just a few functions; the Opaque directive currently applies to all C functions in the headers being parsed. This means that sweeping Opaque declarations like Opaque long void* will likely have unforeseen and undesirable consequences.
Package [package name] (no trailing semicolon)
RangeCheck [C function name] [argument number] [expression]
RangeCheckBytes [C function name] [argument number] [expression]
RenameJavaMethod [from name] [to name]
RenameJavaType [from name] [to name]
ReturnedArrayLength [C function name] [expression]
Here expression is a legal Java expression with MessageFormat specifiers such as "{0}". These specifiers will be replaced in the generated glue code with the incoming argument names, where the first argument to the method is numbered 0. See the section on argument name substitution. For a function returning a pointer type such as XVisualInfo*, indicates that the returned pointer is to be treated as an array and specifies the length of the returned array as a function of the arguments passed to the function. Note that this directive differs subtly from ReturnValueCapacity and ReturnValueLength. It is also sometimes most useful in conjunction with the TemporaryCVariableDeclaration and TemporaryCVariableAssignment directives.
ReturnsString [function name]
Indicates that a function returning char* or a compatible type actually returns a null-terminated C string which should be exposed as a java.lang.String. NOTE: currently does not properly handle the case where this storage needs to be freed by the end user. In these situations the data should be returned as a direct ByteBuffer, the ByteBuffer converted to a String using custom Java code, and the ByteBuffer freed manually using another function bound to Java.
ReturnValueCapacity [C function name] [expression]
Specifies the capacity of the java.nio Buffer or subclass wrapping a C primitive pointer such as char* or float* being returned from a C function. Typically necessary in order to properly use such pointer return results from Java. As in the ReturnedArrayLength directive, argument name substitution is performed on MessageFormat expressions.
ReturnValueLength [C function name] [expression]
RuntimeExceptionType [class name]
Defaults to RuntimeException.
StructPackage [C struct type name] [package name]
Package name contains no trailing semicolon.
Style [ AllStatic | InterfaceAndImpl | InterfaceOnly | ImplOnly ]
TemporaryCVariableAssignment [C function name] [code...]
TemporaryCVariableDeclaration [C function name] [code...]
Unignore [regexp]
Unimplemented [regexp]
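The ReturnsString note above suggests returning such data as a direct ByteBuffer and converting it with custom Java code. A minimal sketch of that conversion step follows. The class and method names are hypothetical, and the ISO-8859-1 charset is an assumption, since the encoding of the C string depends on the library being bound.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class CStringDemo {
    // Decodes bytes from the buffer's position up to the first NUL (or the limit).
    static String toJavaString(ByteBuffer buffer) {
        // Work on a duplicate so the caller's position and limit are untouched.
        ByteBuffer view = buffer.duplicate();
        int start = view.position();
        int end = start;
        while (end < view.limit() && view.get(end) != 0) {
            end++;
        }
        byte[] bytes = new byte[end - start];
        view.get(bytes);
        // Assumed encoding; adjust to whatever the native library actually uses.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Simulate native storage holding a NUL-terminated string.
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        direct.put("hello".getBytes(StandardCharsets.ISO_8859_1));
        direct.put((byte) 0);
        direct.flip();

        System.out.println(toJavaString(direct));
    }
}
```

Freeing the native storage afterwards would still require a separate bound function, as the note describes; the sketch covers only the decoding.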
The ProcAddressEmitter is a subclass of the core JavaEmitter which knows how to call C functions through function pointers. In particular, the ProcAddressEmitter detects certain constructs in C header files which imply that the APIs are intended to be called through function pointers, and generates the glue code appropriately to support that.
The ProcAddressEmitter detects pairs of functions and function pointer typedefs in a set of header files. If it finds a matching pair, it converts the glue code emission style for that API to look for the function to call in an autogenerated table called a ProcAddressTable rather than linking the autogenerated JNI code directly to the function. It then changes the calling convention of the underlying native method to pass the function pointer from Java down to C, where the call-through-function-pointer is performed.
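The table-lookup idea can be sketched in plain Java as follows. This only illustrates looking up an entry-point address and handing it to a dispatch routine. The class layout, the map-based table, and the method names are hypothetical and do not reflect the code GlueGen actually generates, and a string stands in for the native call.

```java
import java.util.HashMap;
import java.util.Map;

class ProcAddressDemo {
    // Hypothetical table mapping function names to native entry-point addresses.
    static final class Table {
        private final Map<String, Long> addresses = new HashMap<>();

        void put(String functionName, long address) {
            addresses.put(functionName, address);
        }

        // Fails early if the function is absent, as optional functionality may be.
        long addressOf(String functionName) {
            Long address = addresses.get(functionName);
            if (address == null || address == 0L) {
                throw new UnsupportedOperationException(functionName + " is not available");
            }
            return address;
        }
    }

    // Stand-in for a native method that would receive the function pointer from Java.
    static String dispatchFogCoordf(Table table, float coord) {
        long procAddress = table.addressOf("glFogCoordf");
        return "call 0x" + Long.toHexString(procAddress) + " with " + coord;
    }

    public static void main(String[] args) {
        Table table = new Table();
        table.put("glFogCoordf", 0x1000L); // made-up address for illustration
        System.out.println(dispatchFogCoordf(table, 0.5f));
    }
}
```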
The ProcAddressEmitter discovers the function and function pointer pairs by being informed of the mapping between their names by the user. In the OpenGL and OpenAL libraries, there are fairly simple mappings between the functions and function pointers. For example, in the OpenGL glext.h header file, one may find the following pair:
    GLAPI void APIENTRY glFogCoordf (GLfloat);
    ...
    typedef void (APIENTRYP PFNGLFOGCOORDFPROC) (GLfloat coord);
Therefore the mapping rule between the function name and the function pointer typedef for the OpenGL extension header file is "PFN + Uppercase(funcname) + PROC". Similarly, in the OpenAL 1.1 header files, one may find the following pair:
    AL_API void AL_APIENTRY alEnable( ALenum capability );
    ...
    typedef void (AL_APIENTRY *LPALENABLE)( ALenum capability );
Therefore the mapping rule between the function name and the function pointer typedef for the OpenAL header files is "LP + Uppercase(funcname)".
These are the two principal function pointer-based APIs to which the GlueGen tool has so far been applied. This simple mapping heuristic may turn out to be insufficient, in which case it will need to be extended in a future version of the GlueGen tool.
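The two naming rules quoted above are simple enough to write down directly. The sketch below applies them to the function names from the header excerpts. The class and method names are hypothetical, and the code shows only the string mapping, not how the ProcAddressEmitter evaluates its configuration.

```java
import java.util.Locale;

class ProcNameDemo {
    // OpenGL extension rule: "PFN + Uppercase(funcname) + PROC".
    static String openGlTypedefName(String functionName) {
        return "PFN" + functionName.toUpperCase(Locale.ROOT) + "PROC";
    }

    // OpenAL rule: "LP + Uppercase(funcname)".
    static String openAlTypedefName(String functionName) {
        return "LP" + functionName.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(openGlTypedefName("glFogCoordf")); // PFNGLFOGCOORDFPROC
        System.out.println(openAlTypedefName("alEnable"));    // LPALENABLE
    }
}
```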
Note that in order for the ProcAddressEmitter to notice that a given function should be called through a function pointer, it must currently see both the function prototype and the function pointer typedef. Some headers, in particular the OpenAL headers, have their #ifdefs structured in such a way that either the declaration or the typedef is visible, but not both simultaneously. Because the PCPP C preprocessor GlueGen uses obeys #ifdefs, such headers would have to be modified to allow GlueGen to see both declarations.
The following directives are specified in alphabetical order, although this is not necessarily the best semantic order. The ProcAddressEmitter also accepts all of the directives supported by the JavaEmitter. The required directives are GetProcAddressTableExpr and ProcAddressNameExpr.
EmitProcAddressTable [true | false]
ForceProcAddressGen [function name]
GetProcAddressTableExpr [expression]
GetProcAddressTableExpr _context.getGLProcAddressTable().
In the JOAL project, the ProcAddressTables are currently held in a separate class accessed via static methods, so one of the associated directives is GetProcAddressTableExpr ALProcAddressLookup.getALCProcAddressTable().
LocalProcAddressCallingConvention [function name] [calling convention string]
ForceProcAddressGen directive, this specifies the calling convention of the locally generated function pointer typedef. This is needed only on Windows and only for APIs whose calling convention differs from the default __cdecl.
ProcAddressNameExpr [expression]
$UpperCase(arg) converts the argument to uppercase. "UpperCase" is case-insensitive.
$LowerCase(arg) converts the argument to lowercase. "LowerCase" is case-insensitive.
{0} represents the name of the function.
PFN $UPPERCASE({0}) PROC. The ProcAddressNameExpr for the OpenAL functions as described at the start of this section is LP $UPPERCASE({0}).
ProcAddressTableClassName [class name]
ProcAddressTablePackage [package name] (no trailing semicolon)
SkipProcAddressGen [function name]
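The way a ProcAddressNameExpr turns a function name into a function pointer typedef name can be sketched in Java. This is an illustrative evaluator under stated assumptions, not GlueGen's own implementation: the class name ProcAddressNameExprSketch and the method expand are hypothetical, and the choice to drop whitespace between tokens after expansion is an assumption made so that the two expressions quoted above produce the typedef names shown at the start of this section.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ProcAddressNameExprSketch {
    // Matches $UpperCase(arg) or $LowerCase(arg); the function name is case-insensitive.
    private static final Pattern CALL =
        Pattern.compile("\\$(uppercase|lowercase)\\(([^()]*)\\)", Pattern.CASE_INSENSITIVE);

    // Expands a ProcAddressNameExpr-style expression for one function name.
    static String expand(String expr, String funcName) {
        // {0} represents the name of the function.
        String substituted = expr.replace("{0}", funcName);

        Matcher m = CALL.matcher(substituted);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean upper = m.group(1).equalsIgnoreCase("uppercase");
            String arg = m.group(2);
            String converted = upper ? arg.toUpperCase(Locale.ROOT)
                                     : arg.toLowerCase(Locale.ROOT);
            m.appendReplacement(out, Matcher.quoteReplacement(converted));
        }
        m.appendTail(out);

        // Assumption: whitespace only separates tokens and is not part of the name.
        return out.toString().replaceAll("\\s+", "");
    }
}
```

With this sketch, expand("PFN $UPPERCASE({0}) PROC", "glFogCoordf") yields PFNGLFOGCOORDFPROC and expand("LP $UPPERCASE({0})", "alEnable") yields LPALENABLE, matching the typedefs in the glext.h and OpenAL headers.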
This example shows the simplest possible usage of GlueGen: binding a single routine that takes and returns only primitive types. The signature of the C function we are interested in binding is:
int one_plus(int a);
To bind this function to Java, we only need a configuration file with very basic settings, indicating the style of glue code emission, the package and class into which the glue code will be generated, and the output directories for the Java and native code. The contents of the configuration file are as follows:
Package testfunction
Style AllStatic
JavaClass TestFunction
JavaOutputDir gensrc/java
NativeOutputDir gensrc/native
GlueGen can then be invoked with approximately the following command line:
java -cp gluegen.jar:antlr.jar com.sun.gluegen.GlueGen \
  -I. -Ecom.sun.gluegen.JavaEmitter -Cfunction.cfg function.h
The resulting Java and native code needs to be compiled, and the application needs to load the native library for the Java binding by calling System.load() or System.loadLibrary() before attempting to invoke the native method.
This example shows how C primitive arrays are bound to Java. The header file contains three functions to bind:
float process_data(float* data, int n);
void set_global_data(float* data);
float process_global_data(int n);
The semantics of process_data are that it takes in a pointer to a set of primitive float values and the number of elements in the array and performs some operation on them, returning a floating-point value as the result. Afterward the passed data is no longer referenced.
set_global_data, on the other hand, takes a pointer to the data and stores it persistently in the C code. process_global_data then accepts as argument the number of elements to process from the previously-set global data, performs this processing and returns a result. The global data may be accessed again afterward. As an example, these kinds of semantics are used in certain places in the OpenGL API.
From a Java binding standpoint, process_data may accept data stored either inside the Java heap (in the form of a float[] or non-direct FloatBuffer) or outside the Java heap (in the form of a direct FloatBuffer), because it does not access the data after the function call has completed and therefore would not be affected if garbage collection moved the data after the function call was complete. However, set_global_data can cause the passed data to be accessed after the function call is complete, if process_global_data is called. Therefore the data passed to set_global_data must not reside in the Java garbage-collected heap, but must reside outside the heap in the form of a direct FloatBuffer.
It is straightforward to take into account these differences in semantics in the configuration file using the NioDirectOnly directive:
# The semantics of set_global_data imply that
# only direct Buffers are legal
NioDirectOnly set_global_data
Note the differences in the generated Java-side overloadings for the two functions:
public static void process_data(java.nio.FloatBuffer data, int n) {...}
public static void process_data(float[] data, int data_offset, int n) {...}
public static void set_global_data(java.nio.FloatBuffer data) {...}
No overloading is produced for set_global_data taking a float[], as it cannot handle data residing in the Java heap. Further, the generated glue code will verify that any FloatBuffer passed to this routine is direct, throwing a RuntimeException if not. The type of the exception thrown in this and other cases may be changed with the RuntimeExceptionType directive.
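The difference between heap-resident and direct buffers, and the kind of check described above, can be illustrated in plain Java. The sketch below is not GlueGen output: the class DirectBufferSketch and the helper requireDirect are hypothetical stand-ins, and the exception message is invented for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class DirectBufferSketch {
    // Allocates float storage outside the Java heap, in native byte order,
    // which is the kind of buffer a NioDirectOnly function requires.
    static FloatBuffer newDirectFloatBuffer(int numFloats) {
        return ByteBuffer.allocateDirect(numFloats * Float.BYTES)
                         .order(ByteOrder.nativeOrder())
                         .asFloatBuffer();
    }

    // Hypothetical stand-in for the directness check performed by the glue code.
    static void requireDirect(FloatBuffer buf) {
        if (!buf.isDirect()) {
            throw new RuntimeException("Argument \"data\" was not a direct buffer");
        }
    }
}
```

A buffer from newDirectFloatBuffer passes requireDirect, while FloatBuffer.wrap(new float[4]) is backed by a Java array and is rejected. A caller of set_global_data would therefore allocate its data the first way.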
This example shows how to pass and return C strings. The functions involved are a bit contrived, as nobody would ever need to bind the C library's string handling routines to Java, but they do illustrate situations in which Java strings might need to be passed to C and C strings returned to Java. As an example, both styles of function are present in the OpenGL and OpenAL APIs.
The included source code exposes two functions to Java:
size_t strlen(const char* str);
char* strstr(const char* str1, const char* str2);
Note that we might just as easily parse the C standard library's string.h header file to pick up these function declarations. However, for the purposes of this example it is easier to extract just the functions we need.
Note that the function.h header file contains a typedef for size_t. This is needed because GlueGen does not inherently know about this data type. An equivalent data type for the purposes of this example is int, so we choose to tell GlueGen to use that data type in place of size_t while generating glue code.
The following directive in the configuration file tells GlueGen that strlen takes a string as argument 0 (the first argument):
ArgumentIsString strlen 0
The following directive tells GlueGen that strstr takes two strings as its arguments:
ArgumentIsString strstr 0 1
Finally, the following directive tells GlueGen that strstr returns a string instead of an array of bytes:
ReturnsString strstr
We also use the CustomCCode directive to cause the string.h header file to be #included in the generated glue code:
CustomCCode /* Include string.h header */
CustomCCode #include <string.h>
Now the bindings of these two functions to Java look as expected:
public static native int strlen(java.lang.String str);
public static native java.lang.String strstr(java.lang.String str1, java.lang.String str2);
Note that the ReturnsString directive does not currently handle the case where the char* returned from C needs to be explicitly freed. As an example, a binding of the C function strdup using a ReturnsString directive would cause a C heap memory leak.
This example shows how memory allocation is handled when binding C to Java. It gives the example of a custom memory allocator being bound to Java; this is a construct that at least at one point was present in OpenGL in the NV_vertex_array_range extension.
The two functions we are exposing to Java are as follows:
void* custom_allocate(int num_bytes);
void custom_free(void* data);
The Java-side return type of custom_allocate will necessarily be a ByteBuffer, as that is the only useful way of interacting with arbitrary memory produced by C. The question is how to inform the glue code generator of the size of the returned sequence of memory. The semantics of custom_allocate are obvious to the programmer; the incoming num_bytes argument specifies the amount of returned memory. We tell GlueGen this fact using the ReturnValueCapacity directive:
# The length of the returned ByteBuffer from custom_allocate is
# specified as the argument
ReturnValueCapacity custom_allocate {0}
Note that we name incoming argument 0 with the MessageFormat specifier "{0}" rather than the explicit name of the parameter ("num_bytes") for generality, in case the header file is changed later.
Because custom_free will only ever receive Buffers produced by custom_allocate, we use the NioDirectOnly directive to prevent accidental usage with the wrong kind of Buffer:
# custom_free will only ever receive a direct Buffer
NioDirectOnly custom_free
The generated Java APIs for these functions are as follows:
public static java.nio.ByteBuffer custom_allocate(int num_bytes) {...}
public static void custom_free(java.nio.Buffer data) {...}
This example shows how GlueGen provides access to C structs and supports both passing them to and returning them from C functions. The header file defines a sample data structure that might describe the bit depth of a given screen:
typedef struct {
  int redBits;
  int greenBits;
  int blueBits;
} ScreenInfo;
Two functions are defined which take and return this data type:
ScreenInfo* default_screen_depth();
void set_screen_depth(ScreenInfo* info);
The semantics of default_screen_depth() are that it returns a pointer to some static storage which does not need to be freed, which describes the default screen depth. set_screen_depth() is a hypothetical function which would take a newly-allocated ScreenInfo and cause the primary display to switch to the specified bit depth.
The only additional information we need to tell GlueGen, beyond that in the header file, is how much storage is returned from default_screen_depth(). Note the semantic ambiguity, where it might return a pointer to a single ScreenInfo or a pointer to an array of ScreenInfos. We tell GlueGen that the return value is a single value with the ReturnValueCapacity directive, similarly to the memory allocation example above:
# Tell GlueGen that default_screen_depth() returns a pointer to a
# single ScreenInfo
ReturnValueCapacity default_screen_depth sizeof(ScreenInfo)
Note that if default_screen_depth had returned newly-allocated storage, it would be up to the user to expose a free() function to Java and call it when necessary.
GlueGen automatically generates a Java-side ScreenInfo class which supports not only access to any such objects returned from C, but also allocation of new ScreenInfo structs which can be passed (persistently) down to C. The Java API for the ScreenInfo class looks like this:
public abstract class ScreenInfo {
  public static ScreenInfo create();
  public abstract ScreenInfo redBits(int val);
  public abstract int redBits();
  ...
}
The create() method allocates a new ScreenInfo struct which may be passed, even persistently, out to C. Its C-heap storage will be automatically reclaimed when the Java-side ScreenInfo object is no longer reachable, as it is backed by a direct New I/O ByteBuffer. The fields of the struct are exposed as methods which supply both getters and setters.
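To make the idea of a struct accessor backed by a direct ByteBuffer concrete, here is a hand-written analogue in Java. It is a sketch, not the class GlueGen generates: the name ScreenInfoSketch, the concrete (non-abstract) design, the getBuffer() method, and the layout of three 4-byte ints at offsets 0, 4 and 8 with no padding are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ScreenInfoSketch {
    // Assumed layout: three 4-byte ints, no padding.
    private static final int SIZE = 12;
    private static final int RED_OFFSET = 0;
    private static final int GREEN_OFFSET = 4;
    private static final int BLUE_OFFSET = 8;

    private final ByteBuffer buf;

    private ScreenInfoSketch(ByteBuffer buf) {
        this.buf = buf;
    }

    // Allocates zero-initialized storage outside the Java heap, in native byte order.
    static ScreenInfoSketch create() {
        return new ScreenInfoSketch(
            ByteBuffer.allocateDirect(SIZE).order(ByteOrder.nativeOrder()));
    }

    // Setters return this so that calls can be chained; getters share the field name.
    ScreenInfoSketch redBits(int val)   { buf.putInt(RED_OFFSET, val);   return this; }
    int redBits()                       { return buf.getInt(RED_OFFSET); }
    ScreenInfoSketch greenBits(int val) { buf.putInt(GREEN_OFFSET, val); return this; }
    int greenBits()                     { return buf.getInt(GREEN_OFFSET); }
    ScreenInfoSketch blueBits(int val)  { buf.putInt(BLUE_OFFSET, val);  return this; }
    int blueBits()                      { return buf.getInt(BLUE_OFFSET); }

    // The backing storage that native code would read and write.
    ByteBuffer getBuffer() { return buf; }
}
```

Usage follows the same pattern as the generated API: ScreenInfoSketch.create().redBits(8).greenBits(8).blueBits(8) fills in a struct whose backing buffer could then be handed to native code.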
This example, taken from JOGL's X11 binding, illustrates how to return an array of structs from C to Java. The XGetVisualInfo function from the X library has the following signature:
XVisualInfo *XGetVisualInfo(
    Display* display,
    long vinfo_mask,
    XVisualInfo* vinfo_template,
    int* nitems_return
);
Note that the XVisualInfo data structure itself contains many elements, including a pointer to the current visual. We use the following trick in the header file to cause GlueGen to treat the Display* in the above signature as well as the Visual* in the XVisualInfo as opaque pointers:
typedef struct {} Display;
typedef struct {} Visual;
typedef unsigned long VisualID;

typedef struct {
  Visual *visual;
  VisualID visualid;
  int screen;
  int depth;
  int c_class; /* C++ */
  unsigned long red_mask;
  unsigned long green_mask;
  unsigned long blue_mask;
  int colormap_size;
  int bits_per_rgb;
} XVisualInfo;
XGetVisualInfo returns all of the available pixel formats in the form of XVisualInfos which match a given template. display is the current connection to the X server. vinfo_mask indicates which fields from the template to match against. vinfo_template is a partially filled-in XVisualInfo specifying the characteristics to match. nitems_return is a pointer to an integer indicating how many XVisualInfos were returned. The return value, rather than being a pointer to a single XVisualInfo, is a pointer to the start of an array of XVisualInfo data structures.
There are two basic steps to being able to return this array properly to Java using GlueGen. The first is creating a direct ByteBuffer of the appropriate size in the autogenerated JNI code. The second is slicing up this ByteBuffer appropriately in order to return an XVisualInfo[] at the Java level.
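The second step, slicing one large buffer into per-struct views, can be sketched in plain Java. This is an illustration of the idea rather than the code GlueGen emits: the class StructArraySketch and the method sliceStructs are hypothetical, and the element size is passed in as a parameter standing in for the size of the struct.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class StructArraySketch {
    // Splits a buffer holding consecutive fixed-size structs into one view per struct.
    // Each view shares storage with the original buffer; no bytes are copied.
    static ByteBuffer[] sliceStructs(ByteBuffer whole, int elementSize) {
        int count = whole.capacity() / elementSize;
        ByteBuffer[] elements = new ByteBuffer[count];

        // Work on a duplicate so the caller's position and limit are left untouched.
        ByteBuffer view = whole.duplicate();
        for (int i = 0; i < count; i++) {
            view.limit((i + 1) * elementSize);
            view.position(i * elementSize);
            // slice() resets the byte order, so restore the original one.
            elements[i] = view.slice().order(whole.order());
        }
        return elements;
    }
}
```

Each returned view could then back one struct accessor object, so that an array of accessors is built over a single block of native memory.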
In the autogenerated JNI code, after the call to XGetVisualInfo is made, the outgoing nitems_return value points to the number of elements in the returned array, which indicates the size of the direct ByteBuffer which would need to wrap these elements. However, if we look at the implementation of one of the generated glue code variants for this method (specifically, the one taking an int[] as the fourth argument), we can see a problem in trying to access this value in the C code:
JNIEXPORT jobject JNICALL
Java_testfunction_TestFunction_XGetVisualInfo1__Ljava_nio_ByteBuffer_2JLjava_nio_ByteBuffer_2Ljava_lang_Object_2I(
    JNIEnv *env, jclass _unused, jobject arg0, jlong arg1, jobject arg2, jobject arg3, jint arg3_byte_offset) {
  ...
  int * _ptr3 = NULL;
  ...
  if (arg3 != NULL) {
    _ptr3 = (int *) (((char*) (*env)->GetPrimitiveArrayCritical(env, arg3, NULL)) + arg3_byte_offset);
  }
  _res = XGetVisualInfo((Display *) _ptr0, (long) arg1, (XVisualInfo *) _ptr2, (int *) _ptr3);
  if (arg3 != NULL) {
    (*env)->ReleasePrimitiveArrayCritical(env, arg3, _ptr3, 0);
  }
  if (_res == NULL) return NULL;
  return (*env)->NewDirectByteBuffer(env, _res, ??? What to put here ???);
}
Note that at the point of the statement "What to put here?" the pointer to the storage of the int[], _ptr3, has already been released via ReleasePrimitiveArrayCritical. This means that it cannot be referenced at the point needed in the code.
To solve this problem we use the TemporaryCVariableDeclaration and TemporaryCVariableAssignment directives. We want to declare a persistent integer variable down in the C code and assign the returned array length to that variable before the primitive array is released. While in order to do this we unfortunately need to know something about the structure of the autogenerated JNI code, at least we don't have to hand-edit it afterward. We add the following directives to the configuration file:
# Get returned array's capacity from XGetVisualInfo to be correct
TemporaryCVariableDeclaration XGetVisualInfo int count;
TemporaryCVariableAssignment XGetVisualInfo count = _ptr3[0];
Now in the autogenerated JNI code the variable "count" will contain the number of elements in the returned array. We can then reference this variable in a ReturnValueCapacity directive:
ReturnValueCapacity XGetVisualInfo count * sizeof(XVisualInfo)
At this point the XGetVisualInfo binding will return a Java-side XVisualInfo object whose backing ByteBuffer is the correct size. We now have to inform GlueGen that the underlying ByteBuffer represents not a single XVisualInfo struct, but an array of them, using the ReturnedArrayLength directive. This conversion is performed on the Java side of the autogenerated code. Here, the first element of either the passed IntBuffer or int[] contains the number of elements in the returned array. (Alternatively, we could examine the length of the ByteBuffer returned from C to Java and divide by XVisualInfo.size().) Because there are two overloadings produced by GlueGen for this method, if we reference the nitems_return argument in a ReturnedArrayLength directive, we need to handle not only the differing data types properly (IntBuffer vs. int[]), but also the fact that both the integer array and its offset value are substituted for any reference to the fourth argument.
To solve this problem, we define a pair of private helper functions whose purpose is to handle this overloading.
CustomJavaCode TestFunction private static int getFirstElement(IntBuffer buf) {
CustomJavaCode TestFunction   return buf.get(buf.position());
CustomJavaCode TestFunction }
CustomJavaCode TestFunction private static int getFirstElement(int[] arr,
CustomJavaCode TestFunction                                    int offset) {
CustomJavaCode TestFunction   return arr[offset];
CustomJavaCode TestFunction }
Now we can simply write for the returned array length:
ReturnedArrayLength XGetVisualInfo getFirstElement({3})
That's all that is necessary. GlueGen will then produce the following Java-side overloadings for this function:
public static XVisualInfo[] XGetVisualInfo(Display arg0, long arg1, XVisualInfo arg2, java.nio.IntBuffer arg3);
public static XVisualInfo[] XGetVisualInfo(Display arg0, long arg1, XVisualInfo arg2, int[] arg3, int arg3_offset);
As it happens, we don't really need the Display and Visual data structures to be produced; they can be treated as longs on the Java side. Therefore we can add the following directives to the configuration file:
# We don't need the Display and Visual data structures to be
# explicitly exposed
Opaque long Display *
Opaque long Visual *
# Ignore the empty Display and Visual data structures (though made
# opaque, the references from XVisualInfo and elsewhere are still
# traversed)
Ignore Display
Ignore Visual
The final generated Java API is the following:
public static XVisualInfo[] XGetVisualInfo(long arg0, long arg1, XVisualInfo arg2, java.nio.IntBuffer arg3);
public static XVisualInfo[] XGetVisualInfo(long arg0, long arg1, XVisualInfo arg2, int[] arg3, int arg3_offset);
As with the example above, this example is taken from JOGL's X11 binding. Here we show how to expose to Java a C routine returning an array of pointers to a data structure.
The declaration of the function we are binding is as follows:
typedef struct __GLXFBConfigRec *GLXFBConfig;

GLXFBConfig *glXChooseFBConfig(
    Display *dpy,
    int screen,
    const int *attribList,
    int *nitems
);
This function is used during allocation of a hardware-accelerated off-screen surface ("pbuffer") on X11 platforms; its exact meaning is not important. The semantics of the arguments and return value are as follows. As in the previous example, it accepts a connection to the current X display as one argument. The screen of this display is the second argument. The attribList is a zero-terminated list of integer attributes; because it is zero-terminated, the length of this list is not passed to the function. As in the previous example, the nitems argument points to an integer into which the number of returned GLXFBConfig objects is placed. The return value is an array of GLXFBConfig objects.
Because the GLXFBConfig data type is typedefed as a pointer to an opaque (undefined) struct, the construct GLXFBConfig* is implicitly a "pointer-to-pointer" type. GlueGen automatically assumes this is convertible to a Java-side array of accessors to structs. The only configuration necessary is to tell GlueGen the length of this array.
As in the previous example, we use the TemporaryCVariableDeclaration and TemporaryCVariableAssignment directives to capture the length of the returned array:
TemporaryCVariableDeclaration glXChooseFBConfig int count;
TemporaryCVariableAssignment glXChooseFBConfig count = _ptr3[0];
The structure of the generated glue code for the return value is subtly different from that in the previous example. The question in this case is not whether the return value is a pointer to a single object vs. a pointer to an array of objects; it is what the length of the returned array is, since we already know that the return type is pointer-to-pointer and is therefore an array. We use the ReturnValueLength directive for this case:
ReturnValueLength glXChooseFBConfig count
We add Opaque directives similar to those in the previous example to yield the resulting Java bindings for this function:
public static GLXFBConfig[] glXChooseFBConfig(long dpy, int screen, java.nio.IntBuffer attribList, java.nio.IntBuffer nitems);
public static GLXFBConfig[] glXChooseFBConfig(long dpy, int screen, int[] attribList, int attribList_offset, int[] nitems, int nitems_offset);
Note that because the GLXFBConfig data type is returned as an element of an array, we cannot use the Opaque directive to erase this data type to long as we did with the Display data type.