diff options
author | Sven Gothel <[email protected]> | 2023-06-16 02:16:20 +0200 |
---|---|---|
committer | Sven Gothel <[email protected]> | 2023-06-16 02:16:20 +0200 |
commit | 8b127c4c1dd26fcb1756805ddb83729203161f78 (patch) | |
tree | f8563a0e39d293bc070ef01e457cfe08ee44096d /doc/GlueGen_Mapping.md | |
parent | aeadfab9572e4b441b1bc1f0708cf4c72dfe181e (diff) |
GlueGen Struct [5]: Revised Struct Mapping + Documentation
GlueGen Revised Struct Mapping (esp pointer to array or single element), Struct String Charset, .. and Documentation
- Documentation:
- Added README.md
Let's have a proper face for the git repo
- Added doc/GlueGen_Mapping.md (and its html conversion doc/GlueGen_Mapping.html)
Created a new document covering application and implementation details suitable for users/devs.
- Added doc/JogAmpMacOSVersions.md conversion to doc/JogAmpMacOSVersions.html
- Updated www/index.html
- Use *CodeUnit instead of PrintWriter, representing a Java or C code unit covering a set of functions and structs.
The CCodeUnit also handles common code shared by its unit across functions etc.
- Dropping 'static initializer', as its no more required
due to simplified `JVMUtil_NewDirectByteBufferCopy()` variant.
- Revised Struct Mapping:
- Pure Java implementation to map primitive and struct fields within a struct
by utilizing ElementBuffer.
Only 'Function Pointer' fields within a struct require native code.
Exposes `static boolean usesNativeCode()` to query whether native code is used/required.
- Transparent native memory address API
Expose `long getDirectBufferAddress()` and `static TK_Struct derefPointer(long addr)`,
allowing to
- pass the native struct-pointer with native code
- reconstruct the struct from a native struct-pointer
- have a fully functional `TK_Struct.derefPointer(struct.getDirectBufferAddress())` cycle.
- Add 'boolean is<Val>Null() to query whether a pointer (array) is NULL
- *Changed* array get/set method for more flexibility alike `System.arraycopy(src, srcPos, dest, destPos, len)`,
where 'src' is being dropped for the getter and 'dest' is being dropped for the setter
as both objects are reflected by the struct instance.
- *Changed* `get<Val>ArrayLength()` -> `get<Val>ElemCount()` for clarity
- Considering all ConstElemCount values with config 'ReturnedArrayLength <int>'
to be owned by native code -> NativeOwnership -> Not changing the underlying memory region!
JavaOwnership is considered for all pointer-arrays not of NativeOwnership.
Hence any setter on a NativeOwnership pointer-array will fail with non-matching elem-count.
- Add 'release<Val>()' for JavaOwnership pointer-arrays,
allowing to release the Java owned native memory incl. null-ing pointer and set<Val>ElemCount(0).
- Support setter for 'const <type>*' w/ JavaOwnership, i.e. pointer to const value of a primitive or struct,
setter and getter using pointer to array or single element in general.
- Added Config `ImmutableAccess symbol` to disable all setter for whole struct or a field
- Added Config `MaxOneElement symbol` to restrict a pointer to maximum one element and unset
initial value (zero elements)
- Added Config `ReturnsStringOnly symbol` to restrict mapping only to a Java String,
dropping the ByteBuffer variant for 'char'
- String mapping default is UTF-8 and can be read and set via [get|set]Charset(..) per class.
- Dynamic string length retrieval in case no `ReturnedArrayLength` has been configured
has changed from `strlen()` to `strnlen(aptr, max_len)` to be on the safe site.
The maximum length default is 8192 bytes and can be read and set via [get|set]MaxStrnlen(..) per class.
FIXME: strnlen(..) using EOS byte non-functional for non 8-bit codecs like UTF-8, US-ASCII.
This is due to e.g. UTF-16 doesn't use an EOS byte, but interprets it as part of a code point.
- TODO: Perhaps a few more unit tests
- TODO: Allow plain 'int' to be mapped in structs IFF their size is same for all MachineDescriptions used.
Currently this is the case -> 4 bytes like int32_t.
Diffstat (limited to 'doc/GlueGen_Mapping.md')
-rw-r--r-- | doc/GlueGen_Mapping.md | 402 |
1 files changed, 402 insertions, 0 deletions
diff --git a/doc/GlueGen_Mapping.md b/doc/GlueGen_Mapping.md new file mode 100644 index 0000000..dcdb51a --- /dev/null +++ b/doc/GlueGen_Mapping.md @@ -0,0 +1,402 @@ +<style> +table, th, td { + border: 1px solid black; +} +</style> + +# GlueGen Native Data & Function Mapping for Java™ + +## References + +- [GlueGen Git Repo](https://jogamp.org/cgit/gluegen.git/about/) +- [GlueGen Java™ API-Doc](https://jogamp.org/deployment/jogamp-next/javadoc/gluegen/javadoc/) +- [GlueGen Manual](https://jogamp.org/gluegen/doc/manual/) +- [GlueGen Project Page](https://jogamp.org/gluegen/www/) +- [How To Build](https://jogamp.org/gluegen/doc/HowToBuild.html) + +## Overview +[GlueGen](https://jogamp.org/gluegen/www/) is a compiler for function and data-structure declarations, +generating Java and JNI C code offline at compile time +and allows using native libraries within your Java application. + +It reads ANSI C header files +and separate configuration files which provide control over many +aspects of the glue code generation. GlueGen uses a complete ANSI C +parser and an internal representation (IR) capable of representing all +C types to represent the APIs for which it generates interfaces. It +has the ability to perform significant transformations on the IR +before glue code emission. + +GlueGen can produce native foreign function bindings to Java as well as +map native data structures to be fully accessible from Java including +potential calls to embedded function pointer. + +GlueGen is also capable to bind even low-level APIs such as the Java Native Interface (JNI) and +the AWT Native Interface (JAWT) back up to the Java programming language. + +GlueGen utilizes [JCPP](https://jogamp.org/cgit/jcpp.git/about/), migrated C preprocessor written in Java. + +GlueGen is used for the [JogAmp](https://jogamp.org) projects +[JOAL](https://jogamp.org/cgit/joal.git/about/), +[JOGL](https://jogamp.org/cgit/jogl.git/about/) and +[JOCL](https://jogamp.org/cgit/jocl.git/). + +GlueGen is part of [the JogAmp project](https://jogamp.org). + +## Primitive Mapping + +Gluegen has build-in types (terminal symbols) for: + +| type | java-bits | native-bits <br> x32 | native bits <br> x64 | type | signed | origin | +|:-----|:----------|:-----------------------|:---------------------|:--------|:-------|:-------| +| void | 0 | 0 | 0 | void | void | ANSI-C | +| char | 8 | 8 | 8 | integer | any | ANSI-C | +| short | 16 | 16 | 16 | integer | any | ANSI-C | +| int | 32 | 32 | 32 | integer | any | ANSI-C | +| long | 64 | 32 | **32**† | integer | any | ANSI-C - Windows | +| long | 64 | 32 | **64** | integer | any | ANSI-C - Unix | +| float | 32 | 32 | 32 | float | signed | ANSI-C | +| double | 64 | 64 | 64 | double | signed | ANSI-C | +| \_\_int32 | 32 | 32 | 32 | integer | any | windows | +| \_\_int64 | 64 | 64 | 64 | integer | any | windows | +| int8\_t | 8 | 8 | 8 | integer | signed | stdint.h | +| uint8\_t | 8 | 8 | 8 | integer | unsigned | stdint.h | +| int16\_t | 16 | 16 | 16 | integer | signed | stdint.h | +| uint16\_t | 16 | 16 | 16 | integer | unsigned | stdint.h | +| int32\_t | 32 | 32 | 32 | integer | signed | stdint.h | +| uint32\_t | 32 | 32 | 32 | integer | unsigned | stdint.h | +| int64\_t | 64 | 64 | 64 | integer | signed | stdint.h | +| uint64\_t | 64 | 64 | 64 | integer | unsigned | stdint.h | +| intptr\_t | 64 | 32 | 64 | integer | signed | stdint.h | +| uintptr\_t | 64 | 32 | 64 | integer | unsigned | stdint.h | +| ptrdiff\_t | 64 | 32 | 64 | integer | signed | stddef.h | +| size\_t | 64 | 32 | 64 | integer | unsigned | stddef.h | +| wchar\_t | 32 | 32 | 32 | integer | signed | stddef.h | + +**Warning:** Try to avoid unspecified bit sized types, especially **long**, since it differs on Unix and Windows! +**Notes:** + +* † Type **long** will result in broken code on Windows, since we don't differentiate the OS and it's bit size is ambiguous. +* Anonymous void-pointer _void\*_ are mapped to NIO _Buffer_. +* Pointers to pointer-size types like _intptr\_t\*_, _uintptr\_t\*_, _ptrdiff\_t\*_ and _size\_t\*_ are mapped to _PointerBuffer_, to reflect the architecture depending storage size. + +### String Mapping + +#### Function return String values +Function return values are currently mapped from `char*` to Java String using *UTF-8* +via JNI function +> `jstring NewStringUTF(JNIEnv *env, const char *bytes)` + +*FIXME*: This might need more flexibility in case UTF-8 is not suitable for 8-bit wide `char` mappings +or wide characters, e.g. for UTF-16 needs to be supported. + +#### Function argument String values +Function argument values are either mapped from `char*` to Java String using *UTF-8* +via JNI function +> `const char * GetStringUTFChars(JNIEnv *env, jstring string, jboolean *isCopy)`. + +Alternatively, if a 16-bit wide *character* type has been detected, i.e. *short*, +the native *character* are mapped to Java using *UTF-16* +via JNI function +> `void GetStringRegion(JNIEnv *env, jstring str, jsize start, jsize len, jchar *buf)`. + + +#### Struct String mapping + +String value mapping for `Struct` fields is performed solely from the Java side using *Charset* and is hence most flexible. + +By default, *UTF-8* is being used for getter and setter of String values. +The *Struct* class provides two methods to get and set the used *Charset* for conversion +``` + /** Returns the Charset for this class's String mapping, default is StandardCharsets.UTF_8. */ + public static Charset getCharset() { return _charset; }; + + /** Sets the Charset for this class's String mapping, default is StandardCharsets.UTF_8. */ + public static void setCharset(Charset cs) { _charset = cs; } + +``` + +In case the String length has not been configured via `ReturnedArrayLength`, +it will be dynamically calculated via `strnlen(aptr, max_len)`. +The maximum length default for the `strnlen(..)` operation is 8192 bytes and can be get and set using: +``` + /** Returns the maximum number of bytes to read to determine native string length using `strnlen(..)`, default is 8192. */ + public static int getMaxStrnlen() { return _max_strnlen; }; + + /** Sets the maximum number of bytes to read to determine native string length using `strnlen(..)`, default is 8192. */ + public static void setMaxStrnlen(int v) { _max_strnlen = v; } +``` +*FIXME*: This only works reliable using an 8-bit Charset encoding, e.g. the default *UTF-8*. + +### Alignment for Compound Data + +In general, depending on CPU and it's configuration (OS), alignment is set up for each type (char, short, int, long, ..). + +Compounds (structures) are aligned naturally, i.e. their inner components are aligned +and are itself aligned to it's largest element. + +See: + +* [Wikipedia Data Structure Alignment](http://en.wikipedia.org/wiki/Data_structure_alignment) +* [Wikipedia Data Structure Alignment - Padding](http://en.wikipedia.org/wiki/Data_structure_alignment#Data_structure_padding) +* [Viva64 Data Alignment](http://www.viva64.com/en/l/0021/) +* [Apple: Darwin 64bit Porting - Data Type Size & Alignment](http://developer.apple.com/library/mac/#documentation/Darwin/Conceptual/64bitPorting/transition/transition.html#//apple_ref/doc/uid/TP40001064-CH207-SW1) + +#### Simple alignment arithmetic + +Modulo operation, where the 2nd handles the case offset == alignment: + +> padding = ( alignment - ( offset % alignment ) ) % alignment ; +> aligned\_offset = offset + padding ; + +Optimization utilizing alignment as a multiple of 2 `-> x % 2n == x & ( 2n - 1 )` + +> remainder = offset & ( alignment - 1 ) ; +> padding = ( remainder > 0 ) ? alignment - remainder : 0 ; +> aligned\_offset = offset + padding ; + +Without branching, using the 2nd modulo operation for the case offset == alignment: + +> padding = ( alignment - ( offset & ( alignment - 1 ) ) ) & ( alignment - 1 ) ; +> aligned\_offset = offset + padding ; + +See `com.jogamp.gluegen.cgram.types.SizeThunk.align(..)`. + +#### Type Size & Alignment for x86, x86\_64, armv6l-32bit-eabi and Window(mingw/mingw64) + +Runtime query is implemented as follows: +``` + typedef struct { + char fill; // nibble one byte + // padding to align s1: padding_0 + type_t s1; // + } test_struct_type_t; + + padding_0 = sizeof(test_struct_type_t) - sizeof(type_t) - sizeof(char) ; + alignmentOf(type_t) = sizeof(test_struct_type_t) - sizeof(type_t) ; +``` + +| type | size <br> *32 bit* | alignment <br> *32 bit* | size <br> *64 bit* | alignment <br> *64 bit* | +|:------------|:-------------------|:------------------------|:-------------------|:------------------------| +| char | 1 | 1 | 1 | 1 | +| short | 2 | 2 | 2 | 2 | +| int | 4 | 4 | 4 | 4 | +| float | 4 | 4 | 4 | 4 | +| long | 4 | 4 | 8†,4∗ | 8†,4∗ | +| pointer | 4 | 4 | 8 | 8 | +| long long | 8 | 4†,8∗+ | 8 | 8 | +| double | 8 | 4†,8∗+ | 8 | 8 | +| long double | 12†∗,8+,16\- | 4†∗,8+,16\- | 16 | 16 | + +† Linux, Darwin ++armv7l-eabi +\- MacOsX-32bit-gcc4 +∗ Windows + +## Struct Mapping +A *Struct* is a C compound type declaration, which can be mapped to a Java class. + +A *Struct* may utilize the following data types for its fields +* *Primitive*, i.e. *char*, *int32_t*, ... + * See [*Primitive Mapping*](#primitive-mapping) above. + * See [*String Mapping*](#string-mapping) above. +* *Struct*, i.e. another compound variable +* *Function Pointer*, a *typedef*'ed and set callable function pointer + +A field may be a direct aggregation, i.e. instance, within the struct including an array +or a reference to a single element or array via a pointer. + +Both, *primitive* and *struct* field type mappings only produce pure Java code, utilizing the *GlueGen Runtime*. +Hence no additional native code must be compiled nor a resulting additional library loaded to use the mapping. + +Only when mapping *function-pointer* within *structs*, additional native glue-code is produced to +call the underlying native function which has to be compiled and its library loaded. + +The generated method `public static boolean usesNativeCode()` can be used to validate +whether the produced Java class requires a corresponding library for additional native code. + +### GlueGen Struct Settings + +#### **ImmutableAccess** *symbol* +Immutable access can be set for a whole struct or a single field of a struct. + +Immutable access will simply suppress generating setters in the Java code +and hence also reduces the footprint of the generated Java class for such struct. + +* `ImmutableAccess TK_Struct` + + Immutable access for the whole struct `TK_Struct + + Sets pseudo-code flag *ImmutableAccess*, see below. + +* `ImmutableAccess TK_Struct.val` + + Immutable access for the single field `val` within struct `TK_Struct` + + Sets pseudo-code flag *ImmutableAccess*, see below. + +#### **MaxOneElement** *symbol* +* `MaxOneElement TK_Struct.val` + + Sets field pointer `val` + to point to a array with a maximum of one element and unset initial value (zero elements). + + Sets pseudo-code flag *MaxOneElement*, see below. + +#### **ReturnedArrayLength** *symbol* *expression* +* `ReturnedArrayLength TK_Struct.val 3` + + Sets field pointer `val` to point to a array with three elements. + + Sets pseudo-code flag *ConstElemCount*, see below. + + Having set *ConstElemCount* also implies *native ownership* for a *Pointer* referenced *native* memory. + +* `ReturnedArrayLength TK_Struct.val 1` + + Sets field pointer `val` to point to a array with one element. + + Sets pseudo-code flags *ConstElemCount* and *MaxOneElement*, see below. + + Having set *ConstElemCount* also implies *native ownership* for a *Pointer* referenced *native* memory. + +* `ReturnedArrayLength TK_Struct.val getValElements()` + + Sets field pointer `val` to point to a array with a variable length as described by the + field `valElements` retrievable via its getter `getValElements()`. + + Sets pseudo-code flag *VariaElemCount*, see below. + +#### **ReturnsString** *symbol* +A direct C code `char` array or indirect array via pointer can be interpreted as a Java `String`. + +* `ReturnsString TK_Struct.name` + + Sets field char-array or char-pointer `name` to be additionally interpreted as a Java `String`. + Besides the `byte[]` and `ByteBuffer` getter and setter variants, a `String` variant will be added. + + Sets pseudo-code flags *String*, see below. + + See [*String Mapping*](#string-mapping) above. + +#### **ReturnsStringOnly** *symbol* + +* `ReturnsStringOnly TK_Struct.name` + + Sets field char-array or char-pointer `name` to be exclusively interpreted as a Java `String`. + Instead of the `byte[]` and `ByteBuffer` getter and setter variants, a `String` variant will be produced. + + Sets pseudo-code flags *StringOnly*, see below. + + See [*String Mapping*](#string-mapping) above. + +### Struct Mapping Notes + +* *ConstElemCount* via **ReturnedArrayLength \<int\>** implies *native ownership* for a *Pointer* referenced *native* memory + if the expression is constant. Otherwise the *native* memory has *java ownership*. + See [ReturnedArrayLength Setting](#returnedarraylength-symbol-expression) above. + +* To release native memory with *java ownership*, i.e. a native ByteBuffer, `releaseVal()` can be used. + +* To shrink a *Pointer* & *VariaElemCount* pointer-array elemCount size with *java ownership* , + the memory must be cleared with `releaseVal()` first. This is due to `setVal(src, srcPos, destPos, len)` + reusing the existing memory in case `destPos + len < elemCount`. + +### Struct Java Signature Table + +Please find below signature table as generated by the *C Declaration* including its *C Modifier*, +e.g. `const` for constant, `[const]` for const and non-const and `empty` for non-const (variable). + +Further, the *GlueGen Setting* (see above) impacts the code generation as well. + +Below table demonstrates *primitive* types being mapped within a `struct` named `TK_Struct`. +A similar mapping is produced for `struct` types, i.e. *compounds*. + + +| C Mod | C Declaration | Java Setter | Java Getter | GlueGen Setting | Ownership | Remarks | +|:--------|:--------------|:-----------------------------------------------------------------|:--------------------------------------|:-----------------------------------------------|:----------|:----------| +| | | | static boolean usesNativeCode() | | | Java, static, <br> *true* if using native code | +| | | | static int size() | | | Java, static, <br> native size in bytes | +| | | | static TK_Struct create() | | | Java, static ctor | +| | | | static TK_Struct create(ByteBuffer) | | | Java, static ctor <br> w/ existing ByteBuffer | +| | | | static TK_Struct derefPointer(long addr) | | | Java, static ctor <br> dereferencing ByteBuffer <br> at native address of size() | +| | | | ByteBuffer getBuffer() | | | Java, <br> underlying ByteBuffer | +| | | | long getDirectBufferAddress() | | | Java, native address <br> of underlying getBuffer() | +| | int32_t val | setVal(int v) | int getVal() | | Static | | +| const | int32_t val | *none* | int getVal() | | Static | Read only | +| | int32_t val | *none* | int getVal() | **ImmutableAccess** | Static | Read only | +| [const] | int32_t* val | setVal(int v) <br> releaseVal() | int getVal() <br> boolean isValNull() <br> int getValElemCount() | **MaxOneElement** | Java | Starts w/ null elements,<br>max 1 element | +| const | int32_t* val | *none* | int getVal() <br> boolean isValNull() <br> static int getValElemCount() | **ReturnedArrayLength 1** | Native | Const element count 1 | +| | int32_t* val | setVal(int v) | int getVal() <br> boolean isValNull() <br> static int getValElemCount() | **ReturnedArrayLength 1** | Native | Const element count 1 | | +| | int32_t val[3]| setVal(int[] src, int srcPos, int destPos, int len) | IntBuffer getVal() <br> int[] getVal(int srcPos, int[] dest, int destPos, int len) | | Static | | +| const | int32_t val[3]| *none* | IntBuffer getVal() <br> int[] getVal(int srcPos, int[] dest, int destPos, int len) | | Static | Read only | +| const | int32_t* val | *none* | IntBuffer getVal() <br> int[] getVal(int srcPos, int[] dest, int destPos, int len) <br> boolean isValNull() <br> static int getValElemCount() | **ReturnedArrayLength 3** | Native | Read only <br> Const element count 3 | +| | int32_t* val | setVal(int[] src, int srcPos, int destPos, int len) | IntBuffer getVal() <br> int[] getVal(int srcPos, int[] dest, int destPos, int len) <br> boolean isValNull() <br> static int getValElemCount() | **ReturnedArrayLength 3** | Native | Const element count 3 | +| [const] | int32_t* val | setVal(int[] src, int srcPos, int destPos, int len) <br> releaseVal() | IntBuffer getVal() <br> int[] getVal(int srcPos, int[] dest, int destPos, int len) <br> boolean isValNull() <br> int getValElemCount() | | Java | Starts w/ null elements | +| [const] | int32_t* val | setVal(int[] src, int srcPos, int destPos, int len) <br> releaseVal() | IntBuffer getVal() <br> int[] getVal(int srcPos, int[] dest, int destPos, int len) <br> boolean isValNull() | **ReturnedArrayLength getValCount()** | *Ambiguous* | Variable element count<br>using field *valCount*,<br>which has getter and setter | +| [const] | char* name | setName(String srcVal) <br> releaseVal() | String getName() <br> boolean isNameNull() <br> int getNameElemCount() | **ReturnsStringOnly** | Java | String only, w/ EOS | +| [const] | char* name | setName(String srcVal) <br> setName(byte[] src, int srcPos, int destPos, int len) <br> releaseVal() | String getNameAsString() <br> ByteBuffer getName() <br> boolean isNameNull() <br> int getNameElemCount() | **ReturnsString** | Java | String and byte access, w/ EOS| + +### Struct Setter Pseudo-Code + +* *ImmutableAccess*: Drops setter, immutable +* *Pointer* & *ConstValue* & *ConstElemCount*: Drops setter, native ownership on const-value +* *Array* & *ConstValue* : Drops setter, const-value array +* *Primitive* + * Single aggregated instance + * Store value within *native* memory + * *Array* | *Pointer* + * *MaxOneElement* + * *Pointer* + * *ConstValue*: Allocate new memory and store value + * *VariaValue*: + * *ConstElemCount*: Reuse *native* memory and store value with matching *elemCount 1*, otherwise Exception + * *VariaElemCount*: Reuse *native* memory and store value with matching *elemCount 1*, otherwise allocates new memory (had *elemCount 0*) + * *Array* & *VariaValue*: Reuse *native* memory and store value (has const *elemCount 1*) + * *else*: *SKIP* setter for const single-primitive array + * *AnyElementCount* + * *String* & *isByteBuffer* & *Pointer* + * *ConstElemCount*: Reuse *native* memory and store UTF-8 bytes with EOS with matching *elemCount*, otherwise Exception + * *StringOnly*: End, no more setter for this field, otherwise continue + * *VariaElemCount*: Allocate new *native* memory and store UTF-8 bytes with EOS + * *StringOnly*: End, no more setter for this field, otherwise continue + * *ConstValue* + * *Pointer* + * *VariaElemCount*: Allocates new *native* memory and store value + * *else*: *SKIP* setter for const primitive array + * *Array* | *ConstElemCount*: Reuse *native* memory and store value with <= *elemCount*, otherwise Exception + * *Pointer* & *VariaElemCount*: Reuse *native* memory and store value with <= *elemCount*, otherwise allocate new *native* memory +* *Struct* ... + + +## Platform Header Files + +GlueGen provides convenient platform headers, +which can be included in your C header files for native compilation and GlueGen code generation. + +Example: +``` + #include <gluegen_stdint.h> + #include <gluegen_stddef.h> + + uint64_t test64; + size_t size1; + ptrdiff_t ptr1; +``` + +To compile this file you have to include the following folder to your compilers system includes, ie `-I`: +``` + gluegen/make/stub_includes/platform +``` + +To generate code for this file you have to include the following folder to your GlueGen `includeRefid` element: +``` + gluegen/make/stub_includes/gluegen +``` + +## Pre-Defined Macros + +To identity a GlueGen code generation run, GlueGen defines the following macros: +``` + #define __GLUEGEN__ 2 +``` |