diff options
author | Sven Gothel <[email protected]> | 2023-04-05 23:36:25 +0200 |
---|---|---|
committer | Sven Gothel <[email protected]> | 2023-04-05 23:36:25 +0200 |
commit | 10b60e10ece3cbc3e0b8a68ac73229371530e0ba (patch) | |
tree | db89ceda1867ca3a42b45c18ab8f42895877c452 /src/jogl/classes/com/jogamp/gluegen/opengl/GLConfiguration.java | |
parent | 24113f8e3452df8c8bb9e6136fa12bfed3bcc312 (diff) |
Matrix4f Perf: Enhance invert(), Drop (test) load on Matrix4f.mul(Matrix4f) for fair and realistic numbers - Both mul() ops faster than FloatUtil
Enhanced invert() of Matrix4f* and FloatUtil: Use 1f/det factor for burst scale.
Enhanced Matrix4f.invert(..): Use factored-out mulScale() to deliver the scale,
giving a good 10% advantage on aarch64 and amd64.
Brings Matrix4f.invert(..) on par w/ FloatUtil, on aarch64 even a 14% advantage.
+++
TestMatrix4f02MulNOUI added an additional Matrix4f.load() to the mul(Matrix4f) loop test,
which surely is an extra burden and not realistic as the mul(Matrix4f, Matrix4f) and FloatUtil
pendants also don't count loading a value.
Matrix4f.mul(Matrix4f) shall be used to utilize an already stored value anyways.
Matrix4f.mul(Matrix4f) didn't really exist in FloatUtil.
Same is true for Matrix4f.invert(), re-grouped order, i.e. pushing the non-arg variant last.
+++
Revised performance numbers from commit 15e60161787224e85172685f74dc0ac195969b51
AMD64 + OpenJDK17
- FloatUtil.multMatrix(a, a_off, b, b_off, dest) is considerable slower than all
- Matrix4f.mul(a, b) roughly ~10% faster than FloatUtil.multMatrix(a, b, dest)
- Matrix4f.mul(b) roughly ~18% faster than FloatUtil.multMatrix(a, b, dest) (*)
- Matrix4f.invert(a) roughly ~ 2% faster than FloatUtil.invertMatrix(..)
- Matrix4f.invert() roughly ~ 4% slower than FloatUtil.invertMatrix(..) (*)
- Launched: nice -19 scripts/tests-x64.sh
RaspberryPi 4b aarch64 + OpenJDK17
- FloatUtil.multMatrix(a, a_off, b, b_off, dest) is considerable slower than all
- Matrix4f.mul(a, b) roughly ~ 9% faster than FloatUtil.multMatrix(a, b, dest)
- Matrix4f.mul(b) roughly ~14% faster than FloatUtil.multMatrix(a, b, dest) (*)
- Matrix4f.invert(a) roughly ~14% faster than FloatUtil.invertMatrix(..)
- Matrix4f.invert() roughly ~12% faster than FloatUtil.invertMatrix(..) (*)
- Launched: nice -19 scripts/tests-linux-aarch64.sh
(*) not a true comparison in feature, as operating on 'this' matrix values
for one argument, unavailable to FloatUtil.
Conclusion
- Matrix4f.mul(..) is considerable faster!
- Matrix4f.invert(..) faster, esp on aarch64
And additional Matrix4fb tests using float[16] similar to FloatUtil
also demonstrates less performance compared to Matrix4f using
dedicated float fields.
Diffstat (limited to 'src/jogl/classes/com/jogamp/gluegen/opengl/GLConfiguration.java')
0 files changed, 0 insertions, 0 deletions