Matrix4f Perf: Enhance invert(), Drop (test) load on Matrix4f.mul(Matrix4f) for fair and realistic numbers - Both mul() ops faster than FloatUtil

Enhanced invert() of Matrix4f* and FloatUtil: Use 1f/det factor for burst scale. Enhanced Matrix4f.invert(..): Use factored-out mulScale() to deliver the scale, giving a good 10% advantage on aarch64 and amd64. Brings Matrix4f.invert(..) on par w/ FloatUtil, on aarch64 even a 14% advantage. +++ TestMatrix4f02MulNOUI added an additional Matrix4f.load() to the mul(Matrix4f) loop test, which surely is an extra burden and not realistic as the mul(Matrix4f, Matrix4f) and FloatUtil pendants also don't count loading a value. Matrix4f.mul(Matrix4f) shall be used to utilize an already stored value anyways. Matrix4f.mul(Matrix4f) didn't really exist in FloatUtil. Same is true for Matrix4f.invert(), re-grouped order, i.e. pushing the non-arg variant last. +++ Revised performance numbers from commit 15e60161787224e85172685f74dc0ac195969b51 AMD64 + OpenJDK17 - FloatUtil.multMatrix(a, a_off, b, b_off, dest) is considerable slower than all - Matrix4f.mul(a, b) roughly ~10% faster than FloatUtil.multMatrix(a, b, dest) - Matrix4f.mul(b) roughly ~18% faster than FloatUtil.multMatrix(a, b, dest) (*) - Matrix4f.invert(a) roughly ~ 2% faster than FloatUtil.invertMatrix(..) - Matrix4f.invert() roughly ~ 4% slower than FloatUtil.invertMatrix(..) (*) - Launched: nice -19 scripts/tests-x64.sh RaspberryPi 4b aarch64 + OpenJDK17 - FloatUtil.multMatrix(a, a_off, b, b_off, dest) is considerable slower than all - Matrix4f.mul(a, b) roughly ~ 9% faster than FloatUtil.multMatrix(a, b, dest) - Matrix4f.mul(b) roughly ~14% faster than FloatUtil.multMatrix(a, b, dest) (*) - Matrix4f.invert(a) roughly ~14% faster than FloatUtil.invertMatrix(..) - Matrix4f.invert() roughly ~12% faster than FloatUtil.invertMatrix(..) (*) - Launched: nice -19 scripts/tests-linux-aarch64.sh (*) not a true comparison in feature, as operating on 'this' matrix values for one argument, unavailable to FloatUtil. Conclusion - Matrix4f.mul(..) is considerable faster! - Matrix4f.invert(..) faster, esp on aarch64 And additional Matrix4fb tests using float[16] similar to FloatUtil also demonstrates less performance compared to Matrix4f using dedicated float fields.
author: Sven Gothel <[email protected]> 2023-04-05 23:36:25 +0200
committer: Sven Gothel <[email protected]> 2023-04-05 23:36:25 +0200
commit: 10b60e10ece3cbc3e0b8a68ac73229371530e0ba (patch)
tree: db89ceda1867ca3a42b45c18ab8f42895877c452 /make/scripts/tests.sh
parent: 24113f8e3452df8c8bb9e6136fa12bfed3bcc312 (diff)
1 files changed, 2 insertions, 2 deletions
diff --git a/make/scripts/tests.sh b/make/scripts/tests.sh
index 6340a7a1c..59271f7f1 100644
--- a/make/scripts/tests.sh
+++ b/make/scripts/tests.sh
@@ -578,7 +578,7 @@ function testawtswt() {
 #testnoawt com.jogamp.opengl.test.junit.jogl.math.TestFloatUtil03InversionNOUI $*
 #testnoawt com.jogamp.opengl.test.junit.jogl.math.TestMatrix4f01NOUI $*
 #testnoawt com.jogamp.opengl.test.junit.jogl.math.TestMatrix4f02MulNOUI $*
-#testnoawt com.jogamp.opengl.test.junit.jogl.math.TestMatrix4f03InversionNOUI $*
+testnoawt com.jogamp.opengl.test.junit.jogl.math.TestMatrix4f03InversionNOUI $*
 #testnoawt com.jogamp.opengl.test.junit.jogl.math.TestPMVMatrix01NEWT $*
 #testnoawt com.jogamp.opengl.test.junit.jogl.math.TestPMVMatrix02NOUI $*
 #testnoawt com.jogamp.opengl.test.junit.jogl.math.TestPMVMatrix03NOUI $*
@@ -979,7 +979,7 @@ function testawtswt() {
 #testnoawt com.jogamp.opengl.demos.graph.ui.UISceneDemo02 $*
 #testnoawt com.jogamp.opengl.demos.graph.ui.UISceneDemo03 $*
 #testnoawt com.jogamp.opengl.demos.graph.ui.UISceneDemo10 $*
-testnoawt com.jogamp.opengl.demos.graph.ui.UISceneDemo11 $*
+#testnoawt com.jogamp.opengl.demos.graph.ui.UISceneDemo11 $*
 #testnoawt com.jogamp.opengl.demos.graph.ui.UISceneDemo20 $*
 
 #testnoawt com.jogamp.opengl.demos.av.MovieCube $*
author	Sven Gothel <[email protected]>	2023-04-05 23:36:25 +0200
committer	Sven Gothel <[email protected]>	2023-04-05 23:36:25 +0200
commit	10b60e10ece3cbc3e0b8a68ac73229371530e0ba (patch)
tree	db89ceda1867ca3a42b45c18ab8f42895877c452 /make/scripts/tests.sh
parent	24113f8e3452df8c8bb9e6136fa12bfed3bcc312 (diff)