firefox-wasm-tail-call-benchmark

2024-02-16 15:13:16 +05:00
parent 0d401e1274
commit dba77ef988
1 changed files with 58 additions and 0 deletions
--- a/articles/firefox-wasm-tail-call-benchmark/page.mmd
+++ b/articles/firefox-wasm-tail-call-benchmark/page.mmd
@@ -0,0 +1,58 @@
+Title:  Testing Firefox Wasm Tail Call
+Brief:  Or why assumptions are not always correct.
+Date:   1708076705
+Tags:   Wasm, Interpreters
+CSS:    /style.css
+
+### Lore ###
+
+Interpreting comes at a cost: the more you nest, the more complex things become.
+It's especially true on the web, where any user program already sits on layers and layers
+of interfaces. It gets pretty funny: I can't even run a ZX Spectrum emulator written in JavaScript at more than a few frames a second.
+
+A lot of software targeting the web ships its own language and interpreter (such as Godot with GDScript), and in realtime, simulation-intensive cases the overhead does matter.
+
+One of the things often suggested for improving interpreter performance is `tail calling`.
+And it works empirically on native platforms. ![Check this post](https://mort.coffee/home/fast-interpreters/).
+
+And so I wondered, could it work for the Wasm platform? Firefox recently ![pushed support](https://bugzilla.mozilla.org/show_bug.cgi?id=1846789) for the ![experimental spec](https://github.com/WebAssembly/tail-call/blob/main/proposals/tail-call/Overview.md) of it, after all.
+
+### Results ###
+
+I based the test interpreter on the `fast-interpreters` post linked above.
+Sources are available on ![github](https://github.com/quantumedbox/wasm-tail-call-interpreter-benchmark). It does nothing but increment a counter up to 100000000,
+which is a relevant case for nothing but instruction decoding, which is what we are testing here.
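+
+For readers who haven't seen the two dispatch styles side by side, here is a minimal sketch of what is being compared. It is not the benchmark's actual source: the three opcodes, the `MUSTTAIL` macro and all function names are made up for illustration. `musttail` is a Clang statement attribute; on compilers without it the macro expands to nothing and the tail call is no longer guaranteed, so the loop limit is kept small.
+
+```c
+#include <stdint.h>
+#include <stdio.h>
+
+/* Hypothetical bytecode, for illustration only. */
+enum { OP_INC, OP_LOOP, OP_HALT };
+
+#if defined(__has_attribute)
+#  if __has_attribute(musttail)
+#    define MUSTTAIL __attribute__((musttail))
+#  endif
+#endif
+#ifndef MUSTTAIL
+#  define MUSTTAIL /* no guarantee: the stack may grow per instruction */
+#endif
+
+/* Style 1: one loop, one switch (typically compiled to a jump table). */
+static uint32_t run_switch(const uint8_t *code, uint32_t limit) {
+    const uint8_t *pc = code;
+    uint32_t acc = 0;
+    for (;;) {
+        switch (*pc++) {
+        case OP_INC:  acc++; break;
+        case OP_LOOP: if (acc < limit) pc = code; break;
+        case OP_HALT: return acc;
+        default:      return acc; /* unknown opcode: stop */
+        }
+    }
+}
+
+/* Style 2: one function per opcode, each tail-calling the next handler. */
+typedef uint32_t handler_fn(const uint8_t *code, const uint8_t *pc,
+                            uint32_t acc, uint32_t limit);
+static handler_fn op_inc, op_loop, op_halt;
+static handler_fn *const handlers[] = { op_inc, op_loop, op_halt };
+
+/* Fetch the next opcode and jump to its handler without returning here. */
+#define DISPATCH() MUSTTAIL return handlers[*pc](code, pc + 1, acc, limit)
+
+static uint32_t op_inc(const uint8_t *code, const uint8_t *pc,
+                       uint32_t acc, uint32_t limit) {
+    acc++;
+    DISPATCH();
+}
+
+static uint32_t op_loop(const uint8_t *code, const uint8_t *pc,
+                        uint32_t acc, uint32_t limit) {
+    if (acc < limit) pc = code; /* jump back to the first instruction */
+    DISPATCH();
+}
+
+static uint32_t op_halt(const uint8_t *code, const uint8_t *pc,
+                        uint32_t acc, uint32_t limit) {
+    (void)code; (void)pc; (void)limit;
+    return acc;
+}
+
+int main(void) {
+    static const uint8_t program[] = { OP_INC, OP_LOOP, OP_HALT };
+    const uint32_t limit = 1000; /* small on purpose, see the note on MUSTTAIL */
+    uint32_t a = run_switch(program, limit);
+    uint32_t b = handlers[program[0]](program, program + 1, 0, limit);
+    printf("switch=%u tail=%u\n", (unsigned)a, (unsigned)b);
+    return (a == limit && b == limit) ? 0 : 1;
+}
+```
+
+Both versions run the same program and must agree on the result; the only difference is how control gets from one instruction to the next, which is exactly the part the benchmark stresses.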
+
+First, native:
+```
+time ./jump-table
+
+real    0m3,094s
+user    0m3,082s
+sys     0m0,012s
+
+time ./tail-call
+
+real    0m2,491s
+user    0m2,485s
+sys     0m0,005s
+
+```
+
+Run time decrease of about `19.5%` (by `real` time)! Formidable.
+
+But on the web it's more interesting:
+```
+tail-call.wasm (cold): 10874ms - timer ended
+
+jump-table.wasm (cold): 6610ms - timer ended
+
+```
+
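+Tail calls are actually slower in this case: the tail-call build takes `64.5%` longer than the jump table (put the other way, the jump table finishes in `39.2%` less time), and I'm not yet sure why.
+Intuition proved wrong, but testing it first proved useful :)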
+
+Note: I'm running it on an amd64 CPU with stable Firefox 122.0; the code is compiled with Zig's Clang version 16.
+
+Seems like JIT compilation is the way to go on the web, folding everything down to Wasm bytecode.
+But overall, with a plain jump table the overhead over native is a *mere* 113.6%, which I would say isn't critical for a lot of cases, especially if the interpreter is intended mostly as an interface adapter, which is the case with GDScript.
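+
+The percentages in this post depend on which run is taken as the baseline, so here is a small sketch of the arithmetic. It uses the `real` and timer values from the runs above, converted to milliseconds; `percent_change` is a helper name made up for illustration.
+
+```c
+#include <stdio.h>
+
+/* Percent change of `value` relative to `baseline`; negative means smaller. */
+static double percent_change(double baseline, double value) {
+    return (value - baseline) / baseline * 100.0;
+}
+
+int main(void) {
+    /* Timings copied from the runs above, in milliseconds. */
+    const double native_jump = 3094.0, native_tail = 2491.0;
+    const double wasm_jump = 6610.0, wasm_tail = 10874.0;
+
+    printf("native, tail-call vs jump-table: %+.1f%%\n",
+           percent_change(native_jump, native_tail));
+    printf("wasm, tail-call vs jump-table:   %+.1f%%\n",
+           percent_change(wasm_jump, wasm_tail));
+    printf("wasm, jump-table vs tail-call:   %+.1f%%\n",
+           percent_change(wasm_tail, wasm_jump));
+    printf("jump-table, wasm vs native:      %+.1f%%\n",
+           percent_change(native_jump, wasm_jump));
+    return 0;
+}
+```
+
+Note that "slower by X%" and "faster by Y%" for the same pair of runs are different numbers, because the baseline changes.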