firefox-wasm-tail-call-benchmark
Title: Testing Firefox Wasm Tail Call
Brief: Or why assumptions are not always correct.
Date: 1708076705
Tags: Wasm, Interpreters
CSS: /style.css

### Lore ###

Interpretation comes at a cost: the more you nest, the more complex things become.
This is especially true on the web, where any user program already sits on layer upon layer
of interfaces. It gets pretty funny: I can't even run a ZX Spectrum emulator written in JavaScript at more than a few frames per second.

A lot of software targeting the web ships its own languages and interpreters (such as Godot and GDScript), and in realtime, simulation-intensive cases the overheads do matter.

One of the things often suggested for improving interpreter performance is `tail calling`,
and empirically it works on native platforms. [Check this post](https://mort.coffee/home/fast-interpreters/).

And so I wondered: could it work for the Wasm platform? After all, Firefox recently [pushed support](https://bugzilla.mozilla.org/show_bug.cgi?id=1846789) for an [experimental spec](https://github.com/WebAssembly/tail-call/blob/main/proposals/tail-call/Overview.md) of it.

### Results ###

I based the test interpreter on the `fast-interpreters` post linked above.
Sources are available on [GitHub](https://github.com/quantumedbox/wasm-tail-call-interpreter-benchmark). It does nothing but increment a counter until 100000000,
which is a relevant case for nothing but instruction decoding, which is exactly what we are testing here.

First, native:

```
time ./jump-table

real    0m3,094s
user    0m3,082s
sys     0m0,012s

time ./tail-call

real    0m2,491s
user    0m2,485s
sys     0m0,005s
```

A run-time decrease of `19.3%`! Formidable.

But on the web things get more interesting:

```
tail-call.wasm (cold): 10874ms - timer ended

jump-table.wasm (cold): 6610ms - timer ended
```

Tail calls are actually slower in this case (the jump-table build finishes in `39.2%` less time), and I'm not sure why yet.
Intuition proven wrong, but testing it first proved useful :)

Note: I'm running this on an amd64 CPU, stable Firefox 122.0, compiled with Zig's bundled Clang version 16.

It seems JIT compilation is the way to go on the web: fold everything down to Wasm bytecode instead of interpreting on top of it.

But overall, with a plain jump table the overhead over native is a *mere* `113.6%`, which I would say isn't critical for a lot of cases, especially if the interpreter is intended mostly as an interface adapter, as is the case with GDScript.