From dba77ef988b43a4b7e86526d2d413e8f3da6f923 Mon Sep 17 00:00:00 2001
From: veclav talica
Date: Fri, 16 Feb 2024 15:13:16 +0500
Subject: [PATCH] firefox-wasm-tail-call-benchmark

---
 .../firefox-wasm-tail-call-benchmark/page.mmd | 58 +++++++++++++++++++
 1 file changed, 58 insertions(+)
 create mode 100644 articles/firefox-wasm-tail-call-benchmark/page.mmd

diff --git a/articles/firefox-wasm-tail-call-benchmark/page.mmd b/articles/firefox-wasm-tail-call-benchmark/page.mmd
new file mode 100644
index 0000000..5a98ebf
--- /dev/null
+++ b/articles/firefox-wasm-tail-call-benchmark/page.mmd
@@ -0,0 +1,58 @@
Title: Testing Firefox Wasm Tail Call
Brief: Or why assumptions are not always correct.
Date: 1708076705
Tags: Wasm, Interpreters
CSS: /style.css

### Lore ###

Interpreting comes at a cost: the more you nest, the more complex things become.
It's especially true on the web, where any user program already sits on layers and layers
of interfaces. It gets pretty funny, I can't even run a ZX Spectrum emulator written in JavaScript at more than a few frames per second.

A lot of software targeting the web ships its own languages and interpreters (such as Godot with GDScript), and in realtime, simulation-intensive cases the overhead does matter.

One of the things often suggested for improving interpreter performance is `tail calling`,
and it works empirically on native platforms. [Check this post](https://mort.coffee/home/fast-interpreters/).

And so I wondered, could it work on the Wasm platform? After all, Firefox recently [pushed support](https://bugzilla.mozilla.org/show_bug.cgi?id=1846789) for the [experimental spec](https://github.com/WebAssembly/tail-call/blob/main/proposals/tail-call/Overview.md) of it.

### Results ###

I based the test interpreter on the `fast-interpreters` post linked above.
Sources are available on [github](https://github.com/quantumedbox/wasm-tail-call-interpreter-benchmark).
It does nothing but increment a counter up to 100000000,
which makes it a relevant test case for nothing but instruction decoding, which is exactly what we are measuring here.

First, native:
```
time ./jump-table

real 0m3,094s
user 0m3,082s
sys 0m0,012s

time ./tail-call

real 0m2,491s
user 0m2,485s
sys 0m0,005s

```

A run time decrease of `19.5%`! Formidable.

But on the web it gets more interesting:
```
tail-call.wasm (cold): 10874ms - timer ended

jump-table.wasm (cold): 6610ms - timer ended

```

Tail calls are actually slower in this case (the jump-table build finishes in `39.2%` less time; put differently, tail calls take about `64.5%` longer), and I'm not yet sure why.
Intuition proven wrong, - but me testing it first proved useful :)

Note: I'm running this on an amd64 CPU, stable Firefox 122.0, compiled with Zig's bundled Clang version 16.

It seems that JIT compilation, folding everything down to Wasm bytecode, is the way to go on the web.
But overall, with the plain jump-table the overhead is a *mere* 113.6% over native, which I would say isn't critical for a lot of cases, especially if the interpreter is intended mostly as an interface adapter, which is the case with GDScript.