omniLua

Performance, measured.

omniLua is competitive with reference C, not faster than it, and not LuaJIT. What it can say precisely: the cost of memory safety is roughly zero, and the residual gap to C comes from value representation and dispatch. Both are recoverable, and neither is structural to safety.

1.37× wall-time geomean (dashboard workloads, stock)

0.42× table_ops_long (2.4× faster than C)

~0% safety tax (bounds + RefCell guards)

1.72× peak memory geomean (vs reference C)

§ 01

The numbers

Ratio = omniLua ÷ reference C on the same workload; lower is better, and parity is 1.00×. The headline below shows the latest benchmarked commit by workload; the trajectory charts under it plot every commit. Each chart has its own y-max and window controls for zooming (the earliest commits are pre-optimization, so zoom out to see them). Hover a point for the commit; click a legend chip to mute a series.
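The ratio and geomean definitions can be sketched in a few lines of Rust. This is a minimal illustration, not the harness code: the function names `ratio`, `speedup`, and `geomean` are hypothetical, and computing the geomean in log space is an assumption about how one would reasonably implement it.

```rust
/// Ratio as defined above: omniLua time divided by reference C time.
/// Below 1.0 means omniLua was faster; 1.0 is parity.
pub fn ratio(omnilua_secs: f64, reference_secs: f64) -> f64 {
    omnilua_secs / reference_secs
}

/// Converts a ratio into a "times faster than C" figure (its reciprocal).
pub fn speedup(ratio: f64) -> f64 {
    1.0 / ratio
}

/// Geometric mean of per-workload ratios, computed in log space.
/// Returns None for an empty slice or any non-positive / non-finite ratio.
pub fn geomean(ratios: &[f64]) -> Option<f64> {
    if ratios.is_empty() || ratios.iter().any(|r| !(r.is_finite() && *r > 0.0)) {
        return None;
    }
    let log_sum: f64 = ratios.iter().map(|r| r.ln()).sum();
    Some((log_sum / ratios.len() as f64).exp())
}
```

With the page's own figure, `speedup(0.42)` is about 2.38, which is where "2.4× faster than C" comes from. A geomean is the natural average for ratios because a 2.00× workload and a 0.50× workload average to parity rather than to 1.25×.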

Latest commit, by workload

Wall-time ratio at the most recent benchmarked commit, fastest to slowest.

Chart legend: faster than C; typical; slower (≥1.9×); reference line at parity (1.00×).

Wall-time ratio over commits

Execution time vs reference C, per workload, at every benchmarked commit.

Memory (RSS) ratio over commits

Peak resident-set vs reference C, per workload, at every benchmarked commit.

Provenance. Ratios are omniLua ÷ reference PUC-Rio Lua 5.4.7, taking the best wall-clock of interleaved ref/omniLua pairs per commit (harness/bench/compare.sh). Results are recorded to the evidence ledger and plotted live above; the standalone dashboard renders the same data with per-commit detail. The earliest commits predate optimization and show large ratios; clip them with the y-max control. At the latest benchmarked commit the wall geomean is ≈1.37× across the tracked workloads. Three workloads beat C outright: the table-bulk benchmarks, where omniLua's table representation pays off. The slowest rows are GC- and call-heavy (gc_pressure, binarytrees, closure_ops). Perf claims follow docs/MEASUREMENT_PROTOCOL.md.
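As a rough illustration of "best wall-clock of interleaved pairs", the sketch below alternates reference and candidate runs and keeps the fastest time seen for each side. It is not the contents of harness/bench/compare.sh (a shell script this page does not reproduce); the names `best_of_interleaved` and `wall_ratio` are hypothetical, and taking the minimum of each side independently is one reading of "best", stated here as an assumption.

```rust
use std::time::{Duration, Instant};

/// Times a single invocation of `run`.
fn time_once<F: FnMut()>(run: &mut F) -> Duration {
    let start = Instant::now();
    run();
    start.elapsed()
}

/// Runs `pairs` rounds, each a reference run followed by a candidate run,
/// so both sides see similar machine conditions. Returns the best
/// (reference, candidate) times, or None when `pairs` is zero.
pub fn best_of_interleaved<R: FnMut(), C: FnMut()>(
    pairs: usize,
    mut reference: R,
    mut candidate: C,
) -> Option<(Duration, Duration)> {
    let mut best: Option<(Duration, Duration)> = None;
    for _ in 0..pairs {
        let r = time_once(&mut reference);
        let c = time_once(&mut candidate);
        best = Some(match best {
            None => (r, c),
            Some((best_r, best_c)) => (best_r.min(r), best_c.min(c)),
        });
    }
    best
}

/// Wall-time ratio: candidate divided by reference, so lower is better.
pub fn wall_ratio(best_reference: Duration, best_candidate: Duration) -> f64 {
    best_candidate.as_secs_f64() / best_reference.as_secs_f64()
}
```

Interleaving matters because thermal state, cache contents, and background load drift over a benchmark session; running all reference trials first and all candidate trials second would fold that drift into the ratio.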

§ 02

Where the gap is

A common assumption is that safe Rust is slow because of bounds checks and runtime borrow guards. We ablated exactly that. It is not where the time goes.

~0%

Safety tax

Removing bounds checks and RefCell guards recovers no reliably measurable wall time. Memory safety is not the cost. The unsafe budget stays at zero outside the GC, the dynamic-library loader, and the wasm pointer ABI.
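For readers unfamiliar with the two guards being ablated, here is a minimal sketch of what each looks like in safe Rust. The `Table` type and `read_slot` function are hypothetical stand-ins, not omniLua's data structures.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a shared, mutable runtime object.
pub struct Table {
    pub array: Vec<i64>,
}

/// Reads one slot through both guards; returns None instead of
/// invoking undefined behavior when either check fails.
pub fn read_slot(table: &Rc<RefCell<Table>>, index: usize) -> Option<i64> {
    // Guard 1: RefCell verifies at run time that no mutable borrow is live.
    let guard = table.try_borrow().ok()?;
    // Guard 2: `get` bounds-checks the index before reading.
    let value = guard.array.get(index).copied();
    value
}
```

Each guard costs a compare and a well-predicted branch on the success path, which is consistent with the measurement above that removing them recovers roughly nothing.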

16 vs 8

Representation

Each Lua value is 16 bytes where C packs 8 (NaN-boxing not yet done). That wider value — copied through every register move — is the largest share of the residual ≈2.3× instruction-count gap on the hot path. It is a known, recoverable lever.
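The size difference can be seen in a small Rust sketch. Both types are illustrative assumptions rather than omniLua's actual definitions: `WideValue` is a plain tagged enum, and `BoxedValue` is a simplified NaN-boxed layout of the kind described as not yet done. The simplified box has no room for a full 64-bit integer, so a real design would need separate handling for Lua integers.

```rust
use std::mem::size_of;

/// Tagged enum: an 8-byte payload plus a discriminant, padded to 16 bytes
/// on common 64-bit targets.
#[allow(dead_code)]
#[derive(Clone, Copy)]
pub enum WideValue {
    Nil,
    Bool(bool),
    Int(i64),
    Num(f64),
    Ref(usize), // stand-in for a GC handle
}

/// NaN-boxed value: every variant lives inside one 64-bit word.
#[derive(Clone, Copy)]
pub struct BoxedValue(u64);

const BOX_MASK: u64 = 0xfff8_0000_0000_0000; // sign bit + quiet-NaN bits
const CANONICAL_NAN: u64 = 0x7ff8_0000_0000_0000;
const TAG_SHIFT: u32 = 48;
const PAYLOAD_MASK: u64 = (1 << TAG_SHIFT) - 1;
const TAG_NIL: u64 = 0;
const TAG_BOOL: u64 = 1;

impl BoxedValue {
    /// Floats are stored as raw bits; NaNs are canonicalized so that a
    /// real float never lands in the boxed bit range.
    pub fn num(x: f64) -> Self {
        BoxedValue(if x.is_nan() { CANONICAL_NAN } else { x.to_bits() })
    }
    fn boxed(tag: u64, payload: u64) -> Self {
        BoxedValue(BOX_MASK | (tag << TAG_SHIFT) | (payload & PAYLOAD_MASK))
    }
    pub fn nil() -> Self {
        Self::boxed(TAG_NIL, 0)
    }
    pub fn boolean(b: bool) -> Self {
        Self::boxed(TAG_BOOL, b as u64)
    }
    fn is_boxed(self) -> bool {
        (self.0 & BOX_MASK) == BOX_MASK
    }
    pub fn as_num(self) -> Option<f64> {
        if self.is_boxed() { None } else { Some(f64::from_bits(self.0)) }
    }
    pub fn as_bool(self) -> Option<bool> {
        let tag = (self.0 >> TAG_SHIFT) & 0x7;
        if self.is_boxed() && tag == TAG_BOOL {
            Some((self.0 & PAYLOAD_MASK) != 0)
        } else {
            None
        }
    }
}

/// Returns (wide, boxed) sizes in bytes.
pub fn value_sizes() -> (usize, usize) {
    (size_of::<WideValue>(), size_of::<BoxedValue>())
}
```

On a 64-bit target `value_sizes()` reports 16 and 8 bytes. Since an interpreter copies values on every register move, halving the value width halves the bytes moved on that path.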

dispatch

Dispatch & layout

The rest is interpreter dispatch and code layout — the gap between a match loop and C's computed-goto threading. PGO recovers part of it; the remainder tracks the representation work above.
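To make "a match loop" concrete, here is a toy register interpreter in safe Rust. The opcode set, the eight-register file, and the `run` function are all invented for illustration and do not reflect omniLua's bytecode.

```rust
/// Toy instruction set for illustration only.
#[derive(Clone, Copy)]
pub enum Op {
    LoadK { dst: u8, k: i64 },
    Add { dst: u8, a: u8, b: u8 },
    JmpIfLt { a: u8, b: u8, target: usize },
    Ret { src: u8 },
}

/// Executes `code` and returns the value named by `Ret`, or None if the
/// program runs off the end or names a register outside the file.
pub fn run(code: &[Op]) -> Option<i64> {
    let mut regs = [0i64; 8];
    let mut pc = 0usize;
    loop {
        // Every iteration funnels through this one fetch-and-match point.
        let op = *code.get(pc)?;
        pc += 1;
        match op {
            Op::LoadK { dst, k } => *regs.get_mut(dst as usize)? = k,
            Op::Add { dst, a, b } => {
                let sum = regs.get(a as usize)?.wrapping_add(*regs.get(b as usize)?);
                *regs.get_mut(dst as usize)? = sum;
            }
            Op::JmpIfLt { a, b, target } => {
                if regs.get(a as usize)? < regs.get(b as usize)? {
                    pc = target;
                }
            }
            Op::Ret { src } => return regs.get(src as usize).copied(),
        }
    }
}
```

In this shape every opcode returns to the single `match`, typically compiled to one shared indirect branch. Computed-goto threading in C instead places a separate indirect jump at the end of each handler, which gives the branch predictor per-opcode history; stable Rust has no direct equivalent, which is why the text points to PGO and layout as the available levers.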

The headline, stated plainly: the residual gap to C is representation, not safety. That is the difference between a wall the project is stuck behind and a lever it has not yet pulled.

§ 03

What it is not

This is not LuaJIT, and it is not faster than reference C in general — it is a faithful bytecode interpreter that happens to be pure, safe Rust. If you need LuaJIT-class throughput or a decades-mature binding, use mlua. omniLua's argument is reach and safety at a competitive constant factor: the same runtime native and on wasm32, with the cost of that safety measured at roughly nothing.

The benchmark harness measures the omniLua / reference-C ratio (wall + RSS), not absolute throughput — the ratio is the only fair number across machines. Methodology: harness/bench/compare.sh; perf claims follow the frozen-baseline interleaved A/B protocol in docs/MEASUREMENT_PROTOCOL.md.