Zero steady-state allocation: the arena model

A bump-slab arena with pointer rewind replaces the malloc/free calls made during each query.

Setup

The benchmark is bench_aggregate: 5,000 rows, 500 iterations of a GROUP BY query. Latency is measured as wall-clock time through the PostgreSQL wire protocol.

CREATE TABLE sales (
    id INT, region TEXT, product TEXT,
    quantity INT, amount NUMERIC(10,2)
);
-- 5,000 rows inserted

SELECT region, SUM(amount)
FROM sales
GROUP BY region;
-- repeated 500 times

Problem

The aggregate workload runs 500 iterations of the same query. Each iteration allocates memory for parse trees, executor state, hash table buckets, result rows, and wire buffers. Under concurrent load with 8 threads, the system allocator’s lock becomes a serialization point.

Cause

The traditional execution path called malloc() and free() at every level: per-query for the parse tree, per-row for expression evaluation temporaries, per-group for hash table entries, per-result for wire serialization buffers. Because some of these allocations happen once per row, a single iteration of the aggregate query on 5,000 rows could issue thousands of allocation calls. Multiply by 500 iterations and the allocator overhead becomes measurable.
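To make the per-row cost concrete, here is a minimal sketch of the pattern described above. It is not the engine's code: counted_malloc and sum_with_per_row_temps are hypothetical names, and the single temporary per row stands in for whatever an expression evaluator would allocate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counter for illustration; not part of the engine. */
static size_t g_malloc_calls = 0;

static void *counted_malloc(size_t size) {
    g_malloc_calls++;
    return malloc(size);
}

/* Sum `amounts`, allocating and freeing one temporary per row, the way a
 * malloc-per-row expression evaluator would. */
static double sum_with_per_row_temps(const double *amounts, size_t n_rows) {
    double total = 0.0;
    for (size_t i = 0; i < n_rows; i++) {
        double *tmp = counted_malloc(sizeof *tmp);  /* per-row temporary */
        if (tmp == NULL) abort();
        *tmp = amounts[i];
        total += *tmp;
        free(tmp);                                  /* matching free per row */
    }
    return total;
}
```

With one temporary per row, g_malloc_calls grows by exactly n_rows on every call, before counting any per-group or per-result allocations.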

More critically, interleaved malloc() and free() calls of varying sizes fragment the heap. After thousands of queries, the allocator spends increasing time searching free lists. Under AddressSanitizer (which the test suite runs under), each allocation also pays for redzone poisoning, and each freed block is held in quarantine before it can be reused.

Fix

The engine uses a bump-slab arena (arena.h). Each connection owns a query_arena with a scratch region. All per-query allocations—parse nodes, plan nodes, hash table buckets, sort arrays, intermediate column blocks—come from bump_alloc(scratch, size), which advances a pointer and returns the previous position.
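The pointer-advance step can be sketched as follows. This is a minimal single-slab illustration under assumptions, not the contents of arena.h: the bump_region struct, the 16-byte alignment, the bump_reset helper, and the exact bump_alloc signature are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BUMP_ALIGN 16  /* assumed alignment; must be a power of two */

/* Hypothetical single-slab region; the real layout may differ. */
typedef struct {
    unsigned char *base;  /* start of the slab */
    size_t used;          /* offset of the next free byte */
    size_t cap;           /* slab size in bytes */
} bump_region;

/* Round n up to the next multiple of BUMP_ALIGN. */
static size_t align_up(size_t n) {
    return (n + BUMP_ALIGN - 1) & ~(size_t)(BUMP_ALIGN - 1);
}

/* Return the current (aligned) position and advance the offset past it.
 * Returns NULL when the slab is full; a chained arena would add a slab. */
static void *bump_alloc(bump_region *r, size_t size) {
    size_t start = align_up(r->used);
    if (start > r->cap || size > r->cap - start) return NULL;
    r->used = start + size;
    return r->base + start;
}

/* Rewind: everything allocated since the last reset is released at once. */
static void bump_reset(bump_region *r) {
    r->used = 0;
}
```

An allocation is one alignment round-up, one bounds check, and one store. There is no per-block header and no free list, which is why individual blocks cannot be freed; only the whole region can be rewound.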

At query end, query_arena_reset() rewinds the pointer to zero. No free() calls. No fragmentation. No lock contention. The arena grows by allocating new slabs (64 KB each) only when needed; slabs are retained across queries so steady-state allocation is zero.

Because every per-query structure lives in arena.scratch, the only heap allocations that remain are for table storage (which persists across queries) and the slab chain itself (which grows monotonically and never shrinks).
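The slab chain and the reset step might look like the sketch below. Only the 64 KB slab size and the name query_arena_reset() come from the text; the struct layouts, arena_alloc, query_arena_destroy, the slabs_created counter, and the 16-byte rounding are assumptions made for illustration, and requests larger than one slab are simply rejected here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SLAB_SIZE   (64 * 1024)  /* 64 KB per slab, as in the text */
#define ARENA_ALIGN 16           /* assumed allocation granularity */

/* Hypothetical layout; the real structs may differ. */
typedef struct slab {
    struct slab *next;
    size_t used;           /* bytes handed out from this slab */
    unsigned char *data;   /* SLAB_SIZE bytes from malloc */
} slab;

typedef struct {
    slab *head;            /* first slab, retained across queries */
    slab *cur;             /* slab currently being bumped */
    size_t slabs_created;  /* how many slabs were ever heap-allocated */
} query_arena;

static slab *slab_new(query_arena *a) {
    slab *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->data = malloc(SLAB_SIZE);
    if (s->data == NULL) { free(s); return NULL; }
    s->next = NULL;
    s->used = 0;
    a->slabs_created++;
    return s;
}

static void *arena_alloc(query_arena *a, size_t size) {
    if (size > SLAB_SIZE) return NULL;  /* oversized requests not handled */
    size = (size + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
    if (a->cur == NULL) {
        a->head = a->cur = slab_new(a);
        if (a->cur == NULL) return NULL;
    }
    if (SLAB_SIZE - a->cur->used < size) {
        /* Reuse a retained slab if one exists; otherwise grow the chain. */
        if (a->cur->next == NULL) {
            a->cur->next = slab_new(a);
            if (a->cur->next == NULL) return NULL;
        }
        a->cur = a->cur->next;
    }
    void *p = a->cur->data + a->cur->used;
    a->cur->used += size;
    return p;
}

/* Rewind every slab and start again from the head; nothing is freed. */
static void query_arena_reset(query_arena *a) {
    for (slab *s = a->head; s != NULL; s = s->next) s->used = 0;
    a->cur = a->head;
}

/* Release the chain when the connection closes. */
static void query_arena_destroy(query_arena *a) {
    slab *s = a->head;
    while (s != NULL) {
        slab *next = s->next;
        free(s->data);
        free(s);
        s = next;
    }
    a->head = a->cur = NULL;
}
```

In this sketch, the first query that needs two slabs creates two slabs. After query_arena_reset(), an identical second query walks the retained chain and slabs_created does not change, which is the zero steady-state allocation property described above.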

Result

Same query, same machine:

Benchmark	mskql	PostgreSQL	Ratio (mskql / PostgreSQL)
aggregate (500 iter, 5K rows)	15 ms	265 ms	0.06×

The arena model is not an optimization applied to a slow path; it is the memory model. Every query, from simple lookups to complex analytical CTEs, allocates from the same bump slab and resets at the same point. The roughly 18× advantage over PostgreSQL on this workload (265 ms against 15 ms, the inverse of the 0.06× ratio above) reflects the cumulative effect of removing thousands of malloc/free pairs per query.