The benchmark is bench_expression_agg: 5,000 rows, 500 iterations
of an aggregate with an arithmetic expression in the SELECT list. Measured as
wall-clock latency through the PostgreSQL wire protocol.
CREATE TABLE sales (
id INT, region TEXT, product TEXT,
quantity INT, amount NUMERIC(10,2),
price NUMERIC(10,2)
);
-- 5,000 rows inserted
SELECT category, SUM(price * quantity)
FROM sales
GROUP BY category;
-- repeated 500 times
The expression price * quantity inside SUM() must
be evaluated for every row before aggregation. The legacy executor evaluates
this per-row: for each of 5,000 rows, call eval_expr(), which
dispatches on expression type, reads two cells, performs the multiplication,
and writes the result to a temporary cell. This is 5,000 function calls with
type dispatch per iteration, 2,500,000 across 500 iterations.
The per-row evaluator eval_expr() handles arbitrary expression
trees: nested function calls, CASE expressions, type coercions, NULL
propagation. For the common case of a simple binary operation on two columns,
this generality is overhead. The function call, type switch, cell read, cell
write, and NULL check dominate the actual arithmetic.
The plan builder recognizes simple arithmetic expressions and emits
VEC_PROJECT / VEC_FUNC_* nodes in the plan tree
(plan.c). These operate directly on col_block arrays:
VEC_FUNC_MUL_I32,
VEC_FUNC_MUL_F64 — multiply two
contiguous column arrays in a tight loop with no per-element dispatch.VEC_FUNC_ADD_*,
VEC_FUNC_SUB_*,
VEC_FUNC_DIV_* — same pattern for
other arithmetic operations.VEC_FUNC_CEIL,
VEC_FUNC_FLOOR,
VEC_FUNC_SQRT,
VEC_FUNC_SIGN_*,
VEC_FUNC_MOD_*,
VEC_FUNC_POWER — unary and binary
math functions vectorized.VEC_FUNC_CAST_F64_TO_I32,
VEC_FUNC_COALESCE_LIT — type casts
and COALESCE with literal defaults.
The vectorized path reads contiguous int32_t[] or
double[] arrays from the column block, applies the operation in
a single loop, and writes the result to a new column block. NULL handling is
done via a parallel uint8_t[] null bitmap, checked once per
block rather than per element.
Same query, same machine:
| mskql | PostgreSQL | DuckDB | |
|---|---|---|---|
| expression_agg (500 iter, 5K rows) | 15ms | 299ms | 220ms |
| vs PostgreSQL ratio | 0.05× (mskql 20× faster) | ||
| vs DuckDB ratio | 0.07× (mskql 15× faster) | ||
The VEC_PROJECT path is used automatically by the plan builder when it recognizes eligible expressions. Queries with complex expression trees (nested CASE, multi-level function composition) fall back to the per-row evaluator. The plan builder makes this decision at build time with zero runtime overhead for either path.