CTE materialization: from legacy fallback to plan executor

Pre-materialize non-recursive CTEs as temp tables before plan build.

Setup

The benchmark is bench_mixed_analytical: a CTE pipeline that joins, aggregates, and sorts across a star-schema dataset. 50,000 order rows, 2,000 customers, 200 products. Measured as wall-clock latency through the PostgreSQL wire protocol, 500 iterations.

WITH order_summary AS (
    SELECT o.customer_id,
           SUM(o.quantity * p.price) AS total
    FROM orders o
    JOIN products p ON o.product_id = p.id
    GROUP BY o.customer_id
)
SELECT c.region, COUNT(*), SUM(os.total) AS revenue
FROM order_summary os
JOIN customers c ON os.customer_id = c.id
GROUP BY c.region
ORDER BY revenue DESC;

Problem

The plan executor (plan.c) handles SELECT queries through a columnar pipeline: scan → filter → hash join → hash aggregate → sort → top-N → project. This pipeline is significantly faster than the legacy row-by-row executor in query.c. However, queries with CTEs were falling back to the legacy path entirely.

Cause

The logical plan builder (logical_build) emitted L_SCAN(cte_name) when it encountered a CTE reference in the FROM clause. The plan executor then called db_find_table_sv() to resolve the scan source—but CTE names are not real tables, so the lookup returned NULL. This caused plan_build_select() to return failure, and the entire query fell back to the legacy row-by-row executor.

The result: a query that should have run through the fast columnar pipeline (hash join + hash aggregate + sort on column blocks) instead ran through the slow path (per-row expression evaluation, row-based hash table, row-based sort).

Fix

In plan_build_select() (plan.c), before calling logical_build, the function now iterates over the query’s CTE list. For each non-recursive CTE:

  1. Execute the CTE body through db_exec(), which may itself use the plan executor for the CTE’s inner query.
  2. Materialize the result as a temporary table with the CTE’s name.
  3. Register the temp table in the database so that db_find_table_sv() resolves it.

After the outer query completes, the temp tables are dropped. This approach means logical_build sees real tables for every CTE reference, and the full columnar pipeline handles the outer query’s joins, aggregates, and sorts.

Result

Same query, same machine:

mskqlPostgreSQLDuckDB
mixed_analytical (500 iter)7ms65ms28ms
vs PostgreSQL ratio0.11× (mskql 9× faster)
vs DuckDB ratio0.25× (mskql 4× faster)

The CTE materialization fix moved the entire analytical pipeline into the columnar executor. Combined with the result cache (which caches the wire bytes for repeated identical queries), this workload sustains 97,210 QPS under concurrent throughput testing.