mskql vs PostgreSQL vs DuckDB

93 batch workloads, 15 throughput workloads, one honest picture



93 batch workloads. 15 concurrent throughput workloads. Three databases on the same machine. mskql is faster than PostgreSQL on every cached batch workload and on 13 of 15 throughput tests, with aggregates 8× faster, expression aggregates 10×, and correlated subqueries 166×. Against DuckDB, mskql wins 61 of 93 workloads; DuckDB wins full scans (4.3×) and sorts (4.7×), and PostgreSQL wins mixed read/write throughput (0.94×).

The advantage on cached analytical workloads traces to three architectural properties: (1) zero per-query allocation after warmup—the arena rewinds, nothing is freed; (2) the scan cache converts row-store to flat columnar arrays once per table write, so repeated reads are a memcpy of contiguous memory; (3) result bytes accumulate into a 64 KB buffer before a single write() syscall—on a 5,000-row result, 5 syscalls instead of 5,000. These are structural properties of the system, not tuning parameters. → Design philosophy: why abstraction is not free
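Property (3) can be sketched in a few lines of C. This is an illustrative reconstruction, not mskql's actual code: the `out_buf` struct, the function names, and the flush-on-overflow policy are assumptions; only the 64 KB capacity and the one-write()-per-buffer behavior come from the text above.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define OUT_BUF_CAP (64 * 1024)

typedef struct {
    int    fd;       /* destination socket/file descriptor */
    size_t len;      /* bytes currently buffered */
    size_t flushes;  /* number of write() syscalls issued so far */
    char   buf[OUT_BUF_CAP];
} out_buf;

/* Issue one write() covering everything buffered so far. */
static void out_flush(out_buf *o) {
    if (o->len == 0) return;
    if (write(o->fd, o->buf, o->len) < 0) {
        /* a real engine handles partial/failed writes; this sketch ignores them */
    }
    o->len = 0;
    o->flushes++;
}

/* Append one row's encoded result bytes (n must fit in the buffer);
 * flush only when the buffer would overflow. */
static void out_append(out_buf *o, const void *p, size_t n) {
    if (o->len + n > OUT_BUF_CAP) out_flush(o);
    memcpy(o->buf + o->len, p, n);
    o->len += n;
}
```

With 40-byte encoded rows, a 5,000-row result drains in 4 write() calls instead of 5,000: the syscall count scales with result bytes ÷ 64 KB, not with row count.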

⚠ Important caveat. This comparison is unscientific and intended only as a directional indicator. As an in-memory database, this engine has no durability, no MVCC, no WAL, no background workers, and no query optimizer. By contrast, PostgreSQL is a production-grade RDBMS, and DuckDB is a columnar analytical engine with SIMD vectorization. The numbers below measure wall-clock time for identical SQL workloads, not raw engine speed.

1. Methodology

Batch latency (mskql vs PostgreSQL vs DuckDB)

All three databases are tested on the same Apple M-series laptop, same loopback interface. The benchmark script (bench/bench_vs_pg.py) generates identical SQL workloads for each test, runs setup against all databases, then times the benchmark queries. Each timing measures a single psql -f invocation (mskql, PostgreSQL) or duckdb CLI invocation (DuckDB) containing all iterations of that workload. Port assignment: mskql on 5433, PostgreSQL on 5432 with default configuration, DuckDB in-process via its CLI.

Each workload has two variants: cached (result cache active) and no-cache (_nc suffix—each query has a unique SQL comment appended to bust the result cache). This isolates the effect of mskql’s result cache from raw execution speed.
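The cache-busting trick is just a per-iteration unique comment. A minimal sketch, in C for illustration (the real generator is the Python script bench/bench_vs_pg.py; `make_nc_query` is a hypothetical name, and the comment format shown is assumed):

```c
#include <stdio.h>

/* Hypothetical helper: append a unique SQL comment so the query text
 * differs on every iteration and can never hit the result cache. */
static int make_nc_query(char *out, size_t cap, const char *sql, int iter) {
    return snprintf(out, cap, "%s /* nc:%d */", sql, iter);
}
```

The engine sees semantically identical queries whose bytes differ, so the no-cache timings measure the full parse-plan-execute path on every iteration.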

Concurrent throughput (mskql vs PostgreSQL)

A multi-threaded C benchmark (bench/bench_throughput.c) opens 8 persistent connections speaking raw pgwire and fires queries for 5 seconds per workload. Measures QPS and median latency (p50). DuckDB is not included in throughput tests because it does not expose a wire-protocol server.
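The two reported metrics reduce to a count and a sorted sample array. A sketch of the assumed shape (not the actual contents of bench/bench_throughput.c):

```c
#include <stdlib.h>

/* Comparator for qsort over double latency samples. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median (p50) of n per-query latency samples, sorted in place. */
static double p50_ms(double *samples, size_t n) {
    qsort(samples, n, sizeof *samples, cmp_double);
    return samples[n / 2];
}

/* Queries completed per second over a fixed-duration run. */
static double qps(size_t queries, double elapsed_s) {
    return (double)queries / elapsed_s;
}
```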

Visual Summary — Selected Workloads

Shorter = faster. The table below highlights 10 representative workloads across all three databases (times in ms).

| Workload | mskql | PG | Duck |
| --- | --- | --- | --- |
| Correlated subquery | 22 | 3,665 | 38 |
| Expression aggregate | 33 | 323 | 249 |
| Group + sum | 35 | 284 | 199 |
| Bulk update | 55 | 237 | 85 |
| Analytical CTE | 20 | 78 | 32 |
| Insert + delete | 2,654 | 8,265 | 15,972 |
| Two-table join | 28 | 63 | 35 |
| Full table scan | 294 | 439 | 68 |
| Large sort | 234 | 359 | 50 |
| Window (RANK, 50K) | 113 | 227 | 43 |

2. Batch Latency — Cached (single client, sequential)

Ratio < 1.0 means mskql was faster. All 28 non-vector workloads have PostgreSQL data, and every one is an mskql win. Against DuckDB, mskql wins on aggregates, joins, CTEs, and subqueries; DuckDB wins full scans, sorts, and wide output.

reads & scans

| Workload | mskql (ms) | PG (ms) | Duck (ms) | vs PG | vs Duck |
| --- | --- | --- | --- | --- | --- |
| Full table scan (5K rows ×200) | 294 | 439 | 68 | 0.67× | 4.32× |
| Filtered scan (5K rows ×500) | 358 | 618 | 98 | 0.58× | 3.65× |
| Indexed point lookup (×2K) | 104 | 108 | 92 | 0.96× | 1.13× |
| Sort (5K rows ×200) | 199 | 361 | 55 | 0.55× | 3.62× |
| Multi-column sort (5K rows ×200) | 271 | 510 | 66 | 0.53× | 4.11× |
| Distinct (100 categories ×500) | 39 | 265 | 221 | 0.15× | 0.18× |

writes

| Workload | mskql (ms) | PG (ms) | Duck (ms) | vs PG | vs Duck |
| --- | --- | --- | --- | --- | --- |
| Single-row inserts (10K) | 298 | 851 | 1,794 | 0.35× | 0.17× |
| Bulk insert (5K rows) | 298 | 851 | 1,794 | 0.35× | 0.17× |
| Bulk update (1K of 5K rows ×200) | 55 | 237 | 85 | 0.23× | 0.65× |
| Insert + delete (2K rows, delete half ×50) | 2,654 | 8,265 | 15,972 | 0.32× | 0.17× |
| Transactional inserts (100 txns, 50 each) | 165 | 192 | 1,183 | 0.86× | 0.14× |

aggregation & analytics

| Workload | mskql (ms) | PG (ms) | Duck (ms) | vs PG | vs Duck |
| --- | --- | --- | --- | --- | --- |
| Group + sum (5K rows ×500) | 35 | 284 | 199 | 0.12× | 0.18× |
| Expression aggregate (SUM(price×qty) ×500) | 33 | 323 | 249 | 0.10× | 0.13× |
| Wide aggregate (7 cols, 50K rows ×20) | 22 | 161 | 125 | 0.14× | 0.18× |
| Window function (ROW_NUMBER, 5K ×20) | 51 | 92 | 27 | 0.55× | 1.89× |
| Window function (RANK, 50K rows ×5) | 113 | 227 | 43 | 0.50× | 2.63× |
| Correlated subquery | 22 | 3,665 | 38 | 0.01× | 0.58× |

joins, subqueries & complex queries

| Workload | mskql (ms) | PG (ms) | Duck (ms) | vs PG | vs Duck |
| --- | --- | --- | --- | --- | --- |
| Two-table join (500 + 2K rows ×50) | 28 | 63 | 35 | 0.44× | 0.80× |
| Three-table join + group (50K rows ×5) | 34 | 105 | 33 | 0.32× | 1.03× |
| Subquery filter (IN subquery ×50) | 42 | 98 | 30 | 0.43× | 1.40× |
| Subquery pipeline (filter + sort + limit, 50K ×20) | 26 | 137 | 21 | 0.19× | 1.24× |
| CTE filter chain (5K rows ×200) | 51 | 118 | 37 | 0.43× | 1.38× |
| CTE pipeline (agg → sort → limit, 50K ×20) | 58 | 79 | 21 | 0.73× | 2.76× |
| Analytical CTE (join→agg→sort, 50K ×5) | 20 | 78 | 32 | 0.26× | 0.63× |
| Union (two 2K-row tables ×50) | 54 | 100 | 23 | 0.54× | 2.35× |

functions & generation

| Workload | mskql (ms) | PG (ms) | Duck (ms) | vs PG | vs Duck |
| --- | --- | --- | --- | --- | --- |
| Generate series (10K rows ×200) | 211 | 672 | 38 | 0.31× | 5.55× |
| Scalar functions (4 functions, 5K rows ×200) | 378 | 955 | 71 | 0.40× | 5.32× |

large dataset (50K rows)

| Workload | mskql (ms) | PG (ms) | Duck (ms) | vs PG | vs Duck |
| --- | --- | --- | --- | --- | --- |
| Large sort (50K rows ×10) | 234 | 359 | 50 | 0.65× | 4.68× |

vector (mskql vs PostgreSQL + pgvector)

| Workload | mskql (ms) | PG (ms) | vs PG |
| --- | --- | --- | --- |
| Vector insert (5K rows, VECTOR(16)) | 138 | 418 | 0.33× |
| Vector scan (5K rows ×200, cached) | 77 | 138 | 0.56× |
| Vector scan (no cache) | 107 | 137 | 0.78× |
| Vector wide scan (id + name + vec, cached) | 7 | 19 | 0.39× |
| Vector wide scan (no cache) | 15 | 18 | 0.84× |
| Vector filter (WHERE + vec output, cached) | 13 | 45 | 0.29× |
| Vector filter (no cache) | 30 | 44 | 0.68× |

All 28 cached workloads faster than PostgreSQL. Closest: index lookups (0.96×), transactions (0.86×). Against DuckDB: mskql wins aggregates, distinct, writes, analytical CTE, correlated subquery, joins. DuckDB wins scans, sorts, window functions, generate_series, scalar functions. See workload details for the exact SQL.

3. Parquet Benchmarks — mskql vs DuckDB

9 Parquet workloads (50K–200K rows). PostgreSQL cannot query Parquet files natively, so this section compares mskql and DuckDB only. The mskql side uses CREATE FOREIGN TABLE; DuckDB uses its native Parquet reader.

The mskql times shown are for the cached variant; the Cached and No-cache columns give the mskql/DuckDB ratio for each variant.

| Workload | mskql (ms) | Duck (ms) | Cached | No-cache | Notes |
| --- | --- | --- | --- | --- | --- |
| pq_full_scan (50K rows) | 485 | 31 | 15.6× | 15.6× | Row-by-row Parquet read; no columnar pushdown yet |
| pq_where (filtered scan, 50K rows) | 20 | 42 | 0.48× | 2.24× | Result cache bypasses Parquet I/O on repeat |
| pq_aggregate (GROUP BY, 50K rows) | 26 | 73 | 0.36× | 1.46× | Columnar hash aggregation on cached data |
| pq_order_by (sort, 50K rows) | 365 | 28 | 13.0× | 13.0× | Sort requires full materialization; DuckDB sorts columnar natively |
| pq_wide_agg (7 cols, 50K rows) | 25 | 40 | 0.63× | 2.00× | Multi-column agg stays in L1 cache |
| pq_join_two (50K + 2K rows) | 23 | 33 | 0.70× | 3.30× | Hash join on cached Parquet data |
| pq_join_three (3 Parquet files) | 27 | 36 | 0.75× | 3.56× | Three-way join across Parquet files |
| pq_subquery (IN subquery, 2 files) | 18 | 41 | 0.44× | 2.22× | Hash semi-join on cached result |
| pq_analytical (CTE + join + agg, 3 files) | 26 | 39 | 0.67× | 3.08× | Full analytical pipeline on Parquet |

Cached: mskql wins 7 of 9. The result cache serves repeated Parquet queries from cached wire bytes. DuckDB re-reads the Parquet file every time. No-cache: DuckDB wins 8 of 9. DuckDB’s native columnar Parquet reader is faster on cold reads. On the mskql side, Parquet is read row-by-row via PLAN_PARQUET_SCAN—no columnar pushdown yet.

4. Concurrent Throughput (8 threads, 5 seconds)

Comparison: mskql vs PostgreSQL only—DuckDB does not expose a wire-protocol server. Ratio > 1.0 means mskql has higher throughput.

simple reads & writes

| Workload | mskql QPS | PG QPS | Ratio | Notes |
| --- | --- | --- | --- | --- |
| Indexed point lookup | 79,102 | 23,461 | 3.37× | mskql 0.101ms p50 vs PG 0.254ms |
| Full table scan (5K rows) | 122 | 120 | 1.02× | Network-bound: both ~66ms p50 |
| Filtered scan (5K rows) | 245 | 241 | 1.02× | Network-bound: both ~33ms p50 |
| Single-row insert | 63,253 | 39,189 | 1.61× | No fsync, no WAL |
| Mixed read/write (80/20) | 54,812 | 58,001 | 0.94× | PG’s MVCC wins on concurrent writes |

aggregation & analytics

| Workload | mskql QPS | PG QPS | Ratio | Notes |
| --- | --- | --- | --- | --- |
| Group + sum (5K rows) | 94,525 | 13,929 | 6.79× | Result cache: 0.084ms p50 |
| Wide aggregate (50K rows) | 8,412 | 1,036 | 8.12× | Result cache: 0.952ms p50 |
| Window function (RANK, 50K rows) | 14 | 12 | 1.22× | Both slow: 50K-row window is CPU-bound |

complex queries (50K-row dataset)

| Workload | mskql QPS | PG QPS | Ratio | Notes |
| --- | --- | --- | --- | --- |
| Two-table join (500 + 2K rows) | 562 | 302 | 1.86× | |
| Multi-join (3 tables) | 36,171 | 462 | 78.3× | Result cache: 0.221ms p50 |
| Mixed analytical (CTE→join→agg→sort) | 94,525 | 496 | 190× | Result cache: 0.085ms p50 |
| CTE pipeline (agg→sort→limit) | 5,606 | 1,920 | 2.92× | |
| Subquery pipeline (filter+sort+limit) | 1,170 | 898 | 1.30× | |
| Large sort (50K rows) | 30 | 12 | 2.54× | |

5. The Honest Picture

Where mskql wins

vs PostgreSQL: every cached batch workload. Correlated subquery: 22ms vs 3,665ms (166×). Expression aggregate: 33ms vs 323ms (10×). Throughput: mixed_analytical at 94,525 QPS vs 496 (190×). multi_join at 36,171 vs 462 (78×). The result cache and zero-allocation executor dominate on repeated analytical queries.

vs DuckDB: aggregates, writes, correlated subqueries, analytical CTEs, joins. Expression aggregate: 33ms vs 249ms (8×). Aggregate: 35ms vs 199ms (6×). Insert+delete: 2,654ms vs 15,972ms (6×). Bulk update: 55ms vs 85ms (1.5×). Analytical CTE: 20ms vs 32ms (1.6×). Correlated subquery: 22ms vs 38ms (1.7×).

Vectors: all 7 workloads vs PostgreSQL + pgvector. Vector insert: 138ms vs 418ms (3×). Vector filter (cached): 13ms vs 45ms (3.5×). Even without the result cache, mskql is faster on every vector workload—the columnar float[] storage and zero-copy scan path outperform pgvector’s row-based storage.

Parquet (cached): 7 of 9 workloads. pq_aggregate: 26ms vs 73ms (2.8×). pq_subquery: 18ms vs 41ms (2.3×). The result cache makes repeated Parquet queries faster than DuckDB’s native reader.

Where DuckDB wins

Full scans: 4.3× faster. DuckDB’s columnar storage reads contiguous column data with SIMD vectorization. The mskql row store converts to columnar via the scan cache—fast, but not as fast as native columnar.
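The scan-cache conversion amounts to a one-time gather from the row store into flat arrays. A minimal sketch, with a hypothetical `row` layout (mskql's actual schema representation is not shown in this document):

```c
#include <stddef.h>

/* Hypothetical row-store record. */
typedef struct { long id; double price; long qty; } row;

/* One-time gather after a table write: copy one column out of the
 * row-major store into a flat array. */
static void build_price_column(const row *rows, size_t n, double *prices) {
    for (size_t i = 0; i < n; i++)
        prices[i] = rows[i].price;
}

/* Later scans are a tight loop over contiguous memory. */
static double sum_prices(const double *prices, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += prices[i];
    return s;
}
```

The gather pays the strided access once per write; every repeated scan after that reads sequentially, which is why the gap to native columnar storage narrows but does not close.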

Sorts: 4.7× faster. DuckDB’s sort is SIMD-optimized for columnar data. The mskql plan executor sorts via multi-column radix sort on column blocks.

Window functions: 1.9–2.6× faster. DuckDB’s window function implementation is heavily optimized for columnar processing.

Parquet (cold): 8 of 9 workloads. DuckDB reads Parquet natively with columnar pushdown. The mskql Parquet reader scans row-by-row via PLAN_PARQUET_SCAN.

Where PostgreSQL wins

Mixed read/write throughput: 0.94×. PostgreSQL’s MVCC allows concurrent readers and writers without blocking. The mskql single-threaded executor serializes all queries.

6. Running It Yourself

Batch latency (3-way)

# Start mskql in one terminal
./build/mskql

# In another terminal, with PostgreSQL and DuckDB installed:
./bench/bench_vs_pg.sh

Concurrent throughput (mskql vs PG)

# Build the throughput benchmark
make -C bench bench_throughput

# Run (requires both mskql and PostgreSQL running)
./build/mskql_bench_tp --duration 5 --threads 8 --pg

The batch script auto-creates a mskql_bench database in PostgreSQL if it doesn’t exist. DuckDB runs in-process via its CLI. The throughput benchmark requires libpq headers at build time.