CREATE DISK TABLE adds persistence at 0.96× the in-memory speed.
The benchmark uses the same aggregate workload as the in-memory benchmarks,
but the table is created with CREATE DISK TABLE. 5,000 rows,
500 iterations. Measured as wall-clock latency through the PostgreSQL wire
protocol.
CREATE DISK TABLE sales (
id INT, region TEXT, product TEXT,
quantity INT, amount NUMERIC(10,2)
) DIRECTORY '/tmp/mskql_bench/sales';
-- 5,000 rows inserted (auto-persisted to disk)
SELECT region, SUM(amount)
FROM sales
GROUP BY region;
-- repeated 500 times
In-memory tables are volatile. Restarting the mskql process loses all data. For use cases that require persistence—development databases, long-running analytics servers, any deployment where data outlives the process—the in-memory-only model is insufficient.
Adding persistence typically introduces I/O overhead on both reads and writes. Disk-based databases use buffer pools, page caches, and write-ahead logs to manage the memory-to-disk boundary. Each of these layers adds latency and complexity.
The disk I/O layer (diskio.c / diskio.h) implements
a simpler model:
The CREATE DISK TABLE ... DIRECTORY '...' syntax is the only
user-facing change. Once created, disk tables behave identically to in-memory
tables for all query operations. The persistence is transparent.
dk/mem ratios across all disk-eligible batch workloads:
| Metric | Value |
|---|---|
| Average dk/mem ratio | 0.96× |
| Best (group+sum) | 0.71× (disk faster due to measurement noise) |
| Worst (correlated subquery) | 1.19× |
| Aggregate workload (same query above) | 0.71× |
The near-unity dk/mem ratio confirms that read performance is identical: once loaded, disk tables are in-memory tables. The slight variation (±0.2×) is within run-to-run measurement noise. Write-heavy workloads show slightly higher overhead due to the synchronous disk persist on each mutation, but read-dominated analytical workloads are unaffected.