Wide tables with many columns are common in analytics. A query like
SELECT name, score FROM events on a 10-column table should not
read the other 8 columns. The benchmark is wc_wide_output:
50,000 rows with 10 columns, sorted by one column.
SELECT id, name, score
FROM wide_table
ORDER BY score DESC;
Without projection pushdown, PLAN_SEQ_SCAN reads all columns from
the table into the pipeline, even if only a subset is needed. For wide tables,
this wastes memory bandwidth and pollutes the CPU cache with unused data.
The sort stage then shuffles full-width rows through the comparator.
The plan builder analyzes the SELECT list, WHERE clause, ORDER BY, and GROUP BY
to determine which columns are actually referenced. It constructs a column mask
and passes it to the scan node. The scan node only materializes the needed columns
into col_block arrays.
SELECT *, all columns are marked.PLAN_SEQ_SCAN skips columns
not in the mask. For columnar storage this means entire column arrays are
never touched.Projection pushdown is especially effective for Parquet foreign tables. The Parquet reader can skip entire column chunks at the I/O level, avoiding both decompression and deserialization of unused columns. A query selecting 2 of 10 columns reads ~80% less data from disk.
The wide output sort benchmark (wc_wide_output, 10 columns, 50K rows):
| mskql | PostgreSQL | DuckDB | |
|---|---|---|---|
| wc_wide_output (cached) | 128ms | 170ms | 40ms |
| vs PostgreSQL ratio | 0.75× (mskql 1.3× faster) | ||
Projection pushdown is one of the reasons mskql beats PostgreSQL on every analytical workload despite having no query planner or cost model. The plan builder applies it unconditionally—there is no statistics-based decision about whether to prune columns.