This tutorial requires the native mskql server. Parquet support
uses filesystem access and is not available in the WebAssembly playground.
Start the server with make && ./build/mskql and connect
with psql -h 127.0.0.1 -p 5433 -U test -d mskql.
A foreign table maps an external Parquet file to a SQL table name.
No data is copied into memory—mskql reads the Parquet file on demand.
Once mapped, you query it with the same SQL you use for regular tables:
SELECT, WHERE, JOIN,
GROUP BY, ORDER BY, subqueries, CTEs.
CREATE FOREIGN TABLE events (
id INT,
user_id INT,
event_type TEXT,
amount INT,
score INT,
created_at TEXT
) OPTIONS (filename '/path/to/events.parquet');
The column definitions must match the Parquet file’s schema.
Supported Parquet types: INT32, INT64,
FLOAT, DOUBLE, BYTE_ARRAY
(→ TEXT), INT96
(→ TIMESTAMP).
SELECT * FROM events
WHERE amount > 500 AND score > 5000;
SELECT event_type, COUNT(*), SUM(amount), AVG(score)
FROM events
GROUP BY event_type;
SELECT * FROM events
ORDER BY score DESC;
Map multiple Parquet files and join them with standard SQL.
CREATE FOREIGN TABLE orders (
id INT,
customer_id INT,
product_id INT,
quantity INT,
amount INT
) OPTIONS (filename '/path/to/orders.parquet');
CREATE FOREIGN TABLE customers (
id INT,
name TEXT,
region TEXT,
tier TEXT
) OPTIONS (filename '/path/to/customers.parquet');
SELECT customers.region, COUNT(*), SUM(orders.amount)
FROM orders
JOIN customers ON orders.customer_id = customers.id
GROUP BY customers.region
ORDER BY customers.region;
Foreign tables and regular tables work together in the same query.
CREATE TABLE local_regions (
id INT,
region_name TEXT,
tax_rate INT
);
INSERT INTO local_regions VALUES (0, 'north', 10);
INSERT INTO local_regions VALUES (1, 'south', 8);
INSERT INTO local_regions VALUES (2, 'east', 12);
INSERT INTO local_regions VALUES (3, 'west', 9);
CREATE FOREIGN TABLE sales (
id INT,
region_id INT,
amount INT
) OPTIONS (filename '/path/to/sales.parquet');
SELECT lr.region_name, SUM(sales.amount),
SUM(sales.amount * lr.tax_rate / 100) AS tax
FROM sales
JOIN local_regions lr ON sales.region_id = lr.id
GROUP BY lr.region_name
ORDER BY lr.region_name;
SELECT * FROM events
WHERE user_id IN (
SELECT id FROM customers WHERE tier = 'premium'
)
AND amount > 500
ORDER BY score DESC
LIMIT 100;
CREATE FOREIGN TABLE products (
id INT,
name TEXT,
category TEXT,
price INT
) OPTIONS (filename '/path/to/products.parquet');
WITH order_totals AS (
SELECT orders.customer_id,
SUM(orders.quantity * products.price) AS total
FROM orders
JOIN products ON orders.product_id = products.id
GROUP BY orders.customer_id
)
SELECT customers.region, COUNT(*), SUM(ot.total)
FROM order_totals ot
JOIN customers ON ot.customer_id = customers.id
GROUP BY customers.region
ORDER BY customers.region;
9 Parquet workloads benchmarked against DuckDB on 50K–200K row files. On cached queries, mskql is faster than DuckDB on 7 of 9. DuckDB faster on 8 of 9 cold (no-cache) queries.
| Workload | mskql (ms) | DuckDB (ms) | Cached ratio |
|---|---|---|---|
| pq_aggregate (GROUP BY) | 28 | 76 | 0.37× |
| pq_subquery (IN subquery) | 21 | 44 | 0.49× |
| pq_where (filtered scan) | 22 | 44 | 0.50× |
| pq_wide_agg (7 columns) | 27 | 42 | 0.65× |
| pq_analytical (CTE+join+agg) | 29 | 41 | 0.71× |
| pq_join_three (3 files) | 30 | 38 | 0.79× |
| pq_join_two (2 files) | 28 | 35 | 0.80× |
| pq_full_scan (50K rows) | 449 | 32 | 14.0× |
| pq_order_by (sort) | 371 | 29 | 12.8× |
The result cache serves repeated queries from cached wire bytes—no Parquet re-read. DuckDB re-reads the file every time. On cold reads, DuckDB’s native columnar Parquet reader is faster.
→ Full Parquet benchmark results
DROP FOREIGN TABLE events;
DROP FOREIGN TABLE orders;
DROP FOREIGN TABLE customers;
DROP FOREIGN TABLE products;
DROP FOREIGN TABLE sales;
DROP TABLE local_regions;
All tutorials · SQL reference · Grammar · Benchmarks · Architecture