SQL Reference

Supported SQL and data types

Every SQL statement, clause, and data type supported by mskql. For the formal grammar, see the Grammar Reference.

Supported SQL

Category	Supported	Notes
DDL	`CREATE TABLE` (`IF NOT EXISTS`, `LIKE`), `DROP TABLE`, `CREATE INDEX` (`IF NOT EXISTS`, composite multi-column), `DROP INDEX`, `CREATE TYPE … AS ENUM`, `DROP TYPE`, `CREATE SEQUENCE`, `DROP SEQUENCE`, `CREATE VIEW`, `DROP VIEW`, `ALTER TABLE` (`ADD`/`DROP`/`RENAME COLUMN`, `ALTER COLUMN TYPE`, `SET DEFAULT`/`DROP DEFAULT`, `SET NOT NULL`/`DROP NOT NULL`), composite `PRIMARY KEY(col1, col2)` / `UNIQUE(col1, col2)`	Schema changes are immediate; no background rewrite. Views are expanded at query time.
DML	`INSERT` (single, multi-row, `INSERT…SELECT`), `SELECT`, `UPDATE` (incl. `UPDATE…FROM`), `DELETE`, `RETURNING`, `ON CONFLICT DO NOTHING`, `ON CONFLICT DO UPDATE` (with `EXCLUDED.*` references), `TRUNCATE TABLE`, `COPY TO STDOUT` (tab/CSV/header), `COPY FROM STDIN` (tab/CSV)	In-place updates—no MVCC tuple versioning. Inserts are 3× faster than PostgreSQL.
Filtering	`WHERE` with `AND`/`OR`/`NOT`, `BETWEEN`, `IN` (values, subqueries, multi-column), `ANY`/`ALL`, `LIKE`, `ILIKE`, `IS [NOT] NULL`, `IS [NOT] DISTINCT FROM`, `=` `!=` `<>` `<` `>` `<=` `>=`	Compound filters (AND/OR trees) run in the plan executor on columnar blocks.
Joins	`INNER`, `LEFT`, `RIGHT`, `FULL OUTER`, `CROSS`, `NATURAL`, `USING`, `LATERAL`, multi-table, compound `ON` conditions (`AND`/`OR`)	Hash join for equi-joins; nested loop for CROSS/LATERAL. Join cache reuses hash tables across queries.
Aggregation	`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`, `COUNT(DISTINCT col)`, `STRING_AGG()`, `ARRAY_AGG()`, expression aggregates (`SUM(price * qty)`), `GROUP BY` (multi-column, positional: `GROUP BY 1, 2`), `GROUP BY ROLLUP`, `GROUP BY CUBE`, `HAVING`	Columnar hash aggregation—6–10× faster than PostgreSQL on typical workloads.
Window functions	`ROW_NUMBER()`, `RANK()`, `DENSE_RANK()`, `NTILE()`, `PERCENT_RANK()`, `CUME_DIST()`, `LAG()`, `LEAD()`, `FIRST_VALUE()`, `LAST_VALUE()`, `NTH_VALUE()`, `SUM()`, `COUNT()`, `AVG()` over `PARTITION BY … ORDER BY …`, window frames (`ROWS BETWEEN …`)	Sort-based window execution with bump-allocated accumulators. ~2× faster than PostgreSQL.
Set operations	`UNION`, `UNION ALL`, `INTERSECT`, `EXCEPT`	`UNION` deduplicates via hash; `UNION ALL` concatenates without dedup.
CTEs	`WITH name AS (SELECT …) SELECT …`, `WITH RECURSIVE`	Simple CTEs are inlined when possible, avoiding temp-table materialization overhead.
Expressions	`+` `-` `*` `/` `%`, unary `-`, operator precedence, `CASE WHEN…THEN…ELSE…END`, `COALESCE()`, `NULLIF()`, `GREATEST()`, `LEAST()`	Evaluated in columnar batches via `PLAN_EXPR_PROJECT`—35% faster than PostgreSQL.
Result control	`DISTINCT`, `DISTINCT ON`, `ORDER BY` (multi-column, per-column `ASC`/`DESC`, positional), `LIMIT`, `OFFSET`, `FETCH FIRST N ROWS ONLY`	`LIMIT` short-circuits the plan—stops pulling blocks once satisfied.
Constraints	`NOT NULL`, `UNIQUE`, `PRIMARY KEY`, `DEFAULT`, `CHECK`, `REFERENCES` (foreign keys with `ON DELETE/UPDATE CASCADE`, `RESTRICT`, `SET NULL`, `SET DEFAULT`)	Enforced at write time. Foreign keys support all four referential actions.
Transactions	`BEGIN`, `COMMIT`, `ROLLBACK` (snapshot-based)	Lazy copy-on-write snapshots. First write to a table triggers O(N) deep copy.
String functions	`UPPER()`, `LOWER()`, `LENGTH()`, `TRIM()`, `CONCAT()`, `CONCAT_WS()`, `\|\|`, `SUBSTRING()`, `REPLACE()`, `LPAD()`/`RPAD()`, `POSITION()`, `SPLIT_PART()`, `LEFT()`/`RIGHT()`, `REPEAT()`, `REVERSE()`, `INITCAP()`, `LTRIM()`, `RTRIM()`, `SUBSTR()`	19 string functions. All run in the expression projector for batched evaluation.
Math functions	`ABS()`, `CEIL()`/`CEILING()`, `FLOOR()`, `ROUND()` (with optional decimal places), `POWER()`, `SQRT()`, `MOD()`, `SIGN()`, `RANDOM()`	9 math functions. `ROUND(expr, n)` supports arbitrary decimal places.
Date/time functions	`NOW()`, `CURRENT_TIMESTAMP`, `CURRENT_DATE`, `EXTRACT()`, `DATE_TRUNC()`, `DATE_PART()`, `AGE()`, `TO_CHAR()`, date/timestamp ± interval arithmetic, timestamp − timestamp → interval, date ± integer days	Full interval arithmetic. `TO_CHAR` supports common PostgreSQL format patterns.
Type casting	`CAST(expr AS type)`, `expr::type` postfix syntax, numeric↔numeric, numeric↔text, text↔boolean, text→date/timestamp conversions	Both `CAST()` and `::` syntax. Implicit coercion for compatible types.
Table functions	`generate_series(start, stop [, step])` — integer, timestamp, and date sequences with optional step and column aliases	Supports integer, date, and timestamp series. Max 10M rows safety limit.
Sequences & utilities	`NEXTVAL()`, `CURRVAL()`, `GEN_RANDOM_UUID()`	`SERIAL` columns use sequences internally. UUIDs are v4 random.
Foreign tables	`CREATE FOREIGN TABLE name (col type, …) OPTIONS (filename ’path.parquet’)` — maps a Parquet file to a SQL table. Query with standard SQL: joins, aggregates, subqueries, CTEs. Result cache applies. `DROP FOREIGN TABLE`	Parquet files queried on demand. Cached queries are 2–14× faster than DuckDB.
Subqueries	Scalar subqueries in `SELECT` list, correlated subqueries, `EXISTS` / `NOT EXISTS` (correlated), `IN (SELECT…)`	`IN` subqueries use hash semi-join (O(N+M)). Correlated subqueries: 166× faster than PG.
Introspection	`EXPLAIN` (plan-based and legacy execution paths), `SHOW` (`server_version`, `search_path`, `client_encoding`, etc.), `SET`/`RESET`/`DISCARD` (accepted, no-op), `information_schema.tables`, `information_schema.columns`, `pg_catalog.pg_type`, `pg_catalog.pg_class`, `pg_catalog.pg_namespace`, `pg_catalog.pg_attribute`, `pg_catalog.pg_index`, `pg_catalog.pg_attrdef`, `pg_catalog.pg_constraint`, `pg_catalog.pg_am`, `pg_catalog.pg_roles`	`EXPLAIN` shows the plan tree. System catalogs enable tool compatibility (e.g. DBeaver, pgAdmin).

→ Full SQL grammar reference — production rules for every statement, with implemented/not-implemented status for each feature.

Data Types

Type	Storage	Description
`SMALLINT` / `INT2`	2 bytes	16-bit signed integer (−32,768 to 32,767)
`INT`	4 bytes	32-bit signed integer. Default for most integer columns.
`BIGINT`	8 bytes	64-bit signed integer. Use for row counts or large IDs.
`FLOAT`	8 bytes	Double-precision (C `double`). Use for scientific or approximate values.
`NUMERIC` / `DECIMAL`	8 bytes	Stored as `double`. Use for monetary or exact decimal values.
`TEXT`	variable	Variable-length string. No length limit.
`VARCHAR(n)`	variable	String with max length n. Stored as TEXT internally.
`BOOLEAN`	1 byte	`TRUE` / `FALSE`. Accepts `t`/`f`, `yes`/`no`, `1`/`0`.
`DATE`	4 bytes	Calendar date (YYYY-MM-DD). Stored as `int32_t` days from PostgreSQL epoch (2000-01-01).
`TIME`	8 bytes	Time of day (HH:MM:SS). Stored as `int64_t` microseconds since midnight.
`TIMESTAMP`	8 bytes	Date and time (YYYY-MM-DD HH:MM:SS). Stored as `int64_t` microseconds from PG epoch. Supports arithmetic with intervals.
`TIMESTAMPTZ`	8 bytes	Timestamp with time zone awareness. Same storage as `TIMESTAMP`.
`INTERVAL`	16 bytes	Time span (`months` int32 + `days` int32 + `usec` int64). Supports `'1 year 2 months 3 days'` syntax.
`UUID`	16 bytes	128-bit identifier stored as two `uint64`. Generated with `GEN_RANDOM_UUID()`.
`SMALLSERIAL` / `SERIAL` / `BIGSERIAL`	2/4/8 bytes	Auto-incrementing integer backed by a sequence.
`VECTOR(n)`	n × 4 bytes	Fixed-dimension float array. pgvector-compatible `[1.0, 2.0, 3.0]` literal syntax. Stored as contiguous `float[]` in columnar blocks.
`ENUM`	4 bytes	User-defined type with ordinal storage. Compared by declaration order.