SQL Reference

Supported SQL and data types

Every SQL statement, clause, and data type supported by mskql. For the formal grammar, see the Grammar Reference.

Supported SQL

CategorySupportedNotes
DDL CREATE TABLE (IF NOT EXISTS, LIKE), DROP TABLE, CREATE INDEX (IF NOT EXISTS, composite multi-column), DROP INDEX, CREATE TYPE … AS ENUM, DROP TYPE, CREATE SEQUENCE, DROP SEQUENCE, CREATE VIEW, DROP VIEW, ALTER TABLE (ADD/DROP/RENAME COLUMN, ALTER COLUMN TYPE, SET DEFAULT/DROP DEFAULT, SET NOT NULL/DROP NOT NULL), composite PRIMARY KEY(col1, col2) / UNIQUE(col1, col2) Schema changes are immediate; no background rewrite. Views are expanded at query time.
DML INSERT (single, multi-row, INSERT…SELECT), SELECT, UPDATE (incl. UPDATE…FROM), DELETE, RETURNING, ON CONFLICT DO NOTHING, ON CONFLICT DO UPDATE (with EXCLUDED.* references), TRUNCATE TABLE, COPY TO STDOUT (tab/CSV/header), COPY FROM STDIN (tab/CSV) In-place updates—no MVCC tuple versioning. Inserts are 3× faster than PostgreSQL.
Filtering WHERE with AND/OR/NOT, BETWEEN, IN (values, subqueries, multi-column), ANY/ALL, LIKE, ILIKE, IS [NOT] NULL, IS [NOT] DISTINCT FROM, = != <> < > <= >= Compound filters (AND/OR trees) run in the plan executor on columnar blocks.
Joins INNER, LEFT, RIGHT, FULL OUTER, CROSS, NATURAL, USING, LATERAL, multi-table, compound ON conditions (AND/OR) Hash join for equi-joins; nested loop for CROSS/LATERAL. Join cache reuses hash tables across queries.
Aggregation COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT col), STRING_AGG(), ARRAY_AGG(), expression aggregates (SUM(price * qty)), GROUP BY (multi-column, positional: GROUP BY 1, 2), GROUP BY ROLLUP, GROUP BY CUBE, HAVING Columnar hash aggregation—6–10× faster than PostgreSQL on typical workloads.
Window functions ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), PERCENT_RANK(), CUME_DIST(), LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), NTH_VALUE(), SUM(), COUNT(), AVG() over PARTITION BY … ORDER BY …, window frames (ROWS BETWEEN …) Sort-based window execution with bump-allocated accumulators. ~2× faster than PostgreSQL.
Set operations UNION, UNION ALL, INTERSECT, EXCEPT UNION deduplicates via hash; UNION ALL concatenates without dedup.
CTEs WITH name AS (SELECT …) SELECT …, WITH RECURSIVE Simple CTEs are inlined when possible, avoiding temp-table materialization overhead.
Expressions + - * / %, unary -, operator precedence, CASE WHEN…THEN…ELSE…END, COALESCE(), NULLIF(), GREATEST(), LEAST() Evaluated in columnar batches via PLAN_EXPR_PROJECT—35% faster than PostgreSQL.
Result control DISTINCT, DISTINCT ON, ORDER BY (multi-column, per-column ASC/DESC, positional), LIMIT, OFFSET, FETCH FIRST N ROWS ONLY LIMIT short-circuits the plan—stops pulling blocks once satisfied.
Constraints NOT NULL, UNIQUE, PRIMARY KEY, DEFAULT, CHECK, REFERENCES (foreign keys with ON DELETE/UPDATE CASCADE, RESTRICT, SET NULL, SET DEFAULT) Enforced at write time. Foreign keys support all four referential actions.
Transactions BEGIN, COMMIT, ROLLBACK (snapshot-based) Lazy copy-on-write snapshots. First write to a table triggers O(N) deep copy.
String functions UPPER(), LOWER(), LENGTH(), TRIM(), CONCAT(), CONCAT_WS(), ||, SUBSTRING(), REPLACE(), LPAD()/RPAD(), POSITION(), SPLIT_PART(), LEFT()/RIGHT(), REPEAT(), REVERSE(), INITCAP(), LTRIM(), RTRIM(), SUBSTR() 19 string functions. All run in the expression projector for batched evaluation.
Math functions ABS(), CEIL()/CEILING(), FLOOR(), ROUND() (with optional decimal places), POWER(), SQRT(), MOD(), SIGN(), RANDOM() 9 math functions. ROUND(expr, n) supports arbitrary decimal places.
Date/time functions NOW(), CURRENT_TIMESTAMP, CURRENT_DATE, EXTRACT(), DATE_TRUNC(), DATE_PART(), AGE(), TO_CHAR(), date/timestamp ± interval arithmetic, timestamp − timestamp → interval, date ± integer days Full interval arithmetic. TO_CHAR supports common PostgreSQL format patterns.
Type casting CAST(expr AS type), expr::type postfix syntax, numeric↔numeric, numeric↔text, text↔boolean, text→date/timestamp conversions Both CAST() and :: syntax. Implicit coercion for compatible types.
Table functions generate_series(start, stop [, step]) — integer, timestamp, and date sequences with optional step and column aliases Supports integer, date, and timestamp series. Max 10M rows safety limit.
Sequences & utilities NEXTVAL(), CURRVAL(), GEN_RANDOM_UUID() SERIAL columns use sequences internally. UUIDs are v4 random.
Foreign tables CREATE FOREIGN TABLE name (col type, …) OPTIONS (filename ’path.parquet’) — maps a Parquet file to a SQL table. Query with standard SQL: joins, aggregates, subqueries, CTEs. Result cache applies. DROP FOREIGN TABLE Parquet files queried on demand. Cached queries are 2–14× faster than DuckDB.
Subqueries Scalar subqueries in SELECT list, correlated subqueries, EXISTS / NOT EXISTS (correlated), IN (SELECT…) IN subqueries use hash semi-join (O(N+M)). Correlated subqueries: 166× faster than PG.
Introspection EXPLAIN (plan-based and legacy execution paths), SHOW (server_version, search_path, client_encoding, etc.), SET/RESET/DISCARD (accepted, no-op), information_schema.tables, information_schema.columns, pg_catalog.pg_type, pg_catalog.pg_class, pg_catalog.pg_namespace, pg_catalog.pg_attribute, pg_catalog.pg_index, pg_catalog.pg_attrdef, pg_catalog.pg_constraint, pg_catalog.pg_am, pg_catalog.pg_roles EXPLAIN shows the plan tree. System catalogs enable tool compatibility (e.g. DBeaver, pgAdmin).

→ Full SQL grammar reference — production rules for every statement, with implemented/not-implemented status for each feature.

Data Types

TypeStorageDescription
SMALLINT / INT22 bytes16-bit signed integer (−32,768 to 32,767)
INT4 bytes32-bit signed integer. Default for most integer columns.
BIGINT8 bytes64-bit signed integer. Use for row counts or large IDs.
FLOAT8 bytesDouble-precision (C double). Use for scientific or approximate values.
NUMERIC / DECIMAL8 bytesStored as double. Use for monetary or exact decimal values.
TEXTvariableVariable-length string. No length limit.
VARCHAR(n)variableString with max length n. Stored as TEXT internally.
BOOLEAN1 byteTRUE / FALSE. Accepts t/f, yes/no, 1/0.
DATE4 bytesCalendar date (YYYY-MM-DD). Stored as int32_t days from PostgreSQL epoch (2000-01-01).
TIME8 bytesTime of day (HH:MM:SS). Stored as int64_t microseconds since midnight.
TIMESTAMP8 bytesDate and time (YYYY-MM-DD HH:MM:SS). Stored as int64_t microseconds from PG epoch. Supports arithmetic with intervals.
TIMESTAMPTZ8 bytesTimestamp with time zone awareness. Same storage as TIMESTAMP.
INTERVAL16 bytesTime span (months int32 + days int32 + usec int64). Supports '1 year 2 months 3 days' syntax.
UUID16 bytes128-bit identifier stored as two uint64. Generated with GEN_RANDOM_UUID().
SMALLSERIAL / SERIAL / BIGSERIAL2/4/8 bytesAuto-incrementing integer backed by a sequence.
VECTOR(n)n × 4 bytesFixed-dimension float array. pgvector-compatible [1.0, 2.0, 3.0] literal syntax. Stored as contiguous float[] in columnar blocks.
ENUM4 bytesUser-defined type with ordinal storage. Compared by declaration order.