You can trust your SQL here. 1,514 test cases verify that every query produces bit-identical output to PostgreSQL: same column types, same NULL handling, same edge cases. Every test runs under AddressSanitizer, catching memory errors that would otherwise be silent bugs in production. The result: a database that is both fast and verifiably correct against PostgreSQL's observed behaviour.
How? Three agents working in adversarial rounds: one writing tests designed to break the system, one reviewing code for structural problems, one fixing failures.
The Challenger studies the current codebase and produces
.sql test files targeting corner cases: empty tables, NULL
propagation through expressions, multi-column ordering with mixed
ASC/DESC, stale index entries after deletes, concurrent readers during
writes, and edge cases in type casting. Each round adds 10–30 new
tests.
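Mixed ASC/DESC ordering interacts with NULL placement in a way that is easy to get wrong: PostgreSQL defaults to NULLS LAST for ascending keys and NULLS FIRST for descending ones. A minimal Python sketch (illustrative only, not the project's actual implementation) of the comparator such ordering tests exercise:

```python
from functools import cmp_to_key

def compare_rows(a, b, keys):
    """Compare two rows on (column_index, descending) sort keys.

    Mirrors PostgreSQL's defaults: NULLs sort last under ASC
    and first under DESC.
    """
    for col, desc in keys:
        x, y = a[col], b[col]
        if x == y:
            continue
        if x is None:                      # NULL: last for ASC, first for DESC
            return -1 if desc else 1
        if y is None:
            return 1 if desc else -1
        if x < y:
            return 1 if desc else -1
        return -1 if desc else 1
    return 0

rows = [(1, None), (2, "a"), (1, "b"), (2, None)]
# Equivalent of ORDER BY col0 ASC, col1 DESC
ordered = sorted(rows, key=cmp_to_key(
    lambda a, b: compare_rows(a, b, [(0, False), (1, True)])))
```

An adversarial test only needs to combine a NULL, a DESC key, and a tie on the first key to expose an engine that hard-codes one NULL placement for all directions.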
The Writer runs the new tests under AddressSanitizer,
diagnoses failures, and ships fixes. It also implements new features
requested by the test suite—if the Challenger writes a test for
INTERSECT ALL, the Writer adds the parser rule, executor
path, and wire serialisation to make it pass.
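INTERSECT ALL is a good example of why such tests force real work: it has bag semantics, so each row appears min(m, n) times, where m and n are its multiplicities in the two inputs. A hedged Python sketch of the executor-side logic (the real implementation spans the parser, executor, and wire layers mentioned above):

```python
from collections import Counter

def intersect_all(left, right):
    """Bag-semantics INTERSECT ALL: emit each row
    min(count_in_left, count_in_right) times."""
    right_counts = Counter(right)
    out = []
    for row in left:
        if right_counts[row] > 0:   # consume one matching occurrence
            right_counts[row] -= 1
            out.append(row)
    return out

# SELECT x FROM a INTERSECT ALL SELECT x FROM b
a = [1, 1, 2, 3, 3, 3]
b = [1, 3, 3, 4]
result = intersect_all(a, b)   # one 1, two 3s
```

A set-based shortcut would wrongly emit each value once; the duplicate counts are exactly what a Challenger-style test pins down.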
The Reviewer reads the source after each round and annotates it with actionable comments: missing error paths, redundant allocations, opportunities to share code between the plan executor and legacy row-by-row path, and architectural improvements like converting ENUM storage from string-based to ordinal.
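The ENUM conversion the Reviewer suggests trades per-row string payloads for small fixed-width integers plus a single label table per type. A hypothetical Python sketch of the idea (class and method names are invented for illustration):

```python
class EnumType:
    """Stores enum values as small ordinals; labels live once in the catalog."""
    def __init__(self, labels):
        self.labels = list(labels)                            # ordinal -> label
        self.ordinals = {l: i for i, l in enumerate(labels)}  # label -> ordinal

    def encode(self, label):
        return self.ordinals[label]    # what gets written per row

    def decode(self, ordinal):
        return self.labels[ordinal]    # rehydrated for output

mood = EnumType(["sad", "ok", "happy"])
stored = [mood.encode(v) for v in ["happy", "sad", "ok"]]
```

Beyond the space savings, comparing ordinals directly reproduces PostgreSQL's enum semantics, where values sort in declaration order rather than alphabetically.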
Write code, run 1,514 tests, fix failures, repeat. The adversarial model drove correctness the same way rigorous code review does on a human team, except here all three sides were machines.
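The round structure itself is simple to state. A schematic Python sketch, with the agents stubbed as callables (all names here are invented for illustration, not the project's actual orchestration code):

```python
def adversarial_round(codebase, suite, challenger, reviewer, writer, run_suite):
    """One round: grow the suite, review the code, then fix until green."""
    suite = suite + challenger(codebase)       # new adversarial tests
    comments = reviewer(codebase)              # structural review findings
    while True:
        failures = run_suite(codebase, suite)  # e.g. run under ASAN
        if not failures:
            return codebase, suite
        codebase = writer(codebase, failures, comments)

# Toy stand-ins: the "codebase" is a set of implemented features,
# and each test is just the name of the feature it requires.
challenger = lambda code: ["INTERSECT ALL"]
reviewer = lambda code: ["share executor paths"]
writer = lambda code, failures, comments: code | set(failures)
run_suite = lambda code, suite: [t for t in suite if t not in code]

code, suite = adversarial_round({"ORDER BY"}, ["ORDER BY"],
                                challenger, reviewer, writer, run_suite)
```

The inner loop is the important part: a round only ends when the whole suite, old tests included, passes cleanly.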
The outcome is a database where correctness is not an afterthought. The adversarial loop forced properties that would have been hard to specify upfront, because each round exposed assumptions the code had silently made.