The benchmark inserts 10,000 vectors of dimension 128 and queries the 10 nearest neighbors using L2 distance. Compared against PostgreSQL with pgvector on the same machine.
CREATE TABLE embeddings (
id INT PRIMARY KEY,
name TEXT,
vec VECTOR(128)
);
CREATE INDEX ON embeddings USING hnsw (vec vector_l2_ops);
-- Insert 10K vectors, then:
SELECT id, name
FROM embeddings
ORDER BY vec <-> '[0.1, 0.2, ...]'
LIMIT 10;
Without an index, finding the K nearest neighbors requires computing the distance from the query vector to every row in the table. For 10,000 rows of dimension 128, that is 1.28M floating-point operations per query. Acceptable for small tables, but it scales linearly with row count—at 1M rows, each query would need 128M operations.
The VECTOR(n) column type was added with basic storage,
insert, scan, filter, and sort support, but no index structure. The
ORDER BY vec <-> query LIMIT K pattern was executed
as a full scan followed by a Top-N sort. This is correct but slow
for large tables.
Implemented HNSW (Hierarchical Navigable Small World) in
hnsw.c (608 lines) and hnsw.h (63 lines).
The index is created via CREATE INDEX ... USING hnsw with
support for three distance metrics.
ml = 1/ln(M).hnsw_insert) — inserts a
new vector by searching for its nearest neighbors at each layer
(beam search with ef_construction candidates), then
adding bidirectional edges. Edges are pruned when a node exceeds
M neighbors (layer > 0) or 2×M
neighbors (layer 0).hnsw_search) — starts
at the entry point on the highest layer, greedily descends to
layer 0, then runs a beam search with configurable
ef_search to find the K nearest neighbors.vector.c as tight
float loops that the compiler auto-vectorizes to NEON/SSE.hnsw_remove) — marks a
node as deleted and removes its edges from neighbors. Does not
currently compact the graph.
The planner detects ORDER BY vec <-> query LIMIT K
and, when an HNSW index exists on the vector column, rewrites the scan
to use hnsw_search instead of a full table scan + sort.
KNN queries on indexed tables return in sub-millisecond time regardless
of table size (up to the tested 10K rows). The index parameters
(M=16, ef_construction=200) provide a good
recall/speed tradeoff for typical embedding workloads.
| Distance metric | Syntax |
|---|---|
| L2 (Euclidean) | vector_l2_ops |
| Cosine | vector_cosine_ops |
| Inner product | vector_ip_ops |
The implementation is 608 lines of C with no external dependencies. All vector benchmarks pass with the HNSW index enabled, and the brute-force fallback remains available for tables without an index.