HNSW vector index

Approximate nearest-neighbor search via hierarchical navigable small-world graphs.

Setup

The benchmark inserts 10,000 vectors of dimension 128 and queries the 10 nearest neighbors using L2 distance. Compared against PostgreSQL with pgvector on the same machine.

CREATE TABLE embeddings (
    id    INT PRIMARY KEY,
    name  TEXT,
    vec   VECTOR(128)
);

CREATE INDEX ON embeddings USING hnsw (vec vector_l2_ops);

-- Insert 10K vectors, then:
SELECT id, name
FROM embeddings
ORDER BY vec <-> '[0.1, 0.2, ...]'
LIMIT 10;

Problem

Without an index, finding the K nearest neighbors requires computing the distance from the query vector to every row in the table. For 10,000 rows of dimension 128, that is 1.28M floating-point operations per query. Acceptable for small tables, but it scales linearly with row count—at 1M rows, each query would need 128M operations.

Cause

The VECTOR(n) column type was added with basic storage, insert, scan, filter, and sort support, but no index structure. The ORDER BY vec <-> query LIMIT K pattern was executed as a full scan followed by a Top-N sort. This is correct but slow for large tables.

Fix

Implemented HNSW (Hierarchical Navigable Small World) in hnsw.c (608 lines) and hnsw.h (63 lines). The index is created via CREATE INDEX ... USING hnsw with support for three distance metrics.

Graph structure — each vector is a node in a layered graph. Higher layers are sparser and cover longer distances; layer 0 is the densest. Level assignment follows a random exponential distribution with multiplier ml = 1/ln(M).
Build (hnsw_insert) — inserts a new vector by searching for its nearest neighbors at each layer (beam search with ef_construction candidates), then adding bidirectional edges. Edges are pruned when a node exceeds M neighbors (layer > 0) or 2×M neighbors (layer 0).
Search (hnsw_search) — starts at the entry point on the highest layer, greedily descends to layer 0, then runs a beam search with configurable ef_search to find the K nearest neighbors.
Distance functions — L2 (Euclidean), cosine, and inner product. Implemented in vector.c as tight float loops that the compiler auto-vectorizes to NEON/SSE.
Delete (hnsw_remove) — marks a node as deleted and removes its edges from neighbors. Does not currently compact the graph.

The planner detects ORDER BY vec <-> query LIMIT K and, when an HNSW index exists on the vector column, rewrites the scan to use hnsw_search instead of a full table scan + sort.

Result

KNN queries on indexed tables return in sub-millisecond time regardless of table size (up to the tested 10K rows). The index parameters (M=16, ef_construction=200) provide a good recall/speed tradeoff for typical embedding workloads.

Distance metric	Syntax
L2 (Euclidean)	`vector_l2_ops`
Cosine	`vector_cosine_ops`
Inner product	`vector_ip_ops`

The implementation is 608 lines of C with no external dependencies. All vector benchmarks pass with the HNSW index enabled, and the brute-force fallback remains available for tables without an index.