HNSW vector index

Approximate nearest-neighbor search via hierarchical navigable small-world graphs.

Setup

The benchmark inserts 10,000 vectors of dimension 128 and queries the 10 nearest neighbors using L2 distance. Compared against PostgreSQL with pgvector on the same machine.

CREATE TABLE embeddings (
    id    INT PRIMARY KEY,
    name  TEXT,
    vec   VECTOR(128)
);

CREATE INDEX ON embeddings USING hnsw (vec vector_l2_ops);

-- Insert 10K vectors, then:
SELECT id, name
FROM embeddings
ORDER BY vec <-> '[0.1, 0.2, ...]'
LIMIT 10;

Problem

Without an index, finding the K nearest neighbors requires computing the distance from the query vector to every row in the table. For 10,000 rows of dimension 128, that is 1.28M floating-point operations per query. Acceptable for small tables, but it scales linearly with row count—at 1M rows, each query would need 128M operations.

Cause

The VECTOR(n) column type was added with basic storage, insert, scan, filter, and sort support, but no index structure. The ORDER BY vec <-> query LIMIT K pattern was executed as a full scan followed by a Top-N sort. This is correct but slow for large tables.

Fix

Implemented HNSW (Hierarchical Navigable Small World) in hnsw.c (608 lines) and hnsw.h (63 lines). The index is created via CREATE INDEX ... USING hnsw with support for three distance metrics.

The planner detects ORDER BY vec <-> query LIMIT K and, when an HNSW index exists on the vector column, rewrites the scan to use hnsw_search instead of a full table scan + sort.

Result

KNN queries on indexed tables return in sub-millisecond time regardless of table size (up to the tested 10K rows). The index parameters (M=16, ef_construction=200) provide a good recall/speed tradeoff for typical embedding workloads.

Distance metricSyntax
L2 (Euclidean)vector_l2_ops
Cosinevector_cosine_ops
Inner productvector_ip_ops

The implementation is 608 lines of C with no external dependencies. All vector benchmarks pass with the HNSW index enabled, and the brute-force fallback remains available for tables without an index.