Indexes & Query Performance

CREATE INDEX, EXPLAIN, and when indexes matter

Tutorial 9 of 11


Without an index, every WHERE clause requires a full table scan—reading every row to find matches. An index is a separate data structure (a B-tree in mskql) that maps column values to row positions, turning O(N) lookups into O(log N). This tutorial shows when and how to use them.

1. Schema and seed data

A user-events table with 20 rows. Small enough to see every row, large enough to demonstrate index behavior.

CREATE TABLE events (
    id        SERIAL PRIMARY KEY,
    user_id   INT NOT NULL,
    action    TEXT NOT NULL,
    amount    INT,
    created   DATE NOT NULL
);

INSERT INTO events (user_id, action, amount, created) VALUES
    (1, 'login',    NULL, '2025-01-01'),
    (1, 'purchase', 50,   '2025-01-02'),
    (1, 'purchase', 120,  '2025-01-05'),
    (2, 'login',    NULL, '2025-01-01'),
    (2, 'purchase', 30,   '2025-01-03'),
    (2, 'login',    NULL, '2025-01-06'),
    (3, 'login',    NULL, '2025-01-02'),
    (3, 'purchase', 200,  '2025-01-04'),
    (3, 'purchase', 75,   '2025-01-07'),
    (3, 'refund',   -75,  '2025-01-08'),
    (4, 'login',    NULL, '2025-01-01'),
    (4, 'purchase', 90,   '2025-01-02'),
    (4, 'purchase', 45,   '2025-01-05'),
    (4, 'login',    NULL, '2025-01-09'),
    (5, 'login',    NULL, '2025-01-03'),
    (5, 'purchase', 300,  '2025-01-04'),
    (5, 'refund',   -100, '2025-01-06'),
    (5, 'purchase', 60,   '2025-01-08'),
    (5, 'login',    NULL, '2025-01-10'),
    (1, 'login',    NULL, '2025-01-10');

2. Query without an index

Find all events for user 3. Without an index, the executor must scan all 20 rows.

SELECT id, action, amount, created
FROM events
WHERE user_id = 3
ORDER BY created;

Result — 4 rows (scanned all 20)

idactionamountcreated
7loginNULL2025-01-02
8purchase2002025-01-04
9purchase752025-01-07
10refund-752025-01-08

3. EXPLAIN — see the query plan

EXPLAIN shows how the executor will run a query. Before adding an index, it shows a sequential scan.

EXPLAIN SELECT * FROM events WHERE user_id = 3;

The output shows a SeqScan (sequential scan) followed by a Filter on user_id = 3. Every row is read, then the filter discards non-matching rows.

4. Create an index

Add a B-tree index on user_id. The executor can now jump directly to matching rows instead of scanning the entire table.

CREATE INDEX idx_events_user ON events (user_id);

Now run EXPLAIN again:

EXPLAIN SELECT * FROM events WHERE user_id = 3;

The plan now shows an IndexScan on idx_events_user. Instead of reading 20 rows, it reads only the 4 matching rows directly.

5. IF NOT EXISTS

Use IF NOT EXISTS to make index creation idempotent—safe to run in migrations that might execute more than once.

CREATE INDEX IF NOT EXISTS idx_events_user ON events (user_id);
CREATE INDEX IF NOT EXISTS idx_events_action ON events (action);

The first statement is a no-op (index already exists). The second creates a new index on the action column.

6. Filter by indexed column

With the action index in place, queries filtering on action type use the index automatically.

SELECT user_id,
       SUM(amount) AS total
FROM events
WHERE action = 'purchase'
GROUP BY user_id
ORDER BY total DESC;

Result

user_idtotal
5360
3275
1170
4135
230

7. Primary key lookups

PRIMARY KEY automatically creates a unique index. Lookups by primary key are always indexed—no extra CREATE INDEX needed.

SELECT id, user_id, action,
       amount
FROM events
WHERE id = 8;

Result — direct index lookup

iduser_idactionamount
83purchase200

8. Drop an index

Remove an index when it is no longer needed. The table data is unaffected; only the lookup structure is removed.

DROP INDEX idx_events_action;

After dropping, queries on action fall back to sequential scan. The user_id index remains.

9. When to create indexes

Indexes are not free—they consume memory and slow down INSERT/UPDATE/DELETE because the index must be updated alongside the table. Use them when:

Avoid indexes on columns with few distinct values (e.g., a boolean is_active column) or on tables that are mostly written and rarely queried.

What you learned

In this tutorial you created and used indexes on a 20-row events table: