Why Database Indexes Are Fast Until They Aren't

Harshit Sharma

8 min read

Every database performance problem seems to get the same first answer: add an index.

A lot of the time, that answer is right. A query that took 4 seconds suddenly takes 4 milliseconds. The database stops scanning a million rows and jumps straight to the few rows you wanted.

Then the table gets a few more indexes. Writes get slower. Disk usage doubles. A query still does a full table scan even though the "obvious" index is sitting right there.

Indexes are one of those ideas that feel simple until you ask what the database actually has to maintain.

The slow query

Imagine a users table:

CREATE TABLE users (
  id BIGSERIAL PRIMARY KEY,
  email TEXT NOT NULL,
  name TEXT NOT NULL,
  created_at TIMESTAMP NOT NULL
);

You need to find a user by email:

SELECT *
FROM users
WHERE email = 'me@example.com';

Without an index on email, the database has one honest option: start at the first row and check every row until it finds a match.

That is a table scan.

row 1      no
row 2      no
row 3      no
...
row 982341 yes

For 500 rows, nobody cares. For 50 million rows, this is a problem. The query gets slower as the table grows because the database has to inspect more data.

Find one row

No index. Check every row top to bottom.

row 1alex@example.com
row 2blair@example.com
row 3casey@example.com
row 4devon@example.com
row 5erin@example.com
row 6felix@example.com
row 7gray@example.com
... 982,334 more rows
row 982,341me@example.com
rows checked: 0

So you add an index.

CREATE INDEX users_email_idx ON users (email);

Now the database can find the row without reading the whole table. It looks up the email in the index, gets a pointer to the table row, and fetches it.

That is the magic version. The real version is less magical and more useful.

What the index actually stores

An index is a separate data structure. It is not a flag on the column.

For an index on email, the database stores email values in sorted order, plus a way to reach the original row.

email                  row location
-----------------------------------
alex@example.com        row 41
harshit@example.com     row 982341
me@example.com          row 12003
zoe@example.com         row 77

The important part is the sorted order. Once values are sorted, the database does not need to check every email. It can navigate toward the matching value, the same way you would search a dictionary.

But most database indexes are not stored as one giant sorted array. Updating the middle of a huge array would be painful. Every insert might require shifting a lot of data.

Most relational databases use some form of a B-tree.

Why B-trees work so well

A B-tree is a search tree built for storage.

A normal binary tree compares one key at a time:

        m
       / \
      f   t

That is fine in memory, where following a pointer is cheap. Databases care about pages. A page is a fixed-size chunk of storage, often 8 KB or 16 KB. Reading one page from disk or SSD is much more expensive than comparing two values already in memory.

So a B-tree node stores many keys in one page:

[ adam | dia | john | nina | sam | zoya ]

Each key range points to another page. Instead of walking one comparison at a time, the database reads a page, chooses a range, reads the next page, and repeats.

For a huge table, the path is still short:

root page
  -> internal page
    -> leaf page
      -> table row

B-tree walk for me@example.com

Each page narrows the range. The path stays short even on a huge table.

root page

a ... j
k ... z

internal page

k ... o
p ... z

leaf page

kiran@
me@example.com
nina@

table row

row 12003: me@example.com

this lookup

1 index page

full table scan

~240,000 table pages

This is why an index lookup can stay fast even when the table gets large. The tree grows wide, not tall.

At the leaf level, the index has the actual indexed value and a row reference. In PostgreSQL, that row reference is a tuple ID. In MySQL's InnoDB, secondary indexes point to the primary key, which is then used to find the row.

Different databases vary here, but the shape of the idea is the same: sorted keys, short path, row lookup.

Equality is the easy case

This query is what indexes love:

SELECT *
FROM users
WHERE email = 'me@example.com';

One column. Exact match. High selectivity, meaning the condition narrows the table down to very few rows.

The database can jump to the value in the index and fetch the matching row.

Range queries also work well:

SELECT *
FROM users
WHERE created_at >= '2026-01-01'
  AND created_at < '2026-02-01';

With an index on created_at, the database can seek to the first matching date, then walk forward through the leaf pages until the range ends.

That is much better than scanning the whole table. But it still may read a lot of rows if the range is wide.

An index does not make a query free. It makes finding the starting point cheap.

Composite indexes

Now the query changes:

SELECT *
FROM users
WHERE name = 'Harshit'
  AND created_at >= '2026-01-01';

You could create two indexes:

CREATE INDEX users_name_idx ON users (name);
CREATE INDEX users_created_at_idx ON users (created_at);

The database may use one of them, or try to combine them, depending on the engine. But often the better index is a composite index:

CREATE INDEX users_name_created_at_idx ON users (name, created_at);

This index is sorted first by name, then by created_at within each name.

name      created_at
--------------------------
Aarav     2026-01-01
Aarav     2026-01-10
Harshit   2025-12-15
Harshit   2026-01-02
Harshit   2026-01-19
Zoya      2026-01-03

Now the database can jump to Harshit, then scan only that name's date range.

Column order matters. An index on (name, created_at) is useful for:

WHERE name = 'Harshit'
WHERE name = 'Harshit' AND created_at >= '2026-01-01'

It usually will not help much for this by itself:

WHERE created_at >= '2026-01-01'

The index is not globally sorted by created_at. It is sorted by created_at only inside each name.

Index on (name, created_at)

Sorted by name first, then by created_at inside each name. Pick a query.

WHERE name = 'Harshit'
Aarav2026-01-01
Aarav2026-01-10
Harshit2025-12-15
Harshit2026-01-02
Harshit2026-01-19
Zoya2026-01-03
left edge:matchedseeked, filteredscanned

Seeks straight to the Harshit block. One contiguous range.

This is the left-prefix rule. The database can use the index efficiently from the left side of the column list. Skip the first column and the ordering stops helping.

When the index hurts

Indexes speed up reads by making writes do more work.

Insert one user into the table, and the database has to insert into every index on that table too.

table row
users_email_idx
users_created_at_idx
users_name_created_at_idx

Each index is another B-tree to update. That may mean finding the right leaf page, inserting a key, splitting a page if it is full, and writing changes to disk.

Deletes and updates have the same problem. Update an indexed column and the database has to update the index entry. Update ten indexed columns and you just made one row change touch a lot more storage.

Indexes also take space. More than people expect, in many systems. A wide text column index, several composite indexes, and a few foreign key indexes can make the indexes larger than the table itself.

That space is not passive. It affects backups, cache efficiency, replication lag, vacuum work, and write-ahead logs.

Cost of one insert

Every index is another B-tree the database must update on each write.

Indexes on table3
048
table heap+ write
users_idx_1+ write
users_idx_2+ write
users_idx_3+ write
writes per insert

4

1 heap + 3 indexes
write costmedium
disk usagemedium

Relative scale, not a benchmark. An unused index still pays this cost on every write.

An unused index is not harmless. It is a tax on every write.

Why the planner ignores your index

This is the part that feels rude.

You add an index. You run the query. The database still scans the table.

Usually, the planner is not being stupid. It is doing cost math.

Take this query:

SELECT *
FROM users
WHERE country = 'IN';

Suppose 70% of your users are in India. An index on country technically helps the database find those rows, but it still has to fetch most of the table.

Using the index may be worse than a table scan:

index lookup -> row fetch
index lookup -> row fetch
index lookup -> row fetch
...

If the query returns most of the table, reading the table sequentially can be cheaper than bouncing between the index and table pages.

Why the planner skips an index

The more rows a condition matches, the less an index helps.

Rows matched8% (~80k)
1%~25%90%
Index scanplanner picks
32 cost units

lookup + ~80k random row fetches

Sequential scan
100 cost units

read every page once, filter in memory

Simplified planner model, not an exact database formula. Real planners weigh page layout, caching, and statistics.

This is selectivity again. Good indexes filter aggressively. Bad indexes point to too much data.

The planner can also skip an index because your query hides the indexed value:

SELECT *
FROM users
WHERE LOWER(email) = 'me@example.com';

An index on email stores the original email values. The query asks for LOWER(email). Unless you have an expression index, the normal index may not help.

CREATE INDEX users_lower_email_idx ON users (LOWER(email));

Same issue with calculations:

WHERE DATE(created_at) = '2026-05-25'

This wraps the column in a function. A better query keeps the column bare and uses a range:

WHERE created_at >= '2026-05-25'
  AND created_at < '2026-05-26'

The index can work with that.

Covering indexes

In some cases, the database can answer a query from the index alone.

SELECT email
FROM users
WHERE email = 'me@example.com';

If the index contains email, the database may not need to fetch the table row. It can find the value in the index and return it directly.

That is called an index-only scan in PostgreSQL. Other databases use terms like covering index.

This is useful because the second step of an index lookup is often the expensive part:

find key in index
fetch row from table

If the index has all the columns the query needs, the table fetch disappears.

Index scan vs index-only scan

The table fetch is the expensive step. A covering index can skip it.

SELECT name FROM users WHERE email = ?

Index on (email)

non-covering

index scan

find key in index

read email here

fetch table row

random read on the heap

2 reads+ table fetch

Index has email only. The query still needs the table row.

Index on (email, name)

covering for name

index-only scan

find key in index

read email + name here

fetch table row

skipped

1 readno table fetch

Index already has email and name. No table row needed.

Composite indexes can help here:

CREATE INDEX users_email_name_idx ON users (email, name);

Now this query can potentially be answered from the index:

SELECT name
FROM users
WHERE email = 'me@example.com';

But there is a trade-off. Adding columns makes the index wider. Wider indexes take more space, fit fewer entries per page, and cost more to maintain.

You are buying faster reads with more write cost and more storage.

The practical rules

Index columns you use to find a small slice of a large table.

Good candidates:

WHERE email = ?
WHERE organization_id = ? AND created_at > ?
WHERE user_id = ? ORDER BY created_at DESC

Weak candidates:

WHERE is_deleted = false
WHERE country = 'IN'
WHERE status = 'active'

Those can still be useful, but only when they filter hard enough or combine with other columns. A boolean column by itself is rarely selective.

Use composite indexes when your queries commonly filter by the same columns together. Put equality filters first, then range filters, then columns used for ordering.

Do not add indexes because a column "seems important." Add them because a real query needs them.

And always check the plan:

EXPLAIN ANALYZE
SELECT *
FROM users
WHERE email = 'me@example.com';

The plan tells you what the database actually did. Not what you hoped it would do.

The real takeaway

An index is not a speed switch. It is another copy of your data, organized for one access pattern.

That makes indexes useful. It also explains why they hurt.

The right index lets the database avoid useless work. The wrong index creates work on every write and may never be used.

When a query is slow, the question is not "should I add an index?" The question is more specific:

What data is this query trying to skip?

If the index helps the database skip most of the table, it will probably help. If it only gives the database a different way to read the same amount of data, it probably won't.


Written by Harshit Sharma. If you want to know when new posts are out, follow me on Twitter.