Autocomplete Is Just a Tree (Until It Isn't)

Harshit Sharma

6 min read

You type "ho" into a search bar. A dropdown appears: "home", "hotel", "how to". It feels instant, almost trivial.

I thought so too. Then I tried building one.

The first version took an afternoon. The version that actually felt good took weeks. The gap between "returns results" and "returns the right results in the right order" is where all the complexity lives.

This post assumes you're already comfortable with basic data structures. The trie is the easy part. The interesting part starts once the trie hands you a pile of candidates.

Start with a trie

A trie is a tree where each node is a character, and paths from root to leaf spell out words. To find all words starting with "ho", you walk down h -> o, then collect everything below.

class TrieNode {
  children: Map<string, TrieNode> = new Map()
  isEnd: boolean = false
  word: string = ""
}
 
class Trie {
  root = new TrieNode()
 
  insert(word: string) {
    let node = this.root
    for (const ch of word) {
      if (!node.children.has(ch)) {
        node.children.set(ch, new TrieNode())
      }
      node = node.children.get(ch)!
    }
    node.isEnd = true
    node.word = word
  }
 
  search(prefix: string): string[] {
    let node = this.root
    for (const ch of prefix) {
      if (!node.children.has(ch)) return []
      node = node.children.get(ch)!
    }
    // collect all words below this node
    const results: string[] = []
    const stack = [node]
    while (stack.length > 0) {
      const current = stack.pop()!
      if (current.isEnd) results.push(current.word)
      for (const child of current.children.values()) {
        stack.push(child)
      }
    }
    return results
  }
}
 
const trie = new Trie()
trie.insert("home")
trie.insert("hotel")
trie.insert("how")
trie.search("ho") // ["home", "hotel", "how"]
Type a prefix. The highlighted path shows which nodes the trie walks.
5 matches
hadtveitomepetelw
homehothotelhowhope

This is clean. Prefix lookup is O(k) where k is the length of your query, regardless of how many words are in the trie. For a dictionary of 100,000 words, you're walking maybe 3-4 nodes instead of scanning the whole list.

Problem solved?

Not even close.

The ranking problem

The trie gives you all matches. It doesn't tell you which ones matter.

Type "ja" into a search bar. The trie returns: "java", "javascript", "jacuzzi", "jackfruit", "jambalaya". All valid prefix matches. But the user almost certainly wants "javascript" or "java" first, not "jackfruit".

The trie has no concept of relevance. It's a prefix lookup structure, not a ranking engine. To make autocomplete feel good, you need to sort results by something:

  • Frequency (how often is this term searched?)
  • Recency (did the user type this recently?)
  • Popularity (what do most users search for?)
  • Context (what page are they on right now?)
Alphabetical order. The trie returns this by default. Not useful.
query: "ja"
1jacket
2jackfruit
3jacuzzi
4jambalaya
5january
6jar
7java
8javascript

None of this lives in the trie. You either store scores on each node and sort at query time, or you maintain a separate ranking index. Either way, the trie is now just step one of a pipeline.

type Candidate = {
  text: string
  frequency: number
  daysSinceLastUse: number
}
 
function score(candidate: Candidate, query: string) {
  const prefixBonus = candidate.text.startsWith(query) ? 40 : 0
  const frequencyScore = Math.log10(candidate.frequency + 1) * 10
  const recencyScore = Math.max(0, 14 - candidate.daysSinceLastUse)
  return prefixBonus + frequencyScore + recencyScore
}

That scoring function is crude, but that's the shape of the problem. Relevance is usually a weighted mix of signals, and the hard part is choosing weights that feel obvious to users.

Fuzzy matching

Users make typos. They type "javscript" instead of "javascript". A trie won't find it because the prefix doesn't match any path.

Strict prefix matching is fast but brittle. Real autocomplete needs to handle:

  • Typos ("recieve" -> "receive")
  • Partial matches ("js" -> "javascript")
  • Substring matches ("script" -> "javascript", "typescript")

Each of these breaks the trie model. A trie only finds words that start with your exact query. It can't find "javascript" from "script" because "script" isn't a prefix of "javascript".

The common fixes:

Edit distance (Levenshtein). For each word in your dictionary, compute how many character edits separate it from the query. This works but is expensive: checking every word against every query is O(n * k) where n is dictionary size.

n-grams. Break words into chunks ("javascript" -> "jav", "ava", "vas", "asc"...) and index the chunks. Query "jvascript" shares most of the same chunks. Fast lookup via inverted index, but the index gets large.

BK-trees. A tree structure optimized for edit distance queries. Insert words by edit distance from each other, then prune entire branches during search. Better than brute force but still not instant for large dictionaries.

Exact prefix match. Typos return nothing.
No prefix matches. Try fuzzy mode.

The more flexible your matching, the slower it gets. And the more results you return, the more important ranking becomes. These problems compound.

The latency wall

Autocomplete has to feel instant. Once you're much past about 100ms, the dropdown starts to feel late. Past 300ms, users stop trusting it.

On the client, a trie with 10,000 words is fine. Query in microseconds. But what if your dictionary is 10 million products in a database? You can't ship a 10-million-node trie to the browser.

So you move it to the server. Now every keystroke is a network round trip. At 50ms per request, you're already using half your budget just on latency, before the server even starts searching.

The fixes, in escalating complexity:

Debouncing. Don't query on every keystroke. Wait until the user pauses for 150-300ms. Cuts requests dramatically but adds perceived delay.

Prefetching. After "h", fetch results for "ha", "he", "hi", "ho", "hu" in the background. When the user types the next character, results are already cached. Uses more bandwidth but feels instant.

Hybrid. Ship a small trie to the client (top 1,000 most common completions) for instant results. Fall back to the server for long-tail queries. Most keystrokes never hit the network.

In practice, production systems usually mix at least two of these. One trick handles bandwidth. Another handles perceived latency. You rarely get both for free.

Beyond text: structured autocomplete

So far we've been matching strings. But real autocomplete often returns structured data.

Search for "new york" on a travel site. The dropdown shows:

  • New York, NY (city)
  • New York JFK (airport)
  • New York Hilton (hotel)
  • "new york" (search query)

These aren't just strings. They're different types of results from different data sources, ranked together in one dropdown. The autocomplete is querying a city database, an airport database, a hotel database, and a search history, then merging and ranking the results.

This is where tries stop being useful entirely. You're not doing prefix matching anymore. You're doing federated search across multiple indices with type-aware ranking. The data structure doesn't matter. The system architecture matters.

What real autocomplete looks like

The gap between a trie demo and a production autocomplete:

Trie demoProduction
Exact prefix matchFuzzy, substring, typo-tolerant
Returns all matchesReturns top 5-10, ranked
In-memory, client-sideClient + server hybrid
One data sourceFederated across multiple sources
No personalizationWeighted by user history
Simple stringsStructured results with metadata

A trie is still in there, somewhere. But it's one piece of a pipeline that includes ranking, fuzzy matching, caching, debouncing, prefetching, and often machine learning for personalization.

The real takeaway

Autocomplete is one of those features that looks finished when you build the data structure. The trie works, results appear, ship it.

Then users type "javasript" and get nothing. They type "p" and see "pterodactyl" before "python". They type fast and the dropdown flickers.

The data structure is the easy part. The hard part is everything around it: ranking, fuzzy matching, latency hiding, and figuring out what the user actually meant from two characters.


Written by Harshit Sharma. If you want to know when new posts are out, follow me on Twitter.