How Image Compression Actually Works

Harshit Sharma

Every image you see on the web is compressed. Your browser just downloaded one, decompressed it, and rendered it before you even thought about it. But what actually happens in between?

I got nerdy about this recently and figured I'd write it up. Turns out image compression is one of those things that sounds complicated but makes a lot of sense once you break it down.

Let's start from the beginning.

What is an image, really?

An image is a grid of pixels. Each pixel is just three numbers: red, green, and blue. Mix those values (0–255 each) and you get any color.

For example, #3B82F6 is rgb(59, 130, 246): R = 59, G = 130, B = 246.

A 1920×1080 image has about 2 million pixels. Each pixel stores 3 bytes (one per channel). That's roughly 6.2 MB for a single uncompressed frame. A 10-photo gallery would be 62 MB. That's not going to work.
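Here's that arithmetic as a quick sketch. The function name is mine, and it assumes 8-bit RGB with no alpha channel.

```python
def raw_size_bytes(width: int, height: int, channels: int = 3) -> int:
    """Uncompressed size, assuming one byte per channel per pixel."""
    return width * height * channels

size = raw_size_bytes(1920, 1080)
print(1920 * 1080)                   # 2073600 pixels
print(f"{size / 1_000_000:.1f} MB")  # 6.2 MB
```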

So we compress.

Zoom into any photo and you'll see the grid. From a distance, smooth gradients. Up close, tiny colored squares.

This is the insight compression exploits: we don't actually need all that detail. Our eyes are forgiving.

Lossy compression (JPEG)

JPEG's approach is straightforward: throw away information humans won't notice.

Your eyes are much better at detecting brightness changes than color changes. JPEG uses this. It converts the image from RGB to YCbCr, a color space that stores brightness (Y) separately from color (Cb and Cr). Then it typically stores the two color channels at reduced resolution (chroma subsampling) while keeping brightness mostly intact.
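A rough sketch of that first step, if you want to see it in code. The conversion uses the full-range coefficients from the JFIF flavor of JPEG. The 2×2 averaging is one common subsampling scheme (often called 4:2:0); which scheme a given encoder actually uses is an assumption here, and the function names are mine.

```python
def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple:
    """Split one RGB pixel into brightness (Y) and two color-difference values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))

def subsample_2x2(plane):
    """Average each 2x2 block of a color plane (assumes even width and height)."""
    return [
        [round((plane[y][x] + plane[y][x + 1]
                + plane[y + 1][x] + plane[y + 1][x + 1]) / 4)
         for x in range(0, len(plane[0]), 2)]
        for y in range(0, len(plane), 2)
    ]

print(rgb_to_ycbcr(59, 130, 246))                 # (122, 198, 83)
print(subsample_2x2([[100, 102], [104, 106]]))    # [[103]]
```

After subsampling, each color plane holds a quarter of the values it started with, and the brightness plane is untouched.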

Then it breaks the image into 8×8 pixel blocks and runs something called a Discrete Cosine Transform (DCT) on each block. This converts pixel values into frequencies. Think of it like breaking a sound wave into individual notes. High frequencies represent sharp edges and fine detail. Low frequencies represent smooth gradients.

The trick: you can throw away the high-frequency data and the image still looks fine. Crank up the compression, and you start losing more detail. Push it too far, and you get those blocky JPEG artifacts we've all seen.
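To make the DCT step less abstract, here's a naive sketch on a single 8×8 block. It's slow and only for illustration. One big simplification to flag: real JPEG encoders divide each frequency by its own entry in a quantization table, while this sketch uses a single made-up step size for everything.

```python
import math

N = 8

def dct_2d(block):
    """Naive 2D DCT-II of an 8x8 block: pixel values in, frequency coefficients out."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out

def quantize(coeffs, step):
    """Divide every coefficient by one step size and round (a simplification)."""
    return [[round(value / step) for value in row] for row in coeffs]

# A smooth left-to-right gradient, shifted by -128 so values are centered on zero.
block = [[(100 + 4 * col) - 128 for col in range(N)] for _ in range(N)]
coeffs = dct_2d(block)
quantized = quantize(coeffs, step=16)

nonzero = sum(1 for row in quantized for value in row if value != 0)
print(f"{nonzero} of 64 coefficients survive quantization")
```

Because every row of this block is identical, all the energy ends up in the first row of coefficients. Everything else rounds to zero, and long runs of zeros are cheap to store.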

JPEG typically achieves 10–20x compression on photos. A 6.2 MB image becomes 300–600 KB. That's the kind of reduction that makes the web work.

Lossless compression (PNG)

PNG takes a completely different approach: keep every single pixel, but be clever about how you store them.

It uses three techniques stacked on top of each other.

Filtering (prediction)

Adjacent pixels in a photo are usually very similar. A blue sky doesn't jump from rgb(130, 180, 220) to rgb(45, 12, 200) between neighboring pixels.

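Instead of storing the actual value of each pixel, PNG predicts it from neighbors it has already seen (the pixel to the left, the pixel above, or a mix of both) and stores the difference from the predicted value. If the prediction is close, most differences are tiny numbers: lots of zeros, ones, and twos.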

Raw pixel values - adjacent channel data
128, 130, 132, 134, 133, 135, 137, 136, 138, 140

Small, repetitive numbers are way easier to compress than big, random ones.
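Here's a minimal sketch of the simplest predictor, PNG's "Sub" filter, which predicts each byte from the one to its left. I'm assuming a single 8-bit channel so that "left neighbor" is just the previous byte. PNG defines four other filter types (None, Up, Average, Paeth) that this sketch ignores.

```python
def sub_filter(row: bytes) -> bytes:
    """Store each byte as the difference from its left neighbor, modulo 256."""
    out = bytearray()
    prev = 0  # the first byte has no left neighbor, so it is predicted as 0
    for value in row:
        out.append((value - prev) % 256)
        prev = value
    return bytes(out)

def sub_unfilter(filtered: bytes) -> bytes:
    """Undo sub_filter by adding the differences back up."""
    out = bytearray()
    prev = 0
    for diff in filtered:
        prev = (prev + diff) % 256
        out.append(prev)
    return bytes(out)

row = bytes([128, 130, 132, 134, 133, 135, 137, 136, 138, 140])
print(list(sub_filter(row)))  # [128, 2, 2, 2, 255, 2, 2, 255, 2, 2]
```

The 255s are the -1 differences wrapping around, since everything is stored as a byte. Either way, ten distinct-looking values became mostly 2s.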

LZ77 (pattern matching)

Now that we have a stream of mostly small, repeating values, LZ77 looks for repeated patterns.

If the sequence 0, 2, 0, 0 appeared earlier, instead of writing it out again, LZ77 writes a back-reference: "go back 4 positions and copy 4 values." One pointer instead of four numbers.

Input - spot the repeating [0, 2, 0, 0] pattern
Index: 0  1  2  3  4  5  6  7  8  9  10  11
Value: 0  2  0  0  0  2  0  0  0  2  0   0
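A toy version of the idea, run on that same input. This is a sketch, not what a real encoder does: it brute-force searches everything seen so far, and the token format and the minimum match length of 3 are my own choices for illustration.

```python
def lz77_encode(data, min_match=3):
    """Greedy toy LZ77: emit literals or (distance, length) back-references."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for start in range(i):
            length = 0
            # A match may run past position i and overlap the data being copied.
            while i + length < len(data) and data[start + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - start
        if best_len >= min_match:
            tokens.append(("ref", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens

def lz77_decode(tokens):
    out = []
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, dist, length = token
            for _ in range(length):  # copy one value at a time so overlaps work
                out.append(out[-dist])
    return out

data = [0, 2, 0, 0, 0, 2, 0, 0, 0, 2, 0, 0]
tokens = lz77_encode(data)
print(tokens)
# [('lit', 0), ('lit', 2), ('lit', 0), ('lit', 0), ('ref', 4, 8)]
```

The greedy search does slightly better than "go back 4, copy 4": it goes back 4 and copies 8, because a back-reference is allowed to overlap the values it is still producing. Twelve values become four literals and one pointer.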

Huffman coding

The final trick. In our filtered data, some values show up way more often than others. The value 0 might appear 60% of the time, while 127 might appear only 5% of the time.

Fixed-length encoding would use 8 bits for every value regardless. Huffman coding assigns shorter codes to common values and longer codes to rare ones.

Variable-length encoding - common values get shorter codes

Value   Frequency   Code   Length
0       60%         0      1 bit
2       25%         10     2 bits
255     10%         110    3 bits
127     5%          111    3 bits

Fixed encoding (100 values × 8 bits): 800 bits
Huffman encoding: 155 bits
Reduction: 81% smaller
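You can reproduce those numbers with a small sketch. It assumes 100 values split 60/25/10/5 like the table, and it only computes code lengths. The exact bit patterns depend on how ties are broken, and PNG's real Huffman stage has extra rules this skips.

```python
import heapq

def huffman_code_lengths(counts):
    """Return {symbol: code length in bits} by repeatedly merging the two rarest nodes."""
    # Each heap entry: (weight, unique tiebreak, {symbol: depth so far})
    heap = [(weight, i, {symbol: 0}) for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Every symbol under the merged node sits one level deeper in the tree.
        merged = {symbol: depth + 1 for symbol, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

counts = {0: 60, 2: 25, 255: 10, 127: 5}
lengths = huffman_code_lengths(counts)

fixed_bits = sum(counts.values()) * 8
huffman_bits = sum(counts[s] * lengths[s] for s in counts)
print(fixed_bits, huffman_bits)                          # 800 155
print(f"{1 - huffman_bits / fixed_bits:.0%} smaller")    # 81% smaller
```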

The full pipeline

Stack all three together and you get the PNG compression pipeline:

Uncompressed RGB data (3 bytes per pixel, stored sequentially, 6.2 MB in our example) → filtering → LZ77 → Huffman coding → PNG file

PNG gets 2–3x compression on photos, and up to 50x on graphics with large flat-color areas. It won't match JPEG's ratio on photographs, but every pixel survives the round trip.
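LZ77 plus Huffman coding together is the DEFLATE algorithm, which Python's standard zlib module implements. So the whole pipeline can be sketched in a few lines. Assumptions to flag: the image is a synthetic single-channel gradient I made up, every row uses the Sub filter, and the per-row filter-type byte that real PNG files include is left out.

```python
import zlib

def sub_filter(row: bytes) -> bytes:
    """Each byte minus its left neighbor, modulo 256 (single-channel rows)."""
    return bytes((value - (row[i - 1] if i else 0)) % 256 for i, value in enumerate(row))

def sub_unfilter(filtered: bytes) -> bytes:
    out = bytearray()
    prev = 0
    for diff in filtered:
        prev = (prev + diff) % 256
        out.append(prev)
    return bytes(out)

def compress_rows(rows) -> bytes:
    """Filter every row, then hand the result to DEFLATE (LZ77 + Huffman)."""
    return zlib.compress(b"".join(sub_filter(row) for row in rows), 9)

def decompress_rows(blob: bytes, width: int):
    filtered = zlib.decompress(blob)
    return [sub_unfilter(filtered[i:i + width]) for i in range(0, len(filtered), width)]

# Synthetic 256x64 grayscale gradient: smooth, so the filter predicts it well.
width, height = 256, 64
rows = [bytes((x + y) % 256 for x in range(width)) for y in range(height)]

blob = compress_rows(rows)
print(width * height, "bytes raw ->", len(blob), "bytes compressed")
print(decompress_rows(blob, width) == rows)  # True
```

That final True is the "every pixel survives the round trip" part. A synthetic gradient compresses far better than a real photo would, so don't read the ratio as typical.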

The file itself

One thing I found interesting: PNG files have a specific binary structure. They start with an 8-byte signature (89 50 4E 47 0D 0A 1A 0A; the 50 4E 47 in there is "PNG" in ASCII, and it's how your computer knows it's a PNG), followed by chunks of data: a header chunk (IHDR) with dimensions and color info, data chunks (IDAT) with the compressed pixels, and an end marker (IEND).
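Here's a sketch that builds a tiny PNG in memory and then walks its chunks. Each chunk is a 4-byte length, a 4-byte type, the data, and a 4-byte CRC. The helper names are mine, and the 1×1 image is just the blue pixel from earlier.

```python
import struct
import zlib

PNG_SIGNATURE = bytes.fromhex("89504e470d0a1a0a")

def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Length + type + data + CRC-32 over type and data, all big-endian."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def make_tiny_png() -> bytes:
    """A 1x1 RGB PNG holding the single pixel rgb(59, 130, 246)."""
    # width, height, bit depth 8, color type 2 (RGB), compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
    scanline = bytes([0, 59, 130, 246])  # filter type 0 (None), then one pixel
    return (PNG_SIGNATURE
            + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", zlib.compress(scanline))
            + make_chunk(b"IEND", b""))

def list_chunks(png: bytes):
    """Return (type, data length) for every chunk after the signature."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    chunks = []
    pos = 8
    while pos < len(png):
        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
        chunks.append((chunk_type.decode("ascii"), length))
        pos += 8 + length + 4  # chunk header, data, CRC
    return chunks

png = make_tiny_png()
print(png[1:4])                                # b'PNG'
print([name for name, _ in list_chunks(png)])  # ['IHDR', 'IDAT', 'IEND']
```

Swap make_tiny_png() for the bytes of any real PNG file and list_chunks should show the same three chunk types, usually with a few optional ones mixed in.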

When to use what

This is the practical part.

JPEG: photos, complex images with lots of gradients. Smaller files, but you lose some detail. Good for the web when file size matters more than pixel-perfect accuracy.

PNG: screenshots, UI elements, graphics with text or sharp edges, anything where you need transparency or exact pixel reproduction.

WebP/AVIF: the newer formats. They do both lossy and lossless, and generally beat JPEG and PNG at the same quality level. If your tooling supports them, use them.

Why this matters

An image-heavy page can load 50+ images. At the full-HD size from earlier, serving them uncompressed would mean a page load of roughly 300 MB; at JPEG's 10–20x, it's more like 15–30 MB. That's not an optimization. That's whether your site works at all on a mobile connection.

Every time you save an image, all of this runs behind the scenes. DCT transforms, frequency analysis, pattern matching, Huffman trees, all in milliseconds. I think that's pretty cool.


Written by Harshit Sharma. If you want to know when new posts are out, follow me on Twitter.