GPU Encoding Is Not What You Think It Is
Every time you stream on Twitch, record a gameplay clip, or export a video from your editor, something has to compress those raw frames into a file small enough to actually send over a network. That something is a video encoder.
There are two ways to encode video: on your CPU or on your GPU. Most people assume "GPU encoding" means your graphics card crunches the video using its thousands of cores. It doesn't. The truth is weirder and more interesting than that.
Let me explain.
The codecs: H.264, H.265, and AV1
Before we talk about hardware, we need to talk about the formats. A codec is a set of rules for compressing and decompressing video. Think of it as a language that the encoder speaks and the decoder understands.
H.264 (AVC) came out in 2003 and basically runs the internet. YouTube, Netflix, Zoom, security cameras, everything uses it or has used it. It's the lingua franca of video. Nearly every device on the planet can decode H.264.
H.265 (HEVC) showed up in 2013 as the successor. Same visual quality at roughly 40% less bitrate. A 5 Mbps H.264 stream looks the same as a 3 Mbps H.265 stream. The catch? Complicated patent licensing that scared away a lot of the industry.
AV1 arrived in 2018 as the royalty-free answer. Developed by the Alliance for Open Media (Google, Mozilla, Netflix, Amazon, and others). Even better compression than H.265, and no licensing fees. The tradeoff is that it's brutally slow to encode in software.
[Figure: codec efficiency. Same visual quality, different file sizes; newer codecs need fewer bits. At 1080p, H.265 saves ~40% bandwidth vs H.264, and AV1 saves ~56%.]
All three codecs use the same fundamental techniques, the ones I covered in the HLS post: keyframes, inter-frame prediction, transform coding, entropy coding. The newer codecs just have more tools in the toolbox. Larger block sizes, more prediction modes. More options means better compression, but also more decisions the encoder has to make per frame.
How encoding actually works
Let me walk through what happens when an encoder processes a single frame, because this matters for understanding why hardware and software encoders are different.
Every encoder, regardless of whether it runs on a CPU or GPU, follows the same pipeline:
- Motion estimation. Look at nearby frames and find blocks that match. This is the heaviest step. For every 16x16 block in the current frame, the encoder searches surrounding frames for similar blocks. That's thousands of comparisons per frame.
- Mode decision. For each block, decide the best way to encode it. Should it reference a previous frame (inter prediction)? Encode it standalone (intra prediction)? Split it into smaller sub-blocks? The encoder evaluates multiple options and picks the cheapest one.
- Transform and quantize. Take the residual (the difference between the predicted block and the actual block), run a frequency transform (DCT), then throw away detail the human eye won't notice. This is where the quality-vs-size tradeoff is controlled.
- Entropy coding. Compress the final data using variable-length codes. Frequent patterns get short codes, rare patterns get long codes. H.264 uses CABAC or CAVLC; H.265 uses CABAC exclusively.
- Bitstream output. Pack everything into the final format with headers, parameter sets, and NAL units.
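If you want a feel for how much effort a software encoder pours into the first two steps, x264 exposes them as tunables through FFmpeg. A minimal sketch (input.mp4 is a placeholder; me, subme, and ref are real x264 options that the presets normally set for you):

# Turn up motion-search and mode-decision effort by hand:
#   me=umh  -> uneven multi-hexagon motion search (wider than the default)
#   subme=9 -> deeper subpixel refinement and rate-distortion analysis
#   ref=5   -> consider up to five reference frames per block
ffmpeg -i input.mp4 -c:v libx264 -preset slow \
  -x264opts me=umh:subme=9:ref=5 \
  -crf 23 -c:a copy output_thorough.mp4

Every notch you turn these up buys quality per bit at the cost of encode time, which is exactly the tradeoff the rest of this post is about.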
[Figure: the encoding pipeline, with approximate per-step times for a 1080p frame. Same steps, different hardware; NVENC runs all of them on dedicated silicon in a single pass.]
A software encoder like x264 runs all of these steps on your CPU. Each step is optimized with hand-written SIMD assembly (SSE, AVX) to squeeze every bit of performance out of your processor. And it does this really well. The problem? It's still running on general-purpose hardware. Your CPU can do anything, but it's not built specifically for this.
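You can see this for yourself: x264 prints the SIMD instruction sets it detected when an encode starts. A quick check (the exact log text varies by build):

# x264 reports its detected CPU features at startup:
ffmpeg -hide_banner -i input.mp4 -c:v libx264 -f null - 2>&1 \
  | grep "cpu capabilities"
# e.g. [libx264 ...] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX2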
The GPU approach: NVENC
Here's where most people get confused.
When someone says "GPU encoding," they picture their RTX 4070's thousands of CUDA cores crunching through video frames the way they crunch through game physics. That's not what happens.
NVIDIA GPUs have a completely separate unit on the die called NVENC. It's an ASIC, an application-specific integrated circuit. A fixed-function block of silicon that does exactly one thing: encode video. It can't run shaders. It can't do ray tracing. It just encodes video, full stop.
[Figure: GPU die layout, not to scale. NVENC is a tiny fixed-function ASIC block, roughly 5% of the die, yet it runs the full encode pipeline independently and handles encoding 10-50x faster than the CUDA cores could.]
This is the key insight: NVENC doesn't use your CUDA cores at all. When you encode with NVENC, your GPU is completely free to do other things. You can game at full frame rate while NVENC records the gameplay in the background. The two workloads don't compete because they're running on physically different parts of the chip.
The NVENC ASIC has its own dedicated circuitry for motion estimation, mode decision, transform, quantization, and entropy coding. The entire encoding pipeline runs in hardware, in a single pass, on purpose-built silicon. That's why it's fast. It's not doing general-purpose math really quickly. It's doing video encoding on hardware designed for nothing else.
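You don't have to take this on faith. nvidia-smi breaks out encoder utilization separately from the shader cores, so you can watch NVENC work while the rest of the GPU does something else entirely:

# Per-engine utilization, one sample per second:
#   sm = CUDA/shader cores, enc = NVENC, dec = NVDEC
nvidia-smi dmon -s u

Start a recording and the enc column jumps while sm stays wherever your game puts it.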
How NVENC evolved
NVENC first appeared on Kepler GPUs in 2012, and honestly, it was rough. The quality was noticeably worse than CPU encoding. People used it for quick captures but not for anything they cared about.
Each GPU generation improved the encoder silicon. Turing (RTX 20-series) in 2018 was the turning point: B-frame support for H.264, improved lookahead, and quality that genuinely rivaled x264 on medium preset. That was the generation where NVENC went from "good enough for streaming" to "actually good."
Ada Lovelace (RTX 40-series) added hardware AV1 encoding, a massive deal. For the first time you could encode the most efficient royalty-free codec at hundreds of frames per second. And Blackwell (RTX 50-series) pushed further with 4:2:2 chroma subsampling and up to four NVENC engines per chip.
[Figure: NVENC through the years, generation by generation. Recent milestones: AV1 hardware encoding (the royalty-free future), dual NVENC engines, and 8K encoding support.]
Software vs hardware: the real tradeoff
Alright, so if NVENC is so fast, why does anyone still use CPU encoding?
Quality per bit. At the same bitrate, a software encoder like x264 on its slow preset produces better-looking video than NVENC. The CPU encoder can afford to spend more time evaluating options. It tries hundreds of block partition modes, runs multiple reference frames, uses advanced rate-distortion optimization. NVENC's fixed-function hardware makes faster decisions, but they're not always the optimal ones.
But speed. x264 on slow at 1080p might encode at 15-20 fps on a modern CPU. That's slower than realtime for 30fps video. NVENC encodes the same content at 300-400+ fps. That's not a small difference. That's an order of magnitude.
[Figure: encoding benchmark on a 10-minute 1080p clip, Ryzen 7700X + RTX 4070. FPS is frames encoded per second; above 30 means faster than realtime for 30fps video. Quality is normalized VMAF.]
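If you want to reproduce numbers like these, FFmpeg can compute VMAF directly, assuming your build was compiled with libvmaf (file names here are placeholders):

# Score an encode against its source. First input is the distorted file,
# second is the reference. Prints a VMAF score from 0 (bad) to 100 (perfect).
ffmpeg -i encoded.mp4 -i original.mp4 -lavfi libvmaf -f null -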
The practical question is always: do you have time to wait?
- Live streaming? NVENC. You literally cannot drop frames. The encoder must be faster than realtime, and your CPU needs headroom for the game.
- Screen recording? NVENC. No visible quality difference for gameplay footage, and zero CPU impact.
- YouTube upload? Depends. NVENC HEVC or AV1 is fine for most creators. If you're a video production house encoding final deliverables, x265 on a slow preset will squeeze more quality out of every kilobit.
- Archival encoding? x265 slow, or SVT-AV1 if you want royalty-free (see the sketch after this list). Let it run overnight. The file size savings compound over years of storage.
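For that last case, here's what a minimal SVT-AV1 invocation looks like (the preset and CRF values are illustrative starting points, not settled recommendations):

# SVT-AV1 presets run 0 (slowest, best) through 13 (fastest).
# Preset 4 is a common quality-leaning choice for archival encodes.
ffmpeg -i input.mp4 -c:v libsvtav1 -preset 4 -crf 30 \
  -c:a copy output_av1.mkv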
[Figure: speed vs quality. GPU encodes are fast; CPU encodes look better per bit.]
The FFmpeg commands
Every encoder mentioned above is accessible through FFmpeg.
CPU encoding with x264 (H.264):
ffmpeg -i input.mp4 -c:v libx264 -preset medium -crf 23 \
  -c:a aac -b:a 128k output_h264.mp4

-crf 23 is constant rate factor: lower means higher quality, bigger file. 18-23 is the sweet spot. -preset medium balances speed and quality. Options range from ultrafast to veryslow.
CPU encoding with x265 (H.265):
ffmpeg -i input.mp4 -c:v libx265 -preset medium -crf 28 \
  -c:a aac -b:a 128k output_h265.mp4

The CRF scale is different for x265: CRF 28 in x265 roughly matches CRF 23 in x264 visually, but at a lower bitrate.
GPU encoding with NVENC H.264:
ffmpeg -i input.mp4 -c:v h264_nvenc -preset p5 -rc vbr \
  -cq 23 -b:v 0 -c:a aac -b:a 128k output_nvenc_h264.mp4

-preset p5 is NVENC's equivalent of medium quality (presets go p1 through p7). -rc vbr -cq 23 uses variable bitrate with a quality target.
GPU encoding with NVENC HEVC:
ffmpeg -i input.mp4 -c:v hevc_nvenc -preset p5 -rc vbr \
  -cq 28 -b:v 0 -c:a aac -b:a 128k output_nvenc_hevc.mp4

GPU encoding with NVENC AV1 (RTX 40-series+):
ffmpeg -i input.mp4 -c:v av1_nvenc -preset p5 -rc vbr \
  -cq 30 -b:v 0 -c:a aac -b:a 128k output_nvenc_av1.mp4

The pattern is always the same: swap the codec name, adjust the quality number, and FFmpeg handles the rest. The NVENC variants will finish 10-20x faster than their CPU counterparts.
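One caveat before copying any of these: whether a given encoder is available depends on how your FFmpeg was built (and, for NVENC, on your GPU and driver). A quick way to check:

# List the encoders your FFmpeg build actually includes:
ffmpeg -hide_banner -encoders | grep -E "nvenc|libx264|libx265|libsvtav1"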
Bitrate, resolution, and file size
Different codecs need different bitrates to achieve the same visual quality. And the relationship between resolution and bitrate isn't linear. Doubling the pixel count roughly doubles the bitrate, but higher resolutions also benefit more from newer codecs because they have more spatial data to exploit.
[Interactive bitrate calculator: pick a scenario and see how codec choice changes file size. File sizes assume constant bitrate; real-world VBR encoding produces smaller files for static scenes.]
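The back-of-the-envelope math is worth internalizing: file size is just bitrate times duration divided by eight. Using the illustrative 1080p bitrates from earlier in the post for a 10-minute clip:

# size in MB = bitrate (Mbps) x duration (seconds) / 8
echo "H.264 @ 5 Mbps, 10 min: $(( 5 * 600 / 8 )) MB"   # 375 MB
echo "H.265 @ 3 Mbps, 10 min: $(( 3 * 600 / 8 )) MB"   # 225 MB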
This is why codec choice matters more at higher resolutions. At 720p, the difference between H.264 and AV1 is maybe 40%. At 4K, it's the difference between a file you can stream on mobile and one that needs a fiber connection.
The bigger picture
Hardware encoding keeps getting better. NVENC AV1 on Ada Lovelace already matches or beats x264 medium in quality per bit while running 15x faster. Each generation narrows the gap.
The codecs themselves keep improving too. The move from H.264 to H.265 saved roughly 40% bandwidth, and H.265 to AV1 saves another 25-30%. That compounds: a 4K stream that needed 15 Mbps with H.264 needs roughly 6 Mbps after both steps, and closer to 5 Mbps in practice, since high-resolution content benefits even more from AV1's tools. That's the difference between requiring fiber and working on a decent mobile connection.
Every time you watch a video, an encoder somewhere made thousands of decisions per frame about how to compress it. Whether that was a CPU grinding through x265 overnight or an NVENC ASIC spitting out frames in microseconds, the point was the same: make the file small enough to send without you noticing the compression.
A sliver of dedicated silicon on your GPU doing this hundreds of times per second while the rest of the chip runs a game at full frame rate. That's a neat trick.
¹ NVENC quality comparisons are based on Turing and later architectures. Earlier generations (Kepler, Maxwell, Pascal) had noticeably worse quality per bit compared to software encoders.
² VMAF (Video Multi-Method Assessment Fusion) is Netflix's perceptual quality metric, scored 0-100. It correlates better with human perception than PSNR or SSIM. A VMAF of 93+ is generally considered "transparent" (indistinguishable from the source).
³ AV1 software encoding with libaom is extremely slow. SVT-AV1 (developed by Intel and Netflix) is significantly faster while maintaining competitive quality. It's the recommended AV1 encoder for most use cases.
⁴ The "dual NVENC" feature on RTX 40-series GPUs means two independent encoder blocks can work in parallel. FFmpeg can split the input across both using -2pass or separate instances, roughly doubling throughput.
Written by Harshit Sharma. If you want to know when new posts are out, follow me on Twitter.