Summary
This report benchmarks GPU options for deploying Scope's real-time video diffusion inference pipelines. We evaluate performance, memory fit (OOM risk), and cost trade-offs across multiple resolutions and four pipelines: reward-forcing, longlive, streamdiffusionv2, and krea-realtime-video.
Key Takeaways
- H200 SXM is the fastest in every measured case where it runs, typically delivering 5–15% higher FPS than H100 SXM and materially higher FPS than RTX 5090.
- H100 SXM is the best “default”: it supports high resolutions reliably (large VRAM), performs close to H200 SXM, and avoids the OOM failures that hit the RTX 5090 at lower resolution thresholds.
- RTX 5090 offers strong value when workloads fit in 32 GB VRAM, but it hits memory limits at higher resolutions and on memory-heavy pipelines (notably krea-realtime-video).
- Provider choice can matter at low resolutions (underutilized GPU): RTX 5090 on TensorDock was 10–30% faster than on RunPod in several low-res tests, while results converge at high resolutions where GPUs saturate.
- Based on current results, reward-forcing is the strongest-performing pipeline, delivering the highest FPS across GPUs and resolutions in this benchmark set.
Next Steps to Optimize Performance
- Accelerate inference: evaluate TensorRT, torch.compile, and quantization (FP16/INT8 where feasible) to reduce latency and boost throughput.
- Lower-res + upscale: run diffusion at a smaller base resolution, then apply an upscaler to reach the target output size—balancing quality against cost.
- Increase GPU packing: at low resolutions, GPU utilization often falls below 90% with free VRAM remaining. Test running multiple concurrent streams per GPU to maximize throughput per dollar.
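To make the packing idea above concrete, the sketch below estimates aggregate FPS per dollar-hour as a function of concurrent streams per GPU. The contention factor, prices, and FPS figures are illustrative assumptions for the sketch, not benchmark data; the actual degradation curve should be measured empirically per pipeline.

```python
def fps_per_dollar(streams: int, base_fps: float, hourly_cost: float,
                   contention_factor: float = 0.9) -> float:
    """Estimate aggregate FPS per dollar-hour when packing `streams`
    concurrent streams on one GPU.

    `contention_factor` models the per-stream slowdown from shared
    compute/memory-bandwidth contention (an illustrative assumption;
    measure the real curve per pipeline and resolution).
    """
    per_stream_fps = base_fps * (contention_factor ** (streams - 1))
    return streams * per_stream_fps / hourly_cost

# Illustrative comparison (placeholder price and FPS, not measured values):
single = fps_per_dollar(1, base_fps=30.0, hourly_cost=0.80)
packed = fps_per_dollar(3, base_fps=30.0, hourly_cost=0.80)
# Under these assumptions, packing 3 streams improves throughput per dollar
# as long as per-stream FPS stays above the real-time floor you require.
```

The useful property of this model is that packing wins whenever the contention penalty per added stream is smaller than the linear gain from the extra stream, which is exactly the regime low-resolution workloads sit in when GPU utilization is below 90%.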
Which GPU should you choose?
Use this as the default selection logic; detailed evidence appears in later sections.
| Situation | Recommended default | Why |
| --- | --- | --- |
| Lowest cost for workloads that fit (no OOM) | RTX 5090 | Best economics when VRAM is sufficient |
| High resolution and/or memory-heavy pipelines | H100 SXM | Large VRAM headroom and strong throughput |
| Maximum throughput / lowest latency, cost secondary | H200 SXM | Highest FPS (usually 5–15% above H100 SXM) |
| Need ultra-high resolution where the 5090 OOMs, but want cheaper than Hopper | RTX A6000 Ada (situational) | More VRAM than the 5090; slower, but can run cases where the 5090 fails (can skip for the current pipelines) |
Benchmark
Methodology
Pipelines tested: