NVIDIA unveils GPU for long-context AI inference

16:28 / 10.09.2025 · Technology

NVIDIA has announced a new GPU, the Rubin CPX, designed for long-context AI inference, TechCrunch reported. It is built specifically for large language models (LLMs) and generative AI workloads.

The Rubin CPX can process up to 1 million tokens in a single prompt. This lets AI models keep a long conversational history, work through extensive documents, and generate code more effectively, all without losing context.

The chip is based on NVIDIA’s upcoming Rubin architecture, which is intended to improve performance, energy efficiency, and scalability for LLMs. Servers powered by the Rubin CPX are expected to hit the market in 2026.