Faster than GPUs: Cerebras launches record-breaking Kimi K2.6 model

Following the largest tech IPO of 2026, California-based Cerebras Systems has taken a bold step toward leadership in the AI inference market. The chipmaker has made Kimi K2.6, a one-trillion-parameter open model developed by Beijing-based Moonshot AI, available to enterprise clients. On Cerebras hardware, the model reached nearly 1,000 tokens per second, a result unattainable for any existing GPU-based solution, Ixbt.com reports.
An independent audit by research firm Artificial Analysis recorded 981 tokens per second. That makes the Cerebras platform 6.7 times faster than the nearest GPU cloud provider and 23 times faster than the market average. In practice, a code-generation request of 10,000 tokens took just 5.6 seconds on the Cerebras system, whereas the official Kimi cloud service took 163.7 seconds.
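The figures above imply a few numbers the audit summary does not state directly. The following is a minimal sketch of that arithmetic in Python. The variable and function names are illustrative, and the derived baselines are back-of-the-envelope estimates from the quoted ratios, not measurements reported by Artificial Analysis.

```python
# Figures quoted in the article.
cerebras_tps = 981.0            # tokens per second measured on Cerebras
speedup_vs_nearest_gpu = 6.7    # Cerebras vs. nearest GPU cloud provider
speedup_vs_market_avg = 23.0    # Cerebras vs. market average
cerebras_seconds = 5.6          # 10,000-token coding request on Cerebras
kimi_cloud_seconds = 163.7      # same request on the official Kimi cloud


def implied_baseline(tps: float, speedup: float) -> float:
    """Throughput a competitor would need for the quoted speedup to hold."""
    return tps / speedup


# Derived estimates (assumption: the ratios compare output throughput directly).
nearest_gpu_tps = implied_baseline(cerebras_tps, speedup_vs_nearest_gpu)
market_avg_tps = implied_baseline(cerebras_tps, speedup_vs_market_avg)
latency_ratio = kimi_cloud_seconds / cerebras_seconds

print(f"Implied nearest GPU provider: {nearest_gpu_tps:.1f} tokens/s")
print(f"Implied market average:       {market_avg_tps:.1f} tokens/s")
print(f"Kimi cloud vs. Cerebras time: {latency_ratio:.1f}x longer")
```

Under these assumptions, the nearest GPU provider would be running at roughly 146 tokens per second and the market average at roughly 43, while the coding request took about 29 times longer on the official Kimi cloud.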
The Kimi K2.6 model is based on a Mixture-of-Experts (MoE) architecture, in which only 32 billion of its 1 trillion parameters are activated per pass. On the SWE-Bench Pro benchmark, the model scored 58.6 points, surpassing Claude Opus 4.6 and matching GPT-5.4. For companies, this open model is becoming a cost-effective and efficient alternative to the expensive closed APIs of Anthropic and OpenAI.
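To illustrate why an MoE model touches only a small share of its weights per pass, here is a minimal sketch of top-k expert routing in Python. The expert count (8), the value of k (2), and the function name `top_k_route` are hypothetical choices for a toy example. They are not Kimi K2.6's actual configuration, which the article does not describe.

```python
import math
import random


def top_k_route(gate_scores, k):
    """Return the indices of the k highest-scoring experts and their
    softmax-normalized mixing weights (toy gating, for illustration only)."""
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    peak = max(gate_scores[i] for i in top)           # for numerical stability
    exps = [math.exp(gate_scores[i] - peak) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


# Share of parameters active per pass, using the article's figures.
total_params = 1_000_000_000_000   # 1 trillion
active_params = 32_000_000_000     # 32 billion
active_fraction = active_params / total_params
print(f"Active share per pass: {active_fraction:.1%}")

# Hypothetical toy layer: 8 experts, of which only 2 run for a given token.
random.seed(0)
scores = [random.gauss(0.0, 1.0) for _ in range(8)]
experts, weights = top_k_route(scores, k=2)
print("Selected experts:", experts)
print("Mixing weights:  ", [round(w, 3) for w in weights])
```

With the article's numbers, about 3.2% of the model's parameters take part in each pass, which is what keeps the compute cost per token far below that of a dense one-trillion-parameter network.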
Cerebras' speed comes from its unusual hardware approach. While traditional inference runs on dozens of separate GPUs, the Cerebras Wafer-Scale Engine 3 processor is a single monolithic chip. It contains 44 gigabytes of ultra-fast SRAM, and its on-chip network bandwidth is 200 times higher than that of competitors' NVLink interfaces. Currently, the offering is available only as a private cloud service for large Fortune 500 corporations.