Google Unveils New Gemma 4 12B Model

Google has introduced a new multimodal artificial intelligence model called Gemma 4 12B, as reported by the Google Blog.
The model is designed to bring high-performance multimodal capabilities to laptops. It combines efficiency suitable for mobile devices with advanced reasoning abilities.
Gemma 4 12B fills the gap between the company's compact E4B model and the more powerful 26B Mixture of Experts model. The new model offers robust capabilities while requiring relatively little memory.
This is Google's first mid-sized model to support local audio input. According to the company, Gemma 4 models have been downloaded more than 150 million times to date.
A key feature of Gemma 4 12B is its unified architecture. Image and audio data are fed directly into the base large language model without separate multimodal encoders.
The model can handle complex multi-step reasoning and agent workflows. Test results indicate that its performance is close to that of Google's 26B model.

Gemma 4 12B can run locally on standard laptops. Reportedly, 16 GB of VRAM or unified memory is enough for this.
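To put the 16 GB figure in context, here is a rough back-of-the-envelope estimate of how much memory the weights of a 12-billion-parameter model occupy at different precisions. The article does not say which precision the 16 GB figure assumes, so the exact parameter count and the precisions listed below are illustrative assumptions, and the estimate ignores activations and cache memory.

```python
# Rough weight-memory estimate for a 12-billion-parameter model.
# Assumptions (not from the article): exactly 12e9 parameters, and the
# precisions below are illustrative; activations and caches are ignored.
PARAMS = 12_000_000_000
BUDGET_GB = 16  # memory figure quoted in the article

BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for name, size in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(PARAMS, size)
    verdict = "fits" if gb < BUDGET_GB else "does not fit"
    print(f"{name}: {gb:.0f} GB of weights, {verdict} in {BUDGET_GB} GB")
```

Under these assumptions, 16-bit weights alone would exceed 16 GB, while 8-bit or 4-bit weights would leave room for the rest of the runtime.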
The model was released openly under the Apache 2.0 license. It also has broad support across the developer ecosystem.
Gemma 4 12B uses Multi-Token Prediction to reduce latency, which speeds up the model's responses.
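One common way to think about why predicting several tokens at once lowers latency is to count decoding steps. The sketch below is simple arithmetic, not Google's implementation: the function `decoding_steps` and the average of two tokens per step are hypothetical, and the article gives no actual figures.

```python
import math


def decoding_steps(num_tokens: int, tokens_per_step: float) -> int:
    """Number of forward passes needed to emit num_tokens tokens,
    given the average number of tokens produced per pass."""
    return math.ceil(num_tokens / tokens_per_step)


# Hypothetical response length and acceptance rate, for illustration only.
RESPONSE_TOKENS = 256
baseline = decoding_steps(RESPONSE_TOKENS, 1)     # one token per pass
multi_token = decoding_steps(RESPONSE_TOKENS, 2)  # assumed average of two

print(f"one token per step: {baseline} passes")
print(f"two tokens per step: {multi_token} passes")
```

If each pass costs roughly the same, halving the number of passes roughly halves generation time; real gains depend on how many predicted tokens are actually kept.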
According to Google, the new architecture improves efficiency in processing images and audio. For example, a lightweight embedding module is used for images, while audio signals are mapped directly into a space with the same dimensions as text tokens.
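The idea of mapping images and audio into the same space as text tokens can be illustrated with a toy sketch. This is not Google's architecture: the dimensions, the random linear projections, and all names below are hypothetical, chosen only to show how inputs of different sizes can end up as one sequence of equally sized vectors.

```python
import random

# Hypothetical sizes, chosen only for illustration.
D_MODEL = 8      # width of a text token embedding
PATCH_DIM = 12   # size of a flattened image patch
FRAME_DIM = 6    # size of an audio frame's feature vector


def make_projection(in_dim, out_dim, rng):
    """Random weight matrix (out_dim rows, in_dim columns) standing in
    for a learned lightweight embedding layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(vec, weights):
    """Linear map: one output value per row of the weight matrix."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


rng = random.Random(0)
text_embeddings = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(4)]
image_patches = [[rng.uniform(0, 1) for _ in range(PATCH_DIM)] for _ in range(3)]
audio_frames = [[rng.uniform(-1, 1) for _ in range(FRAME_DIM)] for _ in range(5)]

w_image = make_projection(PATCH_DIM, D_MODEL, rng)
w_audio = make_projection(FRAME_DIM, D_MODEL, rng)

# Every modality becomes D_MODEL-wide vectors, so a single language model
# could consume them as one sequence without a separate encoder.
sequence = (
    [project(p, w_image) for p in image_patches]
    + [project(f, w_audio) for f in audio_frames]
    + text_embeddings
)
print(len(sequence), {len(v) for v in sequence})
```

The point of the sketch is only that all three inputs share one vector width once projected, which is what lets a single base model process them together.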
Thus, Gemma 4 12B is expected to bring modern multimodal artificial intelligence capabilities to everyday devices. The model is designed to run on laptops while maintaining speed and reasoning quality.