Google introduced a more compact AI method

Google has introduced a new algorithm called TurboQuant that can cut the memory use of large language models by as much as six times. According to the company, the method preserves accuracy without noticeably hurting performance, which could make AI systems cheaper and easier to run, Tech.onliner.by reports.

The main target of TurboQuant is the cache that language models rely on during conversations. This cache stores intermediate attention data so the system does not have to repeat the same calculations for every new token. But as a user's dialogue grows longer, the cache expands with it, which can slow responses and increase hardware demands.
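The growth described above is easy to estimate with back-of-envelope arithmetic. The sketch below uses illustrative model parameters (layer count, head count, and so on are assumptions, not figures from Google's announcement) to show how cache size scales linearly with conversation length:

```python
# Back-of-envelope size of a transformer's conversation cache.
# All model parameters here are illustrative assumptions.

def kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, bytes_per_value):
    # Each layer stores a key and a value vector (hence the factor 2)
    # for every attention head and every token seen so far.
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_value

# A hypothetical 32-layer model with 32 heads of dimension 128,
# storing 16-bit values:
per_token = kv_cache_bytes(32, 32, 128, 1, 2)      # bytes added per token
full_chat = kv_cache_bytes(32, 32, 128, 8192, 2)   # an 8192-token dialogue

print(per_token)             # 524288 bytes, i.e. ~0.5 MB per token
print(full_chat / 2**30)     # 4.0 GiB for the whole cache
```

At these assumed sizes, a six-fold compression would shrink that 4 GiB cache to well under 1 GiB, which is the kind of saving that matters on constrained hardware.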

Google said TurboQuant works in several stages, compressing the stored data and then correcting the errors that compression introduces. The algorithm is designed to reduce memory pressure and computing costs at the same time. Another key point is that it can be applied to existing models without additional training.
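The "compress, then correct" idea can be illustrated with a generic two-stage quantization sketch. This is not Google's TurboQuant algorithm (the announcement gives no implementation details); it only shows the general pattern: quantize values coarsely to save memory, then store a cheap second-stage correction of the residual error so accuracy is largely preserved.

```python
import numpy as np

def quantize_int8(x):
    """Scale a float vector into int8 and remember the scale."""
    scale = float(np.max(np.abs(x))) / 127.0
    if scale == 0.0:
        scale = 1.0  # avoid division by zero on an all-zero vector
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def two_stage(x):
    # Stage 1: coarse int8 quantization of the original values.
    q1, s1 = quantize_int8(x)
    # Stage 2: quantize the residual error left behind by stage 1.
    residual = x - dequantize(q1, s1)
    q2, s2 = quantize_int8(residual)
    return (q1, s1), (q2, s2)

def reconstruct(stage1, stage2):
    return dequantize(*stage1) + dequantize(*stage2)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024).astype(np.float32)

err_one_stage = np.abs(x - dequantize(*quantize_int8(x))).max()
err_two_stage = np.abs(x - reconstruct(*two_stage(x))).max()
print(err_two_stage < err_one_stage)  # the correction stage shrinks the error
```

Because the second stage only encodes the small residual, its quantization step is far finer, so the corrected reconstruction is much closer to the original values for a modest extra storage cost.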

The development could be especially useful for AI tools running on smartphones and other devices with limited resources. If widely adopted, TurboQuant may help lower operating costs for AI services while making advanced models more practical on smaller and less powerful hardware.
