How Google’s “TurboQuant” caused a mass sell-off in the semiconductor space

Charlie Youlden, March 27, 2026

The mass sell-off in the semiconductor space

The semiconductor sector came under pressure, with Nvidia down 2.4%, AMD falling 5.5%, TSMC down 4.5%, and Micron dropping 4.6%, leaving the stock down 14% over the past five trading days. Western Digital fell 4.7%, and Sandisk dropped 8%.

The sell-off was driven by Google’s recent TurboQuant announcement, a vector quantisation technique. In simple terms, it is an algorithm designed to compress massive data sets more efficiently, reducing the amount of memory needed to store and run large AI models.

That immediately raised the key question for investors: if models can be compressed more efficiently, will companies need less memory, or does compression simply free up capacity for even larger workloads elsewhere? That is especially important for a company like Micron, where memory demand is already forecast to remain strong over the next few years.


Understanding AI models

AI models are built on extremely large data sets. Modern large language models are trained on enormous volumes of information, often reaching hundreds of terabytes or even petabytes. That gives you a sense of how much memory is required to train, store, and run these systems.

These models work with vectors, where information is stored as long lists of numbers. The larger and richer those lists are, the more the model can learn and understand. The problem is that handling all of that raw numerical data consumes a huge amount of memory, which is one reason DRAM and NAND prices have risen so sharply, in some cases by upwards of 300%.
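
To make that concrete, here is a minimal sketch of how quickly a single table of full-precision vectors adds up. The sizes are illustrative, not taken from any real model:

```python
import numpy as np

# Illustrative sizes only; real models vary widely.
vocab_size = 100_000   # number of items that each get a stored vector
dims = 4_096           # length of each vector

embeddings = np.random.rand(vocab_size, dims).astype(np.float32)

# float32 = 4 bytes per number, so memory grows linearly with
# both the number of vectors and their length.
print(f"{embeddings.nbytes / 1e9:.2f} GB")  # ~1.64 GB for this one table
```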

What did Google build, and why is it important?

Vector quantisation is basically a way of compressing those lists. Instead of storing a number with full precision, you round it into a simpler form that takes up less space. You lose a small amount of precision, but you save a lot of memory.
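
As a rough illustration of that trade-off, here is the rounding idea in Python. Strictly speaking, this sketch shows the simpler scalar flavour of quantisation, rounding each number independently, whereas vector quantisation proper maps whole vectors to entries in a shared codebook; the memory-versus-precision trade is the same in spirit:

```python
import numpy as np

v = np.random.randn(4_096).astype(np.float32)  # a full-precision vector

# Round every value onto a 256-level integer grid, keeping one
# float scale factor so the values can be approximately restored.
scale = np.abs(v).max() / 127
q = np.round(v / scale).astype(np.int8)        # 1 byte each instead of 4

restored = q.astype(np.float32) * scale
print(f"memory: {v.nbytes} B -> {q.nbytes} B")             # 4x smaller
print(f"worst rounding error: {np.abs(v - restored).max():.4f}")
```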

The clever part of Google’s approach lies in what it strips away. Older compression methods often needed a separate set of instructions explaining how each block of data had been compressed, and that metadata ate into the benefit. TurboQuant appears to remove much of that overhead, which means AI systems can keep larger working memory, or scratchpads, while using less physical memory.
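
A generic block-wise scheme shows where that metadata overhead comes from. This is not Google’s actual method, whose details are in the TurboQuant paper; it simply illustrates how each block of compressed values drags along its own scale factor:

```python
import numpy as np

def quantize_blockwise(v: np.ndarray, block: int = 64):
    """Quantise v in blocks, each carrying its own float32 scale."""
    blocks = v.reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127 + 1e-12
    q = np.round(blocks / scales).astype(np.int8)
    return q, scales.astype(np.float32)

v = np.random.randn(4_096).astype(np.float32)
q, scales = quantize_blockwise(v)

# 4,096 bytes of payload plus 256 bytes of per-block scales:
# the "separate set of instructions" adds ~6% on top here.
print(q.nbytes, scales.nbytes)
```

Shrinking or eliminating that side-channel is exactly the kind of gain the announcement points to.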

For the memory sector, that does not automatically mean demand disappears. More efficient compression can reduce the memory needed for a given workload, but it can also make AI cheaper and faster to run, which may lead to even more usage. That is why the real question is not whether memory demand goes away, but whether efficiency gains slow the pace of growth enough to change the current supply-demand story.

This is what investors need to know

Our niche is the semiconductor industry, and the way we see it, the memory backlog is a demand problem, not just a compression problem.

What we mean by that: in Nvidia’s case, as its GPUs have become more powerful and more efficient, usage has not fallen; it has surged. We have seen that pattern repeatedly across semiconductors. When computing becomes cheaper, faster, and more capable, demand tends to expand rather than contract. Efficiency gains in resource use usually drive more consumption, not less, the pattern economists call the Jevons paradox.

That is why we do not think TurboQuant changes the core demand story for memory. It is a valuable improvement in AI and memory efficiency, but it does not change the bigger direction of travel. Models are getting larger, context windows are expanding from 8K to 128K to 1M tokens, and the number of concurrent users is growing rapidly. The memory wall keeps moving.

More usage also means more data is being created, stored, and processed, which means more memory is still required over time. So while compression helps, it mainly improves how efficiently that memory is used rather than eliminating the need for it.

For inference at scale, especially when a model is being run for millions of users at the same time, KV cache memory, the per-user store a model keeps so it does not recompute the whole conversation on every step, is already one of the real bottlenecks. This is not just theoretical. Small reductions in memory use per operation can create meaningful savings and throughput improvements for hyperscalers like Google.
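
Some back-of-envelope arithmetic shows why. The model shape below is hypothetical, chosen only to show how the numbers scale, but the cache alone runs to gigabytes per user at long context lengths:

```python
# Hypothetical model shape, for illustration only.
layers, kv_heads, head_dim = 32, 8, 128
bytes_per_value = 2            # fp16/bf16
context_tokens = 128_000       # a 128K-token context window

# Keys and values are both cached, for every layer and every token.
per_user = 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value
print(f"{per_user / 1e9:.1f} GB per user")   # ~16.8 GB

# Multiply by concurrent users and the cache dwarfs the weights.
print(f"{10_000 * per_user / 1e12:.0f} TB across 10,000 concurrent users")
```

Even a modest percentage saving on numbers like these is real money at hyperscaler volumes, which is the throughput point above.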

That is why we think about this less as a threat to memory demand and more as a commercial efficiency gain. Companies will likely run more efficient models, but as models improve and usage rises, total demand for memory can still keep growing. In our view, this is more about lowering cost per query than it is about weakening the long-term demand outlook for memory itself.
