Capacity vs. Demand: The Mathematics of AI Compute Scarcity

By Ahijah Ireland · August 27, 2025 · 5 min read

The Scarcity Question

Every major capital cycle eventually confronts the scarcity question: how long does the supply shortage last, and when does supply catch up to demand? Getting this question right is the difference between a well-timed position and an overstayed one.

For AI compute infrastructure, the scarcity question requires thinking about supply and demand as mathematical systems operating on different timescales. Supply adds capacity in discrete, large increments — new GPU chip manufacturing capacity requires years to plan, build, and validate. Demand grows continuously as AI research programs, enterprise deployments, and inference workloads multiply. The gap between these two growth curves determines the duration and magnitude of the investment opportunity.

The Supply Side: Why Capacity Additions Are Slow

GPU manufacturing is not a factory floor that can be scaled by adding production workers. It is an extraordinarily capital-intensive, technically complex process that depends on:

Semiconductor fabrication capacity: AI accelerators are manufactured on leading-edge process nodes, currently 4nm and 3nm. Global leading-edge fabrication capacity is concentrated in a small number of facilities, primarily operated by TSMC. A new leading-edge fabrication facility costs $10 to $20 billion to build and takes 3 to 5 years from groundbreaking to full commercial production.

CoWoS packaging capacity: Advanced AI chips require chip-on-wafer-on-substrate (CoWoS) packaging, a manufacturing step that connects the GPU die to its HBM stacks. This step is a separate supply chain bottleneck: CoWoS capacity has been a gating factor on GPU production even when raw silicon capacity was available.

HBM manufacturing: As discussed in prior GZC research, HBM production is concentrated among three manufacturers with limited ability to rapidly expand capacity. GPU production is constrained not just by chip fabrication but by the availability of HBM to pair with it.

Each of these supply chain elements has a 2-to-5 year capacity addition timeline. This means that supply cannot respond to demand signals on a short horizon. Even if GPU demand doubled tomorrow, supply could not double in response: the physical manufacturing infrastructure does not exist, and building it takes years.
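The lead-time constraint above can be sketched in a few lines of Python. This is a minimal illustration, not a forecast: the function name `supply_path`, the 100-unit starting capacity, and the 3-year lead time are hypothetical values chosen for the example (3 years sits inside the 2-to-5 year range cited above).

```python
def supply_path(initial_capacity, orders, lead_time_years, horizon_years):
    """Return installed capacity for each year from 0 to horizon_years.

    `orders` maps the year an expansion was committed to the capacity it adds.
    All inputs are hypothetical illustration values, not market data.
    """
    capacity = []
    installed = initial_capacity
    for year in range(horizon_years + 1):
        # Capacity committed `lead_time_years` ago comes online this year.
        installed += orders.get(year - lead_time_years, 0.0)
        capacity.append(installed)
    return capacity


# Hypothetical scenario: demand doubles in year 0 and an equal amount of
# new capacity is committed immediately, with an assumed 3-year lead time.
path = supply_path(
    initial_capacity=100.0,
    orders={0: 100.0},
    lead_time_years=3,
    horizon_years=5,
)
print(path)  # [100.0, 100.0, 100.0, 200.0, 200.0, 200.0]
```

In this toy run, installed capacity stays flat for three years even though the expansion was committed the moment demand doubled.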

The Demand Side: Why Growth Continues

AI workload demand has several structural characteristics that support sustained growth independent of near-term business cycle dynamics:

Model scaling: AI research has consistently demonstrated that larger models trained on more data produce better capabilities. This creates academic and competitive pressure to scale training compute continuously, independent of the business cycle. Research programs at major AI labs are not paused when stock markets decline.

Inference multiplication: For every AI model trained on expensive GPU clusters, many instances of that model run in inference — serving individual requests from users, applications, and automated systems. Inference workload is growing faster than training workload as AI capabilities are deployed in products. Each inference server is less GPU-intensive than a training server, but the number of inference servers is growing rapidly.

Enterprise adoption lag: Enterprise adoption of AI has, by most assessments, barely begun. The large-scale deployment of AI in enterprise workflows — in financial services, healthcare, industrial automation, and consumer applications — represents a multi-year demand driver that will sustain GPU procurement independently of the research and hyperscaler demand that has driven the current cycle.
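The inference multiplication point above comes down to simple arithmetic: a fleet of lighter servers can still require more GPUs in aggregate if there are enough of them. The sketch below uses a hypothetical helper, `fleet_gpus`, and made-up server counts and GPU densities purely to show the mechanism; none of the figures are estimates of real deployments.

```python
def fleet_gpus(server_count, gpus_per_server):
    """Total GPUs in a fleet of identical servers."""
    return server_count * gpus_per_server


# Hypothetical fleet sizes, chosen only to illustrate the arithmetic.
training_gpus = fleet_gpus(server_count=1_000, gpus_per_server=8)
inference_gpus = fleet_gpus(server_count=10_000, gpus_per_server=2)

print(training_gpus, inference_gpus)  # 8000 20000
```

Under these assumed numbers, each inference server carries a quarter of the GPUs of a training server, yet the inference fleet needs two and a half times as many GPUs in total.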

The Gap and Its Duration

The mathematics of the supply-demand gap in AI compute is structurally asymmetric: supply can grow only at the pace of semiconductor manufacturing capacity additions (measured in years), while demand grows continuously from multiple independent drivers.
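One way to make the asymmetry concrete is a deliberately simplified model in which both sides grow at constant continuous rates. If demand starts at a multiple r of supply, supply grows at rate s, and demand grows at rate g, the gap closes after t = ln(r) / (s - g) years, and only if s exceeds g. The sketch below assumes this smoothed model (it averages the stepwise capacity additions into a single rate); the function name `years_to_close_gap` and every input value are hypothetical.

```python
import math


def years_to_close_gap(demand_to_supply_ratio, supply_growth, demand_growth):
    """Years until supply equals demand under constant continuous growth.

    Returns 0.0 if there is no gap, and None if supply never catches up
    because it does not grow faster than demand.
    """
    if demand_to_supply_ratio <= 1.0:
        return 0.0
    if supply_growth <= demand_growth:
        return None
    return math.log(demand_to_supply_ratio) / (supply_growth - demand_growth)


# Hypothetical inputs: demand is 1.5x supply today, supply grows 60% a year
# and demand 40% a year (continuous rates, illustration only).
closing = years_to_close_gap(1.5, supply_growth=0.60, demand_growth=0.40)
print(round(closing, 2))  # 2.03

# If supply grows more slowly than demand, the gap never closes in this model.
never = years_to_close_gap(1.5, supply_growth=0.30, demand_growth=0.40)
print(never)  # None
```

The result is sensitive to the difference between the two rates: narrowing the assumed spread between supply and demand growth lengthens the duration sharply, which is why the duration estimate matters more than the size of the current gap.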

Our assessment is that the compute scarcity dynamic is likely to persist for longer than the consensus expects — not permanently, but for a duration measured in years rather than quarters. The supply chain investments required to close the gap are being made, but they cannot be completed instantaneously.

This duration is the foundation of the multi-year investment thesis in AI infrastructure supply chain companies. The GPU manufacturer, the HBM supplier, the advanced packaging operator, the power delivery equipment company — these businesses are serving a scarcity condition with a multi-year timeline to resolution. The revenue visibility, margin durability, and pricing power they command during this period are the investment characteristics BTT analysis identifies as most valuable.

When Scarcity Resolves

The scarcity condition resolves when supply capacity additions catch up to demand growth. Three signals would indicate that this is happening: GPU allocation becomes unconstrained (no waiting lists for hyperscaler orders), HBM pricing normalizes to historical levels, and advanced packaging lead times return to pre-AI-buildout norms.
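These three signals lend themselves to a simple checklist. The following is a minimal sketch of how such a checklist might be represented; the class `ScarcitySignals` and its field names are hypothetical and do not describe any actual monitoring system.

```python
from dataclasses import dataclass


@dataclass
class ScarcitySignals:
    """Hypothetical checklist of the three resolution signals."""
    gpu_allocation_unconstrained: bool
    hbm_pricing_normalized: bool
    packaging_lead_times_normalized: bool

    def present(self):
        """Return the names of the signals that are currently observed."""
        return [name for name, value in vars(self).items() if value]


# Example state with no signals observed (illustrative input only).
signals = ScarcitySignals(
    gpu_allocation_unconstrained=False,
    hbm_pricing_normalized=False,
    packaging_lead_times_normalized=False,
)
print(signals.present())  # []
```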

None of these signals are present as of mid-2025. We monitor them continuously. When they appear, they will signal a shift in the supply-demand dynamic that will require us to reassess the duration and magnitude of the positions we hold.

Until then, the mathematics of constrained supply meeting sustained demand produces the conditions we identified through BTT analysis, along with the forced-spend characteristics that make these positions compelling to hold.

Topics
Market Analysis, AI Compute, Supply and Demand, GPU Infrastructure, Capital Cycle