The Data Movement Problem
As AI model architectures grow in parameter count and inference workloads scale across distributed compute clusters, the bottleneck has shifted from compute density to data movement speed. Training a large language model requires not just processing power, but continuous high-bandwidth communication between thousands of GPUs operating in parallel — exchanging gradients, activations, and parameter updates at rates that traditional copper-based interconnects cannot sustain at the required distances and densities.
This is the optical interconnect opportunity. Optical signaling over fiber sustains far more bandwidth over distance than electrical signaling over copper, with lower loss, less heat generation, and immunity to electromagnetic interference. For the long-distance runs connecting racks across a data center floor, optical transceivers have been the standard for years. The new frontier is moving the electrical-to-optical conversion point closer to the chip, shortening the copper path a signal must traverse before it becomes light.
The Bandwidth Demand Curve
The scale of interconnect bandwidth required by modern AI training runs is difficult to overstate. NVIDIA's NVLink switching fabric — used to connect GPUs within a node — delivers 900 gigabytes per second of bidirectional bandwidth per GPU in the H100 generation. When scaling beyond a single node, the inter-node fabric must sustain comparable bandwidths to prevent communication bottlenecks from degrading GPU utilization.
InfiniBand, the dominant inter-node fabric in AI training clusters, has evolved through successive generations to reach HDR (200Gb/s) and NDR (400Gb/s) speeds. Ethernet-based alternatives, particularly RoCE (RDMA over Converged Ethernet), are gaining adoption as hyperscalers build proprietary networking fabrics optimized for their specific workloads.
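To make the gap between intra-node and inter-node bandwidth concrete, a quick sketch using the figures above. Mapping one NDR NIC to each GPU is an illustrative assumption; real node designs vary.

```python
# Rough per-GPU bandwidth comparison: intra-node (NVLink) vs inter-node
# (InfiniBand NDR), using the figures cited in the text. The one-NIC-per-GPU
# mapping is an illustrative assumption, not a vendor configuration.

nvlink_bidir_gbytes = 900      # H100 NVLink, bidirectional, GB/s per GPU
ndr_gbits = 400                # InfiniBand NDR link, Gb/s per direction

nvlink_per_direction = nvlink_bidir_gbytes / 2   # 450 GB/s each way
ndr_gbytes = ndr_gbits / 8                       # 50 GB/s each way

ratio = nvlink_per_direction / ndr_gbytes
print(f"intra-node vs inter-node per-GPU bandwidth: {ratio:.0f}x")  # 9x
```

Even under this generous assumption, the inter-node fabric carries roughly an order of magnitude less per-GPU bandwidth than NVLink, which is why communication-aware parallelism strategies keep gradient-heavy traffic inside the node.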
The common thread across all these architectures is optical transceivers: the components that convert electrical signals to light at the transmission end and back to electrical signals at the receiving end. Every switch port, every server uplink, every inter-rack connection in a large AI cluster requires a transceiver. A 10,000-GPU cluster can require over 50,000 optical transceivers. At scale, transceiver procurement becomes a significant line item in cluster construction budgets.
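The transceiver count scales with link count, not GPU count alone. A back-of-envelope sketch, assuming a non-blocking fat-tree topology with one NIC per GPU (the topology parameters are illustrative; real clusters vary):

```python
# Back-of-envelope transceiver count for a GPU cluster built as a
# non-blocking fat-tree. All parameters are illustrative assumptions,
# not vendor figures.

def transceiver_estimate(num_gpus, nics_per_gpu=1, tiers=2):
    """In a non-blocking fat-tree, each switching tier carries roughly as
    many links as there are server-facing ports, and every optical link
    needs a transceiver at each end."""
    links_per_tier = num_gpus * nics_per_gpu
    return links_per_tier * tiers * 2

print(transceiver_estimate(10_000, tiers=2))  # 40000 for a 2-tier fabric
print(transceiver_estimate(10_000, tiers=3))  # 60000 for a 3-tier fabric
```

A third switching tier, or more than one NIC per GPU, pushes the estimate past the 50,000 figure cited above, which is why transceiver demand grows faster than GPU count as clusters scale.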
Co-Packaged Optics: The Next Architecture
The industry is converging on co-packaged optics (CPO) as the long-term solution to the bandwidth problem. In CPO architecture, the optical components are packaged directly with the switching silicon, eliminating the copper trace runs between the switch ASIC and the external transceiver. This reduces signal loss, lowers power consumption, and enables higher port speeds than are achievable with pluggable transceiver designs.
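One way to see why the power reduction matters is a cluster-level sketch. The energy-per-bit values below are placeholder assumptions chosen only to show the shape of the calculation, not measured vendor data:

```python
# Illustrative link-power comparison between pluggable transceivers and
# co-packaged optics. The pJ/bit figures are assumed placeholders, not
# measurements; only the structure of the estimate is the point.

def link_power_watts(num_links, gbps_per_link, pj_per_bit):
    bits_per_second = num_links * gbps_per_link * 1e9
    return bits_per_second * pj_per_bit * 1e-12

pluggable_w = link_power_watts(50_000, 800, 15)  # assumed 15 pJ/bit
cpo_w = link_power_watts(50_000, 800, 5)         # assumed 5 pJ/bit
print(f"pluggable ~{pluggable_w/1e3:.0f} kW, CPO ~{cpo_w/1e3:.0f} kW")
```

Under these assumed figures, a 50,000-link cluster at 800G per link would save hundreds of kilowatts of continuous interconnect power, which compounds into cooling and facility savings at data center scale.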
CPO is not a small incremental improvement — it is a fundamental change in how networking hardware is designed and manufactured. The transition from pluggable transceivers to CPO will require new manufacturing processes, new supply chain relationships, and new qualification programs. It is a multi-year technology transition that creates both disruption risk for incumbent pluggable transceiver suppliers and opportunity for companies positioned to enable CPO adoption.
Intel, Marvell, and Broadcom are all developing CPO platforms. TSMC is building manufacturing capacity for the advanced silicon photonics processes required for integrated optical components. The supply chain for CPO includes traditional semiconductor companies, optical component makers, and advanced packaging providers.
Transceiver Market Dynamics
The near-term opportunity in optical interconnect is concentrated in the pluggable transceiver market, which will remain the dominant architecture until CPO reaches volume deployment, expected in the late 2020s. The transceiver market is served by a mix of vertically integrated companies and fabless designers.
Key dynamics to monitor include: the migration from 100G and 400G interfaces toward 800G and beyond (driven by switch ASIC bandwidth growth), the ongoing qualification programs at hyperscalers for next-generation transceivers, and the competition between traditional optical component suppliers and newer fabless entrants.
The investment thesis in this space is strongest for companies with: deep relationships with hyperscaler procurement teams, validated products on current-generation switch platforms, and manufacturing scale advantages that newer entrants cannot quickly replicate. We also give significant weight to gross margin trajectory in constrained supply environments — optical transceiver suppliers with allocated capacity can sustain margins that are not achievable in commodity volumes.
Framework Summary
We view optical interconnect as a durable, multi-year spend category driven by non-discretionary bandwidth requirements. The structural thesis does not require a specific outcome from any single company or product cycle — it follows from the physics of high-bandwidth distributed computing, which is not going away. Our framework prioritizes qualification status, customer concentration in hyperscale AI accounts, and the pace of technology transitions in the transceiver ecosystem.