Jun. 18 at 5:27 PM
$AMZN in talks to sell its Trainium AI chips & racks directly to 3rd parties for use in their data centers
This allows Amazon to target international markets seeking "sovereign AI" solutions that sit completely outside the standard AWS cloud framework.
---
If 2025/2026 was the year of $GOOGL TPU, then 2026/2027 will be the year of Trainium
Why??
Many modern LLMs (reportedly incl. GPT-4 & Claude) use Mixture of Experts (MoE) architectures. Instead of running the whole model, each token gets dynamically routed to a few specialized "expert" sub-networks on the fly. This requires massive, unpredictable, instant data routing across chips.
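To make the routing point concrete, here's a minimal sketch of top-k expert routing in plain Python. Everything in it is an assumption for illustration: the names `route_tokens` and `expert_load`, the 8 experts, the 16 tokens, and the random gate scores are hypothetical and not taken from any real model.

```python
import math
import random

def route_tokens(gate_logits, k=2):
    """For each token, return [(expert_id, weight), ...] for its top-k experts."""
    routes = []
    for logits in gate_logits:
        # Pick the k experts with the highest gate score for this token.
        top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
        # Softmax over just the selected experts (shifted by the max for stability).
        m = max(logits[e] for e in top)
        exps = [math.exp(logits[e] - m) for e in top]
        total = sum(exps)
        routes.append([(e, w / total) for e, w in zip(top, exps)])
    return routes

def expert_load(routes, num_experts):
    """Count how many tokens were sent to each expert."""
    load = [0] * num_experts
    for token_route in routes:
        for expert_id, _ in token_route:
            load[expert_id] += 1
    return load

# Hypothetical sizes and random gate scores, purely for illustration.
random.seed(0)
NUM_EXPERTS, NUM_TOKENS = 8, 16
gate_logits = [[random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
               for _ in range(NUM_TOKENS)]

routes = route_tokens(gate_logits, k=2)
load = expert_load(routes, NUM_EXPERTS)
print("tokens per expert:", load)
```

If each expert sits on a different chip, the per-expert load is the traffic pattern the interconnect has to carry, and it changes with every batch of tokens.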
Google's TPU uses a point-to-point "torus mesh" topology. If Chip A needs to send data to Chip Z & they aren't direct neighbors, that data has to hop sequentially thru Chips B, C, & D, which creates a bottleneck
AWS built Trainium 3 on a centralized, all-to-all switched network (NeuronLink v2). Every single chip can reach every other chip thru the switch w/ zero intermediate chip hops
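A rough way to see the hop-count difference described above is a toy model in plain Python. The 4x4x4 torus size, the function names, and the idealized single-tier switch are all assumptions for illustration; this is not a model of any actual TPU pod or Trainium rack.

```python
from itertools import product

def torus_hops(a, b, dims):
    """Minimum link hops between coordinates a and b on a wraparound torus."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def torus_intermediate_chips(a, b, dims):
    """Chips the data must pass through between a and b (endpoints excluded)."""
    return max(torus_hops(a, b, dims) - 1, 0)

def switched_intermediate_chips(a, b):
    """Idealized single-tier switch: traffic crosses the switch, never another chip."""
    return 0

def avg_torus_hops(dims):
    """Average link hops over all ordered pairs of distinct chips."""
    nodes = list(product(*(range(d) for d in dims)))
    total = sum(torus_hops(a, b, dims) for a in nodes for b in nodes if a != b)
    return total / (len(nodes) * (len(nodes) - 1))

DIMS = (4, 4, 4)  # hypothetical 64-chip 3D torus
far_a, far_b = (0, 0, 0), (2, 2, 2)

print("torus link hops, far pair:", torus_hops(far_a, far_b, DIMS))
print("torus chips in between:   ", torus_intermediate_chips(far_a, far_b, DIMS))
print("switched chips in between:", switched_intermediate_chips(far_a, far_b))
print("torus avg link hops:      ", round(avg_torus_hops(DIMS), 3))
```

The toy model only counts hops. It ignores link bandwidth, switch latency, and congestion, so treat it as intuition for the topology argument, not a performance estimate.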
Note: Globally, only $NVDA (via NVLink) & Amazon Trainium utilize this specialized switched scale-up network
This is why Amazon is seeing a higher ROI on its AI cap-ex than other hyperscalers
TBF, Google is addressing this w/ TPU 8i. Select enterprise cloud accounts will get initial cluster access for testing in late FY26, w/ broad volume availability expected to ramp into 2027