AI is routinely described as a software revolution. Economically, that framing is wrong.
The modern LLM is the visible interface on top of a capital-intensive industrial supply chain: power generation, advanced lithography, foundry capacity, high-bandwidth memory, GPUs, data-center infrastructure, model training and, finally, inference (the step in which a trained model takes a new input and uses what it learned to produce an output). The result is a product that looks like software to the user but behaves like a utility or telecom network in its cost structure.
AI is not a software business. It is a supply chain with a software interface, and profit pools follow the constraints of that supply chain.
Today, the most attractive economics sit in scarce upstream bottlenecks: Nvidia, SK Hynix, TSMC, advanced packaging, power equipment. The weakest economics sit closest to the user: foundation models burning billions per quarter and AI applications running at 25–60% gross margins rather than the 80–90% that defined traditional SaaS.
For two decades, software commanded extraordinary valuations because code, talent, and distribution were scarce. AI removes that scarcity.
This creates a transitional subsidy system. Ratepayers absorb grid expansion. Device buyers absorb memory shortages. Hyperscaler shareholders and bondholders absorb data-center capex approaching $700 billion in 2026. Venture investors absorb model-layer losses. Enterprises and consumers enjoy subsidized AI until pricing catches up with cost.
Why This Time Is Different To SaaS
The economic model of AI diverges fundamentally from traditional SaaS, and that distinction matters for valuation and strategy. SaaS benefited from near-zero marginal costs, where each additional user expanded margins and reinforced operating leverage. AI, by contrast, reintroduces a real cost of goods sold through inference, meaning that scale can pressure margins rather than expand them.
Where SaaS built moats through distribution and customer acquisition, AI moats are increasingly defined by access to compute, proprietary data, and embedded workflow integration. The result is a shift from software as a high-margin, asset-light business to AI as a capital-intensive, throughput-constrained system where profitability depends less on scaling users and more on controlling costs and owning the right layer of the stack.
The AI Margin Stack

Source: Company filings, IEA, Bessemer Venture Partners, ICONIQ Capital. Reference Capital compilation, April 2026.
ENERGY: THE HIDDEN SUBSIDY LAYER
Every token begins with electricity.
The IEA’s April 2026 update found that data-center electricity demand surged 17% in 2025 alone, far outpacing global electricity demand growth of roughly 3%.¹ Under its base case, global data-center consumption is projected to double to approximately 945 TWh by 2030, with AI-focused facilities tripling their power draw over the same period. In the United States, data centers are expected to consume more electricity than all energy-intensive manufacturing combined (aluminum, steel, cement and chemicals) by the end of the decade.
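To put the projection in annual terms, the sketch below converts a growth multiple over a period into an implied compound annual growth rate. The five-year window is an assumption (the text gives the 2030 endpoint but not the base year), and `implied_cagr` is a hypothetical helper name used only for illustration.

```python
def implied_cagr(multiple: float, years: float) -> float:
    """Annual growth rate that turns 1 into `multiple` over `years` years."""
    if multiple <= 0 or years <= 0:
        raise ValueError("multiple and years must be positive")
    return multiple ** (1.0 / years) - 1.0


# Assumption: a five-year window to 2030; the source does not state the base year.
doubling = implied_cagr(2.0, 5)  # all data centers
tripling = implied_cagr(3.0, 5)  # AI-focused facilities
print(f"doubling: {doubling:.1%} per year, tripling: {tripling:.1%} per year")
```

Under that assumption, doubling implies growth of roughly 15% a year and tripling roughly 25% a year, both several times the roughly 3% growth in overall electricity demand.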

The margin profile at this layer is unremarkable for the utility itself: regulated returns typically sit in the mid-to-high single digits. But the economic impact is large because grid costs are often socialized across all ratepayers. When a hyperscale campus requires new transmission capacity, that cost flows into the rate base and onto household bills rather than being borne solely by the AI user.
The long-term winners are not electricity retailers. They are equipment suppliers positioned around current physical bottlenecks: gas turbines, grid infrastructure, transformers, switchgear, cooling systems and backup generation.
CHIPS AND MEMORY: WHERE THE SUPERNORMAL MARGINS SIT
The clearest profit pool in the entire AI stack.
Nvidia reported FY2026 revenue of $215.9 billion, up 65% year-on-year, with data-center revenue of $62.3 billion in the final quarter alone. Full-year GAAP gross margin was 71.1%, expanding to 75.0% in Q4 as Blackwell production matured. This is the inverse of the model layer: Nvidia sells a constrained physical product into urgent demand, with limited substitutes, long qualification cycles and deep software lock-in through CUDA (a software platform that lets developers run AI workloads efficiently on Nvidia chips).

Memory has become equally strategic. High-bandwidth memory (HBM) is a type of memory used in AI chips that allows data to be processed much faster than conventional memory. Manufacturing one unit of HBM requires forgoing production of roughly three units of conventional DRAM (standard memory used in most electronics), a zero-sum trade-off that has triggered the most severe memory shortage in the industry’s history.
SK Hynix, one of the world’s leading producers of AI memory chips, reported Q1 2026 results that illustrate the consequence: revenue of 52.6 trillion won (~$37.9 billion), operating profit of 37.6 trillion won at a 72% operating margin, and net margin of 77%. The company reported that customer demand for HBM exceeds its supply capacity for the next three years. SK Group’s chairman stated in March that the global chip wafer shortage is likely to persist until 2030.
TSMC completes the upstream picture, being the world’s leading manufacturer of advanced semiconductor chips. Q1 2026 gross margin hit 66.2%, with advanced nodes (3nm and 5nm) accounting for 74% of wafer revenue. Its advanced packaging capacity, required to assemble high-performance AI chips, remains sold out through 2027. When you manufacture approximately 90% of the world’s most advanced AI chips, customers accept premium pricing because there is no viable alternative.
The downstream consequence of upstream scarcity is stark. DRAM contract prices surged 90–95% quarter-on-quarter in Q1 2026, the largest quarterly increase on record. Smartphone memory costs have risen comparably. Every buyer of an electronic device is effectively cross-subsidizing the AI build-out.
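The HBM trade-off described above can be made concrete with a small sketch. It assumes capacity is measured in conventional-DRAM units and that each HBM unit displaces roughly three of them, as the text states; the function name and the example fab are hypothetical.

```python
def output_after_hbm_shift(capacity_dram_units: float,
                           hbm_share: float,
                           dram_per_hbm: float = 3.0) -> tuple[float, float]:
    """Return (hbm_units, dram_units) when `hbm_share` of capacity moves to HBM.

    Capacity is measured in conventional-DRAM units; each HBM unit is assumed
    to displace `dram_per_hbm` of them (roughly three, per the text).
    """
    if not 0.0 <= hbm_share <= 1.0:
        raise ValueError("hbm_share must be between 0 and 1")
    shifted = capacity_dram_units * hbm_share
    hbm_units = shifted / dram_per_hbm
    dram_units = capacity_dram_units - shifted
    return hbm_units, dram_units


# Hypothetical fab: 100 units of DRAM capacity, 30% of it redirected to HBM.
hbm, dram = output_after_hbm_shift(100.0, 0.30)
print(f"HBM units: {hbm:.1f}, conventional DRAM units: {dram:.1f}")
```

In this illustrative case, conventional DRAM supply falls by 30 units to produce only 10 units of HBM, which is the mechanism behind the price pressure on every other memory buyer.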
DATA CENTRES: SOFTWARE COMPANIES BECOME HEAVY INDUSTRY
The capital commitments in 2026 are without historical precedent.
The five largest U.S. cloud and AI infrastructure providers (Amazon, Alphabet, Microsoft, Meta and Oracle) have collectively committed to spending between $660 billion and $750 billion in capex this year, nearly doubling 2025 levels. Amazon alone guided $200 billion. Alphabet guided $175–185 billion. Meta guided $115–135 billion. Microsoft is tracking toward $120 billion or more.


Capital intensity has reached extraordinary levels: Meta’s capex-to-revenue ratio is now approximately 54%, and Oracle’s approaches 86%. Amazon is looking at negative free cash flow of up to $28 billion in 2026 on some analyst estimates, while Meta’s free cash flow is projected to drop nearly 90%.
This changes the character of hyperscalers. Historically, these businesses looked like extraordinary software platforms: high returns, asset-light distribution, enormous free cash flow. AI pushes them toward a utility-capex model. The margin profile depends less on software elegance and more on utilization. A GPU cluster that is fully rented out for demanding AI inference workloads can be attractive; a cluster stranded by weak demand, or outcompeted by better chips or lower model prices, becomes a depreciating asset with large fixed costs.
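The dependence on utilization can be illustrated with a minimal sketch. All dollar figures are hypothetical, and the model deliberately treats the full hourly cost of a GPU (depreciation and other overhead) as fixed, which is a simplifying assumption rather than how any provider reports its costs.

```python
def breakeven_utilization(hourly_fixed_cost: float,
                          hourly_rental_price: float) -> float:
    """Share of hours a GPU must be rented for revenue to cover its fixed cost."""
    if hourly_rental_price <= 0:
        raise ValueError("hourly_rental_price must be positive")
    return hourly_fixed_cost / hourly_rental_price


def cluster_margin(utilization: float, hourly_fixed_cost: float,
                   hourly_rental_price: float) -> float:
    """Margin on revenue at a given utilization; fixed cost accrues every hour."""
    revenue = utilization * hourly_rental_price
    if revenue <= 0:
        raise ValueError("utilization and price must be positive")
    return (revenue - hourly_fixed_cost) / revenue


# Hypothetical figures: $1.50 of fixed cost per GPU-hour, rented at $2.50/hour.
FIXED, PRICE = 1.50, 2.50
print(f"break-even utilization: {breakeven_utilization(FIXED, PRICE):.0%}")
for u in (0.9, 0.6, 0.4):
    print(f"utilization {u:.0%}: margin {cluster_margin(u, FIXED, PRICE):.0%}")
```

With these inputs the cluster breaks even at 60% utilization, earns about a one-third margin at 90%, and loses half of its revenue at 40%. A fall in rental prices raises the break-even point, which is how cheaper competing chips can strand an otherwise busy cluster.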
As Tomasz Tunguz, General Partner at Theory Ventures, notes, at current AI revenue run-rates hyperscalers are spending roughly $12 for every $1 they earn from AI. A five-year payback at 60% gross margins requires AI revenue to grow 5x in 5 years; if accelerated chip cycles compress useful life to three years, the required growth rises to nearly 8x in 5 years. That is the future investors must buy into to justify this spending, with no margin for error.
The key metric is no longer cloud revenue growth. It’s cloud revenue growth per dollar of capex deployed.
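One simple way to express that metric is incremental revenue divided by capex, sketched below. The function name and all figures are hypothetical, and comparing same-year revenue growth with same-year capex is an assumption; since capacity takes time to come online, a lagged comparison may be more appropriate in practice.

```python
def revenue_growth_per_capex_dollar(prior_revenue: float,
                                    current_revenue: float,
                                    capex: float) -> float:
    """Incremental cloud revenue generated per dollar of capex deployed."""
    if capex <= 0:
        raise ValueError("capex must be positive")
    return (current_revenue - prior_revenue) / capex


# Hypothetical provider, figures in $ billions: revenue rises from 100 to 130
# in a year in which 120 of capex is deployed.
ratio = revenue_growth_per_capex_dollar(100.0, 130.0, 120.0)
print(f"incremental revenue per capex dollar: ${ratio:.2f}")
```

Two providers with identical 30% revenue growth can look very different on this measure if one deployed twice the capital to get there.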
MODELS: WHY LLMS STRUGGLE TO BECOME PROFITABLE
Revenue is real. Margin is not. But the path to margin is becoming clearer.
Foundation models occupy the hardest economic position in the stack because they face both capital intensity and price deflation simultaneously. A common misconception is that training is the primary expense. It isn’t. Training is a periodic, amortizable cost. Inference, running the model every time a user sends a prompt, is continuous, and at scale it dwarfs training. DigitalOcean’s February 2026 survey found that 44% of organizations now allocate 76–100% of their AI budget to inference rather than training. The real cost problem is not building the model. It’s serving it.
OpenAI confirmed $2 billion in monthly revenue in April 2026, implying a $24 billion annualized run-rate. Internal projections forecast a $14 billion loss for 2026 and cumulative losses of $44 billion through 2028, with profitability not expected until 2029. Anthropic’s trajectory is different in pace but similar in tension: annualized revenue surged from $1 billion in late 2024 to $30 billion by April 2026, the fastest ramp in enterprise software history. Claude Code alone reached a $2.5 billion run-rate within nine months of launch. Yet Anthropic has raised over $18 billion in funding, and neither lab has demonstrated sustained positive unit economics on inference.

Additionally, model providers face continual threats from capable open-source models, and even from some application-layer companies that are building their own LLMs to protect against expected future pricing pressure.
Reasoning from first principles, there appear to be three viable paths to profitability. Each requires a different kind of business.
Path 1: Cost deflation outpaces price deflation. Prices are under pressure because most application-layer companies run orchestration layers that can switch between models in real time. So as the cost of inference declines, there is pressure to cut prices to win market share. For this path to work, prices must eventually fall more slowly than the cost of inference. This is the semiconductor playbook: Moore’s Law made chips cheaper, but manufacturers stayed profitable because their costs fell faster than their prices. The evidence is mixed. Inference costs per token are declining roughly 10x per year through hardware improvements, quantization, model routing and caching (see a16z’s “LLMflation” analysis in Sources). But usage grows at least as fast, a pattern known as the Jevons paradox: increasing the efficiency of a resource’s use leads to a rise, rather than a fall, in total consumption of that resource. Cheaper inference and improved model performance encourage users to attempt more complex tasks. This path works only if labs resist the arms race toward ever-larger models, or if efficiency gains can structurally outrun usage growth. On its own, it is probably not sufficient.
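The race between the two deflation rates in Path 1 can be sketched in a few lines. The starting price and cost, the constant annual decline factors, and both function names are illustrative assumptions; only the roughly 10x annual cost decline comes from the text.

```python
def margin_path(price0: float, cost0: float,
                price_decline: float, cost_decline: float,
                years: int) -> list[float]:
    """Per-token gross margin by year when price and cost each fall by a
    constant annual factor (e.g. 10.0 means a 10x drop per year)."""
    margins = []
    for t in range(years + 1):
        price = price0 / price_decline ** t
        cost = cost0 / cost_decline ** t
        margins.append(1.0 - cost / price)
    return margins


def relative_inference_bill(usage_growth: float, cost_decline: float,
                            years: int) -> float:
    """Total inference spend relative to year 0 (the Jevons effect)."""
    return (usage_growth / cost_decline) ** years


# Hypothetical starting point: price 1.00 and cost 0.80 per unit of tokens.
print(margin_path(1.00, 0.80, price_decline=8.0, cost_decline=10.0, years=2))
print(margin_path(1.00, 0.80, price_decline=10.0, cost_decline=8.0, years=2))
print(relative_inference_bill(usage_growth=10.0, cost_decline=10.0, years=3))
```

When cost falls 10x a year and price only 8x, the margin widens from 20% to about 49% in two years; swap the two rates and it drops to -25%. The last line shows the Jevons effect: if usage grows 10x a year while unit cost falls 10x, the total inference bill does not shrink at all.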

Path 2: Vertical integration into applications. This is what’s actually happening. OpenAI has acquired Windsurf (coding IDE, $3 billion), Hiro Finance (personal finance), and io Products (hardware), while launching Operator, Codex, Deep Research and a shopping feature. Anthropic built Claude Code, Computer Use, and a marketplace for enterprise tools. They’re moving up the stack because the margin on a resolved coding task, a financial plan, or an automated workflow is fundamentally higher than the margin on a million tokens. A model lab that owns the application captures the spread between what the task is worth to the customer and what it costs to run. A lab that only sells API tokens captures the spread between what the token costs and what it’s priced at, which is converging toward zero. The risk is that vertical integration puts labs in direct competition with the customers building on their APIs.
Path 3: Becoming infrastructure too embedded to replace. This is the AWS playbook. If a lab’s model becomes the default runtime for thousands of enterprise workflows, and switching costs accumulate through fine-tuning, proprietary integrations and data, then pricing power comes not from the model itself but from the ecosystem around it. Anthropic’s Model Context Protocol, with 97 million installs and adoption by every major AI provider, is an early example. So is its base of large customers: over 1,000 companies now spend more than $1 million annually on Claude. Lock-in of this kind takes years to build, but once established, it supports durable pricing.
Path 1 alone is probably insufficient. Paths 2 and 3 are where the real margin story sits, and both require model labs to stop being model companies and become platform companies. That’s a different business with different economics, and it’s why the labs that survive the current cash-burn phase will look very different from the ones that started it. It also creates tension for investors in application companies, which risk being cannibalized by the labs they build on.
Model providers are in the “Uberization” phase. However, there appear to be paths to durable moats and longer-term profitability.
APPLICATIONS: SAAS MARGINS ARE NOT THE RIGHT BENCHMARK
AI reintroduces cost of goods sold.
Traditional SaaS enjoyed 80–90% gross margins because the marginal cost of serving each additional user was close to zero. AI changes that. Bessemer Venture Partners’ 2026 pricing research found that AI-first companies typically operate at 50–60% gross margins, with the fastest-growing “supernova” cohort averaging only about 25%. ICONIQ Capital’s 2026 State of AI survey found that AI product builders now expect average gross margins of approximately 52%.

This is not just a margin compression story. It is a product design problem. Early AI products were built in a way that scales cost with usage, not value.
This is the transition from Phase 1 to Phase 2 of AI investing. Phase 1 rewarded brute force: differentiate at a single layer, typically the application layer, by bolting a large model onto a chat interface or lightweight feature, scaling usage, and absorbing the token burn. Phase 2 is different. It rewards companies that convert raw intelligence into sticky workflow value, where the AI surfaces exactly where the user already works, learns from every nudge or correction, exposes itself as modular building blocks for adjacent tasks and remembers context without spraying sensitive data across the open web. Miss any one of these and you have a demo, not a moat.
Critically, this Phase 2 dynamic favors incumbents in slow-moving, regulated industries more than many investors appreciate. It will be far easier for an established platform, one that already owns the workflow front-end and the data back-end, to integrate an off-the-shelf AI agent than for a disruptor to take market share from scratch. Companies with proprietary datasets in healthcare, financial services, industrial operations and legal workflows create self-reinforcing data flywheels: better models produce a superior user experience, which attracts more users, which generates more data, which improves the model further. The switching cost becomes the moat, not the model itself.

The case studies illustrate the Phase 1 problem. GitHub Copilot, priced at roughly $10 per user per month, was reportedly losing more than $20 per user per month for average subscribers as of mid-2023, with heavy users costing as much as $80. Subsequent model efficiency gains have narrowed but not eliminated the gap. Replit saw revenue surge from $2 million to $144 million ARR in a year, but gross margins sat below 10%, dipping negative during usage surges, before pricing changes lifted them into the 20–30% range.
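The Copilot figures above translate directly into per-user gross margins. The sketch below reads "losing more than $20" as an implied cost of at least $30 for an average user and "costing as much as $80" as the cost to serve a heavy user; both readings are assumptions about how the reported numbers were defined.

```python
def gross_margin(price: float, cost_to_serve: float) -> float:
    """Gross margin as a fraction of price for one subscriber-month."""
    if price <= 0:
        raise ValueError("price must be positive")
    return (price - cost_to_serve) / price


PRICE = 10.0                 # reported list price, $ per user per month
AVG_LOSS = 20.0              # reported loss per average user, treated as a floor
avg_cost = PRICE + AVG_LOSS  # implied cost to serve an average user
HEAVY_COST = 80.0            # reported cost of a heavy user (assumed reading)

print(f"average user: {gross_margin(PRICE, avg_cost):.0%}")
print(f"heavy user:   {gross_margin(PRICE, HEAVY_COST):.0%}")
```

On these readings the average subscriber ran at a gross margin of about -200% and a heavy user at about -700%, which is what it means for cost to scale with usage while price stays flat.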

Products built on subsidized model pricing (free chatbots, low-cost AI assistants, flat-rate creative tools) may face an existential moment when upstream costs normalize. If inference subsidies end, many consumer AI products will have to raise prices dramatically, degrade quality by switching to cheaper models, or shut down. The user base built on subsidized pricing may not convert at sustainable rates. Several application-layer companies are rumored to be developing in-house LLMs to deploy if prices rise to an unsustainable level, though it’s still unclear how they would maintain performance versus existing models.
The industry is also adapting. Ninety-two percent of AI software companies now use mixed pricing models combining subscriptions with usage fees. Intercom charges per AI-resolved ticket, linking revenue directly to value delivered. But Bessemer warns of a “renewal cliff”: in 2025, most enterprises adopted AI in “at-all-costs” mode. As those contracts enter renewal in 2026, pricing must reflect actual value, not promise.
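A mixed pricing model of the kind described above might be structured as in the following sketch. The `HybridPlan` type, its fields and every price are hypothetical and are not any vendor's actual rates; the point is only that the usage component ties revenue to outcomes rather than to seats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridPlan:
    """Hypothetical plan: a flat seat fee plus a fee per AI-resolved ticket."""
    seat_price: float            # $ per seat per month
    price_per_resolution: float  # $ per ticket the AI resolves
    included_resolutions: int    # resolutions bundled into the subscription


def monthly_bill(plan: HybridPlan, seats: int, resolved_tickets: int) -> float:
    """Subscription charge plus usage charge for resolutions beyond the bundle."""
    billable = max(0, resolved_tickets - plan.included_resolutions)
    return seats * plan.seat_price + billable * plan.price_per_resolution


# All prices below are illustrative.
plan = HybridPlan(seat_price=30.0, price_per_resolution=1.0,
                  included_resolutions=100)
print(monthly_bill(plan, seats=10, resolved_tickets=450))
```

Because the usage fee rises with resolved tickets, revenue grows alongside the inference cost of resolving them, instead of leaving the vendor to absorb heavy usage under a flat subscription.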
The implication is not that software disappears. It is that software loses the right to be valued as a single high-margin asset class.
This is now visible in public markets:
- Revenue multiples compressing structurally
- A widening gap between top-quartile and median companies
The market is repricing software from “scarce asset” to “competitive layer.”
If AI inputs normalize to real costs rather than subsidized rates, many applications may need to charge multiples of their current prices to be sustainable. This will be existential to some companies.
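The size of the required price increase follows from the gross-margin identity: price equals cost divided by one minus the target margin. The sketch below applies it to a hypothetical product; the function names and all dollar figures are assumptions for illustration only.

```python
def required_price(cost_to_serve: float, target_margin: float) -> float:
    """Price at which gross margin equals `target_margin` for a given cost."""
    if not 0.0 <= target_margin < 1.0:
        raise ValueError("target_margin must be in [0, 1)")
    return cost_to_serve / (1.0 - target_margin)


def price_multiple(current_price: float, cost_to_serve: float,
                   target_margin: float) -> float:
    """How many times the current price the sustainable price would be."""
    if current_price <= 0:
        raise ValueError("current_price must be positive")
    return required_price(cost_to_serve, target_margin) / current_price


# Hypothetical product: charges $20 a month, costs $24 to serve at unsubsidized
# input prices, and targets a 60% gross margin.
print(f"required price: ${required_price(24.0, 0.60):.2f}")
print(f"multiple of current price: {price_multiple(20.0, 24.0, 0.60):.1f}x")
```

In this example the product would need to charge about $60, three times its current price, to reach a 60% gross margin once inputs are priced at cost.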
TODAY: OWN SCARCITY, NOT HYPE
AI will almost certainly become a general-purpose technology. But the profit pool will not be evenly distributed.
The near-term winners are those who control physical scarcity: GPU accelerators, high-bandwidth memory, advanced foundry capacity, packaging, power equipment and energized data-center capacity. These businesses share a common trait: demand exceeds supply, substitutes are limited, qualification cycles are long, and pricing power is structural rather than cyclical. Nvidia’s 75% Q4 gross margin, SK Hynix’s 72% operating margin and TSMC’s 66% gross margin are not accidents. They are the economic signature of bottleneck control.
The contested middle is hyperscale cloud and foundation models. Revenue growth is real (Anthropic’s trajectory from $1 billion to $30 billion in fifteen months is unprecedented), but return on invested capital is deeply uncertain. OpenAI’s projected $14 billion loss in 2026 and cumulative $44 billion in losses through 2028 illustrate the gap between demand and sustainable economics.
The weakest layer today is generic AI software without proprietary data, workflow lock-in or pricing power. A company that simply resells model output with a thin markup is writing an option on usage where the user is incentivized to exercise it.
TOMORROW: WHERE THE PROFITS SHIFT NEXT
Today’s margin map is not permanent. The bottlenecks will move.
The upstream scarcity story (Nvidia, HBM, TSMC) is real but temporary. Over $700 billion in hyperscaler capex is building supply. New chip architectures are eroding Nvidia’s monopoly position: Broadcom custom ASICs for Google and Anthropic, AMD’s MI300, Amazon’s Trainium, Google’s TPUs. When supply catches up with demand, pricing power compresses. This has happened in every hardware cycle in history: DRAM, solar panels, fiber optic cable after the 2000s build-out. Supernormal margins attract supernormal capital, and supernormal capital eventually creates overcapacity.
Meanwhile, the model labs and application companies that survive the current cash-burn phase and successfully build workflow lock-in could become the next bottleneck. If your enterprise runs its compliance, coding, customer support and financial planning through a specific AI platform, with years of fine-tuning, proprietary data, and integrations that would take months to rebuild, that platform has pricing power regardless of what happens to GPU prices. The switching cost becomes the moat, not the model itself.
This creates a dynamic that matters for venture capital. The current moment, where hardware captures supernormal margins and software burns cash, may actually represent an optimal entry point for software-layer investments, provided you back companies building genuine lock-in rather than thin API wrappers. The historical parallel is cloud infrastructure in 2008–2012: AWS was a capital drain for Amazon for years, but it became the highest-margin business in technology. The investors who entered during the cash-burn phase, when the economics looked worse on paper, captured the largest returns.

The question is not whether AI will create value. It is whether the value accrues to the companies spending the capital today, or to the next generation of entrants who inherit cheaper infrastructure and build on top of it. Both outcomes have historical precedent. Railways built wealth, but mostly for the companies that used them, not the ones that laid the track. Cloud computing built wealth, but AWS, the track-layer, was the exception that proved the rule, because it achieved infrastructure lock-in that made switching prohibitively expensive.
The AI supply chain will follow one of these patterns. The investment discipline is to understand which layer you’re in, whether the scarcity is temporary or structural, and whether the company you’re backing is building the kind of lock-in that survives the transition from scarcity to abundance. Own the bottleneck today if you can. But position for where the bottleneck moves tomorrow.
Sources
- Andreessen Horowitz. “LLMflation: LLM Inference Cost.” a16z.com. https://a16z.com/llmflation-llm-inference-cost/
- “The Execution Era of AI: 5 Key Takeaways from ICONIQ’s State of AI Report.” saastr.com. https://www.saastr.com/the-execution-era-of-ai-5-key-takeaways-from-iconiqs-state-of-ai-report/
- ICONIQ Capital. “2026 State of AI: Bi-Annual Snapshot.” iconiq.com. https://www.iconiq.com/growth/reports/2026-state-of-ai-bi-annual-snapshot
- Dediu, Horace. “The Most Brilliant Move in Corporate History?” Asymco.
- Bank of America. Hyperscaler Capex Estimates.
- “Big Tech’s AI Bond Binge.”
- “Google, Meta, & Oracle’s $1 Trillion Borrowing Spree.”
- “Alphabet Plans Tech’s First 100-Year Bond Since Dot-Com Era.”
- Hyperscaler Depreciation Policies (various filings).
- International Energy Agency. Key Questions on Energy and AI, April 2026; Energy and AI, April 2025.
- NVIDIA Corporation. FY2026 Annual Results, February 2026.
- SK Hynix. Q1 2026 Earnings Release, April 22, 2026.
- Q1 2026 Earnings Release, April 2026.
- Q1 2026 DRAM/NAND Contract Price Data.
- CreditSights; Futurum Group; Epoch AI. Hyperscaler Capex 2026 Estimates, January–February 2026.
- “AI Inference vs Training.” Currents Research, February 2026.
- Financial disclosures via Reuters, The Information, Sacra, January–April 2026.
- Revenue disclosures via SaaStr, Sacra, Bloomberg, February–April 2026. Anthropic Blog, April 7, 2026.
- Bessemer Venture Partners. The AI Pricing and Monetization Playbook, February 2026. https://www.bvp.com/atlas/the-ai-pricing-and-monetization-playbook
- ICONIQ Capital. 2026 State of AI.
- Tunguz, Tomasz. tomtunguz.com. https://tomtunguz.com/
- Marcho Partners LLP. “Software Is Dead, Long Live Software.” Marcho Partners Research, April 2026.
- “The Economics of AI-First B2B SaaS in 2026.” getmonetizely.com, October 21, 2025. https://www.getmonetizely.com/blogs/the-economics-of-ai-first-b2b-saas-in-2026
- Bessemer Venture Partners. “The State of AI 2025.” Atlas, August 13, 2025. https://www.bvp.com/atlas/the-state-of-ai-2025