State of AI 2025 Report 

What this means for your digital transformation

Air Street’s latest State of AI report (https://www.stateof.ai/) argues that outcomes now depend less on owning “the biggest model” and more on building reliable systems around AI: routing tasks to the cheapest capable model; making your facts machine-readable so AI answer engines can find and cite you; and planning for real-world constraints like power and water that will affect price and availability. In short: keep it simple, measure results, and design for change.

The big shift: from one clever model to a sensible system

For years, AI strategy conversations fixated on "what's the best model?" The report's message is simpler: don't overbuy. Many day-to-day tasks can be handled by smaller, cheaper models, while only the tricky, high-stakes work should "escalate" to a top model. Studies cited in the report show you can often shift 40-70% of calls to small models without losing quality, cutting both cost and latency. Build your systems so you can switch models easily as prices and performance evolve.

This matters because the capability-to-cost curve is moving faster than most budgets: model capability is rising while prices fall, and independent trackers show value for money doubling in months, not years. Lock-in is expensive; a routing mindset pays for itself.
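To make the routing idea concrete, here is a minimal sketch of "small-first, escalate when needed". The model names, prices, and the call_model and looks_good stubs are placeholders rather than any particular provider's API; a production version would call your actual providers and use a real acceptance check.

    # Hypothetical models and per-call prices; substitute your providers' rates.
    MODELS = [
        {"name": "small-model", "cost_per_call": 0.001},    # cheap default
        {"name": "frontier-model", "cost_per_call": 0.030}, # escalation target
    ]

    def call_model(name: str, prompt: str) -> str:
        """Placeholder for a real provider call; returns a canned answer here."""
        return f"[{name}] draft answer to: {prompt}"

    def looks_good(answer: str) -> bool:
        """Cheap acceptance check: required fields, format rules, or a validator
        model. Stubbed as a trivial length check for illustration."""
        return len(answer) > 20

    def route(prompt: str) -> dict:
        """Try the cheapest capable model first; escalate only if the check fails."""
        spent = 0.0
        for model in MODELS:
            spent += model["cost_per_call"]
            answer = call_model(model["name"], prompt)
            if looks_good(answer):
                # Report cost per successful answer, not just accuracy.
                return {"answer": answer, "model": model["name"], "cost": spent}
        return {"answer": None, "model": None, "cost": spent}  # hand off to a human

    print(route("What is your returns policy for opened items?"))

Because the router owns model choice, swapping a provider is a one-line change to MODELS - the "replaceable part" mindset rather than a platform marriage.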

What to do:

  • Start “small-first, escalate when needed” on two or three workflows (e.g., customer emails, FAQ answers, first-line IT or HR responses). Track cost per successful answer, not just accuracy. 

  • Architect for heterogeneity (multiple providers, open + closed options). Treat model choice as a replaceable part, not a platform marriage.

Your new front door: AI answer engines

The report shows AI “answer engines” (ChatGPT, Gemini, Perplexity and others) are becoming a major discovery channel. People have longer sessions and make more considered decisions inside these tools than in traditional search, and early retail data suggests AI referrals convert unusually well (circa 11%, up ~5 percentage points year-on-year). That is comparable to paid search in many categories.

Two important points follow. First, to be visible to answer engines you need Answer Engine Optimisation (AEO): structured data, clean citations, and small authenticated APIs that provide canonical facts (prices, delivery windows, returns, certifications). Second, most answer engines still rely heavily on Google's index, so classic SEO hygiene still matters; AEO adds a layer designed for extraction and citation, not just ranking.
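As an illustration of "machine-readable facts", here is a small schema.org Product snippet emitted as JSON-LD. The SKU, price, and returns window are invented for the example; you would embed the output in a <script type="application/ld+json"> tag on the relevant page.

    import json

    # Illustrative schema.org Product markup; every value below is a placeholder.
    product = {
        "@context": "https://schema.org",
        "@type": "Product",
        "sku": "EXAMPLE-SKU-001",
        "name": "Example Widget",
        "offers": {
            "@type": "Offer",
            "priceCurrency": "EUR",
            "price": "49.00",
            "availability": "https://schema.org/InStock",
        },
        "hasMerchantReturnPolicy": {
            "@type": "MerchantReturnPolicy",
            "merchantReturnDays": 30,
        },
    }

    print(json.dumps(product, indent=2))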

What to do:

  • Publish machine-readable product and policy schemas and a mini API for canonical facts. This helps answer engines quote you correctly and powers agent-driven checkout flows the report highlights. 

  • Refresh key help pages to be citation-friendly (clear sources, updated dates, concise summaries). You’re writing for humans and for models.

Making AI dependable: standard “plugs” and guardrails

If you want AI to do real work - look up an order, create a ticket, check a price, schedule a delivery - it must call your systems safely. The report notes that the Model Context Protocol (MCP) is rapidly becoming the “USB-C of AI” for this: a common way for different AIs to plug into tools, with governance and registries emerging as it scales. That means less bespoke wiring and easier model swaps in future.

Agent reliability also improves when you treat work as a simple graph of steps (plan → tool call → check → memory update) rather than a single prompt. This isn’t flashy, but it’s how you go from a demo to a dependable workflow.
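Here is that graph as a sketch, with each stage an explicit, inspectable function; the planner, tool, and validator below are stand-ins for real model and API calls.

    from dataclasses import dataclass, field

    @dataclass
    class AgentState:
        goal: str
        plan: list = field(default_factory=list)
        memory: dict = field(default_factory=dict)
        log: list = field(default_factory=list)

    def plan_step(state: AgentState) -> None:
        """Stand-in for a planner-model call that decomposes the goal."""
        state.plan = ["order_lookup"]

    def tool_step(state: AgentState) -> dict:
        """Stand-in for a real tool/API call; the call is logged for audit."""
        state.log.append(("tool_call", state.plan[0]))
        return {"order": "12345", "status": "shipped"}

    def check_step(result: dict) -> bool:
        """Deterministic validation before any result is trusted or acted on."""
        return "status" in result

    def memory_update(state: AgentState, result: dict) -> None:
        """Explicit rule for what the agent remembers (and, by omission, forgets)."""
        state.memory["last_order_status"] = result["status"]

    def run(goal: str) -> AgentState:
        state = AgentState(goal=goal)
        plan_step(state)                   # plan
        result = tool_step(state)          # tool call
        if check_step(result):             # check
            memory_update(state, result)   # memory update
        return state

    print(run("Where is order 12345?").memory)  # {'last_order_status': 'shipped'}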

What to do:

  • Wrap one or two internal tools (e.g., "create/close ticket", "order lookup") behind MCP. Add a "checks-before-action" step for risky operations. Log every step for audit. (A sketch follows this list.)

  • Define memory rules (what the system should remember, consolidate, or forget). This keeps agents consistent without leaking sensitive info.
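A minimal sketch of the first bullet, assuming the official MCP Python SDK's FastMCP interface; the "helpdesk" server, the is_allowed gate, and the ticket logic are hypothetical stand-ins for your own systems.

    import logging
    from mcp.server.fastmcp import FastMCP  # official MCP Python SDK (assumed installed)

    logging.basicConfig(level=logging.INFO)
    mcp = FastMCP("helpdesk")  # server name is illustrative

    def is_allowed(ticket_id: str) -> bool:
        """Hypothetical checks-before-action gate: ownership, status, rate limits."""
        return ticket_id.isdigit()

    @mcp.tool()
    def close_ticket(ticket_id: str) -> str:
        """Close a helpdesk ticket (illustrative; wire this to your real system)."""
        if not is_allowed(ticket_id):
            logging.warning("blocked close_ticket(%s)", ticket_id)  # audit trail
            return "refused: failed pre-action check"
        logging.info("close_ticket(%s)", ticket_id)  # log every step for audit
        return f"ticket {ticket_id} closed"

    if __name__ == "__main__":
        mcp.run()  # any MCP-capable client can now discover and call close_ticket

Because the tool surface is standard, swapping the model on the other end of the connection does not touch this code - the "less bespoke wiring" point above.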

Reality check: power, water, and permitting will shape the road ahead

AI isn’t just software. It runs on electricity and, at scale, uses water for cooling. The report flags a growing gap between AI demand and grid capacity in major markets, with projections of shortfalls and price volatility. It also highlights creative carbon accounting in the industry and the shift to siting data centres in regions with easier permits and better energy economics. This affects latency, price, and SLAs for everyone further up the stack.

For EU businesses, this isn't cause for alarm; plan for it as you would for any other constrained resource. Build fallbacks (cache popular answers; keep a small on-device model for basics), and ask suppliers for clear disclosures on energy and water assumptions. It's good risk management and good stakeholder communication.
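As one shape such a fallback can take: a simple time-to-live cache in front of the hosted model, degrading to a small local answerer when the provider is unavailable. All names, and the simulated outage, are illustrative.

    import time

    CACHE: dict = {}    # question -> (answer, expiry timestamp)
    TTL_SECONDS = 3600  # refresh popular answers hourly

    def remote_answer(question: str) -> str:
        """Stand-in for a hosted-model call that can fail under capacity pressure."""
        raise TimeoutError("provider unavailable")  # simulate an outage

    def local_answer(question: str) -> str:
        """Stand-in for a small on-device model or a canned FAQ lookup."""
        return "Our standard returns window is 30 days."  # illustrative

    def answer(question: str) -> str:
        cached = CACHE.get(question)
        if cached and cached[1] > time.time():
            return cached[0]                    # serve popular answers from cache
        try:
            result = remote_answer(question)
        except TimeoutError:
            result = local_answer(question)     # degrade gracefully, stay online
        CACHE[question] = (result, time.time() + TTL_SECONDS)
        return result

    print(answer("What is your returns policy?"))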

Open vs closed models: stop the culture war, use both

The report’s snapshot is straightforward: closed models still lead at the very frontier, while open-weight models have become strong, especially from China, making open options credible for many tasks. Most organisations will sensibly run a hybrid: open where you want control/cost advantages, closed for the most demanding or regulated workloads. The key is proper evaluation: some headline “reasoning breakthroughs” shrink or vanish under multi-seed, standardised tests. Evaluate like an engineer, not a press office.

What to do:

  • Write down house rules: which data can leave your boundary, which can’t; when to require human review; what “good enough” looks like per task.

  • Re-test quarterly. Prices and capabilities are moving; treat models like tariffs you renegotiate. (A sketch of such a re-test follows.)
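A sketch of what that quarterly re-test can look like: a version-controlled golden set, run over several seeds per model, reported as cost per correct answer. The models, prices, and the simulated call are placeholders for your own harness.

    import random
    import statistics

    GOLDEN_SET = [  # small, version-controlled cases with expected answers
        {"prompt": "What is the returns window?", "expected": "30 days"},
        {"prompt": "Do you ship to the EU?", "expected": "yes"},
    ]

    def call_model(name: str, case: dict, seed: int) -> str:
        """Placeholder: simulates a model that answers correctly ~70% of the time.
        A real harness would call the provider with fixed decoding settings."""
        random.seed(hash((name, case["prompt"], seed)))
        return case["expected"] if random.random() < 0.7 else "unsure"

    def evaluate(name: str, cost_per_call: float, seeds=(0, 1, 2)) -> dict:
        """Multi-seed pass rate, so one lucky run cannot flatter a model."""
        rates = []
        for seed in seeds:
            correct = sum(call_model(name, case, seed) == case["expected"]
                          for case in GOLDEN_SET)
            rates.append(correct / len(GOLDEN_SET))
        mean_rate = statistics.mean(rates)
        cost_per_correct = cost_per_call / mean_rate if mean_rate else float("inf")
        return {"model": name, "pass_rate": round(mean_rate, 2),
                "cost_per_correct": round(cost_per_correct, 4)}

    for name, cost in [("small-model", 0.001), ("frontier-model", 0.030)]:
        print(evaluate(name, cost))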

Culture, data, and change: keep it human and boring

A thread running through the report is that the organisations making real progress have done the boring things well:

  • Data plumbing: a small, accurate source of truth for prices, stock, policies - exposed in consistent schemas that both people and AI can use. This underpins service quality and discoverability.

  • Evaluation habits: golden test sets, routine A/Bs, and cost+quality dashboards. Beware headline gains that disappear under more rigorous testing.

  • Change management: explain what the tools will and won’t do, where humans stay in the loop, and how you’ll measure impact. Confidence rises when people can see the checks and the numbers.

  • These moves aren’t flashy, and that’s the point. They make AI useful and predictable.

The bottom line for boards

If you remember one thing from State of AI 2025, make it this: system design beats model worship. Route work to the cheapest capable model; make your facts legible for answer engines and agents; standardise the plug (MCP) so you can change models without tearing out wiring; and plan for real-world constraints on power and capacity. Do these well and you’ll move faster, at lower cost, with less risk.
