Benedict Evans’ 2025 AI Deck — What It Actually Means for Enterprises

AI won’t replace your company. It will remove steps between the 400–500 apps you already use.

Published: November 25, 2025

Kristian Luoma, Co-founder

kristian@in-parallel.com


Contents

Invest Now (Even If You “Pre-Build”)

From Experiments to Daily Work

Evans argues that AI’s value comes from removing steps across existing enterprise workflows, not from adding new apps. Companies that invest now in narrow, measurable automations will build capabilities and a data advantage that late movers will lack. The full deck is available at https://www.ben-evans.com/presentations.

Why Listen to Benedict Evans?

New Platforms, Fewer Steps

Each platform shift, from mainframes to PCs to SaaS, multiplied the number of apps. AI flips the question from “what new app do we build?” to “what step can we remove?”

For enterprises, that means:

Treat LLMs as cross-app operators
Read → reason → validate → write → log, across tools.
Aim at swivel-chair work
Any task that forces humans to copy/paste between CRM, ITSM, ERP, email, spreadsheets.
Own the interface, swap the brains
Keep models interchangeable behind one orchestration layer you control.
Make knowledge a versioned asset
Version your corpora, scope retrieval tightly, and require the system to either cite its sources or refuse.
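
The four points above can be sketched as one small pipeline. This is a minimal illustration, not a reference design: `Passage`, `Model`, `run_step`, `grounded`, and `EchoModel` are all hypothetical names, the validator is a deliberately naive verbatim check, and a real deployment would put retrieval, write-back, and logging behind its own systems.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass(frozen=True)
class Passage:
    source_id: str        # hypothetical ID of an approved document
    corpus_version: str   # pinned corpus version, so citations are reproducible
    text: str


class Model(Protocol):
    """Any provider adapter; swapping models means swapping this object only."""

    def draft(self, question: str, passages: list[str]) -> str: ...


@dataclass
class Result:
    status: str                                   # "answered" or "refused"
    answer: str = ""
    citations: list[str] = field(default_factory=list)


def grounded(draft: str, passages: list[Passage]) -> bool:
    """Toy validator: the draft must appear verbatim in an approved passage."""
    return bool(draft) and any(draft in p.text for p in passages)


def run_step(
    question: str,
    retrieve: Callable[[str], list[Passage]],
    model: Model,
    write: Callable[[str], None],
    log: Callable[[dict], None],
) -> Result:
    passages = retrieve(question)                                    # read
    texts = [p.text for p in passages]
    draft = model.draft(question, texts) if passages else ""         # reason
    if grounded(draft, passages):                                    # validate
        write(draft)                                                 # write
        cites = [f"{p.source_id}@{p.corpus_version}" for p in passages]
        result = Result("answered", draft, cites)
    else:
        result = Result("refused")                                   # cite or refuse
    log({"question": question, "status": result.status,
         "citations": result.citations})                             # log
    return result


class EchoModel:
    """Stand-in for a real provider: returns the first passage unchanged."""

    def draft(self, question: str, passages: list[str]) -> str:
        return passages[0]
```

Because `run_step` only depends on the `Model` protocol, replacing `EchoModel` with an adapter for a different provider leaves the retrieval, validation, and logging path untouched.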

The “new platform” isn’t a chatbot. It’s a thin intelligence layer across your existing estate.

Invest Now (Even If You “Pre-Build”)

Evans’ implicit warning to large companies: waiting is also a strategy—just usually a bad one.

How to read this as a corporate policy:

Waiting taxes you twice
You lose automation gains now and arrive late to the learning curve (guardrails, metrics, playbooks).
Early investment pre-builds capability
Even if you expect another model wave in 12–24 months, you’re banking:
- Real usage data and error patterns
- Evaluation suites and safety rails
- Internal talent that understands how to ship AI safely

Fund AI like strategy, not as a demo budget
Treat AI as a multi-year capability bet (like CRM or cloud), not a stack of pilots chasing headlines.

Why Usage Still Lags

Evans points out a puzzle: consumer and enterprise AI both show massive top-of-funnel curiosity, but shallow habit.

Why corporate rollouts stall:

Chat-first UX
Great for a conference demo; weak for repeatable daily work. People don’t wake up wanting “chat”—they want “this ticket closed faster.”
Metric theater
Tokens, prompts, and “active users” don’t move the board. CFOs care about cycle time, error rates, and cost per resolved task.
Shadow knowledge
Dumping “all our docs” into retrieval leads to leaks, hallucinations, and compliance headaches.
Vendor glue risk
Coupling too tightly to a single vendor stack turns every model upgrade into a breaking-change project.

If you don’t design for habit and governance from day one, AI remains a sideshow.

From Experiments to Daily Work

Principle: make the safe path the fast path.

1) Ship skinny, production-grade flows

Pick one narrow job.
Use deterministic retrieval and strict validators before write-back.
Log every query, response, decision, and override.

If confidence is low or the blast radius is high, the system should refuse and escalate with evidence.
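
One way to read "refuse and escalate" is as a gate that sits in front of every write-back. The sketch below assumes a hypothetical `ProposedChange` record and two illustrative thresholds (`MIN_CONFIDENCE`, `MAX_RECORDS`); the actual numbers, and how confidence is scored, would depend on the workflow.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    WRITE = "write"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class ProposedChange:
    target: str                 # hypothetical field name, e.g. "crm.account.tier"
    value: str
    confidence: float           # 0.0 to 1.0, as scored by your validators
    records_affected: int       # rough proxy for blast radius
    evidence: tuple[str, ...]   # cited sources backing the change


MIN_CONFIDENCE = 0.85   # assumed threshold, tune per workflow
MAX_RECORDS = 1         # assumed cap on records touched without approval


def gate(change: ProposedChange, audit_log: list[dict]) -> Decision:
    """Allow the write only when every check passes; otherwise escalate."""
    reasons = []
    if change.confidence < MIN_CONFIDENCE:
        reasons.append("low confidence")
    if change.records_affected > MAX_RECORDS:
        reasons.append("high blast radius")
    if not change.evidence:
        reasons.append("no evidence")
    decision = Decision.ESCALATE if reasons else Decision.WRITE
    # Every decision is logged, including the evidence a reviewer will need.
    audit_log.append({
        "target": change.target,
        "decision": decision.value,
        "reasons": reasons,
        "evidence": list(change.evidence),
    })
    return decision
```

Escalations carry the same evidence as approved writes, so the human reviewer starts from what the system saw rather than from a blank ticket.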

2) Instrument like a growth team

Track metrics that matter to executives:

Time-to-first-answer
% of answers grounded in approved sources
Human edit rate
Deflection rate + CSAT for support
SLA adherence
Cost per resolved task

Publish weekly deltas. Kill or rework flows that don’t pay their way.
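
As a rough sketch of what "weekly deltas" could look like in practice, the snippet below aggregates per-task records into a few of the metrics listed above and diffs two weeks. The `TaskRecord` fields are assumptions about what your logs contain, not a standard schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    seconds_to_first_answer: float
    grounded: bool       # answer cited an approved source
    human_edited: bool   # a person changed the draft before it shipped
    resolved: bool
    cost: float          # model plus infrastructure cost for this task


def weekly_metrics(records: list[TaskRecord]) -> dict[str, float]:
    if not records:
        raise ValueError("no records for this week")
    n = len(records)
    resolved = sum(r.resolved for r in records)
    total_cost = sum(r.cost for r in records)
    return {
        "avg_time_to_first_answer_s": sum(r.seconds_to_first_answer for r in records) / n,
        "grounded_rate": sum(r.grounded for r in records) / n,
        "human_edit_rate": sum(r.human_edited for r in records) / n,
        # Infinite cost when nothing was resolved keeps the failure visible.
        "cost_per_resolved_task": total_cost / resolved if resolved else float("inf"),
    }


def deltas(previous: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Week-over-week change for each metric (current minus previous)."""
    return {name: current[name] - previous[name] for name in current}
```

A negative delta is good news for time, edit rate, and cost; a positive delta is good news for the grounded rate.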

3) Assemble, don’t over-platform

Your stack should separate concerns:

Orchestration & policy (your code): tools, routing, approvals, rate limits.
Pluggable cognition: at least two external model providers + one local/edge path.
Authoritative context: versioned corpora, scoped retrieval, per-field provenance.
Validation: schema checks, reference matching, policy enforcement.

You’re building a control plane for intelligence, not another monolith.
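
To make "pluggable cognition" concrete, here is a minimal routing sketch. It assumes each provider can be wrapped as a plain function from prompt to completion; `Router`, `vendor_a`, `vendor_b`, and `local_model` are hypothetical stand-ins, not real SDK calls.

```python
from typing import Callable

Provider = Callable[[str], str]   # prompt in, completion out


class Router:
    """Tries providers in a fixed order and falls back when one fails."""

    def __init__(self, providers: dict[str, Provider], order: list[str]) -> None:
        missing = [name for name in order if name not in providers]
        if missing:
            raise KeyError(f"unknown providers in routing order: {missing}")
        self.providers = providers
        self.order = order

    def complete(self, prompt: str) -> tuple[str, str]:
        """Return (provider_name, completion) from the first provider that works."""
        errors: dict[str, str] = {}
        for name in self.order:
            try:
                return name, self.providers[name](prompt)
            except Exception as exc:   # broad on purpose: any failure triggers fallback
                errors[name] = repr(exc)
        raise RuntimeError(f"all providers failed: {errors}")


# Stub providers standing in for two external vendors and one local path.
def vendor_a(prompt: str) -> str:
    raise TimeoutError("vendor A unavailable")


def vendor_b(prompt: str) -> str:
    return f"B: {prompt}"


def local_model(prompt: str) -> str:
    return f"local: {prompt}"


router = Router(
    {"vendor_a": vendor_a, "vendor_b": vendor_b, "local": local_model},
    order=["vendor_a", "vendor_b", "local"],
)
```

The routing order is configuration, so a model upgrade or a vendor change becomes an edit to `order` rather than a rewrite of the flows that call `complete`.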

4) Aim at high-leverage loops

Start where Evans’ “platform shift” logic bites hardest: repetitive, information-heavy workflows with a clear ground truth.

Examples:

Support
Classify → retrieve → draft → validate entitlements → escalate with cited evidence.
CRM & sales ops
Summarize calls, extract next steps, update fields, draft follow-ups for approval.
Finance operations
Parse invoices, run 3-way match, flag variances with plain-language explanations.
Engineering
PR summaries, test scaffolds, incident timelines, runbook suggestions.

Policy & compliance
Cited answers, exception flags, full audit trails for every AI-assisted action.
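
The finance example is a good illustration of a loop with clear truth, because a three-way match compares an invoice against the purchase order and the goods actually received. The sketch below handles a single line item; `Line`, `three_way_match`, and the price tolerance are illustrative assumptions, and the parsing step that would produce these records is left out.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Line:
    sku: str
    quantity: int
    unit_price: Decimal


def three_way_match(
    po: Line,
    received_quantity: int,
    invoice: Line,
    price_tolerance: Decimal = Decimal("0.01"),   # assumed tolerance per unit
) -> list[str]:
    """Return plain-language variances; an empty list means the invoice matches."""
    variances = []
    if invoice.sku != po.sku:
        variances.append(f"Invoice is for {invoice.sku}, but the order was for {po.sku}.")
    if invoice.quantity > received_quantity:
        variances.append(
            f"Invoiced {invoice.quantity} units, but only {received_quantity} were received."
        )
    if invoice.quantity > po.quantity:
        variances.append(
            f"Invoiced {invoice.quantity} units, but only {po.quantity} were ordered."
        )
    if abs(invoice.unit_price - po.unit_price) > price_tolerance:
        variances.append(
            f"Unit price {invoice.unit_price} differs from the ordered price {po.unit_price}."
        )
    return variances
```

The matching itself is deterministic; the model's job in this flow is limited to parsing the invoice and wording the explanation, which keeps the part that touches money auditable.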

What to Build in 90 Days

You don’t need a “five-year AI vision deck.” You need one credible 90-day plan.

Weeks 1–2: Choose and baseline

Pick one revenue surface and one cost surface with:
- Repeatable work
- Clear definition of “done”
- Strong ground truth (systems of record)
Baseline:
- Minutes per task
- Error cost
- Backlog and SLA performance

Weeks 3–6: Ship the skinny version

Constrain sources aggressively; add validators before anything touches core systems.
Roll to 10–50 real users.
Capture human edits as training and evaluation signals.

Weeks 7–10: Stabilize and prove ROI

Cut variance; tighten retrieval scopes; cap context window size.
Cache frequent queries.
Share weekly cycle-time, accuracy, and error reductions with leadership.

Weeks 11–13: Productize

Replace chat with one-click actions inside existing tools (ticket sidebar, CRM panel, email plug-in).
Add admin controls for scopes, thresholds, approvals, and audits.
Run your first controlled model upgrade and document the impact end-to-end.
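
A controlled model upgrade can be as simple as replaying a fixed evaluation set through the current and candidate models before switching. The sketch below uses exact-match scoring, which is a simplifying assumption (real suites often need fuzzier graders), and `compare_upgrade` is a hypothetical helper rather than part of any library.

```python
from typing import Callable

ModelFn = Callable[[str], str]
EvalCase = tuple[str, str]   # (question, expected answer)


def pass_rate(model: ModelFn, cases: list[EvalCase]) -> float:
    """Share of cases the model answers exactly as expected."""
    return sum(model(question) == expected for question, expected in cases) / len(cases)


def compare_upgrade(current: ModelFn, candidate: ModelFn, cases: list[EvalCase]) -> dict:
    """Compare two models on the same cases and list anything that got worse."""
    regressions = [
        question
        for question, expected in cases
        if current(question) == expected and candidate(question) != expected
    ]
    report = {
        "current_pass_rate": pass_rate(current, cases),
        "candidate_pass_rate": pass_rate(candidate, cases),
        "regressions": regressions,
    }
    # Approve only if nothing that worked before breaks and the overall
    # pass rate does not drop.
    report["approve"] = (
        not regressions
        and report["candidate_pass_rate"] >= report["current_pass_rate"]
    )
    return report
```

Listing the regressed questions, not just the aggregate rate, is what makes the upgrade documentable: reviewers can see exactly which behaviors changed.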

The Takeaway

Evans’ decks follow a pattern: they show that platforms win by changing how work gets done, not by adding more screens.

Translate that to AI:

LLMs are workflow compressors, not another department.
Use them to remove steps across your SaaS estate—not to create a new island.
Measure behavior change and business outcomes, not tokens.
Quietly stack “boring” advantages: scoped knowledge, evals, guardrails, and model optionality.

The companies that start now won’t just have better tooling in three years. They’ll have better workflows, cleaner data, and an organization that already knows how to ship AI safely, while everyone else is still rewriting their first pilot.

References

https://www.ben-evans.com/presentations
