Feb 25, 2026
8 min read
From Pilot to Production: What Goldman Sachs, Salesforce, and OpenAI's New Alliance Reveal About Enterprise AI Agents in 2026

There's a persistent narrative in enterprise AI that agents are "almost ready" for production. That they need another year of development, another round of model improvements, another quarter of piloting before they can handle real business workflows.
That narrative died this month.
Goldman Sachs is running autonomous agents for transaction reconciliation. Salesforce restructured their entire support organization around agentic AI. Cisco launched production-grade agent infrastructure. And on February 23, OpenAI announced multiyear deals with McKinsey, BCG, Accenture, and Capgemini to push its Frontier agent platform into enterprises at scale.
The signal is unmistakable. The bottleneck is no longer "can AI agents do the work." It's "can your organization deploy them." Here's what the companies that have already moved are learning.
The OpenAI Frontier Alliance: Deployment Is the New Bottleneck
OpenAI's announcement on February 23 is the clearest signal yet that enterprise AI agents have shifted from a technology problem to a deployment problem.
The Frontier Alliance pairs four of the world's largest consulting firms with OpenAI's agent platform. McKinsey and BCG handle strategy, operating model design, and change management. Accenture and Capgemini handle technical implementation, system integration, and lifecycle support. Each firm is building dedicated certified practice groups.
The structure tells you everything about where the industry is. OpenAI didn't need more model capability. It needed an integration and change management layer. The fact that the world's leading AI lab is calling in consultants to handle deployment confirms what enterprises have been discovering on their own: getting AI agents into production requires organizational change, not just technical capability.
Confirmed Frontier platform users already include HP, Intuit, Oracle, State Farm, Thermo Fisher Scientific, and Uber, with BBVA, Cisco, and T-Mobile in active pilots. For the confirmed users, this isn't experimental. These are production deployments at companies processing millions of transactions daily.
Goldman Sachs: AI Agents in Financial Operations
Goldman Sachs deployed autonomous agents built on Anthropic's Claude for two of their highest-stakes workflows: transaction reconciliation and client onboarding.
Transaction reconciliation, the process of matching and verifying financial records across systems, is exactly the kind of work where AI agents excel. It's rule-based but complex: it involves high data volumes, requires cross-referencing multiple sources, and demands a level of accuracy that human teams struggle to maintain consistently at scale.
The agents don't just flag discrepancies. They investigate them. When a reconciliation mismatch appears, the agent traces the transaction across systems, identifies the root cause, categorizes the issue type, and either resolves it automatically or routes it to the appropriate human reviewer with full context. The result is faster resolution, fewer manual touches, and consistent audit trails.
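The categorize-then-resolve-or-route step described above can be illustrated with a small sketch. This is not Goldman's implementation, which has not been published in this level of detail. The record fields, the category names, and the `auto_tolerance` threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Record:
    txn_id: str
    amount: Decimal
    currency: str


@dataclass(frozen=True)
class Finding:
    txn_id: str
    category: str   # hypothetical issue types, not an industry taxonomy
    action: str     # "none", "auto_resolve", or "route_to_human"
    context: str    # short explanation handed to the human reviewer


def reconcile(ledger: list[Record], bank: list[Record],
              auto_tolerance: Decimal = Decimal("0.01")) -> list[Finding]:
    """Match ledger records to bank records by txn_id and triage mismatches."""
    bank_by_id = {r.txn_id: r for r in bank}
    findings = []
    for rec in ledger:
        other = bank_by_id.get(rec.txn_id)
        if other is None:
            findings.append(Finding(rec.txn_id, "missing_counterpart",
                                    "route_to_human", "no bank record found"))
        elif rec.currency != other.currency:
            findings.append(Finding(rec.txn_id, "currency_mismatch",
                                    "route_to_human",
                                    f"{rec.currency} vs {other.currency}"))
        elif rec.amount == other.amount:
            findings.append(Finding(rec.txn_id, "matched", "none", ""))
        elif abs(rec.amount - other.amount) <= auto_tolerance:
            # Assumed policy: tiny differences are treated as rounding noise.
            findings.append(Finding(rec.txn_id, "rounding_difference",
                                    "auto_resolve",
                                    f"diff {abs(rec.amount - other.amount)}"))
        else:
            findings.append(Finding(rec.txn_id, "amount_mismatch",
                                    "route_to_human",
                                    f"ledger {rec.amount} vs bank {other.amount}"))
    return findings
```

The point of the sketch is the shape of the decision: every mismatch gets a category and a context string, and only the narrow, well-understood case is resolved without a person.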
Client onboarding follows a similar pattern. The agents handle document collection, KYC verification, data extraction, and compliance checks. A process that previously required multiple handoffs between teams and days of elapsed time now runs as a continuous agentic workflow with human oversight at decision points.
The takeaway: Goldman didn't start with experimental use cases. They deployed agents on core financial operations where accuracy and compliance are non-negotiable, and the agents performed.
Salesforce: From 9,000 Support Staff to 3,000
Marc Benioff's disclosure that Salesforce reduced their customer support headcount from 9,000 to approximately 3,000 using agentic AI was one of the most concrete workforce-impact numbers any major enterprise has shared publicly.
The reduction wasn't a layoff story. It was a restructuring around a fundamentally different operating model. Salesforce's AI agents handle first-line customer inquiries, case classification, knowledge base retrieval, and resolution of common issues. The remaining human agents handle complex escalations, relationship management, and cases requiring judgment that agents can't provide.
This restructuring aligns with what Gartner reported on February 18: 91% of customer service leaders are under executive pressure to implement AI in 2026. But Gartner also predicts that 50% of companies that cut service staff because of AI will rehire by 2027, just under different job titles. Only 20% of service leaders have actually reduced headcount so far. The rest are restructuring roles, not eliminating them.
What makes Salesforce's case significant for other enterprises isn't the headcount number. It's the proof that AI agents can handle first-line work at enterprise scale, across thousands of customer interactions daily, with sufficient quality that the organization was comfortable restructuring around them, and that the new roles, focused on complex problem-solving and relationship management, are arguably higher-value than the ticket-processing work they replaced.
Cisco and Fujitsu: Agent Infrastructure at Scale
Cisco and Fujitsu represent a different angle of the production story: building AI agent infrastructure designed for enterprise-grade deployment from the start.
At its Cisco Live conference in Amsterdam earlier this month, Cisco launched agentic AI innovations across network operations, IT service management, and security operations. These are domains where autonomous agents can monitor systems, detect anomalies, diagnose issues, and take corrective action without waiting for a human operator.
Network operations is a particularly strong fit for AI agents. Networks generate massive volumes of telemetry data, require fast response to issues, and follow established playbooks for common problems. An agent monitoring network health can detect a degradation pattern, correlate it with recent changes, identify the likely cause, and execute the remediation before a human operator finishes triaging the alert.
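The detect, correlate, and remediate loop described above can be sketched in a few lines. This is a simplified illustration, not Cisco's product logic. The `PLAYBOOK` mapping, the 1.5x latency threshold, and the one-hour lookback window are all assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Change:
    device: str
    timestamp: int   # seconds, on the same clock as `now` below
    kind: str        # e.g. "config_push"


# Hypothetical playbook: known change types mapped to a remediation action.
PLAYBOOK = {
    "config_push": "rollback_config",
    "firmware_upgrade": "revert_firmware",
}


def is_degraded(latencies_ms, baseline_ms, factor=1.5, window=3):
    """True when the mean of the last `window` samples exceeds factor * baseline."""
    if len(latencies_ms) < window:
        return False
    return mean(latencies_ms[-window:]) > factor * baseline_ms


def triage(device, latencies_ms, baseline_ms, changes, now, lookback_s=3600):
    """Return the action to take for one device."""
    if not is_degraded(latencies_ms, baseline_ms):
        return "healthy"
    # Correlate the degradation with recent changes on the same device.
    recent = [c for c in changes
              if c.device == device and 0 <= now - c.timestamp <= lookback_s]
    if not recent:
        return "escalate_to_human"
    latest = max(recent, key=lambda c: c.timestamp)
    # Novel change types fall outside the playbook and go to a person.
    return PLAYBOOK.get(latest.kind, "escalate_to_human")
```

Anything the playbook does not cover falls through to a human, which mirrors the "execute playbooks but flag novel situations" behavior described later in this article.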
Fujitsu's approach tackles supply chain resilience through multi-agent coordination. Their platform, launched in February, deploys specialized agents for demand forecasting, supplier risk monitoring, logistics optimization, and inventory management. These agents operate as a coordinated system. When a supplier signals potential delays, the risk monitoring agent alerts the logistics agent, which recalculates delivery schedules. That triggers the inventory agent to adjust safety stock, and the demand forecasting agent then updates delivery commitments. The entire cascade happens in minutes rather than the days it takes when each function operates in a silo.
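A cascade like the one described above is essentially a message passed along a chain of handlers. The sketch below is a toy model of that idea, not Fujitsu's platform. The agent names, the `DOWNSTREAM` routing table, the message fields, and the arithmetic inside each handler are all assumptions.

```python
from collections import deque

# Hypothetical routing table: which agents react to each agent's output.
DOWNSTREAM = {
    "supplier_risk": ["logistics"],
    "logistics": ["inventory"],
    "inventory": ["demand_forecast"],
    "demand_forecast": [],
}


def supplier_risk(msg):
    # Flag the delay as high risk when it is three days or more (assumed rule).
    return {**msg, "risk": "high" if msg["delay_days"] >= 3 else "low"}


def logistics(msg):
    # Recalculate the delivery estimate from the planned ETA plus the delay.
    return {**msg, "eta_days": msg["planned_eta_days"] + msg["delay_days"]}


def inventory(msg):
    # Hold enough extra stock to cover demand during the delay.
    return {**msg, "safety_stock": msg["daily_demand"] * msg["delay_days"]}


def demand_forecast(msg):
    # Move the customer commitment to the new delivery estimate.
    return {**msg, "commit_in_days": msg["eta_days"]}


HANDLERS = {
    "supplier_risk": supplier_risk,
    "logistics": logistics,
    "inventory": inventory,
    "demand_forecast": demand_forecast,
}


def run_cascade(start, payload):
    """Run each agent in turn and record (agent, output) for an audit trail."""
    queue = deque([(start, payload)])
    trace = []
    while queue:
        agent, msg = queue.popleft()
        out = HANDLERS[agent](msg)
        trace.append((agent, out))
        for nxt in DOWNSTREAM[agent]:
            queue.append((nxt, out))
    return trace
```

Calling `run_cascade("supplier_risk", {"delay_days": 4, "planned_eta_days": 10, "daily_demand": 50})` walks the four agents in order and returns the full trace, so each downstream adjustment can be inspected after the fact.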
The Supporting Cast: Infosys, Rackspace, Typewise
Goldman, Salesforce, Cisco, and Fujitsu aren't outliers. The production deployment pattern is spreading across industries and company sizes.
Infosys announced a partnership with Anthropic at the India AI Summit on February 22 to deliver enterprise AI solutions across telecommunications, financial services, manufacturing, and software development. Rackspace partnered with Palantir to deploy Foundry and AIP in governed production environments, starting with 30 Palantir-trained engineers and scaling to 250+ over the next 12 months.
On the product side, Typewise launched their AI Supervisor Engine on February 23, a multi-agent orchestration layer for enterprise customer service. It uses a reasoning model as supervisor, coordinating multiple autonomous agents via natural language configuration, with a dedicated "Review AI agent" that checks outputs against protocols.
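The supervisor-plus-reviewer arrangement can be illustrated generically. The sketch below is not Typewise's engine: it swaps the supervising reasoning model for plain keyword routing, and the worker replies and the `FORBIDDEN` protocol list are invented for illustration.

```python
# Hypothetical protocol: phrases a reply must never contain.
FORBIDDEN = ("guarantee",)


def route(query):
    """Stand-in for the supervising reasoning model: plain keyword routing."""
    q = query.lower()
    if "refund" in q or "invoice" in q:
        return "billing"
    if "password" in q or "login" in q:
        return "account"
    return "human"


# Stand-in worker agents that return canned drafts.
AGENTS = {
    "billing": lambda q: "We have opened a refund request and will confirm by email.",
    "account": lambda q: "We guarantee your reset link arrives instantly.",
}


def review(draft):
    """Reviewer step: return the list of protocol violations in a draft."""
    return [term for term in FORBIDDEN if term in draft.lower()]


def handle(query):
    target = route(query)
    if target == "human":
        return {"handled_by": "human", "reply": None, "violations": []}
    draft = AGENTS[target](query)
    violations = review(draft)
    if violations:
        # A draft that breaks protocol is never sent; a person takes over.
        return {"handled_by": "human", "reply": None, "violations": violations}
    return {"handled_by": target, "reply": draft, "violations": []}
```

The design choice worth noting is that the reviewer sits between every worker and the customer, so a worker's mistake results in a handoff rather than a bad reply.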
Typewise's launch included a data point worth noting: only 1 in 10 agentic AI pilots currently make it to live production environments. That conversion rate is the central problem the entire industry is trying to solve, and it underscores why OpenAI built an alliance specifically focused on the deployment gap.
Five Patterns from Companies That Made It to Production
Across these deployments, five patterns separate companies that reached production from those stuck in perpetual piloting.
1. They Started with High-Volume, Rule-Based Workflows
None of these companies started with open-ended, creative tasks. Goldman deployed on transaction reconciliation. Salesforce on first-line support triage. Cisco on network monitoring. Fujitsu on supply chain logistics. Every successful deployment targeted workflows that are high volume, follow established rules, and generate measurable outputs. That's the sweet spot.
2. They Kept Humans at Decision Points
Every deployment maintains human-AI collaboration at critical junctures. Goldman's agents route complex discrepancies to human reviewers. Salesforce's agents escalate cases requiring judgment. Cisco's agents execute playbooks but flag novel situations. The agents handle volume. The humans handle judgment.
3. They Measured Business Outcomes, Not AI Metrics
None of these companies talk about model accuracy scores or benchmark performance. They report on time-to-resolution, cost-per-transaction, cases handled per agent, and throughput improvements. The metrics are business metrics, not AI metrics.
4. They Built Platform Infrastructure, Not Point Solutions
Goldman built an agentic platform, not a single automation. Salesforce restructured their support model, not just one workflow. Cisco built enterprise-grade agent infrastructure. OpenAI assembled a four-firm alliance for deployment support. The upfront investment is higher, but the platform approach scales across dozens of use cases.
5. They Moved Despite Imperfection
None of these agents are perfect. They make mistakes, require oversight, and have limitations. The companies deployed them anyway because the alternative, human teams executing every step manually, is slower, more expensive, and also imperfect. The comparison isn't "agent vs. perfect." It's "agent with human oversight vs. human alone."
The Conversion Rate Problem
The 1-in-10 pilot-to-production conversion rate reported by Typewise is the number every enterprise leader should focus on. If it holds, roughly 90% of agentic AI pilots stall between proof-of-concept and production deployment.
This points in the same direction as Gartner's prediction that over 40% of agentic AI projects will be canceled by the end of 2027. The failures won't be technology failures. They'll be deployment failures: inadequate change management, unclear business cases, missing governance frameworks, and organizational resistance.
OpenAI building a consulting alliance for deployment, not for model development, confirms where the industry's real challenge sits. The models work. The question is whether your organization can absorb the change that production deployment requires.
What This Means for Your Enterprise
The proof points are no longer theoretical. Goldman Sachs, Salesforce, Cisco, Fujitsu, Infosys, Rackspace, and now OpenAI's entire alliance network represent financial services, technology, networking, manufacturing, consulting, and telecommunications. They've demonstrated that AI agents work in production across different industries, different use cases, and different organizational structures.
The question has shifted from "Can AI agents handle enterprise workflows?" to "When will your enterprise deploy them?" With worldwide AI spending projected to hit $2.52 trillion in 2026 (a 44% increase year-over-year), the investment is flowing. The companies that convert that investment into production deployments will compound their advantages in cost, speed, and quality. The ones stuck in perpetual piloting will fall behind.
These companies had the same concerns you have about accuracy, governance, change management, and ROI. They moved anyway. Their results validate the decision.





